Re: [DISCUSS] Pyarrow wheels for Python 3.11

2022-11-07 Thread Antoine Pitrou
On 07/11/2022 at 15:37, Raúl Cumplido wrote: On Mon, 7 Nov 2022 at 14:14, Neal Richardson (< neal.p.richard...@gmail.com>) wrote: Two unrelated thoughts: 1. Since it sounds like we need to do a patch release for the wheels, should we include any other critical bugfixes that have

Re: [VOTE] Move issue tracking to GitHub Issues

2022-10-31 Thread Antoine Pitrou
On 31/10/2022 at 14:17, Neal Richardson wrote: The vote passes with 10 binding +1 votes and 9 other +1s. Thanks all. The Infra policy change takes effect on November 6, so we have this week to work through the migration and other questions. It doesn't have to be done by November 6. Let's

Re: [ANNOUNCE] New Arrow committer: Will Jones

2022-10-28 Thread Antoine Pitrou
Welcome Will, and thanks for your contributions! On 28/10/2022 at 01:56, Sutou Kouhei wrote: On behalf of the Arrow PMC, I'm happy to announce that Will Jones has accepted an invitation to become a committer on Apache Arrow. Welcome, and thank you for your contributions! kou

[Discuss][Python] Stop publishing universal wheels?

2022-10-27 Thread Antoine Pitrou
Hello, Currently, for macOS we're publishing arm64, x86_64 *and* universal2 binary wheels (the latter contain both arm64 and x86_64 code in a single binary). Here are some observations from me: * Producing universal2 wheels is more complex than producing single-architecture wheels

Re: [VOTE] Move issue tracking to GitHub Issues

2022-10-27 Thread Antoine Pitrou
+1 (binding) but let's make sure we have a quality migration to keep as much of the JIRA metadata as possible. Regards Antoine. On 27/10/2022 at 01:02, Neal Richardson wrote: I propose that we move issue tracking from the ASF's Jira to GitHub Issues. This has been discussed on [1] and

Re: [ANNOUNCE] New Arrow PMC member: Nicola Crane

2022-10-26 Thread Antoine Pitrou
Welcome, Nic! On 26/10/2022 at 16:37, Dewey Dunnington wrote: Congrats, Nic! On Wed, Oct 26, 2022 at 10:10 AM Jacob Wujciak wrote: Congratulations! On Wed, Oct 26, 2022 at 3:04 PM Larry White wrote: Congratulations, Nic! On Wed, Oct 26, 2022 at 2:31 AM Alenka Frim wrote:

Re: [RESULT][VOTE] Release Apache Arrow 10.0.0 - RC0

2022-10-26 Thread Antoine Pitrou
On 26/10/2022 at 07:37, Sutou Kouhei wrote: Two more: - Make the CPP PARQUET related version as "RELEASED" on JIRA - Start the new version on JIRA for the related CPP PARQUET version I don't have admin permission on PARQUET. Could someone who has admin permission on PARQUET do them? Ok,

Re: [DISCUSS] Migrating away from Travis-CI

2022-10-25 Thread Antoine Pitrou
We can turn those builds into Crossbow builds. QEMU is not reasonable given the already long build times. Regards Antoine. On 25/10/2022 at 03:26, Matt Topol wrote: I'd prefer not to remove them as there are definitely known users of both architectures for the Golang libraries. Is

Re: [DISCUSS] Move issue tracking to

2022-10-22 Thread Antoine Pitrou
Hi Neal, On 22/10/2022 at 15:35, Neal Richardson wrote: Their email says: Infra knows this process change places an increasing burden on PMC members for managing contributors, and makes it harder for people to contribute bug reports. We suggest projects consider using GitHub Issues for

Re: archery-lint unknown targets

2022-10-21 Thread Antoine Pitrou
. From: Antoine Pitrou Sent: Friday, October 21, 2022 11:52 AM To: dev@arrow.apache.org Subject: Re: archery-lint unknown targets This probably means that you don't have (the right versions of) clang-format and clang-tidy installed. The poor error message is unfortunate

Re: archery-lint unknown targets

2022-10-21 Thread Antoine Pitrou
This probably means that you don't have (the right versions of) clang-format and clang-tidy installed. The poor error message is unfortunate. We recently bumped the required versions (see the recent ML announcement); you now need clang-format and clang-tidy 14. Regards Antoine.

Re: [DISCUSS] Integrate existing Spark connector for Flight

2022-10-21 Thread Antoine Pitrou
a place the code can live in the mean time? Matt Phelps From: Antoine Pitrou Date: Monday, October 17, 2022 at 2:48 PM To: dev@arrow.apache.org Subject: Re: [DISCUSS] Integrate existing Spark connector for Flight CAUTION: This email originated from outside of the organization. Do not click links

Re: C# and -1 null_count

2022-10-21 Thread Antoine Pitrou
Hello, On 20/10/2022 at 17:32, John Muehlhausen wrote: if (fieldNullCount < 0) { throw new InvalidDataException("Null count length must be >= 0"); // TODO:Localize exception message } Above from Ipc/ArrowReaderImplementation.cs.

Re: [DISCUSS] Maintenance policy

2022-10-19 Thread Antoine Pitrou
Hi Kou, On 19/10/2022 at 06:29, Sutou Kouhei wrote: My proposal: We maintain the last major release: * We maintain 9.Y.Z when the latest major release is 9.0.0 * We may release 9.Y.Z when we find a problem such as a security vulnerability in 9.Y.Z * We drop support for 9.Y.Z when we

Re: [DISCUSS] Integrate existing Spark connector for Flight

2022-10-17 Thread Antoine Pitrou
On 17/10/2022 at 21:27, David Li wrote: Hey Matt, This is cool to see. To be clear, this is an implementation of Spark DataSourceV2 using Arrow Flight? I think the questions I have are: - Does this belong under Arrow, or under Spark - I lean towards it being closer to Spark than Arrow;

Re: [codegen] Dealing with conflicting names in target language

2022-10-17 Thread Antoine Pitrou
Hi Marco, On 16/10/2022 at 21:12, - wrote: Hi all, I've noticed the Arrow schema [1] defines some table types that clash with primitive type names in the language I'm targeting. For instance, we see `table Int{}` and `table Bool{}`, both of which are primitive types in Haskell. Are

Re: Build Frustrations

2022-10-10 Thread Antoine Pitrou
" "" CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS}) On Mon, Oct 10, 2022 at 1:29 PM Antoine Pitrou wrote: Then instead pass "-D_GLIBCXX_USE_CXX11_ABI=0" when building the C++ libraries? On 10/10/2022 at 20:20, Joseph Porter wrote: Hi Antoine, Here's what I did: export PYARROW_

Re: Parser for expressions

2022-10-10 Thread Antoine Pitrou
I don't see the point of having two different syntaxes. Also, IMHO lisp-style is harder for many people, so I would rather have a more "traditional" syntax (though Lisp is historically traditional, of course ;-)). On 10/10/2022 at 21:10, Sasha Krassovsky wrote: Yes that makes a lot of

Re: Build Frustrations

2022-10-10 Thread Antoine Pitrou
the c++11 directives to the CMakeLists.txt in the python module. -Joe On Mon, Oct 10, 2022 at 12:37 PM Antoine Pitrou wrote: On 10/10/2022 at 19:27, Joseph Porter wrote: I've tried building with explicit flags to encourage the libraries to include the cxx11 symbol (in python/CMakeLists.txt). That does

Re: Build Frustrations

2022-10-10 Thread Antoine Pitrou
On 10/10/2022 at 19:27, Joseph Porter wrote: I've tried building with explicit flags to encourage the libraries to include the cxx11 symbol (in python/CMakeLists.txt). That doesn't seem to impact this issue: set (CMAKE_CXX_STANDARD 11) set (CMAKE_CXX_STANDARD_REQUIRED ON) set

Re: [DISCUSS] Sponsor the upstream open source projects

2022-10-08 Thread Antoine Pitrou
Hi Remzi, Arrow is not an organization in itself, so I'm not sure how to organize that. That said, it is a worthwhile discussion, especially as several companies are investing development resources into Arrow (such as my employer - Voltron Data). Regards Antoine. On 08/10/2022 at

Re: [VOTE] Adopt ADBC database client connectivity specification

2022-10-05 Thread Antoine Pitrou
+1 (binding), with the caveat that I looked mostly at the C API. Regards Antoine. On 21/09/2022 at 17:40, David Li wrote: Hello, We have been discussing [1] standard interfaces for Arrow-based database access and have been working on implementations of the proposed interfaces [2], all

Re: [EXTERNAL] Re: [C++][Python]Built-in GRPC health checks in FlightServerBase

2022-10-04 Thread Antoine Pitrou
On 04/10/2022 at 00:54, Akshaya Annavajhala (AK) wrote: Thanks for the important clarification - reading up on UCX, it makes sense to implement a health check abstraction. A couple follow up musings: 1. "Hosting" vs "data plane" communication protocols. Clearly this isn't something worth

Re: [DISCUSS] Apache Iceberg / Apache Hudi support in Arrow

2022-10-03 Thread Antoine Pitrou
Hi all, On 03/10/2022 at 17:03, Will Jones wrote: Hi Rusty, Note we discussed Iceberg a while ago [1]. I don't think we've discussed Hudi in any depth. As I see it, we are waiting on three things: 1. Someone willing to move forward the Iceberg / Hudi integration. 2. The Iceberg and Hudi

Re: [DISCUSS] Python Wheel Size

2022-10-03 Thread Antoine Pitrou
Hi Rusty, On 02/10/2022 at 22:51, Rusty Conover wrote: Hi Arrow Team, I'm using Apache Arrow with AWS Lambda Functions. The primary motivation is AWS Athena's user-defined functions[1]. Those functions process and return Arrow IPC segments. * The published Python wheels for Apache Arrow

Re: [Java] UTF-16 support for VarCharVectors

2022-09-30 Thread Antoine Pitrou
On 30/09/2022 at 18:57, Kevin Bambrick wrote: The issue I am facing is sending a UTF-16 string over the wire. Ok, then you can just transcode the strings before sending them as String, *or* you can send them as Binary (not String). Where do these UTF-16 strings come from? > What would
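
A minimal sketch of the first option (transcoding before sending as String), written in plain Python rather than Arrow Java. Arrow's String type holds UTF-8 bytes, so UTF-16 input has to be decoded and re-encoded first. The function name `utf16_to_utf8` and the sample payload are illustrative assumptions, not part of any Arrow API.

```python
def utf16_to_utf8(raw: bytes, byteorder: str = "little") -> bytes:
    """Transcode UTF-16 bytes (without a BOM) to UTF-8 bytes."""
    codec = "utf-16-le" if byteorder == "little" else "utf-16-be"
    return raw.decode(codec).encode("utf-8")


# Sample UTF-16-LE payload, standing in for strings received from elsewhere.
utf16_payload = "héllo wörld".encode("utf-16-le")

# These bytes are what would be stored in a UTF-8 String column.
utf8_payload = utf16_to_utf8(utf16_payload)
```

The second option (sending as Binary) would skip the transcoding step and ship `utf16_payload` unchanged, at the cost that consumers must learn the encoding by some other means.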

Re: [DISCUSS][C++] C++ API as a user-facing API

2022-09-30 Thread Antoine Pitrou
On Thu, 29 Sep 2022 11:19:44 -0700 Will Jones wrote: > In a discussion about new additions to C++ docs, someone had a question: > Should we even be documenting this? > > Long-time contributors to Arrow C++ noted that many parts were written > without the intention that those APIs would not be

Re: [Java] UTF-16 support for VarCharVectors

2022-09-30 Thread Antoine Pitrou
On Thu, 29 Sep 2022 15:19:59 -0400 Larry White wrote: > Interesting. This doesn't seem to be a Java issue, per se then. I've seen > admonitions in various Arrow Java threads to always specify the Charset for > the conversion - and so assumed more than one Charset was legal - and have > written

Re: [Discuss] Deprecating Plasma

2022-09-27 Thread Antoine Pitrou
Ok, I've filed https://issues.apache.org/jira/browse/ARROW-17860 for this. Regards Antoine. On 22/09/2022 at 17:38, Antoine Pitrou wrote: Hello, The Plasma object store (*) hasn't received significant maintenance since at least 2020. The original authors have stopped contributing

[Discuss] Deprecating Plasma

2022-09-22 Thread Antoine Pitrou
Hello, The Plasma object store (*) hasn't received significant maintenance since at least 2020. The original authors have stopped contributing to the Arrow community and instead forked their own code for internal use inside another project

Re: [PROPOSAL] Serve stable and development versions of Arrow Cookbooks

2022-09-22 Thread Antoine Pitrou
Hello, On 22/09/2022 at 15:27, Raúl Cumplido wrote: I want to get feedback on the following proposal: Stable will be hosted as today at https://arrow.apache.org/cookbook and the development version at https://arrow.apache.org/cookbook/dev. In order to automate the process my initial idea

Re: [VOTE] Adopt ADBC database client connectivity specification

2022-09-22 Thread Antoine Pitrou
Hello, I would urge people to review the proposed ADBC APIs, especially the Go and Java APIs which probably benefitted from less feedback than the C one. Regards Antoine. On 21/09/2022 at 17:40, David Li wrote: Hello, We have been discussing [1] standard interfaces for Arrow-based

Re: unclear compilation errors with util::optional

2022-09-22 Thread Antoine Pitrou
Hi Yaron, On git master we recently moved to C++17 and therefore removed compatibility backports such as arrow::util::optional. Now you should just use std::optional. So be sure to rebase your work on master and fix any reference to those compatibility backports in your code. Regards

Re: [RESULT][VOTE] Format: Rules and procedures for Canonical extension types

2022-09-20 Thread Antoine Pitrou
The PR was submitted and merged here: https://github.com/apache/arrow/pull/14167 We can now start discussing specific canonical types! Regards Antoine. On 30/08/2022 at 18:06, Antoine Pitrou wrote: Hello, With 3 binding +1 votes, 3 non-binding +1 votes, and no -1 vote, the vote has

Re: apparently misleading test assertion printout

2022-09-19 Thread Antoine Pitrou
. From: Antoine Pitrou Sent: Monday, September 19, 2022 3:57 AM To: dev@arrow.apache.org Subject: Re: apparently misleading test assertion printout Hi Yaron, This is what GoogleTest does when it doesn't know how to print out a value. Guidance to fix this at: https://github.com/google

Re: apparently misleading test assertion printout

2022-09-19 Thread Antoine Pitrou
Hi Yaron, This is what GoogleTest does when it doesn't know how to print out a value. Guidance to fix this at: https://github.com/google/googletest/blob/main/docs/advanced.md#teaching-googletest-how-to-print-your-values Regards Antoine. On 19/09/2022 at 09:54, Yaron Gvili wrote: Hi,

Re: [C++][Gandiva] Proposal to Add A Parser Frontend for Gandiva

2022-09-18 Thread Antoine Pitrou
Hello, I would add that Gandiva does not seem to have a lot of active maintainers nowadays (as opposed to people who merely add functions for their own use cases). If you would like to take on such a responsibility it would probably be very welcome. Regards Antoine. On 16/09/2022 at

Re: RLE array slicing

2022-09-15 Thread Antoine Pitrou
On 15/09/2022 at 10:14, Micah Kornfield wrote: I agree slicing can be tricky here. Since slicing is not part of the specification, maybe there should be two separate discussions here. I'll be honest, I forget exactly how slicing works in the C++ implementation, but is Slicing is part of

Re: RLE array slicing

2022-09-15 Thread Antoine Pitrou
On Thu, 15 Sep 2022 09:25:53 +0200 Antoine Pitrou wrote: > > Why would the run ends and the values have the same offset? > Also, how do you interpret the run ends if you have a physical offset > into the values array? > > > Say you have the logical values: [5, 5, 5, 6,

Re: RLE array slicing

2022-09-15 Thread Antoine Pitrou
On 14/09/2022 at 20:18, Weston Pace wrote: I will clarify the offset problem. It essentially boils down to "if you don't have constant access to elements then an array length offset does not give you constant access to buffer offsets". We start with an RLE array of length 200. We slice it
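
The offset problem described in this message can be sketched in a few lines of Python. This is only an illustration under assumed conventions (cumulative run ends, a logical offset stored on the slice). The class name, field names and sample values are hypothetical and do not reflect Arrow's actual layout or API.

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import List


@dataclass
class RunEndSlice:
    """Hypothetical run-end encoded array carrying a logical slice offset."""
    run_ends: List[int]   # cumulative logical end (exclusive) of each run
    values: List[int]     # one value per run
    offset: int = 0       # logical offset of this slice into the parent
    length: int = 0       # logical length of this slice

    def physical_index(self, i: int) -> int:
        # There is no O(1) mapping from a logical position to a run:
        # a binary search over the run ends is required.
        if not 0 <= i < self.length:
            raise IndexError(i)
        return bisect_right(self.run_ends, self.offset + i)

    def __getitem__(self, i: int) -> int:
        return self.values[self.physical_index(i)]

    def slice(self, start: int, length: int) -> "RunEndSlice":
        # Slicing only adjusts the logical offset; the buffers are shared.
        if start < 0 or length < 0 or start + length > self.length:
            raise IndexError("slice out of bounds")
        return RunEndSlice(self.run_ends, self.values,
                           self.offset + start, length)


# 200 logical elements in 2 runs: one hundred 5s followed by one hundred 6s.
arr = RunEndSlice(run_ends=[100, 200], values=[5, 6], length=200)
sliced = arr.slice(90, 20)   # logical elements 90..109 of the parent
```

Here `sliced[0]` resolves to the first run and `sliced[10]` to the second, but finding that out costs a binary search each time, which is the point being made about logical offsets not giving constant-time access to buffer positions.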

Re: [DISC] Remove Kartothek integration tests from nightlies

2022-09-13 Thread Antoine Pitrou
Hello, +1 from me. We should not have integration builds without a dedicated maintainer to look after them. Regards Antoine. On 13/09/2022 at 10:47, Raul Cumplido Dominguez wrote: Hi, Currently Kartothek [1] nightly builds are flaky [2]. The Kartothek project does not seem to be

[Discuss][Packaging] Ownership of conda packages

2022-09-07 Thread Antoine Pitrou
Hello, Arrow C++ and its bindings have corresponding packages in conda-forge, which are maintained in https://github.com/conda-forge/arrow-cpp-feedstock/ and https://github.com/conda-forge/r-arrow-feedstock. These package configurations are also supposed to be tested against our own

Re: [VOTE] Substrait for Flight SQL

2022-09-07 Thread Antoine Pitrou
I can make this revision as well. On Tue, Sep 6, 2022, at 12:37, Antoine Pitrou wrote: On 06/09/2022 at 17:21, David Li wrote: Thanks Antoine! I've updated the PR (except for the comment about timeout units, since SqlInfo values can't be doubles/floats unless we change the schema there) Can we change

Re: [VOTE] Substrait for Flight SQL

2022-09-06 Thread Antoine Pitrou
On 06/09/2022 at 17:21, David Li wrote: Thanks Antoine! I've updated the PR (except for the comment about timeout units, since SqlInfo values can't be doubles/floats unless we change the schema there) Can we change the schema in a backwards-compatible way?

Re: [VOTE] Substrait for Flight SQL

2022-09-06 Thread Antoine Pitrou
Hi, Sorry for the delay. I took the time to read the protobuf definitions again and posted a few (relatively minor) comments in the PR. In principle, the spec looks sound so I'm giving this a +1 (binding). Regards Antoine. On 01/09/2022 at 01:51, David Li wrote: Hello, I am

Re: Alluxio cache read support

2022-09-06 Thread Antoine Pitrou
On 06/09/2022 at 09:45, Manoj Kumar wrote: Hi Sutou Kouhei/Team *[Background]* Working on intel gazelle_plugin, it's a C++ based backend with an arrow compute engine for spark. Now during scan i.e. reading data from HDFS/Cloud currently we

Re: [ANNOUNCE] New Arrow PMC member: Weston Pace

2022-09-05 Thread Antoine Pitrou
Congratulations Weston! On 05/09/2022 at 15:30, Ian Cook wrote: Congratulations Weston! On Mon, Sep 5, 2022 at 01:56 Sutou Kouhei wrote: The Project Management Committee (PMC) for Apache Arrow has invited Weston Pace to become a PMC member and we are pleased to announce that Weston Pace

Re: [DISC][C++] Conventions for Mentioning Classes and Methods in Documentation

2022-08-31 Thread Antoine Pitrou
On 30/08/2022 at 21:08, Kae Suarez wrote: I do not know about the namespace issue in the API reference, but when focusing on the User's Guide and Getting Started sections, we can announce at the top of the page what namespace is relevant. I personally recommend using only the arrow namespace

Re: [RESULT][VOTE] C++: switch to C++17

2022-08-31 Thread Antoine Pitrou
be worked on/merged immediately or should we wait until after 10.0 is out? Sasha Krassovsky On Aug 29, 2022, at 2:20 AM, Antoine Pitrou wrote: Hello, With 5 binding +1 votes, 7 non-binding +1 votes, and no -1 vote, the vote has passed. The next steps will be conceptually as follows: - require C

[RESULT][VOTE] Format: Rules and procedures for Canonical extension types

2022-08-30 Thread Antoine Pitrou
prepare a PR adding these rules to the specs chapter of the project documentation. Regards Antoine. On 24/08/2022 at 17:24, Antoine Pitrou wrote: Hello, I would like to propose we vote for the following set of rules for registering well-known ("canonical") extension types. * Canonical

Re: [DISC][C++] Conventions for Mentioning Classes and Methods in Documentation

2022-08-30 Thread Antoine Pitrou
Hello Kae, On 29/08/2022 at 19:28, Kae Suarez wrote: I personally like the idea of using namespace directives in Sphinx to keep things less cluttered and easier to write, then using the class directive each time so links are always available. I would agree with this. As for the

Re: Usage of the name Feather?

2022-08-29 Thread Antoine Pitrou
I agree with this as well. Regards Antoine. On Mon, 29 Aug 2022 11:29:45 -0400 Andrew Lamb wrote: > In the rust implementation we use the term "Arrow IPC" and I support your > option 1: > > > The name Feather V2 is deprecated. Only the extension ".arrow" will be > used for IPC files. >

[RESULT][VOTE] C++: switch to C++17

2022-08-29 Thread Antoine Pitrou
and features where desirable to reduce clutter and improve maintainability (there might be more than 3 PRs though :-)) Regards Antoine. On 24/08/2022 at 17:31, Antoine Pitrou wrote: Hello, I would like to propose that the Arrow C++ implementation switch to C++17 as its baseline supported

Re: [VOTE] Format: Rules and procedures for Canonical extension types

2022-08-29 Thread Antoine Pitrou
Hello, Just a heads up that more PMC votes are needed here. On 24/08/2022 at 17:24, Antoine Pitrou wrote: Hello, I would like to propose we vote for the following set of rules for registering well-known ("canonical") extension types. * Canonical extension types are

Re: Proposal: A Table Data Structure for Arrow Java

2022-08-26 Thread Antoine Pitrou
On 25/08/2022 at 19:01, Larry White wrote: Hi all, Thank you, Antoine and everyone for the feedback. It's been very helpful. The proposal has been updated to incorporate suggested changes and clarify as needed. Several people have expressed support for the idea of using a Java version of

Re: [DISC] Improving Arrow's database support

2022-08-25 Thread Antoine Pitrou
use directly. [1]: https://github.com/apache/arrow-adbc/issues/53 On Thu, Aug 25, 2022, at 12:08, Antoine Pitrou wrote: On 25/08/2022 at 17:51, David Li wrote: Fair enough, thank you. I'll try to expand a bit. (Sorry for the wall of text that follows…) These are the components: - Core adbc.h

Re: [DISC] Improving Arrow's database support

2022-08-25 Thread Antoine Pitrou
On 25/08/2022 at 17:51, David Li wrote: Fair enough, thank you. I'll try to expand a bit. (Sorry for the wall of text that follows…) These are the components: - Core adbc.h header - Driver manager for C/C++ - Flight SQL-based driver - Postgres-based driver (WIP) - SQLite-based driver

Re: [DISC] Improving Arrow's database support

2022-08-25 Thread Antoine Pitrou
On Fri, 19 Aug 2022 14:09:44 -0400 "David Li" wrote: > Since it's been a while, I'd like to give an update. There are also a few > questions I have around distribution. > > Currently: > - Supported in C, Java, and Python. > - For C/Python, there are basic drivers wrapping Flight SQL and SQLite,

Apache Software Foundation community survey 2022

2022-08-25 Thread Antoine Pitrou
(copied below is a message from the Apache Software Foundation) Hello everyone, The 2022 ASF Community Survey is looking to gather scientific data that allows us to understand our community better, both in its demographic composition, and also in collaboration styles and preferences. We

Re: [VOTE] Format: Rules and procedures for Canonical extension types

2022-08-25 Thread Antoine Pitrou
"arrow." instead of "org.apache.arrow." On Wed, Aug 24, 2022 at 12:16 PM David Li wrote: +1 (binding) Just to check, these rules will presumably be committed into the documentation as well? On Wed, Aug 24, 2022, at 11:24, Antoine Pitrou wrote: Hello, I would like to pro

Re: Proposal: A Table Data Structure for Arrow Java

2022-08-24 Thread Antoine Pitrou
Hi, Can Java developers please take a look at Larry's proposal below? As for my 2 cents as a non-Java developer: That's a detailed and well-explained proposal, thank you. My only concern is that you're proposing to implement this first as a set of contiguous vectors. The various

[VOTE] C++: switch to C++17

2022-08-24 Thread Antoine Pitrou
Hello, I would like to propose that the Arrow C++ implementation switch to C++17 as its baseline supported version (currently C++11). The rationale and subsequent discussion can be read in the archives here: https://lists.apache.org/thread/9g14n3odhj6kzsgjxr6k6d3q73hg2njr The exact steps

[VOTE] Format: Rules and procedures for Canonical extension types

2022-08-24 Thread Antoine Pitrou
Hello, I would like to propose we vote for the following set of rules for registering well-known ("canonical") extension types. * Canonical extension types are described and maintained in a separate document under the format specifications directory:

Re: DISCUSS: [Format] Rules and procedures for Canonical extension types

2022-08-24 Thread Antoine Pitrou
On 17/08/2022 at 18:45, Joris Van den Bossche wrote: +1 on the overall proposal, documenting those in a central place sounds good to me. On Wed, 17 Aug 2022 at 18:10, Antoine Pitrou wrote: * The specification text to be added *must* follow these requirements 1) It *must* have

Re: [DISC] Improving Arrow's database support

2022-08-19 Thread Antoine Pitrou
easier - servers implement Flight SQL, clients consume ADBC. On Fri, Aug 19, 2022, at 17:01, Antoine Pitrou wrote: I see. What is the point of wrapping Flight SQL in ADBC then? Just for consistency with other drivers? On 19/08/2022 at 23:00, David Li wrote: No, sorry: I meant only the API

Re: [DISC] Improving Arrow's database support

2022-08-19 Thread Antoine Pitrou
ldn't port the SQLite driver to pure C with nanoarrow but I've mostly used it as a testbed and not tried to make it a 'real' driver. On Fri, Aug 19, 2022, at 16:19, Antoine Pitrou wrote: On 19/08/2022 at 20:09, David Li wrote: Since it's been a while, I'd like to give an update. There are also a few

Re: [DISC] Improving Arrow's database support

2022-08-19 Thread Antoine Pitrou
On 19/08/2022 at 20:09, David Li wrote: Since it's been a while, I'd like to give an update. There are also a few questions I have around distribution. Currently: - Supported in C, Java, and Python. - For C/Python, there are basic drivers wrapping Flight SQL and SQLite, with a draft of a

DISCUSS: [Format] Rules and procedures for Canonical extension types

2022-08-17 Thread Antoine Pitrou
Hello all, The Arrow format has support for extension types, but there's no official way to agree across implementations on well-known extension types. This issue has come up a couple of times with people wanting to implement support for types such as JSON or UUID in order to enable better

Re: [ARROW-17255] Logical JSON type in Arrow

2022-08-17 Thread Antoine Pitrou
created a pull request introducing a canonical extension type as discussed in this thread. https://github.com/apache/arrow/pull/13901 Thanks for all the input! On Wed, Aug 3, 2022 at 10:46 AM Antoine Pitrou wrote: On 03/08/2022 at 16:19, Lee, David wrote: There are probably two ways

Re: [C++] Can we drop support for Visual Studio 2017?

2022-08-17 Thread Antoine Pitrou
No opposition from me. Regards Antoine. On 17/08/2022 at 10:05, Sutou Kouhei wrote: Hi, Can we drop support for Visual Studio 2017? Visual Studio 2017 reached EOL on 2022-04-12: https://docs.microsoft.com/en-us/lifecycle/products/visual-studio-2017 Listing| Start Date |

Re: DISCUSS: [C++] Switch to C++17

2022-08-17 Thread Antoine Pitrou
On 17/08/2022 at 16:52, Weston Pace wrote: Sorry for a "one more thing email" but I had one more thought regarding R 3.6 support for Windows. I think those users should continue to be able to use Arrow 10.0.0. Any particular reason why this should be 10.0 and not 9.0 for example? (is due

Re: dealing with tester timeout in a CI job

2022-08-17 Thread Antoine Pitrou
Look for ARROW_SCOPED_TRACE On 17/08/2022 at 16:22, Yaron Gvili wrote: There are no sleeps nor deadlocks; it's just due to a large configuration-space that I agree can be reduced by sampling. Could you explain how to use SCOPED_TEST, or refer to documentation about it? I understand your

Re: DISCUSS: [C++] Switch to C++17

2022-08-17 Thread Antoine Pitrou
On 17/08/2022 at 10:48, Jacob Wujciak wrote: I am generally in favour of this proposal but would like to mention that we have to be able to build on MacOS 10.13 for the R package due to CRAN using it. The CRAN builder comes with: Apple LLVM version 10.0.0 (clang-1000.10.44.4); GNU Fortran

Re: [C++] Moving from -O3 to -O2 optimization level in release builds

2022-08-17 Thread Antoine Pitrou
For the record, https://github.com/apache/arrow/pull/13661 was finally merged. It switches to -O2 by default and selectively re-enables auto-vectorization on gcc. Regards Antoine. On 21/07/2022 at 17:11, Antoine Pitrou wrote: On 21/07/2022 at 16:34, Wes McKinney wrote: Based

DISCUSS: [C++] Switch to C++17

2022-08-17 Thread Antoine Pitrou
Hello, We are in 2022 and Arrow C++ still strives to be compatible with C++11. Maintaining compatibility has caused us growing pains since third-party libraries have begun requiring C++14 or later. Boost is warning that it will soon require C++14

Re: [C++] Purpose of C++ bundled dependencies

2022-08-04 Thread Antoine Pitrou
I would welcome trimming down our hand-written dependency bundling and delegating most of the work to vcpkg or conan, but I don't know how usable and flexible those alternatives are. Someone more knowledgeable (probably Kou or perhaps Krisztian?) should answer. (also note that using an

Re: [QUESTION] How is mmap implemented for 8bit padded files?

2022-08-03 Thread Antoine Pitrou
at runtime? Only if you do things that are alignment-sensitive. That said, while it is formally allowed AFAIK, it probably occurs rarely so potential issues (if any) are probably not surfaced. Best regards Antoine. Best, Jorge On Tue, Aug 2, 2022 at 6:59 PM Antoine Pitrou wrote: Hi

Re: [ARROW-17255] Logical JSON type in Arrow

2022-08-03 Thread Antoine Pitrou
://github.com/apache/parquet-format/blob/master/LogicalTypes.md#json On Mon, Aug 1, 2022 at 11:39 PM Antoine Pitrou wrote: On 01/08/2022 at 22:53, Pradeep Gollakota wrote: Thanks for all

Re: Replace conda with mamba in docs?

2022-08-02 Thread Antoine Pitrou
I would hope conda get their act together and improve on this. I have mixed feelings about complicating the documentation with explanations of how mamba is (often? usually?) a better replacement for conda. Generally we should focus on Arrow-specific issues and avoid distracting the user with

Re: [QUESTION] How is mmap implemented for 8bit padded files?

2022-08-02 Thread Antoine Pitrou
Hi Jorge, So there are two aspects to the answer: - ideally, the C++ implementation also works on non-aligned data (though this is poorly tested, if at all) - when mmap'ing a file, you should get a page-aligned address As for int128 and int256, these usually don't exist at the hardware
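
The page-alignment point can be checked with a short Python sketch. It assumes CPython (for prompt release of the ctypes view) and a platform where a temporary file can be mapped read-write. The helper name `mapped_address` and the 8-byte offset used below are illustrative assumptions only.

```python
import ctypes
import mmap
import tempfile


def mapped_address() -> int:
    """Map a small temporary file and return the mapping's base address."""
    with tempfile.TemporaryFile() as f:
        f.write(b"\x00" * 64)
        f.flush()
        with mmap.mmap(f.fileno(), 0) as mm:
            # ctypes needs a writable buffer to expose the address.
            view = ctypes.c_char.from_buffer(mm)
            try:
                return ctypes.addressof(view)
            finally:
                del view  # release the export so the mapping can be closed


base = mapped_address()

# The start of the mapping sits on a page boundary.
page_aligned = base % mmap.PAGESIZE == 0

# A buffer stored at file offset 8 is 8-byte aligned in memory,
# but it is not 64-byte aligned.
buffer_address = base + 8
```

So the alignment of any individual buffer inside the mapped file is decided by its offset within the file, not by the mapping itself.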

Re: [DISCUSS][Format] Starting to do some concrete work on the new "StringView" columnar data type

2022-08-02 Thread Antoine Pitrou
On 01/08/2022 at 19:13, Wes McKinney wrote: If we start placing restrictions on how the out-of-line string buffers are managed and externalized, it risks undermining the zero-copy interoperability benefits that we're trying to achieve with this. But embedded pointers in turn undermine

Re: [ARROW-17255] Logical JSON type in Arrow

2022-08-02 Thread Antoine Pitrou
On 01/08/2022 at 22:53, Pradeep Gollakota wrote: Thanks for all the great feedback. To proceed forward, we seem to need decisions around the following: 1. Whether to use arrow extensions or first class types. The consensus is building towards using arrow extensions. +1 2. What do we do

Re: [DISCUSS][Format] Dynamic data encodings in the IPC format and C ABI

2022-08-01 Thread Antoine Pitrou
Potentially extending the IPC format to support these additional flexibilities is the easy part. The difficult part is to shoehorn the newfound flexibility into existing APIs, also leaking into the expectations of downstream users. For example, in C++ it is expected that a

Re: [DISCUSS][Format] Starting to do some concrete work on the new "StringView" columnar data type

2022-07-31 Thread Antoine Pitrou
Hi Wes, On 31/07/2022 at 00:02, Wes McKinney wrote: I understand there are still some aspects of this project that cause some squeamishness (like having arbitrary memory addresses embedded within array values whose lifetime a C ABI consumer may not know about -- we already export memory

Re: [ARROW-17255] Logical JSON type in Arrow

2022-07-30 Thread Antoine Pitrou
On 30/07/2022 at 01:02, Wes McKinney wrote: I think either path: * Canonical extension type * First-class type in the Type union in Flatbuffers would be OK. The canonical extension type option is the preferable path here, I think, because it allows Arrow implementations without any special

Re: Help needed with PR #13659: Fixing build/unit test issues in msvc/win32

2022-07-22 Thread Antoine Pitrou
. This isn't great since library users may have policies that disallow warnings. On Fri., Jul. 22, 2022, 05:47 Antoine Pitrou, wrote: We could perhaps suppress the integer downcast warnings, but only on 32-bit Windows (not 64-bit, not other platforms). Regards Antoine. On 22/07/2022 at 14:42, Arkadiy

Re: Help needed with PR #13659: Fixing build/unit test issues in msvc/win32

2022-07-22 Thread Antoine Pitrou
We could perhaps suppress the integer downcast warnings, but only on 32-bit Windows (not 64-bit, not other platforms). Regards Antoine. On 22/07/2022 at 14:42, Arkadiy Vertleyb (BLOOMBERG/ 120 PARK) wrote: Hi James. I don't have strong feelings about whose PR is used and how exactly

Re: [DISCUSS] Disable dependabot automated PRs

2022-07-21 Thread Antoine Pitrou
+1 for disabling. On 21/07/2022 at 15:35, Raul Cumplido Dominguez wrote: Hi, There was a discussion on Zulip dev about disabling dependabot alerts and updates [1] Based on this Apache INFRA wiki page we should be able to disable them [2]. There are currently several open PRs from

Re: [C++] Moving from -O3 to -O2 optimization level in release builds

2022-07-21 Thread Antoine Pitrou
On 21/07/2022 at 16:34, Wes McKinney wrote: Based on the discussion in https://github.com/apache/arrow/pull/13661, it seems that one major issue with switching to -O2 is that auto-vectorization (which we rely on in places) and perhaps some other optimization passes would have to be manually

Re: [C++] Adding Run-Length Encoding to Arrow

2022-07-19 Thread Antoine Pitrou
On 08/07/2022 at 15:19, Wes McKinney wrote: * I believe that having a Type::RLE is the right approach in C++ and it makes dynamic dispatch everywhere in the library pretty straightforward. +1 on this, as it will raise a nice NotImplemented error for existing code rather than crash or

Re: [C++] Help with Parquet backward compatibility regression between 2.0.0 and 3.0.0

2022-07-18 Thread Antoine Pitrou
On 18/07/2022 at 03:54, Wes McKinney wrote: This patch caused Parquet files written with 2.0.0 to be unreadable in 3.0.0 onward https://github.com/apache/arrow/commit/ef0feb2c9c959681d8a105cbadc1ae6580789e69 This was reported on June 14 on dev@ and I git-bisected to the root cause:

Re: Proposal: Unassign idle issues

2022-07-12 Thread Antoine Pitrou
On Fri, 8 Jul 2022 09:49:28 -0600 Todd Farmer wrote: > > In summary, here are the actions I propose: > > 1. Establish a threshold at which assigned, idle issues should be > unassigned and comment added. > 2. Define that threshold to be 90 days. > 3. Document the above as a project policy for
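The quoted proposal (unassign assigned issues that have been idle for 90 days) can be illustrated with a minimal sketch. Only the 90-day figure comes from the message; the function name, its parameters, and the choice to treat exactly 90 days as idle are hypothetical, and nothing here reflects an actual Arrow tool.

```python
from datetime import datetime, timedelta, timezone

# The 90-day threshold is the value proposed in the thread.
IDLE_THRESHOLD = timedelta(days=90)


def should_unassign(assignee, last_activity, now=None):
    """Return True if an assigned issue has been idle for the threshold or longer.

    `assignee` and `last_activity` are hypothetical inputs: the current
    assignee (or None) and a timezone-aware datetime of the last update.
    """
    if assignee is None:
        return False  # nothing to unassign
    if now is None:
        now = datetime.now(timezone.utc)
    return now - last_activity >= IDLE_THRESHOLD
```

For example, an issue assigned to someone and last touched 91 days ago would be flagged, while one touched 10 days ago would not.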

Re: Undefined symbol error using pyarrow

2022-07-07 Thread Antoine Pitrou
I don't think you need anything more on the PyArrow side, but you need to (re)compile Arrow C++ with ARROW_COMPUTE enabled, is that the case? On 07/07/2022 at 22:16, Li Jin wrote: Hello, I am trying to build Arrow/Pyarrow with our internal build system (cmake based) and encounter and
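A quick way to check a local build for this kind of problem is sketched below. It assumes that a missing compute component shows up as an import failure of `pyarrow.compute` (an undefined-symbol error at load time surfaces in Python as `ImportError`); the helper name is hypothetical. If the check fails, rebuilding Arrow C++ with the CMake option `-DARROW_COMPUTE=ON` is the step suggested in the reply above.

```python
import importlib


def compute_available():
    """Return True if pyarrow.compute can be imported in this environment.

    Hypothetical helper: it only reports whether the import succeeds and
    does not diagnose which symbol or library is missing.
    """
    try:
        importlib.import_module("pyarrow.compute")
    except ImportError:
        # Covers both "pyarrow is not installed" and shared-library
        # load failures such as undefined symbols.
        return False
    return True


if __name__ == "__main__":
    print("pyarrow.compute importable:", compute_available())
```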

Re: accessing Substrait protobuf Python classes from PyArrow

2022-07-07 Thread Antoine Pitrou
what would be a proper way to avoid exposing them? Perhaps the classes should be generated into a private package, e.g., under `python/_ep`? (ep stands for external project) Yaron. From: Antoine Pitrou Sent: Sunday, July 3, 2022 3:20 PM To: dev@arrow.apache.org Sub

Re: accessing Substrait protobuf Python classes from PyArrow

2022-07-03 Thread Antoine Pitrou
I agree that giving direct access to protobuf classes is not Arrow's job. You can probably take the upstream (i.e. Substrait's) protobuf definitions and compile them yourself, using whatever settings required by your project. Regards Antoine. On 03/07/2022 at 21:16, Jeroen van Straten wrote:

Re: [C++] Kernel function registry evolution

2022-06-29 Thread Antoine Pitrou
reallocated memory can execute without having to touch shared_ptrs or deal with other objects with excess microperformance overhead) where such optimization can happen more easily. On Mon, Jun 6, 2022 at 4:08 AM Antoine Pitrou wrote: On 06/06/2022 at 09:34, Sasha Krassovsky wrote: Wow that's a lot

Re: [Nightly builds] Crossbow nightly report page announcement + next steps

2022-06-29 Thread Antoine Pitrou
On Mon, 27 Jun 2022 12:46:40 +0200 Raul Cumplido Dominguez wrote: > Hi, > > During the last months there has been some work going on in order to > improve the visibility of our nightly builds, the failures, for how long > have they been failing, etcetera. > > We started by adding some

Re: [ANNOUNCE] New Arrow committers: Dewey Dunnington, Alenka Frim, and Rok Mihevc

2022-06-22 Thread Antoine Pitrou
Welcome to our new committers! On 22/06/2022 at 20:02, Andrew Lamb wrote: Congratulations! On Wed, Jun 22, 2022 at 1:27 PM Dragoș Moldovan-Grünfeld < dragos.m...@gmail.com> wrote: Congratulations! Sent from my iPhone On 22 Jun 2022, at 18:13, Neal Richardson wrote: On behalf of

Re: Existence/name/scope for minimal C/C++ Arrow C Data interface helpers

2022-06-16 Thread Antoine Pitrou
Can we name it miniarrow or nanoarrow? We don't want to convey the message that there is a parallel C API for Arrow. On 15/06/2022 at 05:18, Dewey Dunnington wrote: Hi all, I drafted a second PR [1] drafting a design for storing parsed information obtained from a struct ArrowSchema

Re: [VOTE] Mark C Stream Interface as Stable

2022-06-08 Thread Antoine Pitrou
On 08/06/2022 at 20:55, Jorge Cardoso Leitão wrote: 0 (binding) - imo there is some unclarity over what is expected to be passed over the C streaming interface - an Array or a StructArray. I think the spec claims the former, but the C++ implementation (which I assume is the reference here)
