Re: [DISCUSS] Developing an "Arrow Compute IR [Intermediate Representation]" to decouple language front ends from Arrow-native compute engines

2021-08-16 Thread Arun Sharma
Thank you for putting together this proposal. Very exciting development. I left some comments in the RFC doc, summarized here as: * Flatbuffer is usable as a serialization agnostic IDL ( https://adsharma.github.io/flattools/) * serde library + msgpack is a worthy candidate to consider for serializ

Re: [DISCUSS] Binary Values in Key value pairs WAS: Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams

2021-08-16 Thread Micah Kornfield
I agree with you. Any thoughts on a way forward for at least hardening the spec (or should this be done at the same time as adding the new field)? On Mon, Aug 16, 2021 at 1:45 AM Wes McKinney wrote: > I've been poking around the project, and I'm growing concerned that > our use of the KeyValue fi

Re: [DISCUSS] Clarifying interpretation of Time32/Time64 past 24 hours

2021-08-16 Thread Antoine Pitrou
PS: we also need to check what databases do and allow. On 16/08/2021 at 23:12, Antoine Pitrou wrote: POSIX allows for a single leap second: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/time.h.html The Windows API does not seem to know about leap seconds: https://docs.microsoft

Re: [DISCUSS] Clarifying interpretation of Time32/Time64 past 24 hours

2021-08-16 Thread Antoine Pitrou
POSIX allows for a single leap second: https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/time.h.html The Windows API does not seem to know about leap seconds: https://docs.microsoft.com/en-us/windows/win32/api/minwinbase/ns-minwinbase-systemtime The standard Python type `datetime.time`

Re: [DISCUSS] Clarifying interpretation of Time32/Time64 past 24 hours

2021-08-16 Thread Neal Richardson
At the risk of opening a can of worms, isn't it possible that a time could exceed 24 hours? Like, when there are leap seconds added? > Some experiments inspired by an SO post[1] led me to question the meaning of time. Looks like the arrow mailing list is taking a philosophical turn :) Neal On M

Re: [DISCUSS] Clarifying interpretation of Time32/Time64 past 24 hours

2021-08-16 Thread Antoine Pitrou
On 16/08/2021 at 20:52, Weston Pace wrote: Some experiments inspired by an SO post[1] led me to question the meaning of time. The main question is **what happens when the value exceeds 24 hours?** A) One potential interpretation is that these are invalid but neither the C++ implementatio

[DISCUSS] Clarifying interpretation of Time32/Time64 past 24 hours

2021-08-16 Thread Weston Pace
Some experiments inspired by an SO post[1] led me to question the meaning of time. The main question is **what happens when the value exceeds 24 hours?** A) One potential interpretation is that these are invalid, but neither the C++ implementation nor pyarrow rejects these today. Nor do they corr
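To make interpretation A concrete, here is a minimal sketch in plain Python (standard library only, not pyarrow) that treats a seconds-since-midnight value as invalid once it reaches 24 hours. The helper name `seconds_to_time` is hypothetical and is not part of Arrow or pyarrow; it only illustrates what "reject" could mean for a value with second resolution.

```python
from datetime import time

SECONDS_PER_DAY = 24 * 60 * 60  # 86400; ignores leap seconds


def seconds_to_time(seconds: int) -> time:
    """Convert seconds since midnight to datetime.time.

    Hypothetical helper: values outside [0, 86400) are rejected,
    which corresponds to the "these are invalid" interpretation.
    """
    if not 0 <= seconds < SECONDS_PER_DAY:
        raise ValueError(f"time-of-day out of range: {seconds} s")
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return time(hours, minutes, secs)


if __name__ == "__main__":
    print(seconds_to_time(86399))  # last representable second of the day
    try:
        seconds_to_time(86400)     # exactly 24 hours
    except ValueError as exc:
        print("rejected:", exc)
```

Note that `datetime.time` itself only accepts hours in 0..23 and seconds in 0..59, so a value of 24 hours or more has no direct representation in that type either.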

Re: [DISCUSS][Python] Making NumPy optional dependency?

2021-08-16 Thread Antoine Pitrou
I agree that "what happens when Numpy is not available at runtime" is a rather annoying problem. I'm not sure what happens when you call one of the Numpy C API functions and Numpy is not found (crash? error return?). It can probably be detected, but needs to be done consistently at the start of
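As a rough illustration of detecting a missing optional dependency once, at startup, here is a Python-level sketch. It does not address the NumPy C API question raised above, and the names `optional_import` and `to_numpy_or_list` are hypothetical, not pyarrow functions.

```python
import importlib


def optional_import(name):
    """Return the imported module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Detect availability once, up front, so every caller sees the same answer.
_np = optional_import("numpy")


def to_numpy_or_list(values):
    """Hypothetical conversion: use NumPy if present, else fall back to a list."""
    if _np is None:
        return list(values)
    return _np.asarray(values)
```

The point of the single module-level check is consistency: code paths that need NumPy can test `_np is None` and raise a clear error instead of failing somewhere deeper in the call stack.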

Re: [DISCUSS][Python] Making NumPy optional dependency?

2021-08-16 Thread Wes McKinney
I've thought about this in the past, and I would like to make NumPy an optional dependency, but one of the things that kept me from trying was the extent to which NumPy arrays are supported as inputs (or elements of inputs) to pyarrow.array. The implementation in python_to_arrow.cc is significantly

Re: [DISCUSS][Python] Moving Python specific code into pyarrow

2021-08-16 Thread Wes McKinney
It seems like a good idea to attempt to make this change. The most difficult thing might be projects that use the arrow/python/pyarrow.h C++ API, so we would have to provide a viable migration path for those. turbodbc is one example https://github.com/blue-yonder/turbodbc/search?l=C%2B%2B&q=pyarro

[DISCUSS][Python] Making NumPy optional dependency?

2021-08-16 Thread Alessandro Molina
As Arrow/PyArrow grows more compute functions and features we might move toward a world where the number of users relying on PyArrow without going through Pandas or NumPy might grow. NumPy is a compile time dependency for PyArrow as it's required to compile the C++ code needed to implement the pan

Re: [DISCUSS][Python] Moving Python specific code into pyarrow

2021-08-16 Thread Eduardo Ponce
I agree with this proposal, the Arrow C++ library does not need to depend on Python or PyArrow code. AFAIU this will eliminate the use of -DARROW_PYTHON build flag for Arrow C++ given that Python-related code will be compiled with PyArrow builds. Besides the use of "ARROW_PYTHON" env variable in CM

Re: [DISCUSS][Python] Moving Python specific code into pyarrow

2021-08-16 Thread Antoine Pitrou
I definitely think this is desirable. There's probably going to be a bit of work getting it to pass on all CI (including the various nightly builds). Regards Antoine. On 16/08/2021 at 17:08, Alessandro Molina wrote: PyArrow is currently a full Cython codebase, but in reality it relies on

Re: [DISCUSS][Python] Moving Python specific code into pyarrow

2021-08-16 Thread Micah Kornfield
This seems reasonable as long as it is actually feasible (the dependencies are cleanly separable). A while ago I had a proof of concept bazel build working that was able to automatically build the changes together. On Monday, August 16, 2021, David Li wrote: > I support this. In the past I had

Re: [DISCUSS][Python] Moving Python specific code into pyarrow

2021-08-16 Thread David Li
I support this. In the past I had to effectively do this manually to build Arrow/PyArrow in a monorepo (to build for multiple Python versions simultaneously without having conflicting copies of Arrow for each Python version). From what I remember, there's some usage of Arrow-internal headers th

[DISCUSS][Python] Moving Python specific code into pyarrow

2021-08-16 Thread Alessandro Molina
PyArrow is currently a full Cython codebase, but in reality it relies on some classes and functions that are implemented in C++ within the src/python directory ( https://github.com/apache/arrow/tree/master/cpp/src/arrow/python ). Especially for numpy/pandas conversion code that has to interface with

[RESULT][VOTE][RUST] Release Apache Arrow Rust 5.2.0 RC1

2021-08-16 Thread Andrew Lamb
The vote passes with 3 +1 binding and 1 +1 non-binding. The release is available here: https://dist.apache.org/repos/dist/release/arrow/arrow-rs-5.2.0 The release has also been published to crates.io: https://crates.io/crates/arrow/5.2.0 https://crates.io/crates/arrow-flight/5.2.0 https://crates

Re: [DISCUSS] Binary Values in Key value pairs WAS: Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams

2021-08-16 Thread Wes McKinney
I've been poking around the project, and I'm growing concerned that our use of the KeyValue field has already been non-compliant in many cases since we do not validate UTF8-ness. Since we also use KeyValue to handle opaque data serialization for extension types [1], the fact that the specification
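For reference, checking "UTF8-ness" of a key or value is cheap to express. The sketch below is plain Python and is not how any Arrow implementation validates metadata; `is_valid_utf8` and `invalid_entries` are hypothetical helper names, and the metadata is modeled as a simple dict of bytes.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def invalid_entries(metadata: dict) -> list:
    """List the keys whose key or value bytes are not valid UTF-8."""
    return [
        key
        for key, value in metadata.items()
        if not (is_valid_utf8(key) and is_valid_utf8(value))
    ]
```

An opaque binary payload, such as a serialized extension type, would typically fail this check, which is the tension described above.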