Re: [ARROW-17255] Logical JSON type in Arrow

2022-08-02 Thread Lee, David
While I do like having a JSON type, adding processing functionality, especially around compute capabilities, might be limiting. Arrow already supports nested lists and structs, which can cover JSON structures while offering vectorized processing. JSON should only be a logical representation of wh
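The claim that nested lists and structs can cover JSON structures can be made concrete. In this sketch, JSON objects map to structs and JSON arrays map to lists. `infer_arrow_type` is a hypothetical helper, not Arrow's actual JSON type inference. It assumes homogeneous arrays, prints type strings modeled on pyarrow's, and uses only the Python standard library.

```python
import json


def infer_arrow_type(value) -> str:
    """Hypothetical helper: map a decoded JSON value to a pyarrow-style type string.

    Illustrative only; assumes JSON arrays are homogeneous.
    """
    if value is None:
        return "null"
    if isinstance(value, bool):  # test bool before int: bool subclasses int
        return "bool"
    if isinstance(value, int):
        return "int64"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        # JSON array -> Arrow list; element type taken from the first item
        child = infer_arrow_type(value[0]) if value else "null"
        return f"list<item: {child}>"
    if isinstance(value, dict):
        # JSON object -> Arrow struct, one field per key, in document order
        fields = ", ".join(f"{key}: {infer_arrow_type(val)}" for key, val in value.items())
        return f"struct<{fields}>"
    raise TypeError(f"not a JSON value: {value!r}")


doc = json.loads('{"id": 1, "tags": ["a", "b"], "owner": {"name": "x", "score": 0.5}}')
print(infer_arrow_type(doc))
# struct<id: int64, tags: list<item: string>, owner: struct<name: string, score: double>>
```

Once a document has such a nested type, the columns can be processed with vectorized kernels. A JSON logical type would only record how the data is meant to be interpreted.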

Re: [VOTE] Release Apache Arrow 9.0.0 - RC2

2022-08-02 Thread Yibo Cai
+1 (binding) Verified source (cpp/python/go) and wheels on ubuntu-20.04 aarch64. TEST_DEFAULT=0 TEST_CPP=1 TEST_PYTHON=1 TEST_GO=1 dev/release/verify-release-candidate.sh 9.0.0 2 TEST_DEFAULT=0 TEST_WHEELS=1 dev/release/verify-release-candidate.sh 9.0.0 2 On 7/30/22 07:10, Krisztián Szűcs w

Re: [Java-Cookbook] UnsatisfiedLinkError error while running 'make javatest'

2022-08-02 Thread Ashish
I couldn't make things work on the M1 Mac and kept running into issues. The last issue was ld: library not found for -lc++. I tried various solutions around it and then switched to an Intel Mac. @David/@Larry - a fresh clone on an Intel Mac passed all the tests, including the dataset. I have not built the jni lib

Re: [VOTE] Release Apache Arrow 9.0.0 - RC2

2022-08-02 Thread Krisztián Szűcs
+1 (binding) Verified source, wheels, jars and binaries on arm64 macOS. The automated crossbow verification tasks have also passed [1]. [1]: https://github.com/apache/arrow/pull/13749 On Tue, Aug 2, 2022 at 6:14 PM David Li wrote: > > +1 (binding) > > Verified: source/wheels/binaries on Ubun

Arrow sync call August 3 at 12:00 US/Eastern, 16:00 UTC

2022-08-02 Thread Ian Cook
Hi all, Our biweekly sync call is tomorrow at 12:00 noon Eastern time. The Zoom meeting URL for this and other biweekly Arrow sync calls is: https://zoom.us/j/87649033008?pwd=SitsRHluQStlREM0TjJVYkRibVZsUT09 Alternatively, enter this information into the Zoom website or app to join the call: Mee

Re: [FlightSQL][JDBC] Additional changes to the JDBC driver

2022-08-02 Thread David Li
Would it be OK to get what's there into the main branch first? i.e., open a PR from the apache/flight-jdbc-driver (or a contributor's clone of it, that would make it easier to address review comments). I'd like to get through the review of what we currently have since the PR will be large. And t

Re: [DISCUSS][FlightRPC] Lifetime of endpoints

2022-08-02 Thread David Li
Does attempting to use the client and retrying authentication errors work? Also - if the endpoints are ephemeral, the clients would just get evicted from/time out of the cache naturally anyways right? I've always found the implied statefulness of Handshake rather fragile/unsuited to gRPC and wo

Re: Re: Help needed with PR #13659: Fixing build/unit test issues in msvc/win32

2022-08-02 Thread James Duong
A heads-up, the warnings are now fixed in the 32-bit build in this PR: https://github.com/apache/arrow/pull/13532 I haven't fixed the linker error yet -- disabling the parquet build didn't seem to work (or I haven't properly disabled the parquet build), and I need to fix the Archery warnings. On Tue, Jul 26,

[DISCUSS][FlightRPC] Lifetime of endpoints

2022-08-02 Thread James Duong
Raising a discussion from this JDBC PR: https://github.com/rafael-telles/arrow/pull/42#discussion_r930298691 It would make sense for an application to want to pool FlightClients when possible. When getFlightInfo is used, it can potentially return several different servers to connect to. However th
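When `getFlightInfo` returns endpoints on several servers, one way to pool clients is to cache one client per location and let idle entries time out. This also handles ephemeral endpoints, as suggested in the reply above. The sketch is in Python rather than Java. `ClientPool`, the factory, and the location strings are hypothetical. A real pool would also close evicted clients and need locking for concurrent use.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

C = TypeVar("C")


class ClientPool(Generic[C]):
    """Hypothetical pool: one client per endpoint location, evicted after ttl idle seconds."""

    def __init__(self, factory: Callable[[str], C], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._factory = factory  # creates a client for a location URI
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[C, float]] = {}  # location -> (client, last use)

    def get(self, location: str) -> C:
        now = self._clock()
        self._evict_idle(now)
        entry = self._entries.get(location)
        client = entry[0] if entry is not None else self._factory(location)
        self._entries[location] = (client, now)  # record this use
        return client

    def _evict_idle(self, now: float) -> None:
        stale = [loc for loc, (_, used) in self._entries.items() if now - used > self._ttl]
        for loc in stale:
            # A real pool would also close the evicted client here.
            del self._entries[loc]


# Usage with a fake clock so eviction is deterministic.
fake_now = [0.0]
created = []


def make_client(location: str) -> dict:
    created.append(location)
    return {"location": location}  # stand-in for a real Flight client


pool = ClientPool(make_client, ttl_seconds=60, clock=lambda: fake_now[0])
first = pool.get("grpc+tls://host-a:443")
again = pool.get("grpc+tls://host-a:443")  # same client, reused
fake_now[0] = 120.0
fresh = pool.get("grpc+tls://host-a:443")  # idle > 60 s, so a new client is created
```

Keying the pool by location, and not by the endpoint or ticket, lets several endpoints on the same server share a client.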

Re: Help with writing/reading from s3

2022-08-02 Thread Will Jones
Hi Li Jin, I'm not sure yet what changed, but I believe you can fix that error simply by omitting the scheme prefix from the URI and just using the path when loading the dataset. Here's my repro: import pyarrow as pa import pyarrow.dataset as ds from pyarrow.fs import S3FileSystem s3fs = S3FileSys
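The suggested fix amounts to passing a bucket-relative path, not an `s3://` URI, once an explicit `S3FileSystem` is supplied. The repro above is cut off, so this sketch is not that repro. `strip_s3_scheme` is a hypothetical helper. The commented `ds.dataset(..., filesystem=s3fs)` call only shows where the result would be used, and it is not executed here.

```python
from urllib.parse import urlparse


def strip_s3_scheme(uri: str) -> str:
    """Hypothetical helper: 's3://bucket/key' -> 'bucket/key'; scheme-less paths pass through."""
    parsed = urlparse(uri)
    if parsed.scheme == "s3":
        return parsed.netloc + parsed.path
    if parsed.scheme:
        raise ValueError(f"unexpected scheme {parsed.scheme!r} in {uri!r}")
    return uri


path = strip_s3_scheme("s3://my-bucket/data/part-0.parquet")
print(path)  # my-bucket/data/part-0.parquet
# The path would then be used together with the explicit filesystem, e.g.:
# dataset = ds.dataset(path, filesystem=s3fs, format="parquet")
```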

[FlightSQL][JDBC] Additional changes to the JDBC driver

2022-08-02 Thread James Duong
Hi, We have a few additional important changes for the Flight SQL JDBC Driver: - Avoid sending headers for built-in properties such as hostname, port. - Make handling of connection URI key names and Properties keys case-insensitive. - Create separate FlightClients for each new endpoint returned by
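The first two changes can be sketched together. The sketch normalizes key case once, then drops built-in properties before forwarding the rest as headers. It is in Python for illustration only, since the driver itself is Java. `normalize_properties`, `headers_to_send`, and the contents of `BUILT_IN_KEYS` are hypothetical. Only hostname and port come from the message.

```python
from typing import Dict, Mapping

# Hypothetical built-in keys (already lower-case); only hostname and port come from the message.
BUILT_IN_KEYS = frozenset({"hostname", "port"})


def normalize_properties(props: Mapping[str, str]) -> Dict[str, str]:
    """Lower-case keys so 'Port', 'PORT' and 'port' name the same property.

    If two keys differ only in case, the one seen last wins.
    """
    return {key.lower(): value for key, value in props.items()}


def headers_to_send(props: Mapping[str, str]) -> Dict[str, str]:
    """Forward only properties that are not built-in connection settings."""
    return {k: v for k, v in normalize_properties(props).items() if k not in BUILT_IN_KEYS}


props = {"HostName": "localhost", "Port": "32010", "Catalog": "main"}
print(headers_to_send(props))  # {'catalog': 'main'}
```

In Java, a `TreeMap` constructed with `String.CASE_INSENSITIVE_ORDER` gives case-insensitive lookups without rewriting the keys.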

Re: [DISCUSS][Format] Starting to do some concrete work on the new "StringView" columnar data type

2022-08-02 Thread Wes McKinney
On Tue, Aug 2, 2022 at 1:02 AM Antoine Pitrou wrote: > > > On 2022-08-01 at 19:13, Wes McKinney wrote: > > > > If we start placing restrictions on how the out-of-line string buffers > > are managed and externalized, it risks undermining the zero-copy > > interoperability benefits that we're tryi

Re: [ARROW-17255] Logical JSON type in Arrow

2022-08-02 Thread Wes McKinney
I should add that since Parquet has JSON, BSON, and UUID types (UUID being just a simple fixed-size binary), having the extension types so that the metadata flows through accurately to Parquet would be a net benefit: https://github.com/apache/parquet-format/blob/master/src/main/thrif
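The UUID case shows why an extension type is cheap. The storage is an ordinary 16-byte fixed-size binary, and the logical meaning travels in field metadata under Arrow's `ARROW:extension:name` and `ARROW:extension:metadata` keys. This standard-library sketch uses a hypothetical extension name. It does not show how a Parquet writer would map the field to Parquet's UUID logical type.

```python
import uuid

# Field-metadata keys that Arrow uses to mark a field as an extension type.
EXTENSION_NAME_KEY = "ARROW:extension:name"
EXTENSION_METADATA_KEY = "ARROW:extension:metadata"


def uuid_storage(value: uuid.UUID) -> bytes:
    """A UUID is exactly 16 bytes, so fixed_size_binary(16) can store it unchanged."""
    return value.bytes  # RFC 4122 (big-endian) byte order


def uuid_field_metadata(extension_name: str = "example.uuid") -> dict:
    """Metadata tagging a fixed_size_binary(16) field as a UUID extension type.

    The default extension name is a hypothetical placeholder.
    """
    return {EXTENSION_NAME_KEY: extension_name, EXTENSION_METADATA_KEY: ""}


u = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(len(uuid_storage(u)), uuid_field_metadata())
```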

Re: Proposal: Unassign idle issues

2022-08-02 Thread Todd Farmer
Thank you all for the feedback on the proposal. I unassigned 371 idle (not updated within past 90 days) issues on 2022-07-12 [1]. It doesn't appear that this action has caused problems or confusion - please let me know if I've missed anything. There are currently an additional 31 issues that are no

Re: Replace conda with mamba in docs?

2022-08-02 Thread Antoine Pitrou
I would hope conda get their act together and improve on this. I have mixed feelings about complicating the documentation with explanations of how mamba is (often? usually?) a better replacement for conda. Generally we should focus on Arrow-specific issues and avoid distracting the user with

Re: [QUESTION] How is mmap implemented for 8bit padded files?

2022-08-02 Thread Antoine Pitrou
Hi Jorge, So there are two aspects to the answer: - ideally, the C++ implementation also works on non-aligned data (though this is poorly tested, if at all) - when mmap'ing a file, you should get a page-aligned address As for int128 and int256, these usually don't exist at the hardware level
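The second point can be checked directly. This sketch maps a temporary file and reads the base address of the mapping. Because the page size is a multiple of 8, any buffer at an 8-byte-aligned file offset also lands at an 8-byte-aligned address. The sketch assumes CPython. It uses `ctypes.c_char.from_buffer`, which requires a writable mapping, only to obtain the address. `mmap_base_address` is a hypothetical helper.

```python
import ctypes
import mmap
import tempfile


def mmap_base_address(size: int = 4096) -> int:
    """Hypothetical helper: map a temporary file and return the address of its first byte."""
    with tempfile.TemporaryFile() as f:
        f.write(b"\0" * size)
        f.flush()
        mm = mmap.mmap(f.fileno(), size)  # default access: shared read/write mapping
        try:
            # from_buffer needs a writable buffer; it exports a pointer into the mapping.
            view = ctypes.c_char.from_buffer(mm)
            addr = ctypes.addressof(view)
            del view  # release the export so the mapping can be closed
        finally:
            mm.close()
    return addr


base = mmap_base_address()
print(base % mmap.PAGESIZE == 0)  # page-aligned base address
# Page size is a multiple of 8, so 8-byte-aligned file offsets stay 8-byte aligned in memory.
print(all((base + offset) % 8 == 0 for offset in (0, 8, 64, 4096)))
```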

Re: [VOTE] Release Apache Arrow 9.0.0 - RC2

2022-08-02 Thread David Li
+1 (binding) Verified: source/wheels/binaries on Ubuntu 18.04 I had some issues with binaries, I will file JIRAs once I reproduce them (basically: needed to pass a flag to Docker - this may be an issue with my setup - and APT tends to be flaky; it would be nice if we configured it to retry dow

Re: [Java-Cookbook] UnsatisfiedLinkError error while running 'make javatest'

2022-08-02 Thread Larry White
> > Can I build using these instructions > > https://arrow.apache.org/docs/developers/java/building.html#building-jni-libraries-on-macos > ? They should work, yes. On Tue, Aug 2, 2022 at 11:32 AM Ashish wrote: > It's an M1 Mac. > > Not sure how the build worked until last week. Phew, I need to start

Re: [Java-Cookbook] UnsatisfiedLinkError error while running 'make javatest'

2022-08-02 Thread Ashish
It's an M1 Mac. Not sure how the build worked until last week. Phew, I need to start drinking coffee again :( Can I build using these instructions https://arrow.apache.org/docs/developers/java/building.html#building-jni-libraries-on-macos ? thanks Ashish On Tue, Aug 2, 2022 at 8:22 AM David Li wrote

Re: [Java-Cookbook] UnsatisfiedLinkError error while running 'make javatest'

2022-08-02 Thread David Li
What platform are you running on? It looks like MacOS? We don't currently ship the JNI binaries for M1 (https://issues.apache.org/jira/browse/ARROW-16608) On Tue, Aug 2, 2022, at 10:39, Ashish wrote: > Hi, > > Running into this weird issue. While running "make javatest", following > error shows u

[Java-Cookbook] UnsatisfiedLinkError error while running 'make javatest'

2022-08-02 Thread Ashish
Hi, I'm running into this weird issue. While running "make javatest", the following error shows up for the dataset section: Exception java.lang.UnsatisfiedLinkError: Can't load library: /var/folders/bg/72tsr4491sz9vf1yrvb9l66mgn/T/jnilib-13800921913896828051.tmp at ClassLoader.loadLibrary (ClassLoad

Re: [VOTE] Release Apache Arrow 9.0.0 - RC2

2022-08-02 Thread Jacob Wujciak
Hi, +1 (non-binding) I have verified on Windows where I had a non-blocking test failure. See ARROW-17281 [1] [1]: https://issues.apache.org/jira/browse/ARROW-17281 On Mon, Aug 1, 2022 at 4:36 PM Raul Cumplido Dominguez wrote: > Hi, > > +1 (non-binding) > > TLDR, I've found an issue on Integra

Re: [DISCUSS][Format] Starting to do some concrete work on the new "StringView" columnar data type

2022-08-02 Thread Antoine Pitrou
On 2022-08-01 at 19:13, Wes McKinney wrote: If we start placing restrictions on how the out-of-line string buffers are managed and externalized, it risks undermining the zero-copy interoperability benefits that we're trying to achieve with this. But embedded pointers in turn undermine zero
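Raw embedded pointers are usually weighed against a relocatable (buffer index, offset) pair. This sketch shows the latter, assuming a 16-byte little-endian view. The view holds a 4-byte length, followed by either up to 12 inline bytes or a 4-byte prefix plus a 4-byte buffer index and a 4-byte offset. Arrow's columnar format later adopted this form for its view types, but this sketch is illustrative, not the specification. `make_view` and `read_view` are hypothetical names.

```python
import struct
from typing import List

INLINE_MAX = 12  # strings up to 12 bytes live entirely inside the 16-byte view


def make_view(data: bytes, buffers: List[bytearray]) -> bytes:
    """Encode one 16-byte view, appending long strings to the last data buffer."""
    if len(data) <= INLINE_MAX:
        # length (int32) + up to 12 inline bytes, zero-padded by the "12s" format
        return struct.pack("<i12s", len(data), data)
    if not buffers:
        buffers.append(bytearray())
    buf_index = len(buffers) - 1
    offset = len(buffers[buf_index])
    buffers[buf_index] += data
    # length (int32) + 4-byte prefix + buffer index (int32) + offset (int32)
    return struct.pack("<i4sii", len(data), data[:4], buf_index, offset)


def read_view(view: bytes, buffers: List[bytearray]) -> bytes:
    """Decode a view; no pointers are involved, so views stay valid if buffers move."""
    length = struct.unpack_from("<i", view)[0]
    if length <= INLINE_MAX:
        return view[4:4 + length]
    _, _prefix, buf_index, offset = struct.unpack("<i4sii", view)
    return bytes(buffers[buf_index][offset:offset + length])


buffers: List[bytearray] = []
strings = [b"short", b"a string longer than twelve bytes"]
views = [make_view(s, buffers) for s in strings]
print([read_view(v, buffers) for v in views] == strings)
```

Because a view refers to its data by index and offset, the data buffers can be shared or moved between processes without rewriting the views. Embedded pointers would need to be rewritten.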