Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-30 Thread Jacques Nadeau
I didn't realize that Ishizaki isn't just proposing BE platform support; he is proposing a new BE version of the format. In this situation, computers speaking Arrow would potentially have to convert from one version to the other. For example, two machines communicating with Arrow Flight now have

Re: [DISCUSS] Big Endian support in Arrow (was: Re: [Java] Supporting Big Endian)

2020-08-30 Thread Micah Kornfield
Looking over the outstanding PRs, while the code isn't necessarily pretty, I don't think they are too invasive. It also seems that Kazuaki Ishizaki is willing to add benchmarks where necessary to verify the lack of performance regressions. (Please correct me if I misunderstood.) Jacques and Liya

Re: ORC writer

2020-08-30 Thread Micah Kornfield
Hi Ying Zhou, Those sound like the right places for the work to happen. I'm not aware of any abandoned attempts at write support. -Micah On Sat, Aug 29, 2020 at 12:12 PM Ying Zhou wrote: > Hi, > > I’m interested in writing a binder so that we can write ORC files in > Arrow. I likely should con

Re: Compression in Arrow - Question

2020-08-30 Thread Micah Kornfield
Hi Mark, > There is definitely a tradeoff between processing speed and compression, > however I feel there is a use case for 'small in memory footprint' > independent of 'high speed processing'. > Though I appreciate arrow team may not want to address that, given the > focus on processing speed.

Re: Question on pyarrow compute

2020-08-30 Thread Andrew Wieteska
Hi Drew, pyarrow's compute module should be accessible in 0.17+, so updating your installation should solve the problem. I don't think you need the nightly build, just a newer release. For updating, it might be that you will need to use the conda-forge channel, as it is possible that the default Anacon

Question on pyarrow compute

2020-08-30 Thread Drew Moore
Hello, My apologies if this isn't the right place to contact. I’m interested in exploring the Python bindings. I was wondering how I could access the compute functions (other than sum) from pyarrow. I see this Jira ticket: https://issues.apache.org/jira/browse/ARROW-7871, and it references a pyar

RE: Compression in Arrow - Question

2020-08-30 Thread mark
All, Micah: it appears my google-fu wasn't strong enough to find the previous thread, so thanks for pointing that out. There is definitely a tradeoff between processing speed and compression, however I feel there is a use case for 'small in memory footprint' independent of 'high speed proces

Re: Compression in Arrow - Question

2020-08-30 Thread Micah Kornfield
Agreed, I think it would be useful to make sure the "compute" interfaces have the right hooks to support alternate encodings. On Sunday, August 30, 2020, Wes McKinney wrote: > That said, there is nothing preventing the development of programming > interfaces for compressed / encoded data right n

Re: Compression in Arrow - Question

2020-08-30 Thread Wes McKinney
That said, there is nothing preventing the development of programming interfaces for compressed / encoded data right now. When it comes to transporting such data, that's when we will have to decide on what to support and what new metadata structures are required. For example, we could add RLE to C
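
For readers unfamiliar with the RLE encoding mentioned here, this is a generic run-length-encoding sketch in plain Python. It is purely illustrative; it is not the Arrow format's actual RLE layout, which, as the message says, would require new metadata structures to be specified.

```python
def rle_encode(values):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1] = (v, runs[-1][1] + 1)
        else:
            runs.append((v, 1))
    return runs

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the original sequence."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

print(rle_encode([5, 5, 5, 2, 2, 9]))  # [(5, 3), (2, 2), (9, 1)]
```

The appeal for columnar data is that long runs of repeated values (common in sorted or low-cardinality columns) shrink dramatically while still permitting fast scans over the runs.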

[NIGHTLY] Arrow Build Report for Job nightly-2020-08-30-0

2020-08-30 Thread Crossbow
Arrow Build Report for Job nightly-2020-08-30-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-30-0 Failed Tasks: - conda-osx-clang-py36: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-30-0-azure-conda-osx-clang-py36 - cond