Re: My focus for Rust implementation for 2.0.0

2020-08-14 Thread Kirill Lykov
Sounds interesting, as we wanted to start using DataFusion. Btw, I vaguely remember that in the original repository you had an issue like "investigate DataFusion with Gandiva". I'm curious why you decided to give up on it? On Thu, Aug 13, 2020 at 5:11 PM Andy Grove wrote: > > Some of you may

RE: Arrow Flight + Go, Arrow for Realtime

2020-08-14 Thread mark
Thanks Wes & Sebastien, I've tested Arrow in Go-WASM now and it is working fine. Still getting my head around the best way to use it for our use case (IoT data). My goal here is to hit a Flight endpoint from the browser (Go-WASM specifically), and pull (all or part of) an Arrow dataset on the s

RE: Arrow Flight + Go, Arrow for Realtime

2020-08-14 Thread mark
Thanks Wes, I'll likely work on that once I get my head around Arrow in general and confirm we will use it for the project. How to account for the streaming-append problem with an otherwise immutable dataset is the current concern. Still thinking through that. Regards Mark. -

[NIGHTLY] Arrow Build Report for Job nightly-2020-08-14-0

2020-08-14 Thread Crossbow
Arrow Build Report for Job nightly-2020-08-14-0 All tasks: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-14-0 Failed Tasks: - test-conda-cpp-valgrind: URL: https://github.com/ursa-labs/crossbow/branches/all?query=nightly-2020-08-14-0-github-test-conda-cpp-valgrind

Re: Building an executable with arrow flight (C++)

2020-08-14 Thread Wes McKinney
Using ExternalProject should work as well (again, if it doesn't, it's a bug and should be reported). We should augment our examples to include an example use with ExternalProject https://issues.apache.org/jira/browse/ARROW-9740 On Thu, Aug 13, 2020 at 9:44 PM Radu Teodorescu wrote: > Hi Wes, >

Re: My focus for Rust implementation for 2.0.0

2020-08-14 Thread Andy Grove
First, an update on progress. Once the PRs for ARROW-9711 and ARROW-9716 are merged, it is possible to run TPC-H query 1 against a 100 GB data set with similar performance to Apache Spark in local mode. I plan on testing larger datasets over the weekend. To answer Kirill's question, I wouldn't nec

[Java] Supporting Big Endian

2020-08-14 Thread Micah Kornfield
Kazuaki Ishizaki has started working on Big Endian support in Java (including setting up CI for it). Thank you! We previously discussed support for Big Endian architectures in C++ [1] and generally agreed that it was a reasonable thing to do. Similar to C++ I think as long as we have a working CI

Re: [Java] Supporting Big Endian

2020-08-14 Thread Kazuaki Ishizaki
Hi Micah, Thank you for picking up this topic. It is great to discuss the support of Big Endian in the Java implementation on the mailing list. I may be missing some technical blockers. Any comments are appreciated. Best Regards, Kazuaki Ishizaki From: Micah Kornfield To: dev Date: 2020/08/15 08:

Re: [Java] Supporting Big Endian

2020-08-14 Thread Jacques Nadeau
Hey Micah, thanks for starting the discussion. I just skimmed that thread and it isn't entirely clear that there was a conclusion that the overhead was worth it. I think everybody agrees that it would be nice to have the code work on both platforms. On the flipside, the code noise for a rare case

Re: [DISCUSS] Adding a pull-style iterator API to the C data interface

2020-08-14 Thread Jacques Nadeau
I think this unlocks a bunch of use cases. I think people are generally using Arrow in simpler, non-streaming ways right now and thus the quiet. Producing an iterator pattern is logical as you move to streams of smaller chunks (common in distributed and multi-tenant systems). On Mon, Aug 10, 2020

Re: [DISSCUSS][JAVA] Avoid set reader/writer indices in FieldVector#getFieldBuffers

2020-08-14 Thread Jacques Nadeau
Per my comments there, field buffers were introduced as part of the fieldvector addition, when we had vectors that weren't field level. This meant that getbuffers and getfieldbuffers were at different levels of the hierarchy (getbuffers being more general). I believe we no longer have the

Re: [DISCUSS] Support of higher bit-width Decimal type

2020-08-14 Thread Jacques Nadeau
Do we have a good definition of what is necessary to add a new data type? Adding a type but not pulling it through most of the code seems less than ideal since it means one part of Arrow doesn't work with another (providing a less optimal end-user experience). For example, would this work include

Re: change in pyarrow scalar equality?

2020-08-14 Thread Bryan Cutler
Thanks for the detailed response and background on this, Joris! In my case it was certainly not necessary to compare pyarrow scalars, so it would have been better to just raise an error, but there are probably other cases where that wouldn't be preferred. Anyway, I think it would be a good idea to documen

Re: [DISCUSS] How to extended time value range for Timestamp type?

2020-08-14 Thread Jacques Nadeau
+1, let's be cautious adding these kinds of things. On Wed, Aug 5, 2020 at 5:49 AM Wes McKinney wrote: > I also am not sure there is a good case for a new built-in type since it > introduces a good deal of complexity, particularly when there is the > extension type option. We’ve been living with

Re: Gandiva and Threads

2020-08-14 Thread Jacques Nadeau
@ravin...@dremio.com @prav...@dremio.com thoughts? On Tue, Jul 28, 2020 at 3:39 PM Wes McKinney wrote: > Perhaps Gandiva does not handle sliced arrays properly? This would be > worth investigating > > On Mon, Jul 27, 2020 at 7:43 PM Matt Youill > wrote: > > > > Managed to track down the issue

Re: [DISCUSS] Support of higher bit-width Decimal type

2020-08-14 Thread Micah Kornfield
Hi Jacques, Do we have a good definition of what is necessary to add a new data type? > Adding a type but not pulling it through most of the code seems less than > ideal since it means one part of Arrow doesn't work with another (providing > a less optimal end-user experience). I think what I pro

Re: [DISCUSS] Plasma appears to have been forked, consider deprecating pyarrow.serialization

2020-08-14 Thread Micah Kornfield
> > Regarding Plasma, you're right we should have started this conversation > earlier! The way it's being developed in Ray currently isn't useful as a > standalone project. We realized that tighter integration with Ray's object > lifetime tracking could be important, and removing IPCs and making it