[DISCUSS] Improving Contributor Guidelines

2021-03-04 Thread Micah Kornfield
Hi Everyone, I am writing to give a bump to some of what was written in reply to Andrew's thread on auto-creating JIRAs. I would like to try to focus on small, (hopefully) short-term, achievable items to make the community friendlier to newcomers and reduce toil for regular contributors. 1. I thi

Re: [Java] IPC stream write with re-stated dictionaries

2021-03-04 Thread Micah Kornfield
Hi Joris, I do believe this is missing. I believe we worked around this for testing by directly writing dictionary batches to the stream [1]. Thanks, Micah [1] https://github.com/apache/arrow/blob/master/java/vector/src/test/java/org/apache/arrow/vector/ipc/TestArrowReaderWriter.java#L614 On Th
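The workaround of writing dictionary batches directly into the stream depends on how a reader applies them. The plain-Python sketch below is a hypothetical model, not the Arrow Java API. It assumes the IPC streaming-format rule that a DictionaryBatch with isDelta=false re-states (replaces) the dictionary for its id, that isDelta=true appends to it, and that record batches carry only indices.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class DictionaryBatch:
    """Hypothetical stand-in for an IPC DictionaryBatch message."""
    dict_id: int
    values: List[str]
    is_delta: bool = False


@dataclass
class RecordBatch:
    """Hypothetical stand-in for a record batch with one dictionary-encoded column."""
    dict_id: int
    indices: List[int]


def read_stream(messages: List[Union[DictionaryBatch, RecordBatch]]) -> List[List[str]]:
    """Decode each record batch against the dictionary in effect when it arrives."""
    dictionaries: Dict[int, List[str]] = {}
    decoded: List[List[str]] = []
    for msg in messages:
        if isinstance(msg, DictionaryBatch):
            if msg.is_delta:
                # Delta: append new entries; indices into older entries stay valid.
                # (Raises KeyError if no initial dictionary was sent for this id.)
                dictionaries[msg.dict_id].extend(msg.values)
            else:
                # Replacement (re-stated dictionary): old entries are discarded.
                dictionaries[msg.dict_id] = list(msg.values)
        else:
            current = dictionaries[msg.dict_id]
            decoded.append([current[i] for i in msg.indices])
    return decoded
```

For example, a stream of a dictionary ["a", "b"], a batch, a replacement ["x", "y"], a batch, and a delta ["z"] decodes the later batches against ["x", "y"] and then ["x", "y", "z"].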

Re: [Flight Extension] Request for Comments

2021-03-04 Thread Nate Bauernfeind
Regarding the BarrageRecordBatch: I have been concatenating them; it’s one batch with two sets of arrow payloads. They don’t have separate metadata headers; the update is to be applied atomically. I have only studied the Java Arrow Flight implementation, and I believe it is usable maybe with some

Re: [C++] Generating random Date64 & Timestamp arrays

2021-03-04 Thread Wes McKinney
Agreed, though keep in mind that rather than "some form of reinterpretation at ArrayData level", you can use the Array::View function, so it would look something like: auto ty = date64(); auto arr = *rag.Int64(...)->View(ty); On Thu, Mar 4, 2021 at 3:47 AM Antoine Pitrou wrote: > Hi Ying,
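Array::View works here because Date64 has the same physical layout as Int64: 64-bit milliseconds since the UNIX epoch. The standard-library sketch below only mimics that idea and is not the Arrow API. It draws random day offsets and scales them to milliseconds, so every value is a whole day as the Arrow format expects for Date64. It then reads the same int64 buffer through a memoryview without copying it. The function names and day range are illustrative assumptions.

```python
import array
import datetime
import random

MS_PER_DAY = 86_400_000
EPOCH = datetime.date(1970, 1, 1)


def random_date64_storage(n: int, min_day: int, max_day: int, seed: int = 0) -> array.array:
    """Physical storage for a hypothetical Date64 column: int64 ms since the epoch."""
    rng = random.Random(seed)
    # Whole days only: Date64 values are expected to be multiples of 86,400,000 ms.
    return array.array("q", (rng.randint(min_day, max_day) * MS_PER_DAY for _ in range(n)))


def view_as_dates(storage: array.array) -> list:
    """Logical 'view': interpret the int64 buffer as calendar dates.

    The memoryview shares memory with `storage`; only the resulting
    Python date objects are newly created.
    """
    values = memoryview(storage)
    return [EPOCH + datetime.timedelta(days=ms // MS_PER_DAY) for ms in values]
```

The split mirrors the C++ suggestion: generate plain int64 values first, then attach the date semantics as a separate, copy-free step.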

Re: [Rust] Arrow in WebAssembly

2021-03-04 Thread Dominik Moritz
I just remembered a bigger issue I ran into. I wanted to read from IPC but I don’t have a file. I do have the data as [u8] already. The current API incurs more copies than necessary (I think) and therefore the performance of reading IPC is worse than in JS. ( https://issues.apache.org/jira/project
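When the IPC bytes are already in memory, one way to avoid extra copies is to parse message frames as slices of the existing buffer instead of copying them into new allocations. The Python sketch below illustrates that with memoryview. It assumes the encapsulated-message prefix from the Arrow IPC format: a 0xFFFFFFFF continuation marker, then a little-endian int32 metadata length, with length 0 marking end-of-stream. It stops at the metadata, since decoding the Flatbuffer metadata and the body (whose length is recorded inside that metadata) is out of scope. `read_metadata_frame` is a hypothetical helper, not part of arrow-rs or any Arrow library.

```python
import struct

CONTINUATION = 0xFFFFFFFF


def read_metadata_frame(buf: memoryview, offset: int = 0):
    """Return (metadata slice, next offset); the slice shares memory with `buf`.

    Returns (None, next offset) at the end-of-stream marker
    (continuation marker followed by a zero length).
    """
    marker, length = struct.unpack_from("<Ii", buf, offset)
    if marker != CONTINUATION:
        raise ValueError("missing continuation marker")
    offset += 8
    if length == 0:
        return None, offset
    # Slicing a memoryview creates a view over the same bytes, not a copy.
    metadata = buf[offset:offset + length]
    return metadata, offset + length
```

The same principle applies in Rust: wrapping an existing byte slice in a reader, rather than copying it into an owned buffer first, lets the parser borrow from the original memory.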

Re: [Flight Extension] Request for Comments

2021-03-04 Thread David Li
Re: the multiple batches, that makes sense. In that case, depending on how exactly the two record batches are laid out, I'd suggest considering a Union of Struct columns (where a Struct is essentially interchangeable with a record batch or table) - that would let you encode two distinct record b
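As a rough illustration of this suggestion, the Python sketch below models the dense union layout from the Arrow columnar format. Each slot has a type id that selects a child and an offset into that child, and there is one child array per union member. Here each child is a struct represented as a dict of equal-length columns. The member names and fields (`added`, `removed`, `key`, `price`) are hypothetical, chosen only to show two distinct record batches sharing one column.

```python
from typing import Dict, List, Tuple

Struct = Dict[str, list]  # column name -> values; all columns have equal length


def encode_dense_union(batches: List[Struct]) -> Tuple[List[int], List[int], List[Struct]]:
    """Concatenate whole batches into one dense union column.

    type_ids[i] selects the child struct, offsets[i] is the row within it.
    """
    type_ids: List[int] = []
    offsets: List[int] = []
    for type_id, batch in enumerate(batches):
        n_rows = len(next(iter(batch.values()))) if batch else 0
        type_ids.extend([type_id] * n_rows)
        offsets.extend(range(n_rows))
    return type_ids, offsets, batches


def union_row(type_ids: List[int], offsets: List[int], children: List[Struct], i: int) -> Tuple[int, dict]:
    """Resolve slot i to (type_id, row of the selected child as a dict)."""
    child = children[type_ids[i]]
    row = offsets[i]
    return type_ids[i], {name: col[row] for name, col in child.items()}
```

For instance, `added = {"key": [1, 2], "price": [9.5, 7.0]}` and `removed = {"key": [3]}` encode to type ids [0, 0, 1] and offsets [0, 1, 0], so the two payloads travel in one column while keeping different schemas.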

[Java] IPC stream write with re-stated dictionaries

2021-03-04 Thread Joris Peeters
Hello, For my use case I'm sending an Arrow IPC-stream from a server to a client, with some columns being dictionary-encoded. Dictionary-encoding happens on the fly, though, so the full dictionary isn't known yet at the beginning of the stream, but rather is computed for every batch, and Dictionar
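A writer that encodes on the fly has two options for its dictionary messages. The plain-Python sketch below compares them, assuming the IPC streaming-format semantics for DictionaryBatch: a non-delta batch re-states the dictionary and a delta batch appends to it. `encode_stream` and its message tuples are hypothetical and are not the Arrow Java API. The sketch only shows which dictionary messages each strategy would emit.

```python
from typing import Dict, List, Tuple

# ("dictionary", (values, is_delta)) or ("batch", indices)
Message = Tuple[str, object]


def encode_stream(batches: List[List[str]], restate: bool) -> List[Message]:
    """Dictionary-encode each batch on the fly and emit IPC-like messages.

    restate=True: build a fresh dictionary per batch and send it as a replacement.
    restate=False: keep one growing dictionary and send only new entries as deltas.
    """
    messages: List[Message] = []
    known: Dict[str, int] = {}
    for values in batches:
        if restate:
            known = {}  # each batch gets its own dictionary
        new_entries: List[str] = []
        indices: List[int] = []
        for v in values:
            if v not in known:
                known[v] = len(known)
                new_entries.append(v)
            indices.append(known[v])
        if restate:
            messages.append(("dictionary", (new_entries, False)))
        elif new_entries or not messages:
            # The first dictionary message must be a full (non-delta) dictionary.
            messages.append(("dictionary", (new_entries, bool(messages))))
        messages.append(("batch", indices))
    return messages
```

Re-stating keeps each dictionary limited to the values of its batch, at the cost of resending repeated values. Deltas send each value once, but the dictionary only grows.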

[NIGHTLY] Arrow Build Report for Job nightly-2021-03-04-0

2021-03-04 Thread Crossbow
Arrow Build Report for Job nightly-2021-03-04-0 All tasks: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0 Failed Tasks: - conda-linux-gcc-py37-aarch64: URL: https://github.com/ursacomputing/crossbow/branches/all?query=nightly-2021-03-04-0-drone-conda-linux

Re: [C++] Generating random Date64 & Timestamp arrays

2021-03-04 Thread Antoine Pitrou
Hi Ying, Yes, this approach sounds reasonable. It would be useful at some point to add random date/timestamp generation to RandomArrayGenerator, though. Regards, Antoine. On 04/03/2021 at 04:36, Ying Zhou wrote: Hi, I’d like to generate random Date64 & Timestamp arrays with artificial
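Until something like that exists in RandomArrayGenerator, a timestamp generator mainly has to respect the column's time unit. The standard-library sketch below illustrates this and is not the Arrow C++ API. It draws int64 values between two timezone-aware datetimes, expressed in a chosen unit (s, ms, us, ns) since the UNIX epoch, with an optional null probability. The function names and parameters are assumptions.

```python
import datetime
import random
from typing import List, Optional

UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000, "ns": 1_000_000_000}
EPOCH = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def to_ticks(dt: datetime.datetime, unit: str) -> int:
    """Integer count of `unit` ticks since the epoch (dt must be timezone-aware)."""
    delta = dt - EPOCH
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    # datetime has microsecond resolution; scale up or (floor) down to the unit.
    return micros * UNITS_PER_SECOND[unit] // 1_000_000


def random_timestamps(
    n: int,
    start: datetime.datetime,
    end: datetime.datetime,
    unit: str = "ms",
    null_probability: float = 0.0,
    seed: int = 0,
) -> List[Optional[int]]:
    """Random timestamps in [start, end] as int64 ticks; None marks a null slot."""
    rng = random.Random(seed)
    lo, hi = to_ticks(start, unit), to_ticks(end, unit)
    return [None if rng.random() < null_probability else rng.randint(lo, hi) for _ in range(n)]
```

Working in integer ticks keeps the values exact for every unit, which matters for nanosecond timestamps that do not fit in a float without rounding.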