[jira] [Created] (ARROW-2571) [C++] Lz4Codec doesn't properly handle empty data

2018-05-10, Dmitry Kalinkin (JIRA)
Dmitry Kalinkin created ARROW-2571:
Summary: [C++] Lz4Codec doesn't properly handle empty data
Key: ARROW-2571
URL: https://issues.apache.org/jira/browse/ARROW-2571
Project: Apache Arrow

Re: [JS] Arrow output from JS library?

2018-05-10, Paul Taylor
Quick update on the Arrow JS IPC buffer writer: I had a chance to revisit this branch on my fork last night, and managed to get a working prototype of the RecordBatchStreamWriter correctly serializing the integration test data to

Re: How to model massive nested data

2018-05-10, Wes McKinney
hi Tyler, I am not sure the Arrow Java libraries have yet been used for interacting with larger-than-memory datasets, but this would be a good opportunity to try to get this working. In the C++ libraries, any Arrow data structures can easily reference memory-mapped data on disk; none of the data

[jira] [Created] (ARROW-2570) [Python] Add support for writing parquet files with LZ4 compression

2018-05-10, Dmitry Kalinkin (JIRA)
Dmitry Kalinkin created ARROW-2570:
Summary: [Python] Add support for writing parquet files with LZ4 compression
Key: ARROW-2570
URL: https://issues.apache.org/jira/browse/ARROW-2570
Project:

PyArrow and Parquet DELTA_BINARY_PACKED

2018-05-10, Feras Salim
Hi, I was wondering whether I'm missing something, or whether `DELTA_BINARY_PACKED` is currently only available for reading Parquet files. I can't find a way for the writer to encode timestamp data with `DELTA_BINARY_PACKED`; furthermore, I seem to get about a 10% increase in final file size

Re: How to model massive nested data

2018-05-10, Lukasz Cwik
Is it also possible to iterate over the iterator more than once? Can I have multiple iterators at different positions, all working independently? On Thu, May 10, 2018 at 12:22 PM Tyler Akidau wrote: > Hello Arrow folks, > > I've been skimming through the Arrow

How to model massive nested data

2018-05-10, Tyler Akidau
Hello Arrow folks, I've been skimming through the Arrow docs and code trying to figure out how one might model nested data structures where the nested portions themselves might be massive (i.e., larger than available memory). AFAICT, the nesting constructs in Arrow appear to assume that you can

[jira] [Created] (ARROW-2568) [Python] Expose thread pool size setting to Python, and deprecate "nthreads"

2018-05-10, Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2568:
Summary: [Python] Expose thread pool size setting to Python, and deprecate "nthreads"
Key: ARROW-2568
URL: https://issues.apache.org/jira/browse/ARROW-2568

[jira] [Created] (ARROW-2566) [CI] Add codecov.io badge to README

2018-05-10, Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-2566:
Summary: [CI] Add codecov.io badge to README
Key: ARROW-2566
URL: https://issues.apache.org/jira/browse/ARROW-2566
Project: Apache Arrow
Issue Type: Task

[CI] Code coverage reports

2018-05-10, Antoine Pitrou
Hi, Previous efforts to gather and publish C++ code coverage using the free service provided by coveralls.io have stalled (see ARROW-27). I went ahead and experimented with another free service, codecov.io. I got it to work with our C++ and Rust code bases. An example report can be seen here:

[jira] [Created] (ARROW-2565) [Plasma] new subscriber cannot receive notifications about existing objects

2018-05-10, Zhijun Fu (JIRA)
Zhijun Fu created ARROW-2565:
Summary: [Plasma] new subscriber cannot receive notifications about existing objects
Key: ARROW-2565
URL: https://issues.apache.org/jira/browse/ARROW-2565
Project: Apache