Read Arrow 0.9.0 output using newer pyarrow version

2019-03-10 Thread Rares Vernica
Hello, I have a C++ library using Arrow 0.9.0 to serialize data. The code looks like this: std::shared_ptr arrowBatch; arrowBatch = arrow::RecordBatch::Make(_arrowSchema, nCells, _arrowArrays); std::shared_ptr arrowBuffer(new arrow::PoolBuffer(_arrowPool)); arrow::io::BufferOutputStream

Re: OversizedAllocationException for pandas_udf in pyspark

2019-03-10 Thread Abdeali Kothari
Hi, any help on this would be much appreciated. I've not been able to figure out any reason for this to happen yet. On Sat, Mar 2, 2019, 11:50 Abdeali Kothari wrote: > Hi Li Jin, thanks for the note. > > I get this error only for larger data - when I reduce the number of > records or the number

Re: [C++] Failing constructors and internal state

2019-03-10 Thread Wes McKinney
hi Edmon, Here's an example of a function that does some schema validation: https://github.com/apache/arrow/blob/master/cpp/src/arrow/table.cc#L450 The issue is less about the magnitude of the cost and more of a software engineering question about layering of concerns. Consider two code paths:

Re: [C++] Failing constructors and internal state

2019-03-10 Thread Wes McKinney
I think having consistent methods for both validated and unvalidated construction is a good idea. Being fairly passionate about microperformance, I don't think we should penalize responsible users of unsafe/unvalidated APIs (e.g. by taking them away and replacing them with variants featuring

Re: [C++] Failing constructors and internal state

2019-03-10 Thread Micah Kornfield
I agree there should always be a path to avoid the validation, but I think there should also be an easy way to have validation included and a clear way to tell the difference. IMO, having a strong naming convention so callers can tell the difference, and code reviewers can focus more on less safe
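A minimal sketch of the naming convention discussed in this thread: one factory that validates its inputs and one, marked by name, that skips the checks. The types `Column` and `Batch` and the names `MakeValidated` and `MakeUnsafe` are hypothetical, chosen only for illustration; this is not Arrow's API and assumes C++17 for `std::optional`.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical column type for illustration; not part of the Arrow API.
struct Column {
  std::string name;
  std::vector<int> values;
};

// Hypothetical batch type with two clearly named construction paths.
class Batch {
 public:
  // Validated path: checks that every column holds exactly num_rows values.
  // Reports failure by returning std::nullopt instead of throwing.
  static std::optional<Batch> MakeValidated(std::size_t num_rows,
                                            std::vector<Column> columns) {
    for (const Column& col : columns) {
      if (col.values.size() != num_rows) return std::nullopt;
    }
    return Batch(num_rows, std::move(columns));
  }

  // Unvalidated path: performs no checks. The caller is responsible for
  // passing consistent inputs; the name makes that visible at the call site.
  static Batch MakeUnsafe(std::size_t num_rows, std::vector<Column> columns) {
    return Batch(num_rows, std::move(columns));
  }

  std::size_t num_rows() const { return num_rows_; }
  std::size_t num_columns() const { return columns_.size(); }

 private:
  // The constructor is private so every caller picks one of the two factories.
  Batch(std::size_t num_rows, std::vector<Column> columns)
      : num_rows_(num_rows), columns_(std::move(columns)) {}

  std::size_t num_rows_;
  std::vector<Column> columns_;
};
```

With this shape, a call to `Batch::MakeUnsafe` stands out in code review, while `Batch::MakeValidated` forces the caller to handle the failure case before using the result.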

[jira] [Created] (ARROW-4818) [Rust] [Parquet] Parquet reader does not support null values

2019-03-10 Thread Andy Grove (JIRA)
Andy Grove created ARROW-4818: - Summary: [Rust] [Parquet] Parquet reader does not support null values Key: ARROW-4818 URL: https://issues.apache.org/jira/browse/ARROW-4818 Project: Apache Arrow

[jira] [Created] (ARROW-4817) [Rust] [DataFusion] Small re-org of modules

2019-03-10 Thread Andy Grove (JIRA)
Andy Grove created ARROW-4817: - Summary: [Rust] [DataFusion] Small re-org of modules Key: ARROW-4817 URL: https://issues.apache.org/jira/browse/ARROW-4817 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-4816) [Rust] [DataFusion] Add support for repartitioning

2019-03-10 Thread Andy Grove (JIRA)
Andy Grove created ARROW-4816: - Summary: [Rust] [DataFusion] Add support for repartitioning Key: ARROW-4816 URL: https://issues.apache.org/jira/browse/ARROW-4816 Project: Apache Arrow Issue

Re: [C++] Failing constructors and internal state

2019-03-10 Thread Wes McKinney
hi folks, I think some issues are being conflated here, so let me try to dig through them. Let's first look at the two cited bugs that were fixed, if I have this right: * ARROW-4766: root cause dereferencing a null pointer * ARROW-4774: root cause unsanitized Python user input None of the 4

[jira] [Created] (ARROW-4815) [Rust] [DataFusion] Add support for * in SQL projection

2019-03-10 Thread Andy Grove (JIRA)
Andy Grove created ARROW-4815: - Summary: [Rust] [DataFusion] Add support for * in SQL projection Key: ARROW-4815 URL: https://issues.apache.org/jira/browse/ARROW-4815 Project: Apache Arrow Issue

Re: Assignee on Jira

2019-03-10 Thread paddy horan
Thanks Kou, appreciate it. P From: Kouhei Sutou Sent: Sunday, March 10, 2019 12:59 AM To: dev@arrow.apache.org Subject: Re: Assignee on Jira Hi, Yes. We need to add the user to the "contributor" role in JIRA to assign to the user. Adding a user to the

[jira] [Created] (ARROW-4814) [Python] Exception when writing nested columns that are tuples to parquet

2019-03-10 Thread Suvayu Ali (JIRA)
Suvayu Ali created ARROW-4814: - Summary: [Python] Exception when writing nested columns that are tuples to parquet Key: ARROW-4814 URL: https://issues.apache.org/jira/browse/ARROW-4814 Project: Apache