Re: A couple of questions about pyarrow.parquet

2019-05-23 Thread Uwe L. Korn
Hello Ted, regarding predicate pushdown in Python, have a look at my unfinished PR at https://github.com/apache/arrow/pull/2623. Work on it stalled because Arrow was missing native filtering support. Those prerequisites have now been implemented, so we could probably reactivate the PR. Uwe

Re: A couple of questions about pyarrow.parquet

2019-05-17 Thread Ted Gooch
Thanks Micah and Wes. Definitely interested in the *Predicate Pushdown* and *Schema inference, schema-on-read, and schema normalization* sections.

On Fri, May 17, 2019 at 12:47 PM Wes McKinney wrote:
> Please see also

Re: A couple of questions about pyarrow.parquet

2019-05-17 Thread Wes McKinney
Please see also https://docs.google.com/document/d/1bVhzifD38qDypnSjtf8exvpP3sSB5x_Kw9m-n66FB2c/edit?usp=drivesdk and the prior mailing list discussion. I will comment in more detail on the other items later.

On Fri, May 17, 2019, 2:44 PM Micah Kornfield wrote:
> I can't help on the first

Re: A couple of questions about pyarrow.parquet

2019-05-17 Thread Micah Kornfield
I can't help with the first question. Regarding push-down predicates, there is an open JIRA [1] to do just that.

[1] https://issues.apache.org/jira/browse/PARQUET-473

A couple of questions about pyarrow.parquet

2019-05-17 Thread Ted Gooch
Hi, I've been working on getting the Parquet read path going for the Python Iceberg library. I have two questions I couldn't figure out, and I was hoping to get some guidance from the list. First, I'd like to create a