Re: Parquet to Arrow in Java

2019-09-04 Thread Chao Sun
Thanks Uwe for pointing out the Iceberg effort - will take a look. It would still be good to have a "standard" Parquet-to-Arrow reader implementation live in the Arrow project, though, so that in the future different projects can just use it instead of implementing their own. Chao On Wed, Sep 4, 2019 at

Re: Parquet to Arrow in Java

2019-09-04 Thread Uwe L. Korn
Hello, You may want to interact with the Apache Iceberg community here. They are currently doing a similar thing: https://lists.apache.org/thread.html/3bb4f89a0b37f474cf67915f91326fa845afa597bdd2463c98a2c8b9@%3Cdev.iceberg.apache.org%3E I'm not involved in this, just reading both mailing lists and

Re: Parquet to Arrow in Java

2019-09-04 Thread Chao Sun
Bumping this. We may have an upcoming use case for this as well. I'd like to know whether anyone is actively working on this. I also heard that Dremio has internally implemented a performant Parquet-to-Arrow reader. Is there any plan to open source it? That could save us a lot of work. Thanks, Chao On

Re: Parquet to Arrow in Java

2019-08-09 Thread Renjie Liu
Hi, I'm working on the Rust part and expect to finish it soon. I'm also interested in the Java version because we are trying to embed Arrow in Spark to implement vectorized processing. Maybe we can work together. On Mon, Aug 5, 2019 at 1:50 PM, Micah Kornfield wrote: > Hi Anoop, > I think a

Re: Parquet to Arrow in Java

2019-08-04 Thread Micah Kornfield
Hi Anoop, I think a contribution would be welcome. There was a recent discussion thread on what would be expected from new "readers" for Arrow data in Java [1]. I think it's worth reading through, but my recollections of the highlights are: 1. A short design sketch in the JIRA that will track the

Re: Parquet to Arrow in Java

2019-08-04 Thread Anoop Johnson
Thanks for the response Micah. I could implement this and contribute to Arrow Java. To help me get started, are there any pointers on how the C++ or Rust implementations currently read Parquet into Arrow? Are they reading Parquet row-by-row and building Arrow batches or are there better ways of
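To make the "row-by-row into Arrow batches" question concrete, here is a minimal, stdlib-only Java sketch of transposing rows into fixed-size columnar batches. `Row`, `ColumnarBatch`, and `RowToColumnar.toBatches` are hypothetical names; they are not part of the Parquet or Arrow APIs. A real reader would fill Arrow vectors rather than plain arrays, and a columnar reader could skip the row step entirely by decoding Parquet column chunks directly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical row type: what a row-oriented Parquet reader might hand back.
final class Row {
    final long id;
    final double amount;

    Row(long id, double amount) {
        this.id = id;
        this.amount = amount;
    }
}

// Hypothetical columnar batch: one contiguous array per column plus a row
// count. This is NOT Arrow's API; it only mimics the column-at-a-time layout
// that an Arrow record batch uses.
final class ColumnarBatch {
    final long[] ids;
    final double[] amounts;
    final int rowCount;

    ColumnarBatch(long[] ids, double[] amounts, int rowCount) {
        this.ids = ids;
        this.amounts = amounts;
        this.rowCount = rowCount;
    }
}

final class RowToColumnar {
    // Transposes rows into batches of at most batchSize rows each;
    // the final batch holds whatever rows remain.
    static List<ColumnarBatch> toBatches(List<Row> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<ColumnarBatch> batches = new ArrayList<>();
        for (int start = 0; start < rows.size(); start += batchSize) {
            int n = Math.min(batchSize, rows.size() - start);
            long[] ids = new long[n];
            double[] amounts = new double[n];
            // Copy each row's fields into the matching column array.
            for (int i = 0; i < n; i++) {
                Row r = rows.get(start + i);
                ids[i] = r.id;
                amounts[i] = r.amount;
            }
            batches.add(new ColumnarBatch(ids, amounts, n));
        }
        return batches;
    }
}
```

For example, five rows with a `batchSize` of 2 produce three batches holding 2, 2, and 1 rows. The per-row copy loop is exactly the overhead that a reader decoding whole column chunks would avoid.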

Re: Parquet to Arrow in Java

2019-07-30 Thread Micah Kornfield
Hi Anoop, There isn't currently anything in the Arrow Java library that does this. It is something that I think we want to add at some point. Dremio [1] has some Parquet-related code, but I haven't looked at it to understand how easy it is to use as a standalone library and whether it supports

Parquet to Arrow in Java

2019-07-28 Thread Anoop Johnson
Arrow Newbie here. What is the recommended way to convert Parquet data into Arrow, preferably doing predicate/column pushdown? One can implement this as custom code using the Parquet API, and re-encode it in Arrow using the Arrow APIs, but is this supported by Arrow out of the box? Thanks,
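The "custom code" route above involves two kinds of pushdown. Below is a minimal, stdlib-only Java sketch of both, under stated assumptions: the data is already split into in-memory row groups, each carrying min/max statistics for the filter column (Parquet can record such per-column statistics for each row group), and the predicate is a simple range `lo <= key <= hi`. `RowGroup`, `PushdownScan`, and `scan` are hypothetical names, not part of the Parquet or Arrow APIs.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory row group: column name -> values, plus the min/max
// statistics for the column used in the filter.
final class RowGroup {
    final Map<String, long[]> columns;
    final long keyMin;
    final long keyMax;

    RowGroup(Map<String, long[]> columns, long keyMin, long keyMax) {
        this.columns = columns;
        this.keyMin = keyMin;
        this.keyMax = keyMax;
    }
}

final class PushdownScan {
    // Number of row groups whose values were actually read.
    int rowGroupsRead = 0;

    // Returns the projected columns for rows with lo <= key <= hi.
    Map<String, List<Long>> scan(List<RowGroup> groups, String keyColumn,
                                 long lo, long hi, List<String> projection) {
        Map<String, List<Long>> out = new LinkedHashMap<>();
        for (String c : projection) {
            out.put(c, new ArrayList<>());
        }
        for (RowGroup g : groups) {
            // Predicate pushdown: the statistics prove no row can match,
            // so the group's values are never touched.
            if (g.keyMax < lo || g.keyMin > hi) {
                continue;
            }
            rowGroupsRead++;
            long[] keys = g.columns.get(keyColumn);
            for (int i = 0; i < keys.length; i++) {
                if (keys[i] < lo || keys[i] > hi) {
                    continue; // row-level filter inside a surviving group
                }
                // Column pushdown: apart from the filter column, only the
                // projected columns are read.
                for (String c : projection) {
                    out.get(c).add(g.columns.get(c)[i]);
                }
            }
        }
        return out;
    }
}
```

In a real reader the statistics would come from the Parquet file footer, and the surviving values would be written into Arrow vectors rather than Java lists.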