Just a clarification: the functionality in Java is from Avro to Arrow (not Arrow to Avro).
On Mon, Jun 29, 2020 at 2:25 PM Wes McKinney <[email protected]> wrote:

> On Mon, Jun 29, 2020 at 4:15 PM Cindy McMullen <[email protected]>
> wrote:
> >
> > Hi, Wes -
> >
> > Yes, we're using Java/Scala, but also have a good Python code base for
> > our data scientists. Our goal is to replace storage/representation of
> > Thrift for ML features with some more OSS-friendly format, such as Parquet
> > or Avro, and avoid writing multiple adapters.
> >
> > Ideally, we could stream data from Parquet disk in batches into
> > Arrow-compatible consumers. Is this a reasonable fit for something like
> > Arrow Flight?
>
> Yes, Flight is definitely designed for that -- fast / efficient
> delivery of Arrow record batches over TCP.
>
> >
> > On Mon, Jun 29, 2020 at 2:37 PM Wes McKinney <[email protected]>
> > wrote:
> >>
> >> hi Cindy,
> >>
> >> Could you clarify which PL you are working in (though assuming Scala /
> >> Java judging by your e-mail address)?
> >>
> >> In C++ we have reasonably mature Parquet->Arrow reading but not yet
> >> conversion from Arrow to Avro. In Java, I am not sure what is the
> >> state of the art for getting Parquet into Arrow but this code does not
> >> live in Apache Arrow -- I know that Apache Iceberg has done some work
> >> around this but I'm not sure how consumable it is as a library.
> >> Java-Arrow does have some preliminary support for converting Arrow to
> >> Avro, I believe. So there's some engineering here to do in any case.
> >>
> >> best,
> >> Wes
> >>
> >> On Mon, Jun 29, 2020 at 2:45 PM Cindy McMullen <[email protected]>
> >> wrote:
> >> >
> >> > Can I use Arrow to stream data from a Parquet file source and consume
> >> > it via Avro?
> >>
