Hi Cindy,

Naming is hard :(. The Consumer classes consume Avro data and write it to Arrow. For example, the AvroArraysConsumer [1] has the following description: "Consumer which consume array type values from avro decoder. Write the data to ListVector." ListVector is the Arrow structure analogous to Avro arrays.

Thanks,
Micah

[1] https://arrow.apache.org/docs/java/org/apache/arrow/consumers/AvroArraysConsumer.html

On Tue, Jun 30, 2020 at 8:02 AM Cindy McMullen <[email protected]> wrote:

> Hi, Micah -
>
> I see the Avro*Consumer classes in the javadocs
> <https://arrow.apache.org/docs/java/>, which would lead me to believe we
> have Arrow to Avro capability. What am I missing?
>
> On Mon, Jun 29, 2020 at 9:33 PM Micah Kornfield <[email protected]> wrote:
>
>> Just a clarification the functionality in Java is from Avro to Arrow (not
>> Arrow to Avro).
>>
>> On Mon, Jun 29, 2020 at 2:25 PM Wes McKinney <[email protected]> wrote:
>>
>>> On Mon, Jun 29, 2020 at 4:15 PM Cindy McMullen <[email protected]> wrote:
>>> >
>>> > Hi, Wes -
>>> >
>>> > Yes, we're using Java/Scala, but also have a good Python code base for
>>> > our data scientists. Our goal is to replace storage/representation of
>>> > Thrift for ML features with some more OSS-friendly format, such as Parquet
>>> > or Avro, and avoid writing multiple adapters.
>>> >
>>> > Ideally, we could stream data from Parquet disk in batches into
>>> > Arrow-compatible consumers. Is this a reasonable fit for something like
>>> > Arrow Flight?
>>>
>>> Yes, Flight is definitely designed for that -- fast / efficient
>>> delivery of Arrow record batches over TCP.
>>>
>>> >
>>> > On Mon, Jun 29, 2020 at 2:37 PM Wes McKinney <[email protected]> wrote:
>>> >>
>>> >> hi Cindy,
>>> >>
>>> >> Could you clarify which PL you are working in (though assuming Scala /
>>> >> Java judging by your e-mail address)?
>>> >>
>>> >> In C++ we have reasonably mature Parquet->Arrow reading but not yet
>>> >> conversion from Arrow to Avro. In Java, I am not sure what is the
>>> >> state of the art for getting Parquet into Arrow but this code does not
>>> >> live in Apache Arrow -- I know that Apache Iceberg has done some work
>>> >> around this but I'm not sure how consumable it is as a library.
>>> >> Java-Arrow does have some preliminary support for converting Arrow to
>>> >> Avro, I believe. So there's some engineering here to do in any case.
>>> >>
>>> >> best,
>>> >> Wes
>>> >>
>>> >> On Mon, Jun 29, 2020 at 2:45 PM Cindy McMullen <[email protected]> wrote:
>>> >> >
>>> >> > Can I use Arrow to stream data from a Parquet file source and
>>> >> > consume it via Avro?
>>> >>
