Hi John,

The overview of the Java API might help here [1]. I also wrote up some notes on Avro->Arrow conversion for a different user question [2]. ARROW-9613 [3] is tracking the impedance mismatch I mentioned in the e-mail.
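
In case it is useful, here is a minimal sketch of how the pieces might fit together. It is not a definitive implementation. It assumes the input stream holds raw binary-encoded Avro records for a schema you already have (an Avro container file carries a header and sync markers that this sketch does not handle), and I have only thought it through for flat records with primitive fields. The class name AvroToArrowFile and the method convert are hypothetical names made up for illustration. The imports assume the org.apache.arrow package used by the Avro adapter at the time of writing; I believe later releases moved it, so adjust as needed. Because the iterator hands back a fresh VectorSchemaRoot per batch while ArrowFileWriter wants a single root, the sketch copies each batch into the writer's root with VectorUnloader/VectorLoader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.channels.WritableByteChannel;

import org.apache.arrow.AvroToArrow;
import org.apache.arrow.AvroToArrowConfig;
import org.apache.arrow.AvroToArrowConfigBuilder;
import org.apache.arrow.AvroToArrowVectorIterator;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorLoader;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.VectorUnloader;
import org.apache.arrow.vector.ipc.ArrowFileWriter;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

/** Hypothetical helper (not part of Arrow): raw binary Avro records -> Arrow file format. */
public class AvroToArrowFile {

  /** Converts all records readable from avroIn and returns the number of rows written. */
  public static long convert(Schema avroSchema, InputStream avroIn, WritableByteChannel arrowOut)
      throws IOException {
    long rows = 0;
    try (BufferAllocator allocator = new RootAllocator(Long.MAX_VALUE)) {
      AvroToArrowConfig config = new AvroToArrowConfigBuilder(allocator).build();
      // Assumption: the stream contains only binary-encoded records, no container-file header.
      BinaryDecoder decoder = DecoderFactory.get().binaryDecoder(avroIn, null);

      try (AvroToArrowVectorIterator batches =
          AvroToArrow.avroToArrowIterator(avroSchema, decoder, config)) {
        if (!batches.hasNext()) {
          return 0; // nothing to write; no Arrow file is produced for empty input
        }
        VectorSchemaRoot first = batches.next();
        // The writer needs one long-lived root, so create it from the first batch's schema.
        try (VectorSchemaRoot target = VectorSchemaRoot.create(first.getSchema(), allocator);
            ArrowFileWriter writer =
                new ArrowFileWriter(target, config.getProvider(), arrowOut)) {
          VectorLoader loader = new VectorLoader(target);
          writer.start();
          VectorSchemaRoot current = first;
          while (current != null) {
            // Each root returned by the iterator is owned by the caller and must be closed.
            try (VectorSchemaRoot source = current;
                ArrowRecordBatch batch = new VectorUnloader(source).getRecordBatch()) {
              loader.load(batch); // move this batch's buffers into the writer's root
              writer.writeBatch();
              rows += source.getRowCount();
            }
            current = batches.hasNext() ? batches.next() : null;
          }
          writer.end();
        }
      }
    }
    return rows;
  }
}
```

You would call it with something like AvroToArrowFile.convert(schema, new FileInputStream(avroPath), new FileOutputStream(arrowPath).getChannel()), where avroPath and arrowPath are placeholders for your downloaded and output files. The resulting file should then be readable from Python for the Arrow->Parquet step.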

Hope this helps.

-Micah

[1] https://arrow.apache.org/docs/java/ipc.html#writing-and-reading-random-access-files
[2] https://lists.apache.org/thread.html/rfa51f801b752faa881d318cff7394ee5b43161c100a707810c6c92fd%40%3Cuser.arrow.apache.org%3E
[3] https://issues.apache.org/jira/browse/ARROW-9613

On Mon, Dec 28, 2020 at 10:33 PM John E. Conlon <[email protected]> wrote:

> I am creating a data engineering pipeline that will transform binary Avro
> objects in S3 buckets into S3 Arrow objects and Parquet objects.
>
> I see that the Java libraries don't support Parquet at this time, so I plan
> to first use the Arrow Java libraries for the Avro->Arrow transform and then
> use the Python Arrow libraries to do the Arrow->Parquet transform.
>
> On the Java side I plan to download my Avro objects to a file, then create
> the Arrow files, and then upload these.
>
> I see AvroToArrow.avroToArrowIterator(schema, decoder, config) and the tests
> that use AvroToArrow, but even though I have read the limited documentation
> I am not sure how to go about using this to read the Avro files and write
> an output Arrow file.
>
> Can someone provide me with an example?
