I am creating a data engineering pipeline that will transform binary Avro objects in S3 buckets into Arrow and Parquet objects, also stored in S3.
As far as I can see, the Arrow Java libraries don't support writing Parquet at this time, so I plan to first use the Arrow Java libraries for the Avro->Arrow transform and then use the Python Arrow library (pyarrow) for the Arrow->Parquet transform. On the Java side I plan to download my Avro objects to a local file, create the Arrow files from them, and then upload the Arrow files to S3. I have found AvroToArrow.avroToArrowIterator(schema, decoder, config) and the tests that use AvroToArrow, but even after reading the limited documentation I am not sure how to use it to read the Avro files and write an output Arrow file. Can someone provide me with an example?
