I am creating a data engineering pipeline that transforms binary Avro objects
in S3 buckets into Arrow objects and Parquet objects, which are also stored in S3.

From what I can see, the Java libraries don't support Parquet at this time, so I
plan to use the Arrow Java libraries for the Avro->Arrow transform first and
then use Python Arrow (PyArrow) for the Arrow->Parquet transform.

On the Java side, I plan to download each Avro object to a local file, convert
it to an Arrow file, and then upload the Arrow file.
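
A minimal sketch of that download, convert, upload loop, assuming the AWS SDK for Java v2 (`S3Client`) and that the Arrow output goes to the same bucket as the Avro input. The class `S3AvroToArrowJob`, the `LocalConverter` hook, and the `.avro` -> `.arrow` key naming rule are all hypothetical; the actual Avro->Arrow step is whatever you plug into `LocalConverter`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;

public class S3AvroToArrowJob {

  /** Hypothetical hook for the local Avro file -> Arrow file conversion. */
  @FunctionalInterface
  public interface LocalConverter {
    void convert(Path avroFile, Path arrowFile) throws IOException;
  }

  /** Hypothetical naming rule: "dir/x.avro" becomes "dir/x.arrow". */
  public static String arrowKeyFor(String avroKey) {
    String base = avroKey.endsWith(".avro")
        ? avroKey.substring(0, avroKey.length() - ".avro".length())
        : avroKey;
    return base + ".arrow";
  }

  /** Downloads one Avro object, converts it locally, and uploads the Arrow file. */
  public static void process(S3Client s3, String bucket, String avroKey,
                             LocalConverter converter) throws IOException {
    // Fresh temp dir: getObject(request, Path) refuses to overwrite an existing file.
    Path tmpDir = Files.createTempDirectory("avro-to-arrow");
    Path avroFile = tmpDir.resolve("input.avro");
    Path arrowFile = tmpDir.resolve("output.arrow");
    try {
      // 1. Download the Avro object to a local file.
      s3.getObject(GetObjectRequest.builder().bucket(bucket).key(avroKey).build(), avroFile);
      // 2. Convert the local Avro file to a local Arrow file.
      converter.convert(avroFile, arrowFile);
      // 3. Upload the Arrow file next to the original object.
      s3.putObject(
          PutObjectRequest.builder().bucket(bucket).key(arrowKeyFor(avroKey)).build(),
          RequestBody.fromFile(arrowFile));
    } finally {
      // Clean up the local copies whether or not the steps succeeded.
      Files.deleteIfExists(avroFile);
      Files.deleteIfExists(arrowFile);
      Files.deleteIfExists(tmpDir);
    }
  }
}
```

The S3 client would typically be built once (for example with `S3Client.create()`) and reused across objects.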

I see AvroToArrow.avroToArrowIterator(schema, decoder, config), and I have also
looked at the tests that use AvroToArrow, but even after reading the limited
documentation I am not sure how to use it to read the Avro files and write the
output Arrow files.

Can someone provide me with an example? 
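
A minimal sketch of how these pieces might fit together, under several assumptions: the input stream holds raw binary-encoded Avro records written back to back with a known schema (not an Avro Object Container File, which has its own header and blocks), the schema has no enum fields (the adapter dictionary-encodes enums, and this sketch passes a null dictionary provider), and the class and method names `AvroFileToArrowFile.convert` are hypothetical. `avroToArrowIterator` returns an iterator whose `next()` hands back a new `VectorSchemaRoot` per batch, while `ArrowFileWriter` is bound to a single root, so each batch is copied into that root with `VectorUnloader`/`VectorLoader` before `writeBatch()`.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.arrow.adapter.avro.AvroToArrow;
import org.apache.arrow.adapter.avro.AvroToArrowConfig;
import org.apache.arrow.adapter.avro.AvroToArrowConfigBuilder;
import org.apache.arrow.adapter.avro.AvroToArrowVectorIterator;
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.VectorLoader;
import org.apache.arrow.vector.VectorSchemaRoot;
import org.apache.arrow.vector.VectorUnloader;
import org.apache.arrow.vector.ipc.ArrowFileWriter;
import org.apache.arrow.vector.ipc.message.ArrowRecordBatch;
import org.apache.avro.Schema;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;

public class AvroFileToArrowFile {

  /**
   * Reads raw binary-encoded Avro records (no container-file header) from avroIn and
   * writes them to arrowFile in the Arrow IPC file format. Returns the row count.
   */
  public static long convert(Schema avroSchema, InputStream avroIn, File arrowFile)
      throws IOException {
    long rows = 0;
    try (BufferAllocator allocator = new RootAllocator()) {
      AvroToArrowConfig config = new AvroToArrowConfigBuilder(allocator).build();
      // The adapter ends a batch when the decoder hits end of stream (EOFException).
      BinaryDecoder decoder = DecoderFactory.get().directBinaryDecoder(avroIn, null);

      try (AvroToArrowVectorIterator batches =
               AvroToArrow.avroToArrowIterator(avroSchema, decoder, config)) {
        if (!batches.hasNext()) {
          return 0; // empty input: no batch, so no schema to write a file with
        }
        // Each next() returns a new VectorSchemaRoot that the caller must close.
        VectorSchemaRoot batch = batches.next();

        try (VectorSchemaRoot writerRoot = VectorSchemaRoot.create(batch.getSchema(), allocator);
             FileOutputStream out = new FileOutputStream(arrowFile);
             // null dictionary provider: assumes no Avro enum (dictionary-encoded) fields
             ArrowFileWriter writer = new ArrowFileWriter(writerRoot, null, out.getChannel())) {
          VectorLoader loader = new VectorLoader(writerRoot);
          writer.start();
          while (batch != null) {
            try (VectorSchemaRoot current = batch;
                 ArrowRecordBatch recordBatch = new VectorUnloader(current).getRecordBatch()) {
              loader.load(recordBatch); // writerRoot now holds this batch's data
              writer.writeBatch();
              rows += current.getRowCount();
            }
            batch = batches.hasNext() ? batches.next() : null;
          }
          writer.end();
        }
      }
    }
    return rows;
  }
}
```

Usage would look like `AvroFileToArrowFile.convert(schema, new FileInputStream("input.avro"), new File("output.arrow"))`. The result is an Arrow IPC file, which on the Python side `pyarrow.ipc.open_file(...)` can read before `pyarrow.parquet.write_table(...)` writes the Parquet output.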