I implemented something similar to this recently. What you can do is mount a tmpfs, batch up GenericRecords, write them to a Parquet file in the tmpfs, then read the file back into a byte[] to do with as you wish.
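In rough outline it looked something like this (a minimal sketch, not the exact code I shipped; the /mnt/tmpfs mount point and file naming are placeholders, and it assumes the parquet-avro builder API is on your classpath):

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class TmpfsParquetBuffer {

    // Assumes a tmpfs is already mounted here, e.g.:
    //   mount -t tmpfs -o size=512m tmpfs /mnt/tmpfs
    private static final String TMPFS_DIR = "/mnt/tmpfs";

    /**
     * Writes one batch of GenericRecords to a Parquet file on the tmpfs,
     * then reads the finished file back as a byte[].
     */
    public static byte[] toParquetBytes(Schema schema, List<GenericRecord> batch)
            throws IOException {
        File file = new File(TMPFS_DIR, "batch-" + System.nanoTime() + ".parquet");
        // The "file://" prefix forces the local filesystem even if
        // fs.defaultFS points at HDFS.
        Path path = new Path("file://" + file.getAbsolutePath());
        try (ParquetWriter<GenericRecord> writer =
                AvroParquetWriter.<GenericRecord>builder(path)
                        .withSchema(schema)
                        .withCompressionCodec(CompressionCodecName.SNAPPY)
                        .build()) {
            for (GenericRecord record : batch) {
                writer.write(record);
            }
        } // close() writes the Parquet footer, so only read the bytes after this
        try {
            return Files.readAllBytes(file.toPath());
        } finally {
            file.delete(); // tmpfs is RAM-backed, so clean up promptly
        }
    }
}

The tmpfs is doing the real work here: the Parquet writer only knows how to write to a filesystem path, so a RAM-backed mount gives you an in-memory round trip without touching disk.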
On 30 August 2017 at 13:17, Mike Percy <[email protected]> wrote:

> I know that this reply is quite late. I'm not aware of any Flume Parquet
> writer that currently exists. If it were me, I would stream it to HDFS in
> Avro format and then use an ETL job (perhaps via Spark or Impala) to
> convert the Avro to Parquet in large batches. Parquet is well suited to
> large batches of records due to its columnar nature.
>
> Mike
>
> On Sun, Jul 16, 2017 at 11:24 PM, Kumar, Ashok 6. (Nokia - IN/Bangalore)
> <[email protected]> wrote:
>
>> Hi all,
>>
>> I have Avro data coming from Kafka and I want to convert it into Parquet
>> using Flume. I am not sure how to do it. Can anyone help me out with this?
>>
>> Regards,
>>
>> Ashok

-- 
Matt Sicker <[email protected]>
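P.S. For what it's worth, the Spark leg of the Avro-to-Parquet conversion Mike describes can be very short. A sketch (the HDFS paths are placeholders, and it assumes the spark-avro package, e.g. com.databricks:spark-avro, is on the classpath for Spark 2.x):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class AvroToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("avro-to-parquet")
                .getOrCreate();

        // Read a directory of Avro files that Flume has rolled into HDFS.
        Dataset<Row> records = spark.read()
                .format("com.databricks.spark.avro")
                .load("hdfs:///flume/events/avro/");

        // Rewrite the same records as Parquet in one large columnar batch.
        records.write().parquet("hdfs:///warehouse/events/parquet/");

        spark.stop();
    }
}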
