You could bunch these messages up into a single Flume event and then write a 
serializer that reads each of the Avro records back out of the event and 
writes them into an Avro container file (you can take a look at the 
AvroEventSerializer for reference). The downside of this is that you'd have 
to decode and re-encode every record in your serializer. 
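
For the batching half, here is a rough (untested) sketch of what I have in
mind, assuming the sender encodes GenericRecords back-to-back with Avro's
BinaryEncoder. The AvroBatcher class and buildBatchEvent method are made-up
names for illustration, not anything that ships with Flume:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.List;

import org.apache.avro.Schema;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.flume.Event;
import org.apache.flume.event.EventBuilder;

public class AvroBatcher {

  // Packs many binary-encoded Avro records back-to-back into a single Flume
  // event body, so the per-event overhead is paid once per batch rather than
  // once per message.
  public static Event buildBatchEvent(Schema schema, List<GenericRecord> records)
      throws IOException {
    ByteArrayOutputStream body = new ByteArrayOutputStream();
    GenericDatumWriter<GenericRecord> writer =
        new GenericDatumWriter<GenericRecord>(schema);
    BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(body, null);
    for (GenericRecord record : records) {
      writer.write(record, encoder);
    }
    encoder.flush();
    return EventBuilder.withBody(body.toByteArray());
  }
}

Note that the resulting event body is just raw concatenated binary records,
not a valid Avro container file; that is exactly why the serializer has to
decode and re-encode on the other end. A sketch of that serializer, modeled
loosely on the AvroEventSerializer pattern, might look like this. The class
name and the "schemaLiteral" config key are also made up, and it assumes a
single fixed record schema for all events:

import java.io.IOException;
import java.io.OutputStream;

import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryDecoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.flume.Context;
import org.apache.flume.Event;
import org.apache.flume.serialization.EventSerializer;

public class BatchedAvroEventSerializer implements EventSerializer {

  private final Schema schema;
  private final OutputStream out;
  private final GenericDatumReader<GenericRecord> datumReader;
  private final DataFileWriter<GenericRecord> fileWriter;

  private BatchedAvroEventSerializer(Schema schema, OutputStream out) {
    this.schema = schema;
    this.out = out;
    this.datumReader = new GenericDatumReader<GenericRecord>(schema);
    this.fileWriter = new DataFileWriter<GenericRecord>(
        new GenericDatumWriter<GenericRecord>(schema));
  }

  @Override
  public void afterCreate() throws IOException {
    // Start a fresh container file: header, schema, and sync markers.
    fileWriter.create(schema, out);
  }

  @Override
  public void afterReopen() throws IOException {
    throw new UnsupportedOperationException(
        "Appending to an existing Avro container file is not supported");
  }

  @Override
  public void write(Event event) throws IOException {
    // Decode every record packed into the event body and re-encode it into
    // the container file, so the file's framing stays valid across events.
    BinaryDecoder decoder =
        DecoderFactory.get().binaryDecoder(event.getBody(), null);
    GenericRecord record = null;
    while (!decoder.isEnd()) {
      record = datumReader.read(record, decoder);
      fileWriter.append(record);
    }
  }

  @Override
  public void flush() throws IOException {
    fileWriter.flush();
  }

  @Override
  public void beforeClose() throws IOException {
    // The sink closes the stream itself, so just push out the last block.
    fileWriter.flush();
  }

  @Override
  public boolean supportsReopen() {
    return false;
  }

  public static class Builder implements EventSerializer.Builder {
    @Override
    public EventSerializer build(Context context, OutputStream out) {
      // "schemaLiteral" is a made-up config key holding the schema JSON; a
      // real implementation might load it from a file or URL instead.
      Schema schema =
          new Schema.Parser().parse(context.getString("schemaLiteral"));
      return new BatchedAvroEventSerializer(schema, out);
    }
  }
}

You would configure it on the HDFS sink by setting serializer to the fully
qualified name of the Builder class, and you'd also want hdfs.fileType =
DataStream so the sink writes the serializer's output directly instead of
wrapping it in a SequenceFile.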


Thanks,
Hari

On Tue, Sep 30, 2014 at 12:54 AM, Bryce Alcock <[email protected]>
wrote:

> Not sure if I am approaching this problem correctly, but here is the basic
> outline:
> I would like to send, say, 10,000 or more small Avro messages in a single
> Flume event for storage on HDFS.
> When I do this, it corrupts the Avro file created on HDFS because (I
> assume, based on a bit of reading) it messes with the framing that Avro
> provides.
> So the long and the short of it is that if I send, say, 2 Flume events,
> each containing 10,000 Avro messages, and the HDFS sink stores the 2
> "packets" of Avro messages in a single file on HDFS, the first 10,000
> messages are readable, but the 10,001st message is corrupt.
> I am doing this for performance purposes: I need to be sending about
> 1500 * 3600 = 5,400,000 (yes, 5.4 million) small messages every ~4 seconds.
> I know this is a lot of messages....
> I can produce the messages at the correct rate, but I cannot flume them in
> very fast, because I have to create a Flume event with an Avro schema
> attached to each message, so I thought that if I could batch up a bunch of
> them at once, it would be more efficient.
> Thanks in advance!
> Q. Boiler
