Hi,

The scenario: a machine dynamically generates data consisting of sections of binary data. We use the Flume SDK to collect the data, and the sink is HDFS (SequenceFile).
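To make the scenario concrete, here is a rough Python sketch of how I picture the data: each section is an opaque byte string, and a schema-less container could store it behind a length prefix so that no alignment or schema is needed to read it back. This is only my assumption for illustration, not Flume's or SequenceFile's actual on-disk format; `write_sections`, `read_sections`, and the sample data are hypothetical.

```python
import io
import struct


def write_sections(stream, sections):
    """Write each binary section as a 4-byte big-endian length, then the raw bytes."""
    for body in sections:
        stream.write(struct.pack(">I", len(body)))
        stream.write(body)


def read_sections(stream):
    """Read length-prefixed sections back until the stream is exhausted."""
    sections = []
    while True:
        header = stream.read(4)
        if not header:
            break  # clean end of stream
        if len(header) != 4:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(">I", header)
        body = stream.read(length)
        if len(body) != length:
            raise ValueError("truncated section body")
        sections.append(body)
    return sections


# Hypothetical sample data: three opaque sections of different sizes.
sample = [b"\x00\x01\x02", b"", b"\xff" * 10]
buf = io.BytesIO()
write_sections(buf, sample)
buf.seek(0)
roundtrip = read_sections(buf)
```

With framing like this the reader only needs the length prefix to recover each section, regardless of what the bytes mean.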
I'm curious what is actually in the sequence file, since Flume is unaware of the schema. That is, how do Flume and Avro serialize the data without a schema? (Writing raw bytes directly to a file on disk may cause alignment issues.) My question is similar to http://stackoverflow.com/questions/18001818/avro-schema-storage. Also, how is the key determined in the sequence file?

If my understanding is incorrect, please point out the correct usage of Flume with Avro.

Thank you for your clarification.

Cheers,
Blade
