Hey Mike,

Thanks for your quick response!

I looked into the Parquet + Avro solution; it is a possibility for us to try.
I still have the same problem though: how can I serialize with Parquet?

Thanks
Alem

From: Mike Thomsen [mailto:[email protected]]
Sent: Tuesday, June 2, 2015 13:04
To: [email protected]
Subject: Re: How to write avro objects to HDFS?

You can take the patch I wrote and apply it to a copied-and-pasted version of 
the HDFS bolt from storm-hdfs. Then you just need to add this to main() in your 
topology, where "conf" is the topology Config object:

Map<String, Object> hdfsConfig = new HashMap<String, Object>();
hdfsConfig.put("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem");
hdfsConfig.put("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
hdfsConfig.put("io.serializations",
        "org.apache.hadoop.io.serializer.JavaSerialization,org.apache.avro.hadoop.io.AvroSerialization");
conf.put("storm.hdfs.config", hdfsConfig);

I would caution you against going this route. HDFS sequence files are really 
not a good match for Storm + Avro. You can easily end up with duplicates in 
them if you're not careful: processing Avro data is a lot more CPU-intensive 
than typical uses of Storm, so tuples can hit the message timeout and get 
replayed after they have already been written. You'll want to give yourself 
some extra room in the timeouts and max pending tuples.
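
As a rough sketch of what "extra room" could look like: the two settings involved are the Storm config keys topology.message.timeout.secs (30 seconds by default) and topology.max.spout.pending. The class name, the helper name and the values 120 and 500 below are illustrative assumptions, not tuned recommendations, and a plain Map is used so the snippet runs without Storm on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

public class TimeoutSettings {
    // Hypothetical helper: collects the two overrides mentioned above.
    static Map<String, Object> slowBoltOverrides(int timeoutSecs, int maxPending) {
        Map<String, Object> overrides = new HashMap<String, Object>();
        // How long a tuple tree may stay un-acked before the spout replays it.
        overrides.put("topology.message.timeout.secs", timeoutSecs);
        // Upper bound on un-acked tuples in flight per spout task.
        overrides.put("topology.max.spout.pending", maxPending);
        return overrides;
    }

    public static void main(String[] args) {
        // Assumed values for illustration only; measure your own bolt latency.
        Map<String, Object> overrides = slowBoltOverrides(120, 500);
        System.out.println(overrides.get("topology.message.timeout.secs"));
        System.out.println(overrides.get("topology.max.spout.pending"));
    }
}
```

In a real topology these entries would presumably be merged into the topology Config object with conf.putAll(overrides), since Config is itself a map of string keys to values. A longer timeout lowers the chance of replays, and a smaller pending limit keeps slow Avro processing from building up a backlog that times out.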
My understanding is that Apache Parquet supports Avro, and it seems to be a lot 
better than HDFS sequence files. It's worth a look before you get deep into 
this.

On Tue, Jun 2, 2015 at 5:42 AM, Filli Alem <[email protected]> wrote:
Hi,
I'm struggling with writing Avro objects to HDFS. Is this possible yet? If so, 
how?
I'm able to read messages from Kafka and output them to the console, but I have 
no idea how to write them.

I found this commit but it doesn’t seem to be in the code base yet:
https://patch-diff.githubusercontent.com/raw/apache/storm/pull/347.patch

Any help is much appreciated.
Alem
