Parquet appears to have its own API for that. You'll have to look for how
it handles Avro. I believe I saw it as a supported serialization type.

On Wed, Jun 3, 2015 at 9:06 AM, Filli Alem <[email protected]> wrote:

>  Hey Mike,
>
>
>
> Thanks for your quick response!
>
>
>
> I looked into the Parquet + Avro solution, and it is a possibility for us to
> try.
>
> I still have the same problem, though: how can I serialize with Parquet?
>
>
>
> Thanks
>
> Alem
>
>
>
> *From:* Mike Thomsen [mailto:[email protected]]
> *Sent:* Tuesday, June 2, 2015 13:04
> *To:* [email protected]
> *Subject:* Re: How to write avro objects to HDFS?
>
>
>
> You can take the patch I wrote and apply it to a copied-and-pasted version
> of the HDFS bolt from storm-hdfs. Then you just need to add this to main()
> in your topology, where "conf" is the topology Config object:
>
> Map<String, Object> hdfsConfig = new HashMap<String, Object>();
> hdfsConfig.put("fs.file.impl", "org.apache.hadoop.fs.LocalFileSystem");
> hdfsConfig.put("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");
> hdfsConfig.put("io.serializations",
>         "org.apache.hadoop.io.serializer.JavaSerialization,"
>         + "org.apache.avro.hadoop.io.AvroSerialization");
> conf.put("storm.hdfs.config", hdfsConfig);
>
> I would caution you to not go this route. HDFS sequence files are really
> not a good match for Storm + Avro. You can easily end up with duplicates in
> them if you're not careful because processing Avro data is a lot more
> CPU-intensive than typical uses of Storm. So you'll want to make sure you
> give yourself some extra room in the timeouts and max pending tuples.
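A minimal sketch of what "extra room in the timeouts and max pending tuples" could look like. It uses a plain Map standing in for the topology Config object (Storm's Config is itself a map of string keys to values), so it runs without Storm on the classpath. The two key strings are the ones behind Storm's Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS and Config.TOPOLOGY_MAX_SPOUT_PENDING. The class name, the helper name, and the values 120 and 500 are illustrative assumptions, not recommendations from this thread; pick them from your own measurements.

```java
import java.util.HashMap;
import java.util.Map;

public class TopologyTuning {
    // Key strings behind Storm's Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS and
    // Config.TOPOLOGY_MAX_SPOUT_PENDING constants.
    public static final String MESSAGE_TIMEOUT_SECS = "topology.message.timeout.secs";
    public static final String MAX_SPOUT_PENDING = "topology.max.spout.pending";

    // Hypothetical helper: returns a copy of conf with a longer tuple timeout
    // and a cap on the number of un-acked tuples per spout task.
    public static Map<String, Object> withHeadroom(
            Map<String, Object> conf, int timeoutSecs, int maxPending) {
        Map<String, Object> tuned = new HashMap<String, Object>(conf);
        // A tuple not acked within this window is failed and replayed by the
        // spout, which is one way duplicates can end up in the output files.
        tuned.put(MESSAGE_TIMEOUT_SECS, timeoutSecs);
        // Fewer in-flight tuples means slow bolts are less likely to hit the
        // timeout above.
        tuned.put(MAX_SPOUT_PENDING, maxPending);
        return tuned;
    }

    public static void main(String[] args) {
        // Assumed values for illustration only.
        Map<String, Object> conf = withHeadroom(new HashMap<String, Object>(), 120, 500);
        System.out.println(conf);
    }
}
```

In a real topology the same two settings would go on the Config object itself, for example through its setMessageTimeoutSecs and setMaxSpoutPending setters, before submitting the topology.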
>
> My understanding is that Apache Parquet supports Avro and it seems to be a
> lot better than HDFS sequence files. It's worth a look before you get deep
> into this.
>
>
>
> On Tue, Jun 2, 2015 at 5:42 AM, Filli Alem <[email protected]> wrote:
>
>  Hi,
>
> I'm struggling with writing Avro objects to HDFS. Is this possible yet? If
> so, how?
>
> I'm able to read messages from Kafka and output them to the console, but I
> have no idea how to write them.
>
>
>
> I found this commit but it doesn’t seem to be in the code base yet:
>
> https://patch-diff.githubusercontent.com/raw/apache/storm/pull/347.patch
>
>
>
> Any help is much appreciated.
>
> Alem
>
>
>
