Re: how to write kafka connect hdfs parquet sink.

2016-07-25 Thread Ewen Cheslack-Postava
Kidong, Yes, if you are using a different format for serializing data in Kafka, the Converter interface is what you'd need to implement. We isolated serialization + conversion from connectors precisely so connectors don't need to worry about the exact format of data in Kafka, instead only having

Re: how to write kafka connect hdfs parquet sink.

2016-07-25 Thread Kidong Lee
Hi Ewen, do you mean I should implement an Avro converter like Confluent's AvroConverter? I think I should also understand Connect's internal data structure, which

Re: how to write kafka connect hdfs parquet sink.

2016-07-25 Thread Ewen Cheslack-Postava
If I'm understanding your setup properly, you need a way to convert your data from your own Avro format to Connect format. From there, the existing Parquet support in the HDFS connector should work for you. So what you need is your own implementation of an AvroConverter, which is what loads the
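
A minimal configuration sketch of the setup described above, assuming a custom converter has already been written and placed on the worker classpath. The class name com.example.MyAvroConverter is hypothetical, and the connector name, topic, HDFS URL, and flush size are placeholder values. The key converter shown assumes string keys, which the thread does not state.

```properties
# Worker configuration (for example, a standalone worker properties file).
# com.example.MyAvroConverter is a hypothetical name for your own
# implementation of org.apache.kafka.connect.storage.Converter.
# The key converter assumes keys are plain strings (an assumption).
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=com.example.MyAvroConverter

# HDFS sink connector configuration (a separate properties file).
# name, topics, hdfs.url, and flush.size are placeholder values.
name=hdfs-parquet-sink
connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
tasks.max=1
topics=my_avro_topic
hdfs.url=hdfs://localhost:9000
flush.size=1000
# Selects the connector's existing Parquet output format.
format.class=io.confluent.connect.hdfs.parquet.ParquetFormat
```

With this split, the converter is the only piece that knows how the bytes in Kafka are encoded; the connector only sees Connect records and writes them as Parquet.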

Re: how to write kafka connect hdfs parquet sink.

2016-07-25 Thread Clifford Resnick
You would probably use the Hadoop parquet-mr WriteSupport, which has less to do with MapReduce and more to do with all the encodings that go into writing a Parquet file. Avro as an intermediate serialization works great, but I think most of your work would be in managing rolling from one file to

Re: how to write kafka connect hdfs parquet sink.

2016-07-25 Thread Dustin Cote
I believe what you are looking for is a ParquetSerializer; I'm not aware of any existing one. In that case, you'd have to write your own, and your AvroSerializer is probably a good template to start from. Then you would just use the HDFSSink Connector again and change the serialization

how to write kafka connect hdfs parquet sink.

2016-07-24 Thread Kidong Lee
Hi, I have read through Confluent's Kafka Connect HDFS, but I don't want to use Confluent's Schema Registry. I have produced Avro-encoded bytes to Kafka; at that time, I wrote my own Avro serializer, not used