Re: how to write kafka connect hdfs parquet sink.

2016-07-25 Thread Ewen Cheslack-Postava
Kidong, Yes, if you are using a different format for serializing data in Kafka, the Converter interface is what you'd need to implement. We isolated serialization + conversion from connectors precisely so connectors don't need to worry about the exact format of data in Kafka, instead only having
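For reference, the interface Ewen mentions is org.apache.kafka.connect.storage.Converter. A minimal sketch of a custom implementation (the class name is hypothetical and the data methods are deliberately left unimplemented):

    import java.util.Map;

    import org.apache.kafka.connect.data.Schema;
    import org.apache.kafka.connect.data.SchemaAndValue;
    import org.apache.kafka.connect.storage.Converter;

    // Placeholder converter for a custom Avro wire format.
    public class MyAvroConverter implements Converter {

        @Override
        public void configure(Map<String, ?> configs, boolean isKey) {
            // Read converter settings (e.g. a schema registry URL)
            // from the worker configuration here.
        }

        @Override
        public byte[] fromConnectData(String topic, Schema schema, Object value) {
            // Turn Connect's Schema + value into your on-the-wire Avro bytes.
            throw new UnsupportedOperationException("plug in your serialization");
        }

        @Override
        public SchemaAndValue toConnectData(String topic, byte[] value) {
            // Parse your Avro bytes and map them onto Connect's data model.
            throw new UnsupportedOperationException("plug in your deserialization");
        }
    }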

Re: how to write kafka connect hdfs parquet sink.

2016-07-25 Thread Kidong Lee
Hi Ewen, do you mean I should implement an Avro converter like Confluent's AvroConverter? I think I should also understand Connect's internal data structure, which

Re: Re: KafkaConsumer position block

2016-07-25 Thread yuanjia8...@163.com
Thanks Guozhang.

Yuanjia Li

From: Guozhang Wang
Date: 2016-07-26 07:06
To: users@kafka.apache.org
Subject: Re: Re: KafkaConsumer position block

Hi Yuanjia, If the consumer has just been created and there is no metadata in it yet, seeking to the latest offset would require at least two

Re: release of 0.10.1

2016-07-25 Thread Guozhang Wang
David, Regex consumption is included in 0.10.0.0 already, I think? Guozhang

On Sun, Jul 24, 2016 at 6:57 PM, David Garcia wrote:
> We basically need the regex (java.util.regex) support for specifying source
> topics.

On 7/23/16, 7:41 PM, "Ewen Cheslack-Postava"
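For reference, pattern-based subscription in the 0.10 Java client looks roughly like this (the topic prefix and group id are made up):

    import java.util.Collection;
    import java.util.Properties;
    import java.util.regex.Pattern;

    import org.apache.kafka.clients.consumer.ConsumerRebalanceListener;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class RegexSubscribeDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "regex-demo");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.StringDeserializer");

            KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
            // Subscribe to every topic matching a java.util.regex pattern;
            // matching topics created later are picked up automatically.
            consumer.subscribe(Pattern.compile("events-.*"),
                    new ConsumerRebalanceListener() {
                        @Override
                        public void onPartitionsRevoked(Collection<TopicPartition> partitions) { }
                        @Override
                        public void onPartitionsAssigned(Collection<TopicPartition> partitions) { }
                    });
        }
    }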

Re: Re: KafkaConsumer position block

2016-07-25 Thread Guozhang Wang
Hi Yuanjia, If the consumer has just been created and there is no metadata in it yet, seeking to the latest offset would require at least two round-trips to the broker: first to fetch the metadata of the partitions, and then to fetch the offsets from the brokers hosting those partitions. Note that
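A minimal sketch of the call sequence being described, assuming the Collection-based 0.10 consumer API (the topic name is a placeholder). Note that seekToEnd() is lazy; the blocking round-trips happen inside position():

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.TopicPartition;

    public class PositionDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.deserializer",
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");
            props.put("value.deserializer",
                    "org.apache.kafka.common.serialization.ByteArrayDeserializer");

            KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
            TopicPartition tp = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(tp));

            consumer.seekToEnd(Collections.singletonList(tp)); // lazy: only records the intent
            long latest = consumer.position(tp); // blocks here: metadata + offset round-trips
            System.out.println("latest offset = " + latest);
            consumer.close();
        }
    }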

Kafka: Intermittent slowness when consuming first message from topic

2016-07-25 Thread Joyce, David (XIG)
I am using Kafka 0.9.0.1. The first time I start up my application, it takes 20-30 seconds to retrieve the "latest" message from the topic. I've used different Kafka brokers (with different configs), yet I still see this behaviour. There is usually no slowness for subsequent messages. Is this

Re: how to write kafka connect hdfs parquet sink.

2016-07-25 Thread Ewen Cheslack-Postava
If I'm understanding your setup properly, you need a way to convert your data from your own Avro format to Connect format. From there, the existing Parquet support in the HDFS connector should work for you. So what you need is your own implementation of an AvroConverter, which is what loads the
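Once such a converter is on the Connect worker's classpath, it is wired in through the worker configuration; the class name below is hypothetical:

    # Worker configuration pointing Connect at the custom converter
    # (hypothetical class name; the jar must be on the worker classpath).
    key.converter=com.example.MyAvroConverter
    value.converter=com.example.MyAvroConverter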

NotLeaderForPartitionException -- how to recover?

2016-07-25 Thread Samuel Taylor
Hi everyone, Has anyone seen this error, and/or is there a good way to recover from it? I have two brokers running, and after an attempt to produce to a topic (using the kafka-python library), they are both spitting out lots of messages that look like this: ERROR [ReplicaFetcherThread-0-1],

Re: Mirror maker higher offset in the mirror.

2016-07-25 Thread Gerard Klijs
Things like consumer rebalances on the cluster you copy from, and brokers going down on the cluster you're writing to, can cause duplication. The default settings are chosen to prevent data loss, which makes data duplication more likely to happen in case of error. You could possibly make a simple consumer
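To illustrate the no-data-loss bias being described: MirrorMaker is commonly run with producer overrides along these lines (values illustrative, not necessarily MirrorMaker's literal defaults). Aggressive retries mean a batch that was actually written, but whose acknowledgment was lost, gets sent again and shows up twice in the mirror:

    # No-data-loss-oriented producer overrides; retrying favors
    # no-loss over no-duplicates.
    acks=all
    retries=2147483647
    max.in.flight.requests.per.connection=1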

Re: how to write kafka connect hdfs parquet sink.

2016-07-25 Thread Clifford Resnick
You would probably use the Hadoop parquet-mr WriteSupport, which has less to do with MapReduce and more to do with all the encodings that go into writing a Parquet file. Avro as an intermediate serialization works great, but I think most of your work would be in managing rolling from one file to
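A sketch of that approach using parquet-mr's Avro binding, which wraps WriteSupport for you (assuming a parquet-avro version with the builder API; the file-rolling logic mentioned above is deliberately omitted):

    import java.io.IOException;

    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetWriter;
    import org.apache.parquet.hadoop.ParquetWriter;

    public class ParquetFileWriter {
        // Writes a batch of Avro GenericRecords to a single Parquet file.
        // Rolling from one file to the next is left out of this sketch.
        static void writeFile(Path path, Schema schema, Iterable<GenericRecord> records)
                throws IOException {
            ParquetWriter<GenericRecord> writer = AvroParquetWriter
                    .<GenericRecord>builder(path)
                    .withSchema(schema)
                    .build();
            try {
                for (GenericRecord record : records) {
                    writer.write(record);
                }
            } finally {
                writer.close();
            }
        }
    }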

Re: how to write kafka connect hdfs parquet sink.

2016-07-25 Thread Dustin Cote
I believe what you are looking for is a ParquetSerializer; I'm not aware of any existing ones. In that case, you'd have to write your own, and your AvroSerializer is probably a good template to work from. Then you would just use the HDFS Sink Connector again and change the serialization
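For reference, the existing Parquet support in the HDFS connector that Ewen mentions earlier in the thread is selected through the connector config; everything below except format.class is a placeholder:

    name=hdfs-parquet-sink
    connector.class=io.confluent.connect.hdfs.HdfsSinkConnector
    tasks.max=1
    topics=my-topic
    hdfs.url=hdfs://namenode:8020
    flush.size=1000
    # Switch the on-disk output format to Parquet:
    format.class=io.confluent.connect.hdfs.parquet.ParquetFormat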

Kafka Producer Buffer Exhausted

2016-07-25 Thread Muqtafi Akhmad
hello Kafka Users, I found some exceptions indicating that the buffer in my Kafka producers is full (the exception thrown is org.apache.kafka.clients.producer.BufferExhaustedException). I set the buffer size to 30 MB and send events using the producer's asynchronous method. (1) Is there any suggestion
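For context, the producer settings that govern this behaviour in the 0.9+ Java client are buffer.memory and max.block.ms; a minimal sketch (broker address and topic are placeholders):

    import java.util.Properties;

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class BufferedProducerDemo {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("buffer.memory", 30 * 1024 * 1024); // 30 MB accumulator
            props.put("max.block.ms", 5000); // wait up to 5 s for buffer space

            KafkaProducer<String, String> producer = new KafkaProducer<>(props);
            // Asynchronous send: if the accumulator stays full past
            // max.block.ms, send() fails instead of blocking forever.
            producer.send(new ProducerRecord<>("events", "key", "value"));
            producer.close();
        }
    }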