Re: Checkpoint takes long with FlinkKafkaConsumer

2016-07-08 Thread Hironori Ogibayashi
Stephan, Thank you so much for your advice. I have modified the FlinkKafkaConsumer09 source code and am running the job again. It has been working well so far (only a few hours in). I will keep it running over the weekend. Here are the details of the modification.

Re: Checkpoint takes long with FlinkKafkaConsumer

2016-07-05 Thread Stephan Ewen
I looked into this a bit, and I think it is a Flink issue: The blocking is between the poll() and the commitToKafka() calls. The commitToKafka() call is not part of the checkpoint; it comes only after the checkpoint. So even if it is not called, this should not block the checkpoint. What may

Re: Checkpoint takes long with FlinkKafkaConsumer

2016-07-05 Thread Hironori Ogibayashi
Hi, Sorry for the late response. I received no response on the Kafka mailing list and still cannot find the root cause. However, I do not encounter this issue with FlinkKafkaConsumer082, so I will use that instead. Thanks, Hironori 2016-06-17 2:59 GMT+09:00 Ufuk Celebi

Re: Checkpoint takes long with FlinkKafkaConsumer

2016-06-16 Thread Ufuk Celebi
Thanks :) On Thu, Jun 16, 2016 at 3:21 PM, Hironori Ogibayashi wrote: > Ufuk, > > Yes, of course. I will be sure to update when I get some more information. > > Hironori > > 2016-06-16 1:56 GMT+09:00 Ufuk Celebi: >> Hey Hironori, >> >> thanks for reporting

Re: Checkpoint takes long with FlinkKafkaConsumer

2016-06-16 Thread Hironori Ogibayashi
Ufuk, Yes, of course. I will be sure to update when I get some more information. Hironori 2016-06-16 1:56 GMT+09:00 Ufuk Celebi: > Hey Hironori, > > thanks for reporting this. Could you please update this thread when > you have more information from the Kafka list? > > – Ufuk

Re: Checkpoint takes long with FlinkKafkaConsumer

2016-06-15 Thread Ufuk Celebi
Hey Hironori, thanks for reporting this. Could you please update this thread when you have more information from the Kafka list? – Ufuk On Wed, Jun 15, 2016 at 2:48 PM, Hironori Ogibayashi wrote: > Kostas, > > Thank you for your advice. I have posted my question to the

Re: Checkpoint takes long with FlinkKafkaConsumer

2016-06-15 Thread Hironori Ogibayashi
Kostas, Thank you for your advice. I have posted my question to the Kafka mailing list. I think the Kafka brokers are fine, because there are no errors on the producer side at 15,000 msg/sec and, according to OS metrics, all of my brokers receive almost the same amount of network traffic. Thanks, Hironori 2016-06-14

Re: Checkpoint takes long with FlinkKafkaConsumer

2016-06-14 Thread Kostas Kloudas
Hello Hironori, The logs just show that you get stuck in the Kafka consumer polling loop, which does not allow the consumer lock to be released. Thus the Flink part of the consumer is never actually called. To my understanding this does not seem to be a Flink issue. Or at least this is not

Re: Checkpoint takes long with FlinkKafkaConsumer

2016-06-14 Thread Hironori Ogibayashi
Kostas, I have attached a log file from one of the TaskManagers (the same host on which I ran jstack). I noticed that there are many "Marking the coordinator 2147482645 dead" messages in the log. MyContinuousProcessingTimeTriggerGlobal in the log is my custom trigger which is based on

Re: Checkpoint takes long with FlinkKafkaConsumer

2016-06-14 Thread Hironori Ogibayashi
Kostas, Thank you for your response. Yes, I am using the latest Flink version, which is 1.0.3. Thanks, Hironori 2016-06-14 19:02 GMT+09:00 Kostas Kloudas: > Hello Hironori, > > Are you using the latest Flink version? > There were some changes in the FlinkConsumer in the latest

Re: Checkpoint takes long with FlinkKafkaConsumer

2016-06-14 Thread Kostas Kloudas
Hello Hironori, Are you using the latest Flink version? There were some changes in the FlinkConsumer in the latest releases. Thanks, Kostas > On Jun 14, 2016, at 11:52 AM, Hironori Ogibayashi wrote: > > Hello, > > I am running a Flink job which reads topics from Kafka

Checkpoint takes long with FlinkKafkaConsumer

2016-06-14 Thread Hironori Ogibayashi
Hello, I am running a Flink job which reads topics from Kafka and writes results to Redis. I use FsStateBackend with HDFS. I noticed that taking a checkpoint takes several minutes and sometimes expires. --- 2016-06-14 17:25:40,734 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator -