[jira] [Commented] (FLINK-3066) Kafka producer fails on leader change

2016-02-09 Thread Gyula Fora (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15138677#comment-15138677
 ] 

Gyula Fora commented on FLINK-3066:
---

Thank you Robert for the help, it is a good catch :)

> Kafka producer fails on leader change
> -
>
> Key: FLINK-3066
> URL: https://issues.apache.org/jira/browse/FLINK-3066
> Project: Flink
> Issue Type: Bug
> Components: Kafka Connector, Streaming Connectors
> Affects Versions: 0.10.0, 1.0.0
>Reporter: Gyula Fora
>
> I got the following exception during my streaming job:
> {code}
> 16:44:50,637 INFO  org.apache.flink.runtime.jobmanager.JobManager - Status of job 4d3f9443df4822e875f1400244a6e8dd (deduplo!) changed to FAILING.
> java.lang.Exception: Failed to send data to Kafka: This server is not the leader for that topic-partition.
>   at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.checkErroneous(FlinkKafkaProducer.java:275)
>   at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.invoke(FlinkKafkaProducer.java:246)
>   at org.apache.flink.streaming.api.operators.StreamSink.processElement(StreamSink.java:37)
>   at org.apache.flink.streaming.runtime.io.StreamInputProcessor.processInput(StreamInputProcessor.java:166)
>   at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask.run(OneInputStreamTask.java:63)
>   at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:221)
>   at org.apache.flink.runtime.taskmanager.Task.run(Task.java:588)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.
> {code}
> And then the job crashed and recovered. This should probably be something 
> that we handle without crashing.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3066) Kafka producer fails on leader change

2016-02-08 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137610#comment-15137610
 ] 

Robert Metzger commented on FLINK-3066:
---

I have to correct my comment from November last year: it is the producer that 
is failing. In this case we need to find out why Kafka's producer 
implementation does not handle the leader change properly.



[jira] [Commented] (FLINK-3066) Kafka producer fails on leader change

2016-02-08 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137662#comment-15137662
 ] 

Robert Metzger commented on FLINK-3066:
---

I looked into the issue further. There is a fairly obvious way to fix it: 
set the producer's number of retries to a value above 0. By default Kafka sets 
it to 0 to avoid duplicates.
Your job is called "deduplo", so I assume the objective here is to avoid 
duplicates.
The fundamental issue is that Kafka's producers don't give any "exactly-once" 
guarantees: they offer no way of committing data.
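
For illustration, the retry setting discussed above could be sketched roughly 
as follows. This only builds the producer properties; "retries" and 
"bootstrap.servers" are Kafka producer configuration keys, while the class 
name, the helper method, the broker address and the value 3 are made up for 
this example. It assumes the properties are then passed to a 
FlinkKafkaProducer constructor that accepts a producer configuration.

```java
import java.util.Properties;

public class ProducerRetryConfig {

    /**
     * Builds a Kafka producer configuration with an explicit retry count.
     * Hypothetical helper, not part of Flink or Kafka.
     */
    static Properties producerConfig(String bootstrapServers, int retries) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // A value above 0 lets the Kafka client resend a record after a
        // transient error instead of reporting the failure right away.
        props.setProperty("retries", Integer.toString(retries));
        return props;
    }

    public static void main(String[] args) {
        // Placeholder broker address and retry count.
        Properties props = producerConfig("localhost:9092", 3);
        System.out.println("retries = " + props.getProperty("retries"));
        // Assumption: in a Flink job this object would be handed to the
        // Kafka sink as its producer configuration.
    }
}
```

Whether a value above 0 is acceptable depends on whether the job can tolerate 
duplicates in the output topic.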

I would vote to close this issue since this is a limitation of the underlying 
API / system.
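
To make the limitation concrete, here is a toy simulation of one way a retry 
can produce a duplicate: the write reaches the log, but the acknowledgement 
never reaches the producer. It does not use Kafka at all; ToyBroker and 
sendWithRetries are invented for this sketch and only mimic the behaviour 
under that assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class RetryDuplicateDemo {

    /** Toy broker: stores every record it receives, but loses the first ack. */
    static class ToyBroker {
        final List<String> log = new ArrayList<>();
        private boolean dropNextAck = true;

        /** Returns true if the producer sees an acknowledgement. */
        boolean send(String record) {
            log.add(record);            // the write itself succeeds
            if (dropNextAck) {
                dropNextAck = false;
                return false;           // the ack is lost on the way back
            }
            return true;
        }
    }

    /** Sends a record, resending up to `retries` times when no ack arrives. */
    static boolean sendWithRetries(ToyBroker broker, String record, int retries) {
        for (int attempt = 0; attempt <= retries; attempt++) {
            if (broker.send(record)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyBroker noRetry = new ToyBroker();
        boolean acked0 = sendWithRetries(noRetry, "a", 0);
        // retries=0: the send is reported as failed, the log holds one copy
        System.out.println("retries=0: acked=" + acked0 + ", log=" + noRetry.log);

        ToyBroker withRetry = new ToyBroker();
        boolean acked1 = sendWithRetries(withRetry, "a", 1);
        // retries=1: the send succeeds, but the log holds the record twice
        System.out.println("retries=1: acked=" + acked1 + ", log=" + withRetry.log);
    }
}
```

Without a way to commit a write atomically, the producer cannot tell "not 
written" apart from "written but not acknowledged", so it has to choose 
between failing and possibly writing twice.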
