[GitHub] flink pull request #2580: [FLINK-4723] [kafka-connector] Unify committed off...

2016-10-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/flink/pull/2580


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2580: [FLINK-4723] [kafka-connector] Unify committed off...

2016-10-13 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2580#discussion_r83222904
  
--- Diff: 
flink-streaming-connectors/flink-connector-kafka-base/src/test/java/org/apache/flink/streaming/connectors/kafka/KafkaConsumerTestBase.java
 ---
@@ -208,6 +207,235 @@ public void runFailOnNoBrokerTest() throws Exception {
}
}
}
+
+   /**
+* Ensures that the committed offsets to Kafka are the offsets of "the 
next record to process"
+*/
+   public void runCommitOffsetsToKafka() throws Exception {
+   // 3 partitions with 50 records each (0-49, so the expected 
commit offset of each partition should be 50)
+   final int parallelism = 3;
+   final int recordsInEachPartition = 50;
+
+   final String topicName = 
writeSequence("testCommitOffsetsToKafkaTopic", recordsInEachPartition, 
parallelism, 1);
+
+   final StreamExecutionEnvironment env = 
StreamExecutionEnvironment.createRemoteEnvironment("localhost", flinkPort);
+   env.getConfig().disableSysoutLogging();
+   
env.getConfig().setRestartStrategy(RestartStrategies.noRestart());
+   env.setParallelism(parallelism);
+   env.enableCheckpointing(200);
+
+   DataStream stream = 
env.addSource(kafkaServer.getConsumer(topicName, new SimpleStringSchema(), 
standardProps));
+   stream.addSink(new DiscardingSink());
+
+   final AtomicReference errorRef = new 
AtomicReference<>();
+   final Thread runner = new Thread("runner") {
+   @Override
+   public void run() {
+   try {
+   env.execute();
+   }
+   catch (Throwable t) {
+   if (!(t.getCause() instanceof 
JobCancellationException)) {
+   errorRef.set(t);
+   }
+   }
+   }
+   };
+   runner.start();
+
+   final Long l50 = 50L; // the final committed offset in Kafka 
should be 50
+   final long deadline = 3 + System.currentTimeMillis();
+
+   KafkaTestEnvironment.KafkaOffsetHandler kafkaOffsetHandler = 
kafkaServer.createOffsetHandler(standardProps);
+
+   do {
+   Long o1 = 
kafkaOffsetHandler.getCommittedOffset(topicName, 0);
+   Long o2 = 
kafkaOffsetHandler.getCommittedOffset(topicName, 1);
+   Long o3 = 
kafkaOffsetHandler.getCommittedOffset(topicName, 2);
+
+   if (l50.equals(o1) && l50.equals(o2) && l50.equals(o3)) 
{
+   break;
+   }
+
+   Thread.sleep(100);
+   }
+   while (System.currentTimeMillis() < deadline);
+
+   // cancel the job
+   
JobManagerCommunicationUtils.cancelCurrentJob(flink.getLeaderGateway(timeout));
+
+   final Throwable t = errorRef.get();
+   if (t != null) {
+   throw new RuntimeException("Job failed with an 
exception", t);
+   }
+
+   // final check to see if offsets are correctly in Kafka
+   Long o1 = kafkaOffsetHandler.getCommittedOffset(topicName, 0);
+   Long o2 = kafkaOffsetHandler.getCommittedOffset(topicName, 1);
+   Long o3 = kafkaOffsetHandler.getCommittedOffset(topicName, 2);
+   Assert.assertEquals(Long.valueOf(50L), o1);
+   Assert.assertEquals(Long.valueOf(50L), o2);
+   Assert.assertEquals(Long.valueOf(50L), o3);
+
+   kafkaOffsetHandler.close();
+   deleteTestTopic(topicName);
+   }
+
+   /**
+* This test first writes a total of 200 records to a test topic, reads 
the first 100 so that some offsets are
+* committed to Kafka, and then startup the consumer again to read the 
remaining records starting from the committed offsets.
+* The test ensures that whatever offsets were committed to Kafka, the 
consumer correctly picks them up
+* and starts at the correct position.
+*/
+   public void runStartFromKafkaCommitOffsets() throws Exception {
+   final int parallelism = 3;
+   final int recordsInEachPartition = 300;
+
+   final String topicName = 
writeSequence("testStartFromKafkaCommitOffsetsTopic", recordsInEachPartition, 
parallelism, 1);
+
+   final StreamExecutionEnvironment env1 = 
StreamExecutionEnvironment.createRemoteEnvironment("localhost", flinkPort);
+   env1.ge

[GitHub] flink pull request #2580: [FLINK-4723] [kafka-connector] Unify committed off...

2016-10-10 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2580#discussion_r82577343
  
--- Diff: 
flink-streaming-connectors/flink-connector-kafka-base/src/test/java/org/apache/flink/streaming/connectors/kafka/KafkaConsumerTestBase.java
 ---
@@ -208,6 +207,235 @@ public void runFailOnNoBrokerTest() throws Exception {
}
}
}
+
+   /**
+* Ensures that the committed offsets to Kafka are the offsets of "the 
next record to process"
+*/
+   public void runCommitOffsetsToKafka() throws Exception {
+   // 3 partitions with 50 records each (0-49, so the expected 
commit offset of each partition should be 50)
+   final int parallelism = 3;
+   final int recordsInEachPartition = 50;
+
+   final String topicName = 
writeSequence("testCommitOffsetsToKafkaTopic", recordsInEachPartition, 
parallelism, 1);
+
+   final StreamExecutionEnvironment env = 
StreamExecutionEnvironment.createRemoteEnvironment("localhost", flinkPort);
+   env.getConfig().disableSysoutLogging();
+   
env.getConfig().setRestartStrategy(RestartStrategies.noRestart());
+   env.setParallelism(parallelism);
+   env.enableCheckpointing(200);
+
+   DataStream stream = 
env.addSource(kafkaServer.getConsumer(topicName, new SimpleStringSchema(), 
standardProps));
+   stream.addSink(new DiscardingSink());
+
+   final AtomicReference errorRef = new 
AtomicReference<>();
+   final Thread runner = new Thread("runner") {
+   @Override
+   public void run() {
+   try {
+   env.execute();
+   }
+   catch (Throwable t) {
+   if (!(t.getCause() instanceof 
JobCancellationException)) {
+   errorRef.set(t);
+   }
+   }
+   }
+   };
+   runner.start();
+
+   final Long l50 = 50L; // the final committed offset in Kafka 
should be 50
+   final long deadline = 3 + System.currentTimeMillis();
+
+   KafkaTestEnvironment.KafkaOffsetHandler kafkaOffsetHandler = 
kafkaServer.createOffsetHandler(standardProps);
+
+   do {
+   Long o1 = 
kafkaOffsetHandler.getCommittedOffset(topicName, 0);
+   Long o2 = 
kafkaOffsetHandler.getCommittedOffset(topicName, 1);
+   Long o3 = 
kafkaOffsetHandler.getCommittedOffset(topicName, 2);
+
+   if (l50.equals(o1) && l50.equals(o2) && l50.equals(o3)) 
{
+   break;
+   }
+
+   Thread.sleep(100);
+   }
+   while (System.currentTimeMillis() < deadline);
+
+   // cancel the job
+   
JobManagerCommunicationUtils.cancelCurrentJob(flink.getLeaderGateway(timeout));
+
+   final Throwable t = errorRef.get();
+   if (t != null) {
+   throw new RuntimeException("Job failed with an 
exception", t);
+   }
+
+   // final check to see if offsets are correctly in Kafka
+   Long o1 = kafkaOffsetHandler.getCommittedOffset(topicName, 0);
+   Long o2 = kafkaOffsetHandler.getCommittedOffset(topicName, 1);
+   Long o3 = kafkaOffsetHandler.getCommittedOffset(topicName, 2);
+   Assert.assertEquals(Long.valueOf(50L), o1);
+   Assert.assertEquals(Long.valueOf(50L), o2);
+   Assert.assertEquals(Long.valueOf(50L), o3);
+
+   kafkaOffsetHandler.close();
+   deleteTestTopic(topicName);
+   }
+
+   /**
+* This test first writes a total of 200 records to a test topic, reads 
the first 100 so that some offsets are
+* committed to Kafka, and then startup the consumer again to read the 
remaining records starting from the committed offsets.
+* The test ensures that whatever offsets were committed to Kafka, the 
consumer correctly picks them up
+* and starts at the correct position.
+*/
+   public void runStartFromKafkaCommitOffsets() throws Exception {
+   final int parallelism = 3;
+   final int recordsInEachPartition = 300;
+
+   final String topicName = 
writeSequence("testStartFromKafkaCommitOffsetsTopic", recordsInEachPartition, 
parallelism, 1);
+
+   final StreamExecutionEnvironment env1 = 
StreamExecutionEnvironment.createRemoteEnvironment("localhost", flinkPort);
+   env1.ge

[GitHub] flink pull request #2580: [FLINK-4723] [kafka-connector] Unify committed off...

2016-10-10 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/2580#discussion_r82576069
  
--- Diff: 
flink-streaming-connectors/flink-connector-kafka-base/src/test/java/org/apache/flink/streaming/connectors/kafka/KafkaConsumerTestBase.java
 ---
@@ -208,6 +207,235 @@ public void runFailOnNoBrokerTest() throws Exception {
}
}
}
+
+   /**
+* Ensures that the committed offsets to Kafka are the offsets of "the 
next record to process"
+*/
+   public void runCommitOffsetsToKafka() throws Exception {
+   // 3 partitions with 50 records each (0-49, so the expected 
commit offset of each partition should be 50)
+   final int parallelism = 3;
+   final int recordsInEachPartition = 50;
+
+   final String topicName = 
writeSequence("testCommitOffsetsToKafkaTopic", recordsInEachPartition, 
parallelism, 1);
+
+   final StreamExecutionEnvironment env = 
StreamExecutionEnvironment.createRemoteEnvironment("localhost", flinkPort);
+   env.getConfig().disableSysoutLogging();
+   
env.getConfig().setRestartStrategy(RestartStrategies.noRestart());
+   env.setParallelism(parallelism);
+   env.enableCheckpointing(200);
+
+   DataStream stream = 
env.addSource(kafkaServer.getConsumer(topicName, new SimpleStringSchema(), 
standardProps));
+   stream.addSink(new DiscardingSink());
+
+   final AtomicReference errorRef = new 
AtomicReference<>();
+   final Thread runner = new Thread("runner") {
+   @Override
+   public void run() {
+   try {
+   env.execute();
+   }
+   catch (Throwable t) {
+   if (!(t.getCause() instanceof 
JobCancellationException)) {
+   errorRef.set(t);
+   }
+   }
+   }
+   };
+   runner.start();
+
+   final Long l50 = 50L; // the final committed offset in Kafka 
should be 50
+   final long deadline = 3 + System.currentTimeMillis();
+
+   KafkaTestEnvironment.KafkaOffsetHandler kafkaOffsetHandler = 
kafkaServer.createOffsetHandler(standardProps);
+
+   do {
+   Long o1 = 
kafkaOffsetHandler.getCommittedOffset(topicName, 0);
+   Long o2 = 
kafkaOffsetHandler.getCommittedOffset(topicName, 1);
+   Long o3 = 
kafkaOffsetHandler.getCommittedOffset(topicName, 2);
+
+   if (l50.equals(o1) && l50.equals(o2) && l50.equals(o3)) 
{
+   break;
+   }
+
+   Thread.sleep(100);
+   }
+   while (System.currentTimeMillis() < deadline);
+
+   // cancel the job
+   
JobManagerCommunicationUtils.cancelCurrentJob(flink.getLeaderGateway(timeout));
+
+   final Throwable t = errorRef.get();
+   if (t != null) {
+   throw new RuntimeException("Job failed with an 
exception", t);
+   }
+
+   // final check to see if offsets are correctly in Kafka
+   Long o1 = kafkaOffsetHandler.getCommittedOffset(topicName, 0);
+   Long o2 = kafkaOffsetHandler.getCommittedOffset(topicName, 1);
+   Long o3 = kafkaOffsetHandler.getCommittedOffset(topicName, 2);
+   Assert.assertEquals(Long.valueOf(50L), o1);
+   Assert.assertEquals(Long.valueOf(50L), o2);
+   Assert.assertEquals(Long.valueOf(50L), o3);
+
+   kafkaOffsetHandler.close();
+   deleteTestTopic(topicName);
+   }
+
+   /**
+* This test first writes a total of 200 records to a test topic, reads 
the first 100 so that some offsets are
--- End diff --

300, 150


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request #2580: [FLINK-4723] [kafka-connector] Unify committed off...

2016-10-03 Thread tzulitai
GitHub user tzulitai opened a pull request:

https://github.com/apache/flink/pull/2580

[FLINK-4723] [kafka-connector] Unify committed offsets to Kafka to be the 
next record to process

The description within the JIRA ticket 
([FLINK-4723](https://issues.apache.org/jira/browse/FLINK-4723)) explains the 
reasoning for this change.

With this change, offsets committed to Kafka are larger by 1 compared to 
the internally checkpointed offsets. This is changed at the 
`FlinkKafkaConsumerBase` level, so that offsets given through the abstract 
`commitSpecificOffsetsToKafka()` method to the version-specific implementations 
are already incremented and represent the next record to process. This way, the 
version-specific implementations simply commit the given offsets without the 
need to manipulate them.

This PR also includes major refactoring of the IT tests to add commit 
offset related IT tests to `FlinkKafkaConsumerTestBase`, and let both the 0.8 
and 0.9 consumers run offset committing / initial offset startup tests 
(previously only the 0.8 consumer had these tests).

R: @rmetzger what's your take on this?

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tzulitai/flink FLINK-4723

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/2580.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2580


commit cc782ffd4c174f23c45349771b318a08a2be75a3
Author: Tzu-Li (Gordon) Tai 
Date:   2016-10-02T08:54:57Z

[FLINK-4723] [kafka-connector] Unify committed offsets to Kafka to be next 
record to process




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---