Re: [jira] [Created] (FLINK-2695) KafkaITCase.testConcurrentProducerConsumerTopology failed on Travis

2015-09-23 Thread Stephan Ewen
The tests use a ZooKeeper mini cluster and multiple Kafka MiniClusters.

It appears that these are not very stable in our test setup. Let's see what
we can do to improve reliability there.

1) As a first step, I would suggest to reduce the number of concurrent
tests to one for this project, as it will prevent that we have multiple
concurrent mini clusters competing for compute resources.

2) The method "SimpleConsumerThread.getLastOffset()" Should probably
re-retrieve the leader, or we should allow the program more recovery
retries...

Greetings,
Stephan


On Wed, Sep 23, 2015 at 4:04 AM, Li, Chengxiang <chengxiang...@intel.com>
wrote:

> Found more KafkaITCase failure at:
> https://travis-ci.org/apache/flink/jobs/81592146
>
> Failed tests:
>
> KafkaITCase.testConcurrentProducerConsumerTopology:50->KafkaConsumerTestBase.runSimpleConcurrentProducerConsumerTopology:334->KafkaTestBase.tryExecute:313
> Test failed: The program execution failed: Job execution failed.
> Tests in error:
>
> KafkaITCase.testCancelingEmptyTopic:57->KafkaConsumerTestBase.runCancelingOnEmptyInputTest:594
> »
>
> KafkaITCase.testCancelingFullTopic:62->KafkaConsumerTestBase.runCancelingOnFullInputTest:529
> »
>
> KafkaITCase.testMultipleSourcesOnePartition:89->KafkaConsumerTestBase.runMultipleSourcesOnePartitionExactlyOnceTest:450
> » ProgramInvocation
>
> KafkaITCase.testOffsetInZookeeper:45->KafkaConsumerTestBase.runOffsetInZookeeperValidationTest:205->KafkaConsumerTestBase.writeSequence:938
> » ProgramInvocation
>
> KafkaITCase.testOneToOneSources:79->KafkaConsumerTestBase.runOneToOneExactlyOnceTest:356
> » ProgramInvocation
>
> It happens only on the test mode of JDK: oraclejdk8
> PROFILE="-Dhadoop.version=2.5.0 -Dmaven.javadoc.skip=true".
>
> Thanks
> Chengxiang
>
> -Original Message-----
> From: Till Rohrmann (JIRA) [mailto:j...@apache.org]
> Sent: Thursday, September 17, 2015 11:02 PM
> To: dev@flink.apache.org
> Subject: [jira] [Created] (FLINK-2695)
> KafkaITCase.testConcurrentProducerConsumerTopology failed on Travis
>
> Till Rohrmann created FLINK-2695:
> 
>
>  Summary: KafkaITCase.testConcurrentProducerConsumerTopology
> failed on Travis
>  Key: FLINK-2695
>  URL: https://issues.apache.org/jira/browse/FLINK-2695
>  Project: Flink
>   Issue Type: Bug
> Reporter: Till Rohrmann
> Priority: Critical
>
>
> The {{KafkaITCase.testConcurrentProducerConsumerTopology}} failed on
> Travis with
>
> {code}
> ---
>  T E S T S
> ---
> Running org.apache.flink.streaming.connectors.kafka.KafkaProducerITCase
> Running org.apache.flink.streaming.connectors.kafka.KafkaITCase
> Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.296 sec
> - in org.apache.flink.streaming.connectors.kafka.KafkaProducerITCase
> 09/16/2015 17:19:36 Job execution switched to status RUNNING.
> 09/16/2015 17:19:36 Source: Custom Source -> Sink: Unnamed(1/1)
> switched to SCHEDULED
> 09/16/2015 17:19:36 Source: Custom Source -> Sink: Unnamed(1/1)
> switched to DEPLOYING
> 09/16/2015 17:19:36 Source: Custom Source -> Sink: Unnamed(1/1)
> switched to RUNNING
> 09/16/2015 17:19:36 Source: Custom Source -> Sink: Unnamed(1/1)
> switched to FINISHED
> 09/16/2015 17:19:36 Job execution switched to status FINISHED.
> 09/16/2015 17:19:36 Job execution switched to status RUNNING.
> 09/16/2015 17:19:36 Source: Custom Source -> Map -> Flat Map(1/1)
> switched to SCHEDULED
> 09/16/2015 17:19:36 Source: Custom Source -> Map -> Flat Map(1/1)
> switched to DEPLOYING
> 09/16/2015 17:19:36 Source: Custom Source -> Map -> Flat Map(1/1)
> switched to RUNNING
> 09/16/2015 17:19:36 Source: Custom Source -> Map -> Flat Map(1/1)
> switched to FAILED
> java.lang.Exception: Could not forward element to next operator
> at
> org.apache.flink.streaming.connectors.kafka.internals.LegacyFetcher.run(LegacyFetcher.java:242)
> at
> org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer.run(FlinkKafkaConsumer.java:382)
> at
> org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:58)
> at
> org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:58)
> at
> org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:171)
> at org.apache.flink.runtime.taskmanager.Task.run(Task.java:581)
> at java.

[jira] [Created] (FLINK-2695) KafkaITCase.testConcurrentProducerConsumerTopology failed on Travis

2015-09-17 Thread Till Rohrmann (JIRA)
Till Rohrmann created FLINK-2695:


 Summary: KafkaITCase.testConcurrentProducerConsumerTopology failed 
on Travis
 Key: FLINK-2695
 URL: https://issues.apache.org/jira/browse/FLINK-2695
 Project: Flink
  Issue Type: Bug
Reporter: Till Rohrmann
Priority: Critical


The {{KafkaITCase.testConcurrentProducerConsumerTopology}} failed on Travis with

{code}
---
 T E S T S
---
Running org.apache.flink.streaming.connectors.kafka.KafkaProducerITCase
Running org.apache.flink.streaming.connectors.kafka.KafkaITCase
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 14.296 sec - in 
org.apache.flink.streaming.connectors.kafka.KafkaProducerITCase
09/16/2015 17:19:36 Job execution switched to status RUNNING.
09/16/2015 17:19:36 Source: Custom Source -> Sink: Unnamed(1/1) switched to 
SCHEDULED 
09/16/2015 17:19:36 Source: Custom Source -> Sink: Unnamed(1/1) switched to 
DEPLOYING 
09/16/2015 17:19:36 Source: Custom Source -> Sink: Unnamed(1/1) switched to 
RUNNING 
09/16/2015 17:19:36 Source: Custom Source -> Sink: Unnamed(1/1) switched to 
FINISHED 
09/16/2015 17:19:36 Job execution switched to status FINISHED.
09/16/2015 17:19:36 Job execution switched to status RUNNING.
09/16/2015 17:19:36 Source: Custom Source -> Map -> Flat Map(1/1) switched 
to SCHEDULED 
09/16/2015 17:19:36 Source: Custom Source -> Map -> Flat Map(1/1) switched 
to DEPLOYING 
09/16/2015 17:19:36 Source: Custom Source -> Map -> Flat Map(1/1) switched 
to RUNNING 
09/16/2015 17:19:36 Source: Custom Source -> Map -> Flat Map(1/1) switched 
to FAILED 
java.lang.Exception: Could not forward element to next operator
at 
org.apache.flink.streaming.connectors.kafka.internals.LegacyFetcher.run(LegacyFetcher.java:242)
at 
org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer.run(FlinkKafkaConsumer.java:382)
at 
org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:58)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:58)
at 
org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:171)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:581)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.RuntimeException: Could not forward element to next 
operator
at 
org.apache.flink.streaming.runtime.tasks.OutputHandler$CopyingChainingOutput.collect(OutputHandler.java:332)
at 
org.apache.flink.streaming.runtime.tasks.OutputHandler$CopyingChainingOutput.collect(OutputHandler.java:316)
at 
org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:50)
at 
org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:30)
at 
org.apache.flink.streaming.runtime.tasks.SourceStreamTask$SourceOutput.collect(SourceStreamTask.java:92)
at 
org.apache.flink.streaming.api.operators.StreamSource$NonTimestampContext.collect(StreamSource.java:88)
at 
org.apache.flink.streaming.connectors.kafka.internals.LegacyFetcher$SimpleConsumerThread.run(LegacyFetcher.java:449)
Caused by: java.lang.RuntimeException: Could not forward element to next 
operator
at 
org.apache.flink.streaming.runtime.tasks.OutputHandler$CopyingChainingOutput.collect(OutputHandler.java:332)
at 
org.apache.flink.streaming.runtime.tasks.OutputHandler$CopyingChainingOutput.collect(OutputHandler.java:316)
at 
org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:50)
at 
org.apache.flink.streaming.runtime.io.CollectorWrapper.collect(CollectorWrapper.java:30)
at 
org.apache.flink.streaming.api.operators.StreamMap.processElement(StreamMap.java:37)
at 
org.apache.flink.streaming.runtime.tasks.OutputHandler$CopyingChainingOutput.collect(OutputHandler.java:329)
... 6 more
Caused by: 
org.apache.flink.streaming.connectors.kafka.testutils.SuccessException
at 
org.apache.flink.streaming.connectors.kafka.KafkaConsumerTestBase$7.flatMap(KafkaConsumerTestBase.java:896)
at 
org.apache.flink.streaming.connectors.kafka.KafkaConsumerTestBase$7.flatMap(KafkaConsumerTestBase.java:876)
at 
org.apache.flink.streaming.api.operators.StreamFlatMap.processElement(StreamFlatMap.java:47)
at 
org.apache.flink.streaming.runtime.tasks.OutputHandler$CopyingChainingOutput.collect(OutputHandler.java:329)
... 11 more

09/16/2015 17:19:36 Job execution switched to status FAILING.
09/16/2015 17:19:36 Job execution switched to status FAILED.
org.apache.flink.client.program.ProgramInvocationException: The program 
execution failed: Job execution failed.
at