[jira] [Created] (FLINK-27756) Fix Intermittently failing test in `AsyncSinkWriterTest`

2022-05-24 Thread Ahmed Hamdy (Jira)
Ahmed Hamdy created FLINK-27756:
---

 Summary: Fix Intermittently failing test in `AsyncSinkWriterTest`
 Key: FLINK-27756
 URL: https://issues.apache.org/jira/browse/FLINK-27756
 Project: Flink
  Issue Type: Sub-task
  Components: Connectors / Kinesis
Affects Versions: 1.15.0
Reporter: Ahmed Hamdy
Assignee: Ahmed Hamdy
 Fix For: 1.15.0


h2. Motivation

- Add documentation for the kinesis firehose table api feature.
- Add user guide and configuration list.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


Re: Failing Test

2016-04-05 Thread Maximilian Michels
Thanks, the actual problem is that the ActorSystem gets shut down. This
breaks the testing code. Should be fixed once
https://github.com/apache/flink/pull/1852 is merged.

On Tue, Apr 5, 2016 at 12:25 PM, Matthias J. Sax  wrote:
> Happened again after your fix:
> https://travis-ci.org/apache/flink/jobs/120620482
>
> -Matthias
>
>
> On 04/01/2016 08:57 PM, Maximilian Michels wrote:
>> Fixed with the resolution of 
>> https://issues.apache.org/jira/browse/FLINK-3689.
>>
>> On Fri, Apr 1, 2016 at 12:40 PM, Maximilian Michels  wrote:
>>> Hi Matthias,
>>>
>>> Thanks for spotting the test failure. It's actually a bug in the code
>>> and not a test problem. Fixing it.
>>>
>>> Cheers,
>>> Max
>>>
>>> On Fri, Apr 1, 2016 at 9:33 AM, Ufuk Celebi  wrote:
>>>> Hey Matthias,
>>>>
>>>> the test has been only recently added with the resource management
>>>> refactoring. It's probably just a too aggressive timeout for Travis.
>>>>
>>>> @Max: Did you ever see this fail?
>>>>
>>>> – Ufuk
>>>>
>>>> On Fri, Apr 1, 2016 at 9:24 AM, Matthias J. Sax  wrote:
>>>>> Anyone seen this before? One-time thing or test instability?
>>>>>
>>>>>> ClusterShutdownITCase.testClusterShutdown:71 assertion failed: timeout 
>>>>>> (29848225634 nanoseconds) during expectMsgClass waiting for class 
>>>>>> org.apache.flink.runtime.clusterframework.messages.StopClusterSuccessful
>>>>>
>>>>>
>>>>> -Matthias
>>>>>
>>>>>


Re: Failing Test

2016-04-05 Thread Matthias J. Sax
Happened again after your fix:
https://travis-ci.org/apache/flink/jobs/120620482

-Matthias


On 04/01/2016 08:57 PM, Maximilian Michels wrote:
> Fixed with the resolution of https://issues.apache.org/jira/browse/FLINK-3689.
> 
> On Fri, Apr 1, 2016 at 12:40 PM, Maximilian Michels  wrote:
>> Hi Matthias,
>>
>> Thanks for spotting the test failure. It's actually a bug in the code
>> and not a test problem. Fixing it.
>>
>> Cheers,
>> Max
>>
>> On Fri, Apr 1, 2016 at 9:33 AM, Ufuk Celebi  wrote:
>>> Hey Matthias,
>>>
>>> the test has been only recently added with the resource management
>>> refactoring. It's probably just a too aggressive timeout for Travis.
>>>
>>> @Max: Did you ever see this fail?
>>>
>>> – Ufuk
>>>
>>> On Fri, Apr 1, 2016 at 9:24 AM, Matthias J. Sax  wrote:
>>>> Anyone seen this before? One-time thing or test instability?
>>>>
>>>>> ClusterShutdownITCase.testClusterShutdown:71 assertion failed: timeout 
>>>>> (29848225634 nanoseconds) during expectMsgClass waiting for class 
>>>>> org.apache.flink.runtime.clusterframework.messages.StopClusterSuccessful
>>>>
>>>>
>>>> -Matthias




signature.asc
Description: OpenPGP digital signature


Re: Failing Test

2016-04-02 Thread Matthias J. Sax
Thanks. Just tried it out and it works :)

On 04/01/2016 08:57 PM, Maximilian Michels wrote:
> Fixed with the resolution of https://issues.apache.org/jira/browse/FLINK-3689.
> 
> On Fri, Apr 1, 2016 at 12:40 PM, Maximilian Michels  wrote:
>> Hi Matthias,
>>
>> Thanks for spotting the test failure. It's actually a bug in the code
>> and not a test problem. Fixing it.
>>
>> Cheers,
>> Max
>>
>> On Fri, Apr 1, 2016 at 9:33 AM, Ufuk Celebi  wrote:
>>> Hey Matthias,
>>>
>>> the test has been only recently added with the resource management
>>> refactoring. It's probably just a too aggressive timeout for Travis.
>>>
>>> @Max: Did you ever see this fail?
>>>
>>> – Ufuk
>>>
>>> On Fri, Apr 1, 2016 at 9:24 AM, Matthias J. Sax  wrote:
>>>> Anyone seen this before? One-time thing or test instability?
>>>>
>>>>> ClusterShutdownITCase.testClusterShutdown:71 assertion failed: timeout 
>>>>> (29848225634 nanoseconds) during expectMsgClass waiting for class 
>>>>> org.apache.flink.runtime.clusterframework.messages.StopClusterSuccessful
>>>>
>>>>
>>>> -Matthias






Re: Failing Test

2016-04-01 Thread Maximilian Michels
Fixed with the resolution of https://issues.apache.org/jira/browse/FLINK-3689.

On Fri, Apr 1, 2016 at 12:40 PM, Maximilian Michels  wrote:
> Hi Matthias,
>
> Thanks for spotting the test failure. It's actually a bug in the code
> and not a test problem. Fixing it.
>
> Cheers,
> Max
>
> On Fri, Apr 1, 2016 at 9:33 AM, Ufuk Celebi  wrote:
>> Hey Matthias,
>>
>> the test has been only recently added with the resource management
>> refactoring. It's probably just a too aggressive timeout for Travis.
>>
>> @Max: Did you ever see this fail?
>>
>> – Ufuk
>>
>> On Fri, Apr 1, 2016 at 9:24 AM, Matthias J. Sax  wrote:
>>> Anyone seen this before? One-time thing or test instability?
>>>
>>>> ClusterShutdownITCase.testClusterShutdown:71 assertion failed: timeout 
>>>> (29848225634 nanoseconds) during expectMsgClass waiting for class 
>>>> org.apache.flink.runtime.clusterframework.messages.StopClusterSuccessful
>>>
>>>
>>> -Matthias
>>>


Re: Failing Test

2016-04-01 Thread Maximilian Michels
Hi Matthias,

Thanks for spotting the test failure. It's actually a bug in the code
and not a test problem. Fixing it.

Cheers,
Max

On Fri, Apr 1, 2016 at 9:33 AM, Ufuk Celebi  wrote:
> Hey Matthias,
>
> the test has been only recently added with the resource management
> refactoring. It's probably just a too aggressive timeout for Travis.
>
> @Max: Did you ever see this fail?
>
> – Ufuk
>
> On Fri, Apr 1, 2016 at 9:24 AM, Matthias J. Sax  wrote:
>> Anyone seen this before? One-time thing or test instability?
>>
>>> ClusterShutdownITCase.testClusterShutdown:71 assertion failed: timeout 
>>> (29848225634 nanoseconds) during expectMsgClass waiting for class 
>>> org.apache.flink.runtime.clusterframework.messages.StopClusterSuccessful
>>
>>
>> -Matthias
>>


Re: Failing Test

2016-04-01 Thread Ufuk Celebi
Hey Matthias,

the test has been only recently added with the resource management
refactoring. It's probably just a too aggressive timeout for Travis.

@Max: Did you ever see this fail?

– Ufuk

On Fri, Apr 1, 2016 at 9:24 AM, Matthias J. Sax  wrote:
> Anyone seen this before? One-time thing or test instability?
>
>> ClusterShutdownITCase.testClusterShutdown:71 assertion failed: timeout 
>> (29848225634 nanoseconds) during expectMsgClass waiting for class 
>> org.apache.flink.runtime.clusterframework.messages.StopClusterSuccessful
>
>
> -Matthias
>


Failing Test

2016-04-01 Thread Matthias J. Sax
Anyone seen this before? One-time thing or test instability?

> ClusterShutdownITCase.testClusterShutdown:71 assertion failed: timeout 
> (29848225634 nanoseconds) during expectMsgClass waiting for class 
> org.apache.flink.runtime.clusterframework.messages.StopClusterSuccessful


-Matthias





[jira] [Created] (FLINK-2839) Failing test: OperatorStatsAccumulatorTest.testAccumulatorAllStatistics

2015-10-09 Thread Gabor Gevay (JIRA)
Gabor Gevay created FLINK-2839:
--

 Summary: Failing test: 
OperatorStatsAccumulatorTest.testAccumulatorAllStatistics
 Key: FLINK-2839
 URL: https://issues.apache.org/jira/browse/FLINK-2839
 Project: Flink
  Issue Type: Bug
  Components: flink-contrib
Reporter: Gabor Gevay
Priority: Minor


I saw this test failure:

{code}
Tests run: 4, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 7.633 sec <<< 
FAILURE! - in 
org.apache.flink.contrib.operatorstatistics.OperatorStatsAccumulatorTest
testAccumulatorAllStatistics(org.apache.flink.contrib.operatorstatistics.OperatorStatsAccumulatorTest)
  Time elapsed: 1.5 sec  <<< FAILURE!
java.lang.AssertionError: The total number of heavy hitters should be between 0 
and 5.
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.flink.contrib.operatorstatistics.OperatorStatsAccumulatorTest.testAccumulatorAllStatistics(OperatorStatsAccumulatorTest.java:151)
{code}

Full log 
[here|https://s3.amazonaws.com/archive.travis-ci.org/jobs/84469788/log.txt].

Maybe the test should set a constant seed to the {{Random}} object.
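
A minimal sketch of the constant-seed idea, assuming the test draws its input from {{java.util.Random}}. The class name, method name, and seed value below are hypothetical and are not taken from OperatorStatsAccumulatorTest:

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical helper, not part of the Flink code base.
class SeededInput {
    // Fixed seed, chosen arbitrarily for illustration.
    static final long SEED = 0xC0FFEEL;

    // Draws `count` ints in [0, bound) from a Random created with the given seed.
    static int[] drawValues(long seed, int count, int bound) {
        Random rnd = new Random(seed);
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = rnd.nextInt(bound);
        }
        return values;
    }

    public static void main(String[] args) {
        int[] first = drawValues(SEED, 5, 100);
        int[] second = drawValues(SEED, 5, 100);
        // Same seed, same sequence: an assertion on statistics derived from
        // these values passes or fails identically on every run.
        System.out.println(Arrays.equals(first, second));
    }
}
```

With a fixed seed the heavy-hitter count becomes reproducible, so a failure can be replayed locally. The trade-off is that the test then covers only one input sequence.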



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-2832) Failing test: RandomSamplerTest.testReservoirSamplerWithReplacement

2015-10-07 Thread Vasia Kalavri (JIRA)
Vasia Kalavri created FLINK-2832:


 Summary: Failing test: 
RandomSamplerTest.testReservoirSamplerWithReplacement
 Key: FLINK-2832
 URL: https://issues.apache.org/jira/browse/FLINK-2832
 Project: Flink
  Issue Type: Bug
  Components: Tests
Affects Versions: 0.10
Reporter: Vasia Kalavri
Priority: Critical
 Fix For: 0.10


Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 19.133 sec <<< 
FAILURE! - in org.apache.flink.api.java.sampling.RandomSamplerTest
testReservoirSamplerWithReplacement(org.apache.flink.api.java.sampling.RandomSamplerTest)
  Time elapsed: 2.534 sec  <<< FAILURE!
java.lang.AssertionError: KS test result with p value(0.11), d 
value(0.103090)
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.verifyKSTest(RandomSamplerTest.java:342)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.verifyRandomSamplerWithSampleSize(RandomSamplerTest.java:330)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.verifyReservoirSamplerWithReplacement(RandomSamplerTest.java:289)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.testReservoirSamplerWithReplacement(RandomSamplerTest.java:192)

Results :

Failed tests: 
  
RandomSamplerTest.testReservoirSamplerWithReplacement:192->verifyReservoirSamplerWithReplacement:289->verifyRandomSamplerWithSampleSize:330->verifyKSTest:342
 KS test result with p value(0.11), d value(0.103090)

Full log [here|https://travis-ci.org/apache/flink/jobs/84120131].





Re: Failing test

2015-10-06 Thread Till Rohrmann
If there is none yet, then we do. Label it with "test-stability". I think
the consensus was also to mark it as critical.

Otherwise, just add the log to the JIRA.

On Tue, Oct 6, 2015 at 2:57 PM, Matthias J. Sax  wrote:

> Hi,
>
> One test just failed on current master:
> https://travis-ci.org/apache/flink/jobs/83871008
>
> Do we need a JIRA?
>
> >   LeaderChangeStateCleanupTest.testReelectionOfSameJobManager:245 »
> Timeout Futu...
>
>
> -Matthias
>
>


Failing test

2015-10-06 Thread Matthias J. Sax
Hi,

One test just failed on current master:
https://travis-ci.org/apache/flink/jobs/83871008

Do we need a JIRA?

>   LeaderChangeStateCleanupTest.testReelectionOfSameJobManager:245 » Timeout 
> Futu...


-Matthias





[jira] [Created] (FLINK-2628) Failing Test: StreamFaultToleranceTestBase.runCheckpointedProgram

2015-09-07 Thread Martin Liesenberg (JIRA)
Martin Liesenberg created FLINK-2628:


 Summary: Failing Test: 
StreamFaultToleranceTestBase.runCheckpointedProgram
 Key: FLINK-2628
 URL: https://issues.apache.org/jira/browse/FLINK-2628
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Martin Liesenberg


In pull request #1097 the test 
StreamFaultToleranceTestBase.runCheckpointedProgram failed.

The changes introduced in the pull request are most likely unrelated. I cannot 
reproduce the failure locally. 





Re: Failing Test: KafkaITCase and KafkaProducerITCase

2015-09-07 Thread Stephan Ewen
I have a patch pending that should help with these timeout issues (and null
checks)...

On Mon, Sep 7, 2015 at 2:41 PM, Matthias J. Sax  wrote:

> Please look here:
>
> https://travis-ci.org/apache/flink/jobs/79086396
>
> > Failed tests:
> > KafkaITCase>KafkaTestBase.prepare:155 Test setup failed: Unable to
> connect to zookeeper server within timeout: 6000
> > KafkaProducerITCase>KafkaTestBase.prepare:155 Test setup failed: Unable
> to connect to zookeeper server within timeout: 6000
> >
> > Tests in error:
> > KafkaITCase>KafkaTestBase.shutDownServices:196 » NullPointer
> > KafkaProducerITCase>KafkaTestBase.shutDownServices:196 » NullPointer
>
> I did not find any JIRA for it.
>
>
> -Matthias
>
>


[jira] [Created] (FLINK-2599) Failing Test: SlotCountExceedingParallelismTest

2015-08-31 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-2599:
--

 Summary: Failing Test: SlotCountExceedingParallelismTest
 Key: FLINK-2599
 URL: https://issues.apache.org/jira/browse/FLINK-2599
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Matthias J. Sax
Priority: Critical


{noformat}
Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 200.355 sec <<< 
FAILURE! - in 
org.apache.flink.runtime.jobmanager.SlotCountExceedingParallelismTest
org.apache.flink.runtime.jobmanager.SlotCountExceedingParallelismTest Time 
elapsed: 200.355 sec <<< ERROR!
java.util.concurrent.TimeoutException: Futures timed out after [20 
milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:95)
at scala.concurrent.Await$$anonfun$ready$1.apply(package.scala:95)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.ready(package.scala:95)
at 
org.apache.flink.runtime.minicluster.FlinkMiniCluster.waitForTaskManagersToBeRegistered(FlinkMiniCluster.scala:237)
at 
org.apache.flink.runtime.minicluster.FlinkMiniCluster.<init>(FlinkMiniCluster.scala:95)
at 
org.apache.flink.runtime.testingUtils.TestingCluster.<init>(TestingCluster.scala:43)
at 
org.apache.flink.runtime.testingUtils.TestingCluster.<init>(TestingCluster.scala:51)
at 
org.apache.flink.runtime.testingUtils.TestingCluster.<init>(TestingCluster.scala:56)
at 
org.apache.flink.runtime.testingUtils.TestingUtils$.startTestingCluster(TestingUtils.scala:65)
at 
org.apache.flink.runtime.testingUtils.TestingUtils.startTestingCluster(TestingUtils.scala)
at 
org.apache.flink.runtime.jobmanager.SlotCountExceedingParallelismTest.setUp(SlotCountExceedingParallelismTest.java:49)
org.apache.flink.runtime.jobmanager.SlotCountExceedingParallelismTest Time 
elapsed: 200.355 sec <<< ERROR!

java.lang.NullPointerException: null
at 
org.apache.flink.runtime.jobmanager.SlotCountExceedingParallelismTest.tearDown(SlotCountExceedingParallelismTest.java:57)
{noformat}

https://travis-ci.org/apache/flink/jobs/77887433
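
The trace above shows two errors for one root cause: setUp timed out before the cluster field was assigned, and tearDown then threw a NullPointerException. A minimal sketch of a guarded teardown follows. All names are hypothetical stand-ins, not the actual Flink test classes:

```java
// Hypothetical stand-ins; not the actual Flink test classes.
class GuardedTeardownSketch {
    interface Cluster {
        void stop();
    }

    private Cluster cluster; // stays null if setUp fails early

    void setUp(boolean startSucceeds) {
        if (!startSucceeds) {
            // Simulates a start-up timeout: the field is never assigned.
            throw new IllegalStateException("cluster start timed out");
        }
        cluster = () -> System.out.println("cluster stopped");
    }

    void tearDown() {
        // Guard against the failed-setUp case so the original error is not
        // joined by a second, misleading NullPointerException.
        if (cluster != null) {
            cluster.stop();
            cluster = null;
        }
    }

    public static void main(String[] args) {
        GuardedTeardownSketch test = new GuardedTeardownSketch();
        try {
            test.setUp(false);
        } catch (IllegalStateException e) {
            System.out.println("setUp failed: " + e.getMessage());
        } finally {
            test.tearDown(); // no NullPointerException here
        }
    }
}
```

The guard does not fix the timeout itself. It only keeps the report down to the one error that matters.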





[jira] [Created] (FLINK-2596) Failing Test: RandomSamplerTest

2015-08-28 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-2596:
--

 Summary: Failing Test: RandomSamplerTest
 Key: FLINK-2596
 URL: https://issues.apache.org/jira/browse/FLINK-2596
 Project: Flink
  Issue Type: Bug
Reporter: Matthias J. Sax
Priority: Critical


{noformat}
Tests run: 17, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 14.925 sec <<< 
FAILURE! - in org.apache.flink.api.java.sampling.RandomSamplerTest

testReservoirSamplerWithMultiSourcePartitions2(org.apache.flink.api.java.sampling.RandomSamplerTest)
 Time elapsed: 0.444 sec <<< ERROR!

java.lang.IllegalArgumentException: Comparison method violates its general 
contract!
at java.util.TimSort.mergeLo(TimSort.java:747)
at java.util.TimSort.mergeAt(TimSort.java:483)
at java.util.TimSort.mergeCollapse(TimSort.java:410)
at java.util.TimSort.sort(TimSort.java:214)
at java.util.TimSort.sort(TimSort.java:173)
at java.util.Arrays.sort(Arrays.java:659)
at java.util.Collections.sort(Collections.java:217)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.transferFromListToArrayWithOrder(RandomSamplerTest.java:375)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.getSampledOutput(RandomSamplerTest.java:367)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.verifyKSTest(RandomSamplerTest.java:338)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.verifyRandomSamplerWithSampleSize(RandomSamplerTest.java:330)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.verifyReservoirSamplerWithReplacement(RandomSamplerTest.java:290)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.testReservoirSamplerWithMultiSourcePartitions2(RandomSamplerTest.java:212)

Results :

Tests in error:

RandomSamplerTest.testReservoirSamplerWithMultiSourcePartitions2:212->verifyReservoirSamplerWithReplacement:290->verifyRandomSamplerWithSampleSize:330->verifyKSTest:338->getSampledOutput:367->transferFromListToArrayWithOrder:375
 » IllegalArgument
{noformat}

https://travis-ci.org/apache/flink/jobs/77750329
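
For context, TimSort raises "Comparison method violates its general contract!" when it detects a comparator whose results are inconsistent. What the comparator in RandomSamplerTest actually does is not shown in this report, so the following is only a generic sketch with hypothetical names. It contrasts a comparator that breaks the contract with one that satisfies it:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Generic illustration; not the comparator used in RandomSamplerTest.
class ComparatorContractSketch {
    // Broken on purpose: never returns 0 and reports "greater" for equal
    // values, so compare(a, b) and compare(b, a) can both be positive.
    static final Comparator<Double> BROKEN = (a, b) -> a < b ? -1 : 1;

    // Consistent total order: antisymmetric, transitive, 0 for equal values.
    static final Comparator<Double> SAFE = Double::compare;

    static List<Double> sortedCopy(List<Double> input) {
        List<Double> copy = new ArrayList<>(input);
        Collections.sort(copy, SAFE);
        return copy;
    }

    public static void main(String[] args) {
        // Contract: sgn(compare(x, y)) == -sgn(compare(y, x)) for all x, y.
        System.out.println("broken antisymmetric: "
                + (BROKEN.compare(1.0, 1.0) == -BROKEN.compare(1.0, 1.0)));
        System.out.println("safe antisymmetric: "
                + (SAFE.compare(1.0, 1.0) == -SAFE.compare(1.0, 1.0)));
    }
}
```

Because the exception depends on the data being sorted, a broken comparator tends to fail only on some random inputs, which matches an intermittent failure.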





Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-24 Thread Stephan Ewen
Pushed a fix for the StateCheckpointedITCase

On Mon, Aug 24, 2015 at 12:19 PM, Maximilian Michels m...@apache.org wrote:

> +1 for labeling the JIRAs with test-stability.
>
> On Sat, Aug 22, 2015 at 8:21 PM, Márton Balassi balassi.mar...@gmail.com
> wrote:
>
>> +1 for Vasia's suggestion
>> On Aug 22, 2015 8:07 PM, Vasiliki Kalavri vasilikikala...@gmail.com
>> wrote:
>>
>>> I just came across 2 more :/
>>> I'm also in favor of tracking these with JIRA. How about test-stability
>>> for a label?
>>>
>>> -V.
>>>
>>> On 21 August 2015 at 12:47, Matthias J. Sax mj...@informatik.hu-berlin.de
>>> wrote:
>>>
>>>> I like the idea with the special label. Otherwise, it will be difficult
>>>> to find the correct tickets.
>>>>
>>>> -Matthias
>>>>
>>>> On 08/21/2015 12:15 PM, Till Rohrmann wrote:
>>>>> I'm also in favor of JIRA, because I fear that nobody will keep the wiki
>>>>> page in sync. Maybe we can assign a special label for test stability to
>>>>> these JIRA issues. Then we can quickly find all currently instable test
>>>>> cases.
>>>>>
>>>>> On Fri, Aug 21, 2015 at 11:02 AM, Robert Metzger rmetz...@apache.org
>>>>> wrote:
>>>>>
>>>>>> I agree that we should look for a solution other than opening a lot of
>>>>>> small discussion threads on the mailing list.
>>>>>>
>>>>>> When I have a test failure, I usually search my gmail inbox to see whether
>>>>>> somebody else wrote something about the error already.
>>>>>> Creating a JIRA for each failing test might be a better approach. Because
>>>>>> that's what bugtrackers are made for ;) (And the issues still pop up when
>>>>>> doing a gmail search)
>>>>>>
>>>>>> On Thu, Aug 20, 2015 at 10:16 PM, Matthias J. Sax
>>>>>> mj...@informatik.hu-berlin.de wrote:
>>>>>>
>>>>>>> Thanks for the info.
>>>>>>>
>>>>>>> Over the weeks I lost track of which errors/failing/instable tests are
>>>>>>> known and which are not. Should we start a wiki page or similar to collect
>>>>>>> known errors? If a test fails on a known error, it can just be ignored.
>>>>>>> This would avoid spam on the mailing list.
>>>>>>>
>>>>>>> Any thoughts about this?
>>>>>>>
>>>>>>> -Matthias
>>>>>>>
>>>>>>> On 08/20/2015 10:08 PM, Robert Metzger wrote:
>>>>>>>> Sachin saw the error as well, as reported here:
>>>>>>>> https://issues.apache.org/jira/browse/FLINK-2468
>>>>>>>> I also see it from time to time. I have a wip branch where I relaxed
>>>>>>>> the constraints for the test to pass a bit.
>>>>>>>>
>>>>>>>> On Thu, Aug 20, 2015 at 10:05 PM, Matthias J. Sax
>>>>>>>> mj...@informatik.hu-berlin.de wrote:
>>>>>>>>
>>>>>>>>> Error message is:
>>>>>>>>>
>>>>>>>>>> Failed tests:
>>>>>>>>>> StateCheckpoinedITCase>StreamFaultToleranceTestBase.runCheckpointedProgram:103->postSubmit:98
>>>>>>>>>> Test inconclusive: failure occurred before first checkpoint
>>>>>>>>>
>>>>>>>>> See: https://travis-ci.org/mjsax/flink/jobs/76483093
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias


Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-24 Thread Stephan Ewen
+1 for a test-stability label and labeling these issues as critical

On Mon, Aug 24, 2015 at 6:31 PM, Stephan Ewen se...@apache.org wrote:

> Pushed a fix for the StateCheckpointedITCase



Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-24 Thread Maximilian Michels
+1 for labeling the JIRAs with test-stability.

On Sat, Aug 22, 2015 at 8:21 PM, Márton Balassi balassi.mar...@gmail.com
wrote:

> +1 for Vasia's suggestion


Re: [FAILING TEST] RandomSamplerTest

2015-08-24 Thread Maximilian Michels
Hi Matthias,

Thanks for reporting. The label test-stability exists now.

Cheers,
Max

On Sun, Aug 23, 2015 at 12:32 PM, Matthias J. Sax 
mj...@informatik.hu-berlin.de wrote:

> Hi,
>
> Because there is not yet a label for failing tests, I am reporting this
> over the mailing list again. (I also opened a JIRA for it.)
>
>> Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.943 sec <<< FAILURE! - in org.apache.flink.api.java.sampling.
>> testPoissonSamplerFraction(org.apache.flink.api.java.sampling.RandomSamplerTest)
>> Time elapsed: 0.017 sec <<< FAILURE!
>> java.lang.AssertionError: expected fraction: 0.01, result fraction: 0.011300
>> at org.junit.Assert.fail(Assert.java:88)
>> at org.junit.Assert.assertTrue(Assert.java:41)
>> at org.apache.flink.api.java.sampling.RandomSamplerTest.verifySamplerFraction(RandomSamplerTest.java:249)
>> at org.apache.flink.api.java.sampling.RandomSamplerTest.testPoissonSamplerFraction(RandomSamplerTest.java:116)
>>
>> Results :
>> Failed tests:
>> Successfully installed excon-0.33.0
>> RandomSamplerTest.testPoissonSamplerFraction:116->verifySamplerFraction:249
>> expected fraction: 0.01, result fraction: 0.011300
>
> https://travis-ci.org/apache/flink/jobs/76720572
>
> -Matthias




[jira] [Created] (FLINK-2564) Failing Test: RandomSamplerTest

2015-08-23 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-2564:
--

 Summary: Failing Test: RandomSamplerTest
 Key: FLINK-2564
 URL: https://issues.apache.org/jira/browse/FLINK-2564
 Project: Flink
  Issue Type: Bug
Reporter: Matthias J. Sax


{noformat}
Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.943 sec <<< 
FAILURE! - in org.apache.flink.api.java.sampling.   
testPoissonSamplerFraction(org.apache.flink.api.java.sampling.RandomSamplerTest)
 Time elapsed: 0.017 sec <<< FAILURE!
java.lang.AssertionError: expected fraction: 0.01, result fraction: 0.011300
at org.junit.Assert.fail(Assert.java:88)
at org.junit.Assert.assertTrue(Assert.java:41)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.verifySamplerFraction(RandomSamplerTest.java:249)
at 
org.apache.flink.api.java.sampling.RandomSamplerTest.testPoissonSamplerFraction(RandomSamplerTest.java:116)

Results :
Failed tests:
Successfully installed excon-0.33.0
RandomSamplerTest.testPoissonSamplerFraction:116->verifySamplerFraction:249 
expected fraction: 0.01, result fraction: 0.011300
{noformat}

Full log: https://travis-ci.org/apache/flink/jobs/76720572
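
To judge whether a result of 0.0113 against an expected 0.01 points to a bug or to plain random variation, the deviation can be expressed in standard deviations. The sketch below assumes each of n input elements is emitted a Poisson(p)-distributed number of times, which makes the standard deviation of the observed fraction sqrt(p / n). The input size of 10,000 is an assumption for illustration, since the report does not state it, and the class and method names are hypothetical:

```java
// Back-of-the-envelope check; names and the input size are assumptions.
class SamplingToleranceSketch {
    // Standard deviation of the observed fraction when each of n elements
    // is emitted a Poisson(p)-distributed number of times.
    static double poissonFractionStdDev(double p, long n) {
        return Math.sqrt(p / n);
    }

    // Distance between observed and expected fraction, in standard deviations.
    static double deviationInStdDevs(double expected, double observed, long n) {
        return Math.abs(observed - expected) / poissonFractionStdDev(expected, n);
    }

    public static void main(String[] args) {
        long assumedInputSize = 10_000L; // assumption, not stated in the report
        double z = deviationInStdDevs(0.01, 0.0113, assumedInputSize);
        System.out.printf("deviation: %.2f standard deviations%n", z);
    }
}
```

Under these assumptions the deviation is roughly 1.3 standard deviations, which a correct sampler would exceed in about one run in five. If that holds, a fixed 10% tolerance would be too tight for an unseeded test.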





[FAILING TEST] RandomSamplerTest

2015-08-23 Thread Matthias J. Sax
Hi,

Because there is not yet a label for failing tests, I am reporting this
over the mailing list again. (I also opened a JIRA for it.)

> Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.943 sec 
> <<< FAILURE! - in org.apache.flink.api.java.sampling. 
> testPoissonSamplerFraction(org.apache.flink.api.java.sampling.RandomSamplerTest)
>  Time elapsed: 0.017 sec <<< FAILURE!
> java.lang.AssertionError: expected fraction: 0.01, result fraction: 
> 0.011300
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at 
> org.apache.flink.api.java.sampling.RandomSamplerTest.verifySamplerFraction(RandomSamplerTest.java:249)
> at 
> org.apache.flink.api.java.sampling.RandomSamplerTest.testPoissonSamplerFraction(RandomSamplerTest.java:116)
> 
> Results :
> Failed tests:
> Successfully installed excon-0.33.0
> RandomSamplerTest.testPoissonSamplerFraction:116->verifySamplerFraction:249 
> expected fraction: 0.01, result fraction: 0.011300

https://travis-ci.org/apache/flink/jobs/76720572

-Matthias





Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-22 Thread Vasiliki Kalavri
I just came across 2 more :/
I'm also in favor of tracking these with JIRA. How about test-stability
for a label?

-V.

On 21 August 2015 at 12:47, Matthias J. Sax mj...@informatik.hu-berlin.de
wrote:

> I like the idea with the special label. Otherwise, it will be difficult
> to find the correct tickets.
>
> -Matthias




Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-22 Thread Márton Balassi
+1 for Vasia's suggestion
On Aug 22, 2015 8:07 PM, Vasiliki Kalavri vasilikikala...@gmail.com
wrote:

> I just came across 2 more :/
> I'm also in favor of tracking these with JIRA. How about test-stability
> for a label?
>
> -V.



Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-21 Thread Matthias J. Sax
I like the idea with the special label. Otherwise, it will be difficult
to find the correct tickets.

-Matthias

On 08/21/2015 12:15 PM, Till Rohrmann wrote:
 I'm also in favor of JIRA, because I fear that nobody will keep the wiki
 page in sync. Maybe we can assign a special label for test stability to
 these JIRA issues. Then we can quickly find all currently instable test
 cases.
 
 On Fri, Aug 21, 2015 at 11:02 AM, Robert Metzger rmetz...@apache.org
 wrote:
 
 I agree that we should look for a solution other than opening a lot of
 small discussion threads on the mailing list.

 When I have a test failure, I usually search my gmail inbox to see whether
 somebody else wrote something about the error already.
 Creating a JIRA for each failing test might be a better approach. Because
 that's what bugtrackers are made for ;) (And the issues still pop up when
 doing a gmail search)

 On Thu, Aug 20, 2015 at 10:16 PM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:

 Thanks for the info.

 Over the weeks I lost track which errors/failing/instable tests are know
 an which not. Should we start a wiki page or similar to collect know
 errors? If a test fails on a know error, it can just be ignored. This
 would avoid spam on the mailing list.

 Any thoughts about this?

 -Matthias

 On 08/20/2015 10:08 PM, Robert Metzger wrote:
 Sachin saw the error as well, as reported here:
 https://issues.apache.org/jira/browse/FLINK-2468
 I also see it from time to time.I have a wip branch where I relaxed the
 constraints for the test to pass a bit.

 On Thu, Aug 20, 2015 at 10:05 PM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:

 Error message is:

 Failed tests:



 StateCheckpoinedITCaseStreamFaultToleranceTestBase.runCheckpointedProgram:103-postSubmit:98
 Test inconclusive: failure occurred before first checkpoint

 See: https://travis-ci.org/mjsax/flink/jobs/76483093


 -Matthias






 





Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-21 Thread Robert Metzger
I agree that we should look for a solution other than opening a lot of
small discussion threads on the mailing list.

When I have a test failure, I usually search my Gmail inbox to see whether
somebody else has already written something about the error.
Creating a JIRA issue for each failing test might be a better approach, because
that's what bug trackers are made for ;) (And the issues still pop up when
doing a Gmail search.)

On Thu, Aug 20, 2015 at 10:16 PM, Matthias J. Sax 
mj...@informatik.hu-berlin.de wrote:

 Thanks for the info.

 Over the weeks I lost track which errors/failing/instable tests are know
 an which not. Should we start a wiki page or similar to collect know
 errors? If a test fails on a know error, it can just be ignored. This
 would avoid spam on the mailing list.

 Any thoughts about this?

 -Matthias

 On 08/20/2015 10:08 PM, Robert Metzger wrote:
  Sachin saw the error as well, as reported here:
  https://issues.apache.org/jira/browse/FLINK-2468
  I also see it from time to time.I have a wip branch where I relaxed the
  constraints for the test to pass a bit.
 
  On Thu, Aug 20, 2015 at 10:05 PM, Matthias J. Sax 
  mj...@informatik.hu-berlin.de wrote:
 
  Error message is:
 
  Failed tests:
 
 
 StateCheckpoinedITCaseStreamFaultToleranceTestBase.runCheckpointedProgram:103-postSubmit:98
  Test inconclusive: failure occurred before first checkpoint
 
  See: https://travis-ci.org/mjsax/flink/jobs/76483093
 
 
  -Matthias
 
 
 




Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-21 Thread Till Rohrmann
I'm also in favor of JIRA, because I fear that nobody will keep the wiki
page in sync. Maybe we can assign a special label for test stability to
these JIRA issues. Then we can quickly find all currently unstable test
cases.

On Fri, Aug 21, 2015 at 11:02 AM, Robert Metzger rmetz...@apache.org
wrote:

 I agree that we should look for a solution other than opening a lot of
 small discussion threads on the mailing list.

 When I have a test failure, I usually search my gmail inbox to see whether
 somebody else wrote something about the error already.
 Creating a JIRA for each failing test might be a better approach. Because
 that's what bugtrackers are made for ;) (And the issues still pop up when
 doing a gmail search)

 On Thu, Aug 20, 2015 at 10:16 PM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:

  Thanks for the info.
 
  Over the weeks I lost track which errors/failing/instable tests are know
  an which not. Should we start a wiki page or similar to collect know
  errors? If a test fails on a know error, it can just be ignored. This
  would avoid spam on the mailing list.
 
  Any thoughts about this?
 
  -Matthias
 
  On 08/20/2015 10:08 PM, Robert Metzger wrote:
   Sachin saw the error as well, as reported here:
   https://issues.apache.org/jira/browse/FLINK-2468
   I also see it from time to time.I have a wip branch where I relaxed the
   constraints for the test to pass a bit.
  
   On Thu, Aug 20, 2015 at 10:05 PM, Matthias J. Sax 
   mj...@informatik.hu-berlin.de wrote:
  
   Error message is:
  
   Failed tests:
  
  
 
 StateCheckpoinedITCaseStreamFaultToleranceTestBase.runCheckpointedProgram:103-postSubmit:98
   Test inconclusive: failure occurred before first checkpoint
  
   See: https://travis-ci.org/mjsax/flink/jobs/76483093
  
  
   -Matthias
  
  
  
 
 



[FAILING TEST] StateCheckpoinedITCase

2015-08-20 Thread Matthias J. Sax
Error message is:

 Failed tests:
 StateCheckpoinedITCase>StreamFaultToleranceTestBase.runCheckpointedProgram:103->postSubmit:98
  Test inconclusive: failure occurred before first checkpoint

See: https://travis-ci.org/mjsax/flink/jobs/76483093


-Matthias





Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-20 Thread Robert Metzger
Sachin saw the error as well, as reported here:
https://issues.apache.org/jira/browse/FLINK-2468
I also see it from time to time. I have a WIP branch where I relaxed the
test's constraints a bit so that it passes.

On Thu, Aug 20, 2015 at 10:05 PM, Matthias J. Sax 
mj...@informatik.hu-berlin.de wrote:

 Error message is:

  Failed tests:
 
 StateCheckpoinedITCaseStreamFaultToleranceTestBase.runCheckpointedProgram:103-postSubmit:98
 Test inconclusive: failure occurred before first checkpoint

 See: https://travis-ci.org/mjsax/flink/jobs/76483093


 -Matthias




Re: [FAILING TEST] StateCheckpoinedITCase

2015-08-20 Thread Matthias J. Sax
Thanks for the info.

Over the weeks I have lost track of which errors and failing or unstable tests
are known and which are not. Should we start a wiki page or similar to collect
known errors? If a test fails on a known error, it can just be ignored. This
would avoid spam on the mailing list.

Any thoughts about this?

-Matthias

On 08/20/2015 10:08 PM, Robert Metzger wrote:
 Sachin saw the error as well, as reported here:
 https://issues.apache.org/jira/browse/FLINK-2468
 I also see it from time to time.I have a wip branch where I relaxed the
 constraints for the test to pass a bit.
 
 On Thu, Aug 20, 2015 at 10:05 PM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:
 
 Error message is:

 Failed tests:

 StateCheckpoinedITCaseStreamFaultToleranceTestBase.runCheckpointedProgram:103-postSubmit:98
 Test inconclusive: failure occurred before first checkpoint

 See: https://travis-ci.org/mjsax/flink/jobs/76483093


 -Matthias


 





Re: [FAILING TEST] BlobLibraryCacheManagerTest

2015-08-16 Thread Stephan Ewen
Looks like a rare race between the cleanup (two changes) and the test
validating both changes.

I'll push a fix to make the test more reliable.

On Sun, Aug 16, 2015 at 11:04 PM, Matthias J. Sax 
mj...@informatik.hu-berlin.de wrote:

 Hi,

 I hit a failing test in flink-runtime. Not sure if it is known already:

  Failed tests:
  CheckpointCoordinatorTest.testCheckpointTimeoutIsolated:594 expected:0
 but was:1

 Please see: https://travis-ci.org/mjsax/flink/jobs/75847501


 -Matthias




[FAILING TEST] BlobLibraryCacheManagerTest

2015-08-16 Thread Matthias J. Sax
Hi,

I hit a failing test in flink-runtime. Not sure if it is known already:

 Failed tests:
 CheckpointCoordinatorTest.testCheckpointTimeoutIsolated:594 expected:<0> but was:<1>

Please see: https://travis-ci.org/mjsax/flink/jobs/75847501


-Matthias





Re: Failing test in Gelly

2015-08-10 Thread Stephan Ewen
This may be an issue with the embedded YARN mini cluster...

On Mon, Aug 10, 2015 at 8:37 PM, Stephan Ewen se...@apache.org wrote:

 I think the YARN problem is as before, but with a longer timeout.

 Before, when after 60 seconds the expected output did not come, the tests
 aborted.
 The timeout is now 180 seconds, which is probably so long that the
 deadlock detector (5 minutes no output) kicks in.

 In any case, there is something broken, because the YARN program does not
 properly finish.

 On Sun, Aug 9, 2015 at 9:49 PM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:

 Not sure about the yarn test... As yarn was instable all the time I just
 ignored it...

 -Matthias

 On 08/09/2015 09:38 PM, Ufuk Celebi wrote:
  PS what about the yarn test case... Is that one known (with that trace)?
 
  On Sunday, August 9, 2015, Ufuk Celebi u...@apache.org wrote:
 
  There is an issue for this from last week. Couldn't look into it last
  week, will do tomorrow. Thanks for the logs. :)
 
  On Sunday, August 9, 2015, Matthias J. Sax 
 mj...@informatik.hu-berlin.de
 wrote:
 
  Wrong link... sorry.
 
  https://travis-ci.org/mjsax/flink/jobs/74787655
 
 
 
  On 08/09/2015 04:02 PM, Maximilian Michels wrote:
  Hi Matthias,
 
  Is that the correct build URL? I can't spot any failing Gelly tests.
 The
  build appears to be stuck in the YARNSessionFIFOITCase.
 
  Cheers,
  Max
 
  On Sun, Aug 9, 2015 at 3:37 PM, Matthias J. Sax 
  mj...@informatik.hu-berlin.de wrote:
 
  Hi,
 
  I got a new failing test in this build (flink-gelly)
  https://travis-ci.org/mjsax/flink/jobs/74787658
 
  The branch is basically the current master, as I only fixed
  documentation stuff in this PR.
 
 
  -Matthias
 
 
 
 
 
 





Re: Failing test in Gelly

2015-08-10 Thread Stephan Ewen
I think the YARN problem is as before, but with a longer timeout.

Previously, the tests aborted when the expected output had not appeared after
60 seconds.
The timeout is now 180 seconds, which is probably long enough for the deadlock
detector (5 minutes without output) to kick in.

In any case, there is something broken, because the YARN program does not
properly finish.

On Sun, Aug 9, 2015 at 9:49 PM, Matthias J. Sax 
mj...@informatik.hu-berlin.de wrote:

 Not sure about the yarn test... As yarn was instable all the time I just
 ignored it...

 -Matthias

 On 08/09/2015 09:38 PM, Ufuk Celebi wrote:
  PS what about the yarn test case... Is that one known (with that trace)?
 
  On Sunday, August 9, 2015, Ufuk Celebi u...@apache.org wrote:
 
  There is an issue for this from last week. Couldn't look into it last
  week, will do tomorrow. Thanks for the logs. :)
 
  On Sunday, August 9, 2015, Matthias J. Sax 
 mj...@informatik.hu-berlin.de
  wrote:
 
  Wrong link... sorry.
 
  https://travis-ci.org/mjsax/flink/jobs/74787655
 
 
 
  On 08/09/2015 04:02 PM, Maximilian Michels wrote:
  Hi Matthias,
 
  Is that the correct build URL? I can't spot any failing Gelly tests.
 The
  build appears to be stuck in the YARNSessionFIFOITCase.
 
  Cheers,
  Max
 
  On Sun, Aug 9, 2015 at 3:37 PM, Matthias J. Sax 
  mj...@informatik.hu-berlin.de wrote:
 
  Hi,
 
  I got a new failing test in this build (flink-gelly)
  https://travis-ci.org/mjsax/flink/jobs/74787658
 
  The branch is basically the current master, as I only fixed
  documentation stuff in this PR.
 
 
  -Matthias
 
 
 
 
 
 




Failing test in Gelly

2015-08-09 Thread Matthias J. Sax
Hi,

I got a new failing test in this build (flink-gelly)
https://travis-ci.org/mjsax/flink/jobs/74787658

The branch is basically the current master, as I only fixed
documentation stuff in this PR.


-Matthias





Re: Failing Test again

2015-08-04 Thread Robert Metzger
I've assigned https://issues.apache.org/jira/browse/FLINK-1680 to myself.
Maybe Tachyon 0.7 will fix the issues.

On Tue, Aug 4, 2015 at 1:57 PM, Stephan Ewen se...@apache.org wrote:

 Yes.

 We should know, though, whether this is a Java 6 bug, or a bug in our
 system that just happens to occur only with Java 6 (because of different
 timings in this other engine)

 On Tue, Aug 4, 2015 at 12:27 PM, Chesnay Schepler 
 chesnay.schep...@fu-berlin.de wrote:

  Aren't we dropping java 6 support?
 
 
  On 04.08.2015 12:21, Stephan Ewen wrote:
 
  The StateCheckpointedITCase has not failed so far, which also test
 these
  guarantees thoroughly.
 
  But we need to first rule out the BarrierBuffer. The problem is that the
  bug occur only on Java 6 and cannot be reproduced locally...
 
  On Tue, Aug 4, 2015 at 12:14 PM, Gyula Fóra gyula.f...@gmail.com
 wrote:
 
  Honestly I don't think the partitioned state changes have anything to do
  with the stability, only the reworked test case, which now test proper
  exactly-once which was missing before.
 
  Stephan Ewen se...@apache.org ezt írta (időpont: 2015. aug. 4., K,
  12:12):
 
  Yes, the build stability is super serious right now.
 
  Here are the problems in question, and what we could do about this:
 
 
 
  BarrierBuffer:
  
  Barrier Buffer tests fail in Java 6 builds.
 
  I have not found a way to diagnose that problem, yet, but if we cannot
 
  find
 
  the issue today, I would be willing to revert my latest commits on the
  barrier buffer to increase the stability.
 
 
  StreamCheckpointingITCase
  ---
  This seems to have started with either the barrier buffer, or the
  updated
  partitioned state. If fixing/reverting the barrier buffer does not fix
 
  it,
 
  and no fix has come up
 
  until then, let's revert the latest changes to the partitioned state
 and
  re-add them when they are stable.
 
 
  Tachyon:
  -
  The Tachyon mini cluster has a problem, apparently, the programs exit
 
  with
 
  a sysexit or segfault.
 
  Since we have no Tachyon code ourselves, do we need this test as part
 of
  the nightly tests?
  Can we make this a manual test that we trigger on demand?
 
 
 
  Greetings,
  Stephan
 
 
 
 
  On Tue, Aug 4, 2015 at 11:41 AM, Aljoscha Krettek 
 aljos...@apache.org
  wrote:
 
  I've also seen this fail:
 
  https://travis-ci.org/apache/flink/jobs/74025862
 
  in SuccessAfterNetworkBuffersFailureITCase
 
  Build seems quite flaky recently.
 
  On Tue, 4 Aug 2015 at 10:27 Matthias J. Sax 
 
  mj...@informatik.hu-berlin.de
 
  wrote:
 
  Rebased on:
 
 
 
 
 
 https://github.com/mjsax/flink/commit/fab61a1954ff1554448e826e1d273689ed520fc3
 
  But if the gap between two rebases is large, it's hard to say what
 
  the
 
  problem might be...
 
  The old parent commit (ie, rebase before last rebase) was
 
 
 
 
 https://github.com/mjsax/flink/commit/148395bcd81a93bcb1473e4e93f267edb3b71c7e
 
  -Matthias
 
  On 08/04/2015 08:57 AM, Aljoscha Krettek wrote:
 
  What are the commits that you rebased on? Could you maybe narrow
 
  down
 
  what
 
  caused the regression?
 
  On Mon, 3 Aug 2015 at 23:31 Matthias J. Sax 
 
  mj...@informatik.hu-berlin.de
 
  wrote:
 
  I only report failing tests after a rebase. ;)
 
  -Matthias
 
  On 08/03/2015 11:23 PM, Henry Saputra wrote:
 
  Thanks for reporting it , Matthias. Will try to run Travis for
 
  latest
 
  Flink.
 
  Tachyon test is a bit flaky. Maybe updating to latest release
 
  could
 
  help.
 
  - Henry
 
  On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax
  mj...@informatik.hu-berlin.de wrote:
 
  Today, not a single built was successful completely. Please see
 
  here:
 
  Flink Streaming Core:
  https://travis-ci.org/mjsax/flink/jobs/73938109
  https://travis-ci.org/mjsax/flink/jobs/73951362
  https://travis-ci.org/apache/flink/jobs/73938124
  https://travis-ci.org/apache/flink/jobs/73899795
  https://travis-ci.org/apache/flink/jobs/73938122
  https://travis-ci.org/apache/flink/jobs/73952441
 
  Flink Taychon:
  https://travis-ci.org/apache/flink/jobs/73938123
 
 
  -Matthias
 
 
 
 
 



Re: Failing Test again

2015-08-04 Thread Aljoscha Krettek
I've also seen the BufferSpillerTest fail:
https://travis-ci.org/apache/flink/jobs/74057503


On Tue, 4 Aug 2015 at 14:10 Robert Metzger rmetz...@apache.org wrote:

 I've assigned https://issues.apache.org/jira/browse/FLINK-1680 to myself.
 Maybe Tachyon 0.7 will fix the issues.

 On Tue, Aug 4, 2015 at 1:57 PM, Stephan Ewen se...@apache.org wrote:

  Yes.
 
  We should know, though, whether this is a Java 6 bug, or a bug in our
  system that just happens to occur only with Java 6 (because of different
  timings in this other engine)
 
  On Tue, Aug 4, 2015 at 12:27 PM, Chesnay Schepler 
  chesnay.schep...@fu-berlin.de wrote:
 
   Aren't we dropping java 6 support?
  
  
   On 04.08.2015 12:21, Stephan Ewen wrote:
  
   The StateCheckpointedITCase has not failed so far, which also test
  these
   guarantees thoroughly.
  
   But we need to first rule out the BarrierBuffer. The problem is that
 the
   bug occur only on Java 6 and cannot be reproduced locally...
  
   On Tue, Aug 4, 2015 at 12:14 PM, Gyula Fóra gyula.f...@gmail.com
  wrote:
  
   Honestly I don't think the partitioned state changes have anything to
 do
   with the stability, only the reworked test case, which now test
 proper
   exactly-once which was missing before.
  
   Stephan Ewen se...@apache.org ezt írta (időpont: 2015. aug. 4., K,
   12:12):
  
   Yes, the build stability is super serious right now.
  
   Here are the problems in question, and what we could do about this:
  
  
  
   BarrierBuffer:
   
   Barrier Buffer tests fail in Java 6 builds.
  
   I have not found a way to diagnose that problem, yet, but if we
 cannot
  
   find
  
   the issue today, I would be willing to revert my latest commits on
 the
   barrier buffer to increase the stability.
  
  
   StreamCheckpointingITCase
   ---
   This seems to have started with either the barrier buffer, or the
   updated
   partitioned state. If fixing/reverting the barrier buffer does not
 fix
  
   it,
  
   and no fix has come up
  
   until then, let's revert the latest changes to the partitioned state
  and
   re-add them when they are stable.
  
  
   Tachyon:
   -
   The Tachyon mini cluster has a problem, apparently, the programs
 exit
  
   with
  
   a sysexit or segfault.
  
   Since we have no Tachyon code ourselves, do we need this test as
 part
  of
   the nightly tests?
   Can we make this a manual test that we trigger on demand?
  
  
  
   Greetings,
   Stephan
  
  
  
  
   On Tue, Aug 4, 2015 at 11:41 AM, Aljoscha Krettek 
  aljos...@apache.org
   wrote:
  
   I've also seen this fail:
  
   https://travis-ci.org/apache/flink/jobs/74025862
  
   in SuccessAfterNetworkBuffersFailureITCase
  
   Build seems quite flaky recently.
  
   On Tue, 4 Aug 2015 at 10:27 Matthias J. Sax 
  
   mj...@informatik.hu-berlin.de
  
   wrote:
  
   Rebased on:
  
  
  
  
  
 
 https://github.com/mjsax/flink/commit/fab61a1954ff1554448e826e1d273689ed520fc3
  
   But if the gap between two rebases is large, it's hard to say what
  
   the
  
   problem might be...
  
   The old parent commit (ie, rebase before last rebase) was
  
  
  
  
 
 https://github.com/mjsax/flink/commit/148395bcd81a93bcb1473e4e93f267edb3b71c7e
  
   -Matthias
  
   On 08/04/2015 08:57 AM, Aljoscha Krettek wrote:
  
   What are the commits that you rebased on? Could you maybe narrow
  
   down
  
   what
  
   caused the regression?
  
   On Mon, 3 Aug 2015 at 23:31 Matthias J. Sax 
  
   mj...@informatik.hu-berlin.de
  
   wrote:
  
   I only report failing tests after a rebase. ;)
  
   -Matthias
  
   On 08/03/2015 11:23 PM, Henry Saputra wrote:
  
   Thanks for reporting it , Matthias. Will try to run Travis for
  
   latest
  
   Flink.
  
   Tachyon test is a bit flaky. Maybe updating to latest release
  
   could
  
   help.
  
   - Henry
  
   On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax
   mj...@informatik.hu-berlin.de wrote:
  
   Today, not a single built was successful completely. Please
 see
  
   here:
  
   Flink Streaming Core:
   https://travis-ci.org/mjsax/flink/jobs/73938109
   https://travis-ci.org/mjsax/flink/jobs/73951362
   https://travis-ci.org/apache/flink/jobs/73938124
   https://travis-ci.org/apache/flink/jobs/73899795
   https://travis-ci.org/apache/flink/jobs/73938122
   https://travis-ci.org/apache/flink/jobs/73952441
  
   Flink Taychon:
   https://travis-ci.org/apache/flink/jobs/73938123
  
  
   -Matthias
  
  
  
  
  
 



Re: Failing Test again

2015-08-04 Thread Matthias J. Sax
Rebased on:

https://github.com/mjsax/flink/commit/fab61a1954ff1554448e826e1d273689ed520fc3

But if the gap between two rebases is large, it's hard to say what the
problem might be...

The old parent commit (i.e., the rebase before the last rebase) was
https://github.com/mjsax/flink/commit/148395bcd81a93bcb1473e4e93f267edb3b71c7e

-Matthias

On 08/04/2015 08:57 AM, Aljoscha Krettek wrote:
 What are the commits that you rebased on? Could you maybe narrow down what
 caused the regression?
 
 On Mon, 3 Aug 2015 at 23:31 Matthias J. Sax mj...@informatik.hu-berlin.de
 wrote:
 
 I only report failing tests after a rebase. ;)

 -Matthias

 On 08/03/2015 11:23 PM, Henry Saputra wrote:
 Thanks for reporting it , Matthias. Will try to run Travis for latest
 Flink.

 Tachyon test is a bit flaky. Maybe updating to latest release could help.

 - Henry

 On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax
 mj...@informatik.hu-berlin.de wrote:
 Today, not a single built was successful completely. Please see here:

 Flink Streaming Core:
 https://travis-ci.org/mjsax/flink/jobs/73938109
 https://travis-ci.org/mjsax/flink/jobs/73951362
 https://travis-ci.org/apache/flink/jobs/73938124
 https://travis-ci.org/apache/flink/jobs/73899795
 https://travis-ci.org/apache/flink/jobs/73938122
 https://travis-ci.org/apache/flink/jobs/73952441

 Flink Taychon:
 https://travis-ci.org/apache/flink/jobs/73938123


 -Matthias



 





Re: Failing Test again

2015-08-04 Thread Stephan Ewen
Yes.

We should know, though, whether this is a Java 6 bug, or a bug in our
system that just happens to occur only with Java 6 (because of different
timings in this other engine)

On Tue, Aug 4, 2015 at 12:27 PM, Chesnay Schepler 
chesnay.schep...@fu-berlin.de wrote:

 Aren't we dropping java 6 support?


 On 04.08.2015 12:21, Stephan Ewen wrote:

 The StateCheckpointedITCase has not failed so far, which also test these
 guarantees thoroughly.

 But we need to first rule out the BarrierBuffer. The problem is that the
 bug occur only on Java 6 and cannot be reproduced locally...

 On Tue, Aug 4, 2015 at 12:14 PM, Gyula Fóra gyula.f...@gmail.com wrote:

 Honestly I don't think the partitioned state changes have anything to do
 with the stability, only the reworked test case, which now test proper
 exactly-once which was missing before.

 Stephan Ewen se...@apache.org ezt írta (időpont: 2015. aug. 4., K,
 12:12):

 Yes, the build stability is super serious right now.

 Here are the problems in question, and what we could do about this:



 BarrierBuffer:
 
 Barrier Buffer tests fail in Java 6 builds.

 I have not found a way to diagnose that problem, yet, but if we cannot

 find

 the issue today, I would be willing to revert my latest commits on the
 barrier buffer to increase the stability.


 StreamCheckpointingITCase
 ---
 This seems to have started with either the barrier buffer, or the
 updated
 partitioned state. If fixing/reverting the barrier buffer does not fix

 it,

 and no fix has come up

 until then, let's revert the latest changes to the partitioned state and
 re-add them when they are stable.


 Tachyon:
 -
 The Tachyon mini cluster has a problem, apparently, the programs exit

 with

 a sysexit or segfault.

 Since we have no Tachyon code ourselves, do we need this test as part of
 the nightly tests?
 Can we make this a manual test that we trigger on demand?



 Greetings,
 Stephan




 On Tue, Aug 4, 2015 at 11:41 AM, Aljoscha Krettek aljos...@apache.org
 wrote:

 I've also seen this fail:

 https://travis-ci.org/apache/flink/jobs/74025862

 in SuccessAfterNetworkBuffersFailureITCase

 Build seems quite flaky recently.

 On Tue, 4 Aug 2015 at 10:27 Matthias J. Sax 

 mj...@informatik.hu-berlin.de

 wrote:

 Rebased on:




 https://github.com/mjsax/flink/commit/fab61a1954ff1554448e826e1d273689ed520fc3

 But if the gap between two rebases is large, it's hard to say what

 the

 problem might be...

 The old parent commit (ie, rebase before last rebase) was



 https://github.com/mjsax/flink/commit/148395bcd81a93bcb1473e4e93f267edb3b71c7e

 -Matthias

 On 08/04/2015 08:57 AM, Aljoscha Krettek wrote:

 What are the commits that you rebased on? Could you maybe narrow

 down

 what

 caused the regression?

 On Mon, 3 Aug 2015 at 23:31 Matthias J. Sax 

 mj...@informatik.hu-berlin.de

 wrote:

 I only report failing tests after a rebase. ;)

 -Matthias

 On 08/03/2015 11:23 PM, Henry Saputra wrote:

 Thanks for reporting it , Matthias. Will try to run Travis for

 latest

 Flink.

 Tachyon test is a bit flaky. Maybe updating to latest release

 could

 help.

 - Henry

 On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax
 mj...@informatik.hu-berlin.de wrote:

 Today, not a single built was successful completely. Please see

 here:

 Flink Streaming Core:
 https://travis-ci.org/mjsax/flink/jobs/73938109
 https://travis-ci.org/mjsax/flink/jobs/73951362
 https://travis-ci.org/apache/flink/jobs/73938124
 https://travis-ci.org/apache/flink/jobs/73899795
 https://travis-ci.org/apache/flink/jobs/73938122
 https://travis-ci.org/apache/flink/jobs/73952441

 Flink Taychon:
 https://travis-ci.org/apache/flink/jobs/73938123


 -Matthias







Re: Failing Test again

2015-08-04 Thread Stephan Ewen
Yes, the build stability is super serious right now.

Here are the problems in question, and what we could do about this:



BarrierBuffer:
--------------
Barrier Buffer tests fail in Java 6 builds.

I have not found a way to diagnose that problem, yet, but if we cannot find
the issue today, I would be willing to revert my latest commits on the
barrier buffer to increase the stability.


StreamCheckpointingITCase
-------------------------
This seems to have started with either the barrier buffer, or the updated
partitioned state. If fixing/reverting the barrier buffer does not fix it,
and no fix has come up until then, let's revert the latest changes to the
partitioned state and re-add them when they are stable.


Tachyon:
--------
The Tachyon mini cluster apparently has a problem: the programs exit with
a sysexit or segfault.

Since we have no Tachyon code ourselves, do we need this test as part of
the nightly tests?
Can we make this a manual test that we trigger on demand?



Greetings,
Stephan




On Tue, Aug 4, 2015 at 11:41 AM, Aljoscha Krettek aljos...@apache.org
wrote:

 I've also seen this fail: https://travis-ci.org/apache/flink/jobs/74025862

 in SuccessAfterNetworkBuffersFailureITCase

 Build seems quite flaky recently.

 On Tue, 4 Aug 2015 at 10:27 Matthias J. Sax mj...@informatik.hu-berlin.de
 
 wrote:

  Rebased on:
 
 
 
 https://github.com/mjsax/flink/commit/fab61a1954ff1554448e826e1d273689ed520fc3
 
  But if the gap between two rebases is large, it's hard to say what the
  problem might be...
 
  The old parent commit (ie, rebase before last rebase) was
 
 
 https://github.com/mjsax/flink/commit/148395bcd81a93bcb1473e4e93f267edb3b71c7e
 
  -Matthias
 
  On 08/04/2015 08:57 AM, Aljoscha Krettek wrote:
   What are the commits that you rebased on? Could you maybe narrow down
  what
   caused the regression?
  
   On Mon, 3 Aug 2015 at 23:31 Matthias J. Sax 
  mj...@informatik.hu-berlin.de
   wrote:
  
   I only report failing tests after a rebase. ;)
  
   -Matthias
  
   On 08/03/2015 11:23 PM, Henry Saputra wrote:
   Thanks for reporting it , Matthias. Will try to run Travis for latest
   Flink.
  
   Tachyon test is a bit flaky. Maybe updating to latest release could
  help.
  
   - Henry
  
   On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax
   mj...@informatik.hu-berlin.de wrote:
   Today, not a single built was successful completely. Please see
 here:
  
   Flink Streaming Core:
   https://travis-ci.org/mjsax/flink/jobs/73938109
   https://travis-ci.org/mjsax/flink/jobs/73951362
   https://travis-ci.org/apache/flink/jobs/73938124
   https://travis-ci.org/apache/flink/jobs/73899795
   https://travis-ci.org/apache/flink/jobs/73938122
   https://travis-ci.org/apache/flink/jobs/73952441
  
   Flink Taychon:
   https://travis-ci.org/apache/flink/jobs/73938123
  
  
   -Matthias
  
  
  
  
 
 



Re: Failing Test again

2015-08-04 Thread Chesnay Schepler

Aren't we dropping Java 6 support?

On 04.08.2015 12:21, Stephan Ewen wrote:

The StateCheckpointedITCase has not failed so far, which also test these
guarantees thoroughly.

But we need to first rule out the BarrierBuffer. The problem is that the
bug occur only on Java 6 and cannot be reproduced locally...

On Tue, Aug 4, 2015 at 12:14 PM, Gyula Fóra gyula.f...@gmail.com wrote:


Honestly I don't think the partitioned state changes have anything to do
with the stability, only the reworked test case, which now test proper
exactly-once which was missing before.

Stephan Ewen se...@apache.org ezt írta (időpont: 2015. aug. 4., K,
12:12):


Yes, the build stability is super serious right now.

Here are the problems in question, and what we could do about this:



BarrierBuffer:
---
Barrier Buffer tests fail in Java 6 builds.

I have not found a way to diagnose that problem yet, but if we cannot find
the issue today, I would be willing to revert my latest commits on the
barrier buffer to increase the stability.


StreamCheckpointingITCase:
---
This seems to have started with either the barrier buffer or the updated
partitioned state. If fixing/reverting the barrier buffer does not fix it,
and no fix has come up until then, let's revert the latest changes to the
partitioned state and re-add them when they are stable.


Tachyon:
---
The Tachyon mini cluster apparently has a problem: the programs exit with
a sysexit or segfault.

Since we have no Tachyon code ourselves, do we need this test as part of
the nightly tests?
Can we make this a manual test that we trigger on demand?



Greetings,
Stephan




On Tue, Aug 4, 2015 at 11:41 AM, Aljoscha Krettek aljos...@apache.org
wrote:


I've also seen this fail:

https://travis-ci.org/apache/flink/jobs/74025862

in SuccessAfterNetworkBuffersFailureITCase

Build seems quite flaky recently.

On Tue, 4 Aug 2015 at 10:27 Matthias J. Sax mj...@informatik.hu-berlin.de
wrote:


Rebased on:




https://github.com/mjsax/flink/commit/fab61a1954ff1554448e826e1d273689ed520fc3

But if the gap between two rebases is large, it's hard to say what the
problem might be...

The old parent commit (i.e., the rebase before the last rebase) was



https://github.com/mjsax/flink/commit/148395bcd81a93bcb1473e4e93f267edb3b71c7e

-Matthias

On 08/04/2015 08:57 AM, Aljoscha Krettek wrote:

What are the commits that you rebased on? Could you maybe narrow down what
caused the regression?

On Mon, 3 Aug 2015 at 23:31 Matthias J. Sax mj...@informatik.hu-berlin.de
wrote:


I only report failing tests after a rebase. ;)

-Matthias

On 08/03/2015 11:23 PM, Henry Saputra wrote:

Thanks for reporting it, Matthias. Will try to run Travis for the latest
Flink.

The Tachyon test is a bit flaky. Maybe updating to the latest release could
help.

- Henry

On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax
mj...@informatik.hu-berlin.de wrote:

Today, not a single build completed successfully. Please see here:

Flink Streaming Core:
https://travis-ci.org/mjsax/flink/jobs/73938109
https://travis-ci.org/mjsax/flink/jobs/73951362
https://travis-ci.org/apache/flink/jobs/73938124
https://travis-ci.org/apache/flink/jobs/73899795
https://travis-ci.org/apache/flink/jobs/73938122
https://travis-ci.org/apache/flink/jobs/73952441

Flink Tachyon:
https://travis-ci.org/apache/flink/jobs/73938123


-Matthias









Re: Failing Test again

2015-08-04 Thread Gyula Fóra
Honestly, I don't think the partitioned state changes have anything to do
with the stability, only the reworked test case, which now tests proper
exactly-once behavior, which was missing before.

Stephan Ewen se...@apache.org wrote (on Tue, Aug 4, 2015, 12:12):

 Yes, the build stability is super serious right now.

 Here are the problems in question, and what we could do about this:



 BarrierBuffer:
 ---
 Barrier Buffer tests fail in Java 6 builds.

 I have not found a way to diagnose that problem yet, but if we cannot find
 the issue today, I would be willing to revert my latest commits on the
 barrier buffer to increase the stability.


 StreamCheckpointingITCase:
 ---
 This seems to have started with either the barrier buffer or the updated
 partitioned state. If fixing/reverting the barrier buffer does not fix it,
 and no fix has come up until then, let's revert the latest changes to the
 partitioned state and re-add them when they are stable.


 Tachyon:
 ---
 The Tachyon mini cluster apparently has a problem: the programs exit with
 a sysexit or segfault.

 Since we have no Tachyon code ourselves, do we need this test as part of
 the nightly tests?
 Can we make this a manual test that we trigger on demand?



 Greetings,
 Stephan




 On Tue, Aug 4, 2015 at 11:41 AM, Aljoscha Krettek aljos...@apache.org
 wrote:

  I've also seen this fail:
 https://travis-ci.org/apache/flink/jobs/74025862
 
  in SuccessAfterNetworkBuffersFailureITCase
 
  Build seems quite flaky recently.
 
  On Tue, 4 Aug 2015 at 10:27 Matthias J. Sax 
 mj...@informatik.hu-berlin.de
  
  wrote:
 
   Rebased on:
  
  
  
 
 https://github.com/mjsax/flink/commit/fab61a1954ff1554448e826e1d273689ed520fc3
  
   But if the gap between two rebases is large, it's hard to say what the
   problem might be...
  
   The old parent commit (ie, rebase before last rebase) was
  
  
 
 https://github.com/mjsax/flink/commit/148395bcd81a93bcb1473e4e93f267edb3b71c7e
  
   -Matthias
  
   On 08/04/2015 08:57 AM, Aljoscha Krettek wrote:
What are the commits that you rebased on? Could you maybe narrow down
   what
caused the regression?
   
On Mon, 3 Aug 2015 at 23:31 Matthias J. Sax 
   mj...@informatik.hu-berlin.de
wrote:
   
I only report failing tests after a rebase. ;)
   
-Matthias
   
On 08/03/2015 11:23 PM, Henry Saputra wrote:
Thanks for reporting it , Matthias. Will try to run Travis for
 latest
Flink.
   
Tachyon test is a bit flaky. Maybe updating to latest release could
   help.
   
- Henry
   
On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax
mj...@informatik.hu-berlin.de wrote:
Today, not a single built was successful completely. Please see
  here:
   
Flink Streaming Core:
https://travis-ci.org/mjsax/flink/jobs/73938109
https://travis-ci.org/mjsax/flink/jobs/73951362
https://travis-ci.org/apache/flink/jobs/73938124
https://travis-ci.org/apache/flink/jobs/73899795
https://travis-ci.org/apache/flink/jobs/73938122
https://travis-ci.org/apache/flink/jobs/73952441
   
Flink Taychon:
https://travis-ci.org/apache/flink/jobs/73938123
   
   
-Matthias
   
   
   
   
  
  
 



Re: Failing Test again

2015-08-04 Thread Aljoscha Krettek
I've also seen this fail: https://travis-ci.org/apache/flink/jobs/74025862

in SuccessAfterNetworkBuffersFailureITCase

Build seems quite flaky recently.

On Tue, 4 Aug 2015 at 10:27 Matthias J. Sax mj...@informatik.hu-berlin.de
wrote:

 Rebased on:


 https://github.com/mjsax/flink/commit/fab61a1954ff1554448e826e1d273689ed520fc3

 But if the gap between two rebases is large, it's hard to say what the
 problem might be...

 The old parent commit (ie, rebase before last rebase) was

 https://github.com/mjsax/flink/commit/148395bcd81a93bcb1473e4e93f267edb3b71c7e

 -Matthias

 On 08/04/2015 08:57 AM, Aljoscha Krettek wrote:
  What are the commits that you rebased on? Could you maybe narrow down
 what
  caused the regression?
 
  On Mon, 3 Aug 2015 at 23:31 Matthias J. Sax 
 mj...@informatik.hu-berlin.de
  wrote:
 
  I only report failing tests after a rebase. ;)
 
  -Matthias
 
  On 08/03/2015 11:23 PM, Henry Saputra wrote:
  Thanks for reporting it , Matthias. Will try to run Travis for latest
  Flink.
 
  Tachyon test is a bit flaky. Maybe updating to latest release could
 help.
 
  - Henry
 
  On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax
  mj...@informatik.hu-berlin.de wrote:
  Today, not a single built was successful completely. Please see here:
 
  Flink Streaming Core:
  https://travis-ci.org/mjsax/flink/jobs/73938109
  https://travis-ci.org/mjsax/flink/jobs/73951362
  https://travis-ci.org/apache/flink/jobs/73938124
  https://travis-ci.org/apache/flink/jobs/73899795
  https://travis-ci.org/apache/flink/jobs/73938122
  https://travis-ci.org/apache/flink/jobs/73952441
 
  Flink Taychon:
  https://travis-ci.org/apache/flink/jobs/73938123
 
 
  -Matthias
 
 
 
 




Failing Test

2015-08-03 Thread Matthias J. Sax
Hi,

I just hit a failing test
(https://travis-ci.org/apache/flink/jobs/73899795). Is it known or new?

 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 86.929 sec <<< FAILURE! - in org.apache.flink.test.checkpointing.StreamCheckpointingITCase
 runCheckpointedProgram(org.apache.flink.test.checkpointing.StreamCheckpointingITCase)  Time elapsed: 77.945 sec  <<< FAILURE!
 java.lang.AssertionError: expected:<25> but was:<0>
     at org.junit.Assert.fail(Assert.java:88)
     at org.junit.Assert.failNotEquals(Assert.java:743)
     at org.junit.Assert.assertEquals(Assert.java:118)
     at org.junit.Assert.assertEquals(Assert.java:144)
     at org.apache.flink.test.checkpointing.StreamCheckpointingITCase.runCheckpointedProgram(StreamCheckpointingITCase.java:164)



-Matthias



signature.asc
Description: OpenPGP digital signature


Failing Test again

2015-08-03 Thread Matthias J. Sax
Today, not a single build completed successfully. Please see here:

Flink Streaming Core:
https://travis-ci.org/mjsax/flink/jobs/73938109
https://travis-ci.org/mjsax/flink/jobs/73951362
https://travis-ci.org/apache/flink/jobs/73938124
https://travis-ci.org/apache/flink/jobs/73899795
https://travis-ci.org/apache/flink/jobs/73938122
https://travis-ci.org/apache/flink/jobs/73952441

Flink Tachyon:
https://travis-ci.org/apache/flink/jobs/73938123


-Matthias





Re: Failing Test again

2015-08-03 Thread Henry Saputra
Thanks for reporting it, Matthias. Will try to run Travis for the latest Flink.

The Tachyon test is a bit flaky. Maybe updating to the latest release could help.

- Henry

On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax
mj...@informatik.hu-berlin.de wrote:
 Today, not a single built was successful completely. Please see here:

 Flink Streaming Core:
 https://travis-ci.org/mjsax/flink/jobs/73938109
 https://travis-ci.org/mjsax/flink/jobs/73951362
 https://travis-ci.org/apache/flink/jobs/73938124
 https://travis-ci.org/apache/flink/jobs/73899795
 https://travis-ci.org/apache/flink/jobs/73938122
 https://travis-ci.org/apache/flink/jobs/73952441

 Flink Taychon:
 https://travis-ci.org/apache/flink/jobs/73938123


 -Matthias



Re: Failing Test again

2015-08-03 Thread Matthias J. Sax
I only report failing tests after a rebase. ;)

-Matthias

On 08/03/2015 11:23 PM, Henry Saputra wrote:
 Thanks for reporting it , Matthias. Will try to run Travis for latest Flink.
 
 Tachyon test is a bit flaky. Maybe updating to latest release could help.
 
 - Henry
 
 On Mon, Aug 3, 2015 at 2:18 PM, Matthias J. Sax
 mj...@informatik.hu-berlin.de wrote:
 Today, not a single built was successful completely. Please see here:

 Flink Streaming Core:
 https://travis-ci.org/mjsax/flink/jobs/73938109
 https://travis-ci.org/mjsax/flink/jobs/73951362
 https://travis-ci.org/apache/flink/jobs/73938124
 https://travis-ci.org/apache/flink/jobs/73899795
 https://travis-ci.org/apache/flink/jobs/73938122
 https://travis-ci.org/apache/flink/jobs/73952441

 Flink Taychon:
 https://travis-ci.org/apache/flink/jobs/73938123


 -Matthias






Re: Failing Test

2015-08-03 Thread Stephan Ewen
Seen this a few times as well.

Maybe it's something with the latest partitioned state changes...

On Mon, Aug 3, 2015 at 5:48 PM, Matthias J. Sax 
mj...@informatik.hu-berlin.de wrote:

 Hi,

 I just hit a failing test
 (https://travis-ci.org/apache/flink/jobs/73899795). It is know or new?

  Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 86.929
 sec  FAILURE! - in
 org.apache.flink.test.checkpointing.StreamCheckpointingITCase
 
 runCheckpointedProgram(org.apache.flink.test.checkpointing.StreamCheckpointingITCase)
 Time elapsed: 77.945 sec  FAILURE!
  java.lang.AssertionError: expected:25 but was:0
  at org.junit.Assert.fail(Assert.java:88)
  at org.junit.Assert.failNotEquals(Assert.java:743)
  at org.junit.Assert.assertEquals(Assert.java:118)
  at org.junit.Assert.assertEquals(Assert.java:144)
  at
 org.apache.flink.test.checkpointing.StreamCheckpointingITCase.runCheckpointedProgram(StreamCheckpointingITCase.java:164)



 -Matthias




Re: Failing Test

2015-07-17 Thread Maximilian Michels
Thanks, Matthias, for keeping an eye on the issue.

Thank you Till for the problem formulation and the suggested steps for
solving the synchronization problem. I will look into this as soon as
possible.

Cheers,
Max

On Fri, Jul 17, 2015 at 11:18 AM, Matthias J. Sax 
mj...@informatik.hu-berlin.de wrote:

 I will open an JIRA for this. It's getting complicated.

 On 07/17/2015 11:04 AM, Till Rohrmann wrote:
  I think the problem might be related to the way the test is constructed.
  The test submits a job to the JM and then tries to poll the accumulators
  from the JM. If it does not succeed, then the polling is retried with an
  decreasing pause in between. Furthermore, the task which updates the
  accumulators also sleeps for the same period until it reads the next
  element and updates the accumulators.
 
  Since the test does not use an explicit synchronization but instead
 relies
  on sleeps, it will most likely exhibit a flakey behaviour. Sleeps don't
  work reliable enough, especially on Travis, to guarantee a certain thread
  interleaving. I'd recommend introducing explicit synchronization
 mechanism
  which control the behaviour of the accumulator producing task and
 explicit
  testing messages which indicate that a new accumulator value has arrived
 at
  the JM.
 
  Cheers,
  Till
 
  On Thu, Jul 16, 2015 at 11:04 PM, Matthias J. Sax 
  mj...@informatik.hu-berlin.de wrote:
 
  Hi,
 
  the test still fails. This time in both runs (Flink Travis and my own
  Travis) -- only for Java 8 again:
 
  https://travis-ci.org/apache/flink/jobs/71314132
  https://travis-ci.org/mjsax/flink/jobs/71179608
 
  -Matthias
 
 
  On 07/16/2015 02:28 PM, Matthias J. Sax wrote:
  Great! I will. As 4 of 5 runs succeeded I cannot test explicitly. Will
  have an eye on it in future runs.
 
  -Matthias
 
 
  On 07/16/2015 02:24 PM, Maximilian Michels wrote:
  Hi Matthias,
 
  I've pushed a fix to the master. The problem should be solved. Please
  tell
  me if your Travis reports an error again. My Travis never complained
 :)
 
  Cheers,
  Max
 
  On Thu, Jul 16, 2015 at 12:00 PM, Maximilian Michels m...@apache.org
  wrote:
 
  Hi Matthias,
 
  This is indeed a timing issue when checking for the results in this
  test.
  The new accumulator implementation now continuously reports from the
  running tasks to the job manager. This was merged yesterday.
 
  The assertion that fails there is a bit strict. Actually, I've
 already
  integrated a retry mechanism that fails only if the assertions don't
  hold
  for a configured number of times.
 
  I'll commit a fix to the master. Thanks for reporting!
 
  Cheers,
  Max
 
  On Thu, Jul 16, 2015 at 11:33 AM, Ufuk Celebi u...@apache.org
 wrote:
 
  Hey,
 
  this has been merged yesterday. I guess it's a timing issue when
  verifying the results. Can you file an issue for this?
 
  – Ufuk
 
  On 16 Jul 2015, at 11:30, Matthias J. Sax 
  mj...@informatik.hu-berlin.de
  wrote:
 
  Hi,
 
  I hit another failing test (that is new to me):
 
  Results :
  Failed tests:
 
 
 
 AccumulatorLiveITCase.testProgram:106-access$1100:68-checkFlinkAccumulators:189
  null
 
 
  Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
  8.694
  sec  FAILURE! - in
  org.apache.flink.test.accumulators.AccumulatorLiveITCase
 
  testProgram(org.apache.flink.test.accumulators.AccumulatorLiveITCase)
  Time elapsed: 8.021 sec  FAILURE!
  java.lang.AssertionError: null
  at org.junit.Assert.fail(Assert.java:86)
  at org.junit.Assert.assertTrue(Assert.java:41)
  at org.junit.Assert.assertTrue(Assert.java:52)
  at
 
 
 org.apache.flink.test.accumulators.AccumulatorLiveITCase.checkFlinkAccumulators(AccumulatorLiveITCase.java:189)
  at
 
 
 org.apache.flink.test.accumulators.AccumulatorLiveITCase.access$1100(AccumulatorLiveITCase.java:68)
 
  Please see: https://travis-ci.org/mjsax/flink/jobs/71179608
 
  Does anyone know anything about it?
 
  BTW: Even if this test is in flink-tests, the problem seems not to
 be
  related to https://issues.apache.org/jira/browse/FLINK-2032
 because
  accumulators are tested. There are not result files involved (as
 fas
  as
  I can tell).
 
 
 
  -Matthias
 
 
 
 
 
 
 
 
 




Re: Failing Test

2015-07-17 Thread Till Rohrmann
I think the problem might be related to the way the test is constructed.
The test submits a job to the JM and then tries to poll the accumulators
from the JM. If it does not succeed, then the polling is retried with a
decreasing pause in between. Furthermore, the task which updates the
accumulators also sleeps for the same period until it reads the next
element and updates the accumulators.

Since the test does not use explicit synchronization but instead relies
on sleeps, it will most likely exhibit flaky behaviour. Sleeps don't
work reliably enough, especially on Travis, to guarantee a certain thread
interleaving. I'd recommend introducing an explicit synchronization mechanism
which controls the behaviour of the accumulator-producing task, and explicit
testing messages which indicate that a new accumulator value has arrived at
the JM.
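
The explicit synchronization recommended above could be sketched roughly as follows. This is only an illustration and not Flink code: the class LatchSyncSketch, the method runOnce, and the AtomicInteger standing in for an accumulator are all hypothetical, and java.util.concurrent.CountDownLatch is used in place of the testing messages mentioned above. The producer signals that it has updated the value and then waits until the test has checked it, so no sleep is needed to force the interleaving.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class LatchSyncSketch {

    /** Runs one producer/verifier handshake and returns the value the verifier observed. */
    static int runOnce() {
        final AtomicInteger accumulator = new AtomicInteger(0);  // hypothetical stand-in for an accumulator
        final CountDownLatch updated = new CountDownLatch(1);    // "a new value has arrived"
        final CountDownLatch verified = new CountDownLatch(1);   // "the test has checked the value"

        Thread producer = new Thread(() -> {
            accumulator.incrementAndGet();
            updated.countDown();                       // tell the verifier the update is visible
            try {
                verified.await(10, TimeUnit.SECONDS);  // do not continue until the check is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        try {
            // Block until the producer has signalled, instead of sleeping and hoping.
            if (!updated.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("producer never reported an update");
            }
            int observed = accumulator.get();
            verified.countDown();                      // release the producer
            producer.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("observed = " + runOnce()); // prints "observed = 1"
    }
}
```

The timeouts on await only bound how long a broken run can hang; they do not decide the outcome of a working run, which is the difference from a sleep-based test.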

Cheers,
Till

On Thu, Jul 16, 2015 at 11:04 PM, Matthias J. Sax 
mj...@informatik.hu-berlin.de wrote:

 Hi,

 the test still fails. This time in both runs (Flink Travis and my own
 Travis) -- only for Java 8 again:

 https://travis-ci.org/apache/flink/jobs/71314132
 https://travis-ci.org/mjsax/flink/jobs/71179608

 -Matthias


 On 07/16/2015 02:28 PM, Matthias J. Sax wrote:
  Great! I will. As 4 of 5 runs succeeded I cannot test explicitly. Will
  have an eye on it in future runs.
 
  -Matthias
 
 
  On 07/16/2015 02:24 PM, Maximilian Michels wrote:
  Hi Matthias,
 
  I've pushed a fix to the master. The problem should be solved. Please
 tell
  me if your Travis reports an error again. My Travis never complained :)
 
  Cheers,
  Max
 
  On Thu, Jul 16, 2015 at 12:00 PM, Maximilian Michels m...@apache.org
 wrote:
 
  Hi Matthias,
 
  This is indeed a timing issue when checking for the results in this
 test.
  The new accumulator implementation now continuously reports from the
  running tasks to the job manager. This was merged yesterday.
 
  The assertion that fails there is a bit strict. Actually, I've already
  integrated a retry mechanism that fails only if the assertions don't
 hold
  for a configured number of times.
 
  I'll commit a fix to the master. Thanks for reporting!
 
  Cheers,
  Max
 
  On Thu, Jul 16, 2015 at 11:33 AM, Ufuk Celebi u...@apache.org wrote:
 
  Hey,
 
  this has been merged yesterday. I guess it's a timing issue when
  verifying the results. Can you file an issue for this?
 
  – Ufuk
 
  On 16 Jul 2015, at 11:30, Matthias J. Sax 
 mj...@informatik.hu-berlin.de
  wrote:
 
  Hi,
 
  I hit another failing test (that is new to me):
 
  Results :
  Failed tests:
 
 
 AccumulatorLiveITCase.testProgram:106-access$1100:68-checkFlinkAccumulators:189
  null
 
 
  Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
 8.694
  sec  FAILURE! - in
  org.apache.flink.test.accumulators.AccumulatorLiveITCase
 
 testProgram(org.apache.flink.test.accumulators.AccumulatorLiveITCase)
  Time elapsed: 8.021 sec  FAILURE!
  java.lang.AssertionError: null
  at org.junit.Assert.fail(Assert.java:86)
  at org.junit.Assert.assertTrue(Assert.java:41)
  at org.junit.Assert.assertTrue(Assert.java:52)
  at
 
 org.apache.flink.test.accumulators.AccumulatorLiveITCase.checkFlinkAccumulators(AccumulatorLiveITCase.java:189)
  at
 
 org.apache.flink.test.accumulators.AccumulatorLiveITCase.access$1100(AccumulatorLiveITCase.java:68)
 
  Please see: https://travis-ci.org/mjsax/flink/jobs/71179608
 
  Does anyone know anything about it?
 
  BTW: Even if this test is in flink-tests, the problem seems not to be
  related to https://issues.apache.org/jira/browse/FLINK-2032 because
  accumulators are tested. There are not result files involved (as fas
 as
  I can tell).
 
 
 
  -Matthias
 
 
 
 
 
 




Re: Failing Test

2015-07-17 Thread Matthias J. Sax
I will open a JIRA for this. It's getting complicated.

On 07/17/2015 11:04 AM, Till Rohrmann wrote:
 I think the problem might be related to the way the test is constructed.
 The test submits a job to the JM and then tries to poll the accumulators
 from the JM. If it does not succeed, then the polling is retried with an
 decreasing pause in between. Furthermore, the task which updates the
 accumulators also sleeps for the same period until it reads the next
 element and updates the accumulators.
 
 Since the test does not use an explicit synchronization but instead relies
 on sleeps, it will most likely exhibit a flakey behaviour. Sleeps don't
 work reliable enough, especially on Travis, to guarantee a certain thread
 interleaving. I'd recommend introducing explicit synchronization mechanism
 which control the behaviour of the accumulator producing task and explicit
 testing messages which indicate that a new accumulator value has arrived at
 the JM.
 
 Cheers,
 Till
 
 On Thu, Jul 16, 2015 at 11:04 PM, Matthias J. Sax 
 mj...@informatik.hu-berlin.de wrote:
 
 Hi,

 the test still fails. This time in both runs (Flink Travis and my own
 Travis) -- only for Java 8 again:

 https://travis-ci.org/apache/flink/jobs/71314132
 https://travis-ci.org/mjsax/flink/jobs/71179608

 -Matthias


 On 07/16/2015 02:28 PM, Matthias J. Sax wrote:
 Great! I will. As 4 of 5 runs succeeded I cannot test explicitly. Will
 have an eye on it in future runs.

 -Matthias


 On 07/16/2015 02:24 PM, Maximilian Michels wrote:
 Hi Matthias,

 I've pushed a fix to the master. The problem should be solved. Please
 tell
 me if your Travis reports an error again. My Travis never complained :)

 Cheers,
 Max

 On Thu, Jul 16, 2015 at 12:00 PM, Maximilian Michels m...@apache.org
 wrote:

 Hi Matthias,

 This is indeed a timing issue when checking for the results in this
 test.
 The new accumulator implementation now continuously reports from the
 running tasks to the job manager. This was merged yesterday.

 The assertion that fails there is a bit strict. Actually, I've already
 integrated a retry mechanism that fails only if the assertions don't
 hold
 for a configured number of times.

 I'll commit a fix to the master. Thanks for reporting!

 Cheers,
 Max

 On Thu, Jul 16, 2015 at 11:33 AM, Ufuk Celebi u...@apache.org wrote:

 Hey,

 this has been merged yesterday. I guess it's a timing issue when
 verifying the results. Can you file an issue for this?

 – Ufuk

 On 16 Jul 2015, at 11:30, Matthias J. Sax 
 mj...@informatik.hu-berlin.de
 wrote:

 Hi,

 I hit another failing test (that is new to me):

 Results :
 Failed tests:


 AccumulatorLiveITCase.testProgram:106-access$1100:68-checkFlinkAccumulators:189
 null


 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed:
 8.694
 sec  FAILURE! - in
 org.apache.flink.test.accumulators.AccumulatorLiveITCase

 testProgram(org.apache.flink.test.accumulators.AccumulatorLiveITCase)
 Time elapsed: 8.021 sec  FAILURE!
 java.lang.AssertionError: null
 at org.junit.Assert.fail(Assert.java:86)
 at org.junit.Assert.assertTrue(Assert.java:41)
 at org.junit.Assert.assertTrue(Assert.java:52)
 at

 org.apache.flink.test.accumulators.AccumulatorLiveITCase.checkFlinkAccumulators(AccumulatorLiveITCase.java:189)
 at

 org.apache.flink.test.accumulators.AccumulatorLiveITCase.access$1100(AccumulatorLiveITCase.java:68)

 Please see: https://travis-ci.org/mjsax/flink/jobs/71179608

 Does anyone know anything about it?

 BTW: Even if this test is in flink-tests, the problem seems not to be
 related to https://issues.apache.org/jira/browse/FLINK-2032 because
 accumulators are tested. There are not result files involved (as fas
 as
 I can tell).



 -Matthias








 





Failing Test

2015-07-16 Thread Matthias J. Sax
Hi,

I hit another failing test (that is new to me):

 Results :
 Failed tests:
 AccumulatorLiveITCase.testProgram:106->access$1100:68->checkFlinkAccumulators:189 null


 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.694 sec <<< FAILURE! - in org.apache.flink.test.accumulators.AccumulatorLiveITCase
 testProgram(org.apache.flink.test.accumulators.AccumulatorLiveITCase)  Time elapsed: 8.021 sec  <<< FAILURE!
 java.lang.AssertionError: null
     at org.junit.Assert.fail(Assert.java:86)
     at org.junit.Assert.assertTrue(Assert.java:41)
     at org.junit.Assert.assertTrue(Assert.java:52)
     at org.apache.flink.test.accumulators.AccumulatorLiveITCase.checkFlinkAccumulators(AccumulatorLiveITCase.java:189)
     at org.apache.flink.test.accumulators.AccumulatorLiveITCase.access$1100(AccumulatorLiveITCase.java:68)

Please see: https://travis-ci.org/mjsax/flink/jobs/71179608

Does anyone know anything about it?

BTW: Even though this test is in flink-tests, the problem does not seem to be
related to https://issues.apache.org/jira/browse/FLINK-2032, because
accumulators are tested. There are no result files involved (as far as
I can tell).



-Matthias





Re: Failing Test

2015-07-16 Thread Maximilian Michels
Hi Matthias,

This is indeed a timing issue when checking for the results in this test.
The new accumulator implementation now continuously reports from the
running tasks to the job manager. This was merged yesterday.

The assertion that fails there is a bit strict. Actually, I've already
integrated a retry mechanism that fails only if the assertions don't hold
for a configured number of times.
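
A retry-based assertion of the kind described here could look roughly like the following. This is only a sketch and not the code in AccumulatorLiveITCase: the class RetryAssertSketch, the method assertEventually, and its parameters are hypothetical names chosen for illustration. The check fails only if the condition has not held after a configured number of attempts.

```java
import java.util.function.BooleanSupplier;

final class RetryAssertSketch {

    /**
     * Evaluates the condition up to maxAttempts times, pausing between attempts.
     * Returns the 1-based attempt on which the condition held and throws an
     * AssertionError if it never held.
     */
    static int assertEventually(BooleanSupplier condition, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (condition.getAsBoolean()) {
                return attempt;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);  // pause before polling again
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new AssertionError("interrupted while waiting for condition");
                }
            }
        }
        throw new AssertionError("condition did not hold after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        final int[] polls = {0};
        // The condition becomes true on the third poll.
        int attempt = assertEventually(() -> ++polls[0] >= 3, 5, 10L);
        System.out.println("held on attempt " + attempt); // prints "held on attempt 3"
    }
}
```

Retrying makes the check more tolerant of slow runs, but it still depends on timing, so it reduces flakiness rather than removing it.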

I'll commit a fix to the master. Thanks for reporting!

Cheers,
Max

On Thu, Jul 16, 2015 at 11:33 AM, Ufuk Celebi u...@apache.org wrote:

 Hey,

 this has been merged yesterday. I guess it's a timing issue when verifying
 the results. Can you file an issue for this?

 – Ufuk

 On 16 Jul 2015, at 11:30, Matthias J. Sax mj...@informatik.hu-berlin.de
 wrote:

  Hi,
 
  I hit another failing test (that is new to me):
 
  Results :
  Failed tests:
 
 AccumulatorLiveITCase.testProgram:106-access$1100:68-checkFlinkAccumulators:189
 null
 
 
  Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.694
 sec  FAILURE! - in
 org.apache.flink.test.accumulators.AccumulatorLiveITCase
  testProgram(org.apache.flink.test.accumulators.AccumulatorLiveITCase)
 Time elapsed: 8.021 sec  FAILURE!
  java.lang.AssertionError: null
  at org.junit.Assert.fail(Assert.java:86)
  at org.junit.Assert.assertTrue(Assert.java:41)
  at org.junit.Assert.assertTrue(Assert.java:52)
  at
 org.apache.flink.test.accumulators.AccumulatorLiveITCase.checkFlinkAccumulators(AccumulatorLiveITCase.java:189)
  at
 org.apache.flink.test.accumulators.AccumulatorLiveITCase.access$1100(AccumulatorLiveITCase.java:68)
 
  Please see: https://travis-ci.org/mjsax/flink/jobs/71179608
 
  Does anyone know anything about it?
 
  BTW: Even if this test is in flink-tests, the problem seems not to be
  related to https://issues.apache.org/jira/browse/FLINK-2032 because
  accumulators are tested. There are not result files involved (as fas as
  I can tell).
 
 
 
  -Matthias
 




Re: Failing Test

2015-07-16 Thread Ufuk Celebi
Hey,

this was merged yesterday. I guess it's a timing issue when verifying the
results. Can you file an issue for this?

– Ufuk

On 16 Jul 2015, at 11:30, Matthias J. Sax mj...@informatik.hu-berlin.de wrote:

 Hi,
 
 I hit another failing test (that is new to me):
 
 Results :
 Failed tests:
 AccumulatorLiveITCase.testProgram:106-access$1100:68-checkFlinkAccumulators:189
  null
 
 
 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.694 sec 
  FAILURE! - in org.apache.flink.test.accumulators.AccumulatorLiveITCase
 testProgram(org.apache.flink.test.accumulators.AccumulatorLiveITCase) Time 
 elapsed: 8.021 sec  FAILURE!
 java.lang.AssertionError: null
 at org.junit.Assert.fail(Assert.java:86)
 at org.junit.Assert.assertTrue(Assert.java:41)
 at org.junit.Assert.assertTrue(Assert.java:52)
 at 
 org.apache.flink.test.accumulators.AccumulatorLiveITCase.checkFlinkAccumulators(AccumulatorLiveITCase.java:189)
 at 
 org.apache.flink.test.accumulators.AccumulatorLiveITCase.access$1100(AccumulatorLiveITCase.java:68)
 
 Please see: https://travis-ci.org/mjsax/flink/jobs/71179608
 
 Does anyone know anything about it?
 
 BTW: Even if this test is in flink-tests, the problem seems not to be
 related to https://issues.apache.org/jira/browse/FLINK-2032 because
 accumulators are tested. There are not result files involved (as fas as
 I can tell).
 
 
 
 -Matthias
 



Re: Failing Test

2015-07-16 Thread Maximilian Michels
Hi Matthias,

I've pushed a fix to the master. The problem should be solved. Please tell
me if your Travis reports an error again. My Travis never complained :)

Cheers,
Max

On Thu, Jul 16, 2015 at 12:00 PM, Maximilian Michels m...@apache.org wrote:

 Hi Matthias,

 This is indeed a timing issue when checking for the results in this test.
 The new accumulator implementation now continuously reports from the
 running tasks to the job manager. This was merged yesterday.

 The assertion that fails there is a bit strict. Actually, I've already
 integrated a retry mechanism that fails only if the assertions don't hold
 for a configured number of times.

 I'll commit a fix to the master. Thanks for reporting!

 Cheers,
 Max

 On Thu, Jul 16, 2015 at 11:33 AM, Ufuk Celebi u...@apache.org wrote:

 Hey,

 this has been merged yesterday. I guess it's a timing issue when
 verifying the results. Can you file an issue for this?

 – Ufuk

 On 16 Jul 2015, at 11:30, Matthias J. Sax mj...@informatik.hu-berlin.de
 wrote:

  Hi,
 
  I hit another failing test (that is new to me):
 
  Results :
  Failed tests:
 
 AccumulatorLiveITCase.testProgram:106-access$1100:68-checkFlinkAccumulators:189
 null
 
 
  Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.694
 sec  FAILURE! - in
 org.apache.flink.test.accumulators.AccumulatorLiveITCase
  testProgram(org.apache.flink.test.accumulators.AccumulatorLiveITCase)
 Time elapsed: 8.021 sec  FAILURE!
  java.lang.AssertionError: null
  at org.junit.Assert.fail(Assert.java:86)
  at org.junit.Assert.assertTrue(Assert.java:41)
  at org.junit.Assert.assertTrue(Assert.java:52)
  at
 org.apache.flink.test.accumulators.AccumulatorLiveITCase.checkFlinkAccumulators(AccumulatorLiveITCase.java:189)
  at
 org.apache.flink.test.accumulators.AccumulatorLiveITCase.access$1100(AccumulatorLiveITCase.java:68)
 
  Please see: https://travis-ci.org/mjsax/flink/jobs/71179608
 
  Does anyone know anything about it?
 
  BTW: Even if this test is in flink-tests, the problem seems not to be
  related to https://issues.apache.org/jira/browse/FLINK-2032 because
  accumulators are tested. There are not result files involved (as fas as
  I can tell).
 
 
 
  -Matthias
 





Re: Failing Test

2015-07-16 Thread Matthias J. Sax
Great! I will. As 4 of 5 runs succeeded, I cannot test it explicitly. I will
keep an eye on it in future runs.

-Matthias


On 07/16/2015 02:24 PM, Maximilian Michels wrote:
 Hi Matthias,
 
 I've pushed a fix to the master. The problem should be solved. Please tell
 me if your Travis reports an error again. My Travis never complained :)
 
 Cheers,
 Max
 
 On Thu, Jul 16, 2015 at 12:00 PM, Maximilian Michels m...@apache.org wrote:
 
 Hi Matthias,

 This is indeed a timing issue when checking for the results in this test.
 The new accumulator implementation now continuously reports from the
 running tasks to the job manager. This was merged yesterday.

 The assertion that fails there is a bit strict. Actually, I've already
 integrated a retry mechanism that fails only if the assertions don't hold
 for a configured number of times.

 I'll commit a fix to the master. Thanks for reporting!

 Cheers,
 Max

 On Thu, Jul 16, 2015 at 11:33 AM, Ufuk Celebi u...@apache.org wrote:

 Hey,

 this has been merged yesterday. I guess it's a timing issue when
 verifying the results. Can you file an issue for this?

 – Ufuk

 On 16 Jul 2015, at 11:30, Matthias J. Sax mj...@informatik.hu-berlin.de
 wrote:

 Hi,

 I hit another failing test (that is new to me):

 Results :
 Failed tests:

 AccumulatorLiveITCase.testProgram:106-access$1100:68-checkFlinkAccumulators:189
 null


 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.694
 sec  FAILURE! - in
 org.apache.flink.test.accumulators.AccumulatorLiveITCase
 testProgram(org.apache.flink.test.accumulators.AccumulatorLiveITCase)
 Time elapsed: 8.021 sec  FAILURE!
 java.lang.AssertionError: null
 at org.junit.Assert.fail(Assert.java:86)
 at org.junit.Assert.assertTrue(Assert.java:41)
 at org.junit.Assert.assertTrue(Assert.java:52)
 at
 org.apache.flink.test.accumulators.AccumulatorLiveITCase.checkFlinkAccumulators(AccumulatorLiveITCase.java:189)
 at
 org.apache.flink.test.accumulators.AccumulatorLiveITCase.access$1100(AccumulatorLiveITCase.java:68)

 Please see: https://travis-ci.org/mjsax/flink/jobs/71179608

 Does anyone know anything about it?

 BTW: Even though this test is in flink-tests, the problem does not seem to be
 related to https://issues.apache.org/jira/browse/FLINK-2032 because
 accumulators are tested. There are no result files involved (as far as
 I can tell).



 -Matthias




 





Re: Failing Test

2015-07-16 Thread Matthias J. Sax
Hi,

the test still fails. This time in both runs (Flink Travis and my own
Travis) -- only for Java 8 again:

https://travis-ci.org/apache/flink/jobs/71314132
https://travis-ci.org/mjsax/flink/jobs/71179608

-Matthias


On 07/16/2015 02:28 PM, Matthias J. Sax wrote:
 Great! I will. As 4 of 5 runs succeeded, I cannot test it explicitly. I will
 keep an eye on it in future runs.
 
 -Matthias
 
 
 On 07/16/2015 02:24 PM, Maximilian Michels wrote:
 Hi Matthias,

 I've pushed a fix to the master. The problem should be solved. Please tell
 me if your Travis reports an error again. My Travis never complained :)

 Cheers,
 Max

 On Thu, Jul 16, 2015 at 12:00 PM, Maximilian Michels m...@apache.org wrote:

 Hi Matthias,

 This is indeed a timing issue when checking for the results in this test.
 The new accumulator implementation now continuously reports from the
 running tasks to the job manager. This was merged yesterday.

 The assertion that fails there is a bit strict. Actually, I've already
 integrated a retry mechanism that fails only if the assertions don't hold
 for a configured number of times.

 I'll commit a fix to the master. Thanks for reporting!

 Cheers,
 Max

 On Thu, Jul 16, 2015 at 11:33 AM, Ufuk Celebi u...@apache.org wrote:

 Hey,

 this was merged yesterday. I guess it's a timing issue when
 verifying the results. Can you file an issue for this?

 – Ufuk

 On 16 Jul 2015, at 11:30, Matthias J. Sax mj...@informatik.hu-berlin.de
 wrote:

 Hi,

 I hit another failing test (that is new to me):

 Results :
 Failed tests:

 AccumulatorLiveITCase.testProgram:106->access$1100:68->checkFlinkAccumulators:189
 null


 Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 8.694 sec <<< FAILURE! - in org.apache.flink.test.accumulators.AccumulatorLiveITCase
 testProgram(org.apache.flink.test.accumulators.AccumulatorLiveITCase) Time elapsed: 8.021 sec <<< FAILURE!
 java.lang.AssertionError: null
 at org.junit.Assert.fail(Assert.java:86)
 at org.junit.Assert.assertTrue(Assert.java:41)
 at org.junit.Assert.assertTrue(Assert.java:52)
 at org.apache.flink.test.accumulators.AccumulatorLiveITCase.checkFlinkAccumulators(AccumulatorLiveITCase.java:189)
 at org.apache.flink.test.accumulators.AccumulatorLiveITCase.access$1100(AccumulatorLiveITCase.java:68)

 Please see: https://travis-ci.org/mjsax/flink/jobs/71179608

 Does anyone know anything about it?

 BTW: Even though this test is in flink-tests, the problem does not seem to be
 related to https://issues.apache.org/jira/browse/FLINK-2032 because
 accumulators are tested. There are no result files involved (as far as
 I can tell).



 -Matthias





 





[jira] [Created] (FLINK-2349) Instable (failing) Test

2015-07-12 Thread Matthias J. Sax (JIRA)
Matthias J. Sax created FLINK-2349:
--

 Summary: Instable (failing) Test
 Key: FLINK-2349
 URL: https://issues.apache.org/jira/browse/FLINK-2349
 Project: Flink
  Issue Type: Bug
  Components: Tests
Reporter: Matthias J. Sax


Unstable test fails regularly:
  - https://travis-ci.org/apache/flink/builds/70397048
  - https://travis-ci.org/mjsax/flink/jobs/70432777
  - https://travis-ci.org/mjsax/flink/jobs/70432616
  - https://travis-ci.org/mjsax/flink/jobs/70386808

Failed tests:
  ProcessFailureStreamingRecoveryITCase>AbstractProcessFailureRecoveryTest.testTaskManagerProcessFailure:198
  The program encountered a ProgramInvocationException : The program execution failed: Job execution failed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)