Re: Testing Apache Flink 0.9.0-rc1

2015-06-14 Thread Maximilian Michels
Sorry, it was already out. I was merely struggling with the Maven deploy
command because tools/generate_specific_pom.sh is not entirely compatible
with old versions of Perl or sed. The script was generating incorrect
pom.xml files.
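
For reference, a minimal sketch of the kind of GNU/BSD sed incompatibility
that can break such scripts, with a portable temp-file workaround (file
names are illustrative):

```shell
# Illustrative only: GNU sed accepts in-place editing as `sed -i expr file`,
# while BSD sed requires a backup suffix: `sed -i '' expr file`. A portable
# workaround is to write to a temporary file and move it back.
printf 'old-version\n' > /tmp/pom-demo.xml
sed 's/old-version/new-version/' /tmp/pom-demo.xml > /tmp/pom-demo.xml.tmp
mv /tmp/pom-demo.xml.tmp /tmp/pom-demo.xml
cat /tmp/pom-demo.xml   # prints: new-version
```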

On Sat, Jun 13, 2015 at 9:14 PM, Aljoscha Krettek aljos...@apache.org
wrote:

 The new release candidate is not yet done? We have a very simple fix that
 allows the RowSerializer of the Table API to work with null-fields. I think
 we should include that. What do you think?

 On Fri, 12 Jun 2015 at 23:50 Ufuk Celebi u...@apache.org wrote:

  I'm with Till on this. Robert's position is valid as well. Again, there
 is
  no core disagreement here. No one wants to add it to dist.
 
  On 12 Jun 2015, at 00:40, Ufuk Celebi u...@apache.org wrote:
 
 
  On 11 Jun 2015, at 20:04, Fabian Hueske fhue...@gmail.com wrote:
 
  How about the following issues?
 
  1. The Hbase Hadoop Compat issue, Ufuk is working on
 
 
  I was not able to reproduce this :( I ran HadoopInputFormats against
  various sources and confirmed the results and everything was fine so far.
 
 
  The issue has been resolved as Not a problem.
 



Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Till Rohrmann
Hi guys,

I just noticed while testing the TableAPI on the cluster that it is not
part of the dist module. Therefore, programs using the TableAPI will only
run when you put the TableAPI jar directly on the cluster or if you build a
fat jar including the TableAPI jar. This is not documented anywhere.
Furthermore, this also applies to Gelly and FlinkML.

Cheers,
Till
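
Until this is documented, a user-side check along these lines shows whether
a library jar is bundled; the paths below are simulated stand-ins for a
real $FLINK_HOME/lib on each node:

```shell
# Simulated check: is a library jar (e.g. flink-table) present in the
# distribution's lib/ folder? The directory here is a stand-in for
# $FLINK_HOME/lib.
LIBDIR=$(mktemp -d)
touch "$LIBDIR/flink-dist-0.9.0.jar"   # only the dist jar is shipped
if ls "$LIBDIR" | grep -q 'flink-table'; then
  RESULT="bundled"
else
  RESULT="not bundled: copy the flink-table jar into lib/ or build a fat jar"
fi
echo "$RESULT"
```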

On Fri, Jun 12, 2015 at 9:16 AM Till Rohrmann trohrm...@apache.org wrote:

 I'm currently going through the license file and I discovered some
 skeletons in our closet. This has to be merged as well. But I'm still
 working on it (we have a lot of dependencies).

 Cheers,
 Till


 On Fri, Jun 12, 2015 at 12:51 AM Ufuk Celebi u...@apache.org wrote:


 On 12 Jun 2015, at 00:49, Fabian Hueske fhue...@gmail.com wrote:

  2. is basically done. I have a patch which updates the counters on page
  reload but that shouldn't be hard to extend to dynamic updates.

 Very nice! :-) Thanks!




Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Maximilian Michels
We should have a nightly cluster test for every library. Let's keep that in
mind for the future. Very nice find, Till!

Since there were no objections, I cherry-picked the proposed commits from
the document to the release-0.9 branch. If I understand correctly, we can
create the new release candidate once Till has checked the licenses, Ufuk's
TableInput fix has been merged, and Fabian's web interface improvements are
in. Plus, we need to include all Flink libraries in flink-dist. Are you
going to fix that as well, Till?

On Fri, Jun 12, 2015 at 9:53 AM, Ufuk Celebi u...@apache.org wrote:


 On 12 Jun 2015, at 09:45, Till Rohrmann trohrm...@apache.org wrote:

  Hi guys,
 
  I just noticed while testing the TableAPI on the cluster that it is not
  part of the dist module. Therefore, programs using the TableAPI will only
  run when you put the TableAPI jar directly on the cluster or if you
 build a
  fat jar including the TableAPI jar. This is nowhere documented.
  Furthermore, this also applies to Gelly and FlinkML.

 I think all of these should be included in the fat jar. They are all
 highly advertized components.

 Very good catch, Till! I didn't get around to testing Table API on a
 cluster, yet.


Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Márton Balassi
@Till: This also applies to the streaming connectors.

On Fri, Jun 12, 2015 at 9:45 AM, Till Rohrmann trohrm...@apache.org wrote:

 Hi guys,

 I just noticed while testing the TableAPI on the cluster that it is not
 part of the dist module. Therefore, programs using the TableAPI will only
 run when you put the TableAPI jar directly on the cluster or if you build a
 fat jar including the TableAPI jar. This is nowhere documented.
 Furthermore, this also applies to Gelly and FlinkML.

 Cheers,
 Till

 On Fri, Jun 12, 2015 at 9:16 AM Till Rohrmann trohrm...@apache.org
 wrote:

  I'm currently going through the license file and I discovered some
  skeletons in our closet. This has to be merged as well. But I'm still
  working on it (we have a lot of dependencies).
 
  Cheers,
  Till
 
 
  On Fri, Jun 12, 2015 at 12:51 AM Ufuk Celebi u...@apache.org wrote:
 
 
  On 12 Jun 2015, at 00:49, Fabian Hueske fhue...@gmail.com wrote:
 
   2. is basically done. I have a patch which updates the counters on
 page
   reload but that shouldn't be hard to extend to dynamic updates.
 
  Very nice! :-) Thanks!
 
 



Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Fabian Hueske
I have another fix, but this is just a documentation update (FLINK-2207)
and will be done soon.

2015-06-12 10:02 GMT+02:00 Maximilian Michels m...@apache.org:

 We should have a nightly cluster test for every library. Let's keep that in
 mind for the future. Very nice find, Till!

 Since there were not objections, I cherry-picked the proposed commits from
 the document to the release-0.9 branch. If I understand correctly, we can
 create the new release candidate once Till has checked the licenses, Ufuk's
 TableInput fix has been merged, and Fabian's web interface improvement are
 in. Plus, we need to include all Flink libraries in flink-dist. Are you
 going to fix that as well, Till?

 On Fri, Jun 12, 2015 at 9:53 AM, Ufuk Celebi u...@apache.org wrote:

 
  On 12 Jun 2015, at 09:45, Till Rohrmann trohrm...@apache.org wrote:
 
   Hi guys,
  
   I just noticed while testing the TableAPI on the cluster that it is not
   part of the dist module. Therefore, programs using the TableAPI will
 only
   run when you put the TableAPI jar directly on the cluster or if you
  build a
   fat jar including the TableAPI jar. This is nowhere documented.
   Furthermore, this also applies to Gelly and FlinkML.
 
  I think all of these should be included in the fat jar. They are all
  highly advertized components.
 
  Very good catch, Till! I didn't get around to testing Table API on a
  cluster, yet.



Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Márton Balassi
As for outstanding issues, I think streaming is good to go as far as I know.
I am personally against including all libraries - at least speaking for the
streaming connectors. Robert, Stephan, and I had a detailed discussion
on this some time ago, and the disadvantage of having all the libraries in
the distribution is the dependency mess they pull in. In this case I
would rather add documentation on putting them in the user jar. As for
the other libraries, they do not depend on so much external code, so +1 for
putting them in.

On Fri, Jun 12, 2015 at 10:02 AM, Maximilian Michels m...@apache.org wrote:

 We should have a nightly cluster test for every library. Let's keep that in
 mind for the future. Very nice find, Till!

 Since there were not objections, I cherry-picked the proposed commits from
 the document to the release-0.9 branch. If I understand correctly, we can
 create the new release candidate once Till has checked the licenses, Ufuk's
 TableInput fix has been merged, and Fabian's web interface improvement are
 in. Plus, we need to include all Flink libraries in flink-dist. Are you
 going to fix that as well, Till?

 On Fri, Jun 12, 2015 at 9:53 AM, Ufuk Celebi u...@apache.org wrote:

 
  On 12 Jun 2015, at 09:45, Till Rohrmann trohrm...@apache.org wrote:
 
   Hi guys,
  
   I just noticed while testing the TableAPI on the cluster that it is not
   part of the dist module. Therefore, programs using the TableAPI will
 only
   run when you put the TableAPI jar directly on the cluster or if you
  build a
   fat jar including the TableAPI jar. This is nowhere documented.
   Furthermore, this also applies to Gelly and FlinkML.
 
  I think all of these should be included in the fat jar. They are all
  highly advertized components.
 
  Very good catch, Till! I didn't get around to testing Table API on a
  cluster, yet.



Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Till Rohrmann
What about the shaded jars?

On Fri, Jun 12, 2015 at 11:32 AM Ufuk Celebi u...@apache.org wrote:

 @Max: for the new RC. Can you make sure to set the variables correctly
 with regard to stable/snapshot versions in the docs?


Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Ufuk Celebi

On 12 Jun 2015, at 09:45, Till Rohrmann trohrm...@apache.org wrote:

 Hi guys,
 
 I just noticed while testing the TableAPI on the cluster that it is not
 part of the dist module. Therefore, programs using the TableAPI will only
 run when you put the TableAPI jar directly on the cluster or if you build a
 fat jar including the TableAPI jar. This is nowhere documented.
 Furthermore, this also applies to Gelly and FlinkML.

I think all of these should be included in the fat jar. They are all highly
advertised components.

Very good catch, Till! I didn't get around to testing the Table API on a
cluster yet.

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Ufuk Celebi
After thinking about it a bit more, I think that's fine.

+1 to document and keep it as it is.


Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Till Rohrmann
Well, I think the initial idea was to keep the dist jar as small as possible,
and therefore we did not include the libraries. I'm not sure whether we can
decide this here ad-hoc. If the community says that we shall include these
libraries, then I can add them. But bear in mind that all of them have some
transitive dependencies which will be added as well.
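
To make that trade-off concrete, the transitive footprint of a library can
be inspected with `mvn dependency:tree`. The excerpt below is simulated,
and the exact artifacts are only illustrative of what a library such as
FlinkML pulls in:

```shell
# Simulated excerpt of `mvn dependency:tree` output for a library module;
# the listed artifacts are illustrative, not an authoritative listing.
TREE=$(cat <<'EOF'
org.apache.flink:flink-ml
+- org.scalanlp:breeze_2.10
|  +- com.github.fommil.netlib:core
|  \- org.spire-math:spire_2.10
EOF
)
echo "$TREE"
```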

On Fri, Jun 12, 2015 at 10:15 AM Márton Balassi balassi.mar...@gmail.com
wrote:

 As for outstanding issues I think streaming is good to go as far as I know.
 I am personally against including all libraries - at least speaking for the
 streaming connectors. Robert, Stephan and myself had a detailed discussion
 on that some time ago and the disadvantage of having all the libraries in
 the distribution is the dependency mess that they pull. In this case I
 would rather add documentation on putting them in the user jar then. As for
 the other libraries they do not depend on so much external code, so +1 for
 putting them in.

 On Fri, Jun 12, 2015 at 10:02 AM, Maximilian Michels m...@apache.org
 wrote:

  We should have a nightly cluster test for every library. Let's keep that
 in
  mind for the future. Very nice find, Till!
 
  Since there were not objections, I cherry-picked the proposed commits
 from
  the document to the release-0.9 branch. If I understand correctly, we can
  create the new release candidate once Till has checked the licenses,
 Ufuk's
  TableInput fix has been merged, and Fabian's web interface improvement
 are
  in. Plus, we need to include all Flink libraries in flink-dist. Are you
  going to fix that as well, Till?
 
  On Fri, Jun 12, 2015 at 9:53 AM, Ufuk Celebi u...@apache.org wrote:
 
  
   On 12 Jun 2015, at 09:45, Till Rohrmann trohrm...@apache.org wrote:
  
Hi guys,
   
I just noticed while testing the TableAPI on the cluster that it is
 not
part of the dist module. Therefore, programs using the TableAPI will
  only
run when you put the TableAPI jar directly on the cluster or if you
   build a
fat jar including the TableAPI jar. This is nowhere documented.
Furthermore, this also applies to Gelly and FlinkML.
  
   I think all of these should be included in the fat jar. They are all
   highly advertized components.
  
   Very good catch, Till! I didn't get around to testing Table API on a
   cluster, yet.
 



Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Ufuk Celebi

On 12 Jun 2015, at 10:44, Till Rohrmann trohrm...@apache.org wrote:

 Yes you're right Ufuk. At the moment the user has to place the jars in the
 lib folder of Flink. If this folder is not shared then he has to do it for
 every node on which Flink runs.

OK. I guess there is a nice way to do this with YARN as well. I think it
degrades the out-of-the-box experience quite a bit if you want to use these
nice features.

What's your stand on this issue?

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Ufuk Celebi

On 12 Jun 2015, at 10:29, Till Rohrmann trohrm...@apache.org wrote:

 Well I think the initial idea was to keep the dist jar as small a possible
 and therefore we did not include the libraries. I'm not sure whether we can
 decide this here ad-hoc. If the community says that we shall include these
 libraries then I can add them. But bear in mind that all of them have some
 transitive dependencies which will be added as well.

I'm against the connectors as well, but not having the Table API, Flink ML, and
Gelly in seems odd to me.

Or maybe I'm missing something. Someone who wants to try this out has to place 
the dependencies manually into the lib folder of the Flink installation, right?

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Till Rohrmann
I think I found a real release blocker. Currently we don't add license
files to our shaded jars. For example
the flink-shaded-include-yarn-0.9.0-milestone-1.jar shades hadoop code.
This code also includes the `org.apache.util.bloom.*` classes. These
classes are licensed under the European Commission project OneLab. We have
a notice in the LICENSE file of our binary distribution, but I think we also
have to add it to the shaded jar. There might even be more code bundled
in some shaded jars which I have not spotted yet.

Furthermore, I noticed that we list all Apache License dependencies in the
LICENSE file of our binary distribution (which we don't have to do).
However, we don't do it in our jars which contain, for example, guava and asm
as shaded dependencies. Maybe we should be consistent here.

But maybe I'm overlooking something here and we don't have to do it.
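
A quick audit of this (simulated below; a real check would run `unzip -l`
against the shaded jar itself) looks for META-INF LICENSE/NOTICE entries
alongside the bundled classes. The class path shown is a placeholder:

```shell
# Simulated listing of a shaded jar's entries; a real audit would use
# `unzip -l flink-shaded-*.jar`. The bundled class name is a placeholder.
ENTRIES=$(mktemp)
cat > "$ENTRIES" <<'EOF'
META-INF/MANIFEST.MF
org/example/shaded/BundledClass.class
EOF
if grep -qE '^META-INF/(LICENSE|NOTICE)' "$ENTRIES"; then
  RESULT="LICENSE/NOTICE entries present"
else
  RESULT="bundled code without LICENSE/NOTICE entries"
fi
echo "$RESULT"
```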

On Fri, Jun 12, 2015 at 10:29 AM Till Rohrmann trohrm...@apache.org wrote:

 Well I think the initial idea was to keep the dist jar as small a possible
 and therefore we did not include the libraries. I'm not sure whether we can
 decide this here ad-hoc. If the community says that we shall include these
 libraries then I can add them. But bear in mind that all of them have some
 transitive dependencies which will be added as well.


 On Fri, Jun 12, 2015 at 10:15 AM Márton Balassi balassi.mar...@gmail.com
 wrote:

 As for outstanding issues I think streaming is good to go as far as I
 know.
 I am personally against including all libraries - at least speaking for
 the
 streaming connectors. Robert, Stephan and myself had a detailed discussion
 on that some time ago and the disadvantage of having all the libraries in
 the distribution is the dependency mess that they pull. In this case I
 would rather add documentation on putting them in the user jar then. As
 for
 the other libraries they do not depend on so much external code, so +1 for
 putting them in.

 On Fri, Jun 12, 2015 at 10:02 AM, Maximilian Michels m...@apache.org
 wrote:

  We should have a nightly cluster test for every library. Let's keep
 that in
  mind for the future. Very nice find, Till!
 
  Since there were not objections, I cherry-picked the proposed commits
 from
  the document to the release-0.9 branch. If I understand correctly, we
 can
  create the new release candidate once Till has checked the licenses,
 Ufuk's
  TableInput fix has been merged, and Fabian's web interface improvement
 are
  in. Plus, we need to include all Flink libraries in flink-dist. Are you
  going to fix that as well, Till?
 
  On Fri, Jun 12, 2015 at 9:53 AM, Ufuk Celebi u...@apache.org wrote:
 
  
   On 12 Jun 2015, at 09:45, Till Rohrmann trohrm...@apache.org wrote:
  
Hi guys,
   
I just noticed while testing the TableAPI on the cluster that it is
 not
part of the dist module. Therefore, programs using the TableAPI will
  only
run when you put the TableAPI jar directly on the cluster or if you
   build a
fat jar including the TableAPI jar. This is nowhere documented.
Furthermore, this also applies to Gelly and FlinkML.
  
   I think all of these should be included in the fat jar. They are all
   highly advertized components.
  
   Very good catch, Till! I didn't get around to testing Table API on a
   cluster, yet.
 




Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Ufuk Celebi

On 12 Jun 2015, at 00:40, Ufuk Celebi u...@apache.org wrote:

 
 On 11 Jun 2015, at 20:04, Fabian Hueske fhue...@gmail.com wrote:
 
 How about the following issues?
 
 1. The Hbase Hadoop Compat issue, Ufuk is working on
 
 I was not able to reproduce this :( I ran HadoopInputFormats against various 
 sources and confirmed the results and everything was fine so far.

The issue has been resolved as Not a problem. There was some misconfiguration 
in the user code.

Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Till Rohrmann
I'm currently going through the license file and I discovered some
skeletons in our closet. This has to be merged as well. But I'm still
working on it (we have a lot of dependencies).

Cheers,
Till

On Fri, Jun 12, 2015 at 12:51 AM Ufuk Celebi u...@apache.org wrote:


 On 12 Jun 2015, at 00:49, Fabian Hueske fhue...@gmail.com wrote:

  2. is basically done. I have a patch which updates the counters on page
  reload but that shouldn't be hard to extend to dynamic updates.

 Very nice! :-) Thanks!



Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Till Rohrmann
I'm in favour of option b) as well.

On Fri, Jun 12, 2015 at 12:05 PM Ufuk Celebi u...@apache.org wrote:

 Yes, the LICENSE files are definitely a release blocker.

 a) Either we wait with the RC until we have fixed the LICENSES, or

 b) Put out next RC to continue with testing and then update it with the
 LICENSE [either we find something before the LICENSE update or we only have
 to review the LICENSE change]

 Since this is not a vote yet, it doesn't really matter, but I'm leaning
 towards b).


 On Fri, Jun 12, 2015 at 11:43 AM, Till Rohrmann till.rohrm...@gmail.com
 wrote:

  What about the shaded jars?
 
  On Fri, Jun 12, 2015 at 11:32 AM Ufuk Celebi u...@apache.org wrote:
 
   @Max: for the new RC. Can you make sure to set the variables correctly
   with regard to stable/snapshot versions in the docs?
 



Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Fabian Hueske
+1 for b)

I'm organizing and merging the commits that need to go into the new candidate
right now. Will let you know when I am done.

2015-06-12 14:03 GMT+02:00 Till Rohrmann till.rohrm...@gmail.com:

 I'm in favour of option b) as well.

 On Fri, Jun 12, 2015 at 12:05 PM Ufuk Celebi u...@apache.org wrote:

  Yes, the LICENSE files are definitely a release blocker.
 
  a) Either we wait with the RC until we have fixed the LICENSES, or
 
  b) Put out next RC to continue with testing and then update it with the
  LICENSE [either we find something before the LICENSE update or we only
 have
  to review the LICENSE change]
 
  Since this is not a vote yet, it doesn't really matter, but I'm leaning
  towards b).
 
 
  On Fri, Jun 12, 2015 at 11:43 AM, Till Rohrmann till.rohrm...@gmail.com
 
  wrote:
 
   What about the shaded jars?
  
   On Fri, Jun 12, 2015 at 11:32 AM Ufuk Celebi u...@apache.org wrote:
  
@Max: for the new RC. Can you make sure to set the variables
 correctly
with regard to stable/snapshot versions in the docs?
  
 



Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Robert Metzger
Regarding the discussion with including ML, Gelly, streaming connectors
into flink-dist.
I'm strongly against adding those into our jar because they blow up the
dependencies we are shipping by default.

Also, the maven archetype sets up everything so that the dependencies are
packaged into the usercode jar.
I'd say most of the time users are using custom dependencies anyway
(Guava), so they need to set this up properly.

I would not start recommending that users put their dependencies into
the lib/ folder. It's much more convenient to let maven do the fat-jar
packaging.


On Fri, Jun 12, 2015 at 9:44 AM, Till Rohrmann trohrm...@apache.org wrote:

 I've finished the legal check of the source and binary distribution. The PR
 with the LICENSE and NOTICE file updates can be found here [1].

 What I haven't done yet is addressing the issue with the shaded
 dependencies. I think that we have to add to all jars which contain
 dependencies as binary data a LICENSE/NOTICE file referencing the included
 dependencies if they are not licensed under Apache-2.0 or contain a special
 NOTICE portion.

 Cheers,
 Till

 [1] https://github.com/apache/flink/pull/830

 On Fri, Jun 12, 2015 at 5:44 PM Maximilian Michels m...@apache.org wrote:

  I almost finished creating the new release candidate. Then the maven
 deploy
  command failed on me for the hadoop1 profile:
 
  [INFO]
  
  [INFO] BUILD FAILURE
  [INFO]
  
  [INFO] Total time: 19:15.388s
  [INFO] Finished at: Fri Jun 12 15:25:50 UTC 2015
  [INFO] Final Memory: 126M/752M
  [INFO]
  
  [ERROR] Failed to execute goal
  org.apache.maven.plugins:maven-checkstyle-plugin:2.12.1:check (validate)
 on
   project flink-language-binding-generic: Failed during checkstyle
   execution: Unable to find suppressions file at location:
  /tools/maven/suppressions.xml: Could not find resource
  '/tools/maven/suppressions.xml'. - [Help 1]
  [ERROR]
  [ERROR] To see the full stack trace of the errors, re-run Maven with the
 -e
  switch.
  [ERROR] Re-run Maven using the -X switch to enable full debug logging.
  [ERROR]
  [ERROR] For more information about the errors and possible solutions,
  please read the following articles:
  [ERROR] [Help 1]
  http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
  [ERROR]
  [ERROR] After correcting the problems, you can resume the build with the
  command
  [ERROR]   mvn goals -rf :flink-language-binding-generic
 
  I need to look into this later. Unfortunately, I'm traveling this
 weekend.
 
  On Fri, Jun 12, 2015 at 3:34 PM, Fabian Hueske fhue...@gmail.com
 wrote:
 
   OK, guys. I merged and pushed the last outstanding commits to the
   release-0.9 branch.
   Good to go for a new candidate.
  
   2015-06-12 14:30 GMT+02:00 Maximilian Michels m...@apache.org:
  
+1 Let's constitute the changes in a new release candidate.
   
On Fri, Jun 12, 2015 at 2:06 PM, Fabian Hueske fhue...@gmail.com
   wrote:
   
 +1 for b)

 I'm organizing + merging the commits that need to go the new
  candidate
 right now. Will let you know, when I am done.

 2015-06-12 14:03 GMT+02:00 Till Rohrmann till.rohrm...@gmail.com
 :

  I'm in favour of option b) as well.
 
  On Fri, Jun 12, 2015 at 12:05 PM Ufuk Celebi u...@apache.org
  wrote:
 
   Yes, the LICENSE files are definitely a release blocker.
  
   a) Either we wait with the RC until we have fixed the LICENSES,
  or
  
   b) Put out next RC to continue with testing and then update it
  with
the
   LICENSE [either we find something before the LICENSE update or
 we
only
  have
   to review the LICENSE change]
  
   Since this is not a vote yet, it doesn't really matter, but I'm
leaning
   towards b).
  
  
   On Fri, Jun 12, 2015 at 11:43 AM, Till Rohrmann 
 till.rohrm...@gmail.com
  
   wrote:
  
What about the shaded jars?
   
On Fri, Jun 12, 2015 at 11:32 AM Ufuk Celebi u...@apache.org
 
wrote:
   
 @Max: for the new RC. Can you make sure to set the
 variables
  correctly
 with regard to stable/snapshot versions in the docs?
   
  
 

   
  
 



Re: Testing Apache Flink 0.9.0-rc1

2015-06-12 Thread Till Rohrmann
I agree mostly with Robert. However, one could also argue that by not
including the libraries in the dist package, the user code jar will be
blown up by the dependencies added by the library. This will slow down job
submission, because the jar has to be distributed across the cluster.
Furthermore, I wouldn't expect all our users to use the quickstart
archetypes or to set up maven such that it builds a fat jar.

I think the best approach is to explicitly document how to use the
libraries and what to do in order to run them on the cluster.
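
Such documentation could include a sketch like the following for running
library programs without a fat jar; node names and paths below are
hypothetical placeholders:

```shell
# Hypothetical deployment sketch: place a library jar into lib/ on every
# node. Node names and paths are placeholders; a real setup would use scp
# or a shared filesystem.
NODES="node1 node2"
DONE=""
for n in $NODES; do
  echo "would copy flink-table-0.9.0.jar to $n:/opt/flink/lib/"
  # real deployment: scp flink-table-0.9.0.jar "$n:/opt/flink/lib/"
  DONE="$DONE$n "
done
```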
On Jun 12, 2015 9:15 PM, Robert Metzger rmetz...@apache.org wrote:

 Regarding the discussion with including ML, Gelly, streaming connectors
 into flink-dist.
 I'm strongly against adding those into our jar because they blow up the
 dependencies we are shipping by default.

 Also, the maven archetype sets up everything so that the dependencies are
 packaged into the usercode jar.
 I'd say most of the time users are using custom dependencies anyways
 (Guava), so they need to set this up properly.

 I would not start recommending our users putting their dependencies into
 the lib/ folder. Its much more convenient to let maven do the fat-jar
 packaging.


 On Fri, Jun 12, 2015 at 9:44 AM, Till Rohrmann trohrm...@apache.org
 wrote:

  I've finished the legal check of the source and binary distribution. The
 PR
  with the LICENSE and NOTICE file updates can be found here [1].
 
  What I haven't done yet is addressing the issue with the shaded
  dependencies. I think that we have to add to all jars which contain
  dependencies as binary data a LICENSE/NOTICE file referencing the
 included
  dependencies if they are not licensed under Apache-2.0 or contain a
 special
  NOTICE portion.
 
  Cheers,
  Till
 
  [1] https://github.com/apache/flink/pull/830
 
  On Fri, Jun 12, 2015 at 5:44 PM Maximilian Michels m...@apache.org
 wrote:
 
   I almost finished creating the new release candidate. Then the maven
  deploy
   command failed on me for the hadoop1 profile:
  
   [INFO]
  
 
   [INFO] BUILD FAILURE
   [INFO]
  
 
   [INFO] Total time: 19:15.388s
   [INFO] Finished at: Fri Jun 12 15:25:50 UTC 2015
   [INFO] Final Memory: 126M/752M
   [INFO]
  
 
   [ERROR] Failed to execute goal
   org.apache.maven.plugins:maven-checkstyle-plugin:2.12.1:check
 (validate)
  on
    project flink-language-binding-generic: Failed during checkstyle
    execution: Unable to find suppressions file at location:
   /tools/maven/suppressions.xml: Could not find resource
   '/tools/maven/suppressions.xml'. - [Help 1]
   [ERROR]
   [ERROR] To see the full stack trace of the errors, re-run Maven with
 the
  -e
   switch.
   [ERROR] Re-run Maven using the -X switch to enable full debug logging.
   [ERROR]
   [ERROR] For more information about the errors and possible solutions,
   please read the following articles:
   [ERROR] [Help 1]
  
 http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
   [ERROR]
   [ERROR] After correcting the problems, you can resume the build with
 the
   command
   [ERROR]   mvn goals -rf :flink-language-binding-generic
  
   I need to look into this later. Unfortunately, I'm traveling this
  weekend.
  
   On Fri, Jun 12, 2015 at 3:34 PM, Fabian Hueske fhue...@gmail.com
  wrote:
  
OK, guys. I merged and pushed the last outstanding commits to the
release-0.9 branch.
Good to go for a new candidate.
   
2015-06-12 14:30 GMT+02:00 Maximilian Michels m...@apache.org:
   
 +1 Let's constitute the changes in a new release candidate.

 On Fri, Jun 12, 2015 at 2:06 PM, Fabian Hueske fhue...@gmail.com
wrote:

  +1 for b)
 
  I'm organizing + merging the commits that need to go the new
   candidate
  right now. Will let you know, when I am done.
 
  2015-06-12 14:03 GMT+02:00 Till Rohrmann 
 till.rohrm...@gmail.com
  :
 
   I'm in favour of option b) as well.
  
   On Fri, Jun 12, 2015 at 12:05 PM Ufuk Celebi u...@apache.org
   wrote:
  
Yes, the LICENSE files are definitely a release blocker.
   
a) Either we wait with the RC until we have fixed the
 LICENSES,
   or
   
b) Put out next RC to continue with testing and then update
 it
   with
 the
LICENSE [either we find something before the LICENSE update
 or
  we
 only
   have
to review the LICENSE change]
   
Since this is not a vote yet, it doesn't really matter, but
 I'm
 leaning
towards b).
   
   
On Fri, Jun 12, 2015 at 11:43 AM, Till Rohrmann 
  till.rohrm...@gmail.com
   
wrote:
   
 What about the shaded jars?

 On Fri, Jun 12, 2015 at 11:32 AM Ufuk Celebi 
 u...@apache.org
  
 wrote:

Re: Testing Apache Flink 0.9.0-rc1

2015-06-11 Thread Maximilian Michels
Yes, we would include those in the new release candidate.
On Jun 11, 2015 5:22 PM, Aljoscha Krettek aljos...@apache.org wrote:

 Aren't there still some commits at the top of the release document that
 need to be cherry-picked to the release branch?

 On Thu, 11 Jun 2015 at 17:13 Maximilian Michels m...@apache.org wrote:

  The deadlock in the scheduler is now fixed. Based on the changes that
 have
  been push to the release-0.9 branch, I'd like to create a new release
  candidate later on. I think we have gotten the most critical issues out
 of
  the way. Would that be ok for you?
 
  On Wed, Jun 10, 2015 at 5:56 PM, Fabian Hueske fhue...@gmail.com
 wrote:
 
   Yes, that needs to be fixed IMO
  
   2015-06-10 17:51 GMT+02:00 Till Rohrmann trohrm...@apache.org:
  
Yes since it is clearly a deadlock in the scheduler, the current
  version
shouldn't be released.
   
On Wed, Jun 10, 2015 at 5:48 PM Ufuk Celebi u...@apache.org wrote:
   

 On 10 Jun 2015, at 16:18, Maximilian Michels m...@apache.org
 wrote:

  I'm debugging the TaskManagerFailsWithSlotSharingITCase. I've
  located
its
  cause but still need to find out how to fix it.

 Very good find, Max!

 Max, Till, and I have looked into this and it is a reproducible
   deadlock
 in the scheduler during concurrent slot release (in failure cases).
  Max
 will attach the relevant stack trace to the issue.

 I think this is a release blocker. Any opinions?

 – Ufuk
   
  
 



Re: Testing Apache Flink 0.9.0-rc1

2015-06-11 Thread Fabian Hueske
How about the following issues?

1. The HBase Hadoop compat issue that Ufuk is working on
2. The incorrect web interface counts

@Ufuk were you able to reproduce the bug?
The deadlock in the scheduler is now fixed. Based on the changes that have
been pushed to the release-0.9 branch, I'd like to create a new release
candidate later on. I think we have gotten the most critical issues out of
the way. Would that be ok for you?

On Wed, Jun 10, 2015 at 5:56 PM, Fabian Hueske fhue...@gmail.com wrote:

 Yes, that needs to be fixed IMO

 2015-06-10 17:51 GMT+02:00 Till Rohrmann trohrm...@apache.org:

  Yes since it is clearly a deadlock in the scheduler, the current version
  shouldn't be released.
 
  On Wed, Jun 10, 2015 at 5:48 PM Ufuk Celebi u...@apache.org wrote:
 
  
   On 10 Jun 2015, at 16:18, Maximilian Michels m...@apache.org wrote:
  
I'm debugging the TaskManagerFailsWithSlotSharingITCase. I've
located
  its
cause but still need to find out how to fix it.
  
   Very good find, Max!
  
   Max, Till, and I have looked into this and it is a reproducible
 deadlock
   in the scheduler during concurrent slot release (in failure cases).
Max
   will attach the relevant stack trace to the issue.
  
   I think this is a release blocker. Any opinions?
  
   – Ufuk
 



Re: Testing Apache Flink 0.9.0-rc1

2015-06-11 Thread Ufuk Celebi

On 12 Jun 2015, at 00:49, Fabian Hueske fhue...@gmail.com wrote:

 2. is basically done. I have a patch which updates the counters on page
 reload but that shouldn't be hard to extend to dynamic updates.

Very nice! :-) Thanks!


Re: Testing Apache Flink 0.9.0-rc1

2015-06-11 Thread Maximilian Michels
The deadlock in the scheduler is now fixed. Based on the changes that have
been pushed to the release-0.9 branch, I'd like to create a new release
candidate later on. I think we have gotten the most critical issues out of
the way. Would that be ok for you?

On Wed, Jun 10, 2015 at 5:56 PM, Fabian Hueske fhue...@gmail.com wrote:

 Yes, that needs to be fixed IMO

 2015-06-10 17:51 GMT+02:00 Till Rohrmann trohrm...@apache.org:

  Yes since it is clearly a deadlock in the scheduler, the current version
  shouldn't be released.
 
  On Wed, Jun 10, 2015 at 5:48 PM Ufuk Celebi u...@apache.org wrote:
 
  
   On 10 Jun 2015, at 16:18, Maximilian Michels m...@apache.org wrote:
  
I'm debugging the TaskManagerFailsWithSlotSharingITCase. I've located
  its
cause but still need to find out how to fix it.
  
   Very good find, Max!
  
   Max, Till, and I have looked into this and it is a reproducible
 deadlock
   in the scheduler during concurrent slot release (in failure cases). Max
   will attach the relevant stack trace to the issue.
  
   I think this is a release blocker. Any opinions?
  
   – Ufuk
 



Re: Testing Apache Flink 0.9.0-rc1

2015-06-11 Thread Aljoscha Krettek
Aren't there still some commits at the top of the release document that
need to be cherry-picked to the release branch?

On Thu, 11 Jun 2015 at 17:13 Maximilian Michels m...@apache.org wrote:

 The deadlock in the scheduler is now fixed. Based on the changes that have
 been pushed to the release-0.9 branch, I'd like to create a new release
 candidate later on. I think we have gotten the most critical issues out of
 the way. Would that be ok for you?

 On Wed, Jun 10, 2015 at 5:56 PM, Fabian Hueske fhue...@gmail.com wrote:

  Yes, that needs to be fixed IMO
 
  2015-06-10 17:51 GMT+02:00 Till Rohrmann trohrm...@apache.org:
 
   Yes since it is clearly a deadlock in the scheduler, the current
 version
   shouldn't be released.
  
   On Wed, Jun 10, 2015 at 5:48 PM Ufuk Celebi u...@apache.org wrote:
  
   
On 10 Jun 2015, at 16:18, Maximilian Michels m...@apache.org wrote:
   
 I'm debugging the TaskManagerFailsWithSlotSharingITCase. I've
 located
   its
 cause but still need to find out how to fix it.
   
Very good find, Max!
   
Max, Till, and I have looked into this and it is a reproducible
  deadlock
in the scheduler during concurrent slot release (in failure cases).
 Max
will attach the relevant stack trace to the issue.
   
I think this is a release blocker. Any opinions?
   
– Ufuk
  
 



Re: Testing Apache Flink 0.9.0-rc1

2015-06-10 Thread F. Beligianni
Regarding the iteration partitioning feature: since I use it, I of course
find it very useful, but it is true that it needs to be tested more
extensively and also be discussed by the community before it is added in a
release.
Moreover, given that I can still use it for research purposes (I had
already cherry-picked it before it was merged to the master branch), there
is no actual reason to put it in the next release, so the community has
more time to discuss and decide about the feature.
Lastly, I cross-checked the SAMOA application, and so far there is still
no algorithm implemented in the SAMOA API that needs the new feature.

Faye.

2015-06-10 11:28 GMT+02:00 Sachin Goel sachingoel0...@gmail.com:

 I have run mvn clean verify five times now and every time I'm getting
 these failed tests:

  BlobUtilsTest.before:45 null
   BlobUtilsTest.before:45 null
   BlobServerDeleteTest.testDeleteFails:291 null
   BlobLibraryCacheManagerTest.testRegisterAndDownload:196 Could not
 remove write permissions from cache directory
   BlobServerPutTest.testPutBufferFails:224 null
   BlobServerPutTest.testPutNamedBufferFails:286 null
   JobManagerStartupTest.before:55 null
   JobManagerStartupTest.before:55 null
   DataSinkTaskTest.testFailingDataSinkTask:317 Temp output file has
 not been removed
   DataSinkTaskTest.testFailingSortingDataSinkTask:358 Temp output file
 has not been removed
   TaskManagerTest.testSubmitAndExecuteTask**:123 assertion failed:
 timeout (19998080696 nanoseconds) during expectMsgClass waiting for
 class
 org.apache.flink.runtime.messages.RegistrationMessages$RegisterTaskManager
   TaskManagerProcessReapingTest.testReapProcessOnFailure:133
 TaskManager process did not launch the TaskManager properly. Failed to
 look up akka.tcp://flink@127.0.0.1:50673/user/taskmanager

 ** fails randomly.

  Is someone able to reproduce these while building on a Windows machine? I
 would try to debug these myself but I'm not yet familiar with the core
 architecture and API.

  -- Sachin

 On Wed, Jun 10, 2015 at 2:46 PM, Aljoscha Krettek aljos...@apache.org
 wrote:

  The KMeans quickstart example does not work with the current state of
   the KMeansDataGenerator. I created a PR that brings the two in sync.
  This should probably go into the release since it affects initial user
  satisfaction.
 
  On Wed, Jun 10, 2015 at 11:14 AM, Márton Balassi
  balassi.mar...@gmail.com wrote:
   As for the streaming commit cherry-picked to the release branch:
   This is an unfortunate communication issue, let us make sure that we
   clearly communicate similar issues in the future.
  
   As for FLINK-2192: This is essentially a duplicate issue of the
  testability
   of the streaming iteration. Not a blocker, I will comment on the JIRA
   ticket, Gabor Hermann is already working on the root cause.
  
   On Wed, Jun 10, 2015 at 11:07 AM, Ufuk Celebi u...@apache.org wrote:
  
   Hey Gyula, Max,
  
   On 10 Jun 2015, at 10:54, Gyula Fóra gyula.f...@gmail.com wrote:
  
This feature needs to be included in the release, it has been tested
  and
 used extensively. And many applications depend on it.
  
   It would be nice to announce/discuss this before just cherry-picking
 it
   into the release branch. The issue is that no one (except you) knows
  that
   this is important. Let's just make sure to do this for future fixes.
  
   Having said that... it seems to be an important fix. Does someone have
   time (looking at Aljoscha ;)) to review the changes?
  
Maximilian Michels m...@apache.org ezt írta (időpont: 2015. jún.
  10.,
   Sze,
10:47):
   
With all the issues discovered, it looks like we'll have another
  release
candidate. Right now, we have discovered the following problems:
   
1 YARN ITCase fails [fixed via 2eb5cfe]
2 No Jar for SessionWindowing example [fixed in #809]
3 Wrong description of the input format for the graph examples (eg.
ConnectedComponents) [fixed in #809]
4 TaskManagerFailsWithSlotSharingITCase fails
5 ComplexIntegrationTest.complexIntegrationTest1() (FLINK-2192)
 fails
  
    Can we verify that the tests are defective and not the tested component? ;)
    Otherwise, I would not block the release on flaky tests.
  
6 Submitting KMeans example to Web Submission Client does not work
 on
Firefox.
7 Zooming is buggy in Web Submission Client (Firefox)
Do we have someone familiar with the web interface who could take a
   look at
the Firefox issues?
  
   If not, I would not block the release on this.
 



Re: Testing Apache Flink 0.9.0-rc1

2015-06-10 Thread Fabian Hueske
Adding one more thing to the list:

The code contains a misplaced class (mea culpa) in flink-java,
org.apache.flink.api.java.SortPartitionOperator, which is API-facing and
should be moved to the operators package. If we do that after the release,
it will break binary compatibility. I created FLINK-2196 and will open a PR
soon.

If nobody objects, I'll merge it into the 0.9 release branch as well.

2015-06-10 11:02 GMT+02:00 Maximilian Michels m...@apache.org:

 I'm not against including the feature but I'd like to discuss it first. I
 believe that only very carefully selected commits should be added to
 release-0.9. If that feature happens to be tested extensively and is very
  important for user satisfaction, then we might include it.

 On Wed, Jun 10, 2015 at 10:59 AM, F. Beligianni faybeligia...@gmail.com
 wrote:

  I agree with Gyula regarding the iteration partitioning.
  I have also been using this feature for developing machine learning
  algorithms. And I think SAMOA also needs this feature.
 
  Faye
 
  2015-06-10 10:54 GMT+02:00 Gyula Fóra gyula.f...@gmail.com:
 
   This feature needs to be included in the release, it has been tested
 and
    used extensively. And many applications depend on it.
  
   Maximilian Michels m...@apache.org ezt írta (időpont: 2015. jún. 10.,
   Sze,
   10:47):
  
With all the issues discovered, it looks like we'll have another
  release
candidate. Right now, we have discovered the following problems:
   
1 YARN ITCase fails [fixed via 2eb5cfe]
2 No Jar for SessionWindowing example [fixed in #809]
3 Wrong description of the input format for the graph examples (eg.
ConnectedComponents) [fixed in #809]
4 TaskManagerFailsWithSlotSharingITCase fails
5 ComplexIntegrationTest.complexIntegrationTest1() (FLINK-2192) fails
6 Submitting KMeans example to Web Submission Client does not work on
Firefox.
7 Zooming is buggy in Web Submission Client (Firefox)
   
Do we have someone familiar with the web interface who could take a
  look
   at
the Firefox issues?
   
One more important thing: The release-0.9 branch should only be used
  for
bug fixes or prior discussed feature changes. Adding new features
  defies
the purpose of carefully testing in advance and can have
 unforeseeable
consequences. In particular, I'm referring to #810 pull request:
https://github.com/apache/flink/pull/810
   
IMHO, this one shouldn't have been cherry-picked onto the release-0.9
branch. I would like to remove it from there if no objections are
  raised.
   
   
   
  
 
 https://github.com/apache/flink/commit/e0e6f59f309170e5217bdfbf5d30db87c947f8ce
   
On Wed, Jun 10, 2015 at 8:52 AM, Aljoscha Krettek 
 aljos...@apache.org
  
wrote:
   
 This doesn't look good, yes.

 On Wed, Jun 10, 2015 at 1:32 AM, Ufuk Celebi u...@apache.org
 wrote:

   While looking into FLINK-2188 (HBase input) I've discovered that Hadoop
   input formats implementing Configurable (like mapreduce.TableInputFormat)
   don't have the Hadoop configuration set via setConf(Configuration).
  
   I have a small fix for this, which I have to clean up. First, I wanted to
   check what you think about this issue wrt the release. Personally, I think
   this is a release blocker, because it essentially means that no Hadoop
   input format that relies on the Configuration instance being set this way
   will work (this is to some extent a bug of the respective input formats) –
   most notably the HBase TableInputFormat.
 
  – Ufuk
 
  On 09 Jun 2015, at 18:07, Chiwan Park chiwanp...@icloud.com
  wrote:
 
   I attached jps and jstack log about hanging
  TaskManagerFailsWithSlotSharingITCase to JIRA FLINK-2183.
  
   Regards,
   Chiwan Park
  
   On Jun 10, 2015, at 12:28 AM, Aljoscha Krettek 
   aljos...@apache.org

  wrote:
  
   I discovered something that might be a feature, rather than a
  bug.
 When
  you
   submit an example using the web client without giving
 parameters
   the
   program fails with this:
  
    org.apache.flink.client.program.ProgramInvocationException: The main
    method caused an error.

    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
    at org.apache.flink.client.program.Client.run(Client.java:315)
    at org.apache.flink.client.web.JobSubmissionServlet.doGet(JobSubmissionServlet.java:302)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
    at

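[Editor's note] The NullPointerException above is consistent with example main methods indexing into `args` without checking its length when the web client submits them with no parameters. A minimal defensive pattern is sketched below; `ParamGuard` and its method are hypothetical names for illustration, not the actual Flink example code.

```java
// Sketch: guard an example entry point against missing CLI parameters so
// the web client shows a fallback path instead of a NullPointerException.
public class ParamGuard {

    /** Returns the first argument if present, otherwise a default path. */
    static String inputPathOrDefault(String[] args, String fallback) {
        return (args != null && args.length > 0 && args[0] != null)
                ? args[0]
                : fallback;
    }

    public static void main(String[] args) {
        // With no arguments this falls back instead of throwing an NPE.
        String input = inputPathOrDefault(args, "hdfs:///default/input");
        System.out.println("reading from " + input);
    }
}
```

Whether the web client should instead reject parameterless submissions with a usage message is the design question raised in the thread.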
Re: Testing Apache Flink 0.9.0-rc1

2015-06-10 Thread Ufuk Celebi
Hey Gyula, Max,

On 10 Jun 2015, at 10:54, Gyula Fóra gyula.f...@gmail.com wrote:

 This feature needs to be included in the release, it has been tested and
  used extensively. And many applications depend on it.

It would be nice to announce/discuss this before just cherry-picking it into 
the release branch. The issue is that no one (except you) knows that this is 
important. Let's just make sure to do this for future fixes. 

Having said that... it seems to be an important fix. Does someone have time 
(looking at Aljoscha ;)) to review the changes?

 Maximilian Michels m...@apache.org ezt írta (időpont: 2015. jún. 10., Sze,
 10:47):
 
 With all the issues discovered, it looks like we'll have another release
 candidate. Right now, we have discovered the following problems:
 
 1 YARN ITCase fails [fixed via 2eb5cfe]
 2 No Jar for SessionWindowing example [fixed in #809]
 3 Wrong description of the input format for the graph examples (eg.
 ConnectedComponents) [fixed in #809]
 4 TaskManagerFailsWithSlotSharingITCase fails
 5 ComplexIntegrationTest.complexIntegrationTest1() (FLINK-2192) fails

Can we verify that the tests are defective and not the tested component? ;)
Otherwise, I would not block the release on flaky tests.

 6 Submitting KMeans example to Web Submission Client does not work on
 Firefox.
 7 Zooming is buggy in Web Submission Client (Firefox)
 Do we have someone familiar with the web interface who could take a look at
 the Firefox issues?

If not, I would not block the release on this.

Re: Testing Apache Flink 0.9.0-rc1

2015-06-10 Thread Aljoscha Krettek
I added a section at the top of the release testing document to keep
track of commits that we might want to cherry-pick to the release. I
included the YARNSessionFIFOITCase fix and the optional stream
iteration partitioning (both already on release branch).

On Wed, Jun 10, 2015 at 12:51 PM, Fabian Hueske fhue...@gmail.com wrote:
 @Sachin: I reproduced the build error on my Windows machine.

 2015-06-10 12:22 GMT+02:00 Maximilian Michels m...@apache.org:

 @Sachin: This looks like a file permission issue. We should have someone
 else verify that on a Windows system.
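[Editor's note] On Windows, `java.io.File.setWritable(false)` frequently has no effect on directories, which would explain the "Could not remove write permissions" failures above. A hedged sketch of a pre-check such tests could use (illustrative only; `WritePermissionGuard` is not the actual Flink test code):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Sketch: before asserting that a directory is read-only, check whether
// the platform actually honors File.setWritable(false). On Windows this
// often returns false (or has no effect) for directories, so the test
// should be skipped there rather than reported as a failure.
public class WritePermissionGuard {

    /** Returns true only if write permission could really be revoked. */
    static boolean canRevokeWritePermission(File dir) {
        boolean revoked = dir.setWritable(false, false);
        boolean effective = revoked && !dir.canWrite();
        dir.setWritable(true, false); // always restore so cleanup works
        return effective;
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("perm-test").toFile();
        if (!canRevokeWritePermission(dir)) {
            System.out.println("skipping: platform ignores setWritable(false)");
        } else {
            System.out.println("permission revocation supported");
        }
        dir.delete();
    }
}
```

In a JUnit test the check would typically feed an `Assume.assumeTrue(...)` so the case is marked skipped instead of failed on Windows.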

 On Wed, Jun 10, 2015 at 11:28 AM, Sachin Goel sachingoel0...@gmail.com
 wrote:

  I have run mvn clean verify five times now and every time I'm getting
  these failed tests:
 
   BlobUtilsTest.before:45 null
BlobUtilsTest.before:45 null
BlobServerDeleteTest.testDeleteFails:291 null
BlobLibraryCacheManagerTest.testRegisterAndDownload:196 Could not
  remove write permissions from cache directory
BlobServerPutTest.testPutBufferFails:224 null
BlobServerPutTest.testPutNamedBufferFails:286 null
JobManagerStartupTest.before:55 null
JobManagerStartupTest.before:55 null
DataSinkTaskTest.testFailingDataSinkTask:317 Temp output file has
  not been removed
DataSinkTaskTest.testFailingSortingDataSinkTask:358 Temp output file
  has not been removed
TaskManagerTest.testSubmitAndExecuteTask**:123 assertion failed:
  timeout (19998080696 nanoseconds) during expectMsgClass waiting for
  class
 
 org.apache.flink.runtime.messages.RegistrationMessages$RegisterTaskManager
TaskManagerProcessReapingTest.testReapProcessOnFailure:133
  TaskManager process did not launch the TaskManager properly. Failed to
  look up akka.tcp://flink@127.0.0.1:50673/user/taskmanager
 
  ** fails randomly.
 
   Is someone able to reproduce these while building on a Windows machine? I
  would try to debug these myself but I'm not yet familiar with the core
  architecture and API.
 
  -- Sachin
 
  On Wed, Jun 10, 2015 at 2:46 PM, Aljoscha Krettek aljos...@apache.org
  wrote:
 
   The KMeans quickstart example does not work with the current state of
    the KMeansDataGenerator. I created a PR that brings the two in sync.
   This should probably go into the release since it affects initial user
   satisfaction.
  
   On Wed, Jun 10, 2015 at 11:14 AM, Márton Balassi
   balassi.mar...@gmail.com wrote:
As for the streaming commit cherry-picked to the release branch:
This is an unfortunate communication issue, let us make sure that we
clearly communicate similar issues in the future.
   
As for FLINK-2192: This is essentially a duplicate issue of the
   testability
of the streaming iteration. Not a blocker, I will comment on the JIRA
ticket, Gabor Hermann is already working on the root cause.
   
On Wed, Jun 10, 2015 at 11:07 AM, Ufuk Celebi u...@apache.org
 wrote:
   
Hey Gyula, Max,
   
On 10 Jun 2015, at 10:54, Gyula Fóra gyula.f...@gmail.com wrote:
   
 This feature needs to be included in the release, it has been
 tested
   and
  used extensively. And many applications depend on it.
   
It would be nice to announce/discuss this before just cherry-picking
  it
into the release branch. The issue is that no one (except you) knows
   that
this is important. Let's just make sure to do this for future fixes.
   
Having said that... it seems to be an important fix. Does someone
 have
time (looking at Aljoscha ;)) to review the changes?
   
 Maximilian Michels m...@apache.org ezt írta (időpont: 2015. jún.
   10.,
Sze,
 10:47):

 With all the issues discovered, it looks like we'll have another
   release
 candidate. Right now, we have discovered the following problems:

 1 YARN ITCase fails [fixed via 2eb5cfe]
 2 No Jar for SessionWindowing example [fixed in #809]
 3 Wrong description of the input format for the graph examples
 (eg.
 ConnectedComponents) [fixed in #809]
 4 TaskManagerFailsWithSlotSharingITCase fails
 5 ComplexIntegrationTest.complexIntegrationTest1() (FLINK-2192)
  fails
   
 Can we verify that the tests are defective and not the tested component? ;)
 Otherwise, I would not block the release on flaky tests.
   
 6 Submitting KMeans example to Web Submission Client does not
 work
  on
 Firefox.
 7 Zooming is buggy in Web Submission Client (Firefox)
 Do we have someone familiar with the web interface who could
 take a
look at
 the Firefox issues?
   
If not, I would not block the release on this.
  
 



Re: Testing Apache Flink 0.9.0-rc1

2015-06-10 Thread Maximilian Michels
With all the issues discovered, it looks like we'll have another release
candidate. Right now, we have discovered the following problems:

1. YARN ITCase fails [fixed via 2eb5cfe]
2. No Jar for SessionWindowing example [fixed in #809]
3. Wrong description of the input format for the graph examples (e.g.
ConnectedComponents) [fixed in #809]
4. TaskManagerFailsWithSlotSharingITCase fails
5. ComplexIntegrationTest.complexIntegrationTest1() (FLINK-2192) fails
6. Submitting KMeans example to Web Submission Client does not work on
Firefox.
7. Zooming is buggy in Web Submission Client (Firefox)

Do we have someone familiar with the web interface who could take a look at
the Firefox issues?

One more important thing: The release-0.9 branch should only be used for
bug fixes or previously discussed feature changes. Adding new features defies
the purpose of careful testing in advance and can have unforeseeable
consequences. In particular, I'm referring to pull request #810:
https://github.com/apache/flink/pull/810

IMHO, this one shouldn't have been cherry-picked onto the release-0.9
branch. I would like to remove it from there if no objections are raised.

https://github.com/apache/flink/commit/e0e6f59f309170e5217bdfbf5d30db87c947f8ce

On Wed, Jun 10, 2015 at 8:52 AM, Aljoscha Krettek aljos...@apache.org
wrote:

 This doesn't look good, yes.

 On Wed, Jun 10, 2015 at 1:32 AM, Ufuk Celebi u...@apache.org wrote:

  While looking into FLINK-2188 (HBase input) I've discovered that Hadoop
  input formats implementing Configurable (like mapreduce.TableInputFormat)
  don't have the Hadoop configuration set via setConf(Configuration).
 
  I have a small fix for this, which I have to clean up. First, I wanted to
  check what you think about this issue wrt the release. Personally, I think
  this is a release blocker, because it essentially means that no Hadoop
  input format that relies on the Configuration instance being set this way
  will work (this is to some extent a bug of the respective input formats) –
  most notably the HBase TableInputFormat.
 
  – Ufuk
 
  On 09 Jun 2015, at 18:07, Chiwan Park chiwanp...@icloud.com wrote:
 
   I attached jps and jstack log about hanging
  TaskManagerFailsWithSlotSharingITCase to JIRA FLINK-2183.
  
   Regards,
   Chiwan Park
  
   On Jun 10, 2015, at 12:28 AM, Aljoscha Krettek aljos...@apache.org
  wrote:
  
   I discovered something that might be a feature, rather than a bug.
 When
  you
   submit an example using the web client without giving parameters the
   program fails with this:
  
    org.apache.flink.client.program.ProgramInvocationException: The main
    method caused an error.

    at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
    at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
    at org.apache.flink.client.program.Client.run(Client.java:315)
    at org.apache.flink.client.web.JobSubmissionServlet.doGet(JobSubmissionServlet.java:302)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:187)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:901)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
    at org.eclipse.jetty.server.Server.handle(Server.java:352)
    at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)
    at org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1048)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:549)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211)
    at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:425)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436)
    at java.lang.Thread.run(Thread.java:745)

    Caused by: java.lang.NullPointerException

    at

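[Editor's note] The generic shape of the fix Ufuk describes earlier in this message: after instantiating a Hadoop input format, check whether it implements `Configurable` and propagate the configuration. The sketch below uses stub types in place of `org.apache.hadoop.conf.Configurable`/`Configuration` so it runs without the Hadoop jars; it shows the pattern, not Flink's actual wrapper code.

```java
// Sketch of the fix pattern for FLINK-2188: wrappers around Hadoop input
// formats should call setConf() on formats that implement Configurable.
// The nested stub types stand in for org.apache.hadoop.conf.{Configurable,
// Configuration} so the example is self-contained.
public class ConfigurableDemo {
    interface Configuration {}

    // Mirrors the Hadoop Configurable contract: setConf()/getConf().
    interface Configurable {
        void setConf(Configuration conf);
        Configuration getConf();
    }

    /** A TableInputFormat-like format that only works once configured. */
    static class StubTableInputFormat implements Configurable {
        private Configuration conf;
        public void setConf(Configuration conf) { this.conf = conf; }
        public Configuration getConf() { return conf; }
        boolean isConfigured() { return conf != null; }
    }

    /** The wrapper-side fix: propagate the configuration if supported. */
    static void configureIfPossible(Object format, Configuration conf) {
        if (format instanceof Configurable) {
            ((Configurable) format).setConf(conf);
        }
    }

    public static void main(String[] args) {
        StubTableInputFormat format = new StubTableInputFormat();
        configureIfPossible(format, new Configuration() {});
        System.out.println(format.isConfigured());
    }
}
```

Formats that do not implement `Configurable` pass through the check untouched, so the fix is safe to apply unconditionally in the wrapper.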
Re: Testing Apache Flink 0.9.0-rc1

2015-06-10 Thread Sachin Goel
I have run mvn clean verify five times now and every time I'm getting
these failed tests:

 BlobUtilsTest.before:45 null
  BlobUtilsTest.before:45 null
  BlobServerDeleteTest.testDeleteFails:291 null
  BlobLibraryCacheManagerTest.testRegisterAndDownload:196 Could not
remove write permissions from cache directory
  BlobServerPutTest.testPutBufferFails:224 null
  BlobServerPutTest.testPutNamedBufferFails:286 null
  JobManagerStartupTest.before:55 null
  JobManagerStartupTest.before:55 null
  DataSinkTaskTest.testFailingDataSinkTask:317 Temp output file has
not been removed
  DataSinkTaskTest.testFailingSortingDataSinkTask:358 Temp output file
has not been removed
  TaskManagerTest.testSubmitAndExecuteTask**:123 assertion failed:
timeout (19998080696 nanoseconds) during expectMsgClass waiting for
class org.apache.flink.runtime.messages.RegistrationMessages$RegisterTaskManager
  TaskManagerProcessReapingTest.testReapProcessOnFailure:133
TaskManager process did not launch the TaskManager properly. Failed to
look up akka.tcp://flink@127.0.0.1:50673/user/taskmanager

** fails randomly.

Is someone able to reproduce these while building on a Windows machine? I
would try to debug these myself but I'm not yet familiar with the core
architecture and API.

-- Sachin

On Wed, Jun 10, 2015 at 2:46 PM, Aljoscha Krettek aljos...@apache.org
wrote:

 The KMeans quickstart example does not work with the current state of
 the KMeansDataGenerator. I created a PR that brings the two in sync.
 This should probably go into the release since it affects initial user
 satisfaction.

 On Wed, Jun 10, 2015 at 11:14 AM, Márton Balassi
 balassi.mar...@gmail.com wrote:
  As for the streaming commit cherry-picked to the release branch:
  This is an unfortunate communication issue, let us make sure that we
  clearly communicate similar issues in the future.
 
  As for FLINK-2192: This is essentially a duplicate issue of the
 testability
  of the streaming iteration. Not a blocker, I will comment on the JIRA
  ticket, Gabor Hermann is already working on the root cause.
 
  On Wed, Jun 10, 2015 at 11:07 AM, Ufuk Celebi u...@apache.org wrote:
 
  Hey Gyula, Max,
 
  On 10 Jun 2015, at 10:54, Gyula Fóra gyula.f...@gmail.com wrote:
 
   This feature needs to be included in the release, it has been tested
 and
    used extensively. And many applications depend on it.
 
  It would be nice to announce/discuss this before just cherry-picking it
  into the release branch. The issue is that no one (except you) knows
 that
  this is important. Let's just make sure to do this for future fixes.
 
  Having said that... it seems to be an important fix. Does someone have
  time (looking at Aljoscha ;)) to review the changes?
 
   Maximilian Michels m...@apache.org ezt írta (időpont: 2015. jún.
 10.,
  Sze,
   10:47):
  
   With all the issues discovered, it looks like we'll have another
 release
   candidate. Right now, we have discovered the following problems:
  
   1 YARN ITCase fails [fixed via 2eb5cfe]
   2 No Jar for SessionWindowing example [fixed in #809]
   3 Wrong description of the input format for the graph examples (eg.
   ConnectedComponents) [fixed in #809]
   4 TaskManagerFailsWithSlotSharingITCase fails
   5 ComplexIntegrationTest.complexIntegrationTest1() (FLINK-2192) fails
 
   Can we verify that the tests are defective and not the tested component? ;)
   Otherwise, I would not block the release on flaky tests.
 
   6 Submitting KMeans example to Web Submission Client does not work on
   Firefox.
   7 Zooming is buggy in Web Submission Client (Firefox)
   Do we have someone familiar with the web interface who could take a
  look at
   the Firefox issues?
 
  If not, I would not block the release on this.



Re: Testing Apache Flink 0.9.0-rc1

2015-06-10 Thread F. Beligianni
I agree with Gyula regarding the iteration partitioning.
I have also been using this feature for developing machine learning
algorithms. And I think SAMOA also needs this feature.

Faye

2015-06-10 10:54 GMT+02:00 Gyula Fóra gyula.f...@gmail.com:

 This feature needs to be included in the release, it has been tested and
  used extensively. And many applications depend on it.

 Maximilian Michels m...@apache.org ezt írta (időpont: 2015. jún. 10.,
 Sze,
 10:47):

  With all the issues discovered, it looks like we'll have another release
  candidate. Right now, we have discovered the following problems:
 
  1 YARN ITCase fails [fixed via 2eb5cfe]
  2 No Jar for SessionWindowing example [fixed in #809]
  3 Wrong description of the input format for the graph examples (eg.
  ConnectedComponents) [fixed in #809]
  4 TaskManagerFailsWithSlotSharingITCase fails
  5 ComplexIntegrationTest.complexIntegrationTest1() (FLINK-2192) fails
  6 Submitting KMeans example to Web Submission Client does not work on
  Firefox.
  7 Zooming is buggy in Web Submission Client (Firefox)
 
  Do we have someone familiar with the web interface who could take a look
 at
  the Firefox issues?
 
  One more important thing: The release-0.9 branch should only be used for
  bug fixes or prior discussed feature changes. Adding new features defies
  the purpose of carefully testing in advance and can have unforeseeable
  consequences. In particular, I'm referring to #810 pull request:
  https://github.com/apache/flink/pull/810
 
  IMHO, this one shouldn't have been cherry-picked onto the release-0.9
  branch. I would like to remove it from there if no objections are raised.
 
 
 
 https://github.com/apache/flink/commit/e0e6f59f309170e5217bdfbf5d30db87c947f8ce
 
  On Wed, Jun 10, 2015 at 8:52 AM, Aljoscha Krettek aljos...@apache.org
  wrote:
 
   This doesn't look good, yes.
  
   On Wed, Jun 10, 2015 at 1:32 AM, Ufuk Celebi u...@apache.org wrote:
  
 While looking into FLINK-2188 (HBase input) I've discovered that Hadoop
 input formats implementing Configurable (like mapreduce.TableInputFormat)
 don't have the Hadoop configuration set via setConf(Configuration).

 I have a small fix for this, which I have to clean up. First, I wanted to
 check what you think about this issue wrt the release. Personally, I think
 this is a release blocker, because it essentially means that no Hadoop
 input format that relies on the Configuration instance being set this way
 will work (this is to some extent a bug of the respective input formats) –
 most notably the HBase TableInputFormat.
   
– Ufuk
   
On 09 Jun 2015, at 18:07, Chiwan Park chiwanp...@icloud.com wrote:
   
 I attached jps and jstack log about hanging
TaskManagerFailsWithSlotSharingITCase to JIRA FLINK-2183.

 Regards,
 Chiwan Park

 On Jun 10, 2015, at 12:28 AM, Aljoscha Krettek 
 aljos...@apache.org
  
wrote:

 I discovered something that might be a feature, rather than a bug.
   When
you
 submit an example using the web client without giving parameters
 the
 program fails with this:

 org.apache.flink.client.program.ProgramInvocationException: The main
 method caused an error.

 at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
 at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
 at org.apache.flink.client.program.Client.run(Client.java:315)
 at org.apache.flink.client.web.JobSubmissionServlet.doGet(JobSubmissionServlet.java:302)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
 at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
 at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)
 at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388)
 at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:187)
 at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:901)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
 at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
 at org.eclipse.jetty.server.Server.handle(Server.java:352)
 at

Re: Testing Apache Flink 0.9.0-rc1

2015-06-10 Thread Maximilian Michels
I'm not against including the feature but I'd like to discuss it first. I
believe that only very carefully selected commits should be added to
release-0.9. If that feature happens to be tested extensively and is very
important for user satisfaction, then we might include it.

On Wed, Jun 10, 2015 at 10:59 AM, F. Beligianni faybeligia...@gmail.com
wrote:

 I agree with Gyula regarding the iteration partitioning.
 I have also been using this feature for developing machine learning
 algorithms. And I think SAMOA also needs this feature.

 Faye

 2015-06-10 10:54 GMT+02:00 Gyula Fóra gyula.f...@gmail.com:

  This feature needs to be included in the release, it has been tested and
   used extensively. And many applications depend on it.
 
  Maximilian Michels m...@apache.org ezt írta (időpont: 2015. jún. 10.,
  Sze,
  10:47):
 
   With all the issues discovered, it looks like we'll have another
 release
   candidate. Right now, we have discovered the following problems:
  
   1 YARN ITCase fails [fixed via 2eb5cfe]
   2 No Jar for SessionWindowing example [fixed in #809]
   3 Wrong description of the input format for the graph examples (eg.
   ConnectedComponents) [fixed in #809]
   4 TaskManagerFailsWithSlotSharingITCase fails
   5 ComplexIntegrationTest.complexIntegrationTest1() (FLINK-2192) fails
   6 Submitting KMeans example to Web Submission Client does not work on
   Firefox.
   7 Zooming is buggy in Web Submission Client (Firefox)
  
   Do we have someone familiar with the web interface who could take a
 look
  at
   the Firefox issues?
  
   One more important thing: The release-0.9 branch should only be used
 for
   bug fixes or prior discussed feature changes. Adding new features
 defies
   the purpose of carefully testing in advance and can have unforeseeable
   consequences. In particular, I'm referring to #810 pull request:
   https://github.com/apache/flink/pull/810
  
   IMHO, this one shouldn't have been cherry-picked onto the release-0.9
   branch. I would like to remove it from there if no objections are
 raised.
  
  
  
 
 https://github.com/apache/flink/commit/e0e6f59f309170e5217bdfbf5d30db87c947f8ce
  
   On Wed, Jun 10, 2015 at 8:52 AM, Aljoscha Krettek aljos...@apache.org
 
   wrote:
  
This doesn't look good, yes.
   
On Wed, Jun 10, 2015 at 1:32 AM, Ufuk Celebi u...@apache.org wrote:
   
 While looking into FLINK-2188 (HBase input) I've discovered that
  Hadoop
 input formats implementing Configurable (like
   mapreduce.TableInputFormat)
 don't have the Hadoop configuration set via setConf(Configuration).

 I have a small fix for this, which I have to clean up. First, I
  wanted
   to
 check what you think about this issue wrt the release. Personally,
 I
think
 this is a release blocker, because it essentially means that no
  Hadoop
 input format, which relies on the Configuration instance to be set
  this
way
 will work (this is to some extent a bug of the respective input
   formats)
–
 most notably the HBase TableInputFormat.

 – Ufuk

 On 09 Jun 2015, at 18:07, Chiwan Park chiwanp...@icloud.com
 wrote:

  I attached jps and jstack log about hanging
 TaskManagerFailsWithSlotSharingITCase to JIRA FLINK-2183.
 
  Regards,
  Chiwan Park
 
  On Jun 10, 2015, at 12:28 AM, Aljoscha Krettek 
  aljos...@apache.org
   
 wrote:
 
  I discovered something that might be a feature, rather than a bug. When you
  submit an example using the web client without giving parameters the
  program fails with this:
 
  org.apache.flink.client.program.ProgramInvocationException: The main method caused an error.
 
  at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
  at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
  at org.apache.flink.client.program.Client.run(Client.java:315)
  at org.apache.flink.client.web.JobSubmissionServlet.doGet(JobSubmissionServlet.java:302)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
  at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532)
  at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
  at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)
  at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965)
  at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388)
 

Re: Testing Apache Flink 0.9.0-rc1

2015-06-10 Thread Maximilian Michels
I'm debugging the TaskManagerFailsWithSlotSharingITCase. I've located its
cause but still need to find out how to fix it.

On Wed, Jun 10, 2015 at 2:25 PM, Aljoscha Krettek aljos...@apache.org
wrote:

 I added a section at the top of the release testing document to keep
 track of commits that we might want to cherry-pick to the release. I
 included the YARNSessionFIFOITCase fix and the optional stream
 iteration partitioning (both already on release branch).

 On Wed, Jun 10, 2015 at 12:51 PM, Fabian Hueske fhue...@gmail.com wrote:
  @Sachin: I reproduced the build error on my Windows machine.
 
  2015-06-10 12:22 GMT+02:00 Maximilian Michels m...@apache.org:
 
  @Sachin: This looks like a file permission issue. We should have someone
  else verify that on a Windows system.
 
  On Wed, Jun 10, 2015 at 11:28 AM, Sachin Goel sachingoel0...@gmail.com
 
  wrote:
 
   I have run mvn clean verify five times now and every time I'm
 getting
   these failed tests:
  
BlobUtilsTest.before:45 null
 BlobUtilsTest.before:45 null
 BlobServerDeleteTest.testDeleteFails:291 null
 BlobLibraryCacheManagerTest.testRegisterAndDownload:196 Could not
   remove write permissions from cache directory
 BlobServerPutTest.testPutBufferFails:224 null
 BlobServerPutTest.testPutNamedBufferFails:286 null
 JobManagerStartupTest.before:55 null
 JobManagerStartupTest.before:55 null
 DataSinkTaskTest.testFailingDataSinkTask:317 Temp output file has
   not been removed
 DataSinkTaskTest.testFailingSortingDataSinkTask:358 Temp output file
   has not been removed
 TaskManagerTest.testSubmitAndExecuteTask**:123 assertion failed:
   timeout (19998080696 nanoseconds) during expectMsgClass waiting for
   class
  
 
 org.apache.flink.runtime.messages.RegistrationMessages$RegisterTaskManager
 TaskManagerProcessReapingTest.testReapProcessOnFailure:133
   TaskManager process did not launch the TaskManager properly. Failed to
   look up akka.tcp://flink@127.0.0.1:50673/user/taskmanager
  
   ** fails randomly.
  
   Is someone able to reproduce these while building on a windows
 machine? I
   would try to debug these myself but I'm not yet familiar with the core
   architecture and API.
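A plausible common cause for several of the Blob/JobManager failures above is that these tests revoke write permissions on a directory via `java.io.File.setWritable(false)`, which is not supported for directories on Windows (the call returns false). This is a hypothesis sketch, not the tests' actual code; guarding on the return value is one way such tests can skip instead of fail:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Hypothesis sketch for the Windows failures above: revoking directory
// write permission via File.setWritable(false) is a no-op on Windows.
public class WritePermissionProbe {
    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("perm-probe").toFile();
        boolean revoked = dir.setWritable(false, false);
        if (!revoked) {
            // Typical on Windows: the permission change is not supported,
            // so a test relying on it should be skipped rather than fail.
            System.out.println("cannot revoke write permission; skipping check");
        } else {
            System.out.println("write permission revoked: " + !dir.canWrite());
            dir.setWritable(true, false); // restore so the temp dir can be deleted
        }
        dir.delete();
    }
}
```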
  
   -- Sachin
  
   On Wed, Jun 10, 2015 at 2:46 PM, Aljoscha Krettek 
 aljos...@apache.org
   wrote:
  
 The KMeans quickstart example does not work with the current state of
 the KMeansDataGenerator. I created a PR that brings the two in sync.
This should probably go into the release since it affects initial
 user
satisfaction.
   
On Wed, Jun 10, 2015 at 11:14 AM, Márton Balassi
balassi.mar...@gmail.com wrote:
 As for the streaming commit cherry-picked to the release branch:
 This is an unfortunate communication issue, let us make sure that
 we
 clearly communicate similar issues in the future.

 As for FLINK-2192: This is essentially a duplicate issue of the
testability
 of the streaming iteration. Not a blocker, I will comment on the
 JIRA
 ticket, Gabor Hermann is already working on the root cause.

 On Wed, Jun 10, 2015 at 11:07 AM, Ufuk Celebi u...@apache.org
  wrote:

 Hey Gyula, Max,

 On 10 Jun 2015, at 10:54, Gyula Fóra gyula.f...@gmail.com
 wrote:

   This feature needs to be included in the release; it has been tested and
   used extensively, and many applications depend on it.

 It would be nice to announce/discuss this before just
 cherry-picking
   it
 into the release branch. The issue is that no one (except you)
 knows
that
 this is important. Let's just make sure to do this for future
 fixes.

 Having said that... it seems to be an important fix. Does someone
  have
 time (looking at Aljoscha ;)) to review the changes?

  Maximilian Michels m...@apache.org ezt írta (időpont: 2015.
 jún.
10.,
 Sze,
  10:47):
 
  With all the issues discovered, it looks like we'll have
 another
release
  candidate. Right now, we have discovered the following
 problems:
 
  1 YARN ITCase fails [fixed via 2eb5cfe]
  2 No Jar for SessionWindowing example [fixed in #809]
  3 Wrong description of the input format for the graph examples
  (eg.
  ConnectedComponents) [fixed in #809]
  4 TaskManagerFailsWithSlotSharingITCase fails
  5 ComplexIntegrationTest.complexIntegrationTest1()
 (FLINK-2192)
   fails

 Can we verify that the tests are defect and not the tested
  component?
   ;)
 Otherwise, I would not block the release on flakey tests.

  6 Submitting KMeans example to Web Submission Client does not
  work
   on
  Firefox.
  7 Zooming is buggy in Web Submission Client (Firefox)
  Do we have someone familiar with the web interface who could
  take a
 look at
  the Firefox issues?

 If not, I would not block the release on this.
   
  
 



Re: Testing Apache Flink 0.9.0-rc1

2015-06-10 Thread Aljoscha Krettek
This doesn't look good, yes.

On Wed, Jun 10, 2015 at 1:32 AM, Ufuk Celebi u...@apache.org wrote:

 While looking into FLINK-2188 (HBase input) I've discovered that Hadoop
 input formats implementing Configurable (like mapreduce.TableInputFormat)
 don't have the Hadoop configuration set via setConf(Configuration).

 I have a small fix for this, which I have to clean up. First, I wanted to
 check what you think about this issue wrt the release. Personally, I think
 this is a release blocker, because it essentially means that no Hadoop
 input format, which relies on the Configuration instance to be set this way
 will work (this is to some extent a bug of the respective input formats) –
 most notably the HBase TableInputFormat.
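The fix Ufuk describes amounts to checking whether the wrapped Hadoop input format implements `Configurable` and, if so, passing the configuration through via `setConf(...)`. A self-contained sketch of that pattern (the interfaces below are stand-ins mirroring Hadoop's `org.apache.hadoop.conf` contract so the example compiles without Hadoop; this is not Flink's actual wrapper code):

```java
// Stand-ins for org.apache.hadoop.conf.Configuration / Configurable,
// so this sketch is self-contained.
interface Configuration {}
interface Configurable {
    void setConf(Configuration conf);
}

public class ConfigurableCheck {
    // The missing step: after instantiating a Hadoop input format, check for
    // Configurable and hand over the configuration. Formats like
    // mapreduce.TableInputFormat rely on exactly this call.
    static void configureIfNeeded(Object inputFormat, Configuration conf) {
        if (inputFormat instanceof Configurable) {
            ((Configurable) inputFormat).setConf(conf);
        }
    }

    public static void main(String[] args) {
        class TableInputFormatLike implements Configurable {
            Configuration conf;
            public void setConf(Configuration c) { this.conf = c; }
        }
        TableInputFormatLike format = new TableInputFormatLike();
        Configuration conf = new Configuration() {};
        configureIfNeeded(format, conf);
        System.out.println(format.conf == conf); // the configuration arrived
    }
}
```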

 – Ufuk

 On 09 Jun 2015, at 18:07, Chiwan Park chiwanp...@icloud.com wrote:

  I attached jps and jstack log about hanging
 TaskManagerFailsWithSlotSharingITCase to JIRA FLINK-2183.
 
  Regards,
  Chiwan Park
 
  On Jun 10, 2015, at 12:28 AM, Aljoscha Krettek aljos...@apache.org
 wrote:
 
  I discovered something that might be a feature, rather than a bug. When
 you
  submit an example using the web client without giving parameters the
  program fails with this:
 
  org.apache.flink.client.program.ProgramInvocationException: The main
 method
  caused an error.
 
  at
 
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
 
  at
 
 org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
 
  at org.apache.flink.client.program.Client.run(Client.java:315)
 
  at
 
 org.apache.flink.client.web.JobSubmissionServlet.doGet(JobSubmissionServlet.java:302)
 
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
 
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
 
  at
 org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532)
 
  at
 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
 
  at
 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)
 
  at
 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965)
 
  at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388)
 
  at
 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:187)
 
  at
 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:901)
 
  at
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
 
  at
 org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47)
 
  at
 
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
 
  at org.eclipse.jetty.server.Server.handle(Server.java:352)
 
  at
 
 org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)
 
  at
 
 org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1048)
 
  at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:549)
 
  at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211)
 
  at
 org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:425)
 
  at
 
 org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489)
 
  at
 
 org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436)
 
  at java.lang.Thread.run(Thread.java:745)
 
  Caused by: java.lang.NullPointerException
 
  at
 
 org.apache.flink.api.common.JobExecutionResult.getAccumulatorResult(JobExecutionResult.java:78)
 
  at org.apache.flink.api.java.DataSet.collect(DataSet.java:409)
 
  at org.apache.flink.api.java.DataSet.print(DataSet.java:1345)
 
  at
 
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:80)
 
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
  at
 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 
  at
 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
  at java.lang.reflect.Method.invoke(Method.java:497)
 
  at
 
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
 
  ... 24 more
 
 
  This also only occurs when you uncheck the suspend execution while
 showing
  plan.
 
  I think this arises because the new print() uses collect() which tries to
  get the job execution result. I guess the result is Null since the job is
  submitted asynchronously when the checkbox is unchecked.
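Aljoscha's analysis points at `JobExecutionResult.getAccumulatorResult` being reached with a null result for asynchronous (detached) submission. A sketch of the failure mode and a defensive guard that replaces the opaque NullPointerException with an explicit error (stand-in classes, not Flink's actual implementation):

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Stand-in for Flink's JobExecutionResult; holds accumulator values by name.
class JobExecutionResult {
    private final Map<String, Object> accumulators;
    JobExecutionResult(Map<String, Object> accumulators) {
        this.accumulators = accumulators;
    }
    @SuppressWarnings("unchecked")
    <T> T getAccumulatorResult(String name) {
        return (T) accumulators.get(name);
    }
}

public class DetachedCollect {
    static List<String> collect(JobExecutionResult result, String accumulatorId) {
        // Defensive check: for detached submission there is no result object,
        // so fail with a message instead of an NPE deep inside print().
        if (result == null) {
            throw new IllegalStateException(
                "Job was submitted in detached mode; collect()/print() "
                + "require a synchronously obtained JobExecutionResult.");
        }
        List<String> r = result.getAccumulatorResult(accumulatorId);
        return r == null ? Collections.<String>emptyList() : r;
    }

    public static void main(String[] args) {
        try {
            collect(null, "dataset-collect"); // detached submission: no result
        } catch (IllegalStateException e) {
            System.out.println("OK: " + e.getMessage().startsWith("Job was submitted"));
        }
    }
}
```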
 
 
  Other than that, the new print() is pretty sweet when you run the
 builtin
  examples from the CLI. You get all the state changes and also the
 result,
  even when running in cluster mode on several task managers. :D
 
 
  On Tue, Jun 9, 2015 at 3:41 PM, Aljoscha Krettek aljos...@apache.org
  wrote:
 
  I discovered another problem:
  https://issues.apache.org/jira/browse/FLINK-2191 The closure cleaner
  cannot be disabled in part of the Streaming Java API and all of the
  Streaming 

Re: Testing Apache Flink 0.9.0-rc1

2015-06-10 Thread Fabian Hueske
Yes, that needs to be fixed IMO

2015-06-10 17:51 GMT+02:00 Till Rohrmann trohrm...@apache.org:

 Yes since it is clearly a deadlock in the scheduler, the current version
 shouldn't be released.

 On Wed, Jun 10, 2015 at 5:48 PM Ufuk Celebi u...@apache.org wrote:

 
  On 10 Jun 2015, at 16:18, Maximilian Michels m...@apache.org wrote:
 
   I'm debugging the TaskManagerFailsWithSlotSharingITCase. I've located
 its
   cause but still need to find out how to fix it.
 
  Very good find, Max!
 
  Max, Till, and I have looked into this and it is a reproducible deadlock
  in the scheduler during concurrent slot release (in failure cases). Max
  will attach the relevant stack trace to the issue.
 
  I think this is a release blocker. Any opinions?
 
  – Ufuk



Re: Testing Apache Flink 0.9.0-rc1

2015-06-10 Thread Till Rohrmann
Yes since it is clearly a deadlock in the scheduler, the current version
shouldn't be released.

On Wed, Jun 10, 2015 at 5:48 PM Ufuk Celebi u...@apache.org wrote:


 On 10 Jun 2015, at 16:18, Maximilian Michels m...@apache.org wrote:

  I'm debugging the TaskManagerFailsWithSlotSharingITCase. I've located its
  cause but still need to find out how to fix it.

 Very good find, Max!

 Max, Till, and I have looked into this and it is a reproducible deadlock
 in the scheduler during concurrent slot release (in failure cases). Max
 will attach the relevant stack trace to the issue.

 I think this is a release blocker. Any opinions?

 – Ufuk
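The deadlock class Ufuk describes, two threads releasing shared slots while each holds a lock the other needs, is typically fixed by imposing a global lock-acquisition order. A self-contained illustration of that pattern (illustrative only, not the scheduler's actual code; equal identity hash codes would additionally need a tie-breaker lock):

```java
// Illustration: acquiring two monitors in a globally consistent order means
// no thread can ever hold one lock while waiting on the other.
public class OrderedLocking {
    static void withBothLocks(Object a, Object b, Runnable action) {
        // Impose a total order on acquisition (here: by identity hash).
        Object first = System.identityHashCode(a) <= System.identityHashCode(b) ? a : b;
        Object second = (first == a) ? b : a;
        synchronized (first) {
            synchronized (second) {
                action.run();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Object scheduler = new Object();
        Object slot = new Object();
        int[] releases = {0};
        // Two threads "release" concurrently, naming the locks in opposite
        // order; the ordering helper makes the actual acquisition identical.
        Runnable r1 = () -> withBothLocks(scheduler, slot, () -> releases[0]++);
        Runnable r2 = () -> withBothLocks(slot, scheduler, () -> releases[0]++);
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10000; i++) r1.run(); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10000; i++) r2.run(); });
        t1.start(); t2.start();
        t1.join(); t2.join();
        System.out.println(releases[0]); // all 20000 releases complete, no deadlock
    }
}
```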


Re: Testing Apache Flink 0.9.0-rc1

2015-06-09 Thread Till Rohrmann
I also encountered a failing TaskManagerFailsWithSlotSharingITCase using
Java8. I could, however, not reproduce the error a second time. The stack
trace is:

The JobManager should handle hard failing task manager with slot
sharing(org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase)
 Time elapsed: 1,400.148 sec   ERROR!
java.util.concurrent.TimeoutException: Futures timed out after [20
milliseconds]
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
at scala.concurrent.Await$anonfun$ready$1.apply(package.scala:86)
at scala.concurrent.Await$anonfun$ready$1.apply(package.scala:86)
at 
scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
at scala.concurrent.Await$.ready(package.scala:86)
at 
org.apache.flink.runtime.minicluster.FlinkMiniCluster.shutdown(FlinkMiniCluster.scala:162)
at 
org.apache.flink.runtime.minicluster.FlinkMiniCluster.stop(FlinkMiniCluster.scala:149)
at 
org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$anonfun$1$anonfun$apply$mcV$sp$3.apply$mcV$sp(TaskManagerFailsWithSlotSharingITCase.scala:140)
at 
org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$anonfun$1$anonfun$apply$mcV$sp$3.apply(TaskManagerFailsWithSlotSharingITCase.scala:95)
at 
org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$anonfun$1$anonfun$apply$mcV$sp$3.apply(TaskManagerFailsWithSlotSharingITCase.scala:95)
at 
org.scalatest.Transformer$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.WordSpecLike$anon$1.apply(WordSpecLike.scala:953)
at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
at 
org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase.withFixture(TaskManagerFailsWithSlotSharingITCase.scala:36)

Results :

Tests in error:
  
TaskManagerFailsWithSlotSharingITCase.run:36-org$scalatest$BeforeAndAfterAll$super$run:36-org$scalatest$WordSpecLike$super$run:36-runTests:36-runTest:36-withFixture:36
» Timeout

On Tue, Jun 9, 2015 at 11:26 AM Maximilian Michels m...@apache.org
wrote:

The name of the Git branch was not correct. Thank you, Aljoscha, for
 noticing. I've changed it from release-0.9-rc1 to release-0.9.0-rc1.
  This has no effect on the validity of the release candidate.



Re: Testing Apache Flink 0.9.0-rc1

2015-06-09 Thread Maximilian Michels
The name of the Git branch was not correct. Thank you, Aljoscha, for
noticing. I've changed it from release-0.9-rc1 to release-0.9.0-rc1.
This has no effect on the validity of the release candidate.


Re: Testing Apache Flink 0.9.0-rc1

2015-06-09 Thread Aljoscha Krettek
I would suggest we use this format to notify others that we did a task:

Assignees:
 - Aljoscha: done
 - Ufuk: found bug in such an such...
 - Chiwan Park: done, ...

The simple status doesn't work with multiple people on one task.

On Tue, Jun 9, 2015 at 9:40 AM, Ufuk Celebi u...@apache.org wrote:
 Hey all,

 1. it would be nice if we find more people to also do testing of the 
 streaming API. I think it's especially good to have people on it, which did 
 not use it before.

 2. Just to make sure: the assignee field of each task is a list, i.e. we 
 can and should have more people testing per task. ;-)

 – Ufuk

 On 08 Jun 2015, at 19:00, Chiwan Park chiwanp...@icloud.com wrote:

 Hi. I’m very excited about preparing a new major release. :)
 I just picked two tests. I will report status as soon as possible.

 Regards,
 Chiwan Park

 On Jun 9, 2015, at 1:52 AM, Maximilian Michels m...@apache.org wrote:

 Hi everyone!

 As previously discussed, the Flink developer community is very eager to get
 out a new major release. Apache Flink 0.9.0 will contain lots of new
 features and many bugfixes. This time, I'll try to coordinate the release
 process. Feel free to correct me if I'm doing something wrong because I
 don't know any better :)

 To release a great version of Flink to the public, I'd like to ask everyone
 to test the release candidate. Recently, Flink has received a lot of
 attention. The expectations are quite high. Only through thorough testing
 will we be able to satisfy all the Flink users out there.

 Below is a list from the Wiki that we use to ensure the legal and
 functional aspects of a release [1]. What I would like you to do is pick at
 least one of the tasks, put your name as assignee in the link below, and
 report back once you verified it. That way, I hope we can quickly and
 thoroughly test the release candidate.

 https://docs.google.com/document/d/1BhyMPTpAUYA8dG8-vJ3gSAmBUAa0PBSRkxIBPsZxkLs/edit

 Best,
 Max

 Git branch: release-0.9-rc1
 Release binaries: http://people.apache.org/~mxm/flink-0.9.0-rc1/
 Maven artifacts:
 https://repository.apache.org/content/repositories/orgapacheflink-1037/
 PGP public key for verifying the signatures:
 http://pgp.mit.edu/pks/lookup?op=vindex&search=0xDE976D18C2909CBF


 Legal
 

 L.1 Check if checksums and GPG files match the corresponding release files

 L.2 Verify that the source archives do NOT contain any binaries

 L.3 Check if the source release is building properly with Maven (including
 license header check (default) and checkstyle). Also the tests should be
 executed (mvn clean verify)

 L.4 Verify that the LICENSE and NOTICE file is correct for the binary and
 source release.

 L.5 All dependencies must be checked for their license and the license must
 be ASL 2.0 compatible (http://www.apache.org/legal/resolved.html#category-x)
 * The LICENSE and NOTICE files in the root directory refer to dependencies
 in the source release, i.e., files in the git repository (such as fonts,
 css, JavaScript, images)
 * The LICENSE and NOTICE files in flink-dist/src/main/flink-bin refer to
 the binary distribution and mention all of Flink's Maven dependencies as
 well

 L.6 Check that all POM files point to the same version (mostly relevant to
 examine quickstart artifact files)

 L.7 Read the README.md file
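Check L.1 above boils down to recomputing each artifact's digest and comparing it to the published checksum file. A self-contained sketch of that comparison (the artifact bytes and the choice of MD5 are stand-ins; use whatever checksum files the release actually ships):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch for check L.1: recompute a digest and compare it with the
// published value. Algorithm and input are illustrative stand-ins.
public class ChecksumCheck {
    static String hexDigest(byte[] data, String algorithm) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance(algorithm).digest(data);
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the release tarball's bytes and its published .md5 file.
        byte[] artifact = "demo artifact bytes".getBytes(StandardCharsets.UTF_8);
        String published = hexDigest(artifact, "MD5");
        // The actual check: recompute the digest and compare.
        System.out.println(published.equals(hexDigest(artifact, "MD5"))
                ? "checksum OK" : "MISMATCH");
    }
}
```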


 Functional
 

 F.1 Run the start-local.sh/start-local-streaming.sh,
 start-cluster.sh/start-cluster-streaming.sh, start-webclient.sh scripts and
 verify that the processes come up

 F.2 Examine the *.out files (should be empty) and the log files (should
 contain no exceptions)
 * Test for Linux, OS X, Windows (for Windows as far as possible, not all
 scripts exist)
 * Shutdown and verify there are no exceptions in the log output (after
 shutdown)
 * Check all start+submission scripts for paths with and without spaces
 (./bin/* scripts are quite fragile for paths with spaces)

 F.3 local mode (start-local.sh, see criteria below)
 F.4 cluster mode (start-cluster.sh, see criteria below)
 F.5 multi-node cluster (can simulate locally by starting two taskmanagers,
 see criteria below)

 Criteria for F.3 F.4 F.5
 
 * Verify that the examples are running from both ./bin/flink and from the
 web-based job submission tool
 * flink-conf.yaml should define more than one task slot
 * Results of job are produced and correct
 ** Check also that the examples are running with the built-in data and
 external sources.
 * Examine the log output - no error messages should be encountered
 ** Web interface shows progress and finished job in history


 F.6 Test on a cluster with HDFS.
 * Check that a good amount of input splits is read locally (JobManager log
 reveals local assignments)

 F.7 Test against a Kafka installation

 F.8 Test the ./bin/flink command line client
 * Test info option, paste the JSON into the plan visualizer HTML file,
 check that plan is rendered
 * Test the parallelism flag (-p) to override the configured 

Re: Testing Apache Flink 0.9.0-rc1

2015-06-09 Thread Maximilian Michels
+1 makes sense.

On Tue, Jun 9, 2015 at 10:48 AM, Aljoscha Krettek aljos...@apache.org
wrote:

 I would suggest we use this format to notify others that we did a task:

 Assignees:
  - Aljoscha: done
  - Ufuk: found bug in such an such...
  - Chiwan Park: done, ...

 The simple status doesn't work with multiple people on one task.

 On Tue, Jun 9, 2015 at 9:40 AM, Ufuk Celebi u...@apache.org wrote:
  Hey all,
 
  1. it would be nice if we find more people to also do testing of the
 streaming API. I think it's especially good to have people on it, which did
 not use it before.
 
  2. Just to make sure: the assignee field of each task is a list, i.e.
 we can and should have more people testing per task. ;-)
 
  – Ufuk
 
  On 08 Jun 2015, at 19:00, Chiwan Park chiwanp...@icloud.com wrote:
 
  Hi. I’m very excited about preparing a new major release. :)
  I just picked two tests. I will report status as soon as possible.
 
  Regards,
  Chiwan Park
 
  On Jun 9, 2015, at 1:52 AM, Maximilian Michels m...@apache.org wrote:
 
  Hi everyone!
 
  As previously discussed, the Flink developer community is very eager
 to get
  out a new major release. Apache Flink 0.9.0 will contain lots of new
  features and many bugfixes. This time, I'll try to coordinate the
 release
  process. Feel free to correct me if I'm doing something wrong because I
  don't know any better :)
 
  To release a great version of Flink to the public, I'd like to ask
 everyone
  to test the release candidate. Recently, Flink has received a lot of
  attention. The expectations are quite high. Only through thorough
 testing
  will we be able to satisfy all the Flink users out there.
 
  Below is a list from the Wiki that we use to ensure the legal and
  functional aspects of a release [1]. What I would like you to do is
 pick at
  least one of the tasks, put your name as assignee in the link below,
 and
  report back once you verified it. That way, I hope we can quickly and
  thoroughly test the release candidate.
 
 
 https://docs.google.com/document/d/1BhyMPTpAUYA8dG8-vJ3gSAmBUAa0PBSRkxIBPsZxkLs/edit
 
  Best,
  Max
 
  Git branch: release-0.9-rc1
  Release binaries: http://people.apache.org/~mxm/flink-0.9.0-rc1/
  Maven artifacts:
 
 https://repository.apache.org/content/repositories/orgapacheflink-1037/
  PGP public key for verifying the signatures:
   http://pgp.mit.edu/pks/lookup?op=vindex&search=0xDE976D18C2909CBF
 
 
  Legal
  
 
  L.1 Check if checksums and GPG files match the corresponding release
 files
 
   L.2 Verify that the source archives do NOT contain any binaries
 
  L.3 Check if the source release is building properly with Maven
 (including
  license header check (default) and checkstyle). Also the tests should
 be
  executed (mvn clean verify)
 
  L.4 Verify that the LICENSE and NOTICE file is correct for the binary
 and
  source release.
 
  L.5 All dependencies must be checked for their license and the license
 must
  be ASL 2.0 compatible (
 http://www.apache.org/legal/resolved.html#category-x)
  * The LICENSE and NOTICE files in the root directory refer to
 dependencies
  in the source release, i.e., files in the git repository (such as
 fonts,
  css, JavaScript, images)
  * The LICENSE and NOTICE files in flink-dist/src/main/flink-bin refer
 to
  the binary distribution and mention all of Flink's Maven dependencies
 as
  well
 
  L.6 Check that all POM files point to the same version (mostly
 relevant to
  examine quickstart artifact files)
 
  L.7 Read the README.md file
 
 
  Functional
  
 
  F.1 Run the start-local.sh/start-local-streaming.sh,
  start-cluster.sh/start-cluster-streaming.sh, start-webclient.sh
 scripts and
  verify that the processes come up
 
  F.2 Examine the *.out files (should be empty) and the log files (should
  contain no exceptions)
  * Test for Linux, OS X, Windows (for Windows as far as possible, not
 all
  scripts exist)
  * Shutdown and verify there are no exceptions in the log output (after
  shutdown)
  * Check all start+submission scripts for paths with and without spaces
  (./bin/* scripts are quite fragile for paths with spaces)
 
  F.3 local mode (start-local.sh, see criteria below)
  F.4 cluster mode (start-cluster.sh, see criteria below)
  F.5 multi-node cluster (can simulate locally by starting two
 taskmanagers,
  see criteria below)
 
  Criteria for F.3 F.4 F.5
  
  * Verify that the examples are running from both ./bin/flink and from
 the
  web-based job submission tool
  * flink-conf.yml should define more than one task slot
  * Results of job are produced and correct
   ** Check also that the examples are running with the built-in data and
  external sources.
  * Examine the log output - no error messages should be encountered
  ** Web interface shows progress and finished job in history
 
 
  F.6 Test on a cluster with HDFS.
  * Check that a good amount of input splits is read locally (JobManager
 log
  reveals local assignments)
 
  F.7 Test 

Re: Testing Apache Flink 0.9.0-rc1

2015-06-09 Thread Aljoscha Krettek
I also saw the same error on my third mvn clean verify run. Before it
always failed in the YARN tests.

On Tue, Jun 9, 2015 at 12:23 PM, Till Rohrmann trohrm...@apache.org wrote:

 I also encountered a failing TaskManagerFailsWithSlotSharingITCase using
 Java8. I could, however, not reproduce the error a second time. The stack
 trace is:

 The JobManager should handle hard failing task manager with slot

 sharing(org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase)
  Time elapsed: 1,400.148 sec   ERROR!
 java.util.concurrent.TimeoutException: Futures timed out after [20
 milliseconds]
 at
 scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
 at
 scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
 at scala.concurrent.Await$anonfun$ready$1.apply(package.scala:86)
 at scala.concurrent.Await$anonfun$ready$1.apply(package.scala:86)
 at
 scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
 at scala.concurrent.Await$.ready(package.scala:86)
 at
 org.apache.flink.runtime.minicluster.FlinkMiniCluster.shutdown(FlinkMiniCluster.scala:162)
 at
 org.apache.flink.runtime.minicluster.FlinkMiniCluster.stop(FlinkMiniCluster.scala:149)
 at
 org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$anonfun$1$anonfun$apply$mcV$sp$3.apply$mcV$sp(TaskManagerFailsWithSlotSharingITCase.scala:140)
 at
 org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$anonfun$1$anonfun$apply$mcV$sp$3.apply(TaskManagerFailsWithSlotSharingITCase.scala:95)
 at
 org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$anonfun$1$anonfun$apply$mcV$sp$3.apply(TaskManagerFailsWithSlotSharingITCase.scala:95)
 at
 org.scalatest.Transformer$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
 at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
 at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
 at org.scalatest.Transformer.apply(Transformer.scala:22)
 at org.scalatest.Transformer.apply(Transformer.scala:20)
 at org.scalatest.WordSpecLike$anon$1.apply(WordSpecLike.scala:953)
 at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
 at
 org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase.withFixture(TaskManagerFailsWithSlotSharingITCase.scala:36)

 Results :

 Tests in error:

 TaskManagerFailsWithSlotSharingITCase.run:36-org$scalatest$BeforeAndAfterAll$super$run:36-org$scalatest$WordSpecLike$super$run:36-runTests:36-runTest:36-withFixture:36
 » Timeout

 On Tue, Jun 9, 2015 at 11:26 AM Maximilian Michels m...@apache.org
  wrote:

 The name of the Git branch was not correct. Thank you, Aljoscha, for
  noticing. I've changed it from release-0.9-rc1 to release-0.9.0-rc1.
   This has no effect on the validity of the release candidate.
 



Re: Testing Apache Flink 0.9.0-rc1

2015-06-09 Thread Sachin Goel
On my local machine, several flink runtime tests are failing on mvn clean
verify. Here is the log output: http://pastebin.com/raw.php?i=VWbx2ppf

--
​ Sachin​

On Tue, Jun 9, 2015 at 4:04 PM, Aljoscha Krettek aljos...@apache.org
wrote:

 I also saw the same error on my third mvn clean verify run. Before it
 always failed in the YARN tests.

 On Tue, Jun 9, 2015 at 12:23 PM, Till Rohrmann trohrm...@apache.org
 wrote:

  I also encountered a failing TaskManagerFailsWithSlotSharingITCase using
  Java8. I could, however, not reproduce the error a second time. The stack
  trace is:
 
  The JobManager should handle hard failing task manager with slot
 
 
 sharing(org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase)
   Time elapsed: 1,400.148 sec   ERROR!
  java.util.concurrent.TimeoutException: Futures timed out after [20
  milliseconds]
  at
  scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
  at
  scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
  at scala.concurrent.Await$anonfun$ready$1.apply(package.scala:86)
  at scala.concurrent.Await$anonfun$ready$1.apply(package.scala:86)
  at
 
 scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
  at scala.concurrent.Await$.ready(package.scala:86)
  at
 
 org.apache.flink.runtime.minicluster.FlinkMiniCluster.shutdown(FlinkMiniCluster.scala:162)
  at
 
 org.apache.flink.runtime.minicluster.FlinkMiniCluster.stop(FlinkMiniCluster.scala:149)
  at
 
 org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$anonfun$1$anonfun$apply$mcV$sp$3.apply$mcV$sp(TaskManagerFailsWithSlotSharingITCase.scala:140)
  at
 
 org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$anonfun$1$anonfun$apply$mcV$sp$3.apply(TaskManagerFailsWithSlotSharingITCase.scala:95)
  at
 
 org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$anonfun$1$anonfun$apply$mcV$sp$3.apply(TaskManagerFailsWithSlotSharingITCase.scala:95)
  at
 
 org.scalatest.Transformer$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  at org.scalatest.Transformer.apply(Transformer.scala:20)
  at org.scalatest.WordSpecLike$anon$1.apply(WordSpecLike.scala:953)
  at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
  at
 
 org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase.withFixture(TaskManagerFailsWithSlotSharingITCase.scala:36)
 
  Results :
 
  Tests in error:
 
 
 TaskManagerFailsWithSlotSharingITCase.run:36-org$scalatest$BeforeAndAfterAll$super$run:36-org$scalatest$WordSpecLike$super$run:36-runTests:36-runTest:36-withFixture:36
  » Timeout
 
  On Tue, Jun 9, 2015 at 11:26 AM Maximilian Michels m...@apache.org wrote:
 
  The name of the Git branch was not correct. Thank you, Aljoscha, for
   noticing. I've changed it from release-0.9-rc1 to
 release-0.9.0-rc1.
    This has no effect on the validity of the release candidate.
  
  ​
 



Re: Testing Apache Flink 0.9.0-rc1

2015-06-09 Thread Aljoscha Krettek
I did five mvn clean verify runs by now. All of them failed. One
with the TaskmanagerFailsWithSlotSharingITCase and the other ones with
YARNSessionFIFOITCase

On Tue, Jun 9, 2015 at 12:34 PM, Aljoscha Krettek aljos...@apache.org wrote:
 I also saw the same error on my third mvn clean verify run. Before it
 always failed in the YARN tests.

 On Tue, Jun 9, 2015 at 12:23 PM, Till Rohrmann trohrm...@apache.org wrote:

 I also encountered a failing TaskManagerFailsWithSlotSharingITCase using
 Java8. I could, however, not reproduce the error a second time. The stack
 trace is:

 The JobManager should handle hard failing task manager with slot

 sharing(org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase)
 Time elapsed: 1,400.148 sec  <<< ERROR!
 java.util.concurrent.TimeoutException: Futures timed out after [20
 milliseconds]
 at
 scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
 at
 scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153)
 at scala.concurrent.Await$anonfun$ready$1.apply(package.scala:86)
 at scala.concurrent.Await$anonfun$ready$1.apply(package.scala:86)
 at
 scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
 at scala.concurrent.Await$.ready(package.scala:86)
 at
 org.apache.flink.runtime.minicluster.FlinkMiniCluster.shutdown(FlinkMiniCluster.scala:162)
 at
 org.apache.flink.runtime.minicluster.FlinkMiniCluster.stop(FlinkMiniCluster.scala:149)
 at
 org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$anonfun$1$anonfun$apply$mcV$sp$3.apply$mcV$sp(TaskManagerFailsWithSlotSharingITCase.scala:140)
 at
 org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$anonfun$1$anonfun$apply$mcV$sp$3.apply(TaskManagerFailsWithSlotSharingITCase.scala:95)
 at
 org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$anonfun$1$anonfun$apply$mcV$sp$3.apply(TaskManagerFailsWithSlotSharingITCase.scala:95)
 at
 org.scalatest.Transformer$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
 at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
 at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
 at org.scalatest.Transformer.apply(Transformer.scala:22)
 at org.scalatest.Transformer.apply(Transformer.scala:20)
 at org.scalatest.WordSpecLike$anon$1.apply(WordSpecLike.scala:953)
 at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
 at
 org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase.withFixture(TaskManagerFailsWithSlotSharingITCase.scala:36)

 Results :

 Tests in error:

 TaskManagerFailsWithSlotSharingITCase.run:36-org$scalatest$BeforeAndAfterAll$super$run:36-org$scalatest$WordSpecLike$super$run:36-runTests:36-runTest:36-withFixture:36
 » Timeout

 On Tue, Jun 9, 2015 at 11:26 AM Maximilian Michels m...@apache.org wrote:

 The name of the Git branch was not correct. Thank you, Aljoscha, for
  noticing. I've changed it from release-0.9-rc1 to release-0.9.0-rc1.
  This has no effect on the validity of the release candidate.
 




Re: Testing Apache Flink 0.9.0-rc1

2015-06-09 Thread Ufuk Celebi

On 09 Jun 2015, at 13:58, Sachin Goel sachingoel0...@gmail.com wrote:

 On my local machine, several flink runtime tests are failing on mvn clean
 verify. Here is the log output: http://pastebin.com/raw.php?i=VWbx2ppf

Thanks for reporting this. Have you tried it multiple times? Is it failing 
reproducibly with the same tests? What's your setup?

– Ufuk

Re: Testing Apache Flink 0.9.0-rc1

2015-06-09 Thread Aljoscha Krettek
I discovered something that might be a feature rather than a bug. When you
submit an example using the web client without giving parameters, the
program fails with this:

org.apache.flink.client.program.ProgramInvocationException: The main method
caused an error.

at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)

at
org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)

at org.apache.flink.client.program.Client.run(Client.java:315)

at
org.apache.flink.client.web.JobSubmissionServlet.doGet(JobSubmissionServlet.java:302)

at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)

at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)

at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532)

at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)

at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)

at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965)

at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388)

at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:187)

at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:901)

at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)

at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47)

at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)

at org.eclipse.jetty.server.Server.handle(Server.java:352)

at
org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)

at
org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1048)

at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:549)

at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211)

at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:425)

at
org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489)

at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436)

at java.lang.Thread.run(Thread.java:745)

Caused by: java.lang.NullPointerException

at
org.apache.flink.api.common.JobExecutionResult.getAccumulatorResult(JobExecutionResult.java:78)

at org.apache.flink.api.java.DataSet.collect(DataSet.java:409)

at org.apache.flink.api.java.DataSet.print(DataSet.java:1345)

at
org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:80)

at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)

at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

at java.lang.reflect.Method.invoke(Method.java:497)

at
org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)

... 24 more


This also only occurs when you uncheck the "suspend execution while showing
plan" checkbox.

I think this arises because the new print() uses collect(), which tries to
get the job execution result. I guess the result is null because the job is
submitted asynchronously when the checkbox is unchecked.
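The suspected chain can be sketched in isolation. The class and method names below only mirror `JobExecutionResult.getAccumulatorResult`; the guard shown is an illustrative assumption, not the actual Flink patch:

```java
import java.util.Map;

// Sketch: a detached/asynchronous submission yields no JobExecutionResult,
// so print()'s accumulator lookup dereferences null. Failing with a
// descriptive error instead of an NPE is one possible fix (assumed here,
// not the committed change).
public class DetachedResultSketch {
    static Object getAccumulatorResult(Map<String, Object> accumulators, String name) {
        if (accumulators == null) {
            throw new IllegalStateException(
                "No execution result available: the job was submitted asynchronously");
        }
        return accumulators.get(name);
    }

    public static void main(String[] args) {
        boolean threw = false;
        try {
            // Simulates the web client path: no result map was ever produced.
            getAccumulatorResult(null, "datasink-collect");
        } catch (IllegalStateException e) {
            threw = true;
        }
        if (!threw) throw new AssertionError("expected IllegalStateException");
        System.out.println("ok");
    }
}
```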


Other than that, the new print() is pretty sweet when you run the built-in
examples from the CLI. You get all the state changes and also the result,
even when running in cluster mode on several task managers. :D


On Tue, Jun 9, 2015 at 3:41 PM, Aljoscha Krettek aljos...@apache.org
wrote:

 I discovered another problem:
 https://issues.apache.org/jira/browse/FLINK-2191 The closure cleaner
 cannot be disabled in part of the Streaming Java API and all of the
 Streaming Scala API. I think this is a release blocker (in addition
  to the other bugs found so far.)

 On Tue, Jun 9, 2015 at 2:35 PM, Aljoscha Krettek aljos...@apache.org
 wrote:
  I found the bug in the failing YARNSessionFIFOITCase: It was comparing
  the hostname to a hostname in some yarn config. In one case it was
  capitalised, in the other case it wasn't.
 
  Pushing fix to master and release-0.9 branch.
 
  On Tue, Jun 9, 2015 at 2:18 PM, Sachin Goel sachingoel0...@gmail.com
 wrote:
  A re-run reproduced the 11 failures again.
  TaskManagerTest.testSubmitAndExecuteTask was failing with a time-out but
  managed to succeed in a re-run. Here is the log output again:
  http://pastebin.com/raw.php?i=N4cm1J18
 
  Setup: JDK 1.8.0_40 on Windows 8.1
  System memory: 8GB, quad-core with maximum 8 threads.
 
  Regards
  Sachin Goel
 
  On Tue, Jun 9, 2015 at 5:34 PM, Ufuk Celebi u...@apache.org wrote:
 
 
  On 09 Jun 2015, at 13:58, Sachin Goel sachingoel0...@gmail.com
 wrote:
 
   On my local machine, several flink runtime tests are failing on mvn
  clean
   verify. Here is the log output:
 http://pastebin.com/raw.php?i=VWbx2ppf
 
  Thanks for reporting this. Have you tried it multiple times? Is it
 failing
  reproducibly 

Re: Testing Apache Flink 0.9.0-rc1

2015-06-09 Thread Chiwan Park
I attached jps and jstack log about hanging 
TaskManagerFailsWithSlotSharingITCase to JIRA FLINK-2183.

Regards,
Chiwan Park

 On Jun 10, 2015, at 12:28 AM, Aljoscha Krettek aljos...@apache.org wrote:
 
 I discovered something that might be a feature, rather than a bug. When you
 submit an example using the web client without giving parameters the
 program fails with this:
 
 org.apache.flink.client.program.ProgramInvocationException: The main method
 caused an error.
 
 at
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
 
 at
 org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
 
 at org.apache.flink.client.program.Client.run(Client.java:315)
 
 at
 org.apache.flink.client.web.JobSubmissionServlet.doGet(JobSubmissionServlet.java:302)
 
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
 
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
 
 at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532)
 
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
 
 at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)
 
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965)
 
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388)
 
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:187)
 
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:901)
 
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
 
 at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47)
 
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
 
 at org.eclipse.jetty.server.Server.handle(Server.java:352)
 
 at
 org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)
 
 at
 org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1048)
 
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:549)
 
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211)
 
 at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:425)
 
 at
 org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489)
 
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436)
 
 at java.lang.Thread.run(Thread.java:745)
 
 Caused by: java.lang.NullPointerException
 
 at
 org.apache.flink.api.common.JobExecutionResult.getAccumulatorResult(JobExecutionResult.java:78)
 
 at org.apache.flink.api.java.DataSet.collect(DataSet.java:409)
 
 at org.apache.flink.api.java.DataSet.print(DataSet.java:1345)
 
 at
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:80)
 
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
 at java.lang.reflect.Method.invoke(Method.java:497)
 
 at
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
 
 ... 24 more
 
 
 This also only occurs when you uncheck the suspend execution while showing
 plan.
 
 I think this arises because the new print() uses collect() which tries to
 get the job execution result. I guess the result is Null since the job is
 submitted asynchronously when the checkbox is unchecked.
 
 
 Other than that, the new print() is pretty sweet when you run the builtin
 examples from the CLI. You get all the state changes and also the result,
 even when running in cluster mode on several task managers. :D
 
 
 On Tue, Jun 9, 2015 at 3:41 PM, Aljoscha Krettek aljos...@apache.org
 wrote:
 
 I discovered another problem:
 https://issues.apache.org/jira/browse/FLINK-2191 The closure cleaner
 cannot be disabled in part of the Streaming Java API and all of the
 Streaming Scala API. I think this is a release blocker (in addition
  to the other bugs found so far.)
 
 On Tue, Jun 9, 2015 at 2:35 PM, Aljoscha Krettek aljos...@apache.org
 wrote:
 I found the bug in the failing YARNSessionFIFOITCase: It was comparing
 the hostname to a hostname in some yarn config. In one case it was
 capitalised, in the other case it wasn't.
 
 Pushing fix to master and release-0.9 branch.
 
 On Tue, Jun 9, 2015 at 2:18 PM, Sachin Goel sachingoel0...@gmail.com
 wrote:
  A re-run reproduced the 11 failures again.
 TaskManagerTest.testSubmitAndExecuteTask was failing with a time-out but
 managed to succeed in a re-run. Here is the log output again:
 http://pastebin.com/raw.php?i=N4cm1J18
 
  Setup: JDK 1.8.0_40 on Windows 8.1
 System memory: 8GB, quad-core with maximum 8 threads.
 
 Regards
 Sachin Goel
 
 On Tue, Jun 9, 2015 at 5:34 PM, Ufuk Celebi u...@apache.org wrote:
 
 
 On 09 Jun 2015, at 

Re: Testing Apache Flink 0.9.0-rc1

2015-06-09 Thread Ufuk Celebi
While looking into FLINK-2188 (HBase input) I've discovered that Hadoop input 
formats implementing Configurable (like mapreduce.TableInputFormat) don't have 
the Hadoop configuration set via setConf(Configuration).

I have a small fix for this, which I have to clean up. First, I wanted to check
what you think about this issue wrt the release. Personally, I think this is a
release blocker, because it essentially means that no Hadoop input format that
relies on the Configuration instance being set this way will work (this is to
some extent a bug of the respective input formats) – most notably the HBase
TableInputFormat.
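The fix idea can be sketched with locally defined stand-ins for the Hadoop types (the real `Configurable`/`Configuration` live in `org.apache.hadoop.conf`; this is an assumption-level illustration, not the committed patch):

```java
// Sketch: after instantiating a Hadoop input format, propagate the Hadoop
// Configuration to it if it implements Configurable. The interfaces are
// redefined locally so the example is self-contained.
public class ConfigurablePropagation {
    interface Configuration {}
    interface Configurable { void setConf(Configuration conf); }

    // Stand-in for something like mapreduce.TableInputFormat, which only
    // receives its Configuration through setConf().
    static class TableInputFormatStub implements Configurable {
        Configuration conf;
        public void setConf(Configuration conf) { this.conf = conf; }
    }

    static void configure(Object format, Configuration conf) {
        if (format instanceof Configurable) {
            ((Configurable) format).setConf(conf);
        }
    }

    public static void main(String[] args) {
        TableInputFormatStub fmt = new TableInputFormatStub();
        configure(fmt, new Configuration() {});
        if (fmt.conf == null) throw new AssertionError("setConf was not called");
        System.out.println("ok");
    }
}
```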

– Ufuk

On 09 Jun 2015, at 18:07, Chiwan Park chiwanp...@icloud.com wrote:

 I attached jps and jstack log about hanging 
 TaskManagerFailsWithSlotSharingITCase to JIRA FLINK-2183.
 
 Regards,
 Chiwan Park
 
 On Jun 10, 2015, at 12:28 AM, Aljoscha Krettek aljos...@apache.org wrote:
 
 I discovered something that might be a feature, rather than a bug. When you
 submit an example using the web client without giving parameters the
 program fails with this:
 
 org.apache.flink.client.program.ProgramInvocationException: The main method
 caused an error.
 
 at
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452)
 
 at
 org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353)
 
 at org.apache.flink.client.program.Client.run(Client.java:315)
 
 at
 org.apache.flink.client.web.JobSubmissionServlet.doGet(JobSubmissionServlet.java:302)
 
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
 
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:770)
 
 at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532)
 
 at
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
 
 at
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227)
 
 at
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965)
 
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388)
 
 at
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:187)
 
 at
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:901)
 
 at
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
 
 at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47)
 
 at
 org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113)
 
 at org.eclipse.jetty.server.Server.handle(Server.java:352)
 
 at
 org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596)
 
 at
 org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1048)
 
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:549)
 
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211)
 
 at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:425)
 
 at
 org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489)
 
 at
 org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436)
 
 at java.lang.Thread.run(Thread.java:745)
 
 Caused by: java.lang.NullPointerException
 
 at
 org.apache.flink.api.common.JobExecutionResult.getAccumulatorResult(JobExecutionResult.java:78)
 
 at org.apache.flink.api.java.DataSet.collect(DataSet.java:409)
 
 at org.apache.flink.api.java.DataSet.print(DataSet.java:1345)
 
 at
 org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:80)
 
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 
 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 
 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 
 at java.lang.reflect.Method.invoke(Method.java:497)
 
 at
 org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437)
 
 ... 24 more
 
 
 This also only occurs when you uncheck the suspend execution while showing
 plan.
 
 I think this arises because the new print() uses collect() which tries to
 get the job execution result. I guess the result is Null since the job is
 submitted asynchronously when the checkbox is unchecked.
 
 
 Other than that, the new print() is pretty sweet when you run the builtin
 examples from the CLI. You get all the state changes and also the result,
 even when running in cluster mode on several task managers. :D
 
 
 On Tue, Jun 9, 2015 at 3:41 PM, Aljoscha Krettek aljos...@apache.org
 wrote:
 
 I discovered another problem:
 https://issues.apache.org/jira/browse/FLINK-2191 The closure cleaner
 cannot be disabled in part of the Streaming Java API and all of the
 Streaming Scala API. I think this is a release blocker (in addition
  to the other bugs found so far.)
 
 On Tue, Jun 9, 2015 at 2:35 PM, Aljoscha Krettek aljos...@apache.org
 wrote:
 I found the bug in the failing 

Re: Testing Apache Flink 0.9.0-rc1

2015-06-09 Thread Sachin Goel
A re-run reproduced the 11 failures again.
TaskManagerTest.testSubmitAndExecuteTask was failing with a time-out but
managed to succeed in a re-run. Here is the log output again:
http://pastebin.com/raw.php?i=N4cm1J18

Setup: JDK 1.8.0_40 on Windows 8.1
System memory: 8GB, quad-core with maximum 8 threads.

Regards
Sachin Goel

On Tue, Jun 9, 2015 at 5:34 PM, Ufuk Celebi u...@apache.org wrote:


 On 09 Jun 2015, at 13:58, Sachin Goel sachingoel0...@gmail.com wrote:

  On my local machine, several flink runtime tests are failing on mvn
 clean
  verify. Here is the log output: http://pastebin.com/raw.php?i=VWbx2ppf

 Thanks for reporting this. Have you tried it multiple times? Is it failing
 reproducibly with the same tests? What's your setup?

 – Ufuk


Re: Testing Apache Flink 0.9.0-rc1

2015-06-09 Thread Aljoscha Krettek
I found the bug in the failing YARNSessionFIFOITCase: it was comparing
the hostname to a hostname in the YARN config. In one case it was
capitalised, in the other it wasn't.

Pushing fix to master and release-0.9 branch.
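A case-insensitive comparison along these lines would avoid the mismatch (sketch only; the class, method, and hostnames below are made up for illustration):

```java
// Sketch: DNS hostnames are case-insensitive, but a YARN config value may
// come back capitalised differently from the locally resolved hostname, so
// the comparison must ignore case.
public class HostnameCheck {
    static boolean sameHost(String local, String configured) {
        return local != null && local.equalsIgnoreCase(configured);
    }

    public static void main(String[] args) {
        if (!sameHost("testing-worker.local", "TESTING-WORKER.local"))
            throw new AssertionError("case difference should not matter");
        if (sameHost("host-a", "host-b"))
            throw new AssertionError("different hosts must not match");
        System.out.println("ok");
    }
}
```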

On Tue, Jun 9, 2015 at 2:18 PM, Sachin Goel sachingoel0...@gmail.com wrote:
 A re-run reproduced the 11 failures again.
 TaskManagerTest.testSubmitAndExecuteTask was failing with a time-out but
 managed to succeed in a re-run. Here is the log output again:
 http://pastebin.com/raw.php?i=N4cm1J18

 Setup: JDK 1.8.0_40 on Windows 8.1
 System memory: 8GB, quad-core with maximum 8 threads.

 Regards
 Sachin Goel

 On Tue, Jun 9, 2015 at 5:34 PM, Ufuk Celebi u...@apache.org wrote:


 On 09 Jun 2015, at 13:58, Sachin Goel sachingoel0...@gmail.com wrote:

  On my local machine, several flink runtime tests are failing on mvn
 clean
  verify. Here is the log output: http://pastebin.com/raw.php?i=VWbx2ppf

 Thanks for reporting this. Have you tried it multiple times? Is it failing
 reproducibly with the same tests? What's your setup?

 – Ufuk


Re: Testing Apache Flink 0.9.0-rc1

2015-06-09 Thread Aljoscha Krettek
I discovered another problem:
https://issues.apache.org/jira/browse/FLINK-2191 The closure cleaner
cannot be disabled in part of the Streaming Java API and all of the
Streaming Scala API. I think this is a release blocker (in addition
to the other bugs found so far.)

On Tue, Jun 9, 2015 at 2:35 PM, Aljoscha Krettek aljos...@apache.org wrote:
 I found the bug in the failing YARNSessionFIFOITCase: It was comparing
 the hostname to a hostname in some yarn config. In one case it was
 capitalised, in the other case it wasn't.

 Pushing fix to master and release-0.9 branch.

 On Tue, Jun 9, 2015 at 2:18 PM, Sachin Goel sachingoel0...@gmail.com wrote:
 A re-run reproduced the 11 failures again.
 TaskManagerTest.testSubmitAndExecuteTask was failing with a time-out but
 managed to succeed in a re-run. Here is the log output again:
 http://pastebin.com/raw.php?i=N4cm1J18

 Setup: JDK 1.8.0_40 on Windows 8.1
 System memory: 8GB, quad-core with maximum 8 threads.

 Regards
 Sachin Goel

 On Tue, Jun 9, 2015 at 5:34 PM, Ufuk Celebi u...@apache.org wrote:


 On 09 Jun 2015, at 13:58, Sachin Goel sachingoel0...@gmail.com wrote:

  On my local machine, several flink runtime tests are failing on mvn
 clean
  verify. Here is the log output: http://pastebin.com/raw.php?i=VWbx2ppf

 Thanks for reporting this. Have you tried it multiple times? Is it failing
 reproducibly with the same tests? What's your setup?

 – Ufuk


Re: Testing Apache Flink 0.9.0-rc1

2015-06-08 Thread Chiwan Park
Hi. I have a problem running `mvn clean verify` command.
TaskManagerFailsWithSlotSharingITCase hangs in Oracle JDK 7 (1.7.0_80). But in 
Oracle JDK 8 the test case doesn’t hang.

I’ve investigated this problem but I could not find the bug.

Regards,
Chiwan Park

 On Jun 9, 2015, at 2:11 AM, Márton Balassi balassi.mar...@gmail.com wrote:
 
 Added F7 Running against Kafka cluster for me in the doc. Doing it
 tomorrow.
 
 On Mon, Jun 8, 2015 at 7:00 PM, Chiwan Park chiwanp...@icloud.com wrote:
 
 Hi. I’m very excited about preparing a new major release. :)
 I just picked two tests. I will report status as soon as possible.
 
 Regards,
 Chiwan Park
 
 On Jun 9, 2015, at 1:52 AM, Maximilian Michels m...@apache.org wrote:
 
 Hi everyone!
 
 As previously discussed, the Flink developer community is very eager to
 get
 out a new major release. Apache Flink 0.9.0 will contain lots of new
 features and many bugfixes. This time, I'll try to coordinate the release
 process. Feel free to correct me if I'm doing something wrong because I
  don't know any better :)
 
 To release a great version of Flink to the public, I'd like to ask
 everyone
 to test the release candidate. Recently, Flink has received a lot of
 attention. The expectations are quite high. Only through thorough testing
  will we be able to satisfy all the Flink users out there.
 
 Below is a list from the Wiki that we use to ensure the legal and
 functional aspects of a release [1]. What I would like you to do is pick
 at
 least one of the tasks, put your name as assignee in the link below, and
 report back once you verified it. That way, I hope we can quickly and
 thoroughly test the release candidate.
 
 
 https://docs.google.com/document/d/1BhyMPTpAUYA8dG8-vJ3gSAmBUAa0PBSRkxIBPsZxkLs/edit
 
 Best,
 Max
 
 Git branch: release-0.9-rc1
 Release binaries: http://people.apache.org/~mxm/flink-0.9.0-rc1/
 Maven artifacts:
 https://repository.apache.org/content/repositories/orgapacheflink-1037/
 PGP public key for verifying the signatures:
  http://pgp.mit.edu/pks/lookup?op=vindex&search=0xDE976D18C2909CBF
 
 
 Legal
 
 
 L.1 Check if checksums and GPG files match the corresponding release
 files
 
  L.2 Verify that the source archives do NOT contain any binaries
 
 L.3 Check if the source release is building properly with Maven
 (including
 license header check (default) and checkstyle). Also the tests should be
 executed (mvn clean verify)
 
 L.4 Verify that the LICENSE and NOTICE file is correct for the binary and
 source release.
 
 L.5 All dependencies must be checked for their license and the license
 must
 be ASL 2.0 compatible (
 http://www.apache.org/legal/resolved.html#category-x)
 * The LICENSE and NOTICE files in the root directory refer to
 dependencies
 in the source release, i.e., files in the git repository (such as fonts,
 css, JavaScript, images)
 * The LICENSE and NOTICE files in flink-dist/src/main/flink-bin refer to
 the binary distribution and mention all of Flink's Maven dependencies as
 well
 
 L.6 Check that all POM files point to the same version (mostly relevant
 to
 examine quickstart artifact files)
 
 L.7 Read the README.md file
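For check L.1, the digest comparison can be sketched as follows (SHA-256 is used here only as an example algorithm; the actual release may publish checksum files using a different hash and naming scheme):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch: hash the downloaded release artifact's bytes and compare the hex
// digest against the published checksum file's contents.
public class ChecksumCheck {
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new RuntimeException(e); // SHA-256 is always available
        }
    }

    public static void main(String[] args) {
        // Well-known NIST test vector for SHA-256("abc").
        String expected =
            "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad";
        if (!sha256Hex("abc".getBytes(StandardCharsets.UTF_8)).equals(expected)) {
            throw new AssertionError("checksum mismatch");
        }
        System.out.println("ok");
    }
}
```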
 
 
 Functional
 
 
 F.1 Run the start-local.sh/start-local-streaming.sh,
 start-cluster.sh/start-cluster-streaming.sh, start-webclient.sh scripts
 and
 verify that the processes come up
 
 F.2 Examine the *.out files (should be empty) and the log files (should
 contain no exceptions)
 * Test for Linux, OS X, Windows (for Windows as far as possible, not all
 scripts exist)
 * Shutdown and verify there are no exceptions in the log output (after
 shutdown)
 * Check all start+submission scripts for paths with and without spaces
 (./bin/* scripts are quite fragile for paths with spaces)
 
 F.3 local mode (start-local.sh, see criteria below)
 F.4 cluster mode (start-cluster.sh, see criteria below)
 F.5 multi-node cluster (can simulate locally by starting two
 taskmanagers,
 see criteria below)
 
  Criteria for F.3, F.4, F.5
 
 * Verify that the examples are running from both ./bin/flink and from the
 web-based job submission tool
  * flink-conf.yaml should define more than one task slot
 * Results of job are produced and correct
  ** Check also that the examples are running with the built-in data and
 external sources.
 * Examine the log output - no error messages should be encountered
 ** Web interface shows progress and finished job in history
 
 
 F.6 Test on a cluster with HDFS.
 * Check that a good amount of input splits is read locally (JobManager
 log
 reveals local assignments)
 
 F.7 Test against a Kafka installation
 
 F.8 Test the ./bin/flink command line client
 * Test info option, paste the JSON into the plan visualizer HTML file,
 check that plan is rendered
 * Test the parallelism flag (-p) to override the configured default
 parallelism
 
 F.9 Verify the plan visualizer with different browsers/operating systems
 
 F.10 Verify that the quickstarts for scala and 

Re: Testing Apache Flink 0.9.0-rc1

2015-06-08 Thread Chiwan Park
Hi. I’m very excited about preparing a new major release. :)
I just picked two tests. I will report status as soon as possible.

Regards,
Chiwan Park

 On Jun 9, 2015, at 1:52 AM, Maximilian Michels m...@apache.org wrote:
 
 Hi everyone!
 
 As previously discussed, the Flink developer community is very eager to get
 out a new major release. Apache Flink 0.9.0 will contain lots of new
 features and many bugfixes. This time, I'll try to coordinate the release
 process. Feel free to correct me if I'm doing something wrong because I
  don't know any better :)
 
 To release a great version of Flink to the public, I'd like to ask everyone
 to test the release candidate. Recently, Flink has received a lot of
 attention. The expectations are quite high. Only through thorough testing
  will we be able to satisfy all the Flink users out there.
 
 Below is a list from the Wiki that we use to ensure the legal and
 functional aspects of a release [1]. What I would like you to do is pick at
 least one of the tasks, put your name as assignee in the link below, and
 report back once you verified it. That way, I hope we can quickly and
 thoroughly test the release candidate.
 
 https://docs.google.com/document/d/1BhyMPTpAUYA8dG8-vJ3gSAmBUAa0PBSRkxIBPsZxkLs/edit
 
 Best,
 Max
 
 Git branch: release-0.9-rc1
 Release binaries: http://people.apache.org/~mxm/flink-0.9.0-rc1/
 Maven artifacts:
 https://repository.apache.org/content/repositories/orgapacheflink-1037/
 PGP public key for verifying the signatures:
  http://pgp.mit.edu/pks/lookup?op=vindex&search=0xDE976D18C2909CBF
 
 
 Legal
 
 
 L.1 Check if checksums and GPG files match the corresponding release files
 
  L.2 Verify that the source archives do NOT contain any binaries
 
 L.3 Check if the source release is building properly with Maven (including
 license header check (default) and checkstyle). Also the tests should be
 executed (mvn clean verify)
 
 L.4 Verify that the LICENSE and NOTICE file is correct for the binary and
 source release.
 
 L.5 All dependencies must be checked for their license and the license must
 be ASL 2.0 compatible (http://www.apache.org/legal/resolved.html#category-x)
 * The LICENSE and NOTICE files in the root directory refer to dependencies
 in the source release, i.e., files in the git repository (such as fonts,
 css, JavaScript, images)
 * The LICENSE and NOTICE files in flink-dist/src/main/flink-bin refer to
 the binary distribution and mention all of Flink's Maven dependencies as
 well
 
 L.6 Check that all POM files point to the same version (mostly relevant to
 examine quickstart artifact files)
 
 L.7 Read the README.md file
 
 
 Functional
 
 
 F.1 Run the start-local.sh/start-local-streaming.sh,
 start-cluster.sh/start-cluster-streaming.sh, start-webclient.sh scripts and
 verify that the processes come up
 
 F.2 Examine the *.out files (should be empty) and the log files (should
 contain no exceptions)
 * Test for Linux, OS X, Windows (for Windows as far as possible, not all
 scripts exist)
 * Shutdown and verify there are no exceptions in the log output (after
 shutdown)
 * Check all start+submission scripts for paths with and without spaces
 (./bin/* scripts are quite fragile for paths with spaces)
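
The *.out and log checks from F.2 can be scripted along these lines; demo
files stand in for the log/ directory of a real Flink installation:

```shell
# Demo log directory: a clean log file and an empty .out file, as F.2 expects.
mkdir -p demo-log
printf 'INFO  Starting JobManager\nINFO  Job finished\n' > demo-log/jobmanager.log
: > demo-log/jobmanager.out

# Logs should contain no exceptions; prints "logs clean" when nothing matches.
grep -iE 'exception|error' demo-log/*.log || echo "logs clean"

# .out files should be empty; test -s is true only for non-empty files.
test -s demo-log/jobmanager.out && echo ".out not empty" || echo ".out empty"
```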
 
 F.3 local mode (start-local.sh, see criteria below)
 F.4 cluster mode (start-cluster.sh, see criteria below)
 F.5 multi-node cluster (can simulate locally by starting two taskmanagers,
 see criteria below)
 
Criteria for F.3, F.4, F.5
 
 * Verify that the examples are running from both ./bin/flink and from the
 web-based job submission tool
* flink-conf.yaml should define more than one task slot
 * Results of job are produced and correct
** Check also that the examples are running with the built-in data and
external sources.
 * Examine the log output - no error messages should be encountered
 ** Web interface shows progress and finished job in history
 
 
 F.6 Test on a cluster with HDFS.
* Check that a good number of input splits is read locally (the JobManager log
reveals local assignments)
 
 F.7 Test against a Kafka installation
 
 F.8 Test the ./bin/flink command line client
* Test the info option: paste the JSON into the plan visualizer HTML file and
check that the plan is rendered
 * Test the parallelism flag (-p) to override the configured default
 parallelism
 
 F.9 Verify the plan visualizer with different browsers/operating systems
 
F.10 Verify that the quickstarts for Scala and Java are working with the
staging repository for both IntelliJ and Eclipse.
 * In particular the dependencies of the quickstart project need to be set
 correctly and the QS project needs to build from the staging repository
 (replace the snapshot repo URL with the staging repo URL)
 * The dependency tree of the QuickStart project must not contain any
 dependencies we shade away upstream (guava, netty, ...)
 
 F.11 Run examples on a YARN cluster
 
F.12 Run all examples from the IDE (Eclipse & IntelliJ)
 
 F.13 Run an 

Re: Testing Apache Flink 0.9.0-rc1

2015-06-08 Thread Márton Balassi
Added F.7 (running against a Kafka cluster) for me in the doc. Doing it
tomorrow.

On Mon, Jun 8, 2015 at 7:00 PM, Chiwan Park chiwanp...@icloud.com wrote:

 Hi. I’m very excited about preparing a new major release. :)
 I just picked two tests. I will report status as soon as possible.

 Regards,
 Chiwan Park

  On Jun 9, 2015, at 1:52 AM, Maximilian Michels m...@apache.org wrote:
 
  Hi everyone!
 
  As previously discussed, the Flink developer community is very eager to
 get
  out a new major release. Apache Flink 0.9.0 will contain lots of new
  features and many bugfixes. This time, I'll try to coordinate the release
  process. Feel free to correct me if I'm doing something wrong because I
  don't know any better :)
 
  To release a great version of Flink to the public, I'd like to ask
 everyone
  to test the release candidate. Recently, Flink has received a lot of
  attention. The expectations are quite high. Only through thorough testing
  will we be able to satisfy all the Flink users out there.
 
  Below is a list from the Wiki that we use to ensure the legal and
  functional aspects of a release [1]. What I would like you to do is pick
 at
  least one of the tasks, put your name as assignee in the link below, and
  report back once you verified it. That way, I hope we can quickly and
  thoroughly test the release candidate.
 
 
 https://docs.google.com/document/d/1BhyMPTpAUYA8dG8-vJ3gSAmBUAa0PBSRkxIBPsZxkLs/edit
 
  Best,
  Max
 
  Git branch: release-0.9-rc1
  Release binaries: http://people.apache.org/~mxm/flink-0.9.0-rc1/
  Maven artifacts:
  https://repository.apache.org/content/repositories/orgapacheflink-1037/
  PGP public key for verifying the signatures:
  http://pgp.mit.edu/pks/lookup?op=vindex&search=0xDE976D18C2909CBF
 
 