Re: Testing Apache Flink 0.9.0-rc1
Sorry, it was already out. I was merely struggling with the Maven deploy command because tools/generate_specific_pom.sh is not entirely compatible with old versions of Perl or sed. The script was generating incorrect pom.xml files. On Sat, Jun 13, 2015 at 9:14 PM, Aljoscha Krettek aljos...@apache.org wrote: The new release candidate is not yet done? We have a very simple fix that allows the RowSerializer of the Table API to work with null fields. I think we should include that. What do you think? On Fri, 12 Jun 2015 at 23:50 Ufuk Celebi u...@apache.org wrote: I'm with Till on this. Robert's position is valid as well. Again, there is no core disagreement here. No one wants to add it to dist. On 12 Jun 2015, at 00:40, Ufuk Celebi u...@apache.org wrote: On 11 Jun 2015, at 20:04, Fabian Hueske fhue...@gmail.com wrote: How about the following issues? 1. The HBase Hadoop Compat issue, Ufuk is working on I was not able to reproduce this :( I ran HadoopInputFormats against various sources and confirmed the results and everything was fine so far. The issue has been resolved as "Not a problem."
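The portability problem described above usually comes from `sed -i`, whose argument handling differs between GNU sed (Linux) and BSD sed (macOS). A minimal sketch of a portable alternative; the file name and substitution below are made-up examples, not taken from generate_specific_pom.sh:

```shell
# Portable in-place edit: write to a temp file and move it back instead of
# relying on `sed -i`, which GNU and BSD sed parse differently.
# File and pattern are illustrative only.
printf '<version>0.9.0</version>\n' > pom-example.xml
sed 's/<version>0\.9\.0<\/version>/<version>0.9.0-hadoop1<\/version>/' \
    pom-example.xml > pom-example.xml.tmp
mv pom-example.xml.tmp pom-example.xml
cat pom-example.xml
```

The temp-file dance behaves identically under both sed flavors, which avoids the class of version-dependent breakage Max ran into.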
Re: Testing Apache Flink 0.9.0-rc1
Hi guys, I just noticed while testing the TableAPI on the cluster that it is not part of the dist module. Therefore, programs using the TableAPI will only run when you put the TableAPI jar directly on the cluster or if you build a fat jar including the TableAPI jar. This is not documented anywhere. Furthermore, this also applies to Gelly and FlinkML. Cheers, Till On Fri, Jun 12, 2015 at 9:16 AM Till Rohrmann trohrm...@apache.org wrote: I'm currently going through the license file and I discovered some skeletons in our closet. This has to be merged as well. But I'm still working on it (we have a lot of dependencies). Cheers, Till On Fri, Jun 12, 2015 at 12:51 AM Ufuk Celebi u...@apache.org wrote: On 12 Jun 2015, at 00:49, Fabian Hueske fhue...@gmail.com wrote: 2. is basically done. I have a patch which updates the counters on page reload but that shouldn't be hard to extend to dynamic updates. Very nice! :-) Thanks!
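The fat-jar route Till mentions can be set up in the user's pom.xml with the maven-shade-plugin. A sketch only; the artifactId and versions are assumptions based on the 0.9.0 release, not confirmed by the thread:

```xml
<!-- Sketch: bundle the Table API into the user job jar so it runs on a
     cluster whose lib/ folder does not ship it. artifactId and versions
     are assumptions; adjust to the actual release. -->
<dependencies>
  <dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-table</artifactId>
    <version>0.9.0</version>
  </dependency>
</dependencies>
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <version>2.3</version>
      <executions>
        <execution>
          <phase>package</phase>
          <goals><goal>shade</goal></goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

`mvn package` then produces a jar containing the library classes alongside the user code, which can be submitted as-is.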
Re: Testing Apache Flink 0.9.0-rc1
We should have a nightly cluster test for every library. Let's keep that in mind for the future. Very nice find, Till! Since there were no objections, I cherry-picked the proposed commits from the document to the release-0.9 branch. If I understand correctly, we can create the new release candidate once Till has checked the licenses, Ufuk's TableInput fix has been merged, and Fabian's web interface improvements are in. Plus, we need to include all Flink libraries in flink-dist. Are you going to fix that as well, Till? On Fri, Jun 12, 2015 at 9:53 AM, Ufuk Celebi u...@apache.org wrote: On 12 Jun 2015, at 09:45, Till Rohrmann trohrm...@apache.org wrote: Hi guys, I just noticed while testing the TableAPI on the cluster that it is not part of the dist module. Therefore, programs using the TableAPI will only run when you put the TableAPI jar directly on the cluster or if you build a fat jar including the TableAPI jar. This is not documented anywhere. Furthermore, this also applies to Gelly and FlinkML. I think all of these should be included in the fat jar. They are all highly advertised components. Very good catch, Till! I didn't get around to testing the Table API on a cluster yet.
Re: Testing Apache Flink 0.9.0-rc1
@Till: This also applies to the streaming connectors. On Fri, Jun 12, 2015 at 9:45 AM, Till Rohrmann trohrm...@apache.org wrote: Hi guys, I just noticed while testing the TableAPI on the cluster that it is not part of the dist module. Therefore, programs using the TableAPI will only run when you put the TableAPI jar directly on the cluster or if you build a fat jar including the TableAPI jar. This is not documented anywhere. Furthermore, this also applies to Gelly and FlinkML. Cheers, Till On Fri, Jun 12, 2015 at 9:16 AM Till Rohrmann trohrm...@apache.org wrote: I'm currently going through the license file and I discovered some skeletons in our closet. This has to be merged as well. But I'm still working on it (we have a lot of dependencies). Cheers, Till On Fri, Jun 12, 2015 at 12:51 AM Ufuk Celebi u...@apache.org wrote: On 12 Jun 2015, at 00:49, Fabian Hueske fhue...@gmail.com wrote: 2. is basically done. I have a patch which updates the counters on page reload but that shouldn't be hard to extend to dynamic updates. Very nice! :-) Thanks!
Re: Testing Apache Flink 0.9.0-rc1
I have another fix, but this is just a documentation update (FLINK-2207) and will be done soon. 2015-06-12 10:02 GMT+02:00 Maximilian Michels m...@apache.org: We should have a nightly cluster test for every library. Let's keep that in mind for the future. Very nice find, Till! Since there were no objections, I cherry-picked the proposed commits from the document to the release-0.9 branch. If I understand correctly, we can create the new release candidate once Till has checked the licenses, Ufuk's TableInput fix has been merged, and Fabian's web interface improvements are in. Plus, we need to include all Flink libraries in flink-dist. Are you going to fix that as well, Till? On Fri, Jun 12, 2015 at 9:53 AM, Ufuk Celebi u...@apache.org wrote: On 12 Jun 2015, at 09:45, Till Rohrmann trohrm...@apache.org wrote: Hi guys, I just noticed while testing the TableAPI on the cluster that it is not part of the dist module. Therefore, programs using the TableAPI will only run when you put the TableAPI jar directly on the cluster or if you build a fat jar including the TableAPI jar. This is not documented anywhere. Furthermore, this also applies to Gelly and FlinkML. I think all of these should be included in the fat jar. They are all highly advertised components. Very good catch, Till! I didn't get around to testing the Table API on a cluster yet.
Re: Testing Apache Flink 0.9.0-rc1
As for outstanding issues, I think streaming is good to go as far as I know. I am personally against including all libraries - at least speaking for the streaming connectors. Robert, Stephan, and I had a detailed discussion on that some time ago, and the disadvantage of having all the libraries in the distribution is the dependency mess that they pull in. In that case I would rather add documentation on putting them in the user jar instead. As for the other libraries, they do not depend on so much external code, so +1 for putting them in. On Fri, Jun 12, 2015 at 10:02 AM, Maximilian Michels m...@apache.org wrote: We should have a nightly cluster test for every library. Let's keep that in mind for the future. Very nice find, Till! Since there were no objections, I cherry-picked the proposed commits from the document to the release-0.9 branch. If I understand correctly, we can create the new release candidate once Till has checked the licenses, Ufuk's TableInput fix has been merged, and Fabian's web interface improvements are in. Plus, we need to include all Flink libraries in flink-dist. Are you going to fix that as well, Till? On Fri, Jun 12, 2015 at 9:53 AM, Ufuk Celebi u...@apache.org wrote: On 12 Jun 2015, at 09:45, Till Rohrmann trohrm...@apache.org wrote: Hi guys, I just noticed while testing the TableAPI on the cluster that it is not part of the dist module. Therefore, programs using the TableAPI will only run when you put the TableAPI jar directly on the cluster or if you build a fat jar including the TableAPI jar. This is not documented anywhere. Furthermore, this also applies to Gelly and FlinkML. I think all of these should be included in the fat jar. They are all highly advertised components. Very good catch, Till! I didn't get around to testing the Table API on a cluster yet.
Re: Testing Apache Flink 0.9.0-rc1
What about the shaded jars? On Fri, Jun 12, 2015 at 11:32 AM Ufuk Celebi u...@apache.org wrote: @Max: for the new RC. Can you make sure to set the variables correctly with regard to stable/snapshot versions in the docs?
Re: Testing Apache Flink 0.9.0-rc1
On 12 Jun 2015, at 09:45, Till Rohrmann trohrm...@apache.org wrote: Hi guys, I just noticed while testing the TableAPI on the cluster that it is not part of the dist module. Therefore, programs using the TableAPI will only run when you put the TableAPI jar directly on the cluster or if you build a fat jar including the TableAPI jar. This is not documented anywhere. Furthermore, this also applies to Gelly and FlinkML. I think all of these should be included in the fat jar. They are all highly advertised components. Very good catch, Till! I didn't get around to testing the Table API on a cluster yet.
Re: Testing Apache Flink 0.9.0-rc1
After thinking about it a bit more, I think that's fine. +1 to document and keep it as it is.
Re: Testing Apache Flink 0.9.0-rc1
Well, I think the initial idea was to keep the dist jar as small as possible and therefore we did not include the libraries. I'm not sure whether we can decide this here ad-hoc. If the community says that we shall include these libraries then I can add them. But bear in mind that all of them have some transitive dependencies which will be added as well. On Fri, Jun 12, 2015 at 10:15 AM Márton Balassi balassi.mar...@gmail.com wrote: As for outstanding issues, I think streaming is good to go as far as I know. I am personally against including all libraries - at least speaking for the streaming connectors. Robert, Stephan, and I had a detailed discussion on that some time ago, and the disadvantage of having all the libraries in the distribution is the dependency mess that they pull in. In that case I would rather add documentation on putting them in the user jar instead. As for the other libraries, they do not depend on so much external code, so +1 for putting them in. On Fri, Jun 12, 2015 at 10:02 AM, Maximilian Michels m...@apache.org wrote: We should have a nightly cluster test for every library. Let's keep that in mind for the future. Very nice find, Till! Since there were no objections, I cherry-picked the proposed commits from the document to the release-0.9 branch. If I understand correctly, we can create the new release candidate once Till has checked the licenses, Ufuk's TableInput fix has been merged, and Fabian's web interface improvements are in. Plus, we need to include all Flink libraries in flink-dist. Are you going to fix that as well, Till? On Fri, Jun 12, 2015 at 9:53 AM, Ufuk Celebi u...@apache.org wrote: On 12 Jun 2015, at 09:45, Till Rohrmann trohrm...@apache.org wrote: Hi guys, I just noticed while testing the TableAPI on the cluster that it is not part of the dist module. Therefore, programs using the TableAPI will only run when you put the TableAPI jar directly on the cluster or if you build a fat jar including the TableAPI jar.
This is not documented anywhere. Furthermore, this also applies to Gelly and FlinkML. I think all of these should be included in the fat jar. They are all highly advertised components. Very good catch, Till! I didn't get around to testing the Table API on a cluster yet.
Re: Testing Apache Flink 0.9.0-rc1
On 12 Jun 2015, at 10:44, Till Rohrmann trohrm...@apache.org wrote: Yes, you're right, Ufuk. At the moment the user has to place the jars in the lib folder of Flink. If this folder is not shared, then they have to do it for every node on which Flink runs. OK. I guess there is a nice way to do this with YARN as well. I think it decreases the out-of-the-box experience quite a bit if you want to use these nice features. What's your stand on this issue?
Re: Testing Apache Flink 0.9.0-rc1
On 12 Jun 2015, at 10:29, Till Rohrmann trohrm...@apache.org wrote: Well, I think the initial idea was to keep the dist jar as small as possible and therefore we did not include the libraries. I'm not sure whether we can decide this here ad-hoc. If the community says that we shall include these libraries then I can add them. But bear in mind that all of them have some transitive dependencies which will be added as well. I'm against the connectors as well, but not having Table API, Flink ML, and Gelly in seems odd to me. Or maybe I'm missing something. Someone who wants to try this out has to place the dependencies manually into the lib folder of the Flink installation, right?
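The manual placement discussed here could at least be scripted. A dry-run sketch; the host names, FLINK_HOME, and jar name are placeholders invented for illustration, nothing here comes from the thread:

```shell
# Dry-run sketch: print the copy commands that would place a library jar
# into Flink's lib/ folder on every worker node. Host names, FLINK_HOME,
# and the jar name are placeholders; replace `echo` with the real scp
# (or read hosts from conf/slaves) once they match your cluster.
FLINK_HOME=/opt/flink
JAR=flink-table-0.9.0.jar
for host in worker1 worker2 worker3; do
  echo "scp $JAR $host:$FLINK_HOME/lib/"
done > copy-plan.txt
cat copy-plan.txt
```

Reviewing the generated plan before executing it keeps a typo in the path from silently leaving one node without the jar.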
Re: Testing Apache Flink 0.9.0-rc1
I think I found a real release blocker. Currently we don't add license files to our shaded jars. For example, the flink-shaded-include-yarn-0.9.0-milestone-1.jar shades Hadoop code. This code also includes the `org.apache.hadoop.util.bloom.*` classes. These classes are licensed under the European Commission project OneLab. We have a notice in the LICENSE file of our binary distribution, but I think we also have to add them to the shaded jar. There might even be more code bundled as part of some shaded jars which I have not spotted yet. Furthermore, I noticed that we list all Apache License dependencies in our LICENSE file of our binary distribution (which we don't have to do). However, we don't do it in our jars which contain, for example, guava and asm as shaded dependencies. Maybe we should be consistent here. But maybe I'm overlooking something here and we don't have to do it. On Fri, Jun 12, 2015 at 10:29 AM Till Rohrmann trohrm...@apache.org wrote: Well, I think the initial idea was to keep the dist jar as small as possible and therefore we did not include the libraries. I'm not sure whether we can decide this here ad-hoc. If the community says that we shall include these libraries then I can add them. But bear in mind that all of them have some transitive dependencies which will be added as well. On Fri, Jun 12, 2015 at 10:15 AM Márton Balassi balassi.mar...@gmail.com wrote: As for outstanding issues, I think streaming is good to go as far as I know. I am personally against including all libraries - at least speaking for the streaming connectors. Robert, Stephan, and I had a detailed discussion on that some time ago, and the disadvantage of having all the libraries in the distribution is the dependency mess that they pull in. In that case I would rather add documentation on putting them in the user jar instead. As for the other libraries, they do not depend on so much external code, so +1 for putting them in.
On Fri, Jun 12, 2015 at 10:02 AM, Maximilian Michels m...@apache.org wrote: We should have a nightly cluster test for every library. Let's keep that in mind for the future. Very nice find, Till! Since there were not objections, I cherry-picked the proposed commits from the document to the release-0.9 branch. If I understand correctly, we can create the new release candidate once Till has checked the licenses, Ufuk's TableInput fix has been merged, and Fabian's web interface improvement are in. Plus, we need to include all Flink libraries in flink-dist. Are you going to fix that as well, Till? On Fri, Jun 12, 2015 at 9:53 AM, Ufuk Celebi u...@apache.org wrote: On 12 Jun 2015, at 09:45, Till Rohrmann trohrm...@apache.org wrote: Hi guys, I just noticed while testing the TableAPI on the cluster that it is not part of the dist module. Therefore, programs using the TableAPI will only run when you put the TableAPI jar directly on the cluster or if you build a fat jar including the TableAPI jar. This is nowhere documented. Furthermore, this also applies to Gelly and FlinkML. I think all of these should be included in the fat jar. They are all highly advertized components. Very good catch, Till! I didn't get around to testing Table API on a cluster, yet.
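The kind of audit Till describes can be started by simply listing the entries of a shaded jar. A sketch; the jar name is the one from the thread, and the script degrades to a notice when the artifact has not been built in the current directory:

```shell
# Sketch: list LICENSE/NOTICE entries and the relocated bloom-filter classes
# in a shaded jar. Jar name is the one discussed in the thread; run from a
# directory containing the built artifact.
JAR=flink-shaded-include-yarn-0.9.0-milestone-1.jar
{
  if [ -f "$JAR" ]; then
    unzip -l "$JAR" | grep -Ei 'META-INF/(LICENSE|NOTICE)' \
      || echo "no LICENSE/NOTICE entry in $JAR"
    unzip -l "$JAR" | grep 'util/bloom' \
      || echo "no bloom classes found"
  else
    echo "jar not found: $JAR (build flink-dist first)"
  fi
} > jar-audit.txt
cat jar-audit.txt
```

An empty LICENSE/NOTICE listing next to a non-empty class listing is exactly the situation Till flags as a blocker.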
Re: Testing Apache Flink 0.9.0-rc1
On 12 Jun 2015, at 00:40, Ufuk Celebi u...@apache.org wrote: On 11 Jun 2015, at 20:04, Fabian Hueske fhue...@gmail.com wrote: How about the following issues? 1. The HBase Hadoop Compat issue, Ufuk is working on I was not able to reproduce this :( I ran HadoopInputFormats against various sources and confirmed the results and everything was fine so far. The issue has been resolved as "Not a problem". There was some misconfiguration in the user code.
Re: Testing Apache Flink 0.9.0-rc1
I'm currently going through the license file and I discovered some skeletons in our closet. This has to be merged as well. But I'm still working on it (we have a lot of dependencies). Cheers, Till On Fri, Jun 12, 2015 at 12:51 AM Ufuk Celebi u...@apache.org wrote: On 12 Jun 2015, at 00:49, Fabian Hueske fhue...@gmail.com wrote: 2. is basically done. I have a patch which updates the counters on page reload but that shouldn't be hard to extend to dynamic updates. Very nice! :-) Thanks!
Re: Testing Apache Flink 0.9.0-rc1
I'm in favour of option b) as well. On Fri, Jun 12, 2015 at 12:05 PM Ufuk Celebi u...@apache.org wrote: Yes, the LICENSE files are definitely a release blocker. a) Either we wait with the RC until we have fixed the LICENSES, or b) Put out the next RC to continue with testing and then update it with the LICENSE [either we find something before the LICENSE update or we only have to review the LICENSE change] Since this is not a vote yet, it doesn't really matter, but I'm leaning towards b). On Fri, Jun 12, 2015 at 11:43 AM, Till Rohrmann till.rohrm...@gmail.com wrote: What about the shaded jars? On Fri, Jun 12, 2015 at 11:32 AM Ufuk Celebi u...@apache.org wrote: @Max: for the new RC. Can you make sure to set the variables correctly with regard to stable/snapshot versions in the docs?
Re: Testing Apache Flink 0.9.0-rc1
+1 for b) I'm organizing + merging the commits that need to go into the new candidate right now. Will let you know when I am done. 2015-06-12 14:03 GMT+02:00 Till Rohrmann till.rohrm...@gmail.com: I'm in favour of option b) as well. On Fri, Jun 12, 2015 at 12:05 PM Ufuk Celebi u...@apache.org wrote: Yes, the LICENSE files are definitely a release blocker. a) Either we wait with the RC until we have fixed the LICENSES, or b) Put out the next RC to continue with testing and then update it with the LICENSE [either we find something before the LICENSE update or we only have to review the LICENSE change] Since this is not a vote yet, it doesn't really matter, but I'm leaning towards b). On Fri, Jun 12, 2015 at 11:43 AM, Till Rohrmann till.rohrm...@gmail.com wrote: What about the shaded jars? On Fri, Jun 12, 2015 at 11:32 AM Ufuk Celebi u...@apache.org wrote: @Max: for the new RC. Can you make sure to set the variables correctly with regard to stable/snapshot versions in the docs?
Re: Testing Apache Flink 0.9.0-rc1
Regarding the discussion about including ML, Gelly, and the streaming connectors in flink-dist: I'm strongly against adding those into our jar because they blow up the dependencies we are shipping by default. Also, the Maven archetype sets up everything so that the dependencies are packaged into the usercode jar. I'd say most of the time users are using custom dependencies anyway (Guava), so they need to set this up properly. I would not start recommending our users putting their dependencies into the lib/ folder. It's much more convenient to let Maven do the fat-jar packaging. On Fri, Jun 12, 2015 at 9:44 AM, Till Rohrmann trohrm...@apache.org wrote: I've finished the legal check of the source and binary distribution. The PR with the LICENSE and NOTICE file updates can be found here [1]. What I haven't done yet is addressing the issue with the shaded dependencies. I think that we have to add to all jars which contain dependencies as binary data a LICENSE/NOTICE file referencing the included dependencies if they are not licensed under Apache-2.0 or contain a special NOTICE portion. Cheers, Till [1] https://github.com/apache/flink/pull/830 On Fri, Jun 12, 2015 at 5:44 PM Maximilian Michels m...@apache.org wrote: I almost finished creating the new release candidate. Then the maven deploy command failed on me for the hadoop1 profile: [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 19:15.388s [INFO] Finished at: Fri Jun 12 15:25:50 UTC 2015 [INFO] Final Memory: 126M/752M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:2.12.1:check (validate) on project flink-language-binding-generic: Failed during checkstyle execution: Unable to find suppressions file at location: /tools/maven/suppressions.xml: Could not find resource '/tools/maven/suppressions.xml'. - [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn <goals> -rf :flink-language-binding-generic I need to look into this later. Unfortunately, I'm traveling this weekend. On Fri, Jun 12, 2015 at 3:34 PM, Fabian Hueske fhue...@gmail.com wrote: OK, guys. I merged and pushed the last outstanding commits to the release-0.9 branch. Good to go for a new candidate. 2015-06-12 14:30 GMT+02:00 Maximilian Michels m...@apache.org: +1 Let's constitute the changes in a new release candidate. On Fri, Jun 12, 2015 at 2:06 PM, Fabian Hueske fhue...@gmail.com wrote: +1 for b) I'm organizing + merging the commits that need to go into the new candidate right now. Will let you know when I am done. 2015-06-12 14:03 GMT+02:00 Till Rohrmann till.rohrm...@gmail.com: I'm in favour of option b) as well. On Fri, Jun 12, 2015 at 12:05 PM Ufuk Celebi u...@apache.org wrote: Yes, the LICENSE files are definitely a release blocker. a) Either we wait with the RC until we have fixed the LICENSES, or b) Put out the next RC to continue with testing and then update it with the LICENSE [either we find something before the LICENSE update or we only have to review the LICENSE change] Since this is not a vote yet, it doesn't really matter, but I'm leaning towards b). On Fri, Jun 12, 2015 at 11:43 AM, Till Rohrmann till.rohrm...@gmail.com wrote: What about the shaded jars? On Fri, Jun 12, 2015 at 11:32 AM Ufuk Celebi u...@apache.org wrote: @Max: for the new RC. Can you make sure to set the variables correctly with regard to stable/snapshot versions in the docs?
Re: Testing Apache Flink 0.9.0-rc1
I agree mostly with Robert. However, one could also argue that by not including the libraries in the dist package, the user code jar will also be blown up by the dependencies added by the library. This will slow down job submission, because it has to be distributed on the cluster. Furthermore, I wouldn't expect all our users to use the quickstart archetypes or to set up Maven such that it builds a fat jar. I think the best approach is to explicitly document how to use the libraries and what to do in order to run them on the cluster. On Jun 12, 2015 9:15 PM, Robert Metzger rmetz...@apache.org wrote: Regarding the discussion about including ML, Gelly, and the streaming connectors in flink-dist: I'm strongly against adding those into our jar because they blow up the dependencies we are shipping by default. Also, the Maven archetype sets up everything so that the dependencies are packaged into the usercode jar. I'd say most of the time users are using custom dependencies anyway (Guava), so they need to set this up properly. I would not start recommending our users putting their dependencies into the lib/ folder. It's much more convenient to let Maven do the fat-jar packaging. On Fri, Jun 12, 2015 at 9:44 AM, Till Rohrmann trohrm...@apache.org wrote: I've finished the legal check of the source and binary distribution. The PR with the LICENSE and NOTICE file updates can be found here [1]. What I haven't done yet is addressing the issue with the shaded dependencies. I think that we have to add to all jars which contain dependencies as binary data a LICENSE/NOTICE file referencing the included dependencies if they are not licensed under Apache-2.0 or contain a special NOTICE portion. Cheers, Till [1] https://github.com/apache/flink/pull/830 On Fri, Jun 12, 2015 at 5:44 PM Maximilian Michels m...@apache.org wrote: I almost finished creating the new release candidate.
Then the maven deploy command failed on me for the hadoop1 profile: [INFO] [INFO] BUILD FAILURE [INFO] [INFO] Total time: 19:15.388s [INFO] Finished at: Fri Jun 12 15:25:50 UTC 2015 [INFO] Final Memory: 126M/752M [INFO] [ERROR] Failed to execute goal org.apache.maven.plugins:maven-checkstyle-plugin:2.12.1:check (validate) on project flink-language-binding-generic: Failed during checkstyle execution: Unable to find suppressions file at location: /tools/maven/suppressions.xml: Could not find resource '/tools/maven/suppressions.xml'. - [Help 1] [ERROR] [ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch. [ERROR] Re-run Maven using the -X switch to enable full debug logging. [ERROR] [ERROR] For more information about the errors and possible solutions, please read the following articles: [ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException [ERROR] [ERROR] After correcting the problems, you can resume the build with the command [ERROR] mvn <goals> -rf :flink-language-binding-generic I need to look into this later. Unfortunately, I'm traveling this weekend. On Fri, Jun 12, 2015 at 3:34 PM, Fabian Hueske fhue...@gmail.com wrote: OK, guys. I merged and pushed the last outstanding commits to the release-0.9 branch. Good to go for a new candidate. 2015-06-12 14:30 GMT+02:00 Maximilian Michels m...@apache.org: +1 Let's constitute the changes in a new release candidate. On Fri, Jun 12, 2015 at 2:06 PM, Fabian Hueske fhue...@gmail.com wrote: +1 for b) I'm organizing + merging the commits that need to go into the new candidate right now. Will let you know when I am done. 2015-06-12 14:03 GMT+02:00 Till Rohrmann till.rohrm...@gmail.com: I'm in favour of option b) as well. On Fri, Jun 12, 2015 at 12:05 PM Ufuk Celebi u...@apache.org wrote: Yes, the LICENSE files are definitely a release blocker.
a) Either we wait with the RC until we have fixed the LICENSES, or b) Put out the next RC to continue with testing and then update it with the LICENSE [either we find something before the LICENSE update or we only have to review the LICENSE change] Since this is not a vote yet, it doesn't really matter, but I'm leaning towards b). On Fri, Jun 12, 2015 at 11:43 AM, Till Rohrmann till.rohrm...@gmail.com wrote: What about the shaded jars? On Fri, Jun 12, 2015 at 11:32 AM Ufuk Celebi u...@apache.org wrote:
Re: Testing Apache Flink 0.9.0-rc1
Yes, we would include those in the new release candidate. On Jun 11, 2015 5:22 PM, Aljoscha Krettek aljos...@apache.org wrote: Aren't there still some commits at the top of the release document that need to be cherry-picked to the release branch? On Thu, 11 Jun 2015 at 17:13 Maximilian Michels m...@apache.org wrote: The deadlock in the scheduler is now fixed. Based on the changes that have been pushed to the release-0.9 branch, I'd like to create a new release candidate later on. I think we have gotten the most critical issues out of the way. Would that be ok for you? On Wed, Jun 10, 2015 at 5:56 PM, Fabian Hueske fhue...@gmail.com wrote: Yes, that needs to be fixed IMO 2015-06-10 17:51 GMT+02:00 Till Rohrmann trohrm...@apache.org: Yes, since it is clearly a deadlock in the scheduler, the current version shouldn't be released. On Wed, Jun 10, 2015 at 5:48 PM Ufuk Celebi u...@apache.org wrote: On 10 Jun 2015, at 16:18, Maximilian Michels m...@apache.org wrote: I'm debugging the TaskManagerFailsWithSlotSharingITCase. I've located its cause but still need to find out how to fix it. Very good find, Max! Max, Till, and I have looked into this and it is a reproducible deadlock in the scheduler during concurrent slot release (in failure cases). Max will attach the relevant stack trace to the issue. I think this is a release blocker. Any opinions? – Ufuk
Re: Testing Apache Flink 0.9.0-rc1
How about the following issues? 1. The HBase Hadoop Compat issue, Ufuk is working on 2. The incorrect web interface counts @Ufuk were you able to reproduce the bug? The deadlock in the scheduler is now fixed. Based on the changes that have been pushed to the release-0.9 branch, I'd like to create a new release candidate later on. I think we have gotten the most critical issues out of the way. Would that be ok for you? On Wed, Jun 10, 2015 at 5:56 PM, Fabian Hueske fhue...@gmail.com wrote: Yes, that needs to be fixed IMO 2015-06-10 17:51 GMT+02:00 Till Rohrmann trohrm...@apache.org: Yes, since it is clearly a deadlock in the scheduler, the current version shouldn't be released. On Wed, Jun 10, 2015 at 5:48 PM Ufuk Celebi u...@apache.org wrote: On 10 Jun 2015, at 16:18, Maximilian Michels m...@apache.org wrote: I'm debugging the TaskManagerFailsWithSlotSharingITCase. I've located its cause but still need to find out how to fix it. Very good find, Max! Max, Till, and I have looked into this and it is a reproducible deadlock in the scheduler during concurrent slot release (in failure cases). Max will attach the relevant stack trace to the issue. I think this is a release blocker. Any opinions? – Ufuk
Re: Testing Apache Flink 0.9.0-rc1
On 12 Jun 2015, at 00:49, Fabian Hueske fhue...@gmail.com wrote: 2. is basically done. I have a patch which updates the counters on page reload but that shouldn't be hard to extend to dynamic updates. Very nice! :-) Thanks!
Re: Testing Apache Flink 0.9.0-rc1
The deadlock in the scheduler is now fixed. Based on the changes that have been pushed to the release-0.9 branch, I'd like to create a new release candidate later on. I think we have gotten the most critical issues out of the way. Would that be ok for you? On Wed, Jun 10, 2015 at 5:56 PM, Fabian Hueske fhue...@gmail.com wrote: Yes, that needs to be fixed IMO 2015-06-10 17:51 GMT+02:00 Till Rohrmann trohrm...@apache.org: Yes, since it is clearly a deadlock in the scheduler, the current version shouldn't be released. On Wed, Jun 10, 2015 at 5:48 PM Ufuk Celebi u...@apache.org wrote: On 10 Jun 2015, at 16:18, Maximilian Michels m...@apache.org wrote: I'm debugging the TaskManagerFailsWithSlotSharingITCase. I've located its cause but still need to find out how to fix it. Very good find, Max! Max, Till, and I have looked into this and it is a reproducible deadlock in the scheduler during concurrent slot release (in failure cases). Max will attach the relevant stack trace to the issue. I think this is a release blocker. Any opinions? – Ufuk
Re: Testing Apache Flink 0.9.0-rc1
Aren't there still some commits at the top of the release document that need to be cherry-picked to the release branch? On Thu, 11 Jun 2015 at 17:13 Maximilian Michels m...@apache.org wrote: The deadlock in the scheduler is now fixed. Based on the changes that have been pushed to the release-0.9 branch, I'd like to create a new release candidate later on. I think we have gotten the most critical issues out of the way. Would that be ok for you? On Wed, Jun 10, 2015 at 5:56 PM, Fabian Hueske fhue...@gmail.com wrote: Yes, that needs to be fixed IMO 2015-06-10 17:51 GMT+02:00 Till Rohrmann trohrm...@apache.org: Yes, since it is clearly a deadlock in the scheduler, the current version shouldn't be released. On Wed, Jun 10, 2015 at 5:48 PM Ufuk Celebi u...@apache.org wrote: On 10 Jun 2015, at 16:18, Maximilian Michels m...@apache.org wrote: I'm debugging the TaskManagerFailsWithSlotSharingITCase. I've located its cause but still need to find out how to fix it. Very good find, Max! Max, Till, and I have looked into this and it is a reproducible deadlock in the scheduler during concurrent slot release (in failure cases). Max will attach the relevant stack trace to the issue. I think this is a release blocker. Any opinions? – Ufuk
Re: Testing Apache Flink 0.9.0-rc1
Regarding the iteration partitioning feature: since I use it, I of course find it very useful, but it is true that it needs to be tested more extensively and also be discussed by the community before it is added to a release. Moreover, given that I can still use it for research purposes (I had already cherry-picked it before it was merged to the master branch), there is no actual reason to put it in the next release, so the community has more time to discuss and decide about the feature. Lastly, I cross-checked the SAMOA application, and so far there is still no algorithm implemented in the SAMOA API which needs the new feature. Faye. 2015-06-10 11:28 GMT+02:00 Sachin Goel sachingoel0...@gmail.com: I have run mvn clean verify five times now and every time I'm getting these failed tests: BlobUtilsTest.before:45 null BlobUtilsTest.before:45 null BlobServerDeleteTest.testDeleteFails:291 null BlobLibraryCacheManagerTest.testRegisterAndDownload:196 Could not remove write permissions from cache directory BlobServerPutTest.testPutBufferFails:224 null BlobServerPutTest.testPutNamedBufferFails:286 null JobManagerStartupTest.before:55 null JobManagerStartupTest.before:55 null DataSinkTaskTest.testFailingDataSinkTask:317 Temp output file has not been removed DataSinkTaskTest.testFailingSortingDataSinkTask:358 Temp output file has not been removed TaskManagerTest.testSubmitAndExecuteTask**:123 assertion failed: timeout (19998080696 nanoseconds) during expectMsgClass waiting for class org.apache.flink.runtime.messages.RegistrationMessages$RegisterTaskManager TaskManagerProcessReapingTest.testReapProcessOnFailure:133 TaskManager process did not launch the TaskManager properly. Failed to look up akka.tcp://flink@127.0.0.1:50673/user/taskmanager ** fails randomly. Is someone able to reproduce these while building on a Windows machine? I would try to debug these myself but I'm not yet familiar with the core architecture and API.
-- Sachin On Wed, Jun 10, 2015 at 2:46 PM, Aljoscha Krettek aljos...@apache.org wrote: The KMeans quickstart example does not work with the current state of the KMeansDataGenerator. I created PR that brings the two in sync. This should probably go into the release since it affects initial user satisfaction. On Wed, Jun 10, 2015 at 11:14 AM, Márton Balassi balassi.mar...@gmail.com wrote: As for the streaming commit cherry-picked to the release branch: This is an unfortunate communication issue, let us make sure that we clearly communicate similar issues in the future. As for FLINK-2192: This is essentially a duplicate issue of the testability of the streaming iteration. Not a blocker, I will comment on the JIRA ticket, Gabor Hermann is already working on the root cause. On Wed, Jun 10, 2015 at 11:07 AM, Ufuk Celebi u...@apache.org wrote: Hey Gyula, Max, On 10 Jun 2015, at 10:54, Gyula Fóra gyula.f...@gmail.com wrote: This feature needs to be included in the release, it has been tested and used extensively. And many applciations depend on it. It would be nice to announce/discuss this before just cherry-picking it into the release branch. The issue is that no one (except you) knows that this is important. Let's just make sure to do this for future fixes. Having said that... it seems to be an important fix. Does someone have time (looking at Aljoscha ;)) to review the changes? Maximilian Michels m...@apache.org ezt írta (időpont: 2015. jún. 10., Sze, 10:47): With all the issues discovered, it looks like we'll have another release candidate. Right now, we have discovered the following problems: 1 YARN ITCase fails [fixed via 2eb5cfe] 2 No Jar for SessionWindowing example [fixed in #809] 3 Wrong description of the input format for the graph examples (eg. 
ConnectedComponents) [fixed in #809] 4 TaskManagerFailsWithSlotSharingITCase fails 5 ComplexIntegrationTest.complexIntegrationTest1() (FLINK-2192) fails Can we verify that the tests are defect and not the tested component? ;) Otherwise, I would not block the release on flakey tests. 6 Submitting KMeans example to Web Submission Client does not work on Firefox. 7 Zooming is buggy in Web Submission Client (Firefox) Do we have someone familiar with the web interface who could take a look at the Firefox issues? If not, I would not block the release on this.
Re: Testing Apache Flink 0.9.0-rc1
Adding one more thing to the list: the code contains a misplaced class (mea culpa) in flink-java, org.apache.flink.api.java.SortPartitionOperator, which is API-facing and should be moved to the operators package. If we do that after the release, it will break binary compatibility. I created FLINK-2196 and will open a PR soon. If nobody objects, I'll merge it into the 0.9 release branch as well.
Re: Testing Apache Flink 0.9.0-rc1
Hey Gyula, Max, On 10 Jun 2015, at 10:54, Gyula Fóra gyula.f...@gmail.com wrote: This feature needs to be included in the release, it has been tested and used extensively. And many applications depend on it. It would be nice to announce/discuss this before just cherry-picking it into the release branch. The issue is that no one (except you) knows that this is important. Let's just make sure to do this for future fixes. Having said that... it seems to be an important fix. Does someone have time (looking at Aljoscha ;)) to review the changes? Maximilian Michels m...@apache.org wrote (10 Jun 2015, Wed, 10:47): With all the issues discovered, it looks like we'll have another release candidate. Right now, we have discovered the following problems:
1. YARN ITCase fails [fixed via 2eb5cfe]
2. No jar for the SessionWindowing example [fixed in #809]
3. Wrong description of the input format for the graph examples (e.g. ConnectedComponents) [fixed in #809]
4. TaskManagerFailsWithSlotSharingITCase fails
5. ComplexIntegrationTest.complexIntegrationTest1() (FLINK-2192) fails
Can we verify that the tests are defective and not the tested component? ;) Otherwise, I would not block the release on flaky tests.
6. Submitting the KMeans example to the Web Submission Client does not work on Firefox.
7. Zooming is buggy in the Web Submission Client (Firefox).
Do we have someone familiar with the web interface who could take a look at the Firefox issues? If not, I would not block the release on this.
Re: Testing Apache Flink 0.9.0-rc1
I added a section at the top of the release testing document to keep track of commits that we might want to cherry-pick to the release. I included the YARNSessionFIFOITCase fix and the optional stream iteration partitioning (both already on the release branch). On Wed, Jun 10, 2015 at 12:51 PM, Fabian Hueske fhue...@gmail.com wrote: @Sachin: I reproduced the build error on my Windows machine. 2015-06-10 12:22 GMT+02:00 Maximilian Michels m...@apache.org: @Sachin: This looks like a file permission issue. We should have someone else verify that on a Windows system.
Re: Testing Apache Flink 0.9.0-rc1
With all the issues discovered, it looks like we'll have another release candidate. Right now, we have discovered the following problems:
1. YARN ITCase fails [fixed via 2eb5cfe]
2. No jar for the SessionWindowing example [fixed in #809]
3. Wrong description of the input format for the graph examples (e.g. ConnectedComponents) [fixed in #809]
4. TaskManagerFailsWithSlotSharingITCase fails
5. ComplexIntegrationTest.complexIntegrationTest1() (FLINK-2192) fails
6. Submitting the KMeans example to the Web Submission Client does not work on Firefox.
7. Zooming is buggy in the Web Submission Client (Firefox).
Do we have someone familiar with the web interface who could take a look at the Firefox issues? One more important thing: the release-0.9 branch should only be used for bug fixes or previously discussed feature changes. Adding new features defies the purpose of careful testing in advance and can have unforeseeable consequences. In particular, I'm referring to pull request #810: https://github.com/apache/flink/pull/810 IMHO, this one shouldn't have been cherry-picked onto the release-0.9 branch. I would like to remove it from there if no objections are raised. https://github.com/apache/flink/commit/e0e6f59f309170e5217bdfbf5d30db87c947f8ce
Re: Testing Apache Flink 0.9.0-rc1
I have run mvn clean verify five times now and every time I'm getting these failed tests:
BlobUtilsTest.before:45 null
BlobUtilsTest.before:45 null
BlobServerDeleteTest.testDeleteFails:291 null
BlobLibraryCacheManagerTest.testRegisterAndDownload:196 Could not remove write permissions from cache directory
BlobServerPutTest.testPutBufferFails:224 null
BlobServerPutTest.testPutNamedBufferFails:286 null
JobManagerStartupTest.before:55 null
JobManagerStartupTest.before:55 null
DataSinkTaskTest.testFailingDataSinkTask:317 Temp output file has not been removed
DataSinkTaskTest.testFailingSortingDataSinkTask:358 Temp output file has not been removed
TaskManagerTest.testSubmitAndExecuteTask**:123 assertion failed: timeout (19998080696 nanoseconds) during expectMsgClass waiting for class org.apache.flink.runtime.messages.RegistrationMessages$RegisterTaskManager
TaskManagerProcessReapingTest.testReapProcessOnFailure:133 TaskManager process did not launch the TaskManager properly. Failed to look up akka.tcp://flink@127.0.0.1:50673/user/taskmanager
** fails randomly.
Is anyone able to reproduce these while building on a Windows machine? I would try to debug these myself but I'm not yet familiar with the core architecture and API. -- Sachin
On Wed, Jun 10, 2015 at 2:46 PM, Aljoscha Krettek aljos...@apache.org wrote: The KMeans quickstart example does not work with the current state of the KMeansDataGenerator. I created a PR that brings the two in sync. This should probably go into the release since it affects initial user satisfaction. On Wed, Jun 10, 2015 at 11:14 AM, Márton Balassi balassi.mar...@gmail.com wrote: As for the streaming commit cherry-picked to the release branch: this is an unfortunate communication issue; let us make sure that we clearly communicate similar issues in the future. As for FLINK-2192: this is essentially a duplicate of the testability issue of the streaming iteration. Not a blocker; I will comment on the JIRA ticket. Gabor Hermann is already working on the root cause.
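Max's "file permission issue" diagnosis plausibly points at a known JDK limitation: on Windows, `java.io.File.setWritable(false)` generally cannot revoke write access on a directory and returns false, while the Blob tests revoke write permission on a cache directory to force an error path. The snippet below is a hypothetical, self-contained illustration of that symptom under this assumption, not Flink's actual test code:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

// Sketch of why "Could not remove write permissions from cache directory"
// can appear only on Windows: the test setup itself fails when
// setWritable(false) cannot take effect on a directory.
class WritePermissionSketch {
    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("blob-cache-test").toFile();
        // Tests do this to provoke an I/O error path inside the component.
        boolean revoked = dir.setWritable(false, false);
        // On POSIX systems 'revoked' is typically true; on Windows it is
        // usually false for directories, matching the reported failure.
        System.out.println("write permission revoked: " + revoked);
        dir.setWritable(true, false);  // restore so cleanup succeeds
        dir.delete();
    }
}
```

If this is the cause, the tests would need to either skip the permission-based setup on Windows or assert on the return value of `setWritable` before relying on it.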
Re: Testing Apache Flink 0.9.0-rc1
I agree with Gyula regarding the iteration partitioning. I have also been using this feature for developing machine learning algorithms, and I think SAMOA also needs this feature. Faye 2015-06-10 10:54 GMT+02:00 Gyula Fóra gyula.f...@gmail.com: This feature needs to be included in the release, it has been tested and used extensively. And many applications depend on it.
Re: Testing Apache Flink 0.9.0-rc1
I'm not against including the feature, but I'd like to discuss it first. I believe that only very carefully selected commits should be added to release-0.9. If that feature happens to be tested extensively and is very important for user satisfaction, then we might include it.
Re: Testing Apache Flink 0.9.0-rc1
I'm debugging the TaskManagerFailsWithSlotSharingITCase. I've located its cause but still need to find out how to fix it. On Wed, Jun 10, 2015 at 2:25 PM, Aljoscha Krettek aljos...@apache.org wrote: I added a section at the top of the release testing document to keep track of commits that we might want to cherry-pick to the release. I included the YARNSessionFIFOITCase fix and the optional stream iteration partitioning (both already on release branch). On Wed, Jun 10, 2015 at 12:51 PM, Fabian Hueske fhue...@gmail.com wrote: @Sachin: I reproduced the build error on my Windows machine. 2015-06-10 12:22 GMT+02:00 Maximilian Michels m...@apache.org: @Sachin: This looks like a file permission issue. We should have someone else verify that on a Windows system. On Wed, Jun 10, 2015 at 11:28 AM, Sachin Goel sachingoel0...@gmail.com wrote: I have run mvn clean verify five times now and every time I'm getting these failed tests: BlobUtilsTest.before:45 null BlobUtilsTest.before:45 null BlobServerDeleteTest.testDeleteFails:291 null BlobLibraryCacheManagerTest.testRegisterAndDownload:196 Could not remove write permissions from cache directory BlobServerPutTest.testPutBufferFails:224 null BlobServerPutTest.testPutNamedBufferFails:286 null JobManagerStartupTest.before:55 null JobManagerStartupTest.before:55 null DataSinkTaskTest.testFailingDataSinkTask:317 Temp output file has not been removed DataSinkTaskTest.testFailingSortingDataSinkTask:358 Temp output file has not been removed TaskManagerTest.testSubmitAndExecuteTask**:123 assertion failed: timeout (19998080696 nanoseconds) during expectMsgClass waiting for class org.apache.flink.runtime.messages.RegistrationMessages$RegisterTaskManager TaskManagerProcessReapingTest.testReapProcessOnFailure:133 TaskManager process did not launch the TaskManager properly. Failed to look up akka.tcp://flink@127.0.0.1:50673/user/taskmanager ** fails randomly. Is someone able to reproduce these while building on a windows machine? 
I would try to debug these myself but I'm not yet familiar with the core architecture and API. -- Sachin On Wed, Jun 10, 2015 at 2:46 PM, Aljoscha Krettek aljos...@apache.org wrote: The KMeans quickstart example does not work with the current state of the KMeansDataGenerator. I created PR that brings the two in sync. This should probably go into the release since it affects initial user satisfaction. On Wed, Jun 10, 2015 at 11:14 AM, Márton Balassi balassi.mar...@gmail.com wrote: As for the streaming commit cherry-picked to the release branch: This is an unfortunate communication issue, let us make sure that we clearly communicate similar issues in the future. As for FLINK-2192: This is essentially a duplicate issue of the testability of the streaming iteration. Not a blocker, I will comment on the JIRA ticket, Gabor Hermann is already working on the root cause. On Wed, Jun 10, 2015 at 11:07 AM, Ufuk Celebi u...@apache.org wrote: Hey Gyula, Max, On 10 Jun 2015, at 10:54, Gyula Fóra gyula.f...@gmail.com wrote: This feature needs to be included in the release, it has been tested and used extensively. And many applciations depend on it. It would be nice to announce/discuss this before just cherry-picking it into the release branch. The issue is that no one (except you) knows that this is important. Let's just make sure to do this for future fixes. Having said that... it seems to be an important fix. Does someone have time (looking at Aljoscha ;)) to review the changes? Maximilian Michels m...@apache.org ezt írta (időpont: 2015. jún. 10., Sze, 10:47): With all the issues discovered, it looks like we'll have another release candidate. Right now, we have discovered the following problems: 1 YARN ITCase fails [fixed via 2eb5cfe] 2 No Jar for SessionWindowing example [fixed in #809] 3 Wrong description of the input format for the graph examples (eg. 
ConnectedComponents) [fixed in #809] 4 TaskManagerFailsWithSlotSharingITCase fails 5 ComplexIntegrationTest.complexIntegrationTest1() (FLINK-2192) fails Can we verify that the tests are defective and not the tested component? ;) Otherwise, I would not block the release on flaky tests. 6 Submitting KMeans example to Web Submission Client does not work on Firefox. 7 Zooming is buggy in Web Submission Client (Firefox) Do we have someone familiar with the web interface who could take a look at the Firefox issues? If not, I would not block the release on this.
Re: Testing Apache Flink 0.9.0-rc1
This doesn't look good, yes. On Wed, Jun 10, 2015 at 1:32 AM, Ufuk Celebi u...@apache.org wrote: While looking into FLINK-2188 (HBase input) I've discovered that Hadoop input formats implementing Configurable (like mapreduce.TableInputFormat) don't have the Hadoop configuration set via setConf(Configuration). I have a small fix for this, which I have to clean up. First, I wanted to check what you think about this issue wrt the release. Personally, I think this is a release blocker, because it essentially means that no Hadoop input format that relies on the Configuration instance being set this way will work (this is to some extent a bug of the respective input formats) – most notably the HBase TableInputFormat. – Ufuk On 09 Jun 2015, at 18:07, Chiwan Park chiwanp...@icloud.com wrote: I attached jps and jstack log about hanging TaskManagerFailsWithSlotSharingITCase to JIRA FLINK-2183. Regards, Chiwan Park On Jun 10, 2015, at 12:28 AM, Aljoscha Krettek aljos...@apache.org wrote: I discovered something that might be a feature, rather than a bug. When you submit an example using the web client without giving parameters the program fails with this: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error. 
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:452) at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:353) at org.apache.flink.client.program.Client.run(Client.java:315) at org.apache.flink.client.web.JobSubmissionServlet.doGet(JobSubmissionServlet.java:302) at javax.servlet.http.HttpServlet.service(HttpServlet.java:668) at javax.servlet.http.HttpServlet.service(HttpServlet.java:770) at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:532) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:227) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:965) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:388) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:187) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:901) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117) at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:47) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:113) at org.eclipse.jetty.server.Server.handle(Server.java:352) at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:596) at org.eclipse.jetty.server.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:1048) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:549) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:211) at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:425) at org.eclipse.jetty.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:489) at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:436) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NullPointerException 
at org.apache.flink.api.common.JobExecutionResult.getAccumulatorResult(JobExecutionResult.java:78) at org.apache.flink.api.java.DataSet.collect(DataSet.java:409) at org.apache.flink.api.java.DataSet.print(DataSet.java:1345) at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:80) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:497) at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:437) ... 24 more This also only occurs when you uncheck the suspend execution while showing plan. I think this arises because the new print() uses collect() which tries to get the job execution result. I guess the result is Null since the job is submitted asynchronously when the checkbox is unchecked. Other than that, the new print() is pretty sweet when you run the builtin examples from the CLI. You get all the state changes and also the result, even when running in cluster mode on several task managers. :D On Tue, Jun 9, 2015 at 3:41 PM, Aljoscha Krettek aljos...@apache.org wrote: I discovered another problem: https://issues.apache.org/jira/browse/FLINK-2191 The closure cleaner cannot be disabled in part of the Streaming Java API and all of the Streaming
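For context on the FLINK-2188 discussion above: Hadoop's Configurable contract is that whoever instantiates the input format also calls setConf(Configuration) on it afterwards. Below is a minimal, self-contained sketch of the kind of fix Ufuk describes — the types are simplified stand-ins, not the actual Hadoop or Flink classes:

```java
// Simplified stand-ins for the Hadoop types -- not the actual Flink fix.
// Hadoop input formats that implement Configurable expect their creator to
// call setConf(conf) right after instantiation; if the wrapper never does,
// formats like HBase's TableInputFormat run with a null configuration.
public class ConfigurableSketch {
    static class Configuration {}

    interface Configurable {
        void setConf(Configuration conf);
        Configuration getConf();
    }

    interface InputFormat {} // plain formats need no configuration injected

    // A format in the style of HBase's TableInputFormat: only usable once
    // setConf() has been called.
    static class TableStyleFormat implements InputFormat, Configurable {
        private Configuration conf;
        public void setConf(Configuration conf) { this.conf = conf; }
        public Configuration getConf() { return conf; }
    }

    // The sketched fix: after instantiating the wrapped format, inject the
    // configuration if (and only if) the format implements Configurable.
    static InputFormat open(InputFormat format, Configuration conf) {
        if (format instanceof Configurable) {
            ((Configurable) format).setConf(conf);
        }
        return format;
    }
}
```

The instanceof check is the important part: injecting unconditionally is impossible (plain formats have no setConf), and skipping it entirely is exactly the bug reported here.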
Re: Testing Apache Flink 0.9.0-rc1
Yes, that needs to be fixed IMO 2015-06-10 17:51 GMT+02:00 Till Rohrmann trohrm...@apache.org: Yes, since it is clearly a deadlock in the scheduler, the current version shouldn't be released. On Wed, Jun 10, 2015 at 5:48 PM Ufuk Celebi u...@apache.org wrote: On 10 Jun 2015, at 16:18, Maximilian Michels m...@apache.org wrote: I'm debugging the TaskManagerFailsWithSlotSharingITCase. I've located its cause but still need to find out how to fix it. Very good find, Max! Max, Till, and I have looked into this and it is a reproducible deadlock in the scheduler during concurrent slot release (in failure cases). Max will attach the relevant stack trace to the issue. I think this is a release blocker. Any opinions? – Ufuk
Re: Testing Apache Flink 0.9.0-rc1
Yes, since it is clearly a deadlock in the scheduler, the current version shouldn't be released. On Wed, Jun 10, 2015 at 5:48 PM Ufuk Celebi u...@apache.org wrote: On 10 Jun 2015, at 16:18, Maximilian Michels m...@apache.org wrote: I'm debugging the TaskManagerFailsWithSlotSharingITCase. I've located its cause but still need to find out how to fix it. Very good find, Max! Max, Till, and I have looked into this and it is a reproducible deadlock in the scheduler during concurrent slot release (in failure cases). Max will attach the relevant stack trace to the issue. I think this is a release blocker. Any opinions? – Ufuk
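As a side note on the slot-release deadlock discussed above: the actual Flink scheduler code is not shown in this thread, but a deadlock during concurrent release is typically a lock-ordering bug, and the standard remedy is to impose one global acquisition order before taking both locks. A hypothetical, self-contained illustration (not Flink's code):

```java
// Illustrative only -- not Flink's scheduler code. Two threads releasing the
// same pair of shared slots while taking the slot locks in opposite order is
// the classic shape of such a deadlock; the fix sketched here imposes one
// global acquisition order before locking both.
public class SlotReleaseSketch {
    static final Object slotA = new Object();
    static final Object slotB = new Object();
    static int released = 0;

    // Acquire both slot locks in a fixed global order (identity hash here;
    // ties are theoretically possible, so a real implementation would use a
    // guaranteed total order such as a per-slot sequence number).
    static void releasePair(Object s1, Object s2) {
        Object first = System.identityHashCode(s1) <= System.identityHashCode(s2) ? s1 : s2;
        Object second = (first == s1) ? s2 : s1;
        synchronized (first) {
            synchronized (second) {
                released++; // both locks held, so this counter is safe
            }
        }
    }

    // Runs two threads that name the slot pair in opposite order; with the
    // ordered locking above this terminates instead of deadlocking.
    static int run() {
        released = 0;
        Thread t1 = new Thread(() -> { for (int i = 0; i < 10_000; i++) releasePair(slotA, slotB); });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 10_000; i++) releasePair(slotB, slotA); });
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return released;
    }

    public static void main(String[] args) {
        System.out.println("releases: " + run());
    }
}
```

If releasePair instead locked its arguments in call order, the two threads could each hold one lock while waiting for the other — the same hang pattern the attached jstack dump would show.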
Re: Testing Apache Flink 0.9.0-rc1
I also encountered a failing TaskManagerFailsWithSlotSharingITCase using Java8. I could, however, not reproduce the error a second time. The stack trace is: The JobManager should handle hard failing task manager with slot sharing(org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase) Time elapsed: 1,400.148 sec ERROR! java.util.concurrent.TimeoutException: Futures timed out after [20 milliseconds] at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219) at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:153) at scala.concurrent.Await$anonfun$ready$1.apply(package.scala:86) at scala.concurrent.Await$anonfun$ready$1.apply(package.scala:86) at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53) at scala.concurrent.Await$.ready(package.scala:86) at org.apache.flink.runtime.minicluster.FlinkMiniCluster.shutdown(FlinkMiniCluster.scala:162) at org.apache.flink.runtime.minicluster.FlinkMiniCluster.stop(FlinkMiniCluster.scala:149) at org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$anonfun$1$anonfun$apply$mcV$sp$3.apply$mcV$sp(TaskManagerFailsWithSlotSharingITCase.scala:140) at org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$anonfun$1$anonfun$apply$mcV$sp$3.apply(TaskManagerFailsWithSlotSharingITCase.scala:95) at org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase$anonfun$1$anonfun$apply$mcV$sp$3.apply(TaskManagerFailsWithSlotSharingITCase.scala:95) at org.scalatest.Transformer$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) at org.scalatest.Transformer.apply(Transformer.scala:22) at org.scalatest.Transformer.apply(Transformer.scala:20) at org.scalatest.WordSpecLike$anon$1.apply(WordSpecLike.scala:953) at org.scalatest.Suite$class.withFixture(Suite.scala:1122) at 
org.apache.flink.runtime.jobmanager.TaskManagerFailsWithSlotSharingITCase.withFixture(TaskManagerFailsWithSlotSharingITCase.scala:36) Results : Tests in error: TaskManagerFailsWithSlotSharingITCase.run:36-org$scalatest$BeforeAndAfterAll$super$run:36-org$scalatest$WordSpecLike$super$run:36-runTests:36-runTest:36-withFixture:36 » Timeout On Tue, Jun 9, 2015 at 11:26 AM Maximilian Michels m...@apache.org wrote: The name of the Git branch was not correct. Thank you, Aljoscha, for noticing. I've changed it from release-0.9-rc1 to release-0.9.0-rc1. This has no effect on the validity of the release candidate.
Re: Testing Apache Flink 0.9.0-rc1
The name of the Git branch was not correct. Thank you, Aljoscha, for noticing. I've changed it from release-0.9-rc1 to release-0.9.0-rc1. This has no effect on the validity of the release candidate.
Re: Testing Apache Flink 0.9.0-rc1
I would suggest we use this format to notify others that we did a task: Assignees: - Aljoscha: done - Ufuk: found a bug in such and such... - Chiwan Park: done, ... The simple status doesn't work with multiple people on one task. On Tue, Jun 9, 2015 at 9:40 AM, Ufuk Celebi u...@apache.org wrote: Hey all, 1. it would be nice if we find more people to also do testing of the streaming API. I think it's especially good to have people on it who did not use it before. 2. Just to make sure: the assignee field of each task is a list, i.e. we can and should have more people testing per task. ;-) – Ufuk On 08 Jun 2015, at 19:00, Chiwan Park chiwanp...@icloud.com wrote: Hi. I'm very excited about preparing a new major release. :) I just picked two tests. I will report status as soon as possible. Regards, Chiwan Park On Jun 9, 2015, at 1:52 AM, Maximilian Michels m...@apache.org wrote: Hi everyone! As previously discussed, the Flink developer community is very eager to get out a new major release. Apache Flink 0.9.0 will contain lots of new features and many bugfixes. This time, I'll try to coordinate the release process. Feel free to correct me if I'm doing something wrong because I don't know any better :) To release a great version of Flink to the public, I'd like to ask everyone to test the release candidate. Recently, Flink has received a lot of attention. The expectations are quite high. Only through thorough testing will we be able to satisfy all the Flink users out there. Below is a list from the Wiki that we use to ensure the legal and functional aspects of a release [1]. What I would like you to do is pick at least one of the tasks, put your name as assignee in the link below, and report back once you verified it. That way, I hope we can quickly and thoroughly test the release candidate. 
https://docs.google.com/document/d/1BhyMPTpAUYA8dG8-vJ3gSAmBUAa0PBSRkxIBPsZxkLs/edit Best, Max Git branch: release-0.9-rc1 Release binaries: http://people.apache.org/~mxm/flink-0.9.0-rc1/ Maven artifacts: https://repository.apache.org/content/repositories/orgapacheflink-1037/ PGP public key for verifying the signatures: http://pgp.mit.edu/pks/lookup?op=vindex&search=0xDE976D18C2909CBF Legal L.1 Check if checksums and GPG files match the corresponding release files L.2 Verify that the source archives do NOT contain any binaries L.3 Check if the source release is building properly with Maven (including license header check (default) and checkstyle). Also the tests should be executed (mvn clean verify) L.4 Verify that the LICENSE and NOTICE files are correct for the binary and source release. L.5 All dependencies must be checked for their license and the license must be ASL 2.0 compatible (http://www.apache.org/legal/resolved.html#category-x) * The LICENSE and NOTICE files in the root directory refer to dependencies in the source release, i.e., files in the git repository (such as fonts, css, JavaScript, images) * The LICENSE and NOTICE files in flink-dist/src/main/flink-bin refer to the binary distribution and mention all of Flink's Maven dependencies as well L.6 Check that all POM files point to the same version (mostly relevant to examine quickstart artifact files) L.7 Read the README.md file Functional F.1 Run the start-local.sh/start-local-streaming.sh, start-cluster.sh/start-cluster-streaming.sh, start-webclient.sh scripts and verify that the processes come up F.2 Examine the *.out files (should be empty) and the log files (should contain no exceptions) * Test for Linux, OS X, Windows (for Windows as far as possible, not all scripts exist) * Shutdown and verify there are no exceptions in the log output (after shutdown) * Check all start+submission scripts for paths with and without spaces (./bin/* scripts are quite fragile for paths with spaces) F.3 local mode 
(start-local.sh, see criteria below) F.4 cluster mode (start-cluster.sh, see criteria below) F.5 multi-node cluster (can simulate locally by starting two taskmanagers, see criteria below) Criteria for F.3 F.4 F.5 * Verify that the examples are running from both ./bin/flink and from the web-based job submission tool * flink-conf.yaml should define more than one task slot * Results of job are produced and correct ** Check also that the examples are running with the built-in data and external sources. * Examine the log output - no error messages should be encountered ** Web interface shows progress and finished job in history F.6 Test on a cluster with HDFS. * Check that a good amount of input splits is read locally (JobManager log reveals local assignments) F.7 Test against a Kafka installation F.8 Test the ./bin/flink command line client * Test info option, paste the JSON into the plan visualizer HTML file, check that plan is rendered * Test the parallelism flag (-p) to override the configured
Re: Testing Apache Flink 0.9.0-rc1
+1 makes sense. On Tue, Jun 9, 2015 at 10:48 AM, Aljoscha Krettek aljos...@apache.org wrote: I would suggest we use this format to notify others that we did a task: [...]
Re: Testing Apache Flink 0.9.0-rc1
I also saw the same error on my third mvn clean verify run. Before it always failed in the YARN tests. On Tue, Jun 9, 2015 at 12:23 PM, Till Rohrmann trohrm...@apache.org wrote: I also encountered a failing TaskManagerFailsWithSlotSharingITCase using Java8. I could, however, not reproduce the error a second time. The stack trace is: [...]
Re: Testing Apache Flink 0.9.0-rc1
On my local machine, several flink runtime tests are failing on mvn clean verify. Here is the log output: http://pastebin.com/raw.php?i=VWbx2ppf -- Sachin On Tue, Jun 9, 2015 at 4:04 PM, Aljoscha Krettek aljos...@apache.org wrote: I also saw the same error on my third mvn clean verify run. Before it always failed in the YARN tests. [...]
Re: Testing Apache Flink 0.9.0-rc1
I did five mvn clean verify runs by now. All of them failed. One with the TaskManagerFailsWithSlotSharingITCase and the other ones with YARNSessionFIFOITCase. On Tue, Jun 9, 2015 at 12:34 PM, Aljoscha Krettek aljos...@apache.org wrote: I also saw the same error on my third mvn clean verify run. Before it always failed in the YARN tests. [...]
Re: Testing Apache Flink 0.9.0-rc1
On 09 Jun 2015, at 13:58, Sachin Goel sachingoel0...@gmail.com wrote: On my local machine, several flink runtime tests are failing on mvn clean verify. Here is the log output: http://pastebin.com/raw.php?i=VWbx2ppf Thanks for reporting this. Have you tried it multiple times? Is it failing reproducibly with the same tests? What's your setup? – Ufuk
Re: Testing Apache Flink 0.9.0-rc1
I discovered something that might be a feature, rather than a bug. When you submit an example using the web client without giving parameters the program fails with this: org.apache.flink.client.program.ProgramInvocationException: The main method caused an error. [...] Caused by: java.lang.NullPointerException at org.apache.flink.api.common.JobExecutionResult.getAccumulatorResult(JobExecutionResult.java:78) at org.apache.flink.api.java.DataSet.collect(DataSet.java:409) at org.apache.flink.api.java.DataSet.print(DataSet.java:1345) at org.apache.flink.examples.java.wordcount.WordCount.main(WordCount.java:80) [...] This also only occurs when you uncheck the suspend execution while showing plan. I think this arises because the new print() uses collect(), which tries to get the job execution result. I guess the result is null since the job is submitted asynchronously when the checkbox is unchecked. Other than that, the new print() is pretty sweet when you run the built-in examples from the CLI. You get all the state changes and also the result, even when running in cluster mode on several task managers. :D On Tue, Jun 9, 2015 at 3:41 PM, Aljoscha Krettek aljos...@apache.org wrote: I discovered another problem: https://issues.apache.org/jira/browse/FLINK-2191 The closure cleaner cannot be disabled in part of the Streaming Java API and all of the Streaming Scala API. I think this is a release blocker (in addition to the other bugs found so far.) 
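Aljoscha's diagnosis above — print() now goes through collect(), which reads accumulators from a JobExecutionResult that is null when the job was submitted asynchronously — can be sketched with simplified stand-ins (these are not the real Flink classes):

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-ins for the failure described above -- not the real Flink
// classes. collect()/print() assume a synchronous submission that yields a
// JobExecutionResult; an asynchronous submission yields null, so reading an
// accumulator from it throws the NullPointerException in the stack trace.
public class CollectSketch {
    // Stand-in for JobExecutionResult: accumulator results keyed by name.
    static class Result {
        final Map<String, Object> accumulators = new HashMap<>();
    }

    // Defensive accessor: an asynchronous submission leaves result == null,
    // so fail with a clear message instead of an opaque NullPointerException.
    static Object getAccumulator(Result result, String name) {
        if (result == null) {
            throw new IllegalStateException(
                "No JobExecutionResult available; collect()/print() require "
                + "a synchronous job submission.");
        }
        return result.accumulators.get(name);
    }
}
```

Whether the right fix is a guard like this or rejecting collect()-based programs in the asynchronous web-client path is exactly the "feature or bug" question raised above.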
On Tue, Jun 9, 2015 at 2:35 PM, Aljoscha Krettek aljos...@apache.org wrote: I found the bug in the failing YARNSessionFIFOITCase: it was comparing the hostname to a hostname in some YARN config. In one case it was capitalised, in the other it wasn't. Pushing a fix to master and the release-0.9 branch.

On Tue, Jun 9, 2015 at 2:18 PM, Sachin Goel sachingoel0...@gmail.com wrote: A re-run led to the same 11 failures again. TaskManagerTest.testSubmitAndExecuteTask was failing with a timeout but managed to succeed in a re-run. Here is the log output again: http://pastebin.com/raw.php?i=N4cm1J18 Setup: JDK 1.8.0_40 on Windows 8.1. System memory: 8 GB, quad-core with a maximum of 8 threads. Regards, Sachin Goel

On Tue, Jun 9, 2015 at 5:34 PM, Ufuk Celebi u...@apache.org wrote: On 09 Jun 2015, at 13:58, Sachin Goel sachingoel0...@gmail.com wrote: On my local machine, several Flink runtime tests are failing on mvn clean verify. Here is the log output: http://pastebin.com/raw.php?i=VWbx2ppf

Thanks for reporting this. Have you tried it multiple times? Is it failing reproducibly with the same tests? What's your setup? – Ufuk
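Aljoscha's diagnosis of the web-client NPE (print() calls collect(), which reads accumulators from a JobExecutionResult that is null for asynchronous submission) can be sketched stand-alone. All types and names below are simplified stand-ins, not Flink's actual implementation:

```java
import java.util.Map;

public class DetachedSubmissionNpe {
    // Stand-in for org.apache.flink.api.common.JobExecutionResult.
    static class JobExecutionResult {
        private final Map<String, Object> accumulators;
        JobExecutionResult(Map<String, Object> accumulators) { this.accumulators = accumulators; }
        Object getAccumulatorResult(String name) { return accumulators.get(name); }
    }

    // Stand-in client: a detached (asynchronous) submission has no result to
    // hand back, which is the hypothesized source of the null.
    static JobExecutionResult submit(boolean detached) {
        return detached ? null : new JobExecutionResult(Map.of("collected", "[(hello,1)]"));
    }

    public static void main(String[] args) {
        // Blocking submission: the result is available and collect() can read it.
        System.out.println(submit(false).getAccumulatorResult("collected"));

        // Detached submission (the unchecked checkbox): the result is null, so
        // the collect() path dereferences null and throws, as in the report.
        try {
            submit(true).getAccumulatorResult("collected");
        } catch (NullPointerException e) {
            System.out.println("NullPointerException, as in the web client report");
        }
    }
}
```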
Re: Testing Apache Flink 0.9.0-rc1
I attached jps and jstack logs for the hanging TaskManagerFailsWithSlotSharingITCase to JIRA FLINK-2183. Regards, Chiwan Park

On Jun 10, 2015, at 12:28 AM, Aljoscha Krettek aljos...@apache.org wrote: I discovered something that might be a feature, rather than a bug. [...]
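As a side note on gathering evidence for hangs like this: jps and jstack capture thread dumps from outside the JVM, but a stuck test can also dump its own threads from within using only the standard library. A generic illustration, not part of the Flink test code:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class SelfThreadDump {
    // Produces a jstack-like dump of all live threads in this JVM.
    static String dumpThreads() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // true, true: include locked monitors and synchronizers, like jstack -l.
        for (ThreadInfo info : bean.dumpAllThreads(true, true)) {
            sb.append(info.toString());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Useful from a watchdog thread or a test timeout handler.
        System.out.println(dumpThreads());
    }
}
```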
Re: Testing Apache Flink 0.9.0-rc1
While looking into FLINK-2188 (HBase input) I've discovered that Hadoop input formats implementing Configurable (like mapreduce.TableInputFormat) don't have the Hadoop configuration set via setConf(Configuration). I have a small fix for this, which I have to clean up. First, I wanted to check what you think about this issue wrt the release. Personally, I think this is a release blocker, because it essentially means that no Hadoop input format that relies on the Configuration instance being set this way will work (this is to some extent a bug of the respective input formats) – most notably the HBase TableInputFormat. – Ufuk

On 09 Jun 2015, at 18:07, Chiwan Park chiwanp...@icloud.com wrote: I attached jps and jstack logs for the hanging TaskManagerFailsWithSlotSharingITCase to JIRA FLINK-2183. Regards, Chiwan Park

On Jun 10, 2015, at 12:28 AM, Aljoscha Krettek aljos...@apache.org wrote: I discovered something that might be a feature, rather than a bug. [...]
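The fix Ufuk describes follows a standard Hadoop pattern: after instantiating an input format, check whether it implements Configurable and, if so, inject the Configuration. A self-contained sketch with minimal stand-in types (the real Configurable/Configuration live in org.apache.hadoop.conf, and the actual Flink patch may differ):

```java
public class ConfigurableFix {
    // Minimal stand-ins for org.apache.hadoop.conf.Configuration / Configurable.
    static class Configuration {}

    interface Configurable {
        void setConf(Configuration conf);
        Configuration getConf();
    }

    // Stand-in for a format like mapreduce.TableInputFormat, which only works
    // once its Configuration has been injected via setConf().
    static class TableInputFormatLike implements Configurable {
        private Configuration conf;
        public void setConf(Configuration conf) { this.conf = conf; }
        public Configuration getConf() { return conf; }
    }

    // The essence of the fix: a wrapper that instantiates Hadoop input formats
    // must perform this check, otherwise Configurable formats see a null conf.
    static void configureIfPossible(Object inputFormat, Configuration conf) {
        if (inputFormat instanceof Configurable) {
            ((Configurable) inputFormat).setConf(conf);
        }
    }

    public static void main(String[] args) {
        TableInputFormatLike format = new TableInputFormatLike();
        configureIfPossible(format, new Configuration());
        System.out.println(format.getConf() != null); // true
    }
}
```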
Re: Testing Apache Flink 0.9.0-rc1
A re-run led to the same 11 failures again. TaskManagerTest.testSubmitAndExecuteTask was failing with a timeout but managed to succeed in a re-run. Here is the log output again: http://pastebin.com/raw.php?i=N4cm1J18 Setup: JDK 1.8.0_40 on Windows 8.1. System memory: 8 GB, quad-core with a maximum of 8 threads. Regards, Sachin Goel

On Tue, Jun 9, 2015 at 5:34 PM, Ufuk Celebi u...@apache.org wrote: On 09 Jun 2015, at 13:58, Sachin Goel sachingoel0...@gmail.com wrote: On my local machine, several Flink runtime tests are failing on mvn clean verify. Here is the log output: http://pastebin.com/raw.php?i=VWbx2ppf

Thanks for reporting this. Have you tried it multiple times? Is it failing reproducibly with the same tests? What's your setup? – Ufuk
Re: Testing Apache Flink 0.9.0-rc1
I found the bug in the failing YARNSessionFIFOITCase: it was comparing the hostname to a hostname in some YARN config. In one case it was capitalised, in the other it wasn't. Pushing a fix to master and the release-0.9 branch.

On Tue, Jun 9, 2015 at 2:18 PM, Sachin Goel sachingoel0...@gmail.com wrote: A re-run led to the same 11 failures again. [...]
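Since hostnames are case-insensitive, a fix of the kind Aljoscha describes amounts to comparing them without regard to case. A trivial stand-alone illustration (the identifiers here are generic, not the actual test code):

```java
public class HostnameCompare {
    // Hostnames are case-insensitive, so a test comparing the local hostname
    // against one read from the YARN config must not use plain equals().
    static boolean sameHost(String a, String b) {
        return a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        // The buggy comparison fails when only the capitalisation differs:
        System.out.println("Worker-1.cluster".equals("worker-1.cluster"));    // false
        // The case-insensitive comparison matches:
        System.out.println(sameHost("Worker-1.cluster", "worker-1.cluster")); // true
    }
}
```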
Re: Testing Apache Flink 0.9.0-rc1
I discovered another problem: https://issues.apache.org/jira/browse/FLINK-2191 The closure cleaner cannot be disabled in part of the Streaming Java API and in all of the Streaming Scala API. I think this is a release blocker (in addition to the other bugs found so far).

On Tue, Jun 9, 2015 at 2:35 PM, Aljoscha Krettek aljos...@apache.org wrote: I found the bug in the failing YARNSessionFIFOITCase: it was comparing the hostname to a hostname in some YARN config. [...]
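For context on why disabling the closure cleaner matters: the cleaner exists to make user functions serializable before they are shipped to the cluster, and disabling it means a function must already serialize cleanly on its own. The serializability round trip this ultimately guards can be illustrated with plain JDK serialization (a generic sketch, not Flink code):

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;

public class SerializabilityCheck {
    // A user function must be Serializable to be shipped to task managers.
    interface MapFn<I, O> extends Serializable { O map(I in); }

    // Round-trips an object through Java serialization, as a crude stand-in
    // for checking whether a function can actually be shipped to the cluster.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(OutputStream.nullOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Lambdas are serializable when their target interface is Serializable.
        MapFn<Integer, Integer> doubler = x -> x * 2;
        System.out.println(isSerializable(doubler));      // true
        System.out.println(isSerializable(new Object())); // false: not Serializable
    }
}
```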
Re: Testing Apache Flink 0.9.0-rc1
Hi. I have a problem running the `mvn clean verify` command. TaskManagerFailsWithSlotSharingITCase hangs on Oracle JDK 7 (1.7.0_80), but on Oracle JDK 8 the test case doesn't hang. I've investigated this problem but could not find the bug. Regards, Chiwan Park

On Jun 9, 2015, at 2:11 AM, Márton Balassi balassi.mar...@gmail.com wrote: Added F.7 "Running against Kafka cluster" for me in the doc. Doing it tomorrow.

On Mon, Jun 8, 2015 at 7:00 PM, Chiwan Park chiwanp...@icloud.com wrote: Hi. I'm very excited about preparing a new major release. :) I just picked two tests. I will report status as soon as possible. Regards, Chiwan Park

On Jun 9, 2015, at 1:52 AM, Maximilian Michels m...@apache.org wrote: Hi everyone! As previously discussed, the Flink developer community is very eager to get out a new major release. Apache Flink 0.9.0 will contain lots of new features and many bugfixes. This time, I'll try to coordinate the release process. Feel free to correct me if I'm doing something wrong, because I don't know any better :) To release a great version of Flink to the public, I'd like to ask everyone to test the release candidate. Recently, Flink has received a lot of attention, and the expectations are quite high. Only through thorough testing will we be able to satisfy all the Flink users out there. Below is the list from the wiki that we use to ensure the legal and functional aspects of a release [1]. Please pick at least one of the tasks, put your name as assignee in the link below, and report back once you have verified it. That way, I hope we can quickly and thoroughly test the release candidate.

https://docs.google.com/document/d/1BhyMPTpAUYA8dG8-vJ3gSAmBUAa0PBSRkxIBPsZxkLs/edit

Best, Max

Git branch: release-0.9-rc1
Release binaries: http://people.apache.org/~mxm/flink-0.9.0-rc1/
Maven artifacts: https://repository.apache.org/content/repositories/orgapacheflink-1037/
PGP public key for verifying the signatures: http://pgp.mit.edu/pks/lookup?op=vindexsearch=0xDE976D18C2909CBF

Legal
L.1 Check that the checksums and GPG files match the corresponding release files
L.2 Verify that the source archives do NOT contain any binaries
L.3 Check that the source release builds properly with Maven (including the license header check (default) and checkstyle). The tests should also be executed (mvn clean verify)
L.4 Verify that the LICENSE and NOTICE files are correct for the binary and source release.
L.5 All dependencies must be checked for their license, and the license must be ASL 2.0 compatible (http://www.apache.org/legal/resolved.html#category-x)
* The LICENSE and NOTICE files in the root directory refer to dependencies in the source release, i.e., files in the git repository (such as fonts, css, JavaScript, images)
* The LICENSE and NOTICE files in flink-dist/src/main/flink-bin refer to the binary distribution and mention all of Flink's Maven dependencies as well
L.6 Check that all POM files point to the same version (mostly relevant for examining the quickstart artifact files)
L.7 Read the README.md file

Functional
F.1 Run the start-local.sh/start-local-streaming.sh, start-cluster.sh/start-cluster-streaming.sh, start-webclient.sh scripts and verify that the processes come up
F.2 Examine the *.out files (should be empty) and the log files (should contain no exceptions)
* Test on Linux, OS X, Windows (for Windows as far as possible, not all scripts exist)
* Shut down and verify there are no exceptions in the log output (after shutdown)
* Check all start+submission scripts for paths with and without spaces (./bin/* scripts are quite fragile for paths with spaces)
F.3 local mode (start-local.sh, see criteria below)
F.4 cluster mode (start-cluster.sh, see criteria below)
F.5 multi-node cluster (can be simulated locally by starting two taskmanagers, see criteria below)
Criteria for F.3, F.4, F.5
* Verify that the examples run from both ./bin/flink and the web-based job submission tool
* flink-conf.yml should define more than one task slot
* Results of the job are produced and correct
** Check also that the examples run with both the built-in data and external sources.
* Examine the log output - no error messages should be encountered
** Web interface shows progress and the finished job in history
F.6 Test on a cluster with HDFS.
* Check that a good amount of input splits is read locally (the JobManager log reveals local assignments)
F.7 Test against a Kafka installation
F.8 Test the ./bin/flink command line client
* Test the info option, paste the JSON into the plan visualizer HTML file, check that the plan is rendered
* Test the parallelism flag (-p) to override the configured default parallelism
F.9 Verify the plan visualizer with different browsers/operating systems
F.10 Verify that the quickstarts for Scala and Java work with the staging repository for both IntelliJ and Eclipse.
* In particular, the dependencies of the quickstart project need to be set correctly, and the QS project needs to build from the staging repository (replace the snapshot repo URL with the staging repo URL)
* The dependency tree of the QuickStart project must not contain any dependencies we shade away upstream (guava, netty, ...)
F.11 Run examples on a YARN cluster
F.12 Run all examples from the IDE (Eclipse, IntelliJ)
F.13 Run an
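Checklist item L.1 is usually done with the gpg and md5/sha command-line tools against the published signature and checksum files. As a stand-alone illustration of the checksum half, a file digest can also be computed with nothing but the JDK (the artifact name below is made up):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;

public class ChecksumCheck {
    // Hex-encoded digest of a file; algorithm is e.g. "MD5" or "SHA-512".
    static String digest(Path file, String algorithm) throws Exception {
        byte[] hash = MessageDigest.getInstance(algorithm).digest(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) hex.append(String.format("%02x", b));
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Made-up stand-in for a downloaded release artifact.
        Path artifact = Files.createTempFile("flink-release-artifact", ".tgz");
        Files.writeString(artifact, "dummy contents");
        // Compare this value against the published .md5 / .sha checksum file.
        System.out.println(digest(artifact, "MD5"));
    }
}
```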
Re: Testing Apache Flink 0.9.0-rc1
Hi. I’m very excited about preparing a new major release. :) I just picked two tests. I will report status as soon as possible. Regards, Chiwan Park

On Jun 9, 2015, at 1:52 AM, Maximilian Michels m...@apache.org wrote: Hi everyone! As previously discussed, the Flink developer community is very eager to get out a new major release. Apache Flink 0.9.0 will contain lots of new features and many bugfixes. [...]
Re: Testing Apache Flink 0.9.0-rc1
Added F.7 "Running against Kafka cluster" for me in the doc. Doing it tomorrow.

On Mon, Jun 8, 2015 at 7:00 PM, Chiwan Park chiwanp...@icloud.com wrote: Hi. I’m very excited about preparing a new major release. :) I just picked two tests. I will report status as soon as possible. Regards, Chiwan Park

On Jun 9, 2015, at 1:52 AM, Maximilian Michels m...@apache.org wrote: Hi everyone! As previously discussed, the Flink developer community is very eager to get out a new major release. Apache Flink 0.9.0 will contain lots of new features and many bugfixes. [...]