[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.
[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307154#comment-14307154 ]

ASF GitHub Bot commented on FLINK-1396:
---

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/363#issuecomment-73040193

    Does this occur during local execution, or collection execution? The dependencies are not covered by the runtime dependencies?

> Add hadoop input formats directly to the user API.
> --------------------------------------------------
>
> Key: FLINK-1396
> URL: https://issues.apache.org/jira/browse/FLINK-1396
> Project: Flink
> Issue Type: Bug
> Reporter: Robert Metzger
> Assignee: Aljoscha Krettek

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.
[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307162#comment-14307162 ]

ASF GitHub Bot commented on FLINK-1396:
---

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/363#issuecomment-73040767

    I think when executing it in an IDE the dependencies are not there, since flink-java does not depend on flink-runtime, which has the hadoop dependencies.
[GitHub] flink pull request: [FLINK-1396][FLINK-1303] Hadoop Input/Output d...
Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/363#issuecomment-73040767

    I think when executing it in an IDE the dependencies are not there, since flink-java does not depend on flink-runtime, which has the hadoop dependencies.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[jira] [Commented] (FLINK-883) Use MiniYARNCluster to test the Flink YARN client
[ https://issues.apache.org/jira/browse/FLINK-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307210#comment-14307210 ]

ASF GitHub Bot commented on FLINK-883:
---

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/134#issuecomment-73046443

    I guess this one is subsumed by @rmetzger's work on https://issues.apache.org/jira/browse/FLINK-883
    Can we close this PR?

> Use MiniYARNCluster to test the Flink YARN client
> -------------------------------------------------
>
> Key: FLINK-883
> URL: https://issues.apache.org/jira/browse/FLINK-883
> Project: Flink
> Issue Type: Improvement
> Components: YARN Client
> Affects Versions: 0.7.0-incubating
> Reporter: Robert Metzger
> Assignee: Sebastian Kunert
> Labels: github-import
> Fix For: 0.9
>
> I would like to have a test that verifies that our YARN client is properly working, using the `MiniYARNCluster` class of YARN.
> Imported from GitHub. Url: https://github.com/stratosphere/stratosphere/issues/883
> Created by: [rmetzger|https://github.com/rmetzger]
> Labels: enhancement, testing, YARN
> Created at: Thu May 29 09:13:06 CEST 2014
> State: open
[GitHub] flink pull request: [FLINK-1376] [runtime] Add proper shared slot ...
Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/317#issuecomment-73046225

    Of course. Thanks for merging.

    On Thu, Feb 5, 2015 at 2:30 PM, Stephan Ewen <notificati...@github.com> wrote:

    > This one has been manually merged (but I made a typo in the "This closes #xxx" message ;-) )
    > @tillrohrmann https://github.com/tillrohrmann Can you close the PR manually?
    >
    > — Reply to this email directly or view it on GitHub
    > https://github.com/apache/flink/pull/317#issuecomment-73045884.
[jira] [Updated] (FLINK-1463) RuntimeStatefulSerializerFactory declares ClassLoader as transient but later tries to use it
[ https://issues.apache.org/jira/browse/FLINK-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek updated FLINK-1463:
---
    Fix Version/s: 0.8.1

> RuntimeStatefulSerializerFactory declares ClassLoader as transient but later tries to use it
> --------------------------------------------------------------------------------------------
>
> Key: FLINK-1463
> URL: https://issues.apache.org/jira/browse/FLINK-1463
> Project: Flink
> Issue Type: Bug
> Affects Versions: 0.8, 0.9
> Reporter: Aljoscha Krettek
> Assignee: Aljoscha Krettek
> Priority: Blocker
> Fix For: 0.8.1
>
> At least one user has seen an exception because of this. In theory, the ClassLoader is set again in readParametersFromConfig, but the way it is used in TupleComparatorBase, this method is never called.
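The failure mode behind FLINK-1463 is easy to reproduce in isolation: Java serialization silently drops a transient ClassLoader field, so any code path that skips re-initialization (as reported for TupleComparatorBase) sees null. A minimal sketch; the class and method names below are made up for illustration and are not Flink's actual factory:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Illustrative stand-in for a serializer factory that marks its
// ClassLoader transient. After a serialize/deserialize round trip the
// field is null unless something explicitly restores it.
class FactoryWithTransientLoader implements Serializable {
    private static final long serialVersionUID = 1L;

    // Initialized on construction, but NOT restored by deserialization:
    // field initializers do not run when Java deserializes the object.
    transient ClassLoader loader = getClass().getClassLoader();

    static FactoryWithTransientLoader roundTrip(FactoryWithTransientLoader f) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
                out.writeObject(f);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()))) {
                return (FactoryWithTransientLoader) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }
}
```

The fix direction the issue implies is to make sure the restore path (here, something like readParametersFromConfig) is actually invoked before the loader is used.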
[jira] [Commented] (FLINK-1471) Allow KeySelectors to implement ResultTypeQueryable
[ https://issues.apache.org/jira/browse/FLINK-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307236#comment-14307236 ]

ASF GitHub Bot commented on FLINK-1471:
---

Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/359

> Allow KeySelectors to implement ResultTypeQueryable
> ---------------------------------------------------
>
> Key: FLINK-1471
> URL: https://issues.apache.org/jira/browse/FLINK-1471
> Project: Flink
> Issue Type: Bug
> Components: Java API
> Affects Versions: 0.9
> Reporter: Robert Metzger
> Assignee: Timo Walther
> Fix For: 0.9
>
> See https://github.com/apache/flink/pull/354
[jira] [Resolved] (FLINK-1442) Archived Execution Graph consumes too much memory
[ https://issues.apache.org/jira/browse/FLINK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen resolved FLINK-1442.
---
    Resolution: Fixed
    Fix Version/s: 0.9

Fixed via 9d181a86a0870204113271b6e45f611cba04fc7d and 8ae0dc2d768aecfa3129df553f43d827792b65d7

> Archived Execution Graph consumes too much memory
> -------------------------------------------------
>
> Key: FLINK-1442
> URL: https://issues.apache.org/jira/browse/FLINK-1442
> Project: Flink
> Issue Type: Bug
> Components: JobManager
> Affects Versions: 0.9
> Reporter: Stephan Ewen
> Assignee: Max Michels
> Fix For: 0.9
>
> The JobManager archives the execution graphs for analysis of jobs. The graphs may consume a lot of memory. Especially in all2all connection patterns, the execution edges are extremely numerous and add up in memory consumption. The execution edges connect all parallel tasks, so for an all2all pattern between n and m tasks there are n*m edges. For a parallelism of multiple hundred tasks, this can easily reach 100k objects and more, each with a set of metadata.
> I propose the following to solve that:
> 1. Clear all execution edges from the graph (the majority of the memory consumers) when it is given to the archiver.
> 2. Keep the map/list of the archived graphs behind a soft reference, so it will be removed under memory pressure before the JVM crashes. That may remove graphs from the history early, but that is much preferable to the JVM crashing, in which case the graph is lost as well...
> 3. Long term: The graph should be archived somewhere else. Something like the history server used by Hadoop and Hive would be a good idea.
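Proposal 2 in the issue above (archived graphs behind a soft reference) can be sketched in a few lines of plain Java. This is only an illustration of the technique under simplified, assumed types — `GraphArchive` and `ArchivedGraph` are hypothetical names, not Flink's actual classes:

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: keep archived graphs behind SoftReferences so the JVM may
// reclaim them under memory pressure instead of failing with an OOM.
class GraphArchive {

    static final class ArchivedGraph {
        final String jobId;
        ArchivedGraph(String jobId) { this.jobId = jobId; }
    }

    private final Map<String, SoftReference<ArchivedGraph>> archive = new LinkedHashMap<>();

    void archive(ArchivedGraph graph) {
        archive.put(graph.jobId, new SoftReference<>(graph));
    }

    /** Returns null if the graph was never archived or was collected under memory pressure. */
    ArchivedGraph get(String jobId) {
        SoftReference<ArchivedGraph> ref = archive.get(jobId);
        return ref == null ? null : ref.get();
    }
}
```

The trade-off is exactly the one the proposal names: the collector clears soft references only when memory runs low, so history may disappear early, but the JVM survives.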
[GitHub] flink pull request: [docs] add documentation for FLink GCE setup u...
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/361
[jira] [Commented] (FLINK-1479) The spawned threads in the sorter have no context class loader
[ https://issues.apache.org/jira/browse/FLINK-1479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307189#comment-14307189 ]

Stephan Ewen commented on FLINK-1479:
---

Patch coming up...

> The spawned threads in the sorter have no context class loader
> --------------------------------------------------------------
>
> Key: FLINK-1479
> URL: https://issues.apache.org/jira/browse/FLINK-1479
> Project: Flink
> Issue Type: Bug
> Components: Local Runtime
> Affects Versions: 0.9
> Reporter: Stephan Ewen
> Fix For: 0.9
>
> The context class loader of task threads is the user code class loader that has access to the libraries of the user program. The sorter spawns extra threads (reading, sorting, spilling) without setting the context class loader on those threads.
[jira] [Created] (FLINK-1479) The spawned threads in the sorter have no context class loader
Stephan Ewen created FLINK-1479:
---

Summary: The spawned threads in the sorter have no context class loader
Key: FLINK-1479
URL: https://issues.apache.org/jira/browse/FLINK-1479
Project: Flink
Issue Type: Bug
Components: Local Runtime
Affects Versions: 0.9
Reporter: Stephan Ewen
Fix For: 0.9
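The fix FLINK-1479 calls for is small: whoever spawns the reading/sorting/spilling threads must copy the user code class loader onto them before they start. A hedged sketch with illustrative names (this is not the actual sorter code):

```java
// Sketch: a worker thread spawned by the sorter should inherit the *user
// code* class loader as its context class loader, taken from the task
// thread that creates it.
class SpawnWithContextClassLoader {

    static Thread spawnWorker(Runnable work, ClassLoader userCodeClassLoader) {
        Thread t = new Thread(work, "sorter-worker");
        // Without this call the new thread inherits its creator's context
        // class loader (or the default), and user classes loaded only by
        // the user code class loader become invisible to it.
        t.setContextClassLoader(userCodeClassLoader);
        return t;
    }
}
```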
[jira] [Created] (FLINK-1480) Test Flink against Hadoop 2.6.0
Robert Metzger created FLINK-1480:
---

Summary: Test Flink against Hadoop 2.6.0
Key: FLINK-1480
URL: https://issues.apache.org/jira/browse/FLINK-1480
Project: Flink
Issue Type: Task
Components: Build System
Reporter: Robert Metzger
Assignee: Robert Metzger

One of our Travis builds always builds against the latest Hadoop release. Right now, we test against 2.5.1. I would like to bump the version to 2.6.0.
[jira] [Commented] (FLINK-1471) Allow KeySelectors to implement ResultTypeQueryable
[ https://issues.apache.org/jira/browse/FLINK-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307239#comment-14307239 ]

Timo Walther commented on FLINK-1471:
---

Fixed in d033fa8fa834d288ec977ef7bda043dfdc397e59.
[jira] [Resolved] (FLINK-1471) Allow KeySelectors to implement ResultTypeQueryable
[ https://issues.apache.org/jira/browse/FLINK-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Timo Walther resolved FLINK-1471.
---
    Resolution: Fixed
[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.
[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307147#comment-14307147 ]

ASF GitHub Bot commented on FLINK-1396:
---

Github user aljoscha commented on the pull request:

    https://github.com/apache/flink/pull/363#issuecomment-73039917

    @StephanEwen If I add the exclusions then users that just add flink-java as a dependency will get weird errors when using Hadoop InputFormats.
[GitHub] flink pull request: [FLINK-1471][java-api] Fixes wrong input valid...
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/359#issuecomment-73042551

    Skipping the validation on raw types makes total sense.
[jira] [Commented] (FLINK-1471) Allow KeySelectors to implement ResultTypeQueryable
[ https://issues.apache.org/jira/browse/FLINK-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307180#comment-14307180 ]

ASF GitHub Bot commented on FLINK-1471:
---

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/359#issuecomment-73042551

    Skipping the validation on raw types makes total sense.
[jira] [Updated] (FLINK-1463) RuntimeStatefulSerializerFactory declares ClassLoader as transient but later tries to use it
[ https://issues.apache.org/jira/browse/FLINK-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aljoscha Krettek updated FLINK-1463:
---
    Priority: Blocker (was: Major)
[GitHub] flink pull request: [docs] add documentation for FLink GCE setup u...
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/361#issuecomment-73041953

    Will merge this...
[jira] [Commented] (FLINK-1463) RuntimeStatefulSerializerFactory declares ClassLoader as transient but later tries to use it
[ https://issues.apache.org/jira/browse/FLINK-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307183#comment-14307183 ]

ASF GitHub Bot commented on FLINK-1463:
---

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/353#issuecomment-73042896

    Looks good. For efficiency, we could change the factory such that it returns the original in the first request, and a duplicate after that. Good change otherwise. +1 to merge
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307203#comment-14307203 ]

Timo Walther commented on FLINK-1319:
---

Actually, I don't like the drop-in approach. I think it would be much better if the code analysis could be included in the release. Especially once the code is stable enough, it would be great to enable it by default and speed up jobs automatically.

I did some research about other frameworks we could use instead. Soot is the best framework; however, I think we can also build the code analysis on top of the ObjectWeb ASM library [1]. It provides some functionality for data flow analysis [2]. The examples for BasicInterpreter and BasicVerifier look promising. Other projects use it to determine types [3]. Using ASM requires us to implement more, but it gives us full flexibility for further analysis use cases. I would try to implement a simple proof-of-concept prototype. What do you think?

[1] http://asm.ow2.org/
[2] http://download.forge.objectweb.org/asm/asm4-guide.pdf, 115ff
[3] https://github.com/hraberg/enumerable/blob/master/src/main/java/org/enumerable/lambda/support/expression/ExpressionInterpreter.java

> Add static code analysis for UDFs
> ---------------------------------
>
> Key: FLINK-1319
> URL: https://issues.apache.org/jira/browse/FLINK-1319
> Project: Flink
> Issue Type: New Feature
> Components: Java API, Scala API
> Reporter: Stephan Ewen
> Assignee: Timo Walther
> Priority: Minor
>
> Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly.
> Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0-3, 1, 2-1)}}).
> We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed, and thus we did not include any of the code so far.
> I propose to add this functionality to Flink in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly).
> Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with.
> *Appendix*
> Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
> Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
> Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6)
> Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3)
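For contrast with the bytecode-analysis discussion above: the existing annotation-based mechanism the issue mentions boils down to declaring forward-field information on the UDF and reading it reflectively. A minimal, stdlib-only sketch with made-up names (`ConstantFieldsLike`, `MyMapper`); Flink's real annotations and their string format differ:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch of the lightweight-annotation approach: a UDF declares which
// input fields it forwards unchanged, and the optimizer reads that
// declaration reflectively instead of analyzing bytecode.
class ForwardedFieldsDemo {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface ConstantFieldsLike {
        String[] value();
    }

    // Hypothetical UDF declaring "input field 0 goes to output field 0,
    // input field 2 goes to output field 1".
    @ConstantFieldsLike({"0->0", "2->1"})
    static class MyMapper { /* pretend UDF body */ }

    /** Returns the declared forward information, or an empty array if none. */
    static String[] forwardedFields(Class<?> udfClass) {
        ConstantFieldsLike a = udfClass.getAnnotation(ConstantFieldsLike.class);
        return a == null ? new String[0] : a.value();
    }
}
```

The static-analysis work discussed in the comment would derive exactly this kind of information automatically, so users would no longer need to write the annotation by hand.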
[jira] [Commented] (FLINK-1376) SubSlots are not properly released in case that a TaskManager fatally fails, leaving the system in a corrupted state
[ https://issues.apache.org/jira/browse/FLINK-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307202#comment-14307202 ]

ASF GitHub Bot commented on FLINK-1376:
---

Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/317#issuecomment-73045884

    This one has been manually merged (but I made a typo in the "This closes #xxx" message ;-) )
    @tillrohrmann Can you close the PR manually?

> SubSlots are not properly released in case that a TaskManager fatally fails, leaving the system in a corrupted state
> --------------------------------------------------------------------------------------------------------------------
>
> Key: FLINK-1376
> URL: https://issues.apache.org/jira/browse/FLINK-1376
> Project: Flink
> Issue Type: Bug
> Reporter: Till Rohrmann
> Assignee: Till Rohrmann
>
> In case that the TaskManager fatally fails and some of the failing node's slots are SharedSlots, the slots are not properly released by the JobManager. This causes the corresponding job to not be properly failed, leaving the system in a corrupted state. The reason is that the AllocatedSlot is not aware of being treated as a SharedSlot and thus cannot release the associated SubSlots.
[jira] [Commented] (FLINK-1376) SubSlots are not properly released in case that a TaskManager fatally fails, leaving the system in a corrupted state
[ https://issues.apache.org/jira/browse/FLINK-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307206#comment-14307206 ]

ASF GitHub Bot commented on FLINK-1376:
---

Github user tillrohrmann commented on the pull request:

    https://github.com/apache/flink/pull/317#issuecomment-73046225

    Of course. Thanks for merging.
[GitHub] flink pull request: yarn client tests
Github user StephanEwen commented on the pull request:

    https://github.com/apache/flink/pull/134#issuecomment-73046443

    I guess this one is subsumed by @rmetzger's work on https://issues.apache.org/jira/browse/FLINK-883
    Can we close this PR?
[jira] [Commented] (FLINK-1376) SubSlots are not properly released in case that a TaskManager fatally fails, leaving the system in a corrupted state
[ https://issues.apache.org/jira/browse/FLINK-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307208#comment-14307208 ]

ASF GitHub Bot commented on FLINK-1376:
---

Github user tillrohrmann closed the pull request at:

    https://github.com/apache/flink/pull/317
[GitHub] flink pull request: yarn client tests
Github user rmetzger commented on the pull request:

    https://github.com/apache/flink/pull/134#issuecomment-73047820

    Yes, the changes here have been subsumed by FLINK-883. @skunert can you close this pull request?
[GitHub] flink pull request: Port FLINK-1391 and FLINK-1392 to release-0.8...
GitHub user rmetzger opened a pull request:

    https://github.com/apache/flink/pull/364

    Port FLINK-1391 and FLINK-1392 to release-0.8 branch.

    These commits port the fixes for the two issues (Avro and Protobuf support) to the release-0.8 branch. They also contain a hotfix regarding the closure cleaner by @aljoscha.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rmetzger/flink kryo081

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/364.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #364

commit 38ebc09ff5782005c5aa1f60b458cae250b8c26e
Author: Robert Metzger <metzg...@web.de>
Date:   2015-01-12T20:11:09Z

    [FLINK-1391] Add support for using Avro-POJOs and Avro types with Kryo

    Conflicts:
        flink-java/src/main/java/org/apache/flink/api/java/typeutils/TypeExtractor.java
        flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/KryoSerializer.java

commit cbe633537567cc39a9877125e79cd7da49ee7f3b
Author: Robert Metzger <rmetz...@apache.org>
Date:   2015-01-13T09:21:29Z

    [FLINK-1392] Add Kryo serializer for Protobuf

    Conflicts:
        flink-java/pom.xml
        flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/KryoSerializer.java
        flink-shaded/pom.xml
        pom.xml

commit 63472baff1fca18b83666831effe2204606cf355
Author: Aljoscha Krettek <aljoscha.kret...@gmail.com>
Date:   2015-01-15T10:46:53Z

    [hotfix] Also use java closure cleaner on grouped operations

commit 9043582a4a1f4fd25e960217a99f3f32d4ba18a9
Author: Robert Metzger <rmetz...@apache.org>
Date:   2015-02-05T13:07:48Z

    [backports] Cleanup and port changes to 0.8 branch.
[GitHub] flink pull request: [FLINK-1415] Akka cleanups
Github user asfgit closed the pull request at:

    https://github.com/apache/flink/pull/319
[jira] [Commented] (FLINK-1391) Kryo fails to properly serialize avro collection types
[ https://issues.apache.org/jira/browse/FLINK-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307230#comment-14307230 ]

ASF GitHub Bot commented on FLINK-1391:
---

GitHub user rmetzger opened a pull request:

    https://github.com/apache/flink/pull/364

    Port FLINK-1391 and FLINK-1392 to release-0.8 branch.

    These commits port the fixes for the two issues (Avro and Protobuf support) to the release-0.8 branch. They also contain a hotfix regarding the closure cleaner by @aljoscha.

> Kryo fails to properly serialize avro collection types
> ------------------------------------------------------
>
> Key: FLINK-1391
> URL: https://issues.apache.org/jira/browse/FLINK-1391
> Project: Flink
> Issue Type: Sub-task
> Affects Versions: 0.8, 0.9
> Reporter: Robert Metzger
> Assignee: Robert Metzger
>
> Before FLINK-610, Avro was the default generic serializer. Now, special types coming from Avro are handled by Kryo, which seems to cause errors like:
> {code}
> Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: java.lang.NullPointerException
>     at org.apache.avro.generic.GenericData$Array.add(GenericData.java:200)
>     at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:116)
>     at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:22)
>     at com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:761)
>     at org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:143)
>     at org.apache.flink.api.java.typeutils.runtime.KryoSerializer.deserialize(KryoSerializer.java:148)
>     at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:244)
>     at org.apache.flink.runtime.plugable.DeserializationDelegate.read(DeserializationDelegate.java:56)
>     at org.apache.flink.runtime.io.network.serialization.AdaptiveSpanningRecordDeserializer.getNextRecord(AdaptiveSpanningRecordDeserializer.java:71)
>     at org.apache.flink.runtime.io.network.channels.InputChannel.readRecord(InputChannel.java:189)
>     at org.apache.flink.runtime.io.network.gates.InputGate.readRecord(InputGate.java:176)
>     at org.apache.flink.runtime.io.network.api.MutableRecordReader.next(MutableRecordReader.java:51)
>     at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:53)
>     at org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:170)
>     at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:257)
>     at java.lang.Thread.run(Thread.java:744)
> {code}
[jira] [Resolved] (FLINK-1299) Remove default values for JobManager and TaskManager heap sizes
[ https://issues.apache.org/jira/browse/FLINK-1299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-1299. Resolution: Won't Fix +1 Remove default values for JobManager and TaskManager heap sizes --- Key: FLINK-1299 URL: https://issues.apache.org/jira/browse/FLINK-1299 Project: Flink Issue Type: Improvement Components: Build System Reporter: Ufuk Celebi Priority: Minor Currently, the default config contains fixed values for both the job manager and the task manager heap sizes. I propose to comment both lines out and let the JVM set -Xms and -Xmx automatically. This should give a better out-of-the-box experience when trying out Flink locally or on a cluster.
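[Editor's note] For context on what "let the JVM set -Xms and -Xmx automatically" means in practice: without explicit flags, the JVM picks an ergonomic heap limit based on the machine's memory. A minimal check (not part of the thread; plain Java, illustrative only):

```java
// Prints the heap limit the JVM chose when no -Xmx was passed on the
// command line. The value depends on the machine and the JVM version.
public class HeapDefaults {
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("Default max heap: " + (maxHeapBytes() >> 20) + " MB");
    }
}
```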
[GitHub] flink pull request: [FLINK-1396][FLINK-1303] Hadoop Input/Output d...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73040193 Does this occur during local execution, or collection execution? The dependencies are not covered by the runtime dependencies? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1463] Fix stateful/stateless Serializer...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/353#issuecomment-73042896 Looks good. For efficiency, we could change the factory such that it returns the original in the first request, and a duplicate after that. Good change otherwise. +1 to merge
[jira] [Commented] (FLINK-1481) Flakey JobManagerITCase
[ https://issues.apache.org/jira/browse/FLINK-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307321#comment-14307321 ] ASF GitHub Bot commented on FLINK-1481: --- GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/365 [FLINK-1481] Fixes flakey JobManagerITCase The sometimes failing sender tasks are now guaranteed to have at least one task which fails. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink fixFlakeyJobManagerITCase Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/365.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #365 commit 0964a6ebaa64c9ae644831564ff898f23e851bb6 Author: Till Rohrmann trohrm...@apache.org Date: 2015-02-05T14:48:23Z [FLINK-1481] Fixes flakey JobManagerITCase which relied on non-deterministic behaviour. Flakey JobManagerITCase --- Key: FLINK-1481 URL: https://issues.apache.org/jira/browse/FLINK-1481 Project: Flink Issue Type: Bug Reporter: Till Rohrmann Priority: Minor We currently have some test cases which rely on non-deterministic behaviour. For example the {{JobManagerItCase}} contains a test case with a {{SometimesExceptionSender}} which randomly throws an exception and otherwise blocks. The probability of a failure is 0.05 and we have 100 parallel tasks. Thus, the overall probability that no task fails at all is still 0.005. Thus every 200th test run, the test case will block and thus fail. In order to get a stable test case base, I think that we should try to avoid these kind of test cases with random behaviour. The same goes with sleep timeouts in order to establish an intended interleaving of concurrent processes. With Travis this can fail too often.
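[Editor's note] The failure-probability arithmetic in the issue checks out: with 100 independent tasks each failing with probability 0.05, the chance that no task fails is 0.95^100 ≈ 0.0059, i.e. roughly one blocked run in every couple of hundred. A quick sketch of the computation (values taken from the issue text):

```java
// Each of the 100 parallel SometimesExceptionSender tasks fails with p = 0.05.
// The test blocks when NO task fails, which happens with probability (1-p)^n.
public class FlakeyMath {
    static double noFailureProbability(double p, int tasks) {
        return Math.pow(1.0 - p, tasks);
    }

    public static void main(String[] args) {
        double q = noFailureProbability(0.05, 100);
        System.out.printf("P(no task fails) = %.4f (~one blocked run in %d)%n",
                q, Math.round(1.0 / q));
    }
}
```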
[jira] [Resolved] (FLINK-1443) Add replicated data source
[ https://issues.apache.org/jira/browse/FLINK-1443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-1443. -- Resolution: Implemented Fix Version/s: 0.9 Implemented in a19b4a02bfa5237e0dcd2b264da36229546f23c0 Add replicated data source -- Key: FLINK-1443 URL: https://issues.apache.org/jira/browse/FLINK-1443 Project: Flink Issue Type: New Feature Components: Java API, JobManager, Optimizer Affects Versions: 0.9 Reporter: Fabian Hueske Assignee: Fabian Hueske Priority: Minor Fix For: 0.9 This issue proposes to add support for data sources that read the same data in all parallel instances. This feature can be useful if the data is replicated to all machines in a cluster and can be read locally. For example, a replicated input format can be used for a broadcast join without sending any data over the network. The following changes are necessary to achieve this: 1) Add a replicating InputSplitAssigner which assigns all splits to all parallel instances. This also requires extending the InputSplitAssigner interface to identify the exact parallel instance that requests an InputSplit (currently only the hostname is provided). 2) Make sure that the DOP of the replicated data source is identical to the DOP of its successor. 3) Let the optimizer know that the data is replicated and ensure that plan enumeration works correctly.
[jira] [Resolved] (FLINK-1318) Make quoted String parsing optional and configurable for CSVInputFormats
[ https://issues.apache.org/jira/browse/FLINK-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske resolved FLINK-1318. -- Resolution: Fixed Fix Version/s: 0.9 Fixed in 2665cf4e2a9e33e0e94ac7e0b7518a10445febbb Make quoted String parsing optional and configurable for CSVInputFormats Key: FLINK-1318 URL: https://issues.apache.org/jira/browse/FLINK-1318 Project: Flink Issue Type: Improvement Components: Java API, Scala API Affects Versions: 0.8 Reporter: Fabian Hueske Assignee: Fabian Hueske Priority: Minor Fix For: 0.9 With the current implementation of the CSVInputFormat, quoted string parsing kicks in if the first non-whitespace character of a field is a double quote. I see two issues with this implementation: 1. Quoted String parsing cannot be disabled 2. The quoting character is fixed to double quotes (") I propose to add parameters to disable quoted String parsing and set the quote character.
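[Editor's note] To make the proposal concrete, here is a minimal sketch of the intended behavior. The class and method names below are hypothetical illustrations, not Flink's actual CSVInputFormat API: quoted parsing can be disabled entirely, and the quote character is a parameter rather than being hard-coded to `"`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical field splitter: a null quote character disables quoted
// parsing; otherwise a delimiter inside quotes is not a field break.
public class CsvFieldParser {
    private final char delimiter;
    private final Character quoteChar; // null disables quoted parsing

    public CsvFieldParser(char delimiter, Character quoteChar) {
        this.delimiter = delimiter;
        this.quoteChar = quoteChar;
    }

    public List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (quoteChar != null && c == quoteChar) {
                inQuotes = !inQuotes;       // toggle quoted mode, drop the quote itself
            } else if (c == delimiter && !inQuotes) {
                fields.add(cur.toString()); // end of field
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());
        return fields;
    }

    public static void main(String[] args) {
        // Quoting enabled: the comma inside quotes does not split the field.
        System.out.println(new CsvFieldParser(',', '"').parse("a,\"b,c\",d"));
        // Quoting disabled: quotes are kept literally and every comma splits.
        System.out.println(new CsvFieldParser(',', null).parse("a,\"b,c\",d"));
    }
}
```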
[jira] [Created] (FLINK-1485) Typo in Documentation - Join with Join-Function
Johannes created FLINK-1485: --- Summary: Typo in Documentation - Join with Join-Function Key: FLINK-1485 URL: https://issues.apache.org/jira/browse/FLINK-1485 Project: Flink Issue Type: Bug Affects Versions: 0.8 Reporter: Johannes Assignee: Johannes Priority: Trivial Small typo in documentation In the java example for Join with Join-Function
[GitHub] flink pull request: Typo in - Join with Join-Function
GitHub user jkirsch opened a pull request: https://github.com/apache/flink/pull/369 Typo in - Join with Join-Function Fix typo in the Java Code Example You can merge this pull request into a Git repository by running: $ git pull https://github.com/jkirsch/incubator-flink typo Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/369.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #369 commit 7c15e144398fa615c6c67a6a21155830a9108f2e Author: jkirsch jkirschn...@gmail.com Date: 2015-02-05T20:42:17Z Typo in - Join with Join-Function Fix typo in the Java Code Example
[jira] [Commented] (FLINK-1471) Allow KeySelectors to implement ResultTypeQueryable
[ https://issues.apache.org/jira/browse/FLINK-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306869#comment-14306869 ] ASF GitHub Bot commented on FLINK-1471: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/359#issuecomment-73013430 I do not quite get what this means now for the input validation. Assume two classes, `A` and `B`, where `B` is a subclass of `A`.

```java
DataSet<B> data = ...;
data.map(new MapFunction<A, Long>() { ... });
```

I thought that the validation previously checked that the MapFunction's input parameter is assignable from the data set type. So this is skipped now? It is not a big deal, I am just wondering why... Allow KeySelectors to implement ResultTypeQueryable --- Key: FLINK-1471 URL: https://issues.apache.org/jira/browse/FLINK-1471 Project: Flink Issue Type: Bug Components: Java API Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Timo Walther Fix For: 0.9 See https://github.com/apache/flink/pull/354
[GitHub] flink pull request: [FLINK-1396][FLINK-1303] Hadoop Input/Output d...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73014361 Looks good. I have one suggestion concerning the Hadoop dependencies: The `flink-java` project depends on the Hadoop API, for the `Writable` interface, the `NullValue` and the InputFormat classes. We should be able to exclude all transitive dependencies from the Hadoop dependency in `flink-java`, making the project more lightweight, so that someone that only writes against the Flink API does not have the long tail of transitive Hadoop dependencies. Whenever we really execute Hadoop code, we have the flink runtime involved, which then has the necessary dependencies. To exclude all transitive dependencies, use

```
<exclusions>
  <exclusion>
    <groupId>*</groupId>
    <artifactId>*</artifactId>
  </exclusion>
</exclusions>
```
[jira] [Commented] (FLINK-1476) Flink VS Spark on loop test
[ https://issues.apache.org/jira/browse/FLINK-1476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306879#comment-14306879 ] Fabian Hueske commented on FLINK-1476: -- Hi Xuhong, your memory configuration does not look optimal. Your machines have 32GB each and you assign 2GB to each task manager. There is only one TM running on each node, so the total memory given to Flink is 6GB. Each TM allocates approx. 70% of its memory for in-memory processing and divides it among all slots (24 for each TM). Therefore, the memory for each slot is about (2GB * 0.7) / 24 = 60MB, which is not a lot. If you run the program with a DOP of 16, Flink will use 60MB * 16 = 960MB for in-memory processing and put all other data to disk. Flink VS Spark on loop test --- Key: FLINK-1476 URL: https://issues.apache.org/jira/browse/FLINK-1476 Project: Flink Issue Type: Test Affects Versions: 0.7.0-incubating, 0.8 Environment: 3 machines, every machine has 24 CPU cores and allocates 16 CPU cores for the tests. The memory situation is: 3 * 32G Reporter: xuhong Priority: Critical In the last days, I did some tests on Flink and Spark. The test results show that Flink does better on many operations, such as GroupBy, Join and some complex jobs. But when I did the KMeans, LinearRegression and other loop tests, I found that Flink is no better than Spark. I want to know whether Flink is better suited than Spark for loop jobs. I added the code env.setDegreeOfParallelism(16) in each test to allocate the same number of CPU cores as in the Spark tests. My English is not good, I hope you can understand me!
The following is some config of my Flink:

jobmanager.rpc.port: 6123
jobmanager.heap.mb: 2048
taskmanager.heap.mb: 2048
taskmanager.numberOfTaskSlots: 24
parallelization.degree.default: 72
jobmanager.web.port: 8081
webclient.port: 8085
fs.overwrite-files: true
taskmanager.memory.fraction: 0.8
taskmanager.network.numberofBuffers: 7
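[Editor's note] Fabian's per-slot estimate can be reproduced directly from the numbers in the thread (taskmanager.heap.mb: 2048, roughly 70% managed memory, 24 slots per TM); the ~60 MB and ~960 MB figures are rounded:

```java
// Reproduces the slot-memory estimate from the comment above.
// Values are taken from the reported configuration, not from Flink itself.
public class SlotMemory {
    static long perSlotMb(long tmHeapMb, double managedFraction, int slots) {
        // Managed memory of one TaskManager, divided evenly among its slots.
        return (long) (tmHeapMb * managedFraction) / slots;
    }

    public static void main(String[] args) {
        long perSlot = perSlotMb(2048, 0.7, 24);
        System.out.println("Memory per slot: ~" + perSlot + " MB");     // ~60 MB
        System.out.println("Used at DOP 16: ~" + perSlot * 16 + " MB"); // ~960 MB
    }
}
```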
[GitHub] flink pull request: [FLINK1443] Add support for replicating input ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/360#issuecomment-73014844 Very nice, +1 to merge
[GitHub] flink pull request: [FLINK-1471][java-api] Fixes wrong input valid...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/359#issuecomment-73013430 I do not quite get what this means now for the input validation. Assume two classes, `A` and `B`, where `B` is a subclass of `A`.

```java
DataSet<B> data = ...;
data.map(new MapFunction<A, Long>() { ... });
```

I thought that the validation previously checked that the MapFunction's input parameter is assignable from the data set type. So this is skipped now? It is not a big deal, I am just wondering why...
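[Editor's note] The variance question in the comment above can be illustrated with plain Java generics (this is not Flink's TypeExtractor; the names below are hypothetical): a mapper declared on the supertype `A` can safely process elements of a `DataSet`-like collection of the subtype `B`, which is why allowing this assignability check to pass is sound.

```java
import java.util.List;
import java.util.function.Function;

public class VarianceDemo {
    static class A { int id() { return 1; } }
    static class B extends A { @Override int id() { return 2; } }

    // Analogue of applying a MapFunction<A, Long> to a DataSet<B>:
    // the input type bound `? extends T` admits the subtype's elements.
    static <T> Long applyFirst(List<? extends T> data, Function<T, Long> mapper) {
        return mapper.apply(data.get(0));
    }

    public static void main(String[] args) {
        List<B> data = List.of(new B());
        // A mapper typed on the supertype A accepts the B elements.
        Long result = applyFirst(data, (A a) -> (long) a.id());
        System.out.println(result);
    }
}
```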
[GitHub] flink pull request: [FLINK-592] Add support for Kerberos secured Y...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/358#issuecomment-73013486 @warneke Thank you for your help!
[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.
[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306878#comment-14306878 ] ASF GitHub Bot commented on FLINK-1396: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73014361 Looks good. I have one suggestion concerning the Hadoop dependencies: The `flink-java` project depends on the Hadoop API, for the `Writable` interface, the `NullValue` and the InputFormat classes. We should be able to exclude all transitive dependencies from the Hadoop dependency in `flink-java`, making the project more lightweight, so that someone that only writes against the Flink API does not have the long tail of transitive Hadoop dependencies. Whenever we really execute Hadoop code, we have the flink runtime involved, which then has the necessary dependencies. To exclude all transitive dependencies, use

```
<exclusions>
  <exclusion>
    <groupId>*</groupId>
    <artifactId>*</artifactId>
  </exclusion>
</exclusions>
```

Add hadoop input formats directly to the user API. -- Key: FLINK-1396 URL: https://issues.apache.org/jira/browse/FLINK-1396 Project: Flink Issue Type: Bug Reporter: Robert Metzger Assignee: Aljoscha Krettek
[jira] [Commented] (FLINK-1422) Missing usage example for withParameters
[ https://issues.apache.org/jira/browse/FLINK-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307292#comment-14307292 ] ASF GitHub Bot commented on FLINK-1422: --- Github user zentol commented on the pull request: https://github.com/apache/flink/pull/350#issuecomment-73056344 updated. Missing usage example for withParameters -- Key: FLINK-1422 URL: https://issues.apache.org/jira/browse/FLINK-1422 Project: Flink Issue Type: Improvement Components: Documentation Affects Versions: 0.8 Reporter: Alexander Alexandrov Assignee: Chesnay Schepler Priority: Trivial Fix For: 0.8.1 Original Estimate: 1h Remaining Estimate: 1h I am struggling to find a usage example of the withParameters method in the documentation. At the moment I only see this note: {quote} Note: As the content of broadcast variables is kept in-memory on each node, it should not become too large. For simpler things like scalar values you can simply make parameters part of the closure of a function, or use the withParameters(...) method to pass in a configuration. {quote}
[GitHub] flink pull request: [FLINK-1422] Add withParameters() to documenta...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/350#discussion_r24167714

--- Diff: docs/programming_guide.md ---
@@ -2398,6 +2399,61 @@ of a function, or use the `withParameters(...)` method to pass in a configuratio
 [Back to top](#top)
+Passing parameters to functions
+---
+
+Parameters can be passed to rich functions using either the constructor (if the function is
--- End diff --

Only the Configuration object method requires a RichFunction to override `open()`. Configuring via the constructor (or any other method which sets non-transient fields) works also for functions that implement the function interfaces. Also it is not required that the class is defined as `static final`. That is only necessary if the class is defined inside another class.
[jira] [Commented] (FLINK-1422) Missing usage example for withParameters
[ https://issues.apache.org/jira/browse/FLINK-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307361#comment-14307361 ] ASF GitHub Bot commented on FLINK-1422: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/350#discussion_r24168441

--- Diff: docs/programming_guide.md ---
@@ -2398,6 +2399,61 @@ of a function, or use the `withParameters(...)` method to pass in a configuratio
 [Back to top](#top)
+Passing parameters to functions
+---
+
+Parameters can be passed to rich functions using either the constructor (if the function is
--- End diff --

If the function is not serialisable the program will fail in any case. That is not related to the configuration. IMO, the documentation should state that the function object is serialised using standard java serialization and shipped as-is to all parallel task instances. Mentioning that transient fields are not shipped is OK but not mandatory (somebody who uses the transient keyword should know what it does).

Missing usage example for withParameters -- Key: FLINK-1422 URL: https://issues.apache.org/jira/browse/FLINK-1422 Project: Flink Issue Type: Improvement Components: Documentation Affects Versions: 0.8 Reporter: Alexander Alexandrov Assignee: Chesnay Schepler Priority: Trivial Fix For: 0.8.1 Original Estimate: 1h Remaining Estimate: 1h I am struggling to find a usage example of the withParameters method in the documentation. At the moment I only see this note: {quote} Note: As the content of broadcast variables is kept in-memory on each node, it should not become too large. For simpler things like scalar values you can simply make parameters part of the closure of a function, or use the withParameters(...) method to pass in a configuration. {quote}
[GitHub] flink pull request: [FLINK-1422] Add withParameters() to documenta...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/350#discussion_r24168441

--- Diff: docs/programming_guide.md ---
@@ -2398,6 +2399,61 @@ of a function, or use the `withParameters(...)` method to pass in a configuratio
 [Back to top](#top)
+Passing parameters to functions
+---
+
+Parameters can be passed to rich functions using either the constructor (if the function is
--- End diff --

If the function is not serialisable the program will fail in any case. That is not related to the configuration. IMO, the documentation should state that the function object is serialised using standard java serialization and shipped as-is to all parallel task instances. Mentioning that transient fields are not shipped is OK but not mandatory (somebody who uses the transient keyword should know what it does).
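[Editor's note] The shipping behavior described above can be demonstrated with plain Java serialization (no Flink dependencies; the class below is an illustrative stand-in for a user function): non-transient fields set in the constructor survive the serialization round trip that ships the function to the parallel task instances, while transient fields do not.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ShippedFunction implements Serializable {
    final int threshold;                             // shipped with the function
    transient String scratch = "local only";         // NOT shipped

    public ShippedFunction(int threshold) { this.threshold = threshold; }

    // Simulates shipping the function object to a parallel task instance.
    static ShippedFunction roundTrip(ShippedFunction f) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(f);
        oos.flush();
        ObjectInputStream in =
            new ObjectInputStream(new ByteArrayInputStream(bos.toByteArray()));
        return (ShippedFunction) in.readObject();
    }

    public static void main(String[] args) throws Exception {
        ShippedFunction copy = roundTrip(new ShippedFunction(42));
        System.out.println(copy.threshold); // constructor parameter arrives intact
        System.out.println(copy.scratch);   // transient field is reset to null
    }
}
```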
[jira] [Commented] (FLINK-1422) Missing usage example for withParameters
[ https://issues.apache.org/jira/browse/FLINK-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307336#comment-14307336 ] ASF GitHub Bot commented on FLINK-1422: --- Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/350#discussion_r24168173

--- Diff: docs/programming_guide.md ---
@@ -2398,6 +2399,61 @@ of a function, or use the `withParameters(...)` method to pass in a configuratio
 [Back to top](#top)
+Passing parameters to functions
+---
+
+Parameters can be passed to rich functions using either the constructor (if the function is
--- End diff --

ok, so which requirements do exist for the operator to be serializable? or can i just omit those details and say something like parameters can be stored in non-transient fields if the function is serializable.

Missing usage example for withParameters -- Key: FLINK-1422 URL: https://issues.apache.org/jira/browse/FLINK-1422 Project: Flink Issue Type: Improvement Components: Documentation Affects Versions: 0.8 Reporter: Alexander Alexandrov Assignee: Chesnay Schepler Priority: Trivial Fix For: 0.8.1 Original Estimate: 1h Remaining Estimate: 1h I am struggling to find a usage example of the withParameters method in the documentation. At the moment I only see this note: {quote} Note: As the content of broadcast variables is kept in-memory on each node, it should not become too large. For simpler things like scalar values you can simply make parameters part of the closure of a function, or use the withParameters(...) method to pass in a configuration. {quote}
[GitHub] flink pull request: [FLINK-1422] Add withParameters() to documenta...
Github user zentol commented on a diff in the pull request: https://github.com/apache/flink/pull/350#discussion_r24168173

--- Diff: docs/programming_guide.md ---
@@ -2398,6 +2399,61 @@ of a function, or use the `withParameters(...)` method to pass in a configuratio
 [Back to top](#top)
+Passing parameters to functions
+---
+
+Parameters can be passed to rich functions using either the constructor (if the function is
--- End diff --

ok, so which requirements do exist for the operator to be serializable? or can i just omit those details and say something like parameters can be stored in non-transient fields if the function is serializable.
[GitHub] flink pull request: [FLINK-1481] Fixes flakey JobManagerITCase
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/365#issuecomment-73060980 Good one, thank you! +1 to merge
[jira] [Commented] (FLINK-1481) Flakey JobManagerITCase
[ https://issues.apache.org/jira/browse/FLINK-1481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307363#comment-14307363 ] ASF GitHub Bot commented on FLINK-1481: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/365#issuecomment-73060980 Good one, thank you! +1 to merge Flakey JobManagerITCase --- Key: FLINK-1481 URL: https://issues.apache.org/jira/browse/FLINK-1481 Project: Flink Issue Type: Bug Reporter: Till Rohrmann Priority: Minor We currently have some test cases which rely on non-deterministic behaviour. For example the {{JobManagerItCase}} contains a test case with a {{SometimesExceptionSender}} which randomly throws an exception and otherwise blocks. The probability of a failure is 0.05 and we have 100 parallel tasks. Thus, the overall probability that no task fails at all is still 0.005. Thus every 200th test run, the test case will block and thus fail. In order to get a stable test case base, I think that we should try to avoid these kind of test cases with random behaviour. The same goes with sleep timeouts in order to establish an intended interleaving of concurrent processes. With Travis this can fail too often.
[jira] [Commented] (FLINK-1166) Add a QA bot to Flink that is testing pull requests
[ https://issues.apache.org/jira/browse/FLINK-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307568#comment-14307568 ] ASF GitHub Bot commented on FLINK-1166: --- Github user uce commented on a diff in the pull request: https://github.com/apache/flink/pull/366#discussion_r24179312

--- Diff: tools/qa-check.sh ---

```shell
#!/usr/bin/env bash
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#

#
# QA check your changes.
# Possible options:
#   BRANCH  set another branch as the check reference
#
# Use the tool like this: BRANCH=release-0.8 ./tools/qa-check.sh
#

BRANCH=${BRANCH:-origin/master}

here=`dirname "$0"`          # relative
here=`( cd "$here" && pwd )` # absolutized and normalized
if [ -z "$here" ] ; then
  # error; for some reason, the path is not accessible
  # to the script (e.g. permissions re-evaled after suid)
  exit 1 # fail
fi
flink_home=`dirname "$here"`

echo "flink_home=$flink_home here=$here"
cd "$here"

if [ ! -d _qa_workdir ] ; then
  echo "_qa_workdir doesn't exist. Creating it"
  mkdir _qa_workdir
fi
# attention, it overwrites
echo "_qa_workdir" > .gitignore

cd _qa_workdir

if [ ! -d flink ] ; then
  echo "There is no flink copy in the workdir. Cloning flink"
  git clone https://github.com/apache/flink.git flink
  cd flink
fi
cd flink
git fetch origin
git checkout $BRANCH
cd "$here"
# go to reference flink directory

cd _qa_workdir
VAR_DIR=`pwd`
cd flink

# Initialize variables
export TESTS_PASSED=true
export MESSAGES="Flink QA-Check results:"$'\n'

goToTestDirectory() {
  cd "$flink_home"
}

##### Methods #####

## Javadocs
JAVADOC_MVN_COMMAND='mvn javadoc:aggregate -Pdocs-and-source -Dmaven.javadoc.failOnError=false -Dquiet=false | grep "WARNING\|warning\|error" | wc -l'

referenceJavadocsErrors() {
  eval $JAVADOC_MVN_COMMAND > $VAR_DIR/_JAVADOCS_NUM_WARNINGS
}

checkJavadocsErrors() {
  OLD_JAVADOC_ERR_CNT=`cat $VAR_DIR/_JAVADOCS_NUM_WARNINGS`
  NEW_JAVADOC_ERR_CNT=`eval $JAVADOC_MVN_COMMAND`
  if [ $NEW_JAVADOC_ERR_CNT -gt $OLD_JAVADOC_ERR_CNT ]; then
    MESSAGES+=":-1: The change increases the number of javadoc errors from $OLD_JAVADOC_ERR_CNT to $NEW_JAVADOC_ERR_CNT"$'\n'
    TESTS_PASSED=false
  else
    MESSAGES+=":+1: The number of javadoc errors was $OLD_JAVADOC_ERR_CNT and is now $NEW_JAVADOC_ERR_CNT"$'\n'
  fi
}

## Compiler warnings
COMPILER_WARN_MVN_COMMAND='mvn clean compile -Dmaven.compiler.showWarning=true -Dmaven.compiler.showDeprecation=true | grep "WARNING" | wc -l'

referenceCompilerWarnings() {
  eval $COMPILER_WARN_MVN_COMMAND > $VAR_DIR/_COMPILER_NUM_WARNINGS
}

checkCompilerWarnings() {
  OLD_COMPILER_ERR_CNT=`cat $VAR_DIR/_COMPILER_NUM_WARNINGS`
  NEW_COMPILER_ERR_CNT=`eval $COMPILER_WARN_MVN_COMMAND`
  if [ $NEW_COMPILER_ERR_CNT -gt $OLD_COMPILER_ERR_CNT ]; then
    MESSAGES+=":-1: The change increases the number of compiler warnings from $OLD_COMPILER_ERR_CNT to $NEW_COMPILER_ERR_CNT"$'\n'
    TESTS_PASSED=false
  else
    MESSAGES+=":+1: The number of compiler warnings was $OLD_COMPILER_ERR_CNT and is now $NEW_COMPILER_ERR_CNT"$'\n'
  fi
}

## Files in lib
BUILD_MVN_COMMAND="mvn clean package -DskipTests"
```

--- End diff --

we could also skip Javadocs here with `-Dmaven.javadoc.skip=true`
Add a QA bot to Flink that is testing pull requests --- Key: FLINK-1166
[GitHub] flink pull request: [FLINK-1166] Add qa-check.sh tool
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/366#issuecomment-73089829 Thank you for trying it out. Ideally, with the bot in place the number of warnings will go down over time. I'll address the comments in the source. I'm not sure if the number of compiler warnings is correct here. A third option would be to
a) check out the current master
b) get the reference counts on the master
c) apply the pull request as a patch to master (checking if patching is possible (basically testing if rebase is possible))
d) if rebase was possible, get the new counts.
[jira] [Commented] (FLINK-1482) BlobStore does not delete directories when process is killed
[ https://issues.apache.org/jira/browse/FLINK-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307619#comment-14307619 ] ASF GitHub Bot commented on FLINK-1482: --- Github user tillrohrmann closed the pull request at: https://github.com/apache/flink/pull/367 BlobStore does not delete directories when process is killed Key: FLINK-1482 URL: https://issues.apache.org/jira/browse/FLINK-1482 Project: Flink Issue Type: Bug Reporter: Till Rohrmann The BlobStore (BlobCache + BlobServer) does not delete the blob store directory if it is not properly shutdown. This happens if one kills the process as it is done by Flink's start and stop shell scripts. We can solve the problem by registering a shutdown hook to do the cleanup. The shutdown hook has to delete a unique directory which has been created by the blob store.
[jira] [Commented] (FLINK-1482) BlobStore does not delete directories when process is killed
[ https://issues.apache.org/jira/browse/FLINK-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307620#comment-14307620 ] ASF GitHub Bot commented on FLINK-1482: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/367#issuecomment-73089771 Ufuk already implemented the shutdown hooks.
[jira] [Closed] (FLINK-1482) BlobStore does not delete directories when process is killed
[ https://issues.apache.org/jira/browse/FLINK-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann closed FLINK-1482. Resolution: Fixed Fixed with e766dba3653bb942ff66bd9edc999237a77c2ea2
[jira] [Commented] (FLINK-1166) Add a QA bot to Flink that is testing pull requests
[ https://issues.apache.org/jira/browse/FLINK-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307553#comment-14307553 ] ASF GitHub Bot commented on FLINK-1166: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/366#issuecomment-73084319 It's not completely building Flink; it's only generating the Javadocs or compiling to collect the compiler warnings. The main purpose of the script is to be executed automatically. Add a QA bot to Flink that is testing pull requests --- Key: FLINK-1166 URL: https://issues.apache.org/jira/browse/FLINK-1166 Project: Flink Issue Type: New Feature Components: Build System Reporter: Robert Metzger Assignee: Robert Metzger Priority: Minor Labels: starter We should have a QA bot (similar to Hadoop) that is checking incoming pull requests for a few things: - Changes to user-facing APIs - More compiler warnings than before - More Javadoc warnings than before - Change of the number of files in the lib/ directory - Unused dependencies - {{@author}} tags - Guava (and other shaded jars) in the lib/ directory. It should be somehow extensible to add new tests.
[jira] [Commented] (FLINK-1166) Add a QA bot to Flink that is testing pull requests
[ https://issues.apache.org/jira/browse/FLINK-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307552#comment-14307552 ] ASF GitHub Bot commented on FLINK-1166: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/366#discussion_r24178788

--- Diff: tools/qa-check.sh ---
@@ -0,0 +1,175 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+
+#
+# QA check your changes.
+# Possible options:
+#   BRANCH   set another branch as the check reference
+#
+# Use the tool like this: BRANCH=release-0.8 ./tools/qa-check.sh
+#
+
+
+BRANCH=${BRANCH:-origin/master}
+
+
+here=`dirname "$0"`             # relative
+here=`( cd "$here" && pwd )`    # absolutized and normalized
+if [ -z "$here" ] ; then
+  # error; for some reason, the path is not accessible
+  # to the script (e.g. permissions re-evaled after suid)
+  exit 1  # fail
+fi
+flink_home=`dirname "$here"`
+
+echo "flink_home=$flink_home here=$here"
+cd "$here"
+
+if [ ! -d "_qa_workdir" ] ; then
+  echo "_qa_workdir doesn't exist. Creating it"
+  mkdir _qa_workdir
+fi
+# attention, it overwrites
+echo "_qa_workdir" > .gitignore
+
+cd _qa_workdir
+
+if [ ! -d "flink" ] ; then
+  echo "There is no flink copy in the workdir. Cloning flink"
+  git clone https://github.com/apache/flink.git flink
+  cd flink
--- End diff --

This `cd flink` is not necessary.
[jira] [Created] (FLINK-1482) BlobStore does not delete directories when process is killed
Till Rohrmann created FLINK-1482: --- Summary: BlobStore does not delete directories when process is killed Key: FLINK-1482 URL: https://issues.apache.org/jira/browse/FLINK-1482 Project: Flink Issue Type: Bug Reporter: Till Rohrmann The BlobStore (BlobCache + BlobServer) does not delete the blob store directory if it is not properly shutdown. This happens if one kills the process as it is done by Flink's start and stop shell scripts. We can solve the problem by registering a shutdown hook to do the cleanup. The shutdown hook has to delete a unique directory which has been created by the blob store.
[jira] [Commented] (FLINK-1166) Add a QA bot to Flink that is testing pull requests
[ https://issues.apache.org/jira/browse/FLINK-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307600#comment-14307600 ] ASF GitHub Bot commented on FLINK-1166: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/366#issuecomment-73087911 I just ran the script on your pull request ;)

Flink QA-Check results:
:+1: The number of javadoc errors was 402 and is now 402
:-1: The change increases the number of compiler warnings from 104 to 314
:+1: The number of files in the lib/ folder was 142 before the change and is now 142
Test finished. Overall result: :-1:. Some tests failed. Please check messages above.

I guess if the user does not rebase to the latest master, then negative QA test results are likely to occur. Also, warnings could be hidden when the number of warnings has increased on the master. Wouldn't it be better to compare against the base of the user's changes?
[jira] [Commented] (FLINK-1482) BlobStore does not delete directories when process is killed
[ https://issues.apache.org/jira/browse/FLINK-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307612#comment-14307612 ] ASF GitHub Bot commented on FLINK-1482: --- GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/367 [FLINK-1482] [runtime] Adds shutdown hook to delete blob store directories Adds shutdown hook to delete blob store directories. This also deletes the blob store files in case that the JVM was terminated. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink fixBlobStoreFileDeletion Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/367.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #367 commit e868c0ed39a8f48d74d7315e911f23d5ccab111a Author: Till Rohrmann trohrm...@apache.org Date: 2015-02-05T17:29:13Z [FLINK-1482] [runtime] Adds shutdown hook to delete blob store directories
[jira] [Updated] (FLINK-1483) Temporary channel files are not properly deleted when Flink is terminated
[ https://issues.apache.org/jira/browse/FLINK-1483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-1483: --- Description: The temporary channel files are not properly deleted if the IOManager does not shut down properly. This can be the case when the TaskManagers are terminated by Flink's shell scripts. A solution could be to store all channel files of one TaskManager in a uniquely identifiable directory and to register a shutdown hook which deletes this directory upon termination. was: The temporary channel files are not properly deleted if the IOManager does not shut down properly. This can be the case when the Job/TaskManager are terminated by Flink's shell scripts. Temporary channel files are not properly deleted when Flink is terminated --- Key: FLINK-1483 URL: https://issues.apache.org/jira/browse/FLINK-1483 Project: Flink Issue Type: Bug Reporter: Till Rohrmann Assignee: Ufuk Celebi
[GitHub] flink pull request: [FLINK-1463] Fix stateful/stateless Serializer...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/353#issuecomment-73077279 Do you think that with the additional checking logic this would really make up for one superfluous duplication?
[jira] [Commented] (FLINK-1166) Add a QA bot to Flink that is testing pull requests
[ https://issues.apache.org/jira/browse/FLINK-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307549#comment-14307549 ] ASF GitHub Bot commented on FLINK-1166: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/366#issuecomment-73083750 Very nice. Will try it out now :)
[jira] [Commented] (FLINK-1442) Archived Execution Graph consumes too much memory
[ https://issues.apache.org/jira/browse/FLINK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307484#comment-14307484 ] ASF GitHub Bot commented on FLINK-1442: --- Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/344#issuecomment-73074657 Awesome! Thanks for the update. Archived Execution Graph consumes too much memory - Key: FLINK-1442 URL: https://issues.apache.org/jira/browse/FLINK-1442 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Max Michels Fix For: 0.9 The JobManager archives the execution graphs for analysis of jobs. The graphs may consume a lot of memory. Especially the execution edges in all2all connection patterns are extremely many and add up in memory consumption. The execution edges connect all parallel tasks, so for an all2all pattern between n and m tasks, there are n*m edges. For a parallelism of multiple hundred tasks, this can easily reach 100k objects and more, each with a set of metadata. I propose the following to solve that: 1. Clear all execution edges from the graph (the majority of the memory consumers) when it is given to the archiver. 2. Keep the map/list of archived graphs behind a soft reference, so it will be removed under memory pressure before the JVM crashes. That may remove graphs from the history early, but is much preferable to the JVM crashing, in which case the graph is lost as well... 3. Long term: The graph should be archived somewhere else. Something like the History Server used by Hadoop and Hive would be a good idea.
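[Editor's note] Proposal (2) in FLINK-1442 above — holding archived graphs behind a soft reference so the garbage collector can evict them under memory pressure instead of crashing the JVM — can be sketched like this. The class below is an illustrative stand-in, not the JobManager's actual archive code.

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

public class SoftArchive<K, V> {
    // Each value is wrapped in a SoftReference; the GC clears soft references
    // only when memory gets tight, so entries survive as long as space allows.
    private final Map<K, SoftReference<V>> entries = new LinkedHashMap<>();

    public void put(K key, V value) {
        entries.put(key, new SoftReference<>(value));
    }

    // Returns null when the entry was never stored or was already reclaimed;
    // callers must treat a missing graph as "evicted from history".
    public V get(K key) {
        SoftReference<V> ref = entries.get(key);
        return ref == null ? null : ref.get();
    }
}
```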
[jira] [Commented] (FLINK-1463) RuntimeStatefulSerializerFactory declares ClassLoader as transient but later tries to use it
[ https://issues.apache.org/jira/browse/FLINK-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307524#comment-14307524 ] ASF GitHub Bot commented on FLINK-1463: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/353#issuecomment-73079537 Thanks. Sorry for being picky ;-) RuntimeStatefulSerializerFactory declares ClassLoader as transient but later tries to use it Key: FLINK-1463 URL: https://issues.apache.org/jira/browse/FLINK-1463 Project: Flink Issue Type: Bug Affects Versions: 0.8, 0.9 Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.8.1 At least one user has seen an exception because of this. In theory, the ClassLoader is set again in readParametersFromConfig. But the way it is used in TupleComparatorBase, this method is never called.
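[Editor's note] The underlying FLINK-1463 pitfall — a transient field that is silently null after deserialization unless it is restored explicitly — can be demonstrated with a minimal stand-in class. The names below are hypothetical simplifications, not Flink's actual RuntimeStatefulSerializerFactory.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.Serializable;

public class FactoryWithLoader implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient fields are skipped by Java serialization and come back null.
    private transient ClassLoader loader = getClass().getClassLoader();

    // Restore the transient field when the object is read back; without this
    // method, any later use of `loader` would throw a NullPointerException.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        loader = getClass().getClassLoader();
    }

    public ClassLoader getClassLoader() {
        return loader;
    }
}
```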
[GitHub] flink pull request: [FLINK-1166] Add qa-check.sh tool
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/366#issuecomment-73083755 The script builds Flink twice and therefore takes a while. I don't know if we can force users to execute it before making pull requests. However, it would be good to get an automatic report for open pull requests using this script.
[jira] [Commented] (FLINK-1463) RuntimeStatefulSerializerFactory declares ClassLoader as transient but later tries to use it
[ https://issues.apache.org/jira/browse/FLINK-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307516#comment-14307516 ] ASF GitHub Bot commented on FLINK-1463: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/353#issuecomment-73078373 Yes. Some serializers are heavyweight, and often the factories produce only one serializer through their lifetime. It is a nice improvement, not a crucial one, though.
[GitHub] flink pull request: [FLINK-1463] Fix stateful/stateless Serializer...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/353#issuecomment-73079161 Ok, then I'll add this.
[GitHub] flink pull request: [FLINK-1166] Add qa-check.sh tool
Github user uce commented on the pull request: https://github.com/apache/flink/pull/366#issuecomment-73084249 The idea is to run this automatically or on request, which should be fine.
[jira] [Resolved] (FLINK-1477) YARN Client: Use HADOOP_HOME if set
[ https://issues.apache.org/jira/browse/FLINK-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger resolved FLINK-1477. --- Resolution: Fixed Resolved in http://git-wip-us.apache.org/repos/asf/flink/commit/5e1cc9e2 YARN Client: Use HADOOP_HOME if set --- Key: FLINK-1477 URL: https://issues.apache.org/jira/browse/FLINK-1477 Project: Flink Issue Type: Bug Components: YARN Client Reporter: Robert Metzger Assignee: Robert Metzger As part of FLINK-592 I removed code to load the Hadoop configuration from the HADOOP_HOME variable. But it seems that we still have users where this deprecated variable is the only hadoop conf variable which is set.
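[Editor's note] The HADOOP_HOME fallback FLINK-1477 restores amounts to deriving a configuration directory from the deprecated environment variable. A minimal sketch, assuming the standard Hadoop 1.x (conf/) and 2.x (etc/hadoop/) directory layouts; the class and method names are illustrative, not the YARN client's actual code.

```java
import java.io.File;

public class HadoopConfResolver {

    // Returns the first existing candidate conf directory under hadoopHome,
    // or null when neither known layout is present.
    static File resolveConfDir(File hadoopHome) {
        File[] candidates = {
            new File(hadoopHome, "conf"),       // Hadoop 1.x layout
            new File(hadoopHome, "etc/hadoop")  // Hadoop 2.x layout
        };
        for (File dir : candidates) {
            if (dir.isDirectory()) {
                return dir;
            }
        }
        return null;
    }

    // Entry point mirroring the fallback: only consulted when no explicit
    // Hadoop configuration path is set.
    static File resolveFromEnvironment() {
        String home = System.getenv("HADOOP_HOME");
        return home == null ? null : resolveConfDir(new File(home));
    }
}
```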
[GitHub] flink pull request: [FLINK1443] Add support for replicating input ...
Github user fhueske closed the pull request at: https://github.com/apache/flink/pull/360
[GitHub] flink pull request: [FLINK1443] Add support for replicating input ...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/360#issuecomment-73025884 Merged as a19b4a02bfa5237e0dcd2b264da36229546f23c0
[GitHub] flink pull request: [FLINK-1415] Akka cleanups
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/319#issuecomment-73031242 Rebased on Stephan's latest master branch containing the subslot release fix and reduced memory footprint of archived ExecutionGraphs.
[GitHub] flink pull request: [FLINK-1442] Reduce memory consumption of arch...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/344
[jira] [Created] (FLINK-1477) YARN Client: Use HADOOP_HOME if set
Robert Metzger created FLINK-1477: --- Summary: YARN Client: Use HADOOP_HOME if set Key: FLINK-1477 URL: https://issues.apache.org/jira/browse/FLINK-1477 Project: Flink Issue Type: Bug Components: YARN Client Reporter: Robert Metzger Assignee: Robert Metzger As part of FLINK-592 I removed code to load the Hadoop configuration from the HADOOP_HOME variable. But it seems that we still have users where this deprecated variable is the only hadoop conf variable which is set.
[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.
[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306959#comment-14306959 ] ASF GitHub Bot commented on FLINK-1396: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73024197 I vote for keeping @aljoscha's original approach. Users might not notice the different interfaces there, so the Hadoop in the method name makes it more explicit. Also, it could lead to confusion because Flink's and Hadoop's InputFormats have pretty similar names (they actually only differ in the package names). Lastly, it would cause some work on Aljoscha's side to update the code and the documentation. Add hadoop input formats directly to the user API. -- Key: FLINK-1396 URL: https://issues.apache.org/jira/browse/FLINK-1396 Project: Flink Issue Type: Bug Reporter: Robert Metzger Assignee: Aljoscha Krettek
[jira] [Commented] (FLINK-1318) Make quoted String parsing optional and configurable for CSVInputFormats
[ https://issues.apache.org/jira/browse/FLINK-1318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306972#comment-14306972 ] ASF GitHub Bot commented on FLINK-1318: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/265 Make quoted String parsing optional and configurable for CSVInputFormats Key: FLINK-1318 URL: https://issues.apache.org/jira/browse/FLINK-1318 Project: Flink Issue Type: Improvement Components: Java API, Scala API Affects Versions: 0.8 Reporter: Fabian Hueske Assignee: Fabian Hueske Priority: Minor With the current implementation of the CSVInputFormat, quoted string parsing kicks in if the first non-whitespace character of a field is a double quote. I see two issues with this implementation: 1. Quoted String parsing cannot be disabled. 2. The quoting character is fixed to double quotes ("). I propose to add parameters to disable quoted String parsing and to set the quote character.
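[Editor's note] The configurable quoting FLINK-1318 proposes — an on/off switch plus a settable quote character — can be sketched as a small field parser. The names and behavior details (e.g. no escape handling, no surrounding whitespace handling) are illustrative simplifications, not Flink's actual CsvInputFormat API.

```java
public class QuotedFieldParser {
    private final boolean quotedParsingEnabled;
    private final char quoteChar;

    public QuotedFieldParser(boolean quotedParsingEnabled, char quoteChar) {
        this.quotedParsingEnabled = quotedParsingEnabled;
        this.quoteChar = quoteChar;
    }

    // Extract the first field of `line`, delimited by `delimiter`.
    // When quoted parsing is enabled and the field starts with the quote
    // character, everything up to the closing quote is the field (so the
    // delimiter may appear inside it); otherwise the raw text up to the
    // delimiter is returned unchanged.
    public String parseField(String line, char delimiter) {
        if (quotedParsingEnabled && !line.isEmpty() && line.charAt(0) == quoteChar) {
            int closing = line.indexOf(quoteChar, 1);
            if (closing < 0) {
                throw new IllegalArgumentException("Unterminated quoted field: " + line);
            }
            return line.substring(1, closing);
        }
        int end = line.indexOf(delimiter);
        return end < 0 ? line : line.substring(0, end);
    }
}
```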
[jira] [Commented] (FLINK-1471) Allow KeySelectors to implement ResultTypeQueryable
[ https://issues.apache.org/jira/browse/FLINK-1471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307049#comment-14307049 ] ASF GitHub Bot commented on FLINK-1471: --- Github user twalthr commented on the pull request: https://github.com/apache/flink/pull/359#issuecomment-73028436 The discussion is about how a Function without any generics (raw) should be treated. In general we have 3 possibilities: - Skip input validation completely if the Function implements ResultTypeQueryable - Introduce an interface/annotation SkipInputValidation for special use cases - This PR: Skip input validation if the Function is a raw type. Allow KeySelectors to implement ResultTypeQueryable --- Key: FLINK-1471 URL: https://issues.apache.org/jira/browse/FLINK-1471 Project: Flink Issue Type: Bug Components: Java API Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Timo Walther Fix For: 0.9 See https://github.com/apache/flink/pull/354
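The first option above (honoring ResultTypeQueryable on a KeySelector) can be illustrated with a small sketch. The interfaces below are simplified stand-ins for Flink's real ones, assumed here only for illustration; in particular, Flink's ResultTypeQueryable returns a TypeInformation&lt;T&gt;, for which a plain Class&lt;T&gt; serves as a stand-in:

```java
// Simplified stand-ins for Flink's KeySelector and ResultTypeQueryable, to
// illustrate how a selector can declare its key type explicitly when generic
// type extraction cannot infer it (e.g. for a raw type).
public class KeySelectorTypeSketch {

    interface KeySelector<IN, KEY> {
        KEY getKey(IN value);
    }

    interface ResultTypeQueryable<T> {
        Class<T> getProducedType(); // stand-in for TypeInformation<T>
    }

    /** A selector that declares its produced key type explicitly. */
    static class LengthKeySelector
            implements KeySelector<String, Integer>, ResultTypeQueryable<Integer> {
        public Integer getKey(String value) { return value.length(); }
        public Class<Integer> getProducedType() { return Integer.class; }
    }

    /** Prefer the explicitly declared type over reflective extraction. */
    static Class<?> keyType(KeySelector<?, ?> selector) {
        if (selector instanceof ResultTypeQueryable) {
            return ((ResultTypeQueryable<?>) selector).getProducedType();
        }
        // a real type extractor would fall back to reflective analysis here
        throw new UnsupportedOperationException("reflective extraction not sketched");
    }
}
```

The point of the design is that the declared type wins, so input validation never needs to reason about the selector's (possibly raw) generic signature.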
[jira] [Created] (FLINK-1478) Add strictly local input split assignment
Stephan Ewen created FLINK-1478: --- Summary: Add strictly local input split assignment Key: FLINK-1478 URL: https://issues.apache.org/jira/browse/FLINK-1478 Project: Flink Issue Type: New Feature Components: JobManager Affects Versions: 0.9 Reporter: Stephan Ewen Fix For: 0.9
[GitHub] flink pull request: [FLINK-1458] Allow Interfaces and abstract typ...
Github user aljoscha closed the pull request at: https://github.com/apache/flink/pull/357 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1442) Archived Execution Graph consumes too much memory
[ https://issues.apache.org/jira/browse/FLINK-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14307133#comment-14307133 ] ASF GitHub Bot commented on FLINK-1442: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/344 Archived Execution Graph consumes too much memory - Key: FLINK-1442 URL: https://issues.apache.org/jira/browse/FLINK-1442 Project: Flink Issue Type: Bug Components: JobManager Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Max Michels The JobManager archives the execution graphs for analysis of jobs. The graphs may consume a lot of memory. In particular, the execution edges in all2all connection patterns are extremely numerous and add up in memory consumption. The execution edges connect all parallel tasks, so for an all2all pattern between n and m tasks, there are n*m edges. For a parallelism of several hundred tasks, this can easily reach 100k objects and more, each with a set of metadata. I propose the following to solve that: 1. Clear all execution edges from the graph (the majority of the memory consumers) when it is given to the archiver. 2. Keep the map/list of the archived graphs behind a soft reference, so it will be removed under memory pressure before the JVM crashes. That may remove graphs from the history early, but is much preferable to the JVM crashing, in which case the graph is lost as well... 3. Long term: The graph should be archived somewhere else. Something like the History Server used by Hadoop and Hive would be a good idea.
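Point 2 of the proposal can be sketched with a soft-reference-backed archive. This is a minimal stand-alone illustration of the idea, not the actual JobManager archiver code; the class name is hypothetical:

```java
import java.lang.ref.SoftReference;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of point 2 above: keep archived execution graphs behind
// SoftReferences so the JVM may reclaim them under memory pressure
// instead of failing with an OutOfMemoryError.
public class SoftArchive<K, V> {
    private final Map<K, SoftReference<V>> archive = new LinkedHashMap<>();

    public void put(K jobId, V graph) {
        archive.put(jobId, new SoftReference<>(graph));
    }

    /** Returns the archived graph, or null if it was collected under memory pressure. */
    public V get(K jobId) {
        SoftReference<V> ref = archive.get(jobId);
        return ref == null ? null : ref.get();
    }
}
```

The JVM guarantees that all soft references are cleared before an OutOfMemoryError is thrown, which is exactly the "lose history early rather than crash" tradeoff described above.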
[GitHub] flink pull request: [FLINK-1396][FLINK-1303] Hadoop Input/Output d...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73020223 I addressed the comments. What do the others think about overloading readFile()? I made it like this on purpose, so that the user sees in the API that they are using Hadoop input formats or that they can be used.
[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.
[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306928#comment-14306928 ] ASF GitHub Bot commented on FLINK-1396: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73020223 I addressed the comments. What do the others think about overloading readFile()? I made it like this on purpose, so that the user sees in the API that they are using Hadoop input formats or that they can be used. Add hadoop input formats directly to the user API. -- Key: FLINK-1396 URL: https://issues.apache.org/jira/browse/FLINK-1396 Project: Flink Issue Type: Bug Reporter: Robert Metzger Assignee: Aljoscha Krettek
[GitHub] flink pull request: [FLINK-1484] Adds explicit disconnect message ...
Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/368#discussion_r24220454 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -125,6 +126,10 @@ Actor with ActorLogMessages with ActorLogging { override def postStop(): Unit = { log.info(s"Stopping job manager ${self.path}.") +// disconnect the registered task managers +instanceManager.getAllRegisteredInstances.asScala.foreach{ + _.getTaskManager ! Disconnected("JobManager is stopping")} + for((e,_) <- currentJobs.values){ e.fail(new Exception("The JobManager is shutting down.")) --- End diff -- Since we are cleaning up messages, maybe remove "The" so it is consistent with other messages.
[jira] [Commented] (FLINK-1484) JobManager restart does not notify the TaskManager
[ https://issues.apache.org/jira/browse/FLINK-1484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308540#comment-14308540 ] ASF GitHub Bot commented on FLINK-1484: --- Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/368#discussion_r24220454 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -125,6 +126,10 @@ Actor with ActorLogMessages with ActorLogging { override def postStop(): Unit = { log.info(s"Stopping job manager ${self.path}.") +// disconnect the registered task managers +instanceManager.getAllRegisteredInstances.asScala.foreach{ + _.getTaskManager ! Disconnected("JobManager is stopping")} + for((e,_) <- currentJobs.values){ e.fail(new Exception("The JobManager is shutting down.")) --- End diff -- Since we are cleaning up messages, maybe remove "The" so it is consistent with other messages. JobManager restart does not notify the TaskManager -- Key: FLINK-1484 URL: https://issues.apache.org/jira/browse/FLINK-1484 Project: Flink Issue Type: Bug Reporter: Till Rohrmann A JobManager restart can happen due to an uncaught exception. However, connected TaskManagers are not informed about the disconnection and continue sending messages to a JobManager with a reset state. TaskManagers should be informed about a possible restart and clean up their own state in such a case. Afterwards, they can try to reconnect to the restarted JobManager.
[jira] [Resolved] (FLINK-1455) ExternalSortLargeRecordsITCase.testSortWithShortMediumAndLargeRecords: Potential Memory leak
[ https://issues.apache.org/jira/browse/FLINK-1455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-1455. - Resolution: Invalid Not a memory leak. The error message appears only because a JVM I/O problem caused the test to abort prematurely. The root cause is a spurious JVM failure: {code} Caused by: java.io.IOException: Cannot allocate memory at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:65) at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:205) {code} ExternalSortLargeRecordsITCase.testSortWithShortMediumAndLargeRecords: Potential Memory leak Key: FLINK-1455 URL: https://issues.apache.org/jira/browse/FLINK-1455 Project: Flink Issue Type: Bug Components: Local Runtime Affects Versions: 0.9 Reporter: Robert Metzger Priority: Minor This error occurred in one of my Travis jobs: https://travis-ci.org/rmetzger/flink/jobs/48343022 Would be cool if somebody who knows the sorter better could verify/invalidate the issue. 
{code} Running org.apache.flink.runtime.operators.sort.ExternalSortLargeRecordsITCase java.lang.RuntimeException: Error obtaining the sorted input: Thread 'SortMerger spilling thread' terminated due to an exception: Cannot allocate memory at org.apache.flink.runtime.operators.sort.UnilateralSortMerger.getIterator(UnilateralSortMerger.java:593) at org.apache.flink.runtime.operators.sort.ExternalSortLargeRecordsITCase.testSortWithShortMediumAndLargeRecords(ExternalSortLargeRecordsITCase.java:285) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47) at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44) at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at 
org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264) at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153) at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124) at org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200) at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153) at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103) Caused by: java.io.IOException: Thread 'SortMerger spilling thread' terminated due to an exception: Cannot allocate memory at org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:770) Caused by: java.io.IOException: Cannot allocate memory at sun.nio.ch.FileDispatcherImpl.write0(Native Method) at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60) at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93) at sun.nio.ch.IOUtil.write(IOUtil.java:65) at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:205) at org.apache.flink.runtime.io.disk.iomanager.SegmentWriteRequest.write(AsynchronousFileIOChannel.java:267) at org.apache.flink.runtime.io.disk.iomanager.IOManagerAsync$WriterThread.run(IOManagerAsync.java:440) Tests run: 5,
[jira] [Commented] (FLINK-837) Allow FunctionAnnotations on Methods
[ https://issues.apache.org/jira/browse/FLINK-837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306890#comment-14306890 ] Stephan Ewen commented on FLINK-837: I guess that this is sufficient for now. Allow FunctionAnnotations on Methods Key: FLINK-837 URL: https://issues.apache.org/jira/browse/FLINK-837 Project: Flink Issue Type: Improvement Reporter: GitHub Import Labels: github-import Fix For: pre-apache Allowing function annotations on methods makes it possible to use them with anonymous inner classes. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/837 Created by: [StephanEwen|https://github.com/StephanEwen] Labels: Created at: Mon May 19 22:19:58 CEST 2014 State: open
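The mechanism behind the issue can be sketched as follows: if a function annotation also allows METHOD as a target, it can be attached to the map() method of an anonymous inner class, and the runtime can discover it reflectively. "ForwardedFieldsSketch" and the helper methods below are hypothetical stand-ins, not Flink's actual annotations or API:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Sketch: a function annotation that permits METHOD targets, so it can be
// placed inside an anonymous inner class. All names here are illustrative.
public class MethodAnnotationDemo {

    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface ForwardedFieldsSketch {
        String[] value();
    }

    interface MapFunction<I, O> {
        O map(I value);
    }

    /** An anonymous inner class carrying the annotation on its method. */
    static MapFunction<Integer, Integer> demoFunction() {
        return new MapFunction<Integer, Integer>() {
            @ForwardedFieldsSketch({"0"})
            public Integer map(Integer value) {
                return value + 1;
            }
        };
    }

    /** The runtime can read the annotation reflectively from the method. */
    static String[] forwardedFields(MapFunction<Integer, Integer> fn) throws Exception {
        ForwardedFieldsSketch ann = fn.getClass()
                .getMethod("map", Integer.class)
                .getAnnotation(ForwardedFieldsSketch.class);
        return ann == null ? null : ann.value();
    }
}
```

Without the METHOD target, the annotation could only go on a named class, which an anonymous inner class cannot provide.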
[jira] [Commented] (FLINK-1396) Add hadoop input formats directly to the user API.
[ https://issues.apache.org/jira/browse/FLINK-1396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14306943#comment-14306943 ] ASF GitHub Bot commented on FLINK-1396: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73022621 Hmm, yes. That's also a valid point. But on the other hand, new users might not even be aware of the different types of InputFormats. It would all look natural ;-) I am leaning more towards overloading, but would be fine with having separate functions as well. Add hadoop input formats directly to the user API. -- Key: FLINK-1396 URL: https://issues.apache.org/jira/browse/FLINK-1396 Project: Flink Issue Type: Bug Reporter: Robert Metzger Assignee: Aljoscha Krettek
[GitHub] flink pull request: [FLINK-1396][FLINK-1303] Hadoop Input/Output d...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73022621 Hmm, yes. That's also a valid point. But on the other hand, new users might not even be aware of the different types of InputFormats. It would all look natural ;-) I am leaning more towards overloading, but would be fine with having separate functions as well.
[GitHub] flink pull request: [FLINK-1396][FLINK-1303] Hadoop Input/Output d...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/363#issuecomment-73024197 I vote for keeping @aljoscha's original approach. Users might not notice the different interfaces there, so the "Hadoop" in the method name makes it more explicit. Also, it could lead to confusion because Flink's and Hadoop's InputFormats have very similar names (they actually differ only in the package names). Lastly, it would cause some work on Aljoscha's side to update the code and the documentation.
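The API-design tradeoff being debated in this thread can be made concrete with a small sketch. The signatures below are purely illustrative (the real ExecutionEnvironment API differs); the sketch only contrasts the two naming styles:

```java
// Sketch of the two API styles under discussion; hypothetical signatures,
// not the actual Flink ExecutionEnvironment API.
public class EnvironmentSketch {
    interface FlinkInputFormat<T> {}
    interface HadoopFileInputFormat<K, V> {}

    // Style 1: overloading -- same method name for both format families;
    // Hadoop use is implicit in the argument type.
    public <T> String readFile(FlinkInputFormat<T> format, String path) {
        return "flink:" + path;
    }
    public <K, V> String readFile(HadoopFileInputFormat<K, V> format, String path) {
        return "hadoop:" + path;
    }

    // Style 2: explicit name -- the "Hadoop" is visible at the call site.
    public <K, V> String readHadoopFile(HadoopFileInputFormat<K, V> format, String path) {
        return "hadoop:" + path;
    }
}
```

With overloading, the compiler dispatches on the format's static type and the call sites look uniform; with the explicit name, a reader can tell from the call site alone that a Hadoop format is involved, which is the argument made above for keeping it.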
[GitHub] flink pull request: [FLINK-1318] CsvInputFormat: Made quoted strin...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/265