[GitHub] flink pull request: [FLINK-1565][FLINK-2078] Document ExecutionCon...
Github user flinkqa commented on the pull request: https://github.com/apache/flink/pull/781#issuecomment-109172997 Tested pull request. Result:

fatal: No such remote 'totest'
fatal: 'totest' does not appear to be a git repository
fatal: The remote end hung up unexpectedly
error: pathspec 'totest/flink1565' did not match any file(s) known to git.

Running ./tools/qa-check.sh
Computing Flink QA-Check results (please be patient).
:+1: The number of javadoc errors was 174 and is now 174
:-1: The change increases the number of compiler warnings from 634 to 647

```diff
First 100 warnings:
diff: standard output: Broken pipe
1,139c1,149
[WARNING] bootstrap class path not set in conjunction with -source 1.6
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/api/common/operators/base/CollectorMapOperatorBase.java:[24,45] org.apache.flink.api.common.functions.GenericCollectorMap in org.apache.flink.api.common.functions has been deprecated
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/core/memory/MemorySegment.java:[968,38] sun.misc.Unsafe is internal proprietary API and may be removed in a future release
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/api/common/typeinfo/BasicTypeInfo.java:[56,8] serializable class org.apache.flink.api.common.typeinfo.BasicTypeInfo has no definition of serialVersionUID
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/api/common/typeinfo/NumericTypeInfo.java:[27,8] serializable class org.apache.flink.api.common.typeinfo.NumericTypeInfo has no definition of serialVersionUID
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/api/common/typeinfo/IntegerTypeInfo.java:[27,8] serializable class org.apache.flink.api.common.typeinfo.IntegerTypeInfo has no definition of serialVersionUID
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java:[579,25] found raw type: org.apache.flink.api.common.ExecutionConfig.Entry
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/api/common/ExecutionConfig.java:[613,23] serializable class org.apache.flink.api.common.ExecutionConfig.GlobalJobParameters has no definition of serialVersionUID
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/core/memory/MemoryUtils.java:[33,37] sun.misc.Unsafe is internal proprietary API and may be removed in a future release
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/core/memory/MemoryUtils.java:[42,32] sun.misc.Unsafe is internal proprietary API and may be removed in a future release
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/core/memory/MemoryUtils.java:[44,53] sun.misc.Unsafe is internal proprietary API and may be removed in a future release
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/core/memory/MemoryUtils.java:[46,41] sun.misc.Unsafe is internal proprietary API and may be removed in a future release
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/api/common/operators/Operator.java:[231,81] found raw type: org.apache.flink.api.common.operators.Operator
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/api/common/operators/GenericDataSourceBase.java:[49,17] found raw type: org.apache.flink.api.common.operators.GenericDataSourceBase.SplitDataProperties
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/api/common/operators/GenericDataSourceBase.java:[184,28] unchecked conversion
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/api/common/operators/Ordering.java:[116,47] found raw type: java.lang.Class
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/api/common/operators/GenericDataSinkBase.java:[154,97] found raw type: org.apache.flink.api.common.operators.Operator
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/api/common/operators/AbstractUdfOperator.java:[142,40] found raw type: java.lang.Class
[WARNING] /home/robert/qa-bot/flink/tools/_qa_workdir/flink/flink-core/src/main/java/org/apache/flink/api/common/operators/AbstractUdfOperator.java:[154,40] found raw type: java.lang.Class
```
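Several of the warnings above are of the form "serializable class ... has no definition of serialVersionUID". A minimal sketch of the usual fix (the class here is a made-up stand-in, not the actual Flink code): declare the field explicitly so the serialization contract no longer depends on a compiler-generated hash.

```java
import java.io.Serializable;

// Hypothetical stand-in for classes like BasicTypeInfo: an explicit
// serialVersionUID silences the javac warning and pins the serialized
// form across recompilations.
class ExampleTypeInfo implements Serializable {
    private static final long serialVersionUID = 1L;

    private final String typeName;

    ExampleTypeInfo(String typeName) {
        this.typeName = typeName;
    }

    String getTypeName() {
        return typeName;
    }
}
```

Without the explicit field, the JVM derives a UID from the class shape, so an otherwise-compatible refactoring can break deserialization of old instances.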
[GitHub] flink pull request: [FLINK-2155] Enforce import restriction on usa...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/790#issuecomment-109192811 Thank you for the contribution. +1 to merge. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2155) Add an additional checkstyle validation for illegal imports
[ https://issues.apache.org/jira/browse/FLINK-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574082#comment-14574082 ] ASF GitHub Bot commented on FLINK-2155: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/790#issuecomment-109192811 Thank you for the contribution. +1 to merge. Add an additional checkstyle validation for illegal imports --- Key: FLINK-2155 URL: https://issues.apache.org/jira/browse/FLINK-2155 Project: Flink Issue Type: Improvement Components: Build System Reporter: Lokesh Rajaram Assignee: Lokesh Rajaram Add an additional checkstyle validation for illegal imports. To begin with, the following two imports are marked as illegal: 1. org.apache.commons.lang3.Validate 2. org.apache.flink.shaded.* Implementation based on: http://checkstyle.sourceforge.net/config_imports.html#IllegalImport -- This message was sent by Atlassian JIRA (v6.3.4#6332)
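For illustration, a checkstyle fragment along the lines of the linked IllegalImport documentation could look like the sketch below. The property names follow the checkstyle docs (`illegalClasses` needs a sufficiently recent checkstyle release); the exact rule that was merged into Flink may differ.

```xml
<module name="TreeWalker">
  <!-- Sketch: reject the two imports named in the issue. -->
  <module name="IllegalImport">
    <!-- illegalPkgs entries are treated as package prefixes, so this
         covers everything under org.apache.flink.shaded -->
    <property name="illegalPkgs" value="org.apache.flink.shaded"/>
    <property name="illegalClasses" value="org.apache.commons.lang3.Validate"/>
  </module>
</module>
```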
[GitHub] flink pull request:
Github user thvasilo commented on the pull request: https://github.com/apache/flink/commit/27487ec6089adbea77266f194582ae476e50e928#commitcomment-11537565 In docs/libs/ml/quickstart.md on line 82: TODO: Need to transform the tuples to LabeledVector
[GitHub] flink pull request: [FLINK-2139] [streaming] Streaming outputforma...
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/789#issuecomment-109218281 Both are good points. As for the socket test I can do one of the following:
* Do not test it
* Test it with smaller data
* Mock the socket with some collection

I am not really happy with any of these options to be honest, but maybe I am just missing something. As for the ITCases, you are right; I have to admit that I was looking at the [AvroOutputFormatTest](https://github.com/apache/flink/blob/master/flink-staging/flink-avro/src/test/java/org/apache/flink/api/avro/AvroOutputFormatTest.java) and named my tests accordingly. So you would suggest renaming that too, then?
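The "mock the socket" option can be made deterministic without leaving the JVM: a throwaway server on a free local port collects whatever is written, so the assertion no longer races the network the way the flaky Travis run did. A rough self-contained sketch (all names here are hypothetical, not Flink API):

```java
import java.io.*;
import java.net.*;
import java.util.*;

final class SocketSink {
    // Writes the given lines through a real local TCP connection and
    // returns what the server side received. Port 0 asks the OS for a
    // free port, so parallel test runs do not collide.
    static List<String> roundTrip(List<String> lines) {
        try (ServerSocket server = new ServerSocket(0)) {
            List<String> received = Collections.synchronizedList(new ArrayList<>());
            Thread reader = new Thread(() -> {
                try (Socket s = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(s.getInputStream()))) {
                    String line;
                    while ((line = in.readLine()) != null) {
                        received.add(line);
                    }
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            reader.start();
            try (Socket client = new Socket("localhost", server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
                for (String l : lines) {
                    out.println(l);
                }
            } // closing the client ends the reader's readLine() loop
            reader.join();
            return received;
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Because the reader stops only when the writer closes the connection, no data is lost to timing, unlike a test against an externally managed socket.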
[jira] [Commented] (FLINK-2000) Add SQL-style aggregations for Table API
[ https://issues.apache.org/jira/browse/FLINK-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574161#comment-14574161 ] ASF GitHub Bot commented on FLINK-2000: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/782#issuecomment-109211373 This looks good. :+1: Any objections to me merging this later? Add SQL-style aggregations for Table API Key: FLINK-2000 URL: https://issues.apache.org/jira/browse/FLINK-2000 Project: Flink Issue Type: Improvement Components: Table API Reporter: Aljoscha Krettek Assignee: Cheng Hao Priority: Minor Right now, the syntax for aggregations is a.count, a.min and so on. We could in addition offer COUNT(a), MIN(a) and so on.
[jira] [Commented] (FLINK-2153) Exclude dependency on hbase annotations module
[ https://issues.apache.org/jira/browse/FLINK-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574189#comment-14574189 ] Stephan Ewen commented on FLINK-2153: - Any updates on that? I would like to have this in the 0.9 release, and we'd like the first release candidate out any time now. Exclude dependency on hbase annotations module -- Key: FLINK-2153 URL: https://issues.apache.org/jira/browse/FLINK-2153 Project: Flink Issue Type: Bug Components: Build System Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Lokesh Rajaram [ERROR] Failed to execute goal on project flink-hbase: Could not resolve dependencies for project org.apache.flink:flink-hbase:jar:0.9-SNAPSHOT: Could not find artifact jdk.tools:jdk.tools:jar:1.7 at specified path /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/../lib/tools.jar There is a Spark issue for this [1] with a solution [2]. [1] https://issues.apache.org/jira/browse/SPARK-4455 [2] https://github.com/apache/spark/pull/3286/files
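The linked Spark fix excludes the phantom jdk.tools artifact that HBase drags in; transplanted to flink-hbase, the pom change would look roughly like this (a sketch: the artifactId and version property are placeholders, and the real fix may need the exclusion on several HBase dependencies).

```xml
<dependency>
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>${hbase.version}</version>
  <exclusions>
    <!-- jdk.tools:jdk.tools resolves to a system-scoped tools.jar that
         does not exist at the expected path on some JDK layouts
         (e.g. the Apple JDK 1.6 path in the error above) -->
    <exclusion>
      <groupId>jdk.tools</groupId>
      <artifactId>jdk.tools</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```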
[jira] [Commented] (FLINK-2139) Test Streaming Outputformats
[ https://issues.apache.org/jira/browse/FLINK-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574196#comment-14574196 ] ASF GitHub Bot commented on FLINK-2139: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/789#issuecomment-109216797 I think we need to name the tests ITCase. In my understanding, the tests that bring up a test cluster and execute a program are ITCases, since they take longer to run than simple tests. I might be wrong, though. Anyone else have an opinion? Test Streaming Outputformats Key: FLINK-2139 URL: https://issues.apache.org/jira/browse/FLINK-2139 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Márton Balassi Fix For: 0.9 Currently the only tested streaming core output is writeAsText, and that is only tested indirectly in integration tests.
[jira] [Commented] (FLINK-2002) Iterative test fails when run with other tests in the same environment
[ https://issues.apache.org/jira/browse/FLINK-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574199#comment-14574199 ] Márton Balassi commented on FLINK-2002: --- [~gyfora] has already fixed this, I am adding a commit re-enabling the test that revealed the issue.

Iterative test fails when run with other tests in the same environment
--
Key: FLINK-2002 URL: https://issues.apache.org/jira/browse/FLINK-2002 Project: Flink Issue Type: Bug Components: Streaming Reporter: Péter Szabó Assignee: Márton Balassi

I run tests in the same StreamExecutionEnvironment with MultipleProgramsTestBase. One of the tests uses an iterative data stream. It fails, as do all tests after it. (When I put the iterative test in a separate environment, all tests pass.) To me it seems to be a state-related issue, but there is also some problem with the broker slots. The iterative test throws:

java.lang.Exception: TaskManager sent illegal state update: CANCELING
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:618)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply$mcV$sp(JobManager.scala:222)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:221)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:221)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:314)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:95)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
    at akka.dispatch.Mailbox.run(Mailbox.scala:221)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: Attempt #0 (GroupedActiveDiscretizer (2/4)) @ (unassigned) - [SCHEDULED] with groupID e8f7c9c85e64403962648bc7e2aead8b in sharing group SlotSharingGroup [5e62f1cc5cae2c088430ef935470a8d5, 5bc227941969d1daa1ebb1ba070b55ce, d999ee6c10730775a8fef1c6f1af1dbd, 45b73caa75424d84adbb7bb92671591d, 5c94c54d9316b827c6eba6c721329549, 794d6c56bee347dcdd62ffdf189de267, 4c3b72e17a4acecde4241fe6e63355b8, f6a6028c224a7b81e4802eeaf9c8487e,
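The NoResourceAvailableException in the trace points at the slot budget: when the failure is a matter of test-cluster sizing rather than a cleanup bug, the two knobs the message mentions live in flink-conf.yaml. The values below are illustrative only, not a recommendation for this test.

```yaml
# flink-conf.yaml -- illustrative values only
# Slots offered by each TaskManager; every parallel subtask occupies
# one slot from its slot sharing group.
taskmanager.numberOfTaskSlots: 4
# Default parallelism used when the program does not set one itself.
parallelism.default: 2
```

In the failing test, though, the comments above suggest the slots were leaked by an earlier job rather than genuinely exhausted, so raising the slot count would only mask the cleanup issue.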
[jira] [Commented] (FLINK-2139) Test Streaming Outputformats
[ https://issues.apache.org/jira/browse/FLINK-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574207#comment-14574207 ] ASF GitHub Bot commented on FLINK-2139: --- Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/789#issuecomment-109218281 Both are good points. As for the socket test I can do one of the following:
* Do not test it
* Test it with smaller data
* Mock the socket with some collection

I am not really happy with any of these options to be honest, but maybe I am just missing something. As for the ITCases, you are right; I have to admit that I was looking at the [AvroOutputFormatTest](https://github.com/apache/flink/blob/master/flink-staging/flink-avro/src/test/java/org/apache/flink/api/avro/AvroOutputFormatTest.java) and named my tests accordingly. So you would suggest renaming that too, then?

Test Streaming Outputformats Key: FLINK-2139 URL: https://issues.apache.org/jira/browse/FLINK-2139 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Márton Balassi Fix For: 0.9 Currently the only tested streaming core output is writeAsText, and that is only tested indirectly in integration tests.
[GitHub] flink pull request: [FLINK-2123] Fix log4j warnings on CliFrontend...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/783#issuecomment-109199667 Merging ...
[GitHub] flink pull request: [FLINK-2000] [table] Add sql style aggregation...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/782#issuecomment-109214447 LGTM
[jira] [Commented] (FLINK-2000) Add SQL-style aggregations for Table API
[ https://issues.apache.org/jira/browse/FLINK-2000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574178#comment-14574178 ] ASF GitHub Bot commented on FLINK-2000: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/782#issuecomment-109214447 LGTM Add SQL-style aggregations for Table API Key: FLINK-2000 URL: https://issues.apache.org/jira/browse/FLINK-2000 Project: Flink Issue Type: Improvement Components: Table API Reporter: Aljoscha Krettek Assignee: Cheng Hao Priority: Minor Right now, the syntax for aggregations is a.count, a.min and so on. We could in addition offer COUNT(a), MIN(a) and so on.
[GitHub] flink pull request: Added member currentSplit for FileInputFormat....
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/791#issuecomment-109205245 Thank you for the contribution. +1 to merge.
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574191#comment-14574191 ] ASF GitHub Bot commented on FLINK-1319: --- Github user twalthr commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31798470

--- Diff: flink-java/src/main/java/org/apache/flink/api/java/functions/SemanticPropUtil.java ---
@@ -309,15 +308,20 @@ public static DualInputSemanticProperties getSemanticPropsDual(
 		getSemanticPropsDualFromString(result, forwardedFirst, forwardedSecond, nonForwardedFirst, nonForwardedSecond, readFirst, readSecond, inType1, inType2, outType);
 		return result;
-	} else {
-		return new DualInputSemanticProperties();
 	}
+	return null;
+}
+
+public static void getSemanticPropsSingleFromString(SingleInputSemanticProperties result,
+		String[] forwarded, String[] nonForwarded, String[] readSet,
+		TypeInformation<?> inType, TypeInformation<?> outType) {
+	getSemanticPropsSingleFromString(result, forwarded, nonForwarded, readSet, inType, outType, false);
 }

 public static void getSemanticPropsSingleFromString(SingleInputSemanticProperties result,
 		String[] forwarded, String[] nonForwarded, String[] readSet,
-		TypeInformation<?> inType, TypeInformation<?> outType)
-	{
+		TypeInformation<?> inType, TypeInformation<?> outType,
+		boolean skipIncompatibleTypes) {
--- End diff --

Sometimes the analyzer works better than required. E.g. the analyzer outputs @ForwardedFields("*->record.customer.name"), but if customer is a GenericType output type, the types are incompatible. I thought it is better to reuse the type compatibility checking of the PropUtil than to reimplement everything, but to skip types that are incompatible without throwing exceptions.
Add static code analysis for UDFs
-
Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor

Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly. Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with.

*Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3)
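To make the semantics concrete: the annotations (and the proposed static analysis) describe which fields a UDF copies through untouched, so the optimizer can keep an existing partitioning or sort on those fields. A self-contained plain-Java sketch of that access pattern, without Flink on the classpath and with made-up names, is:

```java
// A two-field record standing in for a Flink Tuple2.
final class Rec {
    final long key;      // forwarded unchanged: a partitioning on it stays valid
    final double value;  // rewritten by the UDF
    Rec(long key, double value) { this.key = key; this.value = value; }
}

final class ScaleValue {
    // The static analysis would observe that 'key' reaches the output
    // verbatim -- exactly what a hand-written forwarded-fields
    // annotation on this map function would declare.
    static Rec map(Rec in) {
        return new Rec(in.key, in.value * 2.0);
    }
}
```

Inferring this automatically removes the risk of a stale hand-written annotation, which is a correctness bug, not just a missed optimization.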
[jira] [Commented] (FLINK-2139) Test Streaming Outputformats
[ https://issues.apache.org/jira/browse/FLINK-2139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574097#comment-14574097 ] ASF GitHub Bot commented on FLINK-2139: --- Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/789#issuecomment-109196655 The socket test failed once in ten runs on Travis; some messages were lost while writing to and reading from the socket. I do not think that we can avoid that completely. I can feed in less input data or remove the test. Test Streaming Outputformats Key: FLINK-2139 URL: https://issues.apache.org/jira/browse/FLINK-2139 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Márton Balassi Fix For: 0.9 Currently the only tested streaming core output is writeAsText, and that is only tested indirectly in integration tests.
[GitHub] flink pull request: [FLINK-2139] [streaming] Streaming outputforma...
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/789#issuecomment-109196655 The socket test failed once in ten runs on Travis; some messages were lost while writing to and reading from the socket. I do not think that we can avoid that completely. I can feed in less input data or remove the test.
[jira] [Commented] (FLINK-2002) Iterative test fails when run with other tests in the same environment
[ https://issues.apache.org/jira/browse/FLINK-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574153#comment-14574153 ] Stephan Ewen commented on FLINK-2002: - This seems to be related to improper cleanup after a job.

Iterative test fails when run with other tests in the same environment
--
Key: FLINK-2002 URL: https://issues.apache.org/jira/browse/FLINK-2002 Project: Flink Issue Type: Bug Components: Streaming Reporter: Péter Szabó

I run tests in the same StreamExecutionEnvironment with MultipleProgramsTestBase. One of the tests uses an iterative data stream. It fails, as do all tests after it. (When I put the iterative test in a separate environment, all tests pass.) To me it seems to be a state-related issue, but there is also some problem with the broker slots. The iterative test throws:

java.lang.Exception: TaskManager sent illegal state update: CANCELING
    at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:618)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply$mcV$sp(JobManager.scala:222)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:221)
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:221)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
    at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
    at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
    at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:314)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
    at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29)
    at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
    at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29)
    at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
    at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:95)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
    at akka.actor.ActorCell.invoke(ActorCell.scala:487)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
    at akka.dispatch.Mailbox.run(Mailbox.scala:221)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. Task to schedule: Attempt #0 (GroupedActiveDiscretizer (2/4)) @ (unassigned) - [SCHEDULED] with groupID e8f7c9c85e64403962648bc7e2aead8b in sharing group SlotSharingGroup [5e62f1cc5cae2c088430ef935470a8d5, 5bc227941969d1daa1ebb1ba070b55ce, d999ee6c10730775a8fef1c6f1af1dbd, 45b73caa75424d84adbb7bb92671591d, 5c94c54d9316b827c6eba6c721329549, 794d6c56bee347dcdd62ffdf189de267, 4c3b72e17a4acecde4241fe6e63355b8, f6a6028c224a7b81e4802eeaf9c8487e, 989c68790fc7c5e2f8b8c150a33fef89, db93daa1f9e5194f0079df2629b08efb,
[GitHub] flink pull request: [FLINK-2072] [ml] [docs] Add a quickstart guid...
GitHub user thvasilo opened a pull request: https://github.com/apache/flink/pull/792 [FLINK-2072] [ml] [docs] Add a quickstart guide for FlinkML This is an initial version of the quickstart guide. There are some issues that still need to be addressed such as the validity of standardizing the data, and whether the complete code example should be included in an examples package for FlinkML. You can merge this pull request into a Git repository by running: $ git pull https://github.com/thvasilo/flink quickstart-ml Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/792.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #792 commit 27487ec6089adbea77266f194582ae476e50e928 Author: Theodore Vasiloudis t...@sics.se Date: 2015-06-05T09:09:11Z Initial version of quickstart guide
[jira] [Commented] (FLINK-2163) VertexCentricConfigurationITCase sometimes fails on Travis
[ https://issues.apache.org/jira/browse/FLINK-2163?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14574184#comment-14574184 ] Stephan Ewen commented on FLINK-2163: - It is a strong suspicion. Other file-based tests have failed with a similar error before. collect()-based tests do not seem to exhibit that issue.

VertexCentricConfigurationITCase sometimes fails on Travis
--
Key: FLINK-2163 URL: https://issues.apache.org/jira/browse/FLINK-2163 Project: Flink Issue Type: Bug Components: Gelly Reporter: Aljoscha Krettek

This is the relevant output from the log:
{code}
testIterationINDirection[Execution mode = CLUSTER](org.apache.flink.graph.test.VertexCentricConfigurationITCase) Time elapsed: 0.587 sec FAILURE!
java.lang.AssertionError: Different number of lines in expected and obtained result. expected:5 but was:2
    at org.junit.Assert.fail(Assert.java:88)
    at org.junit.Assert.failNotEquals(Assert.java:743)
    at org.junit.Assert.assertEquals(Assert.java:118)
    at org.junit.Assert.assertEquals(Assert.java:555)
    at org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:270)
    at org.apache.flink.test.util.TestBaseUtils.compareResultsByLinesInMemory(TestBaseUtils.java:256)
    at org.apache.flink.graph.test.VertexCentricConfigurationITCase.after(VertexCentricConfigurationITCase.java:70)

Results :
Failed tests:
VertexCentricConfigurationITCase.after:70->TestBaseUtils.compareResultsByLinesInMemory:256->TestBaseUtils.compareResultsByLinesInMemory:270 Different number of lines in expected and obtained result. expected:5 but was:2
{code}
https://travis-ci.org/aljoscha/flink/jobs/65403502
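For reference, an order-insensitive line comparison of this kind boils down to a size check followed by sorting both sides, something like the sketch below (a simplification, not the actual TestBaseUtils code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch in the spirit of TestBaseUtils.compareResultsByLinesInMemory:
// a size mismatch like "expected 5 but was 2" fails before any content
// is compared, which is why a partially flushed result file trips the
// "Different number of lines" assertion first.
final class LineCompare {
    static void assertSameLines(List<String> expected, List<String> obtained) {
        if (expected.size() != obtained.size()) {
            throw new AssertionError("Different number of lines in expected and obtained result. expected:<"
                    + expected.size() + "> but was:<" + obtained.size() + ">");
        }
        List<String> e = new ArrayList<>(expected);
        List<String> o = new ArrayList<>(obtained);
        Collections.sort(e);
        Collections.sort(o);
        if (!e.equals(o)) {
            throw new AssertionError("Line content differs: " + e + " vs " + o);
        }
    }
}
```

A collect()-based test hands the results to this check directly from memory, skipping the file system round trip that the suspected flakiness lives in.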
[GitHub] flink pull request:
Github user thvasilo commented on the pull request: https://github.com/apache/flink/commit/27487ec6089adbea77266f194582ae476e50e928#commitcomment-11537784 In docs/libs/ml/quickstart.md on line 69: This way of reading in CSVs can get unwieldy fast. We need a more concise way to do this.
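The review comment above asks for a more concise way to read CSVs in the guide. As a rough, self-contained illustration of the kind of small helper being asked for (plain Java, hypothetical names — not Flink's CSV input API), parsing "label,f1,f2,..." lines might look like:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: parse CSV lines of numeric fields into double[] rows. */
public class CsvSketch {
    /** Splits one comma-separated line and parses each field as a double. */
    public static double[] parseLine(String line) {
        String[] parts = line.split(",");
        double[] row = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            row[i] = Double.parseDouble(parts[i].trim());
        }
        return row;
    }

    /** Applies parseLine to every line. */
    public static List<double[]> parseAll(List<String> lines) {
        List<double[]> rows = new ArrayList<>();
        for (String l : lines) {
            rows.add(parseLine(l));
        }
        return rows;
    }
}
```

In the actual guide the reading would go through Flink's CSV input facilities; the sketch only shows how little code the per-line conversion itself needs.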
[GitHub] flink pull request: [FLINK-2160] Change Streaming Source Interface...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/785#issuecomment-109173098 @mbalassi What's the reason for not wanting emit and keeping collect? I am aware that other parts of the system have a Collector with a collect method but the sources are somewhat special, I think. Also, collect() is a legacy from Hadoop MapReduce and I'm not sure it's a good name in the first place. :smile:
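To make the emit-vs-collect naming discussion concrete, here is a minimal, entirely hypothetical sketch of a source interface whose context method is named emit(); this is illustrative only and does not reflect Flink's actual streaming source interfaces:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the naming question: a source pushes records through a context. */
public class SourceNaming {
    /** The "emit" variant under discussion; real Flink interfaces differ. */
    interface SourceContext<T> {
        void emit(T record);
    }

    interface SourceFunction<T> {
        void run(SourceContext<T> ctx);
    }

    /** Runs a source against a context that collects emitted records into a list. */
    public static <T> List<T> runSource(SourceFunction<T> source) {
        List<T> out = new ArrayList<>();
        source.run(out::add);  // SourceContext<T> as a lambda: emit == append to list
        return out;
    }
}
```

The naming is purely cosmetic here, which is the point of the thread: both spellings push records through the same one-method context.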
[GitHub] flink pull request: [FLINK-2000] [table] Add sql style aggregation...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/782#issuecomment-109211373 This looks good. :+1: Any objections to me merging this later?
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user twalthr commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31798470 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/functions/SemanticPropUtil.java --- @@ -309,15 +308,20 @@ public static DualInputSemanticProperties getSemanticPropsDual( getSemanticPropsDualFromString(result, forwardedFirst, forwardedSecond, nonForwardedFirst, nonForwardedSecond, readFirst, readSecond, inType1, inType2, outType); return result; - } else { - return new DualInputSemanticProperties(); } + return null; + } + + public static void getSemanticPropsSingleFromString(SingleInputSemanticProperties result, + String[] forwarded, String[] nonForwarded, String[] readSet, + TypeInformation<?> inType, TypeInformation<?> outType) { + getSemanticPropsSingleFromString(result, forwarded, nonForwarded, readSet, inType, outType, false); } public static void getSemanticPropsSingleFromString(SingleInputSemanticProperties result, String[] forwarded, String[] nonForwarded, String[] readSet, - TypeInformation<?> inType, TypeInformation<?> outType) - { + TypeInformation<?> inType, TypeInformation<?> outType, + boolean skipIncompatibleTypes) { --- End diff -- Sometimes the analyzer works better than required. E.g. the analyzer outputs @ForwardedFields("*->record.customer.name") but if customer is a GenericType output type, the types are incompatible. I thought it is better to reuse the type compatibility checking of the PropUtil than reimplement everything, but skip types that are incompatible without throwing exceptions.
[jira] [Created] (FLINK-2164) Document batch and streaming startup modes
Robert Metzger created FLINK-2164: - Summary: Document batch and streaming startup modes Key: FLINK-2164 URL: https://issues.apache.org/jira/browse/FLINK-2164 Project: Flink Issue Type: Bug Components: Documentation Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Robert Metzger
[jira] [Commented] (FLINK-1892) Local job execution does not exit.
[ https://issues.apache.org/jira/browse/FLINK-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574222#comment-14574222 ] Robert Metzger commented on FLINK-1892: --- [~ktzoumas], I'll take over the issue, ok? Local job execution does not exit. -- Key: FLINK-1892 URL: https://issues.apache.org/jira/browse/FLINK-1892 Project: Flink Issue Type: Bug Components: Flink on Tez Reporter: Kostas Tzoumas Assignee: Kostas Tzoumas When using the LocalTezEnvironment to run a job from the IDE the job fails to exit after producing data. The following thread seems to run and not allow the job to exit: Thread-31 #46 prio=5 os_prio=31 tid=0x7fb5d2c43000 nid=0x5507 runnable [0x000127319000] java.lang.Thread.State: RUNNABLE at java.lang.Throwable.fillInStackTrace(Native Method) at java.lang.Throwable.fillInStackTrace(Throwable.java:783) - locked 0x00076dfda130 (a java.lang.InterruptedException) at java.lang.Throwable.init(Throwable.java:250) at java.lang.Exception.init(Exception.java:54) at java.lang.InterruptedException.init(InterruptedException.java:57) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220) at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335) at java.util.concurrent.PriorityBlockingQueue.take(PriorityBlockingQueue.java:545) at org.apache.tez.dag.app.rm.LocalTaskSchedulerService$AsyncDelegateRequestHandler.processRequest(LocalTaskSchedulerService.java:322) at org.apache.tez.dag.app.rm.LocalTaskSchedulerService$AsyncDelegateRequestHandler.run(LocalTaskSchedulerService.java:316) at java.lang.Thread.run(Thread.java:745)
[jira] [Assigned] (FLINK-1892) Local job execution does not exit.
[ https://issues.apache.org/jira/browse/FLINK-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger reassigned FLINK-1892: - Assignee: Robert Metzger (was: Kostas Tzoumas) Local job execution does not exit. -- Key: FLINK-1892 URL: https://issues.apache.org/jira/browse/FLINK-1892 Project: Flink Issue Type: Bug Components: Flink on Tez Reporter: Kostas Tzoumas Assignee: Robert Metzger
[jira] [Created] (FLINK-2165) Rename Table conversion methods in TableEnvironment
Fabian Hueske created FLINK-2165: Summary: Rename Table conversion methods in TableEnvironment Key: FLINK-2165 URL: https://issues.apache.org/jira/browse/FLINK-2165 Project: Flink Issue Type: Improvement Components: Table API Affects Versions: 0.9 Reporter: Fabian Hueske Priority: Minor Fix For: 0.9 The {{TableEnvironment}} provides methods to convert DataSets and DataStreams into Tables and back. These methods are called {{toTable()}}, {{toSet()}}, and {{toStream()}}. I propose to rename the methods to {{fromDataSet()}}, {{fromDataStream()}}, {{toDataSet()}}, and {{toDataStream()}} for the following reasons: - {{fromDataSet()}} and {{fromDataStream()}} are closer to the SQL FROM expression - It allows adding methods such as {{fromCSV()}}, {{fromHCat()}}, {{fromParquet()}}, and so on to the {{TableEnvironment}} - {{toSet()}} and {{toStream()}} should be renamed for consistency.
[jira] [Commented] (FLINK-1940) StockPrice example cannot be visualized
[ https://issues.apache.org/jira/browse/FLINK-1940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574157#comment-14574157 ] Stephan Ewen commented on FLINK-1940: - I think Marton removed that example, so it is not relevant to any packaged example any more. StockPrice example cannot be visualized --- Key: FLINK-1940 URL: https://issues.apache.org/jira/browse/FLINK-1940 Project: Flink Issue Type: Bug Components: Streaming Reporter: Gyula Fora The plan visualizer fails on the JSON generated by the StockPrice example
[jira] [Commented] (FLINK-1892) Local job execution does not exit.
[ https://issues.apache.org/jira/browse/FLINK-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574167#comment-14574167 ] Stephan Ewen commented on FLINK-1892: - Let's bump the Tez version to 0.6.1 to get the fix in. Local job execution does not exit. -- Key: FLINK-1892 URL: https://issues.apache.org/jira/browse/FLINK-1892 Project: Flink Issue Type: Bug Components: Flink on Tez Reporter: Kostas Tzoumas Assignee: Kostas Tzoumas
[jira] [Commented] (FLINK-1844) Add Normaliser to ML library
[ https://issues.apache.org/jira/browse/FLINK-1844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574166#comment-14574166 ] Theodore Vasiloudis commented on FLINK-1844: No worries [~fobeligi], thank you for your contribution. Keep us updated. Add Normaliser to ML library Key: FLINK-1844 URL: https://issues.apache.org/jira/browse/FLINK-1844 Project: Flink Issue Type: Improvement Components: Machine Learning Library Reporter: Faye Beligianni Assignee: Faye Beligianni Priority: Minor Labels: ML, Starter In many ML algorithms it is better for the features' values to lie within a given range, usually the range (0,1) [1]. Therefore, a {{Transformer}} could be implemented to achieve that normalisation. Resources: [1][http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html]
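The {{Transformer}} proposed in FLINK-1844 amounts to min-max rescaling, x' = (x - min) / (max - min). A self-contained plain-Java sketch of that formula (illustrative only, not the eventual flink-ml implementation):

```java
/** Sketch of min-max normalisation of one feature column into [0, 1]. */
public class MinMaxSketch {
    /** Rescales values via (x - min) / (max - min); a constant column maps to all zeros. */
    public static double[] minMaxScale(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = range == 0.0 ? 0.0 : (values[i] - min) / range;
        }
        return out;
    }
}
```

A real flink-ml transformer would compute min and max as a distributed reduction over a DataSet rather than a local loop, but the per-value formula is the same.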
[jira] [Assigned] (FLINK-2002) Iterative test fails when ran with other tests in the same environment
[ https://issues.apache.org/jira/browse/FLINK-2002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Márton Balassi reassigned FLINK-2002: - Assignee: Márton Balassi Iterative test fails when ran with other tests in the same environment -- Key: FLINK-2002 URL: https://issues.apache.org/jira/browse/FLINK-2002 Project: Flink Issue Type: Bug Components: Streaming Reporter: Péter Szabó Assignee: Márton Balassi I run tests in the same StreamExecutionEnvironment with MultipleProgramsTestBase. One of the tests uses an iterative data stream. It fails as well as all tests after that. (When I put the iterative test in a separate environment, all tests passes.) For me it seems that it is a state-related issue but there is also some problem with the broker slots. The iterative test throws: java.lang.Exception: TaskManager sent illegal state update: CANCELING at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:618) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply$mcV$sp(JobManager.scala:222) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:221) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:221) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) org.apache.flink.runtime.client.JobExecutionException: Job execution failed. at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:314) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:36) at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:29) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:29) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:95) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) at akka.dispatch.Mailbox.run(Mailbox.scala:221) at akka.dispatch.Mailbox.exec(Mailbox.scala:231) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Not enough free slots available to run the job. You can decrease the operator parallelism or increase the number of slots per TaskManager in the configuration. 
Task to schedule: Attempt #0 (GroupedActiveDiscretizer (2/4)) @ (unassigned) - [SCHEDULED] with groupID e8f7c9c85e64403962648bc7e2aead8b in sharing group SlotSharingGroup [5e62f1cc5cae2c088430ef935470a8d5, 5bc227941969d1daa1ebb1ba070b55ce, d999ee6c10730775a8fef1c6f1af1dbd, 45b73caa75424d84adbb7bb92671591d, 5c94c54d9316b827c6eba6c721329549, 794d6c56bee347dcdd62ffdf189de267, 4c3b72e17a4acecde4241fe6e63355b8, f6a6028c224a7b81e4802eeaf9c8487e, 989c68790fc7c5e2f8b8c150a33fef89, db93daa1f9e5194f0079df2629b08efb, bf7dbb1fd756ce322249eb973844b375,
[GitHub] flink pull request: [FLINK-2139] [streaming] Streaming outputforma...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/789#issuecomment-109216797 I think we need to name the tests ITCase. In my understanding the tests that bring up a test cluster and execute a program are ITCases since they take longer to run than simple tests. I might be wrong, though. Anyone else have an opinion?
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31799272 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/functions/SemanticPropUtil.java --- @@ -309,15 +308,20 @@ public static DualInputSemanticProperties getSemanticPropsDual( getSemanticPropsDualFromString(result, forwardedFirst, forwardedSecond, nonForwardedFirst, nonForwardedSecond, readFirst, readSecond, inType1, inType2, outType); return result; - } else { - return new DualInputSemanticProperties(); } + return null; + } + + public static void getSemanticPropsSingleFromString(SingleInputSemanticProperties result, + String[] forwarded, String[] nonForwarded, String[] readSet, + TypeInformation<?> inType, TypeInformation<?> outType) { + getSemanticPropsSingleFromString(result, forwarded, nonForwarded, readSet, inType, outType, false); } public static void getSemanticPropsSingleFromString(SingleInputSemanticProperties result, String[] forwarded, String[] nonForwarded, String[] readSet, - TypeInformation<?> inType, TypeInformation<?> outType) - { + TypeInformation<?> inType, TypeInformation<?> outType, + boolean skipIncompatibleTypes) { --- End diff -- OK, got it. Thanks for explaining. :-)
[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs
[ https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574204#comment-14574204 ] ASF GitHub Bot commented on FLINK-1319: --- Github user fhueske commented on a diff in the pull request: https://github.com/apache/flink/pull/729#discussion_r31799272 --- Diff: flink-java/src/main/java/org/apache/flink/api/java/functions/SemanticPropUtil.java --- @@ -309,15 +308,20 @@ public static DualInputSemanticProperties getSemanticPropsDual( getSemanticPropsDualFromString(result, forwardedFirst, forwardedSecond, nonForwardedFirst, nonForwardedSecond, readFirst, readSecond, inType1, inType2, outType); return result; - } else { - return new DualInputSemanticProperties(); } + return null; + } + + public static void getSemanticPropsSingleFromString(SingleInputSemanticProperties result, + String[] forwarded, String[] nonForwarded, String[] readSet, + TypeInformation<?> inType, TypeInformation<?> outType) { + getSemanticPropsSingleFromString(result, forwarded, nonForwarded, readSet, inType, outType, false); } public static void getSemanticPropsSingleFromString(SingleInputSemanticProperties result, String[] forwarded, String[] nonForwarded, String[] readSet, - TypeInformation<?> inType, TypeInformation<?> outType) - { + TypeInformation<?> inType, TypeInformation<?> outType, + boolean skipIncompatibleTypes) { --- End diff -- OK, got it. Thanks for explaining. :-) Add static code analysis for UDFs - Key: FLINK-1319 URL: https://issues.apache.org/jira/browse/FLINK-1319 Project: Flink Issue Type: New Feature Components: Java API, Scala API Reporter: Stephan Ewen Assignee: Timo Walther Priority: Minor Flink's Optimizer takes information that tells it for UDFs which fields of the input elements are accessed, modified, or forwarded/copied. This information frequently helps to reuse partitionings, sorts, etc. It may speed up programs significantly, as it can frequently eliminate sorts and shuffles, which are costly.
Right now, users can add lightweight annotations to UDFs to provide this information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}). We worked with static code analysis of UDFs before, to determine this information automatically. This is an incredible feature, as it magically makes programs faster. For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this works surprisingly well in many cases. We used the Soot toolkit for the static code analysis. Unfortunately, Soot is LGPL licensed and thus we did not include any of the code so far. I propose to add this functionality to Flink, in the form of a drop-in addition, to work around the LGPL incompatibility with ASL 2.0. Users could simply download a special flink-code-analysis.jar and drop it into the lib folder to enable this functionality. We may even add a script to tools that downloads that library automatically into the lib folder. This should be legally fine, since we do not redistribute LGPL code and only dynamically link it (the incompatibility with ASL 2.0 is mainly in the patentability, if I remember correctly). Prior work on this has been done by [~aljoscha] and [~skunert], which could provide a code base to start with. *Appendix* Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/ Papers on static analysis and for optimization: http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf Quick introduction to the Optimizer: http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf (Section 6) Optimizer for Iterations: http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf (Sections 4.3 and 5.3)
[GitHub] flink pull request: [FLINK-2165] [TableAPI] Renamed table conversi...
GitHub user fhueske opened a pull request: https://github.com/apache/flink/pull/793 [FLINK-2165] [TableAPI] Renamed table conversion functions in TableEnvironment You can merge this pull request into a Git repository by running: $ git pull https://github.com/fhueske/flink fromDataSet Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/793.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #793 commit 9dd5ca596282786c0ae4e2b104e4f36de48554f6 Author: Fabian Hueske fhue...@apache.org Date: 2015-06-05T09:37:39Z [FLINK-2165] [TableAPI] Renamed table conversion functions in TableEnvironment
[jira] [Resolved] (FLINK-2065) Cancelled jobs finish with final state FAILED
[ https://issues.apache.org/jira/browse/FLINK-2065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi resolved FLINK-2065. Resolution: Fixed Fix Version/s: 0.9 Fixed in 5a7ceda61227336115723da969ee649202a8dbb6. Cancelled jobs finish with final state FAILED - Key: FLINK-2065 URL: https://issues.apache.org/jira/browse/FLINK-2065 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: 0.9 Reporter: Robert Metzger Fix For: 0.9 Attachments: failed.tgz While running some experiments, I've noticed that jobs sometimes finish in FAILED, even though I've cancelled them. The reported error is {code} hdp22-kafka-w-0.c.astral-sorter-757.internal Error: java.lang.IllegalStateException: Buffer has already been recycled. at org.apache.flink.shaded.com.google.common.base.Preconditions.checkState(Preconditions.java:173) at org.apache.flink.runtime.io.network.buffer.Buffer.ensureNotRecycled(Buffer.java:142) at org.apache.flink.runtime.io.network.buffer.Buffer.getMemorySegment(Buffer.java:78) at org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.setNextBuffer(SpillingAdaptiveSpanningRecordDeserializer.java:72) at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:80) at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34) at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73) at org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96) at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:562) at java.lang.Thread.run(Thread.java:745) {code} The logs: {code} 16:29:37,212 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Trying to cancel job with ID 
ecccf02327c70c9e35770c6da37638e1. 16:29:37,214 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Status of job ecccf02327c70c9e35770c6da37638e1 (Simple big union) changed to CANCELLING . 16:31:15,581 INFO org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Status of job ecccf02327c70c9e35770c6da37638e1 (Simple big union) changed to FAILING Buffer has already been recycled.. {code}
[jira] [Created] (FLINK-2166) Add fromCsvFile() to TableEnvironment
Fabian Hueske created FLINK-2166: Summary: Add fromCsvFile() to TableEnvironment Key: FLINK-2166 URL: https://issues.apache.org/jira/browse/FLINK-2166 Project: Flink Issue Type: New Feature Components: Table API Affects Versions: 0.9 Reporter: Fabian Hueske Priority: Minor Add a {{fromCsvFile()}} method to the {{TableEnvironment}} to read a {{Table}} from a CSV file. The implementation should reuse Flink's CsvInputFormat.
[jira] [Created] (FLINK-2167) Add fromHCat() to TableEnvironment
Fabian Hueske created FLINK-2167: Summary: Add fromHCat() to TableEnvironment Key: FLINK-2167 URL: https://issues.apache.org/jira/browse/FLINK-2167 Project: Flink Issue Type: New Feature Components: Table API Affects Versions: 0.9 Reporter: Fabian Hueske Priority: Minor Add a {{fromHCat()}} method to the {{TableEnvironment}} to read a {{Table}} from an HCatalog table. The implementation could reuse Flink's HCatInputFormat.
[GitHub] flink pull request: Added member currentSplit for FileInputFormat....
Github user peedeeX21 commented on the pull request: https://github.com/apache/flink/pull/791#issuecomment-109249506 Travis fails executing this test: `KafkaITCase.testPersistentSourceWithOffsetUpdates`. Unfortunately I have no idea what the problem is. I am not even sure if this is related to my commit. Can someone have a look at it? Thanks
[jira] [Commented] (FLINK-1863) RemoteInputChannelTest.testConcurrentOnBufferAndRelease fails on travis
[ https://issues.apache.org/jira/browse/FLINK-1863?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574312#comment-14574312 ] Ufuk Celebi commented on FLINK-1863: Not occurring any more. I tried to debug this, but could not reproduce it with many (20+) Travis runs and didn't notice anything in the code. RemoteInputChannelTest.testConcurrentOnBufferAndRelease fails on travis --- Key: FLINK-1863 URL: https://issues.apache.org/jira/browse/FLINK-1863 Project: Flink Issue Type: Bug Components: Tests Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Ufuk Celebi {code} testConcurrentOnBufferAndRelease(org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannelTest) Time elapsed: 120.022 sec ERROR! java.lang.Exception: test timed out after 120000 milliseconds at sun.misc.Unsafe.park(Native Method) at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186) at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838) at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:998) at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:248) at java.util.concurrent.FutureTask.get(FutureTask.java:111) at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannelTest.testConcurrentOnBufferAndRelease(RemoteInputChannelTest.java:124) {code} This is the build: https://s3.amazonaws.com/archive.travis-ci.org/jobs/57943450/log.txt
[jira] [Commented] (FLINK-2165) Rename Table conversion methods in TableEnvironment
[ https://issues.apache.org/jira/browse/FLINK-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574267#comment-14574267 ] ASF GitHub Bot commented on FLINK-2165: --- GitHub user fhueske opened a pull request: https://github.com/apache/flink/pull/793 [FLINK-2165] [TableAPI] Renamed table conversion functions in TableEnvironment Rename Table conversion methods in TableEnvironment --- Key: FLINK-2165 URL: https://issues.apache.org/jira/browse/FLINK-2165 Project: Flink Issue Type: Improvement Components: Table API Affects Versions: 0.9 Reporter: Fabian Hueske Assignee: Fabian Hueske Priority: Minor Fix For: 0.9
[jira] [Commented] (FLINK-1731) Add kMeans clustering algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574273#comment-14574273 ] Peter Schrott commented on FLINK-1731: -- [~till.rohrmann] I am not entirely sure if we speak about the same thing. In our opinion the failure of Travis is not related to our changes. Or do you mean that I should force Travis to run over my repository to see whether the problem still exists? If so, I just need to push something to my repository, right? But I don't have any changes to make. - Thanks, Peter Add kMeans clustering algorithm to machine learning library --- Key: FLINK-1731 URL: https://issues.apache.org/jira/browse/FLINK-1731 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Peter Schrott Labels: ML The Flink repository already contains a kMeans implementation but it is not yet ported to the machine learning library. I assume that only the used data types have to be adapted and then it can be more or less directly moved to flink-ml. The kMeans++ [1] and the kMeans|| [2] algorithms constitute a better implementation because they improve the initial seeding phase to achieve near-optimal clustering. It might be worthwhile to implement kMeans||. Resources: [1] http://ilpubs.stanford.edu:8090/778/1/2006-13.pdf [2] http://theory.stanford.edu/~sergei/papers/vldb12-kmpar.pdf -- This message was sent by Atlassian JIRA (v6.3.4#6332)
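For readers unfamiliar with the algorithm being ported, here is a minimal single-machine sketch of one Lloyd's kMeans iteration. This is an illustration only, not the existing Flink implementation or the flink-ml port; class and method names are made up for the example.

```java
import java.util.Arrays;

public class KMeansSketch {

    // One Lloyd iteration: assign each point to its nearest centroid
    // (squared Euclidean distance), then recompute each centroid as the
    // mean of its assigned points.
    static double[][] lloydStep(double[][] points, double[][] centroids) {
        int k = centroids.length, dim = centroids[0].length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (double[] p : points) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int c = 0; c < k; c++) {
                double d = 0;
                for (int i = 0; i < dim; i++) {
                    double diff = p[i] - centroids[c][i];
                    d += diff * diff;
                }
                if (d < bestDist) { bestDist = d; best = c; }
            }
            counts[best]++;
            for (int i = 0; i < dim; i++) sums[best][i] += p[i];
        }
        double[][] next = new double[k][dim];
        for (int c = 0; c < k; c++) {
            for (int i = 0; i < dim; i++) {
                // keep the old centroid if no point was assigned to it
                next[c][i] = counts[c] == 0 ? centroids[c][i] : sums[c][i] / counts[c];
            }
        }
        return next;
    }

    public static void main(String[] args) {
        double[][] points = {{0, 0}, {0, 1}, {10, 10}, {10, 11}};
        double[][] centroids = {{0, 0}, {10, 10}};
        System.out.println(Arrays.deepToString(lloydStep(points, centroids)));
        // prints [[0.0, 0.5], [10.0, 10.5]]
    }
}
```

kMeans++ and kMeans|| differ only in how the initial centroids are chosen before this loop starts; the iteration itself is unchanged.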
[GitHub] flink pull request: [FLINK-1319][core] Add static code analysis fo...
Github user twalthr commented on the pull request: https://github.com/apache/flink/pull/729#issuecomment-109264887 Good points. Perfectly fine for me ;) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2133) Possible deadlock in ExecutionGraph
[ https://issues.apache.org/jira/browse/FLINK-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575036#comment-14575036 ] Stephan Ewen commented on FLINK-2133: - Okay, I found a plausible scenario for how this can happen (it is a super hard race): - During canceling, the {{ExecutionJobVertices}} cancel simultaneously (vertex 1 and vertex 2) - {{Vertex 1}} transitions into its final state - In the ExecutionGraph, this advances the counter for the next vertex to check/wait for to {{vertex 2}}, and checks whether that one is finished already - {{Vertex 2}} is just done with its final subtask: canceling has reached the state where it increments the number of terminal subtasks (ExecutionJobVertex, after line 448, but before line 454) - The thread that finished {{vertex 1}} therefore considers {{vertex 2}} terminal as well and marks the job entirely as complete. It triggers the restart. - {{Vertex 2}} tries to tell the ExecutionGraph that it reached a terminal state, but can no longer acquire the lock it needs to learn that its transition to terminal has already been registered. == Deadlock There is a simple way to fix this, but I am not sure there is any reasonable way to test it, since one would need to provoke this insanely exactly timed race between the threads. 
Possible deadlock in ExecutionGraph --- Key: FLINK-2133 URL: https://issues.apache.org/jira/browse/FLINK-2133 Project: Flink Issue Type: Bug Reporter: Aljoscha Krettek I had the following output on Travis: {code} Found one Java-level deadlock: = ForkJoinPool-1-worker-3: waiting to lock monitor 0x7f1c54af7eb8 (object 0xd77fa8c0, a org.apache.flink.runtime.util.SerializableObject), which is held by flink-akka.actor.default-dispatcher-4 flink-akka.actor.default-dispatcher-4: waiting to lock monitor 0x7f1c5486aca0 (object 0xd77fa218, a org.apache.flink.runtime.util.SerializableObject), which is held by ForkJoinPool-1-worker-3 Java stack information for the threads listed above: === ForkJoinPool-1-worker-3: at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.resetForNewExecution(ExecutionJobVertex.java:338) - waiting to lock 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:595) - locked 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph$3.call(ExecutionGraph.java:733) at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) flink-akka.actor.default-dispatcher-4: at org.apache.flink.runtime.executiongraph.ExecutionGraph.jobVertexInFinalState(ExecutionGraph.java:683) - waiting to lock 0xd77fa218 (a 
org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.subtaskInFinalState(ExecutionJobVertex.java:454) - locked 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.vertexCancelled(ExecutionJobVertex.java:426) at org.apache.flink.runtime.executiongraph.ExecutionVertex.executionCanceled(ExecutionVertex.java:565) at org.apache.flink.runtime.executiongraph.Execution.cancelingComplete(Execution.java:653) at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:784) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply$mcV$sp(JobManager.scala:220) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at
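The thread dump above shows the classic ABBA pattern: the restart thread holds the ExecutionGraph lock and wants the ExecutionJobVertex lock, while the canceling thread holds the vertex lock and wants the graph lock. A generic sketch of the usual remedy, acquiring the coarser lock first on both paths (plain-Java illustration with hypothetical names, not Flink's actual fix):

```java
public class LockOrdering {
    private final Object graphLock = new Object();   // coarse lock (think: ExecutionGraph)
    private final Object vertexLock = new Object();  // fine lock (think: ExecutionJobVertex)
    private int terminalVertices = 0;

    // Path 1: restart logic. Takes graphLock first, then vertexLock.
    void restart() {
        synchronized (graphLock) {
            synchronized (vertexLock) {
                terminalVertices = 0; // reset per-vertex state for the new execution
            }
        }
    }

    // Path 2: a subtask reports a terminal state. To avoid the ABBA deadlock
    // it acquires the locks in the SAME order as restart(), instead of taking
    // vertexLock first and then trying to grab graphLock while holding it.
    void subtaskInFinalState() {
        synchronized (graphLock) {
            synchronized (vertexLock) {
                terminalVertices++;
            }
        }
    }

    int terminalCount() {
        synchronized (graphLock) {
            return terminalVertices;
        }
    }
}
```

With a single global acquisition order no cycle of "holds A, wants B" can form, which removes the deadlock regardless of how exactly the race is timed.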
[jira] [Commented] (FLINK-2174) Allow comments in 'slaves' file
[ https://issues.apache.org/jira/browse/FLINK-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574995#comment-14574995 ] ASF GitHub Bot commented on FLINK-2174: --- GitHub user mjsax opened a pull request: https://github.com/apache/flink/pull/796 [FLINK-2174] Allow comments in 'slaves' file added skipping of #-comments in slaves file in start/stop scripts You can merge this pull request into a Git repository by running: $ git pull https://github.com/mjsax/flink slavesFile Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/796.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #796 commit f3d3bd7b5ab4241371642f5d0da3f06f7f56d92c Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-06-05T18:35:45Z added skipping of #-comments in slaves file Allow comments in 'slaves' file --- Key: FLINK-2174 URL: https://issues.apache.org/jira/browse/FLINK-2174 Project: Flink Issue Type: Improvement Components: Start-Stop Scripts Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Minor Currently, each line in 'slaves' is interpreted as a host name. The scripts should skip lines starting with '#', allow comments at the end of a line, and skip empty lines. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
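The filtering the ticket asks for amounts to this (a plain-Java sketch of the logic; the actual pull request changes the bash start/stop scripts, and the class and method names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class SlavesFileFilter {

    // Returns the host names from the raw lines of a 'slaves' file:
    // everything after a '#' is dropped (covers full-line and trailing
    // comments), and blank lines are skipped.
    static List<String> hosts(List<String> rawLines) {
        List<String> result = new ArrayList<>();
        for (String line : rawLines) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // strip the comment
            }
            line = line.trim();
            if (!line.isEmpty()) {
                result.add(line);
            }
        }
        return result;
    }
}
```

For example, the lines "# cluster", "node1", "node2 # rack 2", and an empty line would yield just node1 and node2.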
[GitHub] flink pull request: [FLINK-2174] Allow comments in 'slaves' file
GitHub user mjsax opened a pull request: https://github.com/apache/flink/pull/796 [FLINK-2174] Allow comments in 'slaves' file added skipping of #-comments in slaves file in start/stop scripts You can merge this pull request into a Git repository by running: $ git pull https://github.com/mjsax/flink slavesFile Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/796.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #796 commit f3d3bd7b5ab4241371642f5d0da3f06f7f56d92c Author: mjsax mj...@informatik.hu-berlin.de Date: 2015-06-05T18:35:45Z added skipping of #-comments in slaves file --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Updated] (FLINK-2176) Add support for ProgramDescription interface in clients
[ https://issues.apache.org/jira/browse/FLINK-2176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated FLINK-2176: --- Component/s: Examples Add support for ProgramDescription interface in clients --- Key: FLINK-2176 URL: https://issues.apache.org/jira/browse/FLINK-2176 Project: Flink Issue Type: Improvement Components: Examples, other, Webfrontend Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Minor Extend the WebClient and the bin/flink client to show the information provided via the ProgramDescription interface: - show as a tooltip in the WebClient - show on the command line in the info command -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2153) Exclude dependency on hbase annotations module
[ https://issues.apache.org/jira/browse/FLINK-2153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574736#comment-14574736 ] Márton Balassi commented on FLINK-2153: --- Thanks for the quick feedback. Exclude dependency on hbase annotations module -- Key: FLINK-2153 URL: https://issues.apache.org/jira/browse/FLINK-2153 Project: Flink Issue Type: Bug Components: Build System Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Lokesh Rajaram [ERROR] Failed to execute goal on project flink-hbase: Could not resolve dependencies for project org.apache.flink:flink-hbase:jar:0.9-SNAPSHOT: Could not find artifact jdk.tools:jdk.tools:jar:1.7 at specified path /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/../lib/tools.jar There is a Spark issue for this [1] with a solution [2]. [1] https://issues.apache.org/jira/browse/SPARK-4455 [2] https://github.com/apache/spark/pull/3286/files -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2173) Python uses different tmp file than Flink
[ https://issues.apache.org/jira/browse/FLINK-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574818#comment-14574818 ] Chesnay Schepler commented on FLINK-2173: - also, in case anyone is interested in tackling this issue: it should only be relevant during plan construction; at runtime all paths are supplied by the java side in a safe manner. one potential way to fix this earlier: instead of using tempfile.gettempdir(), infer the proper tmp directory by checking the location of the plan file, since it should reside in the java tmp folder. Python uses different tmp file than Flink - Key: FLINK-2173 URL: https://issues.apache.org/jira/browse/FLINK-2173 Project: Flink Issue Type: Bug Components: Python API Environment: Debian Linux Reporter: Matthias J. Sax Priority: Critical Flink gets the temp file path using System.getProperty(java.io.tmpdir) while Python uses the tempfile.gettempdir() method. However, both can be semantically different. On my system Flink uses /tmp while Python uses /tmp/users/1000 (1000 is my Linux user-id). This issue leads (at least) to failing tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
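The Java side of the mismatch can be inspected directly. On a typical Linux JVM, `java.io.tmpdir` defaults to /tmp unless overridden with -Djava.io.tmpdir, whereas Python's `tempfile.gettempdir()` consults the TMPDIR, TEMP, and TMP environment variables first, so the two can disagree on the same machine:

```java
public class TmpDirCheck {
    public static void main(String[] args) {
        // What Flink's Java side uses; usually "/tmp" on Linux unless
        // the JVM was started with -Djava.io.tmpdir=...
        String javaTmp = System.getProperty("java.io.tmpdir");

        // Python's tempfile.gettempdir() would instead check TMPDIR first,
        // which is how the reporter ended up with /tmp/users/1000.
        String tmpdirEnv = System.getenv("TMPDIR");

        System.out.println("java.io.tmpdir = " + javaTmp);
        System.out.println("TMPDIR env     = " + tmpdirEnv);
    }
}
```

This also explains why Chesnay's suggestion above works: the plan file is written via `java.io.tmpdir`, so its parent directory reveals the path the Java side actually used.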
[GitHub] flink pull request: [FLINK-2164] Document streaming and batch mode
GitHub user rmetzger opened a pull request: https://github.com/apache/flink/pull/795 [FLINK-2164] Document streaming and batch mode You can merge this pull request into a Git repository by running: $ git pull https://github.com/rmetzger/flink flink2164 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/795.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #795 commit 3996871b40d87408fcb024482430f1348c06bc13 Author: Robert Metzger rmetz...@apache.org Date: 2015-06-05T16:59:34Z [FLINK-2164] Document streaming and batch mode --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1635) Remove Apache Thrift dependency from Flink
[ https://issues.apache.org/jira/browse/FLINK-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574793#comment-14574793 ] ASF GitHub Bot commented on FLINK-1635: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/794#issuecomment-109355770 +1 to merge like this. Remove Apache Thrift dependency from Flink -- Key: FLINK-1635 URL: https://issues.apache.org/jira/browse/FLINK-1635 Project: Flink Issue Type: Bug Components: Java API Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Maximilian Michels I've added Thrift and Protobuf to Flink to support it out of the box with Kryo. However, after trying to access a HCatalog/Hive table yesterday using Flink I found that there is a dependency conflict between Flink and Hive (on thrift). Maybe it makes more sense to properly document our serialization framework and provide a copypaste solution on how to get thrift/protobuf et al to work with Flink. Please chime in if you are against removing the out of the box support for protobuf and kryo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1635] remove Apache Thrift and Google P...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/794#issuecomment-109355770 +1 to merge like this. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1635] remove Apache Thrift and Google P...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/794#issuecomment-109355690 I also added the necessary Maven configuration for the examples. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1635) Remove Apache Thrift dependency from Flink
[ https://issues.apache.org/jira/browse/FLINK-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574792#comment-14574792 ] ASF GitHub Bot commented on FLINK-1635: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/794#issuecomment-109355690 I also added the necessary Maven configuration for the examples. Remove Apache Thrift dependency from Flink -- Key: FLINK-1635 URL: https://issues.apache.org/jira/browse/FLINK-1635 Project: Flink Issue Type: Bug Components: Java API Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Maximilian Michels I've added Thrift and Protobuf to Flink to support it out of the box with Kryo. However, after trying to access a HCatalog/Hive table yesterday using Flink I found that there is a dependency conflict between Flink and Hive (on thrift). Maybe it makes more sense to properly document our serialization framework and provide a copypaste solution on how to get thrift/protobuf et al to work with Flink. Please chime in if you are against removing the out of the box support for protobuf and kryo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2136) Test the streaming scala API
[ https://issues.apache.org/jira/browse/FLINK-2136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574718#comment-14574718 ] ASF GitHub Bot commented on FLINK-2136: --- Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/771#issuecomment-109338790 Let me merge this for the release now. Test the streaming scala API Key: FLINK-2136 URL: https://issues.apache.org/jira/browse/FLINK-2136 Project: Flink Issue Type: Test Components: Scala API, Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Gábor Hermann There are no tests covering the streaming Scala API. I would suggest testing whether the StreamGraph created by a certain operation looks as expected. Deeper layers and the runtime should not be tested here; that is done in streaming-core. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2173) Python used different tmp file than Flink
[ https://issues.apache.org/jira/browse/FLINK-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated FLINK-2173: --- Summary: Python used different tmp file than Flink (was: Python used diffente tmp file than Flink) Python used different tmp file than Flink - Key: FLINK-2173 URL: https://issues.apache.org/jira/browse/FLINK-2173 Project: Flink Issue Type: Bug Components: Python API Environment: Debian Linux Reporter: Matthias J. Sax Priority: Critical Flink gets the temp file path using System.getProperty(java.io.tmpdir) while Python uses the tempfile.gettempdir() method. However, both can be semantically different. On my system Flink uses /tmp while Python uses /tmp/users/1000 (1000 is my Linux user-id). This issue leads (at least) to failing tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-2173) Python uses different tmp file than Flink
[ https://issues.apache.org/jira/browse/FLINK-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Matthias J. Sax updated FLINK-2173: --- Summary: Python uses different tmp file than Flink (was: Python used different tmp file than Flink) Python uses different tmp file than Flink - Key: FLINK-2173 URL: https://issues.apache.org/jira/browse/FLINK-2173 Project: Flink Issue Type: Bug Components: Python API Environment: Debian Linux Reporter: Matthias J. Sax Priority: Critical Flink gets the temp file path using System.getProperty(java.io.tmpdir) while Python uses the tempfile.gettempdir() method. However, both can be semantically different. On my system Flink uses /tmp while Python uses /tmp/users/1000 (1000 is my Linux user-id). This issue leads (at least) to failing tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2136] Adding DataStream tests for Scala...
Github user mbalassi commented on the pull request: https://github.com/apache/flink/pull/771#issuecomment-109338790 Let me merge this for the release now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1635] remove Apache Thrift and Google P...
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/794#discussion_r31827450 --- Diff: docs/apis/best_practices.md --- @@ -155,3 +155,41 @@ public static final class Tokenizer extends RichFlatMapFunctionString, Tuple2S // .. do more .. {% endhighlight %} + +## Register a custom serializer for your Flink program + +If you use a custom type in your Flink program which cannot be serialized by the +Flink type serializer, Flink falls back to using the generic Kryo +serializer. You may register your own serializer or a serialization system like +Google Protobuf or Apache Thrift with Kryo. To do that, simply register the type +class and the serializer in the `ExecutionConfig` of your Flink program. + + +{% highlight java %} +final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + +// register the class of the serializer as serializer for a type +env.getConfig().registerTypeWithKryoSerializer(MyCustomType.class, MyCustomSerializer.class); + +// register an instance as serializer for a type +MySerializer mySerializer = new MySerializer(); +env.getConfig().registerTypeWithKryoSerializer(MyCustomType.class, mySerializer); +{% endhighlight %} + +Note that your custom serializer has to extend Kryo's Serializer class. 
In the +case of Google Protobuf or Apache Thrift, this has already been done for +you: + +{% highlight java %} + +final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + +// register the Google Protobuf serializer with Kryo +env.getConfig().registerTypeWithKryoSerializer(MyCustomType.class, ProtobufSerializer.class); + +// register the serializer included with Apache Thrift as the standard serializer +// TBaseSerializer states it should be initalized as a default Kryo serializer +env.getConfig().addDefaultKryoSerializer(MyCustomType.class, TBaseSerializer.class); +env.getConfig().registerTypeWithKryoSerializer(MyCustomType.class, TMessage.class); --- End diff -- this seems incorrect --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1635) Remove Apache Thrift dependency from Flink
[ https://issues.apache.org/jira/browse/FLINK-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574774#comment-14574774 ] ASF GitHub Bot commented on FLINK-1635: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/794#discussion_r31827450 --- Diff: docs/apis/best_practices.md --- @@ -155,3 +155,41 @@ public static final class Tokenizer extends RichFlatMapFunctionString, Tuple2S // .. do more .. {% endhighlight %} + +## Register a custom serializer for your Flink program + +If you use a custom type in your Flink program which cannot be serialized by the +Flink type serializer, Flink falls back to using the generic Kryo +serializer. You may register your own serializer or a serialization system like +Google Protobuf or Apache Thrift with Kryo. To do that, simply register the type +class and the serializer in the `ExecutionConfig` of your Flink program. + + +{% highlight java %} +final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + +// register the class of the serializer as serializer for a type +env.getConfig().registerTypeWithKryoSerializer(MyCustomType.class, MyCustomSerializer.class); + +// register an instance as serializer for a type +MySerializer mySerializer = new MySerializer(); +env.getConfig().registerTypeWithKryoSerializer(MyCustomType.class, mySerializer); +{% endhighlight %} + +Note that your custom serializer has to extend Kryo's Serializer class. 
In the +case of Google Protobuf or Apache Thrift, this has already been done for +you: + +{% highlight java %} + +final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + +// register the Google Protobuf serializer with Kryo +env.getConfig().registerTypeWithKryoSerializer(MyCustomType.class, ProtobufSerializer.class); + +// register the serializer included with Apache Thrift as the standard serializer +// TBaseSerializer states it should be initalized as a default Kryo serializer +env.getConfig().addDefaultKryoSerializer(MyCustomType.class, TBaseSerializer.class); +env.getConfig().registerTypeWithKryoSerializer(MyCustomType.class, TMessage.class); --- End diff -- this seems incorrect Remove Apache Thrift dependency from Flink -- Key: FLINK-1635 URL: https://issues.apache.org/jira/browse/FLINK-1635 Project: Flink Issue Type: Bug Components: Java API Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Maximilian Michels I've added Thrift and Protobuf to Flink to support it out of the box with Kryo. However, after trying to access a HCatalog/Hive table yesterday using Flink I found that there is a dependency conflict between Flink and Hive (on thrift). Maybe it makes more sense to properly document our serialization framework and provide a copypaste solution on how to get thrift/protobuf et al to work with Flink. Please chime in if you are against removing the out of the box support for protobuf and kryo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1126) Add suggestion for using large TupleX types
[ https://issues.apache.org/jira/browse/FLINK-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574851#comment-14574851 ] ASF GitHub Bot commented on FLINK-1126: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/786#issuecomment-109367152 Merging ... Add suggestion for using large TupleX types --- Key: FLINK-1126 URL: https://issues.apache.org/jira/browse/FLINK-1126 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Ufuk Celebi Assignee: Robert Metzger Priority: Minor Instead of {code} Tuple11<String, String, ..., String> var = new ...; {code} I would like to add a hint to use custom types like: {code} CustomType var = new ...; public static class CustomType extends Tuple11<String, String, ..., String> { // constructor matching super } {code} I saw a couple of users sticking to the large TupleX types instead of doing this, which leads to very clumsy user code. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (FLINK-2172) Stabilize SocketOutputFormatTest
Márton Balassi created FLINK-2172: - Summary: Stabilize SocketOutputFormatTest Key: FLINK-2172 URL: https://issues.apache.org/jira/browse/FLINK-2172 Project: Flink Issue Type: Test Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi As part of resolving FLINK-2139, I am adding tests for the core streaming output formats. I added a skeleton for the socket output too, but found that it was unstable and disabled it for now. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1635) Remove Apache Thrift dependency from Flink
[ https://issues.apache.org/jira/browse/FLINK-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574789#comment-14574789 ] ASF GitHub Bot commented on FLINK-1635: --- Github user mxm commented on a diff in the pull request: https://github.com/apache/flink/pull/794#discussion_r31828328 --- Diff: docs/apis/best_practices.md --- @@ -155,3 +155,41 @@ public static final class Tokenizer extends RichFlatMapFunctionString, Tuple2S // .. do more .. {% endhighlight %} + +## Register a custom serializer for your Flink program + +If you use a custom type in your Flink program which cannot be serialized by the +Flink type serializer, Flink falls back to using the generic Kryo +serializer. You may register your own serializer or a serialization system like +Google Protobuf or Apache Thrift with Kryo. To do that, simply register the type +class and the serializer in the `ExecutionConfig` of your Flink program. + + +{% highlight java %} +final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + +// register the class of the serializer as serializer for a type +env.getConfig().registerTypeWithKryoSerializer(MyCustomType.class, MyCustomSerializer.class); + +// register an instance as serializer for a type +MySerializer mySerializer = new MySerializer(); +env.getConfig().registerTypeWithKryoSerializer(MyCustomType.class, mySerializer); +{% endhighlight %} + +Note that your custom serializer has to extend Kryo's Serializer class. 
In the +case of Google Protobuf or Apache Thrift, this has already been done for +you: + +{% highlight java %} + +final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment(); + +// register the Google Protobuf serializer with Kryo +env.getConfig().registerTypeWithKryoSerializer(MyCustomType.class, ProtobufSerializer.class); + +// register the serializer included with Apache Thrift as the standard serializer +// TBaseSerializer states it should be initalized as a default Kryo serializer +env.getConfig().addDefaultKryoSerializer(MyCustomType.class, TBaseSerializer.class); +env.getConfig().registerTypeWithKryoSerializer(MyCustomType.class, TMessage.class); --- End diff -- yes, sorry. copy/paste error. fixed. Remove Apache Thrift dependency from Flink -- Key: FLINK-1635 URL: https://issues.apache.org/jira/browse/FLINK-1635 Project: Flink Issue Type: Bug Components: Java API Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Maximilian Michels I've added Thrift and Protobuf to Flink to support it out of the box with Kryo. However, after trying to access a HCatalog/Hive table yesterday using Flink I found that there is a dependency conflict between Flink and Hive (on thrift). Maybe it makes more sense to properly document our serialization framework and provide a copypaste solution on how to get thrift/protobuf et al to work with Flink. Please chime in if you are against removing the out of the box support for protobuf and kryo. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1565][FLINK-2078] Document ExecutionCon...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/781 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-2173) Python uses different tmp file than Flink
[ https://issues.apache.org/jira/browse/FLINK-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574804#comment-14574804 ] Chesnay Schepler commented on FLINK-2173: - small addition from the mailing list: this does not affect all users. It will furthermore be trivial to fix once FLINK-1927 is resolved, which admittedly could take a while. Python uses different tmp file than Flink - Key: FLINK-2173 URL: https://issues.apache.org/jira/browse/FLINK-2173 Project: Flink Issue Type: Bug Components: Python API Environment: Debian Linux Reporter: Matthias J. Sax Priority: Critical Flink gets the temp file path using System.getProperty(java.io.tmpdir) while Python uses the tempfile.gettempdir() method. However, both can be semantically different. On my system Flink uses /tmp while Python uses /tmp/users/1000 (1000 is my Linux user-id). This issue leads (at least) to failing tests. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1565) Document object reuse behavior
[ https://issues.apache.org/jira/browse/FLINK-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574856#comment-14574856 ] ASF GitHub Bot commented on FLINK-1565: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/781#issuecomment-109367441 Merging ... Document object reuse behavior -- Key: FLINK-1565 URL: https://issues.apache.org/jira/browse/FLINK-1565 Project: Flink Issue Type: Improvement Components: Documentation Reporter: Fabian Hueske Assignee: Robert Metzger Fix For: 0.9 The documentation needs to be extended and describe the object reuse behavior of Flink and its implications for how to implement functions. The documentation must at least cover the default reuse mode: * new objects through iterators and in reduce functions * chaining behavior (objects are passed on to the next function which might modify it) Optionally, the documentation could describe the object reuse switch introduced by FLINK-1137. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
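The hazard that the object-reuse documentation needs to spell out can be demonstrated without any Flink code. The sketch below is a toy model (names are illustrative, not Flink API): a runtime that hands the same mutable record to every call, and a user function that wrongly stores the reference instead of copying.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of object reuse: the "runtime" mutates one record instance in
// place, so a function that stores the reference keeps N aliases of the
// same object, all showing the last value written.
public class ObjectReuseDemo {
    static class Record { int value; }

    // Buggy: stores the reused object directly.
    static List<Record> collectByReference(int n) {
        Record reused = new Record();           // the runtime's single reused instance
        List<Record> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            reused.value = i;                   // runtime overwrites the same object
            out.add(reused);                    // bug: every entry aliases one object
        }
        return out;
    }

    // Safe: copies before storing, as object-reuse mode requires.
    static List<Record> collectByCopy(int n) {
        Record reused = new Record();
        List<Record> out = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            reused.value = i;
            Record copy = new Record();         // defensive copy per element
            copy.value = reused.value;
            out.add(copy);
        }
        return out;
    }
}
```

This is the chaining pitfall the ticket mentions: when objects are passed on to the next function, any reference kept across calls may be mutated behind the caller's back unless it is copied first.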
[GitHub] flink pull request: [FLINK-1565][FLINK-2078] Document ExecutionCon...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/781#issuecomment-109367441 Merging ...
[GitHub] flink pull request: [FLINK-1126][docs] Best practice: named TupleX...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/786
[jira] [Created] (FLINK-2170) Add fromOrcFile() to TableEnvironment
Fabian Hueske created FLINK-2170: Summary: Add fromOrcFile() to TableEnvironment Key: FLINK-2170 URL: https://issues.apache.org/jira/browse/FLINK-2170 Project: Flink Issue Type: New Feature Components: Table API Affects Versions: 0.9 Reporter: Fabian Hueske Priority: Minor Add a {{fromOrcFile()}} method to the {{TableEnvironment}} to read a {{Table}} from an ORC file.
[jira] [Assigned] (FLINK-2165) Rename Table conversion methods in TableEnvironment
[ https://issues.apache.org/jira/browse/FLINK-2165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fabian Hueske reassigned FLINK-2165: Assignee: Fabian Hueske Rename Table conversion methods in TableEnvironment --- Key: FLINK-2165 URL: https://issues.apache.org/jira/browse/FLINK-2165 Project: Flink Issue Type: Improvement Components: Table API Affects Versions: 0.9 Reporter: Fabian Hueske Assignee: Fabian Hueske Priority: Minor Fix For: 0.9 The {{TableEnvironment}} provides methods to convert DataSets and DataStreams into Tables and back. These methods are called {{toTable()}}, {{toSet()}}, and {{toStream()}}. I propose to rename the methods to {{fromDataSet()}}, {{fromDataStream()}}, {{toDataSet()}}, and {{toDataStream()}} for the following reasons: - {{fromDataSet()}} and {{fromDataStream()}} are closer to the SQL FROM expression - It allows adding methods such as {{fromCSV()}}, {{fromHCat()}}, {{fromParquet()}}, and so on to the {{TableEnvironment}} - {{toSet()}} and {{toStream()}} should be renamed for consistency.
[jira] [Created] (FLINK-2171) Add instruction to build Flink with Scala 2.11
Fabian Hueske created FLINK-2171: Summary: Add instruction to build Flink with Scala 2.11 Key: FLINK-2171 URL: https://issues.apache.org/jira/browse/FLINK-2171 Project: Flink Issue Type: Improvement Components: Documentation Affects Versions: 0.9 Reporter: Fabian Hueske Priority: Minor Flink can be built for Scala 2.11. However, the build documentation does not include instructions for that.
[jira] [Commented] (FLINK-1635) Remove Apache Thrift dependency from Flink
[ https://issues.apache.org/jira/browse/FLINK-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14574679#comment-14574679 ] ASF GitHub Bot commented on FLINK-1635: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/794#issuecomment-109328706 It would be great if you could add some documentation that explains how users can register the serializers at the ExecutionConfig. I've actually added support for these two frameworks because users needed this. It would be nice if your docs would explain how to use Flink with Thrift/Protobuf types (that's also needed for formats like Parquet). Remove Apache Thrift dependency from Flink -- Key: FLINK-1635 URL: https://issues.apache.org/jira/browse/FLINK-1635 Project: Flink Issue Type: Bug Components: Java API Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Maximilian Michels I've added Thrift and Protobuf to Flink to support them out of the box with Kryo. However, after trying to access an HCatalog/Hive table yesterday using Flink, I found that there is a dependency conflict between Flink and Hive (on Thrift). Maybe it makes more sense to properly document our serialization framework and provide a copy/paste solution for getting Thrift/Protobuf et al. to work with Flink. Please chime in if you are against removing the out-of-the-box support for Protobuf and Thrift.
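For readers following this thread, the difference between the two ExecutionConfig calls discussed above — registerTypeWithKryoSerializer(...) versus addDefaultKryoSerializer(...) — is roughly "serializer for exactly this type" versus "fallback serializer for this type and its subtypes". The toy registry below models that resolution order; it is an illustration of the concept (all names hypothetical), not Flink's implementation:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of serializer resolution: an exact per-type registration wins
// over a default registered for a supertype; otherwise a generic fallback.
public class SerializerRegistry {
    private final Map<Class<?>, String> specific = new HashMap<>();
    private final Map<Class<?>, String> defaults = new HashMap<>();

    // analogous in spirit to registerTypeWithKryoSerializer(type, serializer)
    public void registerTypeWith(Class<?> type, String serializerName) {
        specific.put(type, serializerName);
    }

    // analogous in spirit to addDefaultKryoSerializer(type, serializer)
    public void addDefault(Class<?> type, String serializerName) {
        defaults.put(type, serializerName);
    }

    public String serializerFor(Class<?> type) {
        if (specific.containsKey(type)) {
            return specific.get(type);          // exact registration first
        }
        for (Map.Entry<Class<?>, String> e : defaults.entrySet()) {
            if (e.getKey().isAssignableFrom(type)) {
                return e.getValue();            // default covering a supertype
            }
        }
        return "GenericKryoSerializer";         // fallback
    }
}
```

This is why the quoted PR snippet registers TBaseSerializer as a default: one registration then covers every Thrift-generated class, instead of listing each generated type individually.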
[jira] [Updated] (FLINK-2177) NullPointer in task resource release
[ https://issues.apache.org/jira/browse/FLINK-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen updated FLINK-2177: Summary: NullPointer in task resource release (was: NillPointer in task resource release) NullPointer in task resource release Key: FLINK-2177 URL: https://issues.apache.org/jira/browse/FLINK-2177 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Ufuk Celebi Priority: Blocker Fix For: 0.9 {code} == == FATAL === == A fatal error occurred, forcing the TaskManager to shut down: FATAL - exception in task resource cleanup java.lang.NullPointerException at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.cancelRequestFor(PartitionRequestClientHandler.java:89) at org.apache.flink.runtime.io.network.netty.PartitionRequestClient.close(PartitionRequestClient.java:182) at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.releaseAllResources(RemoteInputChannel.java:199) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.releaseAllResources(SingleInputGate.java:332) at org.apache.flink.runtime.io.network.NetworkEnvironment.unregisterTask(NetworkEnvironment.java:368) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:650) at java.lang.Thread.run(Thread.java:701) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
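The trace above fails inside PartitionRequestClientHandler.cancelRequestFor during resource release, i.e. a cleanup path dereferences something that was never fully initialized before the task was torn down. The defensive pattern for such teardown code can be sketched in plain Java — the names below are hypothetical and this is not the actual Flink patch:

```java
// Sketch of a null-tolerant cleanup path: during teardown, a channel may
// never have installed its request handler, so release must treat "nothing
// registered" as a no-op instead of throwing a NullPointerException.
public class SafeChannelCleanup {
    private Runnable cancelAction;  // set only once a handler is installed

    public void installHandler(Runnable action) {
        cancelAction = action;
    }

    // Returns true if a handler was present and its cancel action ran.
    public boolean releaseAllResources() {
        Runnable action = cancelAction;
        if (action == null) {
            return false;  // partially initialized: skip instead of failing
        }
        cancelAction = null;  // idempotent: a second call is a no-op
        action.run();
        return true;
    }
}
```

The key property for a fatal-error path like this one is idempotence: cleanup must succeed (as a no-op) no matter how far initialization got, because it runs exactly when the task is already in an abnormal state.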
[GitHub] flink pull request: [FLINK-2133] [jobmanager] Fix possible deadloc...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/797
[jira] [Commented] (FLINK-2177) NillPointer in task resource release
[ https://issues.apache.org/jira/browse/FLINK-2177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575213#comment-14575213 ] Stephan Ewen commented on FLINK-2177: - Here are the tests where this occurred (need to download the logs to examine the TaskManager output) https://travis-ci.org/StephanEwen/incubator-flink/builds/65540688 NillPointer in task resource release Key: FLINK-2177 URL: https://issues.apache.org/jira/browse/FLINK-2177 Project: Flink Issue Type: Bug Components: Distributed Runtime Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Ufuk Celebi Priority: Blocker Fix For: 0.9 {code} == == FATAL === == A fatal error occurred, forcing the TaskManager to shut down: FATAL - exception in task resource cleanup java.lang.NullPointerException at org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.cancelRequestFor(PartitionRequestClientHandler.java:89) at org.apache.flink.runtime.io.network.netty.PartitionRequestClient.close(PartitionRequestClient.java:182) at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.releaseAllResources(RemoteInputChannel.java:199) at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.releaseAllResources(SingleInputGate.java:332) at org.apache.flink.runtime.io.network.NetworkEnvironment.unregisterTask(NetworkEnvironment.java:368) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:650) at java.lang.Thread.run(Thread.java:701) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575234#comment-14575234 ] ASF GitHub Bot commented on FLINK-2098: --- Github user StephanEwen closed the pull request at: https://github.com/apache/flink/pull/755 Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
[jira] [Commented] (FLINK-2164) Document batch and streaming startup modes
[ https://issues.apache.org/jira/browse/FLINK-2164?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575228#comment-14575228 ] ASF GitHub Bot commented on FLINK-2164: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/795#issuecomment-109441645 Good addition, +1 Document batch and streaming startup modes -- Key: FLINK-2164 URL: https://issues.apache.org/jira/browse/FLINK-2164 Project: Flink Issue Type: Bug Components: Documentation Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Robert Metzger
[jira] [Commented] (FLINK-2098) Checkpoint barrier initiation at source is not aligned with snapshotting
[ https://issues.apache.org/jira/browse/FLINK-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575233#comment-14575233 ] ASF GitHub Bot commented on FLINK-2098: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/755#issuecomment-109442172 merged as part of #742 Checkpoint barrier initiation at source is not aligned with snapshotting Key: FLINK-2098 URL: https://issues.apache.org/jira/browse/FLINK-2098 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Aljoscha Krettek Priority: Blocker Fix For: 0.9 The stream source does not properly align the emission of checkpoint barriers with the drawing of snapshots.
[GitHub] flink pull request: [FLINK-2164] Document streaming and batch mode
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/795#issuecomment-109441645 Good addition, +1
[GitHub] flink pull request: [FLINK-2098] Improvements on checkpoint-aligne...
Github user StephanEwen closed the pull request at: https://github.com/apache/flink/pull/755
[jira] [Commented] (FLINK-2175) Allow multiple jobs in single jar file
[ https://issues.apache.org/jira/browse/FLINK-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575279#comment-14575279 ] ASF GitHub Bot commented on FLINK-2175: --- Github user mjsax commented on the pull request: https://github.com/apache/flink/pull/707#issuecomment-109447546 My Travis is green (https://travis-ci.org/mjsax/flink/builds/65609726). From my point of view, this PR can be merged. Let me know if you request any changes. Allow multiple jobs in single jar file -- Key: FLINK-2175 URL: https://issues.apache.org/jira/browse/FLINK-2175 Project: Flink Issue Type: Improvement Components: Examples, other, Webfrontend Reporter: Matthias J. Sax Assignee: Matthias J. Sax Priority: Minor Allow to package multiple jobs into a single jar. - extend WebClient to display all available jobs - extend WebClient to display the plan and submit each job
[GitHub] flink pull request: [FLINK-2133] [jobmanager] Fix possible deadloc...
GitHub user StephanEwen opened a pull request: https://github.com/apache/flink/pull/797 [FLINK-2133] [jobmanager] Fix possible deadlock when vertices transition to final state You can merge this pull request into a Git repository by running: $ git pull https://github.com/StephanEwen/incubator-flink final_state_deadlock Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/797.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #797 commit 2298cfedbf880b3a6065a307224c5f3e9e326a0b Author: Stephan Ewen se...@apache.org Date: 2015-06-05T20:39:29Z [FLINK-2133] [jobmanager] Fix possible deadlock when vertices transition to final state.
[jira] [Commented] (FLINK-2133) Possible deadlock in ExecutionGraph
[ https://issues.apache.org/jira/browse/FLINK-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575226#comment-14575226 ] ASF GitHub Bot commented on FLINK-2133: --- GitHub user StephanEwen opened a pull request: https://github.com/apache/flink/pull/797 [FLINK-2133] [jobmanager] Fix possible deadlock when vertices transition to final state You can merge this pull request into a Git repository by running: $ git pull https://github.com/StephanEwen/incubator-flink final_state_deadlock Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/797.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #797 commit 2298cfedbf880b3a6065a307224c5f3e9e326a0b Author: Stephan Ewen se...@apache.org Date: 2015-06-05T20:39:29Z [FLINK-2133] [jobmanager] Fix possible deadlock when vertices transition to final state. Possible deadlock in ExecutionGraph --- Key: FLINK-2133 URL: https://issues.apache.org/jira/browse/FLINK-2133 Project: Flink Issue Type: Bug Reporter: Aljoscha Krettek I had the following output on Travis: {code} Found one Java-level deadlock: = ForkJoinPool-1-worker-3: waiting to lock monitor 0x7f1c54af7eb8 (object 0xd77fa8c0, a org.apache.flink.runtime.util.SerializableObject), which is held by flink-akka.actor.default-dispatcher-4 flink-akka.actor.default-dispatcher-4: waiting to lock monitor 0x7f1c5486aca0 (object 0xd77fa218, a org.apache.flink.runtime.util.SerializableObject), which is held by ForkJoinPool-1-worker-3 Java stack information for the threads listed above: === ForkJoinPool-1-worker-3: at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.resetForNewExecution(ExecutionJobVertex.java:338) - waiting to lock 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:595) - locked 
0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph$3.call(ExecutionGraph.java:733) at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) flink-akka.actor.default-dispatcher-4: at org.apache.flink.runtime.executiongraph.ExecutionGraph.jobVertexInFinalState(ExecutionGraph.java:683) - waiting to lock 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.subtaskInFinalState(ExecutionJobVertex.java:454) - locked 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.vertexCancelled(ExecutionJobVertex.java:426) at org.apache.flink.runtime.executiongraph.ExecutionVertex.executionCanceled(ExecutionVertex.java:565) at org.apache.flink.runtime.executiongraph.Execution.cancelingComplete(Execution.java:653) at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:784) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply$mcV$sp(JobManager.scala:220) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at 
org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) at
[jira] [Commented] (FLINK-2133) Possible deadlock in ExecutionGraph
[ https://issues.apache.org/jira/browse/FLINK-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14575246#comment-14575246 ] ASF GitHub Bot commented on FLINK-2133: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/797 Possible deadlock in ExecutionGraph --- Key: FLINK-2133 URL: https://issues.apache.org/jira/browse/FLINK-2133 Project: Flink Issue Type: Bug Reporter: Aljoscha Krettek I had the following output on Travis: {code} Found one Java-level deadlock: = ForkJoinPool-1-worker-3: waiting to lock monitor 0x7f1c54af7eb8 (object 0xd77fa8c0, a org.apache.flink.runtime.util.SerializableObject), which is held by flink-akka.actor.default-dispatcher-4 flink-akka.actor.default-dispatcher-4: waiting to lock monitor 0x7f1c5486aca0 (object 0xd77fa218, a org.apache.flink.runtime.util.SerializableObject), which is held by ForkJoinPool-1-worker-3 Java stack information for the threads listed above: === ForkJoinPool-1-worker-3: at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.resetForNewExecution(ExecutionJobVertex.java:338) - waiting to lock 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:595) - locked 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph$3.call(ExecutionGraph.java:733) at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) flink-akka.actor.default-dispatcher-4: at org.apache.flink.runtime.executiongraph.ExecutionGraph.jobVertexInFinalState(ExecutionGraph.java:683) - waiting to lock 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.subtaskInFinalState(ExecutionJobVertex.java:454) - locked 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.vertexCancelled(ExecutionJobVertex.java:426) at org.apache.flink.runtime.executiongraph.ExecutionVertex.executionCanceled(ExecutionVertex.java:565) at org.apache.flink.runtime.executiongraph.Execution.cancelingComplete(Execution.java:653) at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:784) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply$mcV$sp(JobManager.scala:220) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Found 1 deadlock. {code}
[jira] [Resolved] (FLINK-2133) Possible deadlock in ExecutionGraph
[ https://issues.apache.org/jira/browse/FLINK-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-2133. - Resolution: Fixed Fix Version/s: 0.9 Assignee: Stephan Ewen Fixed via 2298cfedbf880b3a6065a307224c5f3e9e326a0b Possible deadlock in ExecutionGraph --- Key: FLINK-2133 URL: https://issues.apache.org/jira/browse/FLINK-2133 Project: Flink Issue Type: Bug Reporter: Aljoscha Krettek Assignee: Stephan Ewen Fix For: 0.9 I had the following output on Travis: {code} Found one Java-level deadlock: = ForkJoinPool-1-worker-3: waiting to lock monitor 0x7f1c54af7eb8 (object 0xd77fa8c0, a org.apache.flink.runtime.util.SerializableObject), which is held by flink-akka.actor.default-dispatcher-4 flink-akka.actor.default-dispatcher-4: waiting to lock monitor 0x7f1c5486aca0 (object 0xd77fa218, a org.apache.flink.runtime.util.SerializableObject), which is held by ForkJoinPool-1-worker-3 Java stack information for the threads listed above: === ForkJoinPool-1-worker-3: at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.resetForNewExecution(ExecutionJobVertex.java:338) - waiting to lock 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:595) - locked 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionGraph$3.call(ExecutionGraph.java:733) at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) flink-akka.actor.default-dispatcher-4: at org.apache.flink.runtime.executiongraph.ExecutionGraph.jobVertexInFinalState(ExecutionGraph.java:683) - waiting to lock 0xd77fa218 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.subtaskInFinalState(ExecutionJobVertex.java:454) - locked 0xd77fa8c0 (a org.apache.flink.runtime.util.SerializableObject) at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.vertexCancelled(ExecutionJobVertex.java:426) at org.apache.flink.runtime.executiongraph.ExecutionVertex.executionCanceled(ExecutionVertex.java:565) at org.apache.flink.runtime.executiongraph.Execution.cancelingComplete(Execution.java:653) at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:784) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply$mcV$sp(JobManager.scala:220) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219) at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24) at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24) at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41) at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346) at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Found 1 deadlock. {code}
[jira] [Closed] (FLINK-2133) Possible deadlock in ExecutionGraph
[ https://issues.apache.org/jira/browse/FLINK-2133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stephan Ewen closed FLINK-2133.
-------------------------------

Possible deadlock in ExecutionGraph
-----------------------------------

Key: FLINK-2133
URL: https://issues.apache.org/jira/browse/FLINK-2133
Project: Flink
Issue Type: Bug
Reporter: Aljoscha Krettek
Assignee: Stephan Ewen
Fix For: 0.9

I had the following output on Travis:

{code}
Found one Java-level deadlock:
=============================
"ForkJoinPool-1-worker-3":
  waiting to lock monitor 0x7f1c54af7eb8 (object 0xd77fa8c0, a org.apache.flink.runtime.util.SerializableObject),
  which is held by "flink-akka.actor.default-dispatcher-4"
"flink-akka.actor.default-dispatcher-4":
  waiting to lock monitor 0x7f1c5486aca0 (object 0xd77fa218, a org.apache.flink.runtime.util.SerializableObject),
  which is held by "ForkJoinPool-1-worker-3"

Java stack information for the threads listed above:
===================================================
"ForkJoinPool-1-worker-3":
	at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.resetForNewExecution(ExecutionJobVertex.java:338)
	- waiting to lock <0xd77fa8c0> (a org.apache.flink.runtime.util.SerializableObject)
	at org.apache.flink.runtime.executiongraph.ExecutionGraph.restart(ExecutionGraph.java:595)
	- locked <0xd77fa218> (a org.apache.flink.runtime.util.SerializableObject)
	at org.apache.flink.runtime.executiongraph.ExecutionGraph$3.call(ExecutionGraph.java:733)
	at akka.dispatch.Futures$$anonfun$future$1.apply(Future.scala:94)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
	at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
"flink-akka.actor.default-dispatcher-4":
	at org.apache.flink.runtime.executiongraph.ExecutionGraph.jobVertexInFinalState(ExecutionGraph.java:683)
	- waiting to lock <0xd77fa218> (a org.apache.flink.runtime.util.SerializableObject)
	at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.subtaskInFinalState(ExecutionJobVertex.java:454)
	- locked <0xd77fa8c0> (a org.apache.flink.runtime.util.SerializableObject)
	at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.vertexCancelled(ExecutionJobVertex.java:426)
	at org.apache.flink.runtime.executiongraph.ExecutionVertex.executionCanceled(ExecutionVertex.java:565)
	at org.apache.flink.runtime.executiongraph.Execution.cancelingComplete(Execution.java:653)
	at org.apache.flink.runtime.executiongraph.ExecutionGraph.updateState(ExecutionGraph.java:784)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply$mcV$sp(JobManager.scala:220)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219)
	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1$$anonfun$applyOrElse$2.apply(JobManager.scala:219)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.liftedTree1$1(Future.scala:24)
	at scala.concurrent.impl.Future$PromiseCompletingRunnable.run(Future.scala:24)
	at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
	at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)

Found 1 deadlock.
{code}

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
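The trace above is a classic lock-ordering cycle: one thread holds the ExecutionGraph's monitor and waits for the ExecutionJobVertex's, while the other holds the same two monitors in the opposite order. A minimal, hypothetical sketch of the usual remedy, acquiring both locks in one global order on every path; class and method names are illustrative, not Flink's actual code or fix:

```java
// Hypothetical sketch: both code paths acquire the two monitors in the same
// global order (graph lock first), so the wait-for cycle from the trace
// cannot form. Not Flink's actual code.
public class LockOrderingSketch {
    private final Object graphLock = new Object();   // stands in for the ExecutionGraph monitor
    private final Object vertexLock = new Object();  // stands in for the ExecutionJobVertex monitor
    private int updates = 0;

    // In the trace, restart() took graphLock -> vertexLock while
    // jobVertexInFinalState() took vertexLock -> graphLock.
    public void restart() {
        synchronized (graphLock) {
            synchronized (vertexLock) {
                updates++; // reset vertices for a new execution
            }
        }
    }

    public void jobVertexInFinalState() {
        synchronized (graphLock) { // same order as restart()
            synchronized (vertexLock) {
                updates++; // final-state bookkeeping
            }
        }
    }

    public int updates() {
        synchronized (graphLock) {
            return updates;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LockOrderingSketch s = new LockOrderingSketch();
        Thread t1 = new Thread(s::restart);
        Thread t2 = new Thread(s::jobVertexInFinalState);
        t1.start(); t2.start();
        t1.join(); t2.join(); // completes: no cycle is possible with one lock order
        System.out.println("updates = " + s.updates());
    }
}
```

Because every path takes graphLock before vertexLock, the wait-for graph between the two threads can no longer contain a cycle.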
[GitHub] flink pull request: [FLINK-2175] Allow multiple jobs in single jar...
Github user mjsax commented on the pull request:
https://github.com/apache/flink/pull/707#issuecomment-109447546

My Travis is green (https://travis-ci.org/mjsax/flink/builds/65609726). From my point of view, this PR can be merged. Let me know if you have any change requests.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[GitHub] flink pull request: [FLINK-1297] Added OperatorStatsAccumulator fo...
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/605#issuecomment-109446516

I am merging this for the next version. Very nice addition, sorry for the delay.
[jira] [Commented] (FLINK-1297) Add support for tracking statistics of intermediate results
[ https://issues.apache.org/jira/browse/FLINK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575263#comment-14575263 ]

ASF GitHub Bot commented on FLINK-1297:
---------------------------------------

Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/605#issuecomment-109446516

I am merging this for the next version. Very nice addition, sorry for the delay.

Add support for tracking statistics of intermediate results
-----------------------------------------------------------

Key: FLINK-1297
URL: https://issues.apache.org/jira/browse/FLINK-1297
Project: Flink
Issue Type: Improvement
Components: Distributed Runtime
Reporter: Alexander Alexandrov
Assignee: Alexander Alexandrov
Fix For: 0.9
Original Estimate: 1,008h
Remaining Estimate: 1,008h

One of the major problems related to the optimizer at the moment is the lack of proper statistics. With the introduction of staged execution, it is possible to instrument the runtime code with a statistics facility that collects the required information for optimizing the next execution stage. I would therefore like to contribute code that can be used to gather basic statistics for the (intermediate) result of dataflows (e.g. min, max, count, count distinct) and make them available to the job manager. Before I start, I would like to hear some feedback from the other users. In particular, to handle skew (e.g. on grouping) it might be good to have some sort of detailed sketch about the key distribution of an intermediate result. I am not sure whether a simple histogram is the most effective way to go. Maybe somebody would propose another lightweight sketch that provides better accuracy.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
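For illustration, the basic per-result statistics the issue proposes (min, max, count, count distinct) could be collected roughly as below. This is a hypothetical sketch, not the PR's OperatorStatsAccumulator API; and as the issue itself suggests, a real implementation would estimate the distinct count with a space-bounded sketch (e.g. HyperLogLog) rather than an exact set:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical collector for basic intermediate-result statistics.
// Names are illustrative, not the PR's actual API.
public class ColumnStatsSketch {
    private long count = 0;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;
    private final Set<Long> distinct = new HashSet<>(); // exact; a real impl would use a bounded sketch

    public void add(long value) {
        count++;
        min = Math.min(min, value);
        max = Math.max(max, value);
        distinct.add(value);
    }

    public long count() { return count; }
    public long min() { return min; }
    public long max() { return max; }
    public long countDistinct() { return distinct.size(); }

    public static void main(String[] args) {
        ColumnStatsSketch stats = new ColumnStatsSketch();
        for (long v : new long[] {3, 1, 4, 1, 5}) {
            stats.add(v);
        }
        System.out.println(stats.count() + " values, " + stats.countDistinct()
                + " distinct, min=" + stats.min() + ", max=" + stats.max());
    }
}
```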
[GitHub] flink pull request: [FLINK-1297] Added OperatorStatsAccumulator fo...
Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/605#issuecomment-109490490

The tests seem to be non-deterministic and fail frequently. Check out this build: https://travis-ci.org/StephanEwen/incubator-flink/jobs/65634990 The tests need to be more stable before we can add this to the codebase.
[GitHub] flink pull request: [Flink 1844] Add Normaliser to ML library
GitHub user fobeligi opened a pull request:
https://github.com/apache/flink/pull/798

[Flink 1844] Add Normaliser to ML library

Adds a MinMaxScaler to the ML preprocessing package. The MinMaxScaler scales values to a user-specified range.

You can merge this pull request into a Git repository by running:
$ git pull https://github.com/fobeligi/incubator-flink FLINK-1844

Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/flink/pull/798.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:
This closes #798

commit 802b9da07a2c3f7c055b4c024aaecbbe647db1cd
Author: fobeligi faybeligia...@gmail.com
Date: 2015-06-05T21:12:43Z
[FLINK-1844] Add MinMaxScaler implementation in the preprocessing package, tests for the corresponding functionality, and documentation.

commit e639185108f9bda253e296bae4c6c4269a30d1d0
Author: fobeligi faybeligia...@gmail.com
Date: 2015-06-05T22:12:33Z
[FLINK-1844] Change second test to use LabeledVectors instead of Vectors
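For reference, min-max scaling maps a value x from the observed range [dataMin, dataMax] linearly onto a user-specified range [rangeMin, rangeMax]: x' = (x - dataMin) / (dataMax - dataMin) * (rangeMax - rangeMin) + rangeMin. A standalone sketch of just that formula; the class and method names are hypothetical and this is not the PR's actual Scala API:

```java
// Illustrative min-max scaling as described in the PR text. Hypothetical
// names; the actual MinMaxScaler in the PR is a Scala transformer that
// first computes dataMin/dataMax from the data set.
public class MinMaxScaleSketch {

    /** x' = (x - dataMin) / (dataMax - dataMin) * (rangeMax - rangeMin) + rangeMin */
    public static double scale(double x, double dataMin, double dataMax,
                               double rangeMin, double rangeMax) {
        return (x - dataMin) / (dataMax - dataMin) * (rangeMax - rangeMin) + rangeMin;
    }

    public static void main(String[] args) {
        // Scale values observed in [0, 10] onto the default [0, 1] range.
        for (double x : new double[] {0.0, 5.0, 10.0}) {
            System.out.println(x + " -> " + scale(x, 0.0, 10.0, 0.0, 1.0));
        }
    }
}
```

Note that the formula is undefined when dataMax == dataMin (a constant feature); an implementation has to special-case that.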
[jira] [Commented] (FLINK-2174) Allow comments in 'slaves' file
[ https://issues.apache.org/jira/browse/FLINK-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575392#comment-14575392 ]

ASF GitHub Bot commented on FLINK-2174:
---------------------------------------

Github user mjsax commented on the pull request:
https://github.com/apache/flink/pull/796#issuecomment-109480303

My Travis is green (https://travis-ci.org/mjsax/flink/builds/65612778). Any comments? Can be merged from my point of view.

Allow comments in 'slaves' file
-------------------------------

Key: FLINK-2174
URL: https://issues.apache.org/jira/browse/FLINK-2174
Project: Flink
Issue Type: Improvement
Components: Start-Stop Scripts
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Trivial

Currently, each line in 'slaves' is interpreted as a host name. The scripts should skip lines starting with '#'. Also allow for comments at the end of a line, and skip empty lines.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-2174] Allow comments in 'slaves' file
Github user mjsax commented on the pull request:
https://github.com/apache/flink/pull/796#issuecomment-109480303

My Travis is green (https://travis-ci.org/mjsax/flink/builds/65612778). Any comments? Can be merged from my point of view.
[jira] [Updated] (FLINK-2174) Allow comments in 'slaves' file
[ https://issues.apache.org/jira/browse/FLINK-2174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Matthias J. Sax updated FLINK-2174:
-----------------------------------
Priority: Trivial (was: Minor)

Allow comments in 'slaves' file
-------------------------------

Key: FLINK-2174
URL: https://issues.apache.org/jira/browse/FLINK-2174
Project: Flink
Issue Type: Improvement
Components: Start-Stop Scripts
Reporter: Matthias J. Sax
Assignee: Matthias J. Sax
Priority: Trivial

Currently, each line in 'slaves' is interpreted as a host name. The scripts should skip lines starting with '#'. Also allow for comments at the end of a line, and skip empty lines.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
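The filtering rule the issue asks for can be sketched as follows. The actual change belongs in the bash start/stop scripts; this Java sketch with hypothetical names only illustrates the rule: skip blank lines, skip lines starting with '#', and strip comments at the end of a line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the 'slaves' filtering rule described in the issue.
// Drop '#' comments (full-line or trailing) and blank lines; everything
// remaining is treated as a host name.
public class SlavesFileSketch {

    public static List<String> parseHosts(List<String> lines) {
        List<String> hosts = new ArrayList<>();
        for (String line : lines) {
            int hash = line.indexOf('#');
            // Cut off the comment (if any), then trim surrounding whitespace.
            String host = (hash >= 0 ? line.substring(0, hash) : line).trim();
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "# cluster worker nodes",
                "",
                "worker1",
                "worker2  # rack A",
                "   ");
        System.out.println(parseHosts(lines)); // [worker1, worker2]
    }
}
```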
[jira] [Commented] (FLINK-1297) Add support for tracking statistics of intermediate results
[ https://issues.apache.org/jira/browse/FLINK-1297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575462#comment-14575462 ]

ASF GitHub Bot commented on FLINK-1297:
---------------------------------------

Github user StephanEwen commented on the pull request:
https://github.com/apache/flink/pull/605#issuecomment-109490490

The tests seem to be non-deterministic and fail frequently. Check out this build: https://travis-ci.org/StephanEwen/incubator-flink/jobs/65634990 The tests need to be more stable before we can add this to the codebase.

Add support for tracking statistics of intermediate results
-----------------------------------------------------------

Key: FLINK-1297
URL: https://issues.apache.org/jira/browse/FLINK-1297
Project: Flink
Issue Type: Improvement
Components: Distributed Runtime
Reporter: Alexander Alexandrov
Assignee: Alexander Alexandrov
Fix For: 0.9
Original Estimate: 1,008h
Remaining Estimate: 1,008h

One of the major problems related to the optimizer at the moment is the lack of proper statistics. With the introduction of staged execution, it is possible to instrument the runtime code with a statistics facility that collects the required information for optimizing the next execution stage. I would therefore like to contribute code that can be used to gather basic statistics for the (intermediate) result of dataflows (e.g. min, max, count, count distinct) and make them available to the job manager. Before I start, I would like to hear some feedback from the other users. In particular, to handle skew (e.g. on grouping) it might be good to have some sort of detailed sketch about the key distribution of an intermediate result. I am not sure whether a simple histogram is the most effective way to go. Maybe somebody would propose another lightweight sketch that provides better accuracy.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)