[GitHub] flink pull request: [FLINK-2001] [ml] Fix DistanceMetric serializa...
GitHub user chiwanpark opened a pull request:

    https://github.com/apache/flink/pull/668

[FLINK-2001] [ml] Fix DistanceMetric serialization error

* `DistanceMetric` extends Serializable
* Add simple serialization test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chiwanpark/flink FLINK-2001

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/668.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #668

commit 53726c12c8af09dfbe17903df3efcc1308da0540
Author: Chiwan Park
Date: 2015-05-12T06:11:25Z

    [FLINK-2001] [ml] Fix DistanceMetric serialization error

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
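The PR's fix is a one-line change in Scala (`DistanceMetric` extends `Serializable`) plus a simple serialization test. The same pattern can be sketched in plain Java; the `DistanceMetric` interface and `EuclideanDistance` class below are illustrative stand-ins, not Flink's actual ML classes:

```java
import java.io.*;

// Illustrative stand-in for Flink's DistanceMetric trait: making the
// supertype Serializable is what lets instances ship inside job closures.
interface DistanceMetric extends Serializable {
    double distance(double[] a, double[] b);
}

class EuclideanDistance implements DistanceMetric {
    @Override
    public double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

public class SerializationRoundTrip {
    // Serialize and deserialize, as a "simple serialization test" would do.
    static DistanceMetric roundTrip(DistanceMetric m) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(m);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return (DistanceMetric) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        DistanceMetric copy = roundTrip(new EuclideanDistance());
        System.out.println(copy.distance(new double[]{0, 0}, new double[]{3, 4}));
    }
}
```

Without `extends Serializable` on the supertype, `writeObject` here would throw `NotSerializableException`, which is exactly what Flink's closure cleaner surfaces as "Task not serializable".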
[jira] [Commented] (FLINK-2001) DistanceMetric cannot be serialized
[ https://issues.apache.org/jira/browse/FLINK-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539325#comment-14539325 ]

ASF GitHub Bot commented on FLINK-2001:
---------------------------------------

GitHub user chiwanpark opened a pull request:

    https://github.com/apache/flink/pull/668

[FLINK-2001] [ml] Fix DistanceMetric serialization error

* `DistanceMetric` extends Serializable
* Add simple serialization test

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/chiwanpark/flink FLINK-2001

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flink/pull/668.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #668

commit 53726c12c8af09dfbe17903df3efcc1308da0540
Author: Chiwan Park
Date: 2015-05-12T06:11:25Z

    [FLINK-2001] [ml] Fix DistanceMetric serialization error

> DistanceMetric cannot be serialized
> -----------------------------------
>
>                 Key: FLINK-2001
>                 URL: https://issues.apache.org/jira/browse/FLINK-2001
>             Project: Flink
>          Issue Type: Bug
>          Components: Machine Learning Library
>            Reporter: Chiwan Park
>            Assignee: Chiwan Park
>            Priority: Critical
>              Labels: ML
>
> Because the DistanceMeasure trait doesn't extend Serializable, a task using
> DistanceMeasure raises the following exception:
> {code}
> Task not serializable
> org.apache.flink.api.common.InvalidProgramException: Task not serializable
> at org.apache.flink.api.scala.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:179)
> at org.apache.flink.api.scala.ClosureCleaner$.clean(ClosureCleaner.scala:171)
> at org.apache.flink.api.scala.DataSet.clean(DataSet.scala:123)
> at org.apache.flink.api.scala.DataSet$$anon$10.<init>(DataSet.scala:691)
> at org.apache.flink.api.scala.DataSet.combineGroup(DataSet.scala:690)
> at org.apache.flink.ml.classification.KNNModel.transform(KNN.scala:78)
> at org.apache.flink.ml.classification.KNNITSuite$$anonfun$1.apply$mcV$sp(KNNSuite.scala:25)
> at org.apache.flink.ml.classification.KNNITSuite$$anonfun$1.apply(KNNSuite.scala:12)
> at org.apache.flink.ml.classification.KNNITSuite$$anonfun$1.apply(KNNSuite.scala:12)
> at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
> at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
> at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> at org.scalatest.Transformer.apply(Transformer.scala:22)
> at org.scalatest.Transformer.apply(Transformer.scala:20)
> at org.scalatest.FlatSpecLike$$anon$1.apply(FlatSpecLike.scala:1647)
> at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
> at org.scalatest.FlatSpec.withFixture(FlatSpec.scala:1683)
> at org.scalatest.FlatSpecLike$class.invokeWithFixture$1(FlatSpecLike.scala:1644)
> at org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
> at org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
> at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> at org.scalatest.FlatSpecLike$class.runTest(FlatSpecLike.scala:1656)
> at org.apache.flink.ml.classification.KNNITSuite.org$scalatest$BeforeAndAfter$$super$runTest(KNNSuite.scala:9)
> at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
> at org.apache.flink.ml.classification.KNNITSuite.runTest(KNNSuite.scala:9)
> at org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
> at org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
> at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
> at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:390)
> at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:427)
> at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
> at scala.collection.immutable.List.foreach(List.scala:318)
> at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
> at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
> at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
> at org.scalatest.FlatSpecLike$class.runTests(FlatSpecLike.scala:1714)
> at org.scala
[GitHub] flink pull request: [hotfix][scala] Let type analysis work on some...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/660#issuecomment-101145293 It does not treat Java classes the same. For some strange reason, the Scala Type analysis in Scala 2.10 can sometimes not "see" the fields of classes defined in Java. The Scala Type analysis treats Java Tuples as what they are, POJOs. We could maybe add special case handling to make it treat Java tuples as Java tuples.
[jira] [Created] (FLINK-2001) DistanceMetric cannot be serialized
Chiwan Park created FLINK-2001:
-------------------------------

             Summary: DistanceMetric cannot be serialized
                 Key: FLINK-2001
                 URL: https://issues.apache.org/jira/browse/FLINK-2001
             Project: Flink
          Issue Type: Bug
          Components: Machine Learning Library
            Reporter: Chiwan Park
            Assignee: Chiwan Park
            Priority: Critical

Because the DistanceMeasure trait doesn't extend Serializable, a task using DistanceMeasure raises the following exception:

{code}
Task not serializable
org.apache.flink.api.common.InvalidProgramException: Task not serializable
	at org.apache.flink.api.scala.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:179)
	at org.apache.flink.api.scala.ClosureCleaner$.clean(ClosureCleaner.scala:171)
	at org.apache.flink.api.scala.DataSet.clean(DataSet.scala:123)
	at org.apache.flink.api.scala.DataSet$$anon$10.<init>(DataSet.scala:691)
	at org.apache.flink.api.scala.DataSet.combineGroup(DataSet.scala:690)
	at org.apache.flink.ml.classification.KNNModel.transform(KNN.scala:78)
	at org.apache.flink.ml.classification.KNNITSuite$$anonfun$1.apply$mcV$sp(KNNSuite.scala:25)
	at org.apache.flink.ml.classification.KNNITSuite$$anonfun$1.apply(KNNSuite.scala:12)
	at org.apache.flink.ml.classification.KNNITSuite$$anonfun$1.apply(KNNSuite.scala:12)
	at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
	at org.scalatest.Transformer.apply(Transformer.scala:22)
	at org.scalatest.Transformer.apply(Transformer.scala:20)
	at org.scalatest.FlatSpecLike$$anon$1.apply(FlatSpecLike.scala:1647)
	at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
	at org.scalatest.FlatSpec.withFixture(FlatSpec.scala:1683)
	at org.scalatest.FlatSpecLike$class.invokeWithFixture$1(FlatSpecLike.scala:1644)
	at org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
	at org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
	at org.scalatest.FlatSpecLike$class.runTest(FlatSpecLike.scala:1656)
	at org.apache.flink.ml.classification.KNNITSuite.org$scalatest$BeforeAndAfter$$super$runTest(KNNSuite.scala:9)
	at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
	at org.apache.flink.ml.classification.KNNITSuite.runTest(KNNSuite.scala:9)
	at org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
	at org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
	at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:390)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:427)
	at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
	at scala.collection.immutable.List.foreach(List.scala:318)
	at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
	at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
	at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
	at org.scalatest.FlatSpecLike$class.runTests(FlatSpecLike.scala:1714)
	at org.scalatest.FlatSpec.runTests(FlatSpec.scala:1683)
	at org.scalatest.Suite$class.run(Suite.scala:1424)
	at org.scalatest.FlatSpec.org$scalatest$FlatSpecLike$$super$run(FlatSpec.scala:1683)
	at org.scalatest.FlatSpecLike$$anonfun$run$1.apply(FlatSpecLike.scala:1760)
	at org.scalatest.FlatSpecLike$$anonfun$run$1.apply(FlatSpecLike.scala:1760)
	at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
	at org.scalatest.FlatSpecLike$class.run(FlatSpecLike.scala:1760)
	at org.apache.flink.ml.classification.KNNITSuite.org$scalatest$BeforeAndAfter$$super$run(KNNSuite.scala:9)
	at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
	at org.apache.flink.ml.classification.KNNITSuite.run(KNNSuite.scala:9)
	at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
	at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
	at scala.collection.immutable.List.foreach(List.scala:31
[jira] [Commented] (FLINK-1980) Allowing users to decorate input streams
[ https://issues.apache.org/jira/browse/FLINK-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538747#comment-14538747 ] ASF GitHub Bot commented on FLINK-1980: --- Github user sekruse commented on the pull request: https://github.com/apache/flink/pull/658#issuecomment-101064479 I wanted to integrate the gz support with the decorateStream method; therefore, I was waiting for this PR to be merged. I can of course skip the seek, but I thought that forward seeking would be useful as it enables splittable file formats. > Allowing users to decorate input streams > > > Key: FLINK-1980 > URL: https://issues.apache.org/jira/browse/FLINK-1980 > Project: Flink > Issue Type: New Feature > Components: Core >Reporter: Sebastian Kruse >Assignee: Sebastian Kruse >Priority: Minor > > Users may have to do unforeseeable operations on file input streams before > they can be used by the actual input format logic, e.g., exotic compression > formats or preambles such as byte order marks. Therefore, it would be useful > to provide the user with a hook to decorate input streams in order to handle > such issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1980] allowing users to decorate input ...
Github user sekruse commented on the pull request: https://github.com/apache/flink/pull/658#issuecomment-101064479 I wanted to integrate the gz support with the decorateStream method; therefore, I was waiting for this PR to be merged. I can of course skip the seek, but I thought that forward seeking would be useful as it enables splittable file formats.
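The hook being discussed boils down to wrapping the raw file stream before the input format parses it. A minimal sketch, assuming a `decorateStream(stream, path)` hook name purely for illustration (the `GZIPInputStream` wrapping is standard JDK; the hook itself and its extension-based dispatch are hypothetical, not the PR's actual API):

```java
import java.io.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class DecorateStreamSketch {
    // Hypothetical hook: an input format would call this on the raw file
    // stream before parsing, choosing a decorator by file extension.
    static InputStream decorateStream(InputStream raw, String path) throws IOException {
        if (path.endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        return raw; // other decorators (BOM skipping, etc.) would chain here
    }

    // Drain a stream to a UTF-8 string, looping because read() may return
    // fewer bytes than requested.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[256];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toString("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // Build a gzipped payload in memory, then read it back through the hook.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write("hello".getBytes("UTF-8"));
        }
        InputStream decorated = decorateStream(
                new ByteArrayInputStream(bos.toByteArray()), "data.csv.gz");
        System.out.println(readAll(decorated));
    }
}
```

This also shows why gz support and this PR interact: once the hook exists, adding a compression format is one `endsWith` branch rather than a new input format.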
[jira] [Commented] (FLINK-1523) Vertex-centric iteration extensions
[ https://issues.apache.org/jira/browse/FLINK-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538662#comment-14538662 ] ASF GitHub Bot commented on FLINK-1523: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/537#issuecomment-101048928 Hey @andralungu , I built on your changes and implemented what I described above. I have pushed in [this branch](https://github.com/vasia/flink/tree/vc-extentions). I had to manually resolve conflicts caused by #657, so I guess the easiest way to allow people to review would be for me to open a fresh PR. Since you are the one mostly familiar with these features, could you please take a look and let me know what you think? Thanks a lot! > Vertex-centric iteration extensions > --- > > Key: FLINK-1523 > URL: https://issues.apache.org/jira/browse/FLINK-1523 > Project: Flink > Issue Type: Improvement > Components: Gelly >Reporter: Vasia Kalavri >Assignee: Andra Lungu > > We would like to make the following extensions to the vertex-centric > iterations of Gelly: > - allow vertices to access their in/out degrees and the total number of > vertices of the graph, inside the iteration. > - allow choosing the neighborhood type (in/out/all) over which to run the > vertex-centric iteration. Now, the model uses the updates of the in-neighbors > to calculate state and send messages to out-neighbors. We could add a > parameter with value "in/out/all" to the {{VertexUpdateFunction}} and > {{MessagingFunction}}, that would indicate the type of neighborhood.
[GitHub] flink pull request: [FLINK-1523][gelly] Vertex centric iteration e...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/537#issuecomment-101048928 Hey @andralungu , I built on your changes and implemented what I described above. I have pushed in [this branch](https://github.com/vasia/flink/tree/vc-extentions). I had to manually resolve conflicts caused by #657, so I guess the easiest way to allow people to review would be for me to open a fresh PR. Since you are the one mostly familiar with these features, could you please take a look and let me know what you think? Thanks a lot!
[jira] [Commented] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538622#comment-14538622 ] Vasia Kalavri commented on FLINK-1962: -- It would be great if we could somehow overcome this through the Scala type analysis system. Most of the Gelly methods are currently built on the assumption that Vertex and Edge types are Tuples and it wouldn't be trivial to change them. > Add Gelly Scala API > --- > > Key: FLINK-1962 > URL: https://issues.apache.org/jira/browse/FLINK-1962 > Project: Flink > Issue Type: Task > Components: Gelly, Scala API >Affects Versions: 0.9 >Reporter: Vasia Kalavri >
[jira] [Closed] (FLINK-1969) Remove old profile code
[ https://issues.apache.org/jira/browse/FLINK-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-1969. --- > Remove old profile code > --- > > Key: FLINK-1969 > URL: https://issues.apache.org/jira/browse/FLINK-1969 > Project: Flink > Issue Type: Bug > Components: Distributed Runtime >Affects Versions: 0.9 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 0.9 > > > The old profiler code is not instantiated any more and is basically dead. > It has in parts been replaced by the metrics library already. > The classes still get in the way during refactoring, which is why I suggest > removing them.
[jira] [Resolved] (FLINK-1969) Remove old profile code
[ https://issues.apache.org/jira/browse/FLINK-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-1969. - Resolution: Done Done in fbea2da26d01c470687a5ad217a5fd6ad1de89e4 > Remove old profile code > --- > > Key: FLINK-1969 > URL: https://issues.apache.org/jira/browse/FLINK-1969 > Project: Flink > Issue Type: Bug > Components: Distributed Runtime >Affects Versions: 0.9 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 0.9 > > > The old profiler code is not instantiated any more and is basically dead. > It has in parts been replaced by the metrics library already. > The classes still get in the way during refactoring, which is why I suggest > removing them.
[jira] [Resolved] (FLINK-1672) Refactor task registration/unregistration
[ https://issues.apache.org/jira/browse/FLINK-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-1672. - Resolution: Fixed Fix Version/s: 0.9 Assignee: Stephan Ewen Implemented in 8e61301452218e6d279b013beb7bbd02a7c2e3f9 > Refactor task registration/unregistration > - > > Key: FLINK-1672 > URL: https://issues.apache.org/jira/browse/FLINK-1672 > Project: Flink > Issue Type: Improvement > Components: Distributed Runtime >Reporter: Ufuk Celebi >Assignee: Stephan Ewen > Fix For: 0.9 > > > h4. Current control flow for task registrations > # JM submits a TaskDeploymentDescriptor to a TM > ## TM registers the required JAR files with the LibraryCacheManager and > returns the user code class loader > ## TM creates a Task instance and registers the task in the runningTasks map > ## TM creates a TaskInputSplitProvider > ## TM creates a RuntimeEnvironment and sets it as the environment for the task > ## TM registers the task with the network environment > ## TM sends async msg to profiler to monitor tasks > ## TM creates temporary files in file cache > ## TM tries to start the task > If any operation >= 1.2 fails: > * TM calls task.failExternally() > * TM removes temporary files from file cache > * TM unregisters the task from the network environment > * TM sends async msg to profiler to unmonitor tasks > * TM calls unregisterMemoryManager on task > If 1.1 fails, only unregister from LibraryCacheManager. > h4. RuntimeEnvironment, Task, TaskManager separation > The RuntimeEnvironment has references to certain components of the task > manager like memory manager, which are accessed from the task. Furthermore > it implements Runnable, and creates the executing task Thread. The Task > instance essentially wraps the RuntimeEnvironment and allows asynchronous > state management of the task (RUNNING, FINISHED, etc.). 
> The way that the state updates affect the task is not that obvious: state > changes trigger messages to the TM, which for final states further trigger a > msg to unregister the task. The way that tasks are unregistered again depends > on the state of the task. > > I would propose to refactor this to make the way the state > handling/registration/unregistration is handled more transparent.
[jira] [Closed] (FLINK-1672) Refactor task registration/unregistration
[ https://issues.apache.org/jira/browse/FLINK-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-1672. --- > Refactor task registration/unregistration > - > > Key: FLINK-1672 > URL: https://issues.apache.org/jira/browse/FLINK-1672 > Project: Flink > Issue Type: Improvement > Components: Distributed Runtime >Reporter: Ufuk Celebi >Assignee: Stephan Ewen > Fix For: 0.9 > > > h4. Current control flow for task registrations > # JM submits a TaskDeploymentDescriptor to a TM > ## TM registers the required JAR files with the LibraryCacheManager and > returns the user code class loader > ## TM creates a Task instance and registers the task in the runningTasks map > ## TM creates a TaskInputSplitProvider > ## TM creates a RuntimeEnvironment and sets it as the environment for the task > ## TM registers the task with the network environment > ## TM sends async msg to profiler to monitor tasks > ## TM creates temporary files in file cache > ## TM tries to start the task > If any operation >= 1.2 fails: > * TM calls task.failExternally() > * TM removes temporary files from file cache > * TM unregisters the task from the network environment > * TM sends async msg to profiler to unmonitor tasks > * TM calls unregisterMemoryManager on task > If 1.1 fails, only unregister from LibraryCacheManager. > h4. RuntimeEnvironment, Task, TaskManager separation > The RuntimeEnvironment has references to certain components of the task > manager like memory manager, which are accessed from the task. Furthermore > it implements Runnable, and creates the executing task Thread. The Task > instance essentially wraps the RuntimeEnvironment and allows asynchronous > state management of the task (RUNNING, FINISHED, etc.). > The way that the state updates affect the task is not that obvious: state > changes trigger messages to the TM, which for final states further trigger a > msg to unregister the task. 
The way that tasks are unregistered again depends > on the state of the task. > > I would propose to refactor this to make the way the state > handling/registration/unregistration is handled more transparent.
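The asynchronous state management mentioned in the issue (RUNNING, FINISHED, etc.) is the kind of thing commonly implemented as compare-and-swap transitions on an atomic state field, so a racing cancel or failure cannot clobber a terminal state. A simplified sketch of that general pattern, not Flink's actual Task class:

```java
import java.util.concurrent.atomic.AtomicReference;

public class TaskStateSketch {
    // Illustrative subset of task lifecycle states.
    enum State { DEPLOYING, RUNNING, FINISHED, CANCELED, FAILED }

    private final AtomicReference<State> state =
            new AtomicReference<>(State.DEPLOYING);

    // Transition only if the current state matches 'from'; returns whether
    // this caller won the race. Losers must re-read and react to the actual
    // state instead of overwriting it.
    boolean transition(State from, State to) {
        return state.compareAndSet(from, to);
    }

    State current() {
        return state.get();
    }

    public static void main(String[] args) {
        TaskStateSketch task = new TaskStateSketch();
        System.out.println(task.transition(State.DEPLOYING, State.RUNNING)); // true
        task.transition(State.RUNNING, State.FINISHED);
        // A cancel arriving after FINISHED loses the CAS and changes nothing:
        System.out.println(task.transition(State.RUNNING, State.CANCELED));  // false
        System.out.println(task.current());                                  // FINISHED
    }
}
```

Unregistration can then key off the terminal state that actually won, which is the transparency the issue asks for.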
[jira] [Closed] (FLINK-1968) Make Distributed Cache more robust
[ https://issues.apache.org/jira/browse/FLINK-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen closed FLINK-1968. --- > Make Distributed Cache more robust > -- > > Key: FLINK-1968 > URL: https://issues.apache.org/jira/browse/FLINK-1968 > Project: Flink > Issue Type: Bug > Components: Local Runtime >Affects Versions: 0.9 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 0.9 > > > The distributed cache has a variety of issues at the moment. > - It does not give a proper exception when a non-cached file is accessed > - It swallows I/O exceptions that happen during file transfer and later only > returns null > - It keeps reference counts inconsistently and attempts to copy often, > resolving this via file collisions > - Files are not properly removed on shutdown > - No shutdown hook to remove files when the process is killed
[jira] [Resolved] (FLINK-1968) Make Distributed Cache more robust
[ https://issues.apache.org/jira/browse/FLINK-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephan Ewen resolved FLINK-1968. - Resolution: Fixed Fixed via 1c8d866a83065e3d1bc9707dab81117f24c9f678 > Make Distributed Cache more robust > -- > > Key: FLINK-1968 > URL: https://issues.apache.org/jira/browse/FLINK-1968 > Project: Flink > Issue Type: Bug > Components: Local Runtime >Affects Versions: 0.9 >Reporter: Stephan Ewen >Assignee: Stephan Ewen > Fix For: 0.9 > > > The distributed cache has a variety of issues at the moment. > - It does not give a proper exception when a non-cached file is accessed > - It swallows I/O exceptions that happen during file transfer and later only > returns null > - It keeps reference counts inconsistently and attempts to copy often, > resolving this via file collisions > - Files are not properly removed on shutdown > - No shutdown hook to remove files when the process is killed
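The last item on the list, a shutdown hook that removes cached files when the process is killed, follows the standard JVM pattern. A hedged sketch with illustrative names, not Flink's actual DistributedCache code:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class CacheCleanupSketch {
    // Delete a directory tree bottom-up. Also used from the shutdown hook so
    // cached files disappear even on SIGTERM (not SIGKILL, which skips hooks).
    static void deleteRecursively(File dir) {
        File[] children = dir.listFiles();
        if (children != null) {
            for (File child : children) {
                deleteRecursively(child);
            }
        }
        dir.delete();
    }

    public static void main(String[] args) throws IOException {
        File cacheDir = Files.createTempDirectory("dist-cache").toFile();
        // Registered once at startup; runs on normal exit or SIGTERM.
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> deleteRecursively(cacheDir)));

        new File(cacheDir, "cached-file").createNewFile();
        // Explicit cleanup path mirrors what the hook does:
        deleteRecursively(cacheDir);
        System.out.println(cacheDir.exists()); // false
    }
}
```

Note the caveat in the comment: a hard `kill -9` bypasses shutdown hooks, so a robust cache also cleans up stale directories from previous runs at startup.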
[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538574#comment-14538574 ] ASF GitHub Bot commented on FLINK-1525: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-101035060 Nice idea. Do we need the special `UserConfig` interface, or can we use a `Properties` object, or directly a `Map`? > Provide utils to pass -D parameters to UDFs > > > Key: FLINK-1525 > URL: https://issues.apache.org/jira/browse/FLINK-1525 > Project: Flink > Issue Type: Improvement > Components: flink-contrib >Reporter: Robert Metzger > Labels: starter > > Hadoop users are used to setting job configuration through "-D" on the > command line. > Right now, Flink users have to manually parse command line arguments and pass > them to the methods. > It would be nice to provide a standard args parser which takes care of > such stuff.
[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-101035060 Nice idea. Do we need the special `UserConfig` interface, or can we use a `Properties` object, or directly a `Map`?
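A Hadoop-style `-Dkey=value` parser that fills a `Properties` object, along the lines Stephan suggests, could look like this (the class and method names are illustrative, not the PR's actual utility):

```java
import java.util.Properties;

public class ArgsParserSketch {
    // Collect -Dkey=value arguments into Properties; other arguments are
    // left for the program's own argument handling.
    static Properties parseDynamicProperties(String[] args) {
        Properties props = new Properties();
        for (String arg : args) {
            // Require at least one char of key before '=' (index > 2).
            if (arg.startsWith("-D") && arg.indexOf('=') > 2) {
                int eq = arg.indexOf('=');
                props.setProperty(arg.substring(2, eq), arg.substring(eq + 1));
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Properties p = parseDynamicProperties(
                new String[]{"-Dinput=/data", "-Dparallelism=4", "run"});
        System.out.println(p.getProperty("input"));        // /data
        System.out.println(p.getProperty("parallelism"));  // 4
    }
}
```

Returning a plain `Properties` (or `Map<String, String>`) rather than a dedicated interface is exactly the trade-off Stephan raises: less API surface, but no place to hang typed accessors later.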
[jira] [Commented] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538571#comment-14538571 ] Stephan Ewen commented on FLINK-1962: - This seems like a flaw in the Scala Type analysis. The Scala type system should really support the Java types properly. This will be a recurring issue if we consider it common practice to build Scala libraries on top of Java libraries. > Add Gelly Scala API > --- > > Key: FLINK-1962 > URL: https://issues.apache.org/jira/browse/FLINK-1962 > Project: Flink > Issue Type: Task > Components: Gelly, Scala API >Affects Versions: 0.9 >Reporter: Vasia Kalavri >
[jira] [Commented] (FLINK-1874) Break up streaming connectors into submodules
[ https://issues.apache.org/jira/browse/FLINK-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538556#comment-14538556 ] Stephan Ewen commented on FLINK-1874: - I think we are about to merge the biggest change, so this issue should become available soon... > Break up streaming connectors into submodules > - > > Key: FLINK-1874 > URL: https://issues.apache.org/jira/browse/FLINK-1874 > Project: Flink > Issue Type: Task > Components: Build System, Streaming >Affects Versions: 0.9 >Reporter: Robert Metzger > Labels: starter > > As per: > http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Break-up-streaming-connectors-into-subprojects-td5001.html
[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning
[ https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538540#comment-14538540 ] Stephan Ewen commented on FLINK-1959: - I will try and look into this very soon... > Accumulators BROKEN after Partitioning > -- > > Key: FLINK-1959 > URL: https://issues.apache.org/jira/browse/FLINK-1959 > Project: Flink > Issue Type: Bug > Components: Examples >Affects Versions: master >Reporter: mustafa elbehery >Priority: Critical > Fix For: master > > > while running the Accumulator example in > https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java, > > I tried to alter the data flow with "PartitionByHash" function before > applying "Filter", and the resulting accumulator was NULL. > By Debugging, I could see the accumulator in the RunTime Map. However, by > retrieving the accumulator from the JobExecutionResult object, it was NULL. > The line causing the problem is "file.partitionByHash(1).filter(new > EmptyFieldFilter())" instead of "file.filter(new EmptyFieldFilter())"
[jira] [Commented] (FLINK-1980) Allowing users to decorate input streams
[ https://issues.apache.org/jira/browse/FLINK-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538497#comment-14538497 ] ASF GitHub Bot commented on FLINK-1980: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/658#issuecomment-101026234 I think Sebastian has already filed a JIRA for adding gz read support. > Allowing users to decorate input streams > > > Key: FLINK-1980 > URL: https://issues.apache.org/jira/browse/FLINK-1980 > Project: Flink > Issue Type: New Feature > Components: Core >Reporter: Sebastian Kruse >Assignee: Sebastian Kruse >Priority: Minor > > Users may have to do unforeseeable operations on file input streams before > they can be used by the actual input format logic, e.g., exotic compression > formats or preambles such as byte order marks. Therefore, it would be useful > to provide the user with a hook to decorate input streams in order to handle > such issues.
[GitHub] flink pull request: [FLINK-1980] allowing users to decorate input ...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/658#issuecomment-101026234 I think Sebastian has already filed a JIRA for adding gz read support.
[GitHub] flink pull request: [FLINK-1980] allowing users to decorate input ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/658#issuecomment-101024456 Looks good. One comment, though: - The position tracking in the stream wrapper is a nice idea, but since the `skip()` method is anyways only a hint (it may skip less if it wants) and seeking works only in one direction, why not skip seek support completely? Also simplifies the code and makes it even more lightweight. As a followup: Should we add more built-in decompressors for other file endings, like `*.gz` ?
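The caveat that `skip()` is only a hint means any forward seek built on it has to loop until the requested distance is covered, falling back to `read()` when `skip()` makes no progress. A small sketch of that pattern (illustrative code, not the PR's wrapper):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ForwardSeekSketch {
    // skip() may skip fewer bytes than asked (its contract allows this), so
    // loop; fall back to read() when skip() stalls, and stop at EOF.
    // Returns the number of bytes actually advanced.
    static long forwardSeek(InputStream in, long distance) throws IOException {
        long remaining = distance;
        while (remaining > 0) {
            long skipped = in.skip(remaining);
            if (skipped <= 0) {
                if (in.read() < 0) {
                    break; // EOF reached before the target position
                }
                skipped = 1;
            }
            remaining -= skipped;
        }
        return distance - remaining;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[]{1, 2, 3, 4, 5});
        System.out.println(forwardSeek(in, 3)); // 3
        System.out.println(in.read());          // 4
    }
}
```

This loop is the boilerplate the comment proposes dropping: since forward-only seeking via `skip()` gives no hard guarantees anyway, leaving it out keeps the wrapper simpler.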
[jira] [Updated] (FLINK-1996) Add output methods to Table API
[ https://issues.apache.org/jira/browse/FLINK-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek updated FLINK-1996: Assignee: (was: Aljoscha Krettek) > Add output methods to Table API > --- > > Key: FLINK-1996 > URL: https://issues.apache.org/jira/browse/FLINK-1996 > Project: Flink > Issue Type: Improvement > Components: Table API >Affects Versions: 0.9 >Reporter: Fabian Hueske > > Tables need to be converted to DataSets (or DataStreams) to write them out. > It would be good to have a way to emit Table results directly for example to > print, CSV, JDBC, HBase, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1996) Add output methods to Table API
[ https://issues.apache.org/jira/browse/FLINK-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538490#comment-14538490 ] Aljoscha Krettek commented on FLINK-1996: - I would say so, yes. > Add output methods to Table API > --- > > Key: FLINK-1996 > URL: https://issues.apache.org/jira/browse/FLINK-1996 > Project: Flink > Issue Type: Improvement > Components: Table API >Affects Versions: 0.9 >Reporter: Fabian Hueske > > Tables need to be converted to DataSets (or DataStreams) to write them out. > It would be good to have a way to emit Table results directly for example to > print, CSV, JDBC, HBase, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1980) Allowing users to decorate input streams
[ https://issues.apache.org/jira/browse/FLINK-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538486#comment-14538486 ] ASF GitHub Bot commented on FLINK-1980: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/658#issuecomment-101024456 Looks good. One comment, though: - The position tracking in the stream wrapper is a nice idea, but since the `skip()` method is anyway only a hint (it may skip less if it wants) and seeking works only in one direction, why not drop seek support completely? That also simplifies the code and makes it even more lightweight. As a follow-up: should we add more built-in decompressors for other file endings, like `*.gz`? > Allowing users to decorate input streams > > > Key: FLINK-1980 > URL: https://issues.apache.org/jira/browse/FLINK-1980 > Project: Flink > Issue Type: New Feature > Components: Core >Reporter: Sebastian Kruse >Assignee: Sebastian Kruse >Priority: Minor > > Users may have to do unforeseeable operations on file input streams before > they can be used by the actual input format logic, e.g., exotic compression > formats or preambles such as byte order marks. Therefore, it would be useful > to provide the user with a hook to decorate input streams in order to handle > such issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1996) Add output methods to Table API
[ https://issues.apache.org/jira/browse/FLINK-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538457#comment-14538457 ] Stephan Ewen commented on FLINK-1996: - Does this make a good starter issue? > Add output methods to Table API > --- > > Key: FLINK-1996 > URL: https://issues.apache.org/jira/browse/FLINK-1996 > Project: Flink > Issue Type: Improvement > Components: Table API >Affects Versions: 0.9 >Reporter: Fabian Hueske >Assignee: Aljoscha Krettek > > Tables need to be converted to DataSets (or DataStreams) to write them out. > It would be good to have a way to emit Table results directly for example to > print, CSV, JDBC, HBase, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1996) Add output methods to Table API
[ https://issues.apache.org/jira/browse/FLINK-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538456#comment-14538456 ] Stephan Ewen commented on FLINK-1996: - I think it may actually be worth adding those methods, where possible. It really is much simpler that way... Their implementation could be slim, if they delegate to the data set code / output formats. > Add output methods to Table API > --- > > Key: FLINK-1996 > URL: https://issues.apache.org/jira/browse/FLINK-1996 > Project: Flink > Issue Type: Improvement > Components: Table API >Affects Versions: 0.9 >Reporter: Fabian Hueske >Assignee: Aljoscha Krettek > > Tables need to be converted to DataSets (or DataStreams) to write them out. > It would be good to have a way to emit Table results directly for example to > print, CSV, JDBC, HBase, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
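Stephan's point about slim, delegating output methods can be illustrated with simplified stand-in types (`SimpleTable` and `SimpleDataSet` below are mock-ups invented for this sketch, not Flink classes):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Mock-up of a data set that already knows how to emit itself as CSV.
class SimpleDataSet {
    final List<String[]> rows;
    SimpleDataSet(List<String[]> rows) { this.rows = rows; }
    String writeAsCsv() { // a real implementation would target an output format / file path
        return rows.stream().map(r -> String.join(",", r)).collect(Collectors.joining("\n"));
    }
}

// Mock-up of a Table whose output method carries no logic of its own.
class SimpleTable {
    final SimpleDataSet underlying;
    SimpleTable(SimpleDataSet underlying) { this.underlying = underlying; }
    String writeAsCsv() {
        return underlying.writeAsCsv(); // pure delegation keeps the method slim
    }
}

public class TableOutputSketch {
    public static void main(String[] args) {
        SimpleTable t = new SimpleTable(new SimpleDataSet(Arrays.asList(
                new String[]{"1", "Hi"}, new String[]{"2", "Hello"})));
        System.out.println(t.writeAsCsv()); // prints "1,Hi" then "2,Hello"
    }
}
```

The point of the sketch is only the shape: the table-level method adds no serialization logic of its own, so keeping it maintained is cheap.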
[jira] [Commented] (FLINK-1989) Sorting of POJO data set from TableEnv yields NotSerializableException
[ https://issues.apache.org/jira/browse/FLINK-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538407#comment-14538407 ] Aljoscha Krettek commented on FLINK-1989: - I could rework the code generation to generate the complete code on the client and only ship the generated Java code in a String. Only compilation from Java source to bytecode would then happen at runtime on the TaskManager. > Sorting of POJO data set from TableEnv yields NotSerializableException > -- > > Key: FLINK-1989 > URL: https://issues.apache.org/jira/browse/FLINK-1989 > Project: Flink > Issue Type: Bug > Components: Table API >Affects Versions: 0.9 >Reporter: Fabian Hueske >Assignee: Aljoscha Krettek > Fix For: 0.9 > > > Sorting or grouping (or probably any other key operation) on a POJO data set > that was created by a {{TableEnvironment}} yields a > {{NotSerializableException}} due to a non-serializable > {{java.lang.reflect.Field}} object. > I traced the error back to the {{ExpressionSelectFunction}}. I guess that a > {{TypeInformation}} object is stored in the generated user-code function. A > {{PojoTypeInfo}} holds Field objects, which cannot be serialized. > The following test can be pasted into the {{SelectITCase}} and reproduces the > problem.
> {code}
> @Test
> public void testGroupByAfterTable() throws Exception {
>     ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>     TableEnvironment tableEnv = new TableEnvironment();
>     DataSet<Tuple3<Integer, Long, String>> ds = CollectionDataSets.get3TupleDataSet(env);
>     Table in = tableEnv.toTable(ds, "a,b,c");
>     Table result = in.select("a, b, c");
>     DataSet<ABC> resultSet = tableEnv.toSet(result, ABC.class);
>     resultSet
>         .sortPartition("a", Order.DESCENDING)
>         .writeAsText(resultPath, FileSystem.WriteMode.OVERWRITE);
>     env.execute();
>     expected = "1,1,Hi\n" + "2,2,Hello\n" + "3,2,Hello world\n" + "4,3,Hello world, how are you?\n" +
>         "5,3,I am fine.\n" + "6,3,Luke Skywalker\n" + "7,4,Comment#1\n" + "8,4,Comment#2\n" +
>         "9,4,Comment#3\n" + "10,4,Comment#4\n" + "11,5,Comment#5\n" + "12,5,Comment#6\n" +
>         "13,5,Comment#7\n" + "14,5,Comment#8\n" + "15,5,Comment#9\n" + "16,6,Comment#10\n" +
>         "17,6,Comment#11\n" + "18,6,Comment#12\n" + "19,6,Comment#13\n" + "20,6,Comment#14\n" +
>         "21,6,Comment#15\n";
> }
>
> public static class ABC {
>     public int a;
>     public long b;
>     public String c;
> }
> {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
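The root cause reported here — `java.lang.reflect.Field` does not implement `Serializable` — can be reproduced with the JDK alone. The class names below are illustrative stand-ins, not Flink code:

```java
import java.io.*;
import java.lang.reflect.Field;

public class FieldSerializationDemo {

    // Minimal POJO; its reflective Field handle is the kind of object PojoTypeInfo holds.
    public static class Pojo { public int a; }

    // Stands in for a generated function that (incorrectly) captures type metadata.
    public static class CapturesField implements Serializable {
        final Field field;
        CapturesField(Field field) { this.field = field; }
    }

    // Returns true if the object survives Java serialization.
    public static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A Serializable wrapper still fails if it holds a non-transient Field member.
    public static boolean capturedFieldSerializable() {
        try {
            return isSerializable(new CapturesField(Pojo.class.getDeclaredField("a")));
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isSerializable("a string"));  // true: String is Serializable
        System.out.println(capturedFieldSerializable()); // false: Field is not
    }
}
```

This is why shipping generated source code as a `String` (as Aljoscha proposes in the comment above the issue description) sidesteps the problem: strings serialize fine, reflective handles do not.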
[jira] [Commented] (FLINK-1676) enableForceKryo() is not working as expected
[ https://issues.apache.org/jira/browse/FLINK-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538370#comment-14538370 ] ASF GitHub Bot commented on FLINK-1676: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/473#issuecomment-101010857 I maintain one comment, but I am not blocking this. Feel free to merge... > enableForceKryo() is not working as expected > > > Key: FLINK-1676 > URL: https://issues.apache.org/jira/browse/FLINK-1676 > Project: Flink > Issue Type: Bug > Components: Java API >Affects Versions: 0.9 >Reporter: Robert Metzger > > In my Flink job, I've set the following execution config
> {code}
> final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> env.getConfig().disableObjectReuse();
> env.getConfig().enableForceKryo();
> {code}
> Setting a breakpoint in the {{PojoSerializer()}} constructor, you'll see that we still serialize data with the POJO serializer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1676] Rework ExecutionConfig.enableForc...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/473#issuecomment-101010857 I maintain one comment, but I am not blocking this. Feel free to merge...
[GitHub] flink pull request: [FLINK-1676] Rework ExecutionConfig.enableForc...
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/473#discussion_r30067549
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/typeutils/PojoTypeInfo.java ---
@@ -313,6 +319,15 @@ public int getFieldIndex(String fieldName) {
 @Override
 public TypeSerializer<T> createSerializer(ExecutionConfig config) {
+    if (config.isForceKryoEnabled()) {
+        LOG.debug("Using KryoSerializer for serializing POJOs");
--- End diff --
I still think this is the wrong place to log this, it will get repeated a gazillion times on large plans. Also, it is logged in some places on the client, and sometimes in the runtime. It is not terribly bad, since it is debug level, but I don't get the reason to have it in the first place. Since you have no context information (what stream/dataset this is for), it is not a good debugging help anyway...
[jira] [Commented] (FLINK-1676) enableForceKryo() is not working as expected
[ https://issues.apache.org/jira/browse/FLINK-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538366#comment-14538366 ] ASF GitHub Bot commented on FLINK-1676: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/473#discussion_r30067549
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/typeutils/PojoTypeInfo.java ---
@@ -313,6 +319,15 @@ public int getFieldIndex(String fieldName) {
 @Override
 public TypeSerializer<T> createSerializer(ExecutionConfig config) {
+    if (config.isForceKryoEnabled()) {
+        LOG.debug("Using KryoSerializer for serializing POJOs");
--- End diff --
I still think this is the wrong place to log this, it will get repeated a gazillion times on large plans. Also, it is logged in some places on the client, and sometimes in the runtime. It is not terribly bad, since it is debug level, but I don't get the reason to have it in the first place. Since you have no context information (what stream/dataset this is for), it is not a good debugging help anyway... > enableForceKryo() is not working as expected > > > Key: FLINK-1676 > URL: https://issues.apache.org/jira/browse/FLINK-1676 > Project: Flink > Issue Type: Bug > Components: Java API >Affects Versions: 0.9 >Reporter: Robert Metzger > > In my Flink job, I've set the following execution config
> {code}
> final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> env.getConfig().disableObjectReuse();
> env.getConfig().enableForceKryo();
> {code}
> Setting a breakpoint in the {{PojoSerializer()}} constructor, you'll see that we still serialize data with the POJO serializer. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1989) Sorting of POJO data set from TableEnv yields NotSerializableException
[ https://issues.apache.org/jira/browse/FLINK-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538348#comment-14538348 ] Stephan Ewen commented on FLINK-1989: - Storing the type information breaks with the current design principles, where the {{TypeInformation}} is a pure pre-flight concept, and the {{TypeSerializer}} and {{TypeComparator}} are the runtime handles. > Sorting of POJO data set from TableEnv yields NotSerializableException > -- > > Key: FLINK-1989 > URL: https://issues.apache.org/jira/browse/FLINK-1989 > Project: Flink > Issue Type: Bug > Components: Table API >Affects Versions: 0.9 >Reporter: Fabian Hueske >Assignee: Aljoscha Krettek > Fix For: 0.9 > > > Sorting or grouping (or probably any other key operation) on a POJO data set > that was created by a {{TableEnvironment}} yields a > {{NotSerializableException}} due to a non-serializable > {{java.lang.reflect.Field}} object. > I traced the error back to the {{ExpressionSelectFunction}}. I guess that a > {{TypeInformation}} object is stored in the generated user-code function. A > {{PojoTypeInfo}} holds Field objects, which cannot be serialized. > The following test can be pasted into the {{SelectITCase}} and reproduces the > problem. 
> {code}
> @Test
> public void testGroupByAfterTable() throws Exception {
>     ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>     TableEnvironment tableEnv = new TableEnvironment();
>     DataSet<Tuple3<Integer, Long, String>> ds = CollectionDataSets.get3TupleDataSet(env);
>     Table in = tableEnv.toTable(ds, "a,b,c");
>     Table result = in.select("a, b, c");
>     DataSet<ABC> resultSet = tableEnv.toSet(result, ABC.class);
>     resultSet
>         .sortPartition("a", Order.DESCENDING)
>         .writeAsText(resultPath, FileSystem.WriteMode.OVERWRITE);
>     env.execute();
>     expected = "1,1,Hi\n" + "2,2,Hello\n" + "3,2,Hello world\n" + "4,3,Hello world, how are you?\n" +
>         "5,3,I am fine.\n" + "6,3,Luke Skywalker\n" + "7,4,Comment#1\n" + "8,4,Comment#2\n" +
>         "9,4,Comment#3\n" + "10,4,Comment#4\n" + "11,5,Comment#5\n" + "12,5,Comment#6\n" +
>         "13,5,Comment#7\n" + "14,5,Comment#8\n" + "15,5,Comment#9\n" + "16,6,Comment#10\n" +
>         "17,6,Comment#11\n" + "18,6,Comment#12\n" + "19,6,Comment#13\n" + "20,6,Comment#14\n" +
>         "21,6,Comment#15\n";
> }
>
> public static class ABC {
>     public int a;
>     public long b;
>     public String c;
> }
> {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1986) Group by fails on iterative data streams
[ https://issues.apache.org/jira/browse/FLINK-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538328#comment-14538328 ] Stephan Ewen commented on FLINK-1986: - [~gyfora] and [~senorcarbone] Are you using co-location constraints, to make sure that head and tail of an iteration are co-located? Otherwise that is not guaranteed, but required by the backchannel broker. > Group by fails on iterative data streams > > > Key: FLINK-1986 > URL: https://issues.apache.org/jira/browse/FLINK-1986 > Project: Flink > Issue Type: Bug > Components: Streaming >Reporter: Daniel Bali > Labels: iteration, streaming > > Hello! > When I try to run a `groupBy` on an IterativeDataStream I get a > NullPointerException. Here is the code that reproduces the issue:
> {code}
> public Test() throws Exception {
>     StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();
>     DataStream<Tuple2<Long, Long>> edges = env
>         .generateSequence(0, 7)
>         .map(new MapFunction<Long, Tuple2<Long, Long>>() {
>             @Override
>             public Tuple2<Long, Long> map(Long v) throws Exception {
>                 return new Tuple2<>(v, (v + 1));
>             }
>         });
>     IterativeDataStream<Tuple2<Long, Long>> iteration = edges.iterate();
>     SplitDataStream<Tuple2<Long, Long>> step = iteration.groupBy(1)
>         .map(new MapFunction<Tuple2<Long, Long>, Tuple2<Long, Long>>() {
>             @Override
>             public Tuple2<Long, Long> map(Tuple2<Long, Long> tuple) throws Exception {
>                 return tuple;
>             }
>         })
>         .split(new OutputSelector<Tuple2<Long, Long>>() {
>             @Override
>             public Iterable<String> select(Tuple2<Long, Long> tuple) {
>                 List<String> output = new ArrayList<>();
>                 output.add("iterate");
>                 return output;
>             }
>         });
>     iteration.closeWith(step.select("iterate"));
>     env.execute("Sandbox");
> }
> {code}
> Moving the groupBy before the iteration solves the issue. e.g. this works:
> {code}
> ... iteration = edges.groupBy(1).iterate();
> iteration.map(...)
> {code}
> Here is the stack trace:
> {code}
> Exception in thread "main" java.lang.NullPointerException
>     at org.apache.flink.streaming.api.graph.StreamGraph.addIterationTail(StreamGraph.java:207)
>     at org.apache.flink.streaming.api.datastream.IterativeDataStream.closeWith(IterativeDataStream.java:72)
>     at org.apache.flink.graph.streaming.example.Test.<init>(Test.java:73)
>     at org.apache.flink.graph.streaming.example.Test.main(Test.java:79)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:601)
>     at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [hotfix][scala] Let type analysis work on some...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/660#issuecomment-101000540 Does the Scala API not treat Java classes the same way as its own classes? That sounds absolutely like the right thing to do. Why not do as much compile time analysis as possible... To understand this better, does the Scala type analysis now generate - Java Tuple TypeInformation? - Does it treat Java POJOs as POJO types? - ...
[GitHub] flink pull request: [hotfix][scala] Let type analysis work on some...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/660#issuecomment-100999172 Seems to contain a second fix, to skip static fields in data types.
[GitHub] flink pull request: [FLINK-951] Reworking of Iteration Synchroniza...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/570#issuecomment-100996059 I had a look at the pull request and I like very much what it tries to do. The problem right now is that I can hardly say without investing a lot of time whether this is in good shape to merge. This pull request does at least two very big things at the same time: - Move iteration synchronization to the JobManager - Unify aggregators and accumulators into one. With all the example / testcase adjustments, this becomes a lot to review. The description of the pull request also does not make it easy, since many questions and decisions that arise are not explained: - Which interface do the unified aggregators/accumulators follow: the aggregators', or the accumulators'? - How is the blocking superstep synchronization currently done? With an actor ask? - How is the aggregator/accumulator unification achieved, when aggregators are created per superstep, and accumulators once? This is a lot for a very delicate and critical mechanism. I think if we want to merge this, we need more details on how things were changed (the concept behind the changes, not just the code diffs). We may need to break it into multiple self-contained changes that we can individually review and merge, to make sure that it gets properly checked and will work robustly.
[jira] [Commented] (FLINK-951) Reworking of Iteration Synchronization, Accumulators and Aggregators
[ https://issues.apache.org/jira/browse/FLINK-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538269#comment-14538269 ] ASF GitHub Bot commented on FLINK-951: -- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/570#issuecomment-100996059 I had a look at the pull request and I like very much what it tries to do. The problem right now is that I can hardly say without investing a lot of time whether this is in good shape to merge. This pull request does at least two very big things at the same time: - Move iteration synchronization to the JobManager - Unify aggregators and accumulators into one. With all the example / testcase adjustments, this becomes a lot to review. The description of the pull request also does not make it easy, since many questions and decisions that arise are not explained: - Which interface do the unified aggregators/accumulators follow: the aggregators', or the accumulators'? - How is the blocking superstep synchronization currently done? With an actor ask? - How is the aggregator/accumulator unification achieved, when aggregators are created per superstep, and accumulators once? This is a lot for a very delicate and critical mechanism. I think if we want to merge this, we need more details on how things were changed (the concept behind the changes, not just the code diffs). We may need to break it into multiple self-contained changes that we can individually review and merge, to make sure that it gets properly checked and will work robustly.
> Reworking of Iteration Synchronization, Accumulators and Aggregators > > > Key: FLINK-951 > URL: https://issues.apache.org/jira/browse/FLINK-951 > Project: Flink > Issue Type: Improvement > Components: Iterations, Optimizer >Affects Versions: 0.9 >Reporter: Markus Holzemer >Assignee: Markus Holzemer > Labels: refactoring > Original Estimate: 168h > Remaining Estimate: 168h > > I just realized that there is no real Jira issue for the task I am currently > working on. > I am currently reworking a few things regarding Iteration Synchronization, > Accumulators and Aggregators. Currently the synchronization at the end of one > superstep is done through channel events. That makes it hard to track the > current status of iterations. That is why I am changing this synchronization > to use RPC calls with the JobManager, so that the JobManager manages the > current status of all iterations. > Currently we use Accumulators outside of iterations and Aggregators inside of > iterations. Both have a similar function, but a bit different interfaces and > handling. I want to unify these two concepts. I propose that we stick in the > future to Accumulators only. Aggregators therefore are removed and > Accumulators are extended to cover the use cases Aggregators were used for > before. The switch to RPC for iterations makes it possible to also send the > current Accumulator values at the end of each superstep, so that the > JobManager (and thereby the web interface) will be able to print intermediate > accumulation results. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1982) Remove dependencies on Record for Flink runtime and core
[ https://issues.apache.org/jira/browse/FLINK-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538253#comment-14538253 ] Henry Saputra commented on FLINK-1982: -- Thanks [~StephanEwen]! As for the terasort test, if we finally get to remove all other test dependencies, would it be OK to disable that test so we can remove the Record API? My preference is to remove deprecated and non-preferred APIs for using Flink. > Remove dependencies on Record for Flink runtime and core > > > Key: FLINK-1982 > URL: https://issues.apache.org/jira/browse/FLINK-1982 > Project: Flink > Issue Type: Sub-task > Components: Core >Reporter: Henry Saputra > > Seemed like there are several uses of Record API in core and runtime module > that need to be updated before Record API could be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1983) Remove dependencies on Record APIs for Spargel
[ https://issues.apache.org/jira/browse/FLINK-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538180#comment-14538180 ] Stephan Ewen commented on FLINK-1983: - This seems doable without implications. > Remove dependencies on Record APIs for Spargel > -- > > Key: FLINK-1983 > URL: https://issues.apache.org/jira/browse/FLINK-1983 > Project: Flink > Issue Type: Sub-task > Components: Spargel >Reporter: Henry Saputra > > Need to remove usage of Record API in Spargel -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1982) Remove dependencies on Record for Flink runtime and core
[ https://issues.apache.org/jira/browse/FLINK-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538156#comment-14538156 ] Stephan Ewen commented on FLINK-1982: - I think the runtime and optimizer specializations can be removed with the API together in one patch. The last blocker is probably that some of the runtime tests are implemented in the Record API. We would need to migrate some, many can probably be dropped, as they are redundant now. The only test that we cannot port is the terasort test, because the other APIs do not yet support range partitioning. > Remove dependencies on Record for Flink runtime and core > > > Key: FLINK-1982 > URL: https://issues.apache.org/jira/browse/FLINK-1982 > Project: Flink > Issue Type: Sub-task > Components: Core >Reporter: Henry Saputra > > Seemed like there are several uses of Record API in core and runtime module > that need to be updated before Record API could be removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1977] Rework Stream Operators to always...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/659#issuecomment-100953248 I think I addressed all the (reasonable) comments you made? The user facing API does not change in any way from this and I tried to pick consistent names for the internal tasks and operators.
[jira] [Commented] (FLINK-1977) Rework Stream Operators to always be push based
[ https://issues.apache.org/jira/browse/FLINK-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538077#comment-14538077 ] ASF GitHub Bot commented on FLINK-1977: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/659#issuecomment-100953248 I think I addressed all the (reasonable) comments you made? The user facing API does not change in any way from this and I tried to pick consistent names for the internal tasks and operators. > Rework Stream Operators to always be push based > --- > > Key: FLINK-1977 > URL: https://issues.apache.org/jira/browse/FLINK-1977 > Project: Flink > Issue Type: Improvement >Reporter: Aljoscha Krettek >Assignee: Aljoscha Krettek > > This is a result of the discussion on the mailing list. This is an excerpt > from the mailing list that gives the basic idea of the change: > I propose to change all streaming operators to be push based, with a > slightly improved interface: In addition to collect(), which I would > call receiveElement() I would add receivePunctuation() and > receiveBarrier(). The first operator in the chain would also get data > from the outside invokable that reads from the input iterator and > calls receiveElement() for the first operator in a chain. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1990) Uppercase "AS" keyword not allowed in select expression
[ https://issues.apache.org/jira/browse/FLINK-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538069#comment-14538069 ] ASF GitHub Bot commented on FLINK-1990: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/667#issuecomment-100950435 I created this issue for the aggregations: https://issues.apache.org/jira/browse/FLINK-2000. I think I can only assign you if you have a Jira account. Do you have one? > Uppercase "AS" keyword not allowed in select expression > --- > > Key: FLINK-1990 > URL: https://issues.apache.org/jira/browse/FLINK-1990 > Project: Flink > Issue Type: Bug > Components: Table API >Affects Versions: 0.9 >Reporter: Fabian Hueske >Assignee: Aljoscha Krettek >Priority: Minor > Fix For: 0.9 > > > Table API select expressions do not allow an uppercase "AS" keyword. > The following expression fails with an {{ExpressionException}}: > {{table.groupBy("request").select("request, request.count AS cnt")}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1990] [staging table] Support upper cas...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/667#issuecomment-100950435 I created this issue for the aggregations: https://issues.apache.org/jira/browse/FLINK-2000. I think I can only assign you if you have a Jira account. Do you have one?
[jira] [Created] (FLINK-2000) Add SQL-style aggregations for Table API
Aljoscha Krettek created FLINK-2000: --- Summary: Add SQL-style aggregations for Table API Key: FLINK-2000 URL: https://issues.apache.org/jira/browse/FLINK-2000 Project: Flink Issue Type: Improvement Reporter: Aljoscha Krettek Priority: Minor Right now, the syntax for aggregations is "a.count, a.min" and so on. We could in addition offer "COUNT(a), MIN(a)" and so on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1990) Uppercase "AS" keyword not allowed in select expression
[ https://issues.apache.org/jira/browse/FLINK-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538041#comment-14538041 ] ASF GitHub Bot commented on FLINK-1990: --- Github user chhao01 commented on the pull request: https://github.com/apache/flink/pull/667#issuecomment-100943527 Thanks for the quick response, I was not sure which it was supposed to be. Since it's case-insensitive, I will update the code with a unit test soon. Yes, I am interested in adding the SQL-style aggregations; it would be great if you could create the JIRA and assign it to me. :) > Uppercase "AS" keyword not allowed in select expression > --- > > Key: FLINK-1990 > URL: https://issues.apache.org/jira/browse/FLINK-1990 > Project: Flink > Issue Type: Bug > Components: Table API >Affects Versions: 0.9 >Reporter: Fabian Hueske >Assignee: Aljoscha Krettek >Priority: Minor > Fix For: 0.9 > > > Table API select expressions do not allow an uppercase "AS" keyword. > The following expression fails with an {{ExpressionException}}: > {{table.groupBy("request").select("request, request.count AS cnt")}} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1990] [staging table] Support upper cas...
Github user chhao01 commented on the pull request: https://github.com/apache/flink/pull/667#issuecomment-100943527

Thanks for the quick response; I was not sure which it was supposed to be. Since it's case-insensitive, I will update the code with a unit test soon. Yes, I am interested in adding the SQL-style aggregations; it would be great if you could create the Jira and assign it to me. :)
[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538035#comment-14538035 ] ASF GitHub Bot commented on FLINK-1525: ---

Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100941805

Thanks for clarifying, Robert. I think the convention is for *Util classes to contain only static methods rather than being created as instances. Maybe we could use ```InputParameters``` as the class name for parsing the CLI arguments instead?

> Provide utils to pass -D parameters to UDFs
>
> Key: FLINK-1525
> URL: https://issues.apache.org/jira/browse/FLINK-1525
> Project: Flink
> Issue Type: Improvement
> Components: flink-contrib
> Reporter: Robert Metzger
> Labels: starter
>
> Hadoop users are used to setting job configuration through "-D" on the
> command line.
> Right now, Flink users have to manually parse command line arguments and pass
> them to the methods.
> It would be nice to provide a standard args parser which takes care of
> such things.
[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...
Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100941805

Thanks for clarifying, Robert. I think the convention is for *Util classes to contain only static methods rather than being created as instances. Maybe we could use ```InputParameters``` as the class name for parsing the CLI arguments instead?
[GitHub] flink pull request: [FLINK-1990] [staging table] Support upper cas...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/667#issuecomment-100940455

Thanks for your contribution! Could you maybe enhance it along these lines: http://stackoverflow.com/questions/6080437/case-insensitive-scala-parser-combinator? Because right now it would only support "as" and "AS", but not "As" or "aS". I know the latter is a rather academic example, but still... Would you also be interested in adding SQL-style aggregations, for example COUNT(field), MAX(field) and so on? I could open a Jira for this.
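The linked Stack Overflow thread builds a case-insensitive keyword parser with Scala parser combinators. The core idea it relies on can be sketched on its own: match a keyword regardless of letter case via the `(?i)` regex flag. This is a minimal illustration only and does not reproduce the Table API's actual parser:

```java
import java.util.regex.Pattern;

// Hedged sketch: the case-insensitivity idea behind the linked approach.
// The real Table API parser uses Scala parser combinators; this snippet
// only demonstrates matching "as" in any letter case.
public class CaseInsensitiveKeyword {

    // Compile a pattern that matches the keyword in any letter case.
    static Pattern keyword(String kw) {
        return Pattern.compile("(?i)" + Pattern.quote(kw));
    }

    public static void main(String[] args) {
        Pattern as = keyword("as");
        for (String t : new String[]{"as", "AS", "As", "aS"}) {
            if (!as.matcher(t).matches()) {
                throw new AssertionError("should match: " + t);
            }
        }
        if (as.matcher("alias").matches()) {
            throw new AssertionError("'alias' must not match the keyword");
        }
        System.out.println("all four case variants of AS matched");
    }
}
```

With a plain `"as"` literal only the exact lowercase form would match; the `(?i)` flag is what admits "As" and "aS" as well, which is exactly the gap the review points out.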
[jira] [Commented] (FLINK-1990) Uppercase "AS" keyword not allowed in select expression
[ https://issues.apache.org/jira/browse/FLINK-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538026#comment-14538026 ] ASF GitHub Bot commented on FLINK-1990: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/667#issuecomment-100940455 Thanks for your contribution! Could you maybe enhance it along these lines: http://stackoverflow.com/questions/6080437/case-insensitive-scala-parser-combinator Because right now it would only support as and AS, but not As or aS. I know the latter is a rather academic example, but still... Would you also be interested in adding SQL style aggregations, for example COUNT(field), MAX(field) and so on? I could open a Jira for this.
[jira] [Commented] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538018#comment-14538018 ] Aljoscha Krettek commented on FLINK-1962: - Yes, because the Scala type analysis does not recognise the Java tuples as Tuples. > Add Gelly Scala API > --- > > Key: FLINK-1962 > URL: https://issues.apache.org/jira/browse/FLINK-1962 > Project: Flink > Issue Type: Task > Components: Gelly, Scala API >Affects Versions: 0.9 >Reporter: Vasia Kalavri
[jira] [Commented] (FLINK-1990) Uppercase "AS" keyword not allowed in select expression
[ https://issues.apache.org/jira/browse/FLINK-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538015#comment-14538015 ] ASF GitHub Bot commented on FLINK-1990: --- GitHub user chhao01 opened a pull request: https://github.com/apache/flink/pull/667 [FLINK-1990] [staging table] Support upper case of `as` for expression You can merge this pull request into a Git repository by running: $ git pull https://github.com/chhao01/flink as Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/667.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #667 commit bd1d059440a23e4b2a725946716f580e168c2a7b Author: Cheng Hao Date: 2015-05-11T14:57:14Z support upper case of 'as' for expression
[GitHub] flink pull request: [FLINK-1990] [staging table] Support upper cas...
GitHub user chhao01 opened a pull request: https://github.com/apache/flink/pull/667 [FLINK-1990] [staging table] Support upper case of `as` for expression You can merge this pull request into a Git repository by running: $ git pull https://github.com/chhao01/flink as Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/667.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #667 commit bd1d059440a23e4b2a725946716f580e168c2a7b Author: Cheng Hao Date: 2015-05-11T14:57:14Z support upper case of 'as' for expression
[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...
Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/664#discussion_r30046159

--- Diff: flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/PojoExample.java ---
```
@@ -26,6 +26,7 @@
 import org.apache.flink.util.Collector;
+
```
--- End diff --

Extra new line?
[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538004#comment-14538004 ] ASF GitHub Bot commented on FLINK-1525: --- Github user hsaputra commented on a diff in the pull request: https://github.com/apache/flink/pull/664#discussion_r30046159 --- Diff: flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/PojoExample.java --- @@ -26,6 +26,7 @@ import org.apache.flink.util.Collector; + --- End diff -- Extra new line?
[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100926127 Yes, something like this.
[jira] [Commented] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537998#comment-14537998 ] Vasia Kalavri commented on FLINK-1962: -- We chose Tuples for performance and convenience I guess. We haven't really seen this choice as limiting the model in any case. Is this a problem for the Scala API?
[jira] [Comment Edited] (FLINK-1990) Uppercase "AS" keyword not allowed in select expression
[ https://issues.apache.org/jira/browse/FLINK-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537997#comment-14537997 ] Cheng Hao edited comment on FLINK-1990 at 5/11/15 2:50 PM: --- What about the aggregate functions? e.g. sum / SUM ? was (Author: chhao01): What if the aggregate functions? e.g. sum / SUM ?
[jira] [Commented] (FLINK-1990) Uppercase "AS" keyword not allowed in select expression
[ https://issues.apache.org/jira/browse/FLINK-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537997#comment-14537997 ] Cheng Hao commented on FLINK-1990: -- What if the aggregate functions? e.g. sum / SUM ?
[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537984#comment-14537984 ] ASF GitHub Bot commented on FLINK-1525: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100926127 Yes, something like this.
[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537975#comment-14537975 ] ASF GitHub Bot commented on FLINK-1525: ---

Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100923321

@aljoscha: Is this the API you were thinking of?

```java
RequiredParameters required = new RequiredParameters();
// parameter with long and short variant
Option input = required.add("input").alt("i").description("Path to input file or directory");
// parameter only with long variant
required.add("output");
// parameter with type
Option parallelism = required.add("parallelism").alt("p").type(Integer.class);
// parameter with default value, specifying the type
Option spOption = required.add("sourceParallelism").alt("sp").defaultValue(12).description("Number specifying the number of parallel data source instances");

ParameterUtil parameter = ParameterUtil.fromArgs(new String[]{"-i", "someinput", "--output", "someout", "-p", "15"});
required.check(parameter);
required.printHelp();
required.checkAndPopulate(parameter);

String inputString = input.get();
int par = parallelism.getInteger();
String output = parameter.get("output");
int sourcePar = parameter.getInteger(spOption.getName());
```
[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100923321

@aljoscha: Is this the API you were thinking of?

```java
RequiredParameters required = new RequiredParameters();
// parameter with long and short variant
Option input = required.add("input").alt("i").description("Path to input file or directory");
// parameter only with long variant
required.add("output");
// parameter with type
Option parallelism = required.add("parallelism").alt("p").type(Integer.class);
// parameter with default value, specifying the type
Option spOption = required.add("sourceParallelism").alt("sp").defaultValue(12).description("Number specifying the number of parallel data source instances");

ParameterUtil parameter = ParameterUtil.fromArgs(new String[]{"-i", "someinput", "--output", "someout", "-p", "15"});
required.check(parameter);
required.printHelp();
required.checkAndPopulate(parameter);

String inputString = input.get();
int par = parallelism.getInteger();
String output = parameter.get("output");
int sourcePar = parameter.getInteger(spOption.getName());
```
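The proposed API above is only usage; to make the mechanics concrete, here is a hedged, self-contained sketch of how such `fromArgs`-style parsing could be backed by a plain map of `--key value` / `-k value` pairs. `MiniParams` and all its members are hypothetical illustration names, not the classes from the pull request:

```java
import java.util.HashMap;
import java.util.Map;

// Hedged sketch: a toy map-backed argument holder in the spirit of the
// proposed ParameterUtil. Names are hypothetical; the real PR may differ.
public class MiniParams {
    private final Map<String, String> values = new HashMap<>();

    // Parse "-k v" and "--key v" pairs, stripping the leading dashes so
    // short and long variants land under the same plain key.
    public static MiniParams fromArgs(String[] args) {
        MiniParams p = new MiniParams();
        for (int i = 0; i + 1 < args.length; i += 2) {
            String key = args[i].replaceFirst("^--?", "");
            p.values.put(key, args[i + 1]);
        }
        return p;
    }

    public String get(String key) {
        if (!values.containsKey(key)) {
            throw new IllegalArgumentException("missing required parameter: " + key);
        }
        return values.get(key);
    }

    public int getInteger(String key) {
        return Integer.parseInt(get(key));
    }

    public static void main(String[] args) {
        MiniParams p = MiniParams.fromArgs(
                new String[]{"-i", "someinput", "--output", "someout", "-p", "15"});
        System.out.println(p.get("i") + " " + p.get("output") + " " + p.getInteger("p"));
    }
}
```

A flat string map like this also shows why the later comment resists positional arguments: keys exported to a web interface, a Configuration object, or Properties all assume named entries.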
[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537958#comment-14537958 ] ASF GitHub Bot commented on FLINK-1525: ---

Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100917520

>> If this tool also supported positional arguments ...
>
> I also thought about adding those, but decided against it, because args[n] is already a way of
> accessing the arguments by their position ;)
> But I'll add it so that users can also specify default values .. and we take care of the parsing.

I'm not so sure about this anymore. Positional arguments would break a lot in the design of the `ParameterUtil`. For example the export to the web interface, the Configuration object or Properties.
[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100917520

>> If this tool also supported positional arguments ...
>
> I also thought about adding those, but decided against it, because args[n] is already a way of
> accessing the arguments by their position ;)
> But I'll add it so that users can also specify default values .. and we take care of the parsing.

I'm not so sure about this anymore. Positional arguments would break a lot in the design of the `ParameterUtil`. For example the export to the web interface, the Configuration object or Properties.
[jira] [Commented] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537938#comment-14537938 ] Aljoscha Krettek commented on FLINK-1962: - Yes, this is correct, because the Java Tuples classes look like any other classes to the Scala type analysis component. [~vkalavri] The Graph API can only work with Java Tuples? Could it not work with more generic types?
[jira] [Commented] (FLINK-1964) Rework TwitterSource to use a Properties object instead of a file path
[ https://issues.apache.org/jira/browse/FLINK-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537922#comment-14537922 ] ASF GitHub Bot commented on FLINK-1964: ---

Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/666#issuecomment-100902145

Hey, thank you for working on this & opening a pull request. I'll address your comments inline:

> put the default value for queueSize attribute in a constant (we could call it, DEFAULT_QUEUE_SIZE)
> put the default value for waitSec attribute in a constant (we could call it, DEFAULT_WAIT_SECONDS)

Good idea, please do so. Could you also make these values configurable for our users? Nobody wants to recompile Flink just for changing an integer ;)

> i see a couple of methods that seem to work in pairs, such as:
> open / close: called at the beginning of the function and at the very end of it
> run / cancel: called on each iteration to perform some work or cancel the current ongoing work
> Is my understanding correct? I am asking since I see that the TwitterSource is initializing its client connection in the open(...) method but stops it in the run(...) method. Do those methods work in pairs?

The way we call sources in Flink streaming is a bit broken; that's why we are reworking it right now. That's the pull request where we change it: https://github.com/apache/flink/pull/659
But you are right, open() and close() work in pairs. Connections we are opening in the open() method should be closed in close(). I would leave that for now as it is, because we are going to change that in the next days anyway.

> do you think it is a good idea to assign the client to null after stopping it in the closeConnection method, so it can be garbage collected as soon as possible? Also, if closeConnection() throws an exception, isRunning never changes to false.

Yes, you can set it to null.

> we should check for preconditions in the constructors and public set methods to avoid receiving null auth properties or similar cases.
> we could have a private constructor that receives all the possible parameters and use constructor chaining to avoid code duplication even in the constructors.

These are both very good ideas!

> I have created a util file to place methods for working with properties files; I put there the load method that loads a Properties object from a file path. We should place this in a more common package; I put it close to the file I have changed, but we should consider moving it to a better place.

Maybe we can combine it with this one: https://github.com/apache/flink/pull/664

> Separate the different connectors into different submodules.

There is already a JIRA filed for this: https://issues.apache.org/jira/browse/FLINK-1874
But I would suggest to wait for a few days until https://github.com/apache/flink/pull/659 is merged, otherwise there are going to be a lot of merge conflicts.

In general: Feel free to change whatever is necessary to make the code better (also the missing javadocs). Your suggestions are all very good. I see that you have a good sense for writing high-quality code!

> Rework TwitterSource to use a Properties object instead of a file path
>
> Key: FLINK-1964
> URL: https://issues.apache.org/jira/browse/FLINK-1964
> Project: Flink
> Issue Type: Bug
> Components: Streaming
> Affects Versions: 0.9
> Reporter: Robert Metzger
> Assignee: Carlos Curotto
> Priority: Minor
> Labels: starter
>
> The twitter connector is very hard to use on a cluster because it expects the
> property file to be present on all nodes.
> It would be much easier to ask the user to pass a Properties object
> immediately.
> Also, the javadoc of the class stops in the middle of the sentence.
> It was not obvious to me how the two examples TwitterStreaming and
> TwitterTopology differ. Also, there is a third TwitterStream example in the
> streaming examples.
> The documentation of the Twitter source refers to the non-existent
> TwitterLocal class.
[GitHub] flink pull request: [FLINK-1964] Rework TwitterSource to use a Pro...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/666#issuecomment-100902145

Hey, thank you for working on this & opening a pull request. I'll address your comments inline:

> put the default value for queueSize attribute in a constant (we could call it, DEFAULT_QUEUE_SIZE)
> put the default value for waitSec attribute in a constant (we could call it, DEFAULT_WAIT_SECONDS)

Good idea, please do so. Could you also make these values configurable for our users? Nobody wants to recompile Flink just for changing an integer ;)

> i see a couple of methods that seem to work in pairs, such as:
> open / close: called at the beginning of the function and at the very end of it
> run / cancel: called on each iteration to perform some work or cancel the current ongoing work
> Is my understanding correct? I am asking since I see that the TwitterSource is initializing its client connection in the open(...) method but stops it in the run(...) method. Do those methods work in pairs?

The way we call sources in Flink streaming is a bit broken; that's why we are reworking it right now. That's the pull request where we change it: https://github.com/apache/flink/pull/659
But you are right, open() and close() work in pairs. Connections we are opening in the open() method should be closed in close(). I would leave that for now as it is, because we are going to change that in the next days anyway.

> do you think it is a good idea to assign the client to null after stopping it in the closeConnection method, so it can be garbage collected as soon as possible? Also, if closeConnection() throws an exception, isRunning never changes to false.

Yes, you can set it to null.

> we should check for preconditions in the constructors and public set methods to avoid receiving null auth properties or similar cases.
> we could have a private constructor that receives all the possible parameters and use constructor chaining to avoid code duplication even in the constructors.

These are both very good ideas!

> I have created a util file to place methods for working with properties files; I put there the load method that loads a Properties object from a file path. We should place this in a more common package; I put it close to the file I have changed, but we should consider moving it to a better place.

Maybe we can combine it with this one: https://github.com/apache/flink/pull/664

> Separate the different connectors into different submodules.

There is already a JIRA filed for this: https://issues.apache.org/jira/browse/FLINK-1874
But I would suggest to wait for a few days until https://github.com/apache/flink/pull/659 is merged, otherwise there are going to be a lot of merge conflicts.

In general: Feel free to change whatever is necessary to make the code better (also the missing javadocs). Your suggestions are all very good. I see that you have a good sense for writing high-quality code!
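The "constant default plus user override" pattern agreed on above can be sketched briefly. This is a hedged illustration only: the constant names follow the reviewer's suggestion, but the property keys and values are hypothetical and not taken from the actual TwitterSource code:

```java
import java.util.Properties;

// Hedged sketch of the suggested pattern: named default constants that a
// user-supplied Properties object can override without recompiling.
// Property keys and default values here are illustrative, not Flink's.
public class TwitterSourceConfig {
    public static final int DEFAULT_QUEUE_SIZE = 10000;
    public static final int DEFAULT_WAIT_SECONDS = 5;

    final int queueSize;
    final int waitSeconds;

    public TwitterSourceConfig(Properties props) {
        // getProperty's two-argument form falls back to the default
        // whenever the user supplies no override.
        this.queueSize = Integer.parseInt(
                props.getProperty("queue.size", String.valueOf(DEFAULT_QUEUE_SIZE)));
        this.waitSeconds = Integer.parseInt(
                props.getProperty("wait.seconds", String.valueOf(DEFAULT_WAIT_SECONDS)));
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("queue.size", "500"); // user overrides one value
        TwitterSourceConfig cfg = new TwitterSourceConfig(props);
        System.out.println(cfg.queueSize + " " + cfg.waitSeconds);
    }
}
```

Passing the same Properties object that carries the auth settings keeps the source configurable from a single place, which is the thrust of FLINK-1964.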
[jira] [Updated] (FLINK-1874) Break up streaming connectors into submodules
[ https://issues.apache.org/jira/browse/FLINK-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-1874: -- Component/s: Streaming > Break up streaming connectors into submodules > - > > Key: FLINK-1874 > URL: https://issues.apache.org/jira/browse/FLINK-1874 > Project: Flink > Issue Type: Task > Components: Build System, Streaming >Affects Versions: 0.9 >Reporter: Robert Metzger > Labels: starter > > As per: > http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Break-up-streaming-connectors-into-subprojects-td5001.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1537) GSoC project: Machine learning with Apache Flink
[ https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger updated FLINK-1537: -- Component/s: Machine Learning Library

> GSoC project: Machine learning with Apache Flink
>
> Key: FLINK-1537
> URL: https://issues.apache.org/jira/browse/FLINK-1537
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Priority: Minor
> Labels: gsoc2015, java, machine_learning, scala
>
> Currently, the Flink community is setting up the infrastructure for a machine
> learning library for Flink. The goal is to provide a set of highly optimized
> ML algorithms and to offer a high-level linear algebra abstraction to easily
> do data pre- and post-processing. By defining a set of commonly used data
> structures on which the algorithms work, it will be possible to define complex
> processing pipelines.
> The Mahout DSL constitutes a good fit to be used as the linear algebra
> language in Flink. It has to be evaluated which means have to be provided to
> allow an easy transition between the high-level abstraction and the optimized
> algorithms.
> The machine learning library offers multiple starting points for a GSoC
> project. Amongst others, the following projects are conceivable:
> * Extension of Flink's machine learning library by additional ML algorithms
> ** Stochastic gradient descent
> ** Distributed dual coordinate ascent
> ** SVM
> ** Gaussian mixture EM
> ** Decision trees
> ** ...
> * Integration of Flink with the Mahout DSL to support a high-level linear
> algebra abstraction
> * Integration of H2O with Flink to benefit from H2O's sophisticated machine
> learning algorithms
> * Implementation of a parameter-server-like distributed global state storage
> facility for Flink. This also includes the extension of Flink to support
> asynchronous iterations and update messages.
> Own ideas for a possible contribution in the field of the machine learning
> library are highly welcome.
[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100897320 Thanks for clarifying. I agree that the parser in Apache Commons is not the nicest...
[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537909#comment-14537909 ] ASF GitHub Bot commented on FLINK-1525: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100897320 Thanks for clarifying. I agree that the parser in Apache Commons is not the nicest...
> Provide utils to pass -D parameters to UDFs
>
> Key: FLINK-1525
> URL: https://issues.apache.org/jira/browse/FLINK-1525
> Project: Flink
> Issue Type: Improvement
> Components: flink-contrib
> Reporter: Robert Metzger
> Labels: starter
>
> Hadoop users are used to setting job configuration through "-D" on the command line.
> Right now, Flink users have to manually parse command line arguments and pass them to their methods.
> It would be nice to provide a standard args parser which takes care of such things.
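The kind of "-D" utility discussed in FLINK-1525 can be sketched in a few lines. This is a hypothetical illustration of the idea only — the class and method names are made up and are not the actual code from the pull request:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of a "-Dkey=value" argument parser, in the spirit of
// FLINK-1525. Illustrative names; not the Flink API.
public class SimpleArgsParser {
    public static Map<String, String> parse(String[] args) {
        Map<String, String> params = new HashMap<>();
        for (String arg : args) {
            if (arg.startsWith("-D")) {
                String kv = arg.substring(2);      // drop the "-D" prefix
                int eq = kv.indexOf('=');
                if (eq > 0) {
                    params.put(kv.substring(0, eq), kv.substring(eq + 1));
                }
            }
        }
        return params;
    }
}
```

A UDF could then read `params.get("input")` instead of hand-parsing `args`, which is the convenience Hadoop users are accustomed to.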
[jira] [Commented] (FLINK-1103) Update Streaming examples to become self-contained
[ https://issues.apache.org/jira/browse/FLINK-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537896#comment-14537896 ] Robert Metzger commented on FLINK-1103: --- The data has been replaced by something we can use from a legal perspective in http://git-wip-us.apache.org/repos/asf/flink/commit/8ea840e2
> Update Streaming examples to become self-contained
>
> Key: FLINK-1103
> URL: https://issues.apache.org/jira/browse/FLINK-1103
> Project: Flink
> Issue Type: Improvement
> Components: Streaming
> Affects Versions: 0.7.0-incubating
> Reporter: Márton Balassi
> Assignee: Márton Balassi
>
> Streaming examples do not follow the standard set by the recent examples refactor of the batch API.
> TestDataUtil should be removed and Object[][] used to contain the example data.
> Comments are also lacking in comparison with the batch counterpart.
[jira] [Commented] (FLINK-1999) TF-IDF transformer
[ https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537888#comment-14537888 ] Felix Neutatz commented on FLINK-1999: -- If I understand it correctly, it is like this:
Document1: Hello world
Document2: This is awesome
val data: Seq[Seq[String]] = Seq(
  "Hello world".split(" ").toSeq,
  "This is awesome".split(" ").toSeq
)
val dataSet: DataSet[Seq[String]] = env.fromCollection(data)
> TF-IDF transformer
>
> Key: FLINK-1999
> URL: https://issues.apache.org/jira/browse/FLINK-1999
> Project: Flink
> Issue Type: New Feature
> Components: Machine Learning Library
> Reporter: Ronny Bräunlich
> Assignee: Alexander Alexandrov
> Priority: Minor
> Labels: ML
>
> Hello everybody,
> we are a group of three students from TU Berlin (I guess we're not the first group creating an issue) and we want to/have to implement a TF-IDF transformer for Flink.
> Our lecturer Alexander told us that we could get some guidance here and that you could point us to an old version of a similar transformer.
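For readers new to the thread, the computation being designed can be sketched without Flink at all. The following is a minimal, non-distributed TF-IDF over tokenized documents, mirroring the `DataSet[Seq[String]]` input discussed above; it is an illustration only, not the transformer under development:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Minimal TF-IDF sketch: tf(t, d) * log(|corpus| / df(t)).
// Plain Java, illustrative only; not the Flink ML transformer.
public class TfIdfSketch {
    public static Map<String, Double> tfIdf(List<String> doc, List<List<String>> corpus) {
        Map<String, Double> tf = new HashMap<>();
        for (String term : doc) {
            tf.merge(term, 1.0 / doc.size(), Double::sum);  // term frequency in doc
        }
        Map<String, Double> result = new HashMap<>();
        for (Map.Entry<String, Double> e : tf.entrySet()) {
            long docsWithTerm = corpus.stream()
                    .filter(d -> d.contains(e.getKey())).count();
            // inverse document frequency; docsWithTerm >= 1 when doc is in corpus
            double idf = Math.log((double) corpus.size() / docsWithTerm);
            result.put(e.getKey(), e.getValue() * idf);
        }
        return result;
    }
}
```

Note that a term occurring in every document gets idf = log(1) = 0 — which is exactly the single-document concern raised later in this thread.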
[jira] [Commented] (FLINK-1807) Stochastic gradient descent optimizer for ML library
[ https://issues.apache.org/jira/browse/FLINK-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537886#comment-14537886 ] Theodore Vasiloudis commented on FLINK-1807: [~till.rohrmann] You think we can close this, or should we leave it open until we have sampling in place?
> Stochastic gradient descent optimizer for ML library
>
> Key: FLINK-1807
> URL: https://issues.apache.org/jira/browse/FLINK-1807
> Project: Flink
> Issue Type: Improvement
> Components: Machine Learning Library
> Reporter: Till Rohrmann
> Assignee: Theodore Vasiloudis
> Labels: ML
>
> Stochastic gradient descent (SGD) is a widely used optimization technique in different ML algorithms. Thus, it would be helpful to provide a generalized SGD implementation which can be instantiated with the respective gradient computation. Such a building block would make the development of future algorithms easier.
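The "generalized SGD instantiated with a gradient computation" idea from the issue can be sketched as follows. This is a plain-Java, single-machine illustration under assumed names, not the distributed Flink ML implementation:

```java
import java.util.Random;
import java.util.function.BiFunction;

// Sketch of generalized SGD: the optimizer loop is fixed, while the
// gradient computation is pluggable, as FLINK-1807 proposes.
// Illustrative only; not the Flink ML API.
public class SgdSketch {
    // gradient: (weights, exampleIndex) -> gradient vector for that example
    public static double[] run(double[] w, int numExamples, double lr, int steps,
                               BiFunction<double[], Integer, double[]> gradient) {
        Random rnd = new Random(42);
        for (int s = 0; s < steps; s++) {
            int i = rnd.nextInt(numExamples);   // sample one example
            double[] g = gradient.apply(w, i);
            for (int d = 0; d < w.length; d++) {
                w[d] -= lr * g[d];              // stochastic update
            }
        }
        return w;
    }
}
```

Instantiated with the squared-loss gradient 2 * (w·x - y) * x, the same loop performs linear regression; swapping in another gradient yields a different learner, which is the point of the building block.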
[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537880#comment-14537880 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100891653 Thank you. I didn't reuse the parser there because there is an ongoing JIRA to unify all the command line parsers inside Flink. It seems that we are using at least two different libraries, and the consensus in the JIRA seems to be to use a third one to solve the problem. But I didn't want to add more confusion to the topic than we already have.
[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100891653 Thank you. I didn't reuse the parser there because there is an ongoing JIRA to unify all the command line parsers inside Flink. It seems that we are using at least two different libraries, and the consensus in the JIRA seems to be to use a third one to solve the problem. But I didn't want to add more confusion to the topic than we already have.
[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100891261 Very helpful utility. I think it's worth adapting all the examples if we merge this. Removes a lot of unnecessary code and makes the examples more readable. May I ask why you didn't reuse the Parser in `org.apache.commons.cli`? Too much overhead?
[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537878#comment-14537878 ] ASF GitHub Bot commented on FLINK-1525: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100891261 Very helpful utility. I think it's worth adapting all the examples if we merge this. Removes a lot of unnecessary code and makes the examples more readable. May I ask why you didn't reuse the Parser in `org.apache.commons.cli`? Too much overhead?
[jira] [Updated] (FLINK-1959) Accumulators BROKEN after Partitioning
[ https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Alexandrov updated FLINK-1959: Affects Version/s: (was: 0.8.1) master Fix Version/s: (was: 0.8.1) master
> Accumulators BROKEN after Partitioning
>
> Key: FLINK-1959
> URL: https://issues.apache.org/jira/browse/FLINK-1959
> Project: Flink
> Issue Type: Bug
> Components: Examples
> Affects Versions: master
> Reporter: mustafa elbehery
> Priority: Critical
> Fix For: master
>
> While running the Accumulator example in https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java, I tried to alter the data flow with "partitionByHash" before applying "filter", and the resulting accumulator was NULL.
> By debugging, I could see the accumulator in the runtime map. However, when retrieving the accumulator from the JobExecutionResult object, it was NULL.
> The line that caused the problem is "file.partitionByHash(1).filter(new EmptyFieldFilter())" instead of "file.filter(new EmptyFieldFilter())".
[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100880185 I like it. But I think it needs some functionality for verifying parameters: letting the user specify parameters that always need to be present, along with a description of each parameter, similar to how other tools print a "usage" message when you don't give correct arguments.
[jira] [Commented] (FLINK-1979) Implement Loss Functions
[ https://issues.apache.org/jira/browse/FLINK-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537855#comment-14537855 ] ASF GitHub Bot commented on FLINK-1979: --- Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/656#issuecomment-100884364 Thank you, Johannes. The optimization code has been merged to master now, so could you rebase your branch onto the latest master so we can look at the changes in an isolated way? You can take a look at the [How to contribute](http://flink.apache.org/how-to-contribute.html#contributing-code-&-documentation) guide on how to do this. The merges you currently have make it hard to review the code. Also, please make sure all your classes have docstrings; you can take the docstring for SquaredLoss as an example (i.e. one sentence is usually enough). Documentation is always welcome, of course, so if you want to add some more details to the loss functions section of the ML documentation (docs/libs/ml/optimization.md), feel free to do so in this PR. Let me know if you run into any problems.
> Implement Loss Functions
>
> Key: FLINK-1979
> URL: https://issues.apache.org/jira/browse/FLINK-1979
> Project: Flink
> Issue Type: Improvement
> Components: Machine Learning Library
> Reporter: Johannes Günther
> Assignee: Johannes Günther
> Priority: Minor
> Labels: ML
>
> For convex optimization problems, optimizer methods like SGD rely on a pluggable implementation of a loss function and its first derivative.
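The pluggable loss-function contract the issue describes — a loss value plus its first derivative with respect to the prediction — can be sketched like this. Interface and class names here are illustrative, not the actual Flink ML classes from the PR:

```java
// Sketch of the pluggable loss described in FLINK-1979: each loss exposes
// its value and its first derivative w.r.t. the prediction, so an optimizer
// like SGD can be instantiated with any of them. Illustrative names only.
interface LossFunction {
    double loss(double prediction, double label);
    double derivative(double prediction, double label);
}

// Squared loss: L(p, y) = (p - y)^2 / 2, hence dL/dp = p - y.
class SquaredLoss implements LossFunction {
    @Override
    public double loss(double p, double y) {
        return 0.5 * (p - y) * (p - y);
    }

    @Override
    public double derivative(double p, double y) {
        return p - y;
    }
}
```

An SGD optimizer would then call only `derivative(...)` during updates, while `loss(...)` serves for convergence monitoring — which is why both belong to the same interface.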
[GitHub] flink pull request: [FLINK-1979] Lossfunctions
Github user thvasilo commented on the pull request: https://github.com/apache/flink/pull/656#issuecomment-100884364 Thank you, Johannes. The optimization code has been merged to master now, so could you rebase your branch onto the latest master so we can look at the changes in an isolated way? You can take a look at the [How to contribute](http://flink.apache.org/how-to-contribute.html#contributing-code-&-documentation) guide on how to do this. The merges you currently have make it hard to review the code. Also, please make sure all your classes have docstrings; you can take the docstring for SquaredLoss as an example (i.e. one sentence is usually enough). Documentation is always welcome, of course, so if you want to add some more details to the loss functions section of the ML documentation (docs/libs/ml/optimization.md), feel free to do so in this PR. Let me know if you run into any problems.
[jira] [Commented] (FLINK-1999) TF-IDF transformer
[ https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537853#comment-14537853 ] Alexander Alexandrov commented on FLINK-1999: - What does a Seq[String] represent?
[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100882340 I agree, however it should be optional. I don't like these tools where you spend a lot of time registering / specifying arguments. People want to analyze their data, not configure a huge parameter parsing framework ;) But I'm going to look into this and see how much work it would be to implement it. If it blocks me from getting this PR merged soon, I'll file a follow-up JIRA.
[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537847#comment-14537847 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100882340 I agree, however it should be optional. I don't like these tools where you spend a lot of time registering / specifying arguments. People want to analyze their data, not configure a huge parameter parsing framework ;) But I'm going to look into this and see how much work it would be to implement it. If it blocks me from getting this PR merged soon, I'll file a follow-up JIRA.
[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537842#comment-14537842 ] ASF GitHub Bot commented on FLINK-1525: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100880185 I like it. But I think it needs some functionality for verifying parameters: letting the user specify parameters that always need to be present, along with a description of each parameter, similar to how other tools print a "usage" message when you don't give correct arguments.
[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning
[ https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537845#comment-14537845 ] Alexander Alexandrov commented on FLINK-1959: - I think that some communication / traversal chain gets broken in the PartitionByHash node. You can either (1) try to dig through the code and see where this happens, or (2) use an alternative to the accumulator until the issue is resolved (e.g. write the information to a pre-defined HDFS path).
[jira] [Comment Edited] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537797#comment-14537797 ] PJ Van Aeken edited comment on FLINK-1962 at 5/11/15 11:30 AM: --- I was wrong before. The TypeErasure fix has one more problem. Method "createTypeInformation" creates a PojoTypeInfo for Vertex, rather than a TupleTypeInfo, which is what the Java API expects. I think this is because the Scala type extraction does not recognize Java's Tuple2 as a tuple... was (Author: vanaepi): I was wrong before. The TypeErasure fix has one more problem. Method "createTypeInformation" creates a PojoTypeInfo for Vertex, rather than a TupleTypeInfo which is what the Java API expects.
> Add Gelly Scala API
>
> Key: FLINK-1962
> URL: https://issues.apache.org/jira/browse/FLINK-1962
> Project: Flink
> Issue Type: Task
> Components: Gelly, Scala API
> Affects Versions: 0.9
> Reporter: Vasia Kalavri
[jira] [Commented] (FLINK-1999) TF-IDF transformer
[ https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537822#comment-14537822 ] Ronny Bräunlich commented on FLINK-1999: Are you sure that the input type should be DataSet[Seq[String]]? That seems to me like we would calculate the idf always for one document, which would be log(1/1) -> 0. Or is one element of the sequence supposed to be one document? If yes, would it be wise to always load the full document into memory, or is the DataSet smart enough to read the file stream-wise?
[jira] [Comment Edited] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537797#comment-14537797 ] PJ Van Aeken edited comment on FLINK-1962 at 5/11/15 10:33 AM: --- I was wrong before. The TypeErasure fix has one more problem. Method "createTypeInformation" creates a PojoTypeInfo for Vertex, rather than a TupleTypeInfo, which is what the Java API expects. was (Author: vanaepi): The combination of both pull requests did the trick. Perhaps someone should merge the TypeErasure fix into master as well. It seems like an important fix for future combined Java/Scala APIs.
[jira] [Commented] (FLINK-1962) Add Gelly Scala API
[ https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537797#comment-14537797 ] PJ Van Aeken commented on FLINK-1962: - The combination of both pull requests did the trick. Perhaps someone should merge the TypeErasure fix into master as well. It seems like an important fix for future combined Java/Scala APIs.
[jira] [Updated] (FLINK-1636) Improve error handling when partitions not found
[ https://issues.apache.org/jira/browse/FLINK-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ufuk Celebi updated FLINK-1636: --- Priority: Major (was: Minor)
> Improve error handling when partitions not found
>
> Key: FLINK-1636
> URL: https://issues.apache.org/jira/browse/FLINK-1636
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Runtime
> Reporter: Ufuk Celebi
> Assignee: Ufuk Celebi
>
> When a result partition is released concurrently with a remote partition request, the request might come in late and result in an exception at the receiving task saying:
> {code}
> 16:04:22,499 INFO org.apache.flink.runtime.taskmanager.Task - CHAIN Partition -> Map (Map at testRestartMultipleTimes(SimpleRecoveryITCase.java:200)) (1/4) switched to FAILED : java.io.IOException: org.apache.flink.runtime.io.network.partition.queue.IllegalQueueIteratorRequestException at remote input channel: Intermediate result partition has already been released.]
> at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.checkIoError(RemoteInputChannel.java:223)
> at org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.getNextBuffer(RemoteInputChannel.java:103)
> at org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:310)
> at org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:75)
> at org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
> at org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:59)
> at org.apache.flink.runtime.operators.NoOpDriver.run(NoOpDriver.java:91)
> at org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
> at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
> at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:205)
> at java.lang.Thread.run(Thread.java:745)
> {code}
[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs
[ https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537696#comment-14537696 ] ASF GitHub Bot commented on FLINK-1525: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100805130 In Hadoop, the `UserConfig` would probably be a `Configuration` object with key/value pairs. In Flink, we are trying to get rid of these untyped maps. Instead, I would recommend users to use a simple Java class, like

```java
public static class MyConfig extends UserConfig {
    public long someLongValue;
    public int someInt;

    // this is optional
    public Map toMap() {
        return null;
    }
}
```

It can be used in a similar way to a Configuration object, but the compiler is able to check the types. The `ParameterUtil` implements the UserConfig interface to expose the configuration values through the Flink program & in the web interface.
[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/664#issuecomment-100805130 In Hadoop, the `UserConfig` would probably be a `Configuration` object with key/value pairs. In Flink, we are trying to get rid of these untyped maps. Instead, I would recommend users to use a simple Java class, like

```java
public static class MyConfig extends UserConfig {
    public long someLongValue;
    public int someInt;

    // this is optional
    public Map toMap() {
        return null;
    }
}
```

It can be used in a similar way to a Configuration object, but the compiler is able to check the types. The `ParameterUtil` implements the UserConfig interface to expose the configuration values through the Flink program & in the web interface.
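The typed-versus-untyped point made above can be demonstrated in isolation. The following standalone sketch (class names are made up for illustration; `UserConfig` is omitted) contrasts a typed config field with the runtime parsing an untyped map would require:

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of why a typed config class beats an untyped key/value map:
// field types are checked at compile time, while map values must be parsed
// (and can fail) at runtime. Hypothetical names; not the Flink API.
public class TypedConfigDemo {
    static class MyConfig {
        long someLongValue;
        int someInt;
    }

    static long total(MyConfig cfg) {
        // typed access: no parsing, no runtime cast
        return cfg.someLongValue + cfg.someInt;
    }

    static long totalFromMap(Map<String, String> map) {
        // untyped access: a malformed value throws NumberFormatException here
        return Long.parseLong(map.get("someLongValue"))
                + Integer.parseInt(map.get("someInt"));
    }

    public static void main(String[] args) {
        MyConfig cfg = new MyConfig();
        cfg.someLongValue = 100L;   // assigning "abc" here would not compile
        cfg.someInt = 4;

        Map<String, String> map = new HashMap<>();
        map.put("someLongValue", "100");
        map.put("someInt", "4");    // putting "abc" here compiles, fails later

        System.out.println(total(cfg) == totalFromMap(map));
    }
}
```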
[jira] [Commented] (FLINK-1987) Broken links in the add_operator section of the documentation
[ https://issues.apache.org/jira/browse/FLINK-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537695#comment-14537695 ] ASF GitHub Bot commented on FLINK-1987: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/662#issuecomment-100805034 +1 links work again
> Broken links in the add_operator section of the documentation
>
> Key: FLINK-1987
> URL: https://issues.apache.org/jira/browse/FLINK-1987
> Project: Flink
> Issue Type: Bug
> Components: docs
> Affects Versions: 0.9
> Reporter: Andra Lungu
> Assignee: Andra Lungu
> Priority: Trivial
[GitHub] flink pull request: [FLINK-1987][docs] Fixed broken links
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/662#issuecomment-100805034 +1 links work again