[GitHub] flink pull request: [FLINK-2001] [ml] Fix DistanceMetric serializa...

2015-05-11 Thread chiwanpark
GitHub user chiwanpark opened a pull request:

https://github.com/apache/flink/pull/668

[FLINK-2001] [ml] Fix DistanceMetric serialization error

* `DistanceMetric` extends Serializable
* Add simple serialization test
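
For context, a minimal sketch of what the change and the test could look like
(illustrative stand-ins, not the PR's actual code; Flink ML's real trait and
metric implementations differ):

{code}
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, ObjectInputStream, ObjectOutputStream}

// The essence of the fix: the metric trait extends Serializable, so closures
// that capture a metric pass Flink's ClosureCleaner.ensureSerializable check.
trait DistanceMetric extends Serializable {
  def distance(a: Array[Double], b: Array[Double]): Double
}

class EuclideanDistanceMetric extends DistanceMetric {
  override def distance(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)
}

// A simple round-trip serialization check, analogous to the test the PR adds.
object SerializationCheck extends App {
  val bytes = new ByteArrayOutputStream()
  new ObjectOutputStream(bytes).writeObject(new EuclideanDistanceMetric)
  val copy = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray))
    .readObject().asInstanceOf[DistanceMetric]
  assert(copy.distance(Array(0.0, 0.0), Array(3.0, 4.0)) == 5.0)
}
{code}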

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chiwanpark/flink FLINK-2001

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/668.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #668


commit 53726c12c8af09dfbe17903df3efcc1308da0540
Author: Chiwan Park 
Date:   2015-05-12T06:11:25Z

[FLINK-2001] [ml] Fix DistanceMetric serialization error




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-2001) DistanceMetric cannot be serialized

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539325#comment-14539325
 ] 

ASF GitHub Bot commented on FLINK-2001:
---

GitHub user chiwanpark opened a pull request:

https://github.com/apache/flink/pull/668

[FLINK-2001] [ml] Fix DistanceMetric serialization error

* `DistanceMetric` extends Serializable
* Add simple serialization test

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chiwanpark/flink FLINK-2001

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/668.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #668


commit 53726c12c8af09dfbe17903df3efcc1308da0540
Author: Chiwan Park 
Date:   2015-05-12T06:11:25Z

[FLINK-2001] [ml] Fix DistanceMetric serialization error




> DistanceMetric cannot be serialized
> ---
>
> Key: FLINK-2001
> URL: https://issues.apache.org/jira/browse/FLINK-2001
> Project: Flink
>  Issue Type: Bug
>  Components: Machine Learning Library
>Reporter: Chiwan Park
>Assignee: Chiwan Park
>Priority: Critical
>  Labels: ML
>
> Because the DistanceMeasure trait doesn't extend Serializable, a task using
> DistanceMeasure raises the following exception.
> {code}
> Task not serializable
> org.apache.flink.api.common.InvalidProgramException: Task not serializable
>   at org.apache.flink.api.scala.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:179)
>   at org.apache.flink.api.scala.ClosureCleaner$.clean(ClosureCleaner.scala:171)
>   at org.apache.flink.api.scala.DataSet.clean(DataSet.scala:123)
>   at org.apache.flink.api.scala.DataSet$$anon$10.<init>(DataSet.scala:691)
>   at org.apache.flink.api.scala.DataSet.combineGroup(DataSet.scala:690)
>   at org.apache.flink.ml.classification.KNNModel.transform(KNN.scala:78)
>   at org.apache.flink.ml.classification.KNNITSuite$$anonfun$1.apply$mcV$sp(KNNSuite.scala:25)
>   at org.apache.flink.ml.classification.KNNITSuite$$anonfun$1.apply(KNNSuite.scala:12)
>   at org.apache.flink.ml.classification.KNNITSuite$$anonfun$1.apply(KNNSuite.scala:12)
>   at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FlatSpecLike$$anon$1.apply(FlatSpecLike.scala:1647)
>   at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
>   at org.scalatest.FlatSpec.withFixture(FlatSpec.scala:1683)
>   at org.scalatest.FlatSpecLike$class.invokeWithFixture$1(FlatSpecLike.scala:1644)
>   at org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
>   at org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FlatSpecLike$class.runTest(FlatSpecLike.scala:1656)
>   at org.apache.flink.ml.classification.KNNITSuite.org$scalatest$BeforeAndAfter$$super$runTest(KNNSuite.scala:9)
>   at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
>   at org.apache.flink.ml.classification.KNNITSuite.runTest(KNNSuite.scala:9)
>   at org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
>   at org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
>   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:390)
>   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:427)
>   at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FlatSpecLike$class.runTests(FlatSpecLike.scala:1714)
>   at org.scala

[GitHub] flink pull request: [hotfix][scala] Let type analysis work on some...

2015-05-11 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/660#issuecomment-101145293
  
It does not treat Java classes the same way. For some strange reason, the Scala
type analysis in Scala 2.10 sometimes cannot "see" the fields of classes
defined in Java.

The Scala type analysis treats Java Tuples as what they are: POJOs. We could
maybe add special-case handling to make it treat Java Tuples as tuple types.




[jira] [Created] (FLINK-2001) DistanceMetric cannot be serialized

2015-05-11 Thread Chiwan Park (JIRA)
Chiwan Park created FLINK-2001:
--

 Summary: DistanceMetric cannot be serialized
 Key: FLINK-2001
 URL: https://issues.apache.org/jira/browse/FLINK-2001
 Project: Flink
  Issue Type: Bug
  Components: Machine Learning Library
Reporter: Chiwan Park
Assignee: Chiwan Park
Priority: Critical


Because the DistanceMeasure trait doesn't extend Serializable, a task using
DistanceMeasure raises the following exception.

{code}
Task not serializable
org.apache.flink.api.common.InvalidProgramException: Task not serializable
  at org.apache.flink.api.scala.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:179)
  at org.apache.flink.api.scala.ClosureCleaner$.clean(ClosureCleaner.scala:171)
  at org.apache.flink.api.scala.DataSet.clean(DataSet.scala:123)
  at org.apache.flink.api.scala.DataSet$$anon$10.<init>(DataSet.scala:691)
  at org.apache.flink.api.scala.DataSet.combineGroup(DataSet.scala:690)
  at org.apache.flink.ml.classification.KNNModel.transform(KNN.scala:78)
  at org.apache.flink.ml.classification.KNNITSuite$$anonfun$1.apply$mcV$sp(KNNSuite.scala:25)
  at org.apache.flink.ml.classification.KNNITSuite$$anonfun$1.apply(KNNSuite.scala:12)
  at org.apache.flink.ml.classification.KNNITSuite$$anonfun$1.apply(KNNSuite.scala:12)
  at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
  at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
  at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
  at org.scalatest.Transformer.apply(Transformer.scala:22)
  at org.scalatest.Transformer.apply(Transformer.scala:20)
  at org.scalatest.FlatSpecLike$$anon$1.apply(FlatSpecLike.scala:1647)
  at org.scalatest.Suite$class.withFixture(Suite.scala:1122)
  at org.scalatest.FlatSpec.withFixture(FlatSpec.scala:1683)
  at org.scalatest.FlatSpecLike$class.invokeWithFixture$1(FlatSpecLike.scala:1644)
  at org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
  at org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1656)
  at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
  at org.scalatest.FlatSpecLike$class.runTest(FlatSpecLike.scala:1656)
  at org.apache.flink.ml.classification.KNNITSuite.org$scalatest$BeforeAndAfter$$super$runTest(KNNSuite.scala:9)
  at org.scalatest.BeforeAndAfter$class.runTest(BeforeAndAfter.scala:200)
  at org.apache.flink.ml.classification.KNNITSuite.runTest(KNNSuite.scala:9)
  at org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
  at org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1714)
  at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
  at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
  at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:390)
  at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:427)
  at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
  at scala.collection.immutable.List.foreach(List.scala:318)
  at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
  at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
  at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
  at org.scalatest.FlatSpecLike$class.runTests(FlatSpecLike.scala:1714)
  at org.scalatest.FlatSpec.runTests(FlatSpec.scala:1683)
  at org.scalatest.Suite$class.run(Suite.scala:1424)
  at org.scalatest.FlatSpec.org$scalatest$FlatSpecLike$$super$run(FlatSpec.scala:1683)
  at org.scalatest.FlatSpecLike$$anonfun$run$1.apply(FlatSpecLike.scala:1760)
  at org.scalatest.FlatSpecLike$$anonfun$run$1.apply(FlatSpecLike.scala:1760)
  at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
  at org.scalatest.FlatSpecLike$class.run(FlatSpecLike.scala:1760)
  at org.apache.flink.ml.classification.KNNITSuite.org$scalatest$BeforeAndAfter$$super$run(KNNSuite.scala:9)
  at org.scalatest.BeforeAndAfter$class.run(BeforeAndAfter.scala:241)
  at org.apache.flink.ml.classification.KNNITSuite.run(KNNSuite.scala:9)
  at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
  at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2563)
  at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$3.apply(Runner.scala:2557)
  at scala.collection.immutable.List.foreach(List.scala:31

[jira] [Commented] (FLINK-1980) Allowing users to decorate input streams

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538747#comment-14538747
 ] 

ASF GitHub Bot commented on FLINK-1980:
---

Github user sekruse commented on the pull request:

https://github.com/apache/flink/pull/658#issuecomment-101064479
  
I wanted to integrate the gz support with the decorateStream method, so I
was waiting for this PR to be merged.
I can of course skip the seek, but I thought that forward seeking would be
useful, as it enables splittable file formats.
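
For readers following along: forward-only seeking can be emulated on a plain
InputStream via skip(). A small hedged sketch of the mechanism under
discussion (illustrative, not Flink's actual code):

{code}
import java.io.InputStream

object ForwardSeek {
  // skip() is only a hint and may skip fewer bytes than requested,
  // hence the loop with a single-byte read() fallback.
  def seekForward(in: InputStream, target: Long): Unit = {
    var remaining = target
    while (remaining > 0) {
      val skipped = in.skip(remaining)
      if (skipped > 0) remaining -= skipped
      else if (in.read() >= 0) remaining -= 1
      else return // hit end of stream before the target position
    }
  }
}
{code}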


> Allowing users to decorate input streams
> 
>
> Key: FLINK-1980
> URL: https://issues.apache.org/jira/browse/FLINK-1980
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Sebastian Kruse
>Assignee: Sebastian Kruse
>Priority: Minor
>
> Users may have to do unforeseeable operations on file input streams before 
> they can be used by the actual input format logic, e.g., exotic compression 
> formats or preambles such as byte order marks. Therefore, it would be useful 
> to provide the user with a hook to decorate input streams in order to handle 
> such issues.
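
A minimal sketch of what such a decoration hook could look like (hypothetical
API with an assumed decorate callback; the actual PR's interface may differ):

{code}
import java.io.{BufferedInputStream, InputStream}
import java.util.zip.GZIPInputStream

trait StreamDecorator {
  def decorate(raw: InputStream, path: String): InputStream
}

// Example decorator covering two use cases named in the issue: transparently
// un-gzip "*.gz" files and skip a UTF-8 byte order mark (EF BB BF).
object GzipAndBomDecorator extends StreamDecorator {
  override def decorate(raw: InputStream, path: String): InputStream = {
    val stream = if (path.endsWith(".gz")) new GZIPInputStream(raw) else raw
    val buffered = new BufferedInputStream(stream)
    buffered.mark(3)
    val bom = new Array[Byte](3)
    val read = buffered.read(bom)
    val isBom = read == 3 &&
      bom(0) == 0xEF.toByte && bom(1) == 0xBB.toByte && bom(2) == 0xBF.toByte
    if (!isBom) buffered.reset() // rewind: these bytes belong to the data
    buffered
  }
}
{code}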



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1980] allowing users to decorate input ...

2015-05-11 Thread sekruse
Github user sekruse commented on the pull request:

https://github.com/apache/flink/pull/658#issuecomment-101064479
  
I wanted to integrate the gz support with the decorateStream method, so I
was waiting for this PR to be merged.
I can of course skip the seek, but I thought that forward seeking would be
useful, as it enables splittable file formats.




[jira] [Commented] (FLINK-1523) Vertex-centric iteration extensions

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538662#comment-14538662
 ] 

ASF GitHub Bot commented on FLINK-1523:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/537#issuecomment-101048928
  
Hey @andralungu,

I built on your changes and implemented what I described above. I have
pushed it to [this branch](https://github.com/vasia/flink/tree/vc-extentions).
I had to manually resolve conflicts caused by #657, so I guess the easiest
way to let people review would be for me to open a fresh PR.
Since you are the one most familiar with these features, could you please
take a look and let me know what you think?

Thanks a lot!



> Vertex-centric iteration extensions
> ---
>
> Key: FLINK-1523
> URL: https://issues.apache.org/jira/browse/FLINK-1523
> Project: Flink
>  Issue Type: Improvement
>  Components: Gelly
>Reporter: Vasia Kalavri
>Assignee: Andra Lungu
>
> We would like to make the following extensions to the vertex-centric 
> iterations of Gelly:
> - allow vertices to access their in/out degrees and the total number of 
> vertices of the graph, inside the iteration.
> - allow choosing the neighborhood type (in/out/all) over which to run the 
> vertex-centric iteration. Now, the model uses the updates of the in-neighbors 
> to calculate state and send messages to out-neighbors. We could add a 
> parameter with value "in/out/all" to the {{VertexUpdateFunction}} and 
> {{MessagingFunction}}, that would indicate the type of neighborhood.
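
To make the proposal concrete, a hypothetical sketch of the extended
interfaces (names like EdgeDirection and VertexContext are illustrative,
not Gelly's actual API):

{code}
sealed trait EdgeDirection
case object In extends EdgeDirection
case object Out extends EdgeDirection
case object All extends EdgeDirection

// Carries the degrees and total vertex count the issue asks to expose.
final case class VertexContext(inDegree: Long, outDegree: Long, numVertices: Long)

abstract class VertexUpdate[K, VV, M] {
  def updateVertex(id: K, value: VV, messages: Iterable[M], ctx: VertexContext): VV
}

// The neighborhood type would be a parameter of the iteration, e.g.:
// graph.runVertexCentricIteration(update, messaging, maxIterations, direction = All)
{code}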





[GitHub] flink pull request: [FLINK-1523][gelly] Vertex centric iteration e...

2015-05-11 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/537#issuecomment-101048928
  
Hey @andralungu,

I built on your changes and implemented what I described above. I have
pushed it to [this branch](https://github.com/vasia/flink/tree/vc-extentions).
I had to manually resolve conflicts caused by #657, so I guess the easiest
way to let people review would be for me to open a fresh PR.
Since you are the one most familiar with these features, could you please
take a look and let me know what you think?

Thanks a lot!





[jira] [Commented] (FLINK-1962) Add Gelly Scala API

2015-05-11 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538622#comment-14538622
 ] 

Vasia Kalavri commented on FLINK-1962:
--

It would be great if we could somehow overcome this through the Scala type
analysis system. Most of the Gelly methods are currently built on the
assumption that the Vertex and Edge types are Tuples, and it wouldn't be trivial
to change them.

> Add Gelly Scala API
> ---
>
> Key: FLINK-1962
> URL: https://issues.apache.org/jira/browse/FLINK-1962
> Project: Flink
>  Issue Type: Task
>  Components: Gelly, Scala API
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>






[jira] [Closed] (FLINK-1969) Remove old profile code

2015-05-11 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-1969.
---

> Remove old profile code
> ---
>
> Key: FLINK-1969
> URL: https://issues.apache.org/jira/browse/FLINK-1969
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.9
>
>
> The old profiler code is not instantiated any more and is basically dead.
> It has in parts already been replaced by the metrics library.
> The classes still get in the way during refactoring, which is why I suggest
> removing them.





[jira] [Resolved] (FLINK-1969) Remove old profile code

2015-05-11 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1969.
-
Resolution: Done

Done in fbea2da26d01c470687a5ad217a5fd6ad1de89e4

> Remove old profile code
> ---
>
> Key: FLINK-1969
> URL: https://issues.apache.org/jira/browse/FLINK-1969
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.9
>
>
> The old profiler code is not instantiated any more and is basically dead.
> It has in parts already been replaced by the metrics library.
> The classes still get in the way during refactoring, which is why I suggest
> removing them.





[jira] [Resolved] (FLINK-1672) Refactor task registration/unregistration

2015-05-11 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1672.
-
   Resolution: Fixed
Fix Version/s: 0.9
 Assignee: Stephan Ewen

Implemented in 8e61301452218e6d279b013beb7bbd02a7c2e3f9

> Refactor task registration/unregistration
> -
>
> Key: FLINK-1672
> URL: https://issues.apache.org/jira/browse/FLINK-1672
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Runtime
>Reporter: Ufuk Celebi
>Assignee: Stephan Ewen
> Fix For: 0.9
>
>
> h4. Current control flow for task registrations
> # JM submits a TaskDeploymentDescriptor to a TM
> ## TM registers the required JAR files with the LibraryCacheManager and 
> returns the user code class loader
> ## TM creates a Task instance and registers the task in the runningTasks map
> ## TM creates a TaskInputSplitProvider
> ## TM creates a RuntimeEnvironment and sets it as the environment for the task
> ## TM registers the task with the network environment
> ## TM sends async msg to profiler to monitor tasks
> ## TM creates temporary files in file cache
> ## TM tries to start the task
> If any operation >= 1.2 fails:
> * TM calls task.failExternally()
> * TM removes temporary files from file cache
> * TM unregisters the task from the network environment
> * TM sends async msg to profiler to unmonitor tasks
> * TM calls unregisterMemoryManager on task
> If 1.1 fails, only unregister from LibraryCacheManager.
> h4. RuntimeEnvironment, Task, TaskManager separation
> The RuntimeEnvironment has references to certain components of the task
> manager, like the memory manager, which are accessed from the task. Furthermore,
> it implements Runnable and creates the executing task Thread. The Task
> instance essentially wraps the RuntimeEnvironment and allows asynchronous
> state management of the task (RUNNING, FINISHED, etc.).
> The way that state updates affect the task is not that obvious: state
> changes trigger messages to the TM, which for final states further trigger a
> msg to unregister the task. How tasks are unregistered in turn depends
> on the state of the task.
> 
> I would propose refactoring this so that the way state
> handling/registration/unregistration is handled becomes more transparent.
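
The failure handling described above follows a register-with-rollback
pattern. A toy sketch of that pattern (the comments mirror the steps in the
list; the real TaskManager code is far more involved):

{code}
import scala.collection.mutable

class TaskRegistration {
  private val undo = mutable.Stack[() => Unit]()

  // Run a setup step and remember how to undo it.
  private def step(doIt: => Unit)(onRollback: => Unit): Unit = {
    doIt
    undo.push(() => onRollback)
  }

  def register(): Unit =
    try {
      step { /* register JARs with LibraryCacheManager */ } { /* unregister JARs */ }
      step { /* create Task, put into runningTasks */ } { /* task.failExternally() */ }
      step { /* register with network environment */ } { /* unregister from network */ }
      step { /* create temp files in file cache */ } { /* remove temp files */ }
      /* finally, start the task */
    } catch {
      case e: Exception =>
        while (undo.nonEmpty) undo.pop().apply() // replay undos in reverse order
        throw e
    }
}
{code}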





[jira] [Closed] (FLINK-1672) Refactor task registration/unregistration

2015-05-11 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-1672.
---

> Refactor task registration/unregistration
> -
>
> Key: FLINK-1672
> URL: https://issues.apache.org/jira/browse/FLINK-1672
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Runtime
>Reporter: Ufuk Celebi
>Assignee: Stephan Ewen
> Fix For: 0.9
>
>
> h4. Current control flow for task registrations
> # JM submits a TaskDeploymentDescriptor to a TM
> ## TM registers the required JAR files with the LibraryCacheManager and 
> returns the user code class loader
> ## TM creates a Task instance and registers the task in the runningTasks map
> ## TM creates a TaskInputSplitProvider
> ## TM creates a RuntimeEnvironment and sets it as the environment for the task
> ## TM registers the task with the network environment
> ## TM sends async msg to profiler to monitor tasks
> ## TM creates temporary files in file cache
> ## TM tries to start the task
> If any operation >= 1.2 fails:
> * TM calls task.failExternally()
> * TM removes temporary files from file cache
> * TM unregisters the task from the network environment
> * TM sends async msg to profiler to unmonitor tasks
> * TM calls unregisterMemoryManager on task
> If 1.1 fails, only unregister from LibraryCacheManager.
> h4. RuntimeEnvironment, Task, TaskManager separation
> The RuntimeEnvironment has references to certain components of the task
> manager, like the memory manager, which are accessed from the task. Furthermore,
> it implements Runnable and creates the executing task Thread. The Task
> instance essentially wraps the RuntimeEnvironment and allows asynchronous
> state management of the task (RUNNING, FINISHED, etc.).
> The way that state updates affect the task is not that obvious: state
> changes trigger messages to the TM, which for final states further trigger a
> msg to unregister the task. How tasks are unregistered in turn depends
> on the state of the task.
> 
> I would propose refactoring this so that the way state
> handling/registration/unregistration is handled becomes more transparent.





[jira] [Closed] (FLINK-1968) Make Distributed Cache more robust

2015-05-11 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen closed FLINK-1968.
---

> Make Distributed Cache more robust
> --
>
> Key: FLINK-1968
> URL: https://issues.apache.org/jira/browse/FLINK-1968
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.9
>
>
> The distributed cache has a variety of issues at the moment.
>  - It does not give a proper exception when a non-cached file is accessed
>  - It swallows I/O exceptions that happen during file transfer and later only
> returns null
>  - It keeps reference counts inconsistently and attempts to copy often,
> resolving this via file collisions
>  - Files are not properly removed on shutdown
>  - No shutdown hook to remove files when the process is killed
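
For the last item, a hedged sketch of what a cleanup shutdown hook can look
like (toy code, not the actual FileCache fix):

{code}
import java.nio.file.{Files, Path}
import scala.collection.mutable

object TempFileCleanup {
  private val tracked = mutable.Set.empty[Path]

  // Runs on normal JVM exit and on SIGTERM, deleting whatever is still tracked.
  Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
    override def run(): Unit =
      tracked.synchronized(tracked.foreach(p => Files.deleteIfExists(p)))
  }))

  def track(p: Path): Path = tracked.synchronized { tracked += p; p }
}
{code}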





[jira] [Resolved] (FLINK-1968) Make Distributed Cache more robust

2015-05-11 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1968.
-
Resolution: Fixed

Fixed via 1c8d866a83065e3d1bc9707dab81117f24c9f678

> Make Distributed Cache more robust
> --
>
> Key: FLINK-1968
> URL: https://issues.apache.org/jira/browse/FLINK-1968
> Project: Flink
>  Issue Type: Bug
>  Components: Local Runtime
>Affects Versions: 0.9
>Reporter: Stephan Ewen
>Assignee: Stephan Ewen
> Fix For: 0.9
>
>
> The distributed cache has a variety of issues at the moment.
>  - It does not give a proper exception when a non-cached file is accessed
>  - It swallows I/O exceptions that happen during file transfer and later only
> returns null
>  - It keeps reference counts inconsistently and attempts to copy often,
> resolving this via file collisions
>  - Files are not properly removed on shutdown
>  - No shutdown hook to remove files when the process is killed





[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538574#comment-14538574
 ] 

ASF GitHub Bot commented on FLINK-1525:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-101035060
  
Nice idea. Do we need the special `UserConfig` interface, or can we use a
`Properties` object, or directly a `Map`?


> Provide utils to pass -D parameters to UDFs 
> 
>
> Key: FLINK-1525
> URL: https://issues.apache.org/jira/browse/FLINK-1525
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-contrib
>Reporter: Robert Metzger
>  Labels: starter
>
> Hadoop users are used to setting job configuration through "-D" on the
> command line.
> Right now, Flink users have to manually parse command line arguments and pass
> them to the methods.
> It would be nice to provide a standard args parser that takes care of
> such things.
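
A minimal sketch of such a parser (illustrative only, not the utility the
issue eventually produced):

{code}
object DArgs {
  // Collects "-Dkey=value" arguments into a map that UDFs can read from.
  def parse(args: Array[String]): Map[String, String] =
    args.collect {
      case arg if arg.startsWith("-D") && arg.contains('=') =>
        val Array(key, value) = arg.drop(2).split("=", 2)
        key -> value
    }.toMap
}

// DArgs.parse(Array("-Dinput=/tmp/in", "-Dparallelism=4"))
// yields Map("input" -> "/tmp/in", "parallelism" -> "4").
{code}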





[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...

2015-05-11 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-101035060
  
Nice idea. Do we need the special `UserConfig` interface, or can we use a
`Properties` object, or directly a `Map`?




[jira] [Commented] (FLINK-1962) Add Gelly Scala API

2015-05-11 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538571#comment-14538571
 ] 

Stephan Ewen commented on FLINK-1962:
-

This seems like a flaw in the Scala type analysis. The Scala type system should
really support the Java types properly.
This will be a recurring issue if we consider it common practice to build Scala
libraries on top of Java libraries.

> Add Gelly Scala API
> ---
>
> Key: FLINK-1962
> URL: https://issues.apache.org/jira/browse/FLINK-1962
> Project: Flink
>  Issue Type: Task
>  Components: Gelly, Scala API
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>






[jira] [Commented] (FLINK-1874) Break up streaming connectors into submodules

2015-05-11 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538556#comment-14538556
 ] 

Stephan Ewen commented on FLINK-1874:
-

I think we are about to merge the biggest change, so this issue should become 
available soon...

> Break up streaming connectors into submodules
> -
>
> Key: FLINK-1874
> URL: https://issues.apache.org/jira/browse/FLINK-1874
> Project: Flink
>  Issue Type: Task
>  Components: Build System, Streaming
>Affects Versions: 0.9
>Reporter: Robert Metzger
>  Labels: starter
>
> As per: 
> http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Break-up-streaming-connectors-into-subprojects-td5001.html





[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning

2015-05-11 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538540#comment-14538540
 ] 

Stephan Ewen commented on FLINK-1959:
-

I will try and look into this very soon...

> Accumulators BROKEN after Partitioning
> --
>
> Key: FLINK-1959
> URL: https://issues.apache.org/jira/browse/FLINK-1959
> Project: Flink
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: master
>Reporter: mustafa elbehery
>Priority: Critical
> Fix For: master
>
>
> While running the Accumulator example in
> https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java,
> I tried to alter the data flow with the "PartitionByHash" function before
> applying "Filter", and the resulting accumulator was NULL.
> By debugging, I could see the accumulator in the runtime map. However, when
> retrieving the accumulator from the JobExecutionResult object, it was NULL.
> The line causing the problem is "file.partitionByHash(1).filter(new
> EmptyFieldFilter())" instead of "file.filter(new EmptyFieldFilter())".





[jira] [Commented] (FLINK-1980) Allowing users to decorate input streams

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538497#comment-14538497
 ] 

ASF GitHub Bot commented on FLINK-1980:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/658#issuecomment-101026234
  
I think Sebastian has already filed a JIRA for adding gz read support.


> Allowing users to decorate input streams
> 
>
> Key: FLINK-1980
> URL: https://issues.apache.org/jira/browse/FLINK-1980
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Sebastian Kruse
>Assignee: Sebastian Kruse
>Priority: Minor
>
> Users may have to do unforeseeable operations on file input streams before 
> they can be used by the actual input format logic, e.g., exotic compression 
> formats or preambles such as byte order marks. Therefore, it would be useful 
> to provide the user with a hook to decorate input streams in order to handle 
> such issues.





[GitHub] flink pull request: [FLINK-1980] allowing users to decorate input ...

2015-05-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/658#issuecomment-101026234
  
I think Sebastian has already filed a JIRA for adding gz read support.




[GitHub] flink pull request: [FLINK-1980] allowing users to decorate input ...

2015-05-11 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/658#issuecomment-101024456
  
Looks good. One comment, though:
 - The position tracking in the stream wrapper is a nice idea, but since
the `skip()` method is anyway only a hint (it may skip less if it wants) and
seeking works only in one direction, why not drop seek support completely? That
also simplifies the code and makes it even more lightweight.

As a follow-up: should we add more built-in decompressors for other file
endings, like `*.gz`?




[jira] [Updated] (FLINK-1996) Add output methods to Table API

2015-05-11 Thread Aljoscha Krettek (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aljoscha Krettek updated FLINK-1996:

Assignee: (was: Aljoscha Krettek)

> Add output methods to Table API
> ---
>
> Key: FLINK-1996
> URL: https://issues.apache.org/jira/browse/FLINK-1996
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>
> Tables need to be converted to DataSets (or DataStreams) to write them out. 
> It would be good to have a way to emit Table results directly for example to 
> print, CSV, JDBC, HBase, etc.





[jira] [Commented] (FLINK-1996) Add output methods to Table API

2015-05-11 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538490#comment-14538490
 ] 

Aljoscha Krettek commented on FLINK-1996:
-

I would say so, yes.

> Add output methods to Table API
> ---
>
> Key: FLINK-1996
> URL: https://issues.apache.org/jira/browse/FLINK-1996
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>
> Tables need to be converted to DataSets (or DataStreams) to write them out. 
> It would be good to have a way to emit Table results directly for example to 
> print, CSV, JDBC, HBase, etc.





[jira] [Commented] (FLINK-1980) Allowing users to decorate input streams

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538486#comment-14538486
 ] 

ASF GitHub Bot commented on FLINK-1980:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/658#issuecomment-101024456
  
Looks good. One comment, though:
 - The position tracking in the stream wrapper is a nice idea, but since
the `skip()` method is anyway only a hint (it may skip less if it wants) and
seeking works only in one direction, why not drop seek support completely? That
also simplifies the code and makes it even more lightweight.

As a follow-up: should we add more built-in decompressors for other file
endings, like `*.gz`?


> Allowing users to decorate input streams
> 
>
> Key: FLINK-1980
> URL: https://issues.apache.org/jira/browse/FLINK-1980
> Project: Flink
>  Issue Type: New Feature
>  Components: Core
>Reporter: Sebastian Kruse
>Assignee: Sebastian Kruse
>Priority: Minor
>
> Users may have to do unforeseeable operations on file input streams before 
> they can be used by the actual input format logic, e.g., exotic compression 
> formats or preambles such as byte order marks. Therefore, it would be useful 
> to provide the user with a hook to decorate input streams in order to handle 
> such issues.





[jira] [Commented] (FLINK-1996) Add output methods to Table API

2015-05-11 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538457#comment-14538457
 ] 

Stephan Ewen commented on FLINK-1996:
-

Does this make a good starter issue?

> Add output methods to Table API
> ---
>
> Key: FLINK-1996
> URL: https://issues.apache.org/jira/browse/FLINK-1996
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Aljoscha Krettek
>
> Tables need to be converted to DataSets (or DataStreams) to write them out. 
> It would be good to have a way to emit Table results directly for example to 
> print, CSV, JDBC, HBase, etc.





[jira] [Commented] (FLINK-1996) Add output methods to Table API

2015-05-11 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538456#comment-14538456
 ] 

Stephan Ewen commented on FLINK-1996:
-

I think it may actually be worth adding those methods where possible. It
really is much simpler that way...
Their implementation could be slim if they delegate to the data set code /
output formats.
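
The delegation idea can be pictured with toy stand-ins for Table and DataSet
(not Flink's classes): the new output method does no I/O of its own and
forwards to the existing sink.

{code}
final case class ToyDataSet[T](rows: Seq[T]) {
  // Stand-in for the existing DataSet sink / output format code.
  def writeAsText(path: String): Unit = {
    val out = new java.io.PrintWriter(path)
    try rows.foreach(r => out.println(r)) finally out.close()
  }
}

final case class ToyTable(rows: Seq[Seq[Any]]) {
  private def toDataSet: ToyDataSet[String] = ToyDataSet(rows.map(_.mkString(",")))

  // The slim Table output method: convert, then delegate.
  def writeAsCsv(path: String): Unit = toDataSet.writeAsText(path)
}
{code}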

> Add output methods to Table API
> ---
>
> Key: FLINK-1996
> URL: https://issues.apache.org/jira/browse/FLINK-1996
> Project: Flink
>  Issue Type: Improvement
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Aljoscha Krettek
>
> Tables need to be converted to DataSets (or DataStreams) to write them out. 
> It would be good to have a way to emit Table results directly for example to 
> print, CSV, JDBC, HBase, etc.





[jira] [Commented] (FLINK-1989) Sorting of POJO data set from TableEnv yields NotSerializableException

2015-05-11 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538407#comment-14538407
 ] 

Aljoscha Krettek commented on FLINK-1989:
-

I could rework the code generation to generate the complete code on the client
and only ship the generated Java code in a String. Only compilation from Java
source to bytecode would then happen at runtime on the TaskManager.
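
The root cause is easy to reproduce outside Flink: any object graph that
captures a java.lang.reflect.Field fails Java serialization. A minimal sketch:

{code}
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

class CapturesField extends Serializable {
  // java.lang.reflect.Field is not Serializable, so this member poisons the
  // whole object graph, just like the Field objects held by PojoTypeInfo.
  val field: java.lang.reflect.Field = classOf[String].getDeclaredFields.head
}

object Repro extends App {
  try new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(new CapturesField)
  catch {
    case e: NotSerializableException => println(s"as expected: $e")
  }
}
{code}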

> Sorting of POJO data set from TableEnv yields NotSerializableException
> --
>
> Key: FLINK-1989
> URL: https://issues.apache.org/jira/browse/FLINK-1989
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Aljoscha Krettek
> Fix For: 0.9
>
>
> Sorting or grouping (or probably any other key operation) on a POJO data set 
> that was created by a {{TableEnvironment}} yields a 
> {{NotSerializableException}} due to a non-serializable 
> {{java.lang.reflect.Field}} object. 
> I traced the error back to the {{ExpressionSelectFunction}}. I guess that a 
> {{TypeInformation}} object is stored in the generated user-code function. A 
> {{PojoTypeInfo}} holds Field objects, which cannot be serialized.
> The following test can be pasted into the {{SelectITCase}} and reproduces the 
> problem. 
> {code}
> @Test
> public void testGroupByAfterTable() throws Exception {
>     ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>     TableEnvironment tableEnv = new TableEnvironment();
>     DataSet<Tuple3<Integer, Long, String>> ds = CollectionDataSets.get3TupleDataSet(env);
>     Table in = tableEnv.toTable(ds, "a,b,c");
>     Table result = in.select("a, b, c");
>     DataSet<ABC> resultSet = tableEnv.toSet(result, ABC.class);
>     resultSet
>         .sortPartition("a", Order.DESCENDING)
>         .writeAsText(resultPath, FileSystem.WriteMode.OVERWRITE);
>     env.execute();
>     expected = "1,1,Hi\n" + "2,2,Hello\n" + "3,2,Hello world\n" +
>         "4,3,Hello world, how are you?\n" + "5,3,I am fine.\n" + "6,3,Luke Skywalker\n" +
>         "7,4,Comment#1\n" + "8,4,Comment#2\n" + "9,4,Comment#3\n" + "10,4,Comment#4\n" +
>         "11,5,Comment#5\n" + "12,5,Comment#6\n" + "13,5,Comment#7\n" + "14,5,Comment#8\n" +
>         "15,5,Comment#9\n" + "16,6,Comment#10\n" + "17,6,Comment#11\n" + "18,6,Comment#12\n" +
>         "19,6,Comment#13\n" + "20,6,Comment#14\n" + "21,6,Comment#15\n";
> }
> public static class ABC {
>     public int a;
>     public long b;
>     public String c;
> }
> {code}





[jira] [Commented] (FLINK-1676) enableForceKryo() is not working as expected

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538370#comment-14538370
 ] 

ASF GitHub Bot commented on FLINK-1676:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/473#issuecomment-101010857
  
I maintain one comment, but I am not blocking this. Feel free to merge...


> enableForceKryo() is not working as expected
> 
>
> Key: FLINK-1676
> URL: https://issues.apache.org/jira/browse/FLINK-1676
> Project: Flink
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 0.9
>Reporter: Robert Metzger
>
> In my Flink job, I've set the following execution config:
> {code}
> final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> env.getConfig().disableObjectReuse();
> env.getConfig().enableForceKryo();
> {code}
> Setting a breakpoint in the {{PojoSerializer()}} constructor, you'll see that
> we still serialize data with the POJO serializer.





[GitHub] flink pull request: [FLINK-1676] Rework ExecutionConfig.enableForc...

2015-05-11 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/473#issuecomment-101010857
  
I maintain one comment, but I am not blocking this. Feel free to merge...




[GitHub] flink pull request: [FLINK-1676] Rework ExecutionConfig.enableForc...

2015-05-11 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/473#discussion_r30067549
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/typeutils/PojoTypeInfo.java ---
@@ -313,6 +319,15 @@ public int getFieldIndex(String fieldName) {
 
@Override
public TypeSerializer<T> createSerializer(ExecutionConfig config) {
+   if(config.isForceKryoEnabled()) {
+   LOG.debug("Using KryoSerializer for serializing POJOs");
--- End diff --

I still think this is the wrong place to log this; it will get repeated a
gazillion times on large plans. Also, it is logged in some places on the
client, and sometimes in the runtime.

It is not terribly bad, since it is debug level, but I don't get the reason
to have it in the first place. Since you have no context information (what
stream/dataset this is for), it is not a good debugging help anyway...




[jira] [Commented] (FLINK-1676) enableForceKryo() is not working as expected

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538366#comment-14538366
 ] 

ASF GitHub Bot commented on FLINK-1676:
---

Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/473#discussion_r30067549
  
--- Diff: flink-java/src/main/java/org/apache/flink/api/java/typeutils/PojoTypeInfo.java ---
@@ -313,6 +319,15 @@ public int getFieldIndex(String fieldName) {
 
@Override
public TypeSerializer<T> createSerializer(ExecutionConfig config) {
+   if(config.isForceKryoEnabled()) {
+   LOG.debug("Using KryoSerializer for serializing POJOs");
--- End diff --

I still think this is the wrong place to log this; it will get repeated a
gazillion times on large plans. Also, it is logged in some places on the
client, and sometimes in the runtime.

It is not terribly bad, since it is debug level, but I don't get the reason
to have it in the first place. Since you have no context information (what
stream/dataset this is for), it is not a good debugging help anyway...


> enableForceKryo() is not working as expected
> 
>
> Key: FLINK-1676
> URL: https://issues.apache.org/jira/browse/FLINK-1676
> Project: Flink
>  Issue Type: Bug
>  Components: Java API
>Affects Versions: 0.9
>Reporter: Robert Metzger
>
> In my Flink job, I've set the following execution config:
> {code}
> final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
> env.getConfig().disableObjectReuse();
> env.getConfig().enableForceKryo();
> {code}
> Setting a breakpoint in the {{PojoSerializer()}} constructor, you'll see that
> we still serialize data with the POJO serializer.





[jira] [Commented] (FLINK-1989) Sorting of POJO data set from TableEnv yields NotSerializableException

2015-05-11 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538348#comment-14538348
 ] 

Stephan Ewen commented on FLINK-1989:
-

Storing the type information breaks with the current design principles, where 
the {{TypeInformation}} is a pure pre-flight concept, and the 
{{TypeSerializer}} and {{TypeComparator}} are the runtime handles.
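
A toy illustration of that separation (stand-in types, not Flink's): only the
serializer, the runtime handle, is Serializable and gets shipped; the type
info stays on the client.

{code}
// Pre-flight only: deliberately NOT Serializable, like TypeInformation.
final class IntTypeInfo {
  def createSerializer(): IntSerializer = new IntSerializer
}

// Runtime handle: small and Serializable, like TypeSerializer.
final class IntSerializer extends Serializable {
  def serialize(v: Int): Array[Byte] = java.nio.ByteBuffer.allocate(4).putInt(v).array()
}

// Generated/user-code functions should capture the serializer, never the type info.
final class GeneratedFunction(serializer: IntSerializer) extends Serializable {
  def apply(v: Int): Array[Byte] = serializer.serialize(v)
}
{code}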

> Sorting of POJO data set from TableEnv yields NotSerializableException
> --
>
> Key: FLINK-1989
> URL: https://issues.apache.org/jira/browse/FLINK-1989
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Aljoscha Krettek
> Fix For: 0.9
>
>
> Sorting or grouping (or probably any other key operation) on a POJO data set 
> that was created by a {{TableEnvironment}} yields a 
> {{NotSerializableException}} due to a non-serializable 
> {{java.lang.reflect.Field}} object. 
> I traced the error back to the {{ExpressionSelectFunction}}. I guess that a 
> {{TypeInformation}} object is stored in the generated user-code function. A 
> {{PojoTypeInfo}} holds Field objects, which cannot be serialized.
> The following test can be pasted into the {{SelectITCase}} and reproduces the 
> problem. 
> {code}
> @Test
> public void testGroupByAfterTable() throws Exception {
>     ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
>     TableEnvironment tableEnv = new TableEnvironment();
>     DataSet<Tuple3<Integer, Long, String>> ds = CollectionDataSets.get3TupleDataSet(env);
>     Table in = tableEnv.toTable(ds, "a,b,c");
>     Table result = in.select("a, b, c");
>     DataSet<ABC> resultSet = tableEnv.toSet(result, ABC.class);
>     resultSet
>         .sortPartition("a", Order.DESCENDING)
>         .writeAsText(resultPath, FileSystem.WriteMode.OVERWRITE);
>     env.execute();
>     expected = "1,1,Hi\n" + "2,2,Hello\n" + "3,2,Hello world\n" +
>         "4,3,Hello world, how are you?\n" + "5,3,I am fine.\n" + "6,3,Luke Skywalker\n" +
>         "7,4,Comment#1\n" + "8,4,Comment#2\n" + "9,4,Comment#3\n" + "10,4,Comment#4\n" +
>         "11,5,Comment#5\n" + "12,5,Comment#6\n" + "13,5,Comment#7\n" + "14,5,Comment#8\n" +
>         "15,5,Comment#9\n" + "16,6,Comment#10\n" + "17,6,Comment#11\n" + "18,6,Comment#12\n" +
>         "19,6,Comment#13\n" + "20,6,Comment#14\n" + "21,6,Comment#15\n";
> }
> public static class ABC {
>     public int a;
>     public long b;
>     public String c;
> }
> {code}





[jira] [Commented] (FLINK-1986) Group by fails on iterative data streams

2015-05-11 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538328#comment-14538328
 ] 

Stephan Ewen commented on FLINK-1986:
-

[~gyfora] and [~senorcarbone] Are you using co-location constraints to make
sure that the head and tail of an iteration are co-located? Otherwise that is
not guaranteed, but it is required by the backchannel broker.


> Group by fails on iterative data streams
> 
>
> Key: FLINK-1986
> URL: https://issues.apache.org/jira/browse/FLINK-1986
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Reporter: Daniel Bali
>  Labels: iteration, streaming
>
> Hello!
> When I try to run a `groupBy` on an IterativeDataStream I get a 
> NullPointerException. Here is the code that reproduces the issue:
> {code}
> public Test() throws Exception {
>     StreamExecutionEnvironment env = StreamExecutionEnvironment.createLocalEnvironment();
>     DataStream<Tuple2<Long, Long>> edges = env
>         .generateSequence(0, 7)
>         .map(new MapFunction<Long, Tuple2<Long, Long>>() {
>             @Override
>             public Tuple2<Long, Long> map(Long v) throws Exception {
>                 return new Tuple2<>(v, (v + 1));
>             }
>         });
>     IterativeDataStream<Tuple2<Long, Long>> iteration = edges.iterate();
>     SplitDataStream<Tuple2<Long, Long>> step = iteration.groupBy(1)
>         .map(new MapFunction<Tuple2<Long, Long>, Tuple2<Long, Long>>() {
>             @Override
>             public Tuple2<Long, Long> map(Tuple2<Long, Long> tuple) throws Exception {
>                 return tuple;
>             }
>         })
>         .split(new OutputSelector<Tuple2<Long, Long>>() {
>             @Override
>             public Iterable<String> select(Tuple2<Long, Long> tuple) {
>                 List<String> output = new ArrayList<>();
>                 output.add("iterate");
>                 return output;
>             }
>         });
>     iteration.closeWith(step.select("iterate"));
>     env.execute("Sandbox");
> }
> {code}
> Moving the groupBy before the iteration solves the issue, e.g., this works:
> {code}
> ... iteration = edges.groupBy(1).iterate();
> iteration.map(...)
> {code}
> Here is the stack trace:
> {code}
> Exception in thread "main" java.lang.NullPointerException
>   at org.apache.flink.streaming.api.graph.StreamGraph.addIterationTail(StreamGraph.java:207)
>   at org.apache.flink.streaming.api.datastream.IterativeDataStream.closeWith(IterativeDataStream.java:72)
>   at org.apache.flink.graph.streaming.example.Test.<init>(Test.java:73)
>   at org.apache.flink.graph.streaming.example.Test.main(Test.java:79)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:601)
>   at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
> {code}





[GitHub] flink pull request: [hotfix][scala] Let type analysis work on some...

2015-05-11 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/660#issuecomment-101000540
  
Does the Scala API not treat Java classes the same way as its own classes?
That sounds absolutely like the right thing to do. Why not do as much
compile-time analysis as possible...

To understand this better:
 - Does the Scala type analysis now generate Java Tuple TypeInformation?
 - Does it treat Java POJOs as POJO types?
 - ...




[GitHub] flink pull request: [hotfix][scala] Let type analysis work on some...

2015-05-11 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/660#issuecomment-100999172
  
Seems to contain a second fix, to skip static fields in data types.




[GitHub] flink pull request: [FLINK-951] Reworking of Iteration Synchroniza...

2015-05-11 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/570#issuecomment-100996059
  
I had a look at the pull request and I like very much what it tries to do.

The problem right now is that I can hardly say, without investing a lot of
time, whether this is in good shape to merge. This pull request does at least
two very big things at the same time:
 - Move iteration synchronization to the JobManager
 - Unify aggregators and accumulators into one.

With all the example / test case adjustments, this becomes a lot to review.
The description of the pull request also does not make it easy, since many
questions and decisions that arise are not explained:
 - Which interface do the unified aggregators/accumulators follow: the
aggregators', or the accumulators'?
 - How is the blocking superstep synchronization currently done? With an actor
ask?
 - How is the aggregator/accumulator unification achieved, when aggregators
are created per superstep, and accumulators once?

This is a lot for a very delicate and critical mechanism. I think if we
want to merge this, we would need more details on how things were changed (what
is the concept behind the changes, not just what the code diffs are).
We may need to break it into multiple self-contained changes that we can
individually review and merge, to make sure that it gets properly checked and
will work robustly.




[jira] [Commented] (FLINK-951) Reworking of Iteration Synchronization, Accumulators and Aggregators

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538269#comment-14538269
 ] 

ASF GitHub Bot commented on FLINK-951:
--

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/570#issuecomment-100996059
  
I had a look at the pull request and I like very much what it tries to do.

The problem right now is that I can hardly say, without investing a lot of
time, whether this is in good shape to merge. This pull request does at least
two very big things at the same time:
 - Move iteration synchronization to the JobManager
 - Unify aggregators and accumulators into one.

With all the example / test case adjustments, this becomes a lot to review.
The description of the pull request also does not make it easy, since many
questions and decisions that arise are not explained:
 - Which interface do the unified aggregators/accumulators follow: the
aggregators', or the accumulators'?
 - How is the blocking superstep synchronization currently done? With an actor
ask?
 - How is the aggregator/accumulator unification achieved, when aggregators
are created per superstep, and accumulators once?

This is a lot for a very delicate and critical mechanism. I think if we
want to merge this, we would need more details on how things were changed (what
is the concept behind the changes, not just what the code diffs are).
We may need to break it into multiple self-contained changes that we can
individually review and merge, to make sure that it gets properly checked and
will work robustly.


> Reworking of Iteration Synchronization, Accumulators and Aggregators
> 
>
> Key: FLINK-951
> URL: https://issues.apache.org/jira/browse/FLINK-951
> Project: Flink
>  Issue Type: Improvement
>  Components: Iterations, Optimizer
>Affects Versions: 0.9
>Reporter: Markus Holzemer
>Assignee: Markus Holzemer
>  Labels: refactoring
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I just realized that there is no real Jira issue for the task I am currently 
> working on. 
> I am currently reworking a few things regarding Iteration Synchronization, 
> Accumulators and Aggregators. Currently the synchronization at the end of one 
> superstep is done through channel events. That makes it hard to track the 
> current status of iterations. That is why I am changing this synchronization 
> to use RPC calls with the JobManager, so that the JobManager manages the 
> current status of all iterations.
> Currently we use Accumulators outside of iterations and Aggregators inside of 
> iterations. Both have a similar function, but a bit different interfaces and 
> handling. I want to unify these two concepts. I propose that we stick in the 
> future to Accumulators only. Aggregators are therefore removed and 
> Accumulators are extended to cover the use cases Aggregators were used for 
> before. The switch to RPC for iterations makes it possible to also send the 
> current Accumulator values at the end of each superstep, so that the 
> JobManager (and thereby the web interface) will be able to print intermediate 
> accumulation results.
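
For context, a rough sketch of the accumulator side of this proposal, using the 
accumulator API that exists today (the per-superstep reporting described above 
is the new part and is not shown):

```scala
import org.apache.flink.api.common.accumulators.IntCounter
import org.apache.flink.api.common.functions.RichMapFunction
import org.apache.flink.configuration.Configuration

// Sketch: one accumulator, registered once for the whole job. Under the
// proposal, its value could also be shipped to the JobManager at the end
// of each superstep instead of only at the end of the job.
class CountingMapper extends RichMapFunction[Int, Int] {
  private val counter = new IntCounter()

  override def open(parameters: Configuration): Unit = {
    getRuntimeContext.addAccumulator("processed-elements", counter)
  }

  override def map(value: Int): Int = {
    counter.add(1)
    value + 1
  }
}
```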



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1982) Remove dependencies on Record for Flink runtime and core

2015-05-11 Thread Henry Saputra (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538253#comment-14538253
 ] 

Henry Saputra commented on FLINK-1982:
--

Thanks [~StephanEwen] !

As for the terasort test: if we finally get to remove all test dependencies, would 
it be OK to disable the test so that we can remove the Record API?

My preference is to remove deprecated and non-preferred APIs for using Flink.

> Remove dependencies on Record for Flink runtime and core
> 
>
> Key: FLINK-1982
> URL: https://issues.apache.org/jira/browse/FLINK-1982
> Project: Flink
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Henry Saputra
>
> Seems like there are several uses of the Record API in the core and runtime 
> modules that need to be updated before the Record API can be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1983) Remove dependencies on Record APIs for Spargel

2015-05-11 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538180#comment-14538180
 ] 

Stephan Ewen commented on FLINK-1983:
-

This seems doable without implications.

> Remove dependencies on Record APIs for Spargel
> --
>
> Key: FLINK-1983
> URL: https://issues.apache.org/jira/browse/FLINK-1983
> Project: Flink
>  Issue Type: Sub-task
>  Components: Spargel
>Reporter: Henry Saputra
>
> Need to remove usage of Record API in Spargel



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1982) Remove dependencies on Record for Flink runtime and core

2015-05-11 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538156#comment-14538156
 ] 

Stephan Ewen commented on FLINK-1982:
-

I think the runtime and optimizer specializations can be removed together with 
the API in one patch.

The last blocker is probably that some of the runtime tests are implemented in 
the Record API. We would need to migrate some; many can probably be dropped, as 
they are redundant now.

The only test that we cannot port is the terasort test, because the other APIs 
do not yet support range partitioning.


> Remove dependencies on Record for Flink runtime and core
> 
>
> Key: FLINK-1982
> URL: https://issues.apache.org/jira/browse/FLINK-1982
> Project: Flink
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Henry Saputra
>
> Seems like there are several uses of the Record API in the core and runtime 
> modules that need to be updated before the Record API can be removed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1977] Rework Stream Operators to always...

2015-05-11 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/659#issuecomment-100953248
  
I think I addressed all the (reasonable) comments you made? The user-facing 
API does not change in any way as a result of this, and I tried to pick 
consistent names for the internal tasks and operators. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1977) Rework Stream Operators to always be push based

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538077#comment-14538077
 ] 

ASF GitHub Bot commented on FLINK-1977:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/659#issuecomment-100953248
  
I think I addressed all the (reasonable) comments you made? The user-facing 
API does not change in any way as a result of this, and I tried to pick 
consistent names for the internal tasks and operators. 


> Rework Stream Operators to always be push based
> ---
>
> Key: FLINK-1977
> URL: https://issues.apache.org/jira/browse/FLINK-1977
> Project: Flink
>  Issue Type: Improvement
>Reporter: Aljoscha Krettek
>Assignee: Aljoscha Krettek
>
> This is a result of the discussion on the mailing list. This is an excerpt 
> from the mailing list that gives the basic idea of the change:
> I propose to change all streaming operators to be push based, with a
> slightly improved interface: in addition to collect(), which I would
> call receiveElement(), I would add receivePunctuation() and
> receiveBarrier(). The first operator in the chain would also get data
> from the outside invokable that reads from the input iterator and
> calls receiveElement() for the first operator in a chain.
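
A minimal sketch of what such a push-based interface could look like (the 
method names come from the excerpt above; all types and signatures are 
illustrative placeholders, not the final API):

```scala
// Placeholder types; the real classes would carry watermark/checkpoint metadata.
final case class Punctuation(timestamp: Long)
final case class CheckpointBarrier(checkpointId: Long)

// Every operator receives elements, punctuations, and barriers via push calls.
trait PushOperator[IN] {
  def receiveElement(element: IN): Unit
  def receivePunctuation(punctuation: Punctuation): Unit
  def receiveBarrier(barrier: CheckpointBarrier): Unit
}

// The invokable at the head of a chain drives the first operator by
// reading from the input iterator and pushing each element.
class ChainDriver[IN](input: Iterator[IN], head: PushOperator[IN]) {
  def run(): Unit = input.foreach(head.receiveElement)
}
```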



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1990) Uppercase "AS" keyword not allowed in select expression

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538069#comment-14538069
 ] 

ASF GitHub Bot commented on FLINK-1990:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/667#issuecomment-100950435
  
I created this issue for the aggregations: 
https://issues.apache.org/jira/browse/FLINK-2000. I think I can only assign you 
if you have a Jira account. Do you have one?


> Uppercase "AS" keyword not allowed in select expression
> ---
>
> Key: FLINK-1990
> URL: https://issues.apache.org/jira/browse/FLINK-1990
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Aljoscha Krettek
>Priority: Minor
> Fix For: 0.9
>
>
> Table API select expressions do not allow an uppercase "AS" keyword.
> The following expression fails with an {{ExpressionException}}:
>  {{table.groupBy("request").select("request, request.count AS cnt")}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1990] [staging table] Support upper cas...

2015-05-11 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/667#issuecomment-100950435
  
I created this issue for the aggregations: 
https://issues.apache.org/jira/browse/FLINK-2000. I think I can only assign you 
if you have a Jira account. Do you have one?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Created] (FLINK-2000) Add SQL-style aggregations for Table API

2015-05-11 Thread Aljoscha Krettek (JIRA)
Aljoscha Krettek created FLINK-2000:
---

 Summary: Add SQL-style aggregations for Table API
 Key: FLINK-2000
 URL: https://issues.apache.org/jira/browse/FLINK-2000
 Project: Flink
  Issue Type: Improvement
Reporter: Aljoscha Krettek
Priority: Minor


Right now, the syntax for aggregations is "a.count, a.min" and so on. We could 
in addition offer "COUNT(a), MIN(a)" and so on.
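
For illustration, the two styles side by side (the second line is the proposed 
addition and is not yet supported; `table` is assumed to be an existing Table):

```scala
// current field-suffix style (from the FLINK-1990 example):
table.groupBy("request").select("request, request.count AS cnt")

// proposed SQL-style aggregations (hypothetical, not yet implemented):
table.groupBy("request").select("request, COUNT(request) AS cnt")
```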



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1990) Uppercase "AS" keyword not allowed in select expression

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538041#comment-14538041
 ] 

ASF GitHub Bot commented on FLINK-1990:
---

Github user chhao01 commented on the pull request:

https://github.com/apache/flink/pull/667#issuecomment-100943527
  
Thanks for the quick response, I was not sure which it was supposed to be. Since 
it's case-insensitive, I will update the code with a unit test soon.

Yes, I am interested in adding the SQL-style aggregations; it would be great 
if you can create the JIRA and assign it to me. :)


> Uppercase "AS" keyword not allowed in select expression
> ---
>
> Key: FLINK-1990
> URL: https://issues.apache.org/jira/browse/FLINK-1990
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Aljoscha Krettek
>Priority: Minor
> Fix For: 0.9
>
>
> Table API select expressions do not allow an uppercase "AS" keyword.
> The following expression fails with an {{ExpressionException}}:
>  {{table.groupBy("request").select("request, request.count AS cnt")}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1990] [staging table] Support upper cas...

2015-05-11 Thread chhao01
Github user chhao01 commented on the pull request:

https://github.com/apache/flink/pull/667#issuecomment-100943527
  
Thanks for the quick response, I was not sure which it was supposed to be. Since 
it's case-insensitive, I will update the code with a unit test soon.

Yes, I am interested in adding the SQL-style aggregations; it would be great 
if you can create the JIRA and assign it to me. :)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538035#comment-14538035
 ] 

ASF GitHub Bot commented on FLINK-1525:
---

Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100941805
  
Thanks for clarifying, Robert.

I think the convention is for *Util classes to just contain static methods 
instead of being instantiated.
Maybe we could use ```InputParameters``` as the class name for parsing the 
CLI arguments instead?


> Provide utils to pass -D parameters to UDFs 
> 
>
> Key: FLINK-1525
> URL: https://issues.apache.org/jira/browse/FLINK-1525
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-contrib
>Reporter: Robert Metzger
>  Labels: starter
>
> Hadoop users are used to setting job configuration through "-D" on the 
> command line.
> Right now, Flink users have to manually parse command line arguments and pass 
> them to the methods.
> It would be nice to provide a standard args parser which takes care of 
> such stuff.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...

2015-05-11 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100941805
  
Thanks for clarifying, Robert.

I think the convention is for *Util classes to just contain static methods 
instead of being instantiated.
Maybe we could use ```InputParameters``` as the class name for parsing the 
CLI arguments instead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1990] [staging table] Support upper cas...

2015-05-11 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/667#issuecomment-100940455
  
Thanks for your contribution!

Could you maybe enhance it along these lines: 
http://stackoverflow.com/questions/6080437/case-insensitive-scala-parser-combinator

Because right now it would only support as and AS, but not As or aS. I know 
the latter is a rather academic example, but still...
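
Along the lines of that StackOverflow answer, a case-insensitive keyword parser 
could look roughly like this (a sketch, not the actual Table API parser code):

```scala
import scala.util.parsing.combinator.JavaTokenParsers

object CaseInsensitiveKeywords extends JavaTokenParsers {
  // Build a regex parser that matches the keyword in any casing:
  // (?i) turns on case-insensitive matching, \Q...\E quotes the keyword.
  def keyword(kw: String): Parser[String] =
    ("""(?i)\Q""" + kw + """\E""").r

  // accepts "as", "AS", "As", and "aS"
  def asKeyword: Parser[String] = keyword("as")
}
```

With this sketch, `CaseInsensitiveKeywords.parseAll(CaseInsensitiveKeywords.asKeyword, "aS")` 
would succeed.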

Would you also be interested in adding SQL style aggregations, for example 
COUNT(field), MAX(field) and so on? I could open a Jira for this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1990) Uppercase "AS" keyword not allowed in select expression

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538026#comment-14538026
 ] 

ASF GitHub Bot commented on FLINK-1990:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/667#issuecomment-100940455
  
Thanks for your contribution!

Could you maybe enhance it along these lines: 
http://stackoverflow.com/questions/6080437/case-insensitive-scala-parser-combinator

Because right now it would only support as and AS, but not As or aS. I know 
the latter is a rather academic example, but still...

Would you also be interested in adding SQL style aggregations, for example 
COUNT(field), MAX(field) and so on? I could open a Jira for this.


> Uppercase "AS" keyword not allowed in select expression
> ---
>
> Key: FLINK-1990
> URL: https://issues.apache.org/jira/browse/FLINK-1990
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Aljoscha Krettek
>Priority: Minor
> Fix For: 0.9
>
>
> Table API select expressions do not allow an uppercase "AS" keyword.
> The following expression fails with an {{ExpressionException}}:
>  {{table.groupBy("request").select("request, request.count AS cnt")}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1962) Add Gelly Scala API

2015-05-11 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538018#comment-14538018
 ] 

Aljoscha Krettek commented on FLINK-1962:
-

Yes, because the Scala type analysis does not recognise the Java tuples as 
Tuples.

> Add Gelly Scala API
> ---
>
> Key: FLINK-1962
> URL: https://issues.apache.org/jira/browse/FLINK-1962
> Project: Flink
>  Issue Type: Task
>  Components: Gelly, Scala API
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1990) Uppercase "AS" keyword not allowed in select expression

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538015#comment-14538015
 ] 

ASF GitHub Bot commented on FLINK-1990:
---

GitHub user chhao01 opened a pull request:

https://github.com/apache/flink/pull/667

[FLINK-1990] [staging table] Support upper case of `as` for expression



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chhao01/flink as

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/667.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #667


commit bd1d059440a23e4b2a725946716f580e168c2a7b
Author: Cheng Hao 
Date:   2015-05-11T14:57:14Z

support upper case of 'as' for expression




> Uppercase "AS" keyword not allowed in select expression
> ---
>
> Key: FLINK-1990
> URL: https://issues.apache.org/jira/browse/FLINK-1990
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Aljoscha Krettek
>Priority: Minor
> Fix For: 0.9
>
>
> Table API select expressions do not allow an uppercase "AS" keyword.
> The following expression fails with an {{ExpressionException}}:
>  {{table.groupBy("request").select("request, request.count AS cnt")}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1990] [staging table] Support upper cas...

2015-05-11 Thread chhao01
GitHub user chhao01 opened a pull request:

https://github.com/apache/flink/pull/667

[FLINK-1990] [staging table] Support upper case of `as` for expression



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chhao01/flink as

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/667.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #667


commit bd1d059440a23e4b2a725946716f580e168c2a7b
Author: Cheng Hao 
Date:   2015-05-11T14:57:14Z

support upper case of 'as' for expression




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...

2015-05-11 Thread hsaputra
Github user hsaputra commented on a diff in the pull request:

https://github.com/apache/flink/pull/664#discussion_r30046159
  
--- Diff: 
flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/PojoExample.java
 ---
@@ -26,6 +26,7 @@
 import org.apache.flink.util.Collector;
 
 
+
--- End diff --

Extra new line?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14538004#comment-14538004
 ] 

ASF GitHub Bot commented on FLINK-1525:
---

Github user hsaputra commented on a diff in the pull request:

https://github.com/apache/flink/pull/664#discussion_r30046159
  
--- Diff: 
flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/wordcount/PojoExample.java
 ---
@@ -26,6 +26,7 @@
 import org.apache.flink.util.Collector;
 
 
+
--- End diff --

Extra new line?


> Provide utils to pass -D parameters to UDFs 
> 
>
> Key: FLINK-1525
> URL: https://issues.apache.org/jira/browse/FLINK-1525
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-contrib
>Reporter: Robert Metzger
>  Labels: starter
>
> Hadoop users are used to setting job configuration through "-D" on the 
> command line.
> Right now, Flink users have to manually parse command line arguments and pass 
> them to the methods.
> It would be nice to provide a standard args parser which takes care of 
> such stuff.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...

2015-05-11 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100926127
  
Yes, something like this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1962) Add Gelly Scala API

2015-05-11 Thread Vasia Kalavri (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537998#comment-14537998
 ] 

Vasia Kalavri commented on FLINK-1962:
--

We chose Tuples for performance and convenience I guess. We haven't really seen 
this choice as limiting the model in any case. Is this a problem for the Scala 
API?

> Add Gelly Scala API
> ---
>
> Key: FLINK-1962
> URL: https://issues.apache.org/jira/browse/FLINK-1962
> Project: Flink
>  Issue Type: Task
>  Components: Gelly, Scala API
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-1990) Uppercase "AS" keyword not allowed in select expression

2015-05-11 Thread Cheng Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537997#comment-14537997
 ] 

Cheng Hao edited comment on FLINK-1990 at 5/11/15 2:50 PM:
---

What about the aggregate functions? e.g. sum / SUM ?


was (Author: chhao01):
What if the aggregate functions? e.g. sum / SUM ?

> Uppercase "AS" keyword not allowed in select expression
> ---
>
> Key: FLINK-1990
> URL: https://issues.apache.org/jira/browse/FLINK-1990
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Aljoscha Krettek
>Priority: Minor
> Fix For: 0.9
>
>
> Table API select expressions do not allow an uppercase "AS" keyword.
> The following expression fails with an {{ExpressionException}}:
>  {{table.groupBy("request").select("request, request.count AS cnt")}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1990) Uppercase "AS" keyword not allowed in select expression

2015-05-11 Thread Cheng Hao (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537997#comment-14537997
 ] 

Cheng Hao commented on FLINK-1990:
--

What if the aggregate functions? e.g. sum / SUM ?

> Uppercase "AS" keyword not allowed in select expression
> ---
>
> Key: FLINK-1990
> URL: https://issues.apache.org/jira/browse/FLINK-1990
> Project: Flink
>  Issue Type: Bug
>  Components: Table API
>Affects Versions: 0.9
>Reporter: Fabian Hueske
>Assignee: Aljoscha Krettek
>Priority: Minor
> Fix For: 0.9
>
>
> Table API select expressions do not allow an uppercase "AS" keyword.
> The following expression fails with an {{ExpressionException}}:
>  {{table.groupBy("request").select("request, request.count AS cnt")}}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537984#comment-14537984
 ] 

ASF GitHub Bot commented on FLINK-1525:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100926127
  
Yes, something like this.


> Provide utils to pass -D parameters to UDFs 
> 
>
> Key: FLINK-1525
> URL: https://issues.apache.org/jira/browse/FLINK-1525
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-contrib
>Reporter: Robert Metzger
>  Labels: starter
>
> Hadoop users are used to setting job configuration through "-D" on the 
> command line.
> Right now, Flink users have to manually parse command line arguments and pass 
> them to the methods.
> It would be nice to provide a standard args parser which takes care of 
> such stuff.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537975#comment-14537975
 ] 

ASF GitHub Bot commented on FLINK-1525:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100923321
  
@aljoscha:

Is this the API you were thinking of?
```java
RequiredParameters required = new RequiredParameters();
// parameter with long and short variant
Option input = required.add("input").alt("i").description("Path to input file or directory");
// parameter only with long variant
required.add("output");
// parameter with type
Option parallelism = required.add("parallelism").alt("p").type(Integer.class);
// parameter with default value, specifying the type
Option spOption = required.add("sourceParallelism").alt("sp").defaultValue(12)
        .description("Number specifying the number of parallel data source instances");

ParameterUtil parameter = ParameterUtil.fromArgs(
        new String[]{"-i", "someinput", "--output", "someout", "-p", "15"});

required.check(parameter);
required.printHelp();
required.checkAndPopulate(parameter);

String inputString = input.get();
int par = parallelism.getInteger();
String output = parameter.get("output");
int sourcePar = parameter.getInteger(spOption.getName());
```


> Provide utils to pass -D parameters to UDFs 
> 
>
> Key: FLINK-1525
> URL: https://issues.apache.org/jira/browse/FLINK-1525
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-contrib
>Reporter: Robert Metzger
>  Labels: starter
>
> Hadoop users are used to setting job configuration through "-D" on the 
> command line.
> Right now, Flink users have to manually parse command line arguments and pass 
> them to the methods.
> It would be nice to provide a standard args parser which takes care of 
> such stuff.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...

2015-05-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100923321
  
@aljoscha:

Is this the API you were thinking of?
```java
RequiredParameters required = new RequiredParameters();
// parameter with long and short variant
Option input = required.add("input").alt("i").description("Path to input file or directory");
// parameter only with long variant
required.add("output");
// parameter with type
Option parallelism = required.add("parallelism").alt("p").type(Integer.class);
// parameter with default value, specifying the type
Option spOption = required.add("sourceParallelism").alt("sp").defaultValue(12)
        .description("Number specifying the number of parallel data source instances");

ParameterUtil parameter = ParameterUtil.fromArgs(
        new String[]{"-i", "someinput", "--output", "someout", "-p", "15"});

required.check(parameter);
required.printHelp();
required.checkAndPopulate(parameter);

String inputString = input.get();
int par = parallelism.getInteger();
String output = parameter.get("output");
int sourcePar = parameter.getInteger(spOption.getName());
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537958#comment-14537958
 ] 

ASF GitHub Bot commented on FLINK-1525:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100917520
  
>> If this tool also supported positional arguments ...
>
> I also thought about adding those, but decided against it, because args[n] 
> is already a way of accessing the arguments by their position ;)
> But I'll add it so that users can also specify default values .. and we 
> take care of the parsing.

I'm not so sure about this anymore. Positional arguments would break a lot 
in the design of the `ParameterUtil`, for example the export to the web 
interface, to the Configuration object, or to Properties.


> Provide utils to pass -D parameters to UDFs 
> 
>
> Key: FLINK-1525
> URL: https://issues.apache.org/jira/browse/FLINK-1525
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-contrib
>Reporter: Robert Metzger
>  Labels: starter
>
> Hadoop users are used to setting job configuration through "-D" on the 
> command line.
> Right now, Flink users have to manually parse command line arguments and pass 
> them to the methods.
> It would be nice to provide a standard args parser which takes care of 
> such stuff.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...

2015-05-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100917520
  
>> If this tool also supported positional arguments ...
>
> I also thought about adding those, but decided against it, because args[n] 
> is already a way of accessing the arguments by their position ;)
> But I'll add it so that users can also specify default values .. and we 
> take care of the parsing.

I'm not so sure about this anymore. Positional arguments would break a lot 
in the design of the `ParameterUtil`, for example the export to the web 
interface, to the Configuration object, or to Properties.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1962) Add Gelly Scala API

2015-05-11 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537938#comment-14537938
 ] 

Aljoscha Krettek commented on FLINK-1962:
-

Yes, this is correct, because the Java Tuples classes look like any other 
classes to the Scala type analysis component.

[~vkalavri] The Graph API can only work with Java Tuples? Could it not work 
with more generic types?

> Add Gelly Scala API
> ---
>
> Key: FLINK-1962
> URL: https://issues.apache.org/jira/browse/FLINK-1962
> Project: Flink
>  Issue Type: Task
>  Components: Gelly, Scala API
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1964) Rework TwitterSource to use a Properties object instead of a file path

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537922#comment-14537922
 ] 

ASF GitHub Bot commented on FLINK-1964:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/666#issuecomment-100902145
  
Hey,
thank you for working on this & opening a pull request.

I'll address your comments inline:

> put the default value for the queueSize attribute in a constant (we could 
call it DEFAULT_QUEUE_SIZE)
> put the default value for the waitSec attribute in a constant (we could call 
it DEFAULT_WAIT_SECONDS)

Good idea, please do so. Could you also make these values configurable for 
our users? Nobody wants to recompile Flink just for changing an integer ;) 

> I see a couple of methods that seem to work in pairs, such as:
> open / close: called at the beginning of the function and at the very end 
of it
> run / cancel: called on each iteration to perform some work or cancel the 
current ongoing work
> Is my understanding correct? I am asking since I see that the 
TwitterSource is initializing its client connection in the open(...) method but 
stops it in the run(...) method. Do those methods work in pairs?

The way we call sources in Flink streaming is a bit broken, that's why we 
are reworking it right now. That's the pull request where we change it: 
https://github.com/apache/flink/pull/659
But you are right, open() and close() work in pairs. Connections we are 
opening in the open() method should be closed in close().
I would leave that as it is for now, because we are going to change it in 
the next few days anyway.

> do you think it is a good idea to assign the client to null after 
stopping it in the closeConnection method, so it can be garbage collected as 
soon as possible? Also, if closeConnection() throws an exception, isRunning 
never changes to false.

Yes, you can set it to null.

> we should check for preconditions in the constructors and public set 
methods to avoid receiving null auth properties or similar cases.
> we could have a private constructor that receives all the possible 
parameters and use constructor chaining to avoid code duplication even in the 
constructors.

These are both very good ideas!
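
A rough sketch of how those two suggestions could fit together (all names are 
hypothetical, not the actual TwitterSource code):

```scala
object TwitterSourceSketch {
  // defaults pulled out into named constants, per the suggestion above
  val DefaultQueueSize = 10000
  val DefaultWaitSeconds = 5
}

// The primary constructor checks preconditions; the convenience constructor
// chains to it, so the checks live in exactly one place.
class TwitterSourceSketch(authProps: java.util.Properties,
                          queueSize: Int,
                          waitSeconds: Int) {
  require(authProps != null, "auth properties must not be null")
  require(queueSize > 0, "queue size must be positive")
  require(waitSeconds >= 0, "wait seconds must not be negative")

  def this(authProps: java.util.Properties) =
    this(authProps,
         TwitterSourceSketch.DefaultQueueSize,
         TwitterSourceSketch.DefaultWaitSeconds)
}
```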

> I have created a util file to place methods for working with properties 
files. I put in the load method that loads a Properties object from a file 
path. We should place this in a more common package; I put it close to the file 
I changed, but we should consider moving it to a better place.

Maybe we can combine it with this one: 
https://github.com/apache/flink/pull/664

> Separate the different connectors into different submodules.

There is already a JIRA filed for this: 
https://issues.apache.org/jira/browse/FLINK-1874
But I would suggest to wait for a few days until 
https://github.com/apache/flink/pull/659 is merged, otherwise, there are going 
to be a lot of merge conflicts.


In general: Feel free to change whatever is necessary to make the code 
better (also the missing javadocs). Your suggestions are all very good. I see 
that you have a good sense for writing high-quality code!



> Rework TwitterSource to use a Properties object instead of a file path
> --
>
> Key: FLINK-1964
> URL: https://issues.apache.org/jira/browse/FLINK-1964
> Project: Flink
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 0.9
>Reporter: Robert Metzger
>Assignee: Carlos Curotto
>Priority: Minor
>  Labels: starter
>
> The twitter connector is very hard to use on a cluster because it expects the 
> property file to be present on all nodes.
> It would be much easier to ask the user to pass a Properties object 
> immediately.
> Also, the javadoc of the class stops in the middle of the sentence.
> It was not obvious to me how the two examples TwitterStreaming and 
> TwitterTopology differ. Also, there is a third TwitterStream example in the 
> streaming examples.
> The documentation of the Twitter source refers to the non existent 
> TwitterLocal class.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1964] Rework TwitterSource to use a Pro...

2015-05-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/666#issuecomment-100902145
  
Hey,
thank you for working on this & opening a pull request.

I'll address your comments inline:

> put the default value for the queueSize attribute in a constant (we could 
call it DEFAULT_QUEUE_SIZE)
> put the default value for the waitSec attribute in a constant (we could call 
it DEFAULT_WAIT_SECONDS)

Good idea, please do so. Could you also make these values configurable for 
our users? Nobody wants to recompile Flink just for changing an integer ;) 

> I see a couple of methods that seem to work in pairs, such as:
> open / close: called at the beginning of the function and at the very end 
of it
> run / cancel: called on each iteration to perform some work or cancel the 
current ongoing work
> Is my understanding correct? I am asking since I see that the 
TwitterSource is initializing its client connection in the open(...) method but 
stops it in the run(...) method. Do those methods work in pairs?

The way we call sources in Flink streaming is a bit broken, that's why we 
are reworking it right now. That's the pull request where we change it: 
https://github.com/apache/flink/pull/659
But you are right, open() and close() work in pairs. Connections we are 
opening in the open() method should be closed in close().
I would leave that as it is for now, because we are going to change it in 
the next few days anyway.

> do you think it is a good idea to assign the client to null after 
stopping it in the closeConnection method, so it can be garbage collected as 
soon as possible? Also, if closeConnection() throws an exception, isRunning 
never changes to false.

Yes, you can set it to null.

> we should check for preconditions in the constructors and public set 
methods to avoid receiving null auth properties or similar cases.
> we could have a private constructor that receives all the possible 
parameters and use constructor chaining to avoid code duplication even in the 
constructors.

These are both very good ideas!

> I have created a util file to place methods for working with properties 
files. I put in the load method that loads a Properties object from a file 
path. We should place this in a more common package; I put it close to the file 
I changed, but we should consider moving it to a better place.

Maybe we can combine it with this one: 
https://github.com/apache/flink/pull/664

> Separate the different connectors into different submodules.

There is already a JIRA filed for this: 
https://issues.apache.org/jira/browse/FLINK-1874
But I would suggest to wait for a few days until 
https://github.com/apache/flink/pull/659 is merged, otherwise, there are going 
to be a lot of merge conflicts.


In general: Feel free to change whatever is necessary to make the code 
better (also the missing javadocs). Your suggestions are all very good. I see 
that you have a good sense for writing high-quality code!



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Updated] (FLINK-1874) Break up streaming connectors into submodules

2015-05-11 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-1874:
--
Component/s: Streaming

> Break up streaming connectors into submodules
> -
>
> Key: FLINK-1874
> URL: https://issues.apache.org/jira/browse/FLINK-1874
> Project: Flink
>  Issue Type: Task
>  Components: Build System, Streaming
>Affects Versions: 0.9
>Reporter: Robert Metzger
>  Labels: starter
>
> As per: 
> http://apache-flink-incubator-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-Break-up-streaming-connectors-into-subprojects-td5001.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1537) GSoC project: Machine learning with Apache Flink

2015-05-11 Thread Robert Metzger (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Metzger updated FLINK-1537:
--
Component/s: Machine Learning Library

> GSoC project: Machine learning with Apache Flink
> 
>
> Key: FLINK-1537
> URL: https://issues.apache.org/jira/browse/FLINK-1537
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Priority: Minor
>  Labels: gsoc2015, java, machine_learning, scala
>
> Currently, the Flink community is setting up the infrastructure for a machine 
> learning library for Flink. The goal is to provide a set of highly optimized 
> ML algorithms and to offer a high level linear algebra abstraction to easily 
> do data pre- and post-processing. By defining a set of commonly used data 
> structures on which the algorithms work it will be possible to define complex 
> processing pipelines. 
> The Mahout DSL constitutes a good fit to be used as the linear algebra 
> language in Flink. It has to be evaluated which mechanisms need to be provided 
> to allow an easy transition between the high-level abstraction and the 
> optimized algorithms.
> The machine learning library offers multiple starting points for a GSoC 
> project. Amongst others, the following projects are conceivable.
> * Extension of Flink's machine learning library by additional ML algorithms
> ** Stochastic gradient descent
> ** Distributed dual coordinate ascent
> ** SVM
> ** Gaussian mixture EM
> ** DecisionTrees
> ** ...
> * Integration of Flink with the Mahout DSL to support a high level linear 
> algebra abstraction
> * Integration of H2O with Flink to benefit from H2O's sophisticated machine 
> learning algorithms
> * Implementation of a parameter server like distributed global state storage 
> facility for Flink. This also includes the extension of Flink to support 
> asynchronous iterations and update messages.
> Own ideas for a possible contribution on the field of the machine learning 
> library are highly welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...

2015-05-11 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100897320
  
Thanks for clarifying. I agree that the parser in Apache Commons is not the 
nicest...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537909#comment-14537909
 ] 

ASF GitHub Bot commented on FLINK-1525:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100897320
  
Thanks for clarifying. I agree that the parser in Apache Commons is not the 
nicest...


> Provide utils to pass -D parameters to UDFs 
> 
>
> Key: FLINK-1525
> URL: https://issues.apache.org/jira/browse/FLINK-1525
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-contrib
>Reporter: Robert Metzger
>  Labels: starter
>
> Hadoop users are used to setting job configuration through "-D" on the 
> command line.
> Right now, Flink users have to manually parse command line arguments and pass 
> them to the methods.
> It would be nice to provide a standard args parser which takes care of 
> such stuff.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1103) Update Streaming examples to become self-contained

2015-05-11 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537896#comment-14537896
 ] 

Robert Metzger commented on FLINK-1103:
---

The data has been replaced by something we can use from a legal perspective in 
http://git-wip-us.apache.org/repos/asf/flink/commit/8ea840e2

> Update Streaming examples to become self-contained
> --
>
> Key: FLINK-1103
> URL: https://issues.apache.org/jira/browse/FLINK-1103
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.7.0-incubating
>Reporter: Márton Balassi
>Assignee: Márton Balassi
>
> Streaming examples do not follow the standard set by the recent examples 
> refactor of the batch API.
> TestDataUtil should be removed and Object[][] used to contain the example 
> data.
> Comments are also lacking in comparison with the batch counterpart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1999) TF-IDF transformer

2015-05-11 Thread Felix Neutatz (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537888#comment-14537888
 ] 

Felix Neutatz commented on FLINK-1999:
--

If I understand it correctly, it is like this:

Document1: Hello world
Document2: This is awesome

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment

val data: Seq[Seq[String]] = Seq(
  "Hello world".split(" ").toSeq,
  "This is awesome".split(" ").toSeq
)

val dataSet: DataSet[Seq[String]] = env.fromCollection(data)


> TF-IDF transformer
> --
>
> Key: FLINK-1999
> URL: https://issues.apache.org/jira/browse/FLINK-1999
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Ronny Bräunlich
>Assignee: Alexander Alexandrov
>Priority: Minor
>  Labels: ML
>
> Hello everybody,
> we are a group of three students from TU Berlin (I guess we're not the first 
> group creating an issue) and we want to/have to implement a TF-IDF transformer 
> for Flink.
> Our lecturer Alexander told us that we could get some guidance here and that 
> you could point us to an old version of a similar transformer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1807) Stochastic gradient descent optimizer for ML library

2015-05-11 Thread Theodore Vasiloudis (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537886#comment-14537886
 ] 

Theodore Vasiloudis commented on FLINK-1807:


[~till.rohrmann] Do you think we can close this, or should we leave it open until 
we have sampling in place? 

> Stochastic gradient descent optimizer for ML library
> 
>
> Key: FLINK-1807
> URL: https://issues.apache.org/jira/browse/FLINK-1807
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Till Rohrmann
>Assignee: Theodore Vasiloudis
>  Labels: ML
>
> Stochastic gradient descent (SGD) is a widely used optimization technique in 
> different ML algorithms. Thus, it would be helpful to provide a generalized 
> SGD implementation which can be instantiated with the respective gradient 
> computation. Such a building block would make the development of future 
> algorithms easier.
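
As a toy illustration of what "a generalized SGD implementation which can be 
instantiated with the respective gradient computation" could mean (a local, 
single-machine Scala sketch, not the distributed flink-ml implementation):

```scala
// One SGD step, parameterized by a pluggable gradient computation.
def sgdStep(weights: Vector[Double],
            stepSize: Double,
            gradient: Vector[Double] => Vector[Double]): Vector[Double] = {
  val g = gradient(weights)
  weights.zip(g).map { case (w, gi) => w - stepSize * gi }
}

// Example instantiation: the gradient of f(w) = ||w||^2 / 2 is w itself,
// so repeated steps shrink the weights toward zero.
val w0 = Vector(1.0, -2.0, 3.0)
val w1 = sgdStep(w0, stepSize = 0.1, gradient = identity)
```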



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537880#comment-14537880
 ] 

ASF GitHub Bot commented on FLINK-1525:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100891653
  
Thank you.
I didn't reuse the parser there because there is an ongoing JIRA to unify 
all the command line parsers inside Flink. It seems that we are using at least 
two different libraries .. and the consensus in the JIRA seems to be using a 
third one to solve the problem.
But I didn't want to add more confusion to the topic than we already have.



> Provide utils to pass -D parameters to UDFs 
> 
>
> Key: FLINK-1525
> URL: https://issues.apache.org/jira/browse/FLINK-1525
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-contrib
>Reporter: Robert Metzger
>  Labels: starter
>
> Hadoop users are used to setting job configuration through "-D" on the 
> command line.
> Right now, Flink users have to manually parse command line arguments and pass 
> them to the methods.
> It would be nice to provide a standard args parser which takes care of 
> such stuff.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...

2015-05-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100891653
  
Thank you.
I didn't reuse the parser there because there is an ongoing JIRA to unify 
all the command line parsers inside Flink. It seems that we are using at least 
two different libraries .. and the consensus in the JIRA seems to be using a 
third one to solve the problem.
But I didn't want to add more confusion to the topic than we already have.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...

2015-05-11 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100891261
  
Very helpful utility. I think it's worth adapting all the examples if we 
merge this. Removes a lot of unnecessary code and makes the examples more 
readable. 

May I ask why you didn't reuse the Parser in `org.apache.commons.cli`? Too 
much overhead?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537878#comment-14537878
 ] 

ASF GitHub Bot commented on FLINK-1525:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100891261
  
Very helpful utility. I think it's worth adapting all the examples if we 
merge this. Removes a lot of unnecessary code and makes the examples more 
readable. 

May I ask why you didn't reuse the Parser in `org.apache.commons.cli`? Too 
much overhead?


> Provide utils to pass -D parameters to UDFs 
> 
>
> Key: FLINK-1525
> URL: https://issues.apache.org/jira/browse/FLINK-1525
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-contrib
>Reporter: Robert Metzger
>  Labels: starter
>
> Hadoop users are used to setting job configuration through "-D" on the 
> command line.
> Right now, Flink users have to manually parse command line arguments and pass 
> them to the methods.
> It would be nice to provide a standard args parser which takes care of 
> such stuff.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1959) Accumulators BROKEN after Partitioning

2015-05-11 Thread Alexander Alexandrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexander Alexandrov updated FLINK-1959:

Affects Version/s: (was: 0.8.1)
   master
Fix Version/s: (was: 0.8.1)
   master

> Accumulators BROKEN after Partitioning
> --
>
> Key: FLINK-1959
> URL: https://issues.apache.org/jira/browse/FLINK-1959
> Project: Flink
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: master
>Reporter: mustafa elbehery
>Priority: Critical
> Fix For: master
>
>
> while running the Accumulator example in 
> https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java,
>  
> I tried to alter the data flow with the "partitionByHash" function before 
> applying "filter", and the resulting accumulator was NULL. 
> By debugging, I could see the accumulator in the runtime map. However, when 
> retrieving the accumulator from the JobExecutionResult object, it was NULL. 
> The line that caused the problem is "file.partitionByHash(1).filter(new 
> EmptyFieldFilter())" instead of "file.filter(new EmptyFieldFilter())".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...

2015-05-11 Thread aljoscha
Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100880185
  
I like it. But I think it needs some functionality for verifying 
parameters: letting the user specify parameters that always need to be there, 
plus a description for each parameter, similar to how other tools print a 
"usage" message when you don't give correct arguments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1979) Implement Loss Functions

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537855#comment-14537855
 ] 

ASF GitHub Bot commented on FLINK-1979:
---

Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/656#issuecomment-100884364
  
Thank you Johannes. The optimization code has been merged into master 
now, so could you rebase your branch onto the latest master so we can look at the 
changes in an isolated way?

You can take a look at the [How to 
contribute](http://flink.apache.org/how-to-contribute.html#contributing-code-&-documentation)
 guide on how to do this. The merges you currently have make it hard to review 
the code.

Also, please make sure all your classes have docstrings, you can take the 
docstring for SquaredLoss as an example (i.e. one sentence is usually enough).

Documentation is always welcome of course, so if you want to add some more 
details to the loss functions section of the ML documentation 
(docs/libs/ml/optimization.md) feel free to do so in this PR.

Let me know if you run into any problems.


> Implement Loss Functions
> 
>
> Key: FLINK-1979
> URL: https://issues.apache.org/jira/browse/FLINK-1979
> Project: Flink
>  Issue Type: Improvement
>  Components: Machine Learning Library
>Reporter: Johannes Günther
>Assignee: Johannes Günther
>Priority: Minor
>  Labels: ML
>
> For convex optimization problems, optimizer methods like SGD rely on a 
> pluggable implementation of a loss function and its first derivative.
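
A minimal sketch of what such a pluggable loss function could look like 
(illustrative Scala, not the actual Flink ML interface):

{code}
// Pluggable loss: the optimizer only needs the value and the first derivative.
trait LossFunction extends Serializable {
  def loss(prediction: Double, label: Double): Double
  def derivative(prediction: Double, label: Double): Double
}

// Squared loss: L(p, y) = 0.5 * (p - y)^2, so dL/dp = p - y.
object SquaredLoss extends LossFunction {
  override def loss(prediction: Double, label: Double): Double = {
    val diff = prediction - label
    0.5 * diff * diff
  }
  override def derivative(prediction: Double, label: Double): Double =
    prediction - label
}
{code}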



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1979] Lossfunctions

2015-05-11 Thread thvasilo
Github user thvasilo commented on the pull request:

https://github.com/apache/flink/pull/656#issuecomment-100884364
  
Thank you, Johannes. The optimization code has now been merged to master, 
so could you rebase your branch onto the latest master so that we can look at the 
changes in an isolated way?

You can take a look at the [How to 
contribute](http://flink.apache.org/how-to-contribute.html#contributing-code-&-documentation)
 guide on how to do this. The merges you currently have make it hard to review 
the code.

Also, please make sure all your classes have docstrings; you can take the 
docstring for SquaredLoss as an example (one sentence is usually enough).

Documentation is always welcome of course, so if you want to add some more 
details to the loss functions section of the ML documentation 
(docs/libs/ml/optimization.md) feel free to do so in this PR.

Let me know if you run into any problems.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1999) TF-IDF transformer

2015-05-11 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537853#comment-14537853
 ] 

Alexander Alexandrov commented on FLINK-1999:
-

What does a Seq[String] represent?

> TF-IDF transformer
> --
>
> Key: FLINK-1999
> URL: https://issues.apache.org/jira/browse/FLINK-1999
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Ronny Bräunlich
>Assignee: Alexander Alexandrov
>Priority: Minor
>  Labels: ML
>
> Hello everybody,
> we are a group of three students from TU Berlin (I guess we're not the first 
> group creating an issue) and we want to/have to implement a tf-idf transformer 
> for Flink.
> Our lecturer Alexander told us that we could get some guidance here and that 
> you could point us to an old version of a similar transformer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...

2015-05-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100882340
  
I agree; however, it should be optional. I don't like these tools where you 
spend a lot of time registering / specifying arguments.
People want to analyze their data, not configure a huge parameter parsing 
framework ;)
But I'm going to look into this and see how much work it would be to 
implement it. If it blocks me from getting this PR merged soon, I'll file a 
follow-up JIRA.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537847#comment-14537847
 ] 

ASF GitHub Bot commented on FLINK-1525:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100882340
  
I agree; however, it should be optional. I don't like these tools where you 
spend a lot of time registering / specifying arguments.
People want to analyze their data, not configure a huge parameter parsing 
framework ;)
But I'm going to look into this and see how much work it would be to 
implement it. If it blocks me from getting this PR merged soon, I'll file a 
follow-up JIRA.


> Provide utils to pass -D parameters to UDFs 
> 
>
> Key: FLINK-1525
> URL: https://issues.apache.org/jira/browse/FLINK-1525
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-contrib
>Reporter: Robert Metzger
>  Labels: starter
>
> Hadoop users are used to setting job configuration through "-D" on the 
> command line.
> Right now, Flink users have to manually parse command line arguments and pass 
> them to the methods.
> It would be nice to provide a standard args parser which takes care of 
> such stuff.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537842#comment-14537842
 ] 

ASF GitHub Bot commented on FLINK-1525:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100880185
  
I like it. But I think it needs some functionality for verifying 
parameters: letting the user specify parameters that always need to be 
present, along with a description of each parameter, similar to how other 
tools print a "usage" message when you don't give correct arguments.


> Provide utils to pass -D parameters to UDFs 
> 
>
> Key: FLINK-1525
> URL: https://issues.apache.org/jira/browse/FLINK-1525
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-contrib
>Reporter: Robert Metzger
>  Labels: starter
>
> Hadoop users are used to setting job configuration through "-D" on the 
> command line.
> Right now, Flink users have to manually parse command line arguments and pass 
> them to the methods.
> It would be nice to provide a standard args parser which takes care of 
> such stuff.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1959) Accumulators BROKEN after Partitioning

2015-05-11 Thread Alexander Alexandrov (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537845#comment-14537845
 ] 

Alexander Alexandrov commented on FLINK-1959:
-

I think that some communication / traversal chain gets broken in the 
PartitionByHash node.

You can either

(1) try to dig through the code and see where this happens, or
(2) use an alternative to the accumulator until the issue is resolved, e.g. 
write the information to a pre-defined HDFS path (see the sketch below).
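
A minimal sketch of option (2), assuming tuple input where field 1 may be empty 
(the output path and the inline filter are placeholders):

{code}
import org.apache.flink.api.scala._

object EmptyFieldCountWorkaround {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment
    val file = env.fromElements(("a", ""), ("b", "x"), ("c", ""))

    val emptyFieldCount = file
      .partitionByHash(1)
      .filter(t => t._2.isEmpty) // stand-in for EmptyFieldFilter
      .map(_ => 1L)
      .reduce(_ + _)

    // Write the count to a pre-defined path instead of reading an accumulator
    // from the JobExecutionResult ("hdfs:///tmp/empty-field-count" is a placeholder).
    emptyFieldCount.writeAsText("hdfs:///tmp/empty-field-count")
    env.execute("empty field count workaround")
  }
}
{code}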

> Accumulators BROKEN after Partitioning
> --
>
> Key: FLINK-1959
> URL: https://issues.apache.org/jira/browse/FLINK-1959
> Project: Flink
>  Issue Type: Bug
>  Components: Examples
>Affects Versions: 0.8.1
>Reporter: mustafa elbehery
>Priority: Critical
> Fix For: 0.8.1
>
>
> While running the Accumulator example in 
> https://github.com/Elbehery/flink/blob/master/flink-examples/flink-java-examples/src/main/java/org/apache/flink/examples/java/relational/EmptyFieldsCountAccumulator.java,
>  
> I tried to alter the data flow with a "PartitionByHash" function before 
> applying "Filter", and the resulting accumulator was NULL. 
> While debugging, I could see the accumulator in the runtime map. However, when 
> retrieving the accumulator from the JobExecutionResult object, it was NULL. 
> The line that caused the problem is "file.partitionByHash(1).filter(new 
> EmptyFieldFilter())" instead of "file.filter(new EmptyFieldFilter())".



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-1962) Add Gelly Scala API

2015-05-11 Thread PJ Van Aeken (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537797#comment-14537797
 ] 

PJ Van Aeken edited comment on FLINK-1962 at 5/11/15 11:30 AM:
---

I was wrong before. The TypeErasure fix has one more problem. The method 
"createTypeInformation" creates a PojoTypeInfo for Vertex, rather than the 
TupleTypeInfo that the Java API expects. I think this is because the 
Scala type extraction does not recognize Java's Tuple2 as a Tuple...


was (Author: vanaepi):
I was wrong before. The TypeErasure fix has one more problem. The method 
"createTypeInformation" creates a PojoTypeInfo for Vertex, rather than the 
TupleTypeInfo that the Java API expects.

> Add Gelly Scala API
> ---
>
> Key: FLINK-1962
> URL: https://issues.apache.org/jira/browse/FLINK-1962
> Project: Flink
>  Issue Type: Task
>  Components: Gelly, Scala API
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1999) TF-IDF transformer

2015-05-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/FLINK-1999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537822#comment-14537822
 ] 

Ronny Bräunlich commented on FLINK-1999:


Are you sure that the input type should be DataSet[Seq[String]]?
That seems to me like we would always calculate the IDF for a single document, 
which would be log(1/1) -> 0. Or is one element of the sequence supposed to be 
one document?
If so, would it be wise to always load the full document into memory, or is the 
DataSet smart enough to read the file stream-wise?
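
For illustration, a sketch under the assumption that each element of the 
DataSet is one tokenized document, so the IDF is computed across the whole 
DataSet rather than within a single document:

{code}
import org.apache.flink.api.scala._

object TfIdfInputSketch {
  def main(args: Array[String]): Unit = {
    val env = ExecutionEnvironment.getExecutionEnvironment

    // One Seq[String] per already-tokenized document.
    val docs: DataSet[Seq[String]] = env.fromElements(
      Seq("flink", "ml", "tf", "idf"),
      Seq("flink", "streaming"))

    // Note: count() triggers a separate job; acceptable for a sketch.
    val numDocs = docs.count()

    // Document frequency: the number of documents each term occurs in.
    val df = docs
      .flatMap(doc => doc.distinct.map(term => (term, 1)))
      .groupBy(0)
      .sum(1)

    // idf(term) = log(N / df(term)), computed across the whole collection.
    df.map(t => (t._1, math.log(numDocs.toDouble / t._2))).print()
  }
}
{code}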

> TF-IDF transformer
> --
>
> Key: FLINK-1999
> URL: https://issues.apache.org/jira/browse/FLINK-1999
> Project: Flink
>  Issue Type: New Feature
>  Components: Machine Learning Library
>Reporter: Ronny Bräunlich
>Assignee: Alexander Alexandrov
>Priority: Minor
>  Labels: ML
>
> Hello everybody,
> we are a group of three students from TU Berlin (I guess we're not the first 
> group creating an issue) and we want to/have to implement a tf-idf tranformer 
> for Flink.
> Our lecturer Alexander told us that we could get some guidance here and that 
> you could point us to an old version of a similar tranformer.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (FLINK-1962) Add Gelly Scala API

2015-05-11 Thread PJ Van Aeken (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537797#comment-14537797
 ] 

PJ Van Aeken edited comment on FLINK-1962 at 5/11/15 10:33 AM:
---

I was wrong before. The TypeErasure fix has one more problem. The method 
"createTypeInformation" creates a PojoTypeInfo for Vertex, rather than the 
TupleTypeInfo that the Java API expects.


was (Author: vanaepi):
The combination of both pull requests did the trick. Perhaps someone should 
merge the TypeErasure fix into master as well. It seems like an important fix 
for future combined Java/Scala APIs.

> Add Gelly Scala API
> ---
>
> Key: FLINK-1962
> URL: https://issues.apache.org/jira/browse/FLINK-1962
> Project: Flink
>  Issue Type: Task
>  Components: Gelly, Scala API
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1962) Add Gelly Scala API

2015-05-11 Thread PJ Van Aeken (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537797#comment-14537797
 ] 

PJ Van Aeken commented on FLINK-1962:
-

The combination of both pull requests did the trick. Perhaps someone should 
merge the TypeErasure fix into master as well. It seems like an important fix 
for future combined Java/Scala APIs.

> Add Gelly Scala API
> ---
>
> Key: FLINK-1962
> URL: https://issues.apache.org/jira/browse/FLINK-1962
> Project: Flink
>  Issue Type: Task
>  Components: Gelly, Scala API
>Affects Versions: 0.9
>Reporter: Vasia Kalavri
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1636) Improve error handling when partitions not found

2015-05-11 Thread Ufuk Celebi (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ufuk Celebi updated FLINK-1636:
---
Priority: Major  (was: Minor)

> Improve error handling when partitions not found
> 
>
> Key: FLINK-1636
> URL: https://issues.apache.org/jira/browse/FLINK-1636
> Project: Flink
>  Issue Type: Improvement
>  Components: Distributed Runtime
>Reporter: Ufuk Celebi
>Assignee: Ufuk Celebi
>
> When a result partition is released concurrently with a remote partition 
> request, the request might come in late and result in an exception at the 
> receiving task saying:
> {code}
> 16:04:22,499 INFO  org.apache.flink.runtime.taskmanager.Task  
>- CHAIN Partition -> Map (Map at 
> testRestartMultipleTimes(SimpleRecoveryITCase.java:200)) (1/4) switched to 
> FAILED : java.io.IOException: 
> org.apache.flink.runtime.io.network.partition.queue.IllegalQueueIteratorRequestException
>  at remote input channel: Intermediate result partition has already been 
> released.].
>   at 
> org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.checkIoError(RemoteInputChannel.java:223)
>   at 
> org.apache.flink.runtime.io.network.partition.consumer.RemoteInputChannel.getNextBuffer(RemoteInputChannel.java:103)
>   at 
> org.apache.flink.runtime.io.network.partition.consumer.SingleInputGate.getNextBufferOrEvent(SingleInputGate.java:310)
>   at 
> org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:75)
>   at 
> org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
>   at 
> org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:59)
>   at org.apache.flink.runtime.operators.NoOpDriver.run(NoOpDriver.java:91)
>   at 
> org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
>   at 
> org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
>   at 
> org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:205)
>   at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1525) Provide utils to pass -D parameters to UDFs

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537696#comment-14537696
 ] 

ASF GitHub Bot commented on FLINK-1525:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100805130
  
In Hadoop, the `UserConfig` would probably be a `Configuration` object with 
key/value pairs.
In Flink, we are trying to get rid of these untyped maps.
Instead, I would recommend that users use a simple Java class, like
```java
public static class MyConfig implements UserConfig {
  public long someLongValue;
  public int someInt;
  // this is optional
  public Map<String, String> toMap() { return null; }
}
```
It can be used in a similar way to a Configuration object, but the compiler 
is able to check the types.

The `ParameterUtil` implements the UserConfig interface to expose the 
configuration values through the Flink program and in the web interface.
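
For illustration, the same idea as a Scala sketch (a hypothetical config class, 
not this PR's API), showing what the compiler checks that an untyped map cannot:

```scala
object TypedConfigExample {
  // Hypothetical typed config, mirroring the Java MyConfig above.
  case class MyConfig(someLongValue: Long, someInt: Int) {
    // Optional untyped view, analogous to UserConfig#toMap in the comment.
    def toMap: Map[String, String] =
      Map("someLongValue" -> someLongValue.toString, "someInt" -> someInt.toString)
  }

  def main(args: Array[String]): Unit = {
    val conf = MyConfig(someLongValue = 42L, someInt = 7)
    val v: Long = conf.someLongValue // field name and type checked at compile time
    // With an untyped Map[String, String], a typo or a wrong type would only
    // surface at runtime.
    println(v + " " + conf.toMap)
  }
}
```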


> Provide utils to pass -D parameters to UDFs 
> 
>
> Key: FLINK-1525
> URL: https://issues.apache.org/jira/browse/FLINK-1525
> Project: Flink
>  Issue Type: Improvement
>  Components: flink-contrib
>Reporter: Robert Metzger
>  Labels: starter
>
> Hadoop users are used to setting job configuration through "-D" on the 
> command line.
> Right now, Flink users have to manually parse command line arguments and pass 
> them to the methods.
> It would be nice to provide a standard args parser which takes care of 
> such stuff.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1525][FEEDBACK] Introduction of a small...

2015-05-11 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/664#issuecomment-100805130
  
In Hadoop, the `UserConfig` would probably be a `Configuration` object with 
key/value pairs.
In Flink, we are trying to get rid of these untyped maps.
Instead, I would recommend that users use a simple Java class, like
```java
public static class MyConfig implements UserConfig {
  public long someLongValue;
  public int someInt;
  // this is optional
  public Map<String, String> toMap() { return null; }
}
```
It can be used in a similar way to a Configuration object, but the compiler 
is able to check the types.

The `ParameterUtil` implements the UserConfig interface to expose the 
configuration values through the Flink program and in the web interface.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1987) Broken links in the add_operator section of the documentation

2015-05-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537695#comment-14537695
 ] 

ASF GitHub Bot commented on FLINK-1987:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/662#issuecomment-100805034
  
+1 links work again


> Broken links in the add_operator section of the documentation
> -
>
> Key: FLINK-1987
> URL: https://issues.apache.org/jira/browse/FLINK-1987
> Project: Flink
>  Issue Type: Bug
>  Components: docs
>Affects Versions: 0.9
>Reporter: Andra Lungu
>Assignee: Andra Lungu
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1987][docs] Fixed broken links

2015-05-11 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/662#issuecomment-100805034
  
+1 links work again


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---