[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets
[ https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509139#comment-14509139 ] ASF GitHub Bot commented on FLINK-1615: --- Github user Elbehery commented on the pull request: https://github.com/apache/flink/pull/442#issuecomment-95603365 @aljoscha you should be able to see it from the PR .. anyway this is mine: https://github.com/Elbehery/flink

Introduces a new InputFormat for Tweets --- Key: FLINK-1615 URL: https://issues.apache.org/jira/browse/FLINK-1615 Project: Flink Issue Type: New Feature Components: flink-contrib Affects Versions: 0.8.1 Reporter: mustafa elbehery Priority: Minor

An event-driven parser for Tweets into Java POJOs. It parses all the important parts of the tweet into Java objects. Tested on a cluster, and the performance is pretty good.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1848) Paths containing a Windows drive letter cannot be used in FileOutputFormats
[ https://issues.apache.org/jira/browse/FLINK-1848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509282#comment-14509282 ] Alexander Alexandrov commented on FLINK-1848: - I think this should be fixed (with the PR) and can be closed now, right?

Paths containing a Windows drive letter cannot be used in FileOutputFormats --- Key: FLINK-1848 URL: https://issues.apache.org/jira/browse/FLINK-1848 Project: Flink Issue Type: Bug Affects Versions: 0.9 Environment: Windows (Cygwin and native) Reporter: Fabian Hueske Assignee: Fabian Hueske Priority: Critical Fix For: 0.9

Paths that contain a Windows drive letter such as {{file:///c:/my/directory}} cannot be used as output path for {{FileOutputFormat}}. If done, the following exception is thrown:
{code}
Caused by: java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:c:
	at org.apache.flink.core.fs.Path.initialize(Path.java:242)
	at org.apache.flink.core.fs.Path.<init>(Path.java:225)
	at org.apache.flink.core.fs.Path.<init>(Path.java:138)
	at org.apache.flink.core.fs.local.LocalFileSystem.pathToFile(LocalFileSystem.java:147)
	at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:232)
	at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
	at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
	at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
	at org.apache.flink.core.fs.local.LocalFileSystem.mkdirs(LocalFileSystem.java:233)
	at org.apache.flink.core.fs.FileSystem.initOutPathLocalFS(FileSystem.java:603)
	at org.apache.flink.api.common.io.FileOutputFormat.open(FileOutputFormat.java:233)
	at org.apache.flink.api.java.io.CsvOutputFormat.open(CsvOutputFormat.java:158)
	at org.apache.flink.runtime.operators.DataSinkTask.invoke(DataSinkTask.java:183)
	at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
	at java.lang.Thread.run(Unknown Source)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: file:c:
	at java.net.URI.checkPath(Unknown Source)
	at java.net.URI.<init>(Unknown Source)
	at org.apache.flink.core.fs.Path.initialize(Path.java:240)
	... 14 more
{code}
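The root cause above is that once the {{file://}} scheme is stripped, a path like {{c:/my/directory}} is handed to {{java.net.URI}}, which treats the drive letter as the start of a relative path inside an absolute URI. A minimal sketch of one possible normalization, with an illustrative helper name (this is an assumption about the shape of a fix, not Flink's actual patch):

```java
import java.net.URI;
import java.net.URISyntaxException;

// Illustrative helper: prefix a bare Windows drive-letter path with "/"
// so java.net.URI accepts it as an absolute path component.
public class WindowsPathNormalizer {

    static String normalize(String path) {
        // "c:/my/directory" -> "/c:/my/directory"; other paths pass through.
        if (path.length() >= 2 && Character.isLetter(path.charAt(0)) && path.charAt(1) == ':') {
            return "/" + path;
        }
        return path;
    }

    public static void main(String[] args) throws URISyntaxException {
        // Without normalize(), constructing the URI from "c:/my/directory"
        // fails with "Relative path in absolute URI", as in the trace above.
        URI uri = new URI("file", null, normalize("c:/my/directory"), null);
        System.out.println(uri);
    }
}
```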
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user Elbehery commented on the pull request: https://github.com/apache/flink/pull/442#issuecomment-95603365 @aljoscha you should be able to see it from the PR .. anyway this is mine https://github.com/Elbehery/flink --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1828) Impossible to output data to an HBase table
[ https://issues.apache.org/jira/browse/FLINK-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509134#comment-14509134 ] ASF GitHub Bot commented on FLINK-1828: --- Github user fpompermaier commented on a diff in the pull request: https://github.com/apache/flink/pull/571#discussion_r28966088 --- Diff: flink-staging/flink-hbase/pom.xml --- @@ -112,6 +112,12 @@ under the License. </exclusion> </exclusions> </dependency> + <dependency> --- End diff -- Could you do that?

Impossible to output data to an HBase table --- Key: FLINK-1828 URL: https://issues.apache.org/jira/browse/FLINK-1828 Project: Flink Issue Type: Bug Components: Hadoop Compatibility Affects Versions: 0.9 Reporter: Flavio Pompermaier Labels: hadoop, hbase Fix For: 0.9

Right now it is not possible to use HBase TableOutputFormat as an output format, because Configurable.setConf is not called in the configure() method of HadoopOutputFormatBase.
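The fix being discussed amounts to one missing call: when the wrapped Hadoop output format implements Configurable, the wrapper's configure() must forward the configuration via setConf(). A self-contained sketch of that pattern, using stand-in interfaces and class names (not Hadoop's or Flink's actual types):

```java
// Stand-in types illustrating the missing check-and-call described in the issue.
public class ConfigurableDemo {

    interface Configurable {
        void setConf(String conf);
    }

    // Plays the role of HBase's TableOutputFormat, which needs its config pushed in.
    static class TableOutputFormatStub implements Configurable {
        String conf;
        @Override
        public void setConf(String conf) { this.conf = conf; }
    }

    // Plays the role of HadoopOutputFormatBase wrapping an arbitrary format.
    static class HadoopOutputFormatWrapper {
        final Object wrappedFormat;
        HadoopOutputFormatWrapper(Object wrappedFormat) { this.wrappedFormat = wrappedFormat; }

        void configure(String conf) {
            // The reported bug: without this instanceof check and setConf call,
            // a wrapped TableOutputFormat never receives its configuration.
            if (wrappedFormat instanceof Configurable) {
                ((Configurable) wrappedFormat).setConf(conf);
            }
        }
    }

    static String demo() {
        TableOutputFormatStub format = new TableOutputFormatStub();
        new HadoopOutputFormatWrapper(format).configure("hbase-site settings");
        return format.conf;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```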
[jira] [Commented] (FLINK-1693) Deprecate the Spargel API
[ https://issues.apache.org/jira/browse/FLINK-1693?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509442#comment-14509442 ] ASF GitHub Bot commented on FLINK-1693: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/618#issuecomment-95668944 Thank you @hsaputra! Looks good :-)

Deprecate the Spargel API - Key: FLINK-1693 URL: https://issues.apache.org/jira/browse/FLINK-1693 Project: Flink Issue Type: Task Components: Spargel Affects Versions: 0.9 Reporter: Vasia Kalavri Assignee: Henry Saputra

For the upcoming 0.9 release, we should mark all user-facing methods from the Spargel API as deprecated, with a warning that we are going to remove it at some point. We should also add a comment in the docs and point people to Gelly.
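The deprecation itself is mechanical: annotate each user-facing method with @Deprecated and point users to Gelly in the Javadoc. A sketch with a made-up method name (not an actual Spargel signature):

```java
// Illustration of the deprecation pattern; the method below is a stand-in,
// not a real Spargel API method.
public class SpargelDeprecationExample {

    /**
     * @deprecated The Spargel API is deprecated and will be removed;
     *             please use Gelly instead.
     */
    @Deprecated
    public static String runVertexCentricIteration() {
        return "deprecated path still works";
    }

    public static void main(String[] args) {
        // Callers still compile and run, but get a deprecation warning.
        System.out.println(runVertexCentricIteration());
    }
}
```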
[GitHub] flink pull request: FLINK-1693 Add DEPRECATED annotations in Sparg...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/618#issuecomment-95668944 Thank you @hsaputra! Looks good :-)
[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...
Github user fpompermaier commented on a diff in the pull request: https://github.com/apache/flink/pull/571#discussion_r28966088 --- Diff: flink-staging/flink-hbase/pom.xml --- @@ -112,6 +112,12 @@ under the License. </exclusion> </exclusions> </dependency> + <dependency> --- End diff -- Could you do that?
[jira] [Commented] (FLINK-1930) NullPointerException in vertex-centric iteration
[ https://issues.apache.org/jira/browse/FLINK-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509090#comment-14509090 ] ASF GitHub Bot commented on FLINK-1930: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/619#issuecomment-95593489 Manually merged in 4dbf030a6b0415832862c3fd0c3fe7403878a998

NullPointerException in vertex-centric iteration Key: FLINK-1930 URL: https://issues.apache.org/jira/browse/FLINK-1930 Project: Flink Issue Type: Bug Affects Versions: 0.9 Reporter: Vasia Kalavri

Hello to my Squirrels, I came across this exception when having a vertex-centric iteration output followed by a group-by. I'm not sure what is causing it, since I saw this error in a rather large pipeline, but I managed to reproduce it with [this code example|https://github.com/vasia/flink/commit/1b7bbca1a6130fbcfe98b4b9b43967eb4c61f309] and a sufficiently large dataset, e.g. [this one|http://snap.stanford.edu/data/com-DBLP.html] (I'm running this locally). It seems like a null Buffer in RecordWriter. The exception message is the following:
{code}
Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
	at org.apache.flink.runtime.jobmanager.JobManager$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:319)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
	at org.apache.flink.runtime.ActorLogMessages$anon$1.apply(ActorLogMessages.scala:37)
	at org.apache.flink.runtime.ActorLogMessages$anon$1.apply(ActorLogMessages.scala:30)
	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
	at org.apache.flink.runtime.ActorLogMessages$anon$1.applyOrElse(ActorLogMessages.scala:30)
	at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
	at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:94)
	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
	at akka.dispatch.Mailbox.run(Mailbox.scala:221)
	at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
Caused by: java.lang.NullPointerException
	at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.setNextBuffer(SpanningRecordSerializer.java:93)
	at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:92)
	at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65)
	at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.streamSolutionSetToFinalOutput(IterationHeadPactTask.java:405)
	at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:365)
	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:360)
	at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:221)
	at java.lang.Thread.run(Thread.java:745)
{code}
[jira] [Commented] (FLINK-1930) NullPointerException in vertex-centric iteration
[ https://issues.apache.org/jira/browse/FLINK-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509091#comment-14509091 ] ASF GitHub Bot commented on FLINK-1930: --- Github user StephanEwen closed the pull request at: https://github.com/apache/flink/pull/619
[jira] [Commented] (FLINK-1930) NullPointerException in vertex-centric iteration
[ https://issues.apache.org/jira/browse/FLINK-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509097#comment-14509097 ] Stephan Ewen commented on FLINK-1930: - Should give better error messages now. Please post if this happens again...
[GitHub] flink pull request: [FLINK-1930] [runtime] Improve exception when ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/619#issuecomment-95593489 Manually merged in 4dbf030a6b0415832862c3fd0c3fe7403878a998
[jira] [Commented] (FLINK-1930) NullPointerException in vertex-centric iteration
[ https://issues.apache.org/jira/browse/FLINK-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509124#comment-14509124 ] Vasia Kalavri commented on FLINK-1930: -- Ok, so now the program fails with this:
{code}
Caused by: java.lang.IllegalStateException: Buffer pool is destroyed.
	at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBuffer(LocalBufferPool.java:144)
	at org.apache.flink.runtime.io.network.buffer.LocalBufferPool.requestBufferBlocking(LocalBufferPool.java:133)
	at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:91)
	at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65)
	at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.streamSolutionSetToFinalOutput(IterationHeadPactTask.java:405)
	at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:365)
	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:360)
	at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:221)
	at java.lang.Thread.run(Thread.java:745)
{code}
[jira] [Assigned] (FLINK-1865) Unstable test KafkaITCase
[ https://issues.apache.org/jira/browse/FLINK-1865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger reassigned FLINK-1865: - Assignee: Robert Metzger (was: Márton Balassi)

Unstable test KafkaITCase - Key: FLINK-1865 URL: https://issues.apache.org/jira/browse/FLINK-1865 Project: Flink Issue Type: Bug Components: Streaming, Tests Affects Versions: 0.9 Reporter: Stephan Ewen Assignee: Robert Metzger

{code}
Running org.apache.flink.streaming.connectors.kafka.KafkaITCase
04/10/2015 13:46:53 Job execution switched to status RUNNING.
04/10/2015 13:46:53 Custom Source -> Stream Sink(1/1) switched to SCHEDULED
04/10/2015 13:46:53 Custom Source -> Stream Sink(1/1) switched to DEPLOYING
04/10/2015 13:46:53 Custom Source -> Stream Sink(1/1) switched to SCHEDULED
04/10/2015 13:46:53 Custom Source -> Stream Sink(1/1) switched to DEPLOYING
04/10/2015 13:46:53 Custom Source -> Stream Sink(1/1) switched to RUNNING
04/10/2015 13:46:53 Custom Source -> Stream Sink(1/1) switched to RUNNING
04/10/2015 13:47:04 Custom Source -> Stream Sink(1/1) switched to FAILED
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.flink.streaming.connectors.kafka.KafkaITCase$SuccessException
	at org.apache.flink.streaming.api.invokable.StreamInvokable.callUserFunctionAndLogException(StreamInvokable.java:145)
	at org.apache.flink.streaming.api.invokable.SourceInvokable.invoke(SourceInvokable.java:37)
	at org.apache.flink.streaming.api.streamvertex.StreamVertex.invoke(StreamVertex.java:168)
	at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:221)
	at java.lang.Thread.run(Thread.java:701)
Caused by: java.lang.RuntimeException: org.apache.flink.streaming.connectors.kafka.KafkaITCase$SuccessException
	at org.apache.flink.streaming.api.invokable.StreamInvokable.callUserFunctionAndLogException(StreamInvokable.java:145)
	at org.apache.flink.streaming.api.invokable.ChainableInvokable.collect(ChainableInvokable.java:54)
	at org.apache.flink.streaming.api.collector.CollectorWrapper.collect(CollectorWrapper.java:39)
	at org.apache.flink.streaming.connectors.kafka.api.KafkaSource.run(KafkaSource.java:196)
	at org.apache.flink.streaming.api.invokable.SourceInvokable.callUserFunction(SourceInvokable.java:42)
	at org.apache.flink.streaming.api.invokable.StreamInvokable.callUserFunctionAndLogException(StreamInvokable.java:139)
	... 4 more
Caused by: org.apache.flink.streaming.connectors.kafka.KafkaITCase$SuccessException
	at org.apache.flink.streaming.connectors.kafka.KafkaITCase$1.invoke(KafkaITCase.java:166)
	at org.apache.flink.streaming.connectors.kafka.KafkaITCase$1.invoke(KafkaITCase.java:141)
	at org.apache.flink.streaming.api.invokable.SinkInvokable.callUserFunction(SinkInvokable.java:41)
	at org.apache.flink.streaming.api.invokable.StreamInvokable.callUserFunctionAndLogException(StreamInvokable.java:139)
	... 9 more
04/10/2015 13:47:04 Job execution switched to status FAILING.
04/10/2015 13:47:04 Custom Source -> Stream Sink(1/1) switched to CANCELING
04/10/2015 13:47:04 Custom Source -> Stream Sink(1/1) switched to CANCELED
04/10/2015 13:47:04 Job execution switched to status FAILED.
04/10/2015 13:47:05 Job execution switched to status RUNNING.
04/10/2015 13:47:05 Custom Source -> Stream Sink(1/1) switched to SCHEDULED
04/10/2015 13:47:05 Custom Source -> Stream Sink(1/1) switched to DEPLOYING
04/10/2015 13:47:05 Custom Source -> Stream Sink(1/1) switched to SCHEDULED
04/10/2015 13:47:05 Custom Source -> Stream Sink(1/1) switched to DEPLOYING
04/10/2015 13:47:05 Custom Source -> Stream Sink(1/1) switched to RUNNING
04/10/2015 13:47:05 Custom Source -> Stream Sink(1/1) switched to RUNNING
04/10/2015 13:47:15 Custom Source -> Stream Sink(1/1) switched to FAILED
java.lang.RuntimeException: java.lang.RuntimeException: org.apache.flink.streaming.connectors.kafka.KafkaITCase$SuccessException
	at org.apache.flink.streaming.api.invokable.StreamInvokable.callUserFunctionAndLogException(StreamInvokable.java:145)
	at org.apache.flink.streaming.api.invokable.SourceInvokable.invoke(SourceInvokable.java:37)
	at org.apache.flink.streaming.api.streamvertex.StreamVertex.invoke(StreamVertex.java:168)
	at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:221)
	at java.lang.Thread.run(Thread.java:701)
Caused by: java.lang.RuntimeException: org.apache.flink.streaming.connectors.kafka.KafkaITCase$SuccessException
	at org.apache.flink.streaming.api.invokable.StreamInvokable.callUserFunctionAndLogException(StreamInvokable.java:145)
	at
{code}
[jira] [Assigned] (FLINK-1921) Rework parallelism/slots handling for per-job YARN sessions
[ https://issues.apache.org/jira/browse/FLINK-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger reassigned FLINK-1921: - Assignee: Robert Metzger

Rework parallelism/slots handling for per-job YARN sessions --- Key: FLINK-1921 URL: https://issues.apache.org/jira/browse/FLINK-1921 Project: Flink Issue Type: Bug Components: YARN Client Reporter: Robert Metzger Assignee: Robert Metzger

Right now, the -p argument is overwriting the -ys argument for per-job YARN sessions. Also, the priorities for parallelism should be documented (low to high):
1. flink-conf.yaml (-D arguments on YARN)
2. -p on ./bin/flink
3. ExecutionEnvironment.setParallelism()
4. Operator.setParallelism()
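The priority chain described in the issue can be sketched as a simple resolver in which each later, higher-priority source overrides the earlier ones. Method names and the -1 "not set" sentinel below are illustrative assumptions, not Flink's implementation:

```java
// Toy resolver for the documented priority order (low to high):
// config file, -p on the CLI, ExecutionEnvironment.setParallelism(),
// Operator.setParallelism(). A value of -1 means "not set".
public class ParallelismResolver {

    static int resolve(int fromConfigFile, int fromCliP, int fromEnvironment, int fromOperator) {
        int parallelism = fromConfigFile;         // lowest priority, always present
        if (fromCliP > 0)        parallelism = fromCliP;
        if (fromEnvironment > 0) parallelism = fromEnvironment;
        if (fromOperator > 0)    parallelism = fromOperator;  // highest priority
        return parallelism;
    }

    public static void main(String[] args) {
        System.out.println(resolve(4, 8, -1, -1));  // 8: -p overrides the config file
        System.out.println(resolve(4, 8, -1, 16));  // 16: operator setting wins over all
    }
}
```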
[jira] [Assigned] (FLINK-1920) Passing -D akka.ask.timeout=5 min to yarn client does not work
[ https://issues.apache.org/jira/browse/FLINK-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Metzger reassigned FLINK-1920: - Assignee: Robert Metzger

Passing -D akka.ask.timeout=5 min to yarn client does not work -- Key: FLINK-1920 URL: https://issues.apache.org/jira/browse/FLINK-1920 Project: Flink Issue Type: Bug Components: YARN Client Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Robert Metzger

That's probably an issue of the command line parsing. Variations like -D akka.ask.timeout=5 min are also not working.
[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets
[ https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508978#comment-14508978 ] ASF GitHub Bot commented on FLINK-1615: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/442#issuecomment-95570894 Where is your git repository? So that I can checkout your commit and merge it?
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/442#issuecomment-95570894 Where is your git repository? So that I can checkout your commit and merge it?
[jira] [Created] (FLINK-1935) Reimplement PersistentKafkaSource using high level Kafka API
Robert Metzger created FLINK-1935: - Summary: Reimplement PersistentKafkaSource using high level Kafka API Key: FLINK-1935 URL: https://issues.apache.org/jira/browse/FLINK-1935 Project: Flink Issue Type: Improvement Components: Kafka Connector, Streaming Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Robert Metzger Fix For: 0.9

The current PersistentKafkaSource in Flink has some limitations that I seek to overcome by reimplementing it using Kafka's high-level API (and manually committing the offsets to ZK). The current PersistentKafkaSource does not integrate with existing Kafka tools (for example for monitoring the lag); that integration only works when the offsets are committed to ZK directly. All the communication with Zookeeper is implemented manually in our current code, which is prone to errors and inefficiencies.
[jira] [Commented] (FLINK-1398) A new DataSet function: extractElementFromTuple
[ https://issues.apache.org/jira/browse/FLINK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509002#comment-14509002 ] ASF GitHub Bot commented on FLINK-1398: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/308#issuecomment-95577570 +1 for not specializing code in the `DataSet`
A `TupleUtil` would be fine, in my opinion

A new DataSet function: extractElementFromTuple --- Key: FLINK-1398 URL: https://issues.apache.org/jira/browse/FLINK-1398 Project: Flink Issue Type: Wish Reporter: Felix Neutatz Assignee: Felix Neutatz Priority: Minor

This is the use case:
{code:java}
DataSet<Tuple2<Integer, Double>> data = env.fromElements(new Tuple2<Integer, Double>(1, 2.0));
data.map(new ElementFromTuple());

public static final class ElementFromTuple implements MapFunction<Tuple2<Integer, Double>, Double> {
    @Override
    public Double map(Tuple2<Integer, Double> value) {
        return value.f1;
    }
}
{code}
It would be awesome if we had something like this:
{code:java}
data.extractElement(1);
{code}
This means that we implement a function for DataSet which extracts a certain element from a given Tuple.
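In the spirit of the `TupleUtil` suggestion, the extraction can live in a small helper rather than as a method on `DataSet`. A simplified, self-contained sketch with a stand-in Tuple2 and plain Java lists in place of Flink's DataSet (names here are illustrative, not Flink's API):

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Minimal stand-ins: Tuple2 mimics Flink's tuple, and a List plays the DataSet.
public class TupleUtilDemo {

    static class Tuple2<T0, T1> {
        final T0 f0;
        final T1 f1;
        Tuple2(T0 f0, T1 f1) { this.f0 = f0; this.f1 = f1; }
    }

    // The TupleUtil-style helper: extract field f1 from every tuple,
    // mirroring what data.extractElement(1) would do.
    static <T0, T1> List<T1> extractSecondField(List<Tuple2<T0, T1>> data) {
        return data.stream().map(t -> t.f1).collect(Collectors.toList());
    }

    static List<Double> demo() {
        List<Tuple2<Integer, Double>> data =
                Arrays.asList(new Tuple2<>(1, 2.0), new Tuple2<>(3, 4.0));
        return extractSecondField(data);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [2.0, 4.0]
    }
}
```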
[GitHub] flink pull request: [FLINK-1398] Introduce extractSingleField() in...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/308#issuecomment-95577570 +1 for not specializing code in the `DataSet`. A `TupleUtil` would be fine, in my opinion. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
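A `TupleUtil`-style extractor, as suggested here, might look roughly like the following. The sketch uses a hand-rolled `Tuple2` rather than Flink's tuple classes, and the names are illustrative only, not Flink API:

```java
// Minimal stand-in for a two-field tuple; Flink's own Tuple2 exposes the
// same public fields f0 and f1.
class Tuple2<T0, T1> {
    final T0 f0;
    final T1 f1;

    Tuple2(T0 f0, T1 f1) {
        this.f0 = f0;
        this.f1 = f1;
    }
}

class TupleUtil {
    // Extract a field by position. A real implementation over Flink tuples
    // could dispatch via Tuple.getField(int) instead of a hard-coded switch.
    static Object extractElement(Tuple2<?, ?> t, int index) {
        switch (index) {
            case 0: return t.f0;
            case 1: return t.f1;
            default: throw new IndexOutOfBoundsException("tuple index: " + index);
        }
    }
}
```

Keeping this in a utility class avoids growing the `DataSet` API itself, which is the point of the +1 above.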
[GitHub] flink pull request: [FLINK-1472] Fixed Web frontend config overvie...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/439#issuecomment-95579057 I agree to enforce normalization in the `ConfigConstants` by adding a test case that reflectively gets all fields and checks whether they start with DEFAULT_ or end in _KEY. The mapping between keys and default values will probably have to be done manually, unless we add annotations above the key fields declaring what the default value is. The type of the default value should help to figure out which getter (getInteger, getDouble) to use for the correct conversion.
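The reflective naming-convention check described in the comment could be sketched as follows. `FakeConfigConstants` and `BadConstants` are stand-ins invented for illustration, not Flink's real `ConfigConstants`:

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Illustrative stand-in whose fields follow the proposed convention.
class FakeConfigConstants {
    public static final String TASK_MANAGER_NUM_TASK_SLOTS_KEY = "taskmanager.numberOfTaskSlots";
    public static final int DEFAULT_TASK_MANAGER_NUM_TASK_SLOTS = 1;
}

// Illustrative stand-in that violates the convention.
class BadConstants {
    public static final String numberOfTaskSlots = "x";
}

class ConstantsNamingCheck {
    // Returns true when every public static field is either a DEFAULT_* value
    // or a *_KEY configuration key, as the proposed test case would enforce.
    static boolean allFieldsFollowConvention(Class<?> clazz) {
        for (Field f : clazz.getFields()) {
            if (!Modifier.isStatic(f.getModifiers())) {
                continue;
            }
            String name = f.getName();
            if (!name.startsWith("DEFAULT_") && !name.endsWith("_KEY")) {
                return false;
            }
        }
        return true;
    }
}
```

Running this once per constants class in a unit test would catch naming drift automatically, as suggested.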
[jira] [Commented] (FLINK-1472) Web frontend config overview shows wrong value
[ https://issues.apache.org/jira/browse/FLINK-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509006#comment-14509006 ] ASF GitHub Bot commented on FLINK-1472: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/439#issuecomment-95579057 I agree to enforce normalization in the `ConfigConstants` by adding a test case that reflectively gets all fields and checks whether they start with DEFAULT_ or end in _KEY. The mapping between keys and default values will probably have to be done manually, unless we add annotations above the key fields declaring what the default value is. The type of the default value should help to figure out which getter (getInteger, getDouble) to use for the correct conversion. Web frontend config overview shows wrong value -- Key: FLINK-1472 URL: https://issues.apache.org/jira/browse/FLINK-1472 Project: Flink Issue Type: Bug Components: Webfrontend Affects Versions: master Reporter: Ufuk Celebi Assignee: Mingliang Qi Priority: Minor The web frontend shows configuration values even if they could not be correctly parsed. For example, I've configured the number of buffers as 123.000, which cannot be parsed as an Integer by GlobalConfiguration, so the default value is used. Still, the web frontend shows the unused value 123.000. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1745) Add k-nearest-neighbours algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508609#comment-14508609 ] Till Rohrmann commented on FLINK-1745: -- That sounds really good [~chiwanpark] and [~raghav.chalapa...@gmail.com] :-). We can use this issue to track the progress of the exact kNN implementation and I'll add another one for the approximated kNN. Add k-nearest-neighbours algorithm to machine learning library -- Key: FLINK-1745 URL: https://issues.apache.org/jira/browse/FLINK-1745 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Chiwan Park Labels: ML, Starter Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial it is still used as a mean to classify data and to do regression. Could be a starter task. Resources: [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm] [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1933) Add distance measure interface and basic implementation to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508612#comment-14508612 ] Till Rohrmann commented on FLINK-1933: -- Good selection of distance measures. Should be more than enough to start with :-) Add distance measure interface and basic implementation to machine learning library --- Key: FLINK-1933 URL: https://issues.apache.org/jira/browse/FLINK-1933 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Chiwan Park Assignee: Chiwan Park Labels: ML Add a distance measure interface to calculate the distance between two vectors, and some implementations of the interface. In FLINK-1745, [~till.rohrmann] suggests an interface like the following: {code} trait DistanceMeasure { def distance(a: Vector, b: Vector): Double } {code} I think the following list of implementations is sufficient to provide to ML library users at first. * Manhattan distance [1] * Cosine distance [2] * Euclidean distance (and squared Euclidean distance) [3] * Tanimoto distance [4] * Minkowski distance [5] * Chebyshev distance [6] [1]: http://en.wikipedia.org/wiki/Taxicab_geometry [2]: http://en.wikipedia.org/wiki/Cosine_similarity [3]: http://en.wikipedia.org/wiki/Euclidean_distance [4]: http://en.wikipedia.org/wiki/Jaccard_index#Tanimoto_coefficient_.28extended_Jaccard_coefficient.29 [5]: http://en.wikipedia.org/wiki/Minkowski_distance [6]: http://en.wikipedia.org/wiki/Chebyshev_distance -- This message was sent by Atlassian JIRA (v6.3.4#6332)
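For illustration, the proposed interface (sketched in the ticket as a Scala trait) and two of the listed measures might look like this in Java, operating on plain double arrays rather than Flink ML vectors:

```java
// Java rendering of the proposed DistanceMeasure contract; the actual
// proposal is a Scala trait over Flink ML's Vector type.
interface DistanceMeasure {
    double distance(double[] a, double[] b);
}

// Euclidean distance: sqrt of the sum of squared coordinate differences.
class EuclideanDistance implements DistanceMeasure {
    public double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }
}

// Manhattan (taxicab) distance: sum of absolute coordinate differences.
class ManhattanDistance implements DistanceMeasure {
    public double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }
}
```

Each remaining measure in the list (cosine, Tanimoto, Minkowski, Chebyshev) would be one more small class against the same interface.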
[jira] [Commented] (FLINK-1867) TaskManagerFailureRecoveryITCase causes stalled travis builds
[ https://issues.apache.org/jira/browse/FLINK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508635#comment-14508635 ] ASF GitHub Bot commented on FLINK-1867: --- Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/612#discussion_r28943598 --- Diff: flink-tests/src/test/scala/org/apache/flink/api/scala/runtime/taskmanager/TaskManagerFailsITCase.scala --- @@ -231,9 +231,9 @@ with WordSpecLike with Matchers with BeforeAndAfterAll { config.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numSlots) config.setInteger(ConfigConstants.LOCAL_INSTANCE_MANAGER_NUMBER_TASK_MANAGER, numTaskmanagers) config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1000 ms") -config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "4000 ms") +config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "20 s") --- End diff -- No, it doesn't; before and after this change the 4 tests complete in about 8 seconds. @tillrohrmann suggested that, since the actor system is local, there are some other mechanisms in play that signal failure in this case. TaskManagerFailureRecoveryITCase causes stalled travis builds - Key: FLINK-1867 URL: https://issues.apache.org/jira/browse/FLINK-1867 Project: Flink Issue Type: Bug Components: TaskManager, Tests Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Aljoscha Krettek There are currently tests on travis failing: https://travis-ci.org/apache/flink/jobs/57943063 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1867/1880] Raise test timeouts in hope ...
Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/612#discussion_r28943704 --- Diff: flink-tests/src/test/scala/org/apache/flink/api/scala/runtime/jobmanager/JobManagerFailsITCase.scala --- @@ -18,24 +18,23 @@ package org.apache.flink.api.scala.runtime.jobmanager -import akka.actor.Status.{Success, Failure} +import akka.actor.Status.Success import akka.actor.{ActorSystem, PoisonPill} import akka.testkit.{ImplicitSender, TestKit} +import org.junit.Ignore +import org.junit.runner.RunWith +import org.scalatest.junit.JUnitRunner +import org.scalatest.{BeforeAndAfterAll, Matchers, WordSpecLike} + import org.apache.flink.configuration.{ConfigConstants, Configuration} import org.apache.flink.runtime.akka.AkkaUtils -import org.apache.flink.runtime.client.JobExecutionException -import org.apache.flink.runtime.jobgraph.{JobGraph, AbstractJobVertex} -import org.apache.flink.runtime.jobmanager.Tasks.{NoOpInvokable, BlockingNoOpInvokable} +import org.apache.flink.runtime.jobgraph.{AbstractJobVertex, JobGraph} +import org.apache.flink.runtime.jobmanager.Tasks.{BlockingNoOpInvokable, NoOpInvokable} import org.apache.flink.runtime.messages.JobManagerMessages._ import org.apache.flink.runtime.testingUtils.TestingMessages.DisableDisconnect -import org.apache.flink.runtime.testingUtils.TestingTaskManagerMessages.{JobManagerTerminated, -NotifyWhenJobManagerTerminated} +import org.apache.flink.runtime.testingUtils.TestingTaskManagerMessages.{JobManagerTerminated, NotifyWhenJobManagerTerminated} import org.apache.flink.runtime.testingUtils.TestingUtils import org.apache.flink.test.util.ForkableFlinkMiniCluster -import org.junit.Ignore -import org.junit.runner.RunWith -import org.scalatest.junit.JUnitRunner -import org.scalatest.{BeforeAndAfterAll, Matchers, WordSpecLike} @Ignore("Contains a bug with Akka 2.2.1") --- End diff -- True, removing it
[jira] [Resolved] (FLINK-1486) Add a string to the print method to identify output
[ https://issues.apache.org/jira/browse/FLINK-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maximilian Michels resolved FLINK-1486. --- Resolution: Implemented Add a string to the print method to identify output --- Key: FLINK-1486 URL: https://issues.apache.org/jira/browse/FLINK-1486 Project: Flink Issue Type: Improvement Components: Local Runtime Reporter: Maximilian Michels Assignee: Maximilian Michels Priority: Minor Labels: usability Fix For: 0.9 The output of the {{print}} method of {{DataSet}} is mainly used for debugging purposes. Currently, it is difficult to identify the output. I would suggest adding another {{print(String str)}} method which allows the user to supply a String to identify the output. This could be a prefix before the actual output or a format string (which might be overkill). {code} DataSet<Integer> data = env.fromElements(1, 2, 3, 4, 5); {code} For example, {{data.print("MyDataSet: ")}} would print {noformat} MyDataSet: 1 MyDataSet: 2 ... {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
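The proposed prefix behavior can be sketched outside Flink as a small helper; `PrefixPrinter` is a hypothetical illustration, whereas the actual change adds `print(String)` on `DataSet` itself:

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative helper: prepend an identifying prefix to each element's
// string form, as data.print("MyDataSet: ") is proposed to do.
class PrefixPrinter {
    static List<String> format(String prefix, List<?> elements) {
        return elements.stream()
                .map(e -> prefix + e)
                .collect(Collectors.toList());
    }

    static void printWithPrefix(String prefix, List<?> elements) {
        // Separating formatting from printing keeps the formatting testable.
        format(prefix, elements).forEach(System.out::println);
    }
}
```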
[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets
[ https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508743#comment-14508743 ] ASF GitHub Bot commented on FLINK-1615: --- Github user Elbehery commented on the pull request: https://github.com/apache/flink/pull/442#issuecomment-95512206 I did rebase against master before creating the PR, but that was a long time ago. Could this be the cause of the conflicts? Introduces a new InputFormat for Tweets --- Key: FLINK-1615 URL: https://issues.apache.org/jira/browse/FLINK-1615 Project: Flink Issue Type: New Feature Components: flink-contrib Affects Versions: 0.8.1 Reporter: mustafa elbehery Priority: Minor An event-driven parser for Tweets into Java POJOs. It parses all the important parts of the tweet into Java objects. Tested on a cluster, and the performance is pretty good. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1867) TaskManagerFailureRecoveryITCase causes stalled travis builds
[ https://issues.apache.org/jira/browse/FLINK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508643#comment-14508643 ] ASF GitHub Bot commented on FLINK-1867: --- Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/612#discussion_r28943824 --- Diff: flink-tests/src/test/java/org/apache/flink/test/recovery/AbstractProcessFailureRecoveryTest.java --- @@ -112,9 +112,9 @@ public void testTaskManagerProcessFailure() { Tuple2<String, Object> localAddress = new Tuple2<String, Object>("localhost", jobManagerPort); Configuration jmConfig = new Configuration(); - jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1 s"); - jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "4 s"); - jmConfig.setInteger(ConfigConstants.AKKA_WATCH_THRESHOLD, 2); + jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1 ms"); --- End diff -- That's a typo, will fix it. TaskManagerFailureRecoveryITCase causes stalled travis builds - Key: FLINK-1867 URL: https://issues.apache.org/jira/browse/FLINK-1867 Project: Flink Issue Type: Bug Components: TaskManager, Tests Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Aljoscha Krettek There are currently tests on travis failing: https://travis-ci.org/apache/flink/jobs/57943063 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1867/1880] Raise test timeouts in hope ...
Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/612#discussion_r28943849 --- Diff: flink-tests/src/test/java/org/apache/flink/test/recovery/AbstractProcessFailureRecoveryTest.java --- @@ -112,9 +112,9 @@ public void testTaskManagerProcessFailure() { Tuple2<String, Object> localAddress = new Tuple2<String, Object>("localhost", jobManagerPort); Configuration jmConfig = new Configuration(); - jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1 s"); - jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "4 s"); - jmConfig.setInteger(ConfigConstants.AKKA_WATCH_THRESHOLD, 2); + jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1 ms"); --- End diff -- Seemed to run flawlessly, though. :smile: I ran the tests about 100 times by now without seeing another failure in the targeted tests.
[GitHub] flink pull request: [FLINK-1867/1880] Raise test timeouts in hope ...
Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/612#discussion_r28943875 --- Diff: flink-tests/src/test/java/org/apache/flink/test/recovery/AbstractProcessFailureRecoveryTest.java --- @@ -112,9 +112,9 @@ public void testTaskManagerProcessFailure() { Tuple2<String, Object> localAddress = new Tuple2<String, Object>("localhost", jobManagerPort); Configuration jmConfig = new Configuration(); - jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1 s"); - jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "4 s"); - jmConfig.setInteger(ConfigConstants.AKKA_WATCH_THRESHOLD, 2); + jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1 ms"); + jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "20 s"); + jmConfig.setInteger(ConfigConstants.AKKA_WATCH_THRESHOLD, 20); --- End diff -- See above.
[GitHub] flink pull request: [FLINK-1867/1880] Raise test timeouts in hope ...
Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/612#discussion_r28943824 --- Diff: flink-tests/src/test/java/org/apache/flink/test/recovery/AbstractProcessFailureRecoveryTest.java --- @@ -112,9 +112,9 @@ public void testTaskManagerProcessFailure() { Tuple2<String, Object> localAddress = new Tuple2<String, Object>("localhost", jobManagerPort); Configuration jmConfig = new Configuration(); - jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1 s"); - jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "4 s"); - jmConfig.setInteger(ConfigConstants.AKKA_WATCH_THRESHOLD, 2); + jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1 ms"); --- End diff -- That's a typo, will fix it.
[jira] [Commented] (FLINK-1867) TaskManagerFailureRecoveryITCase causes stalled travis builds
[ https://issues.apache.org/jira/browse/FLINK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508645#comment-14508645 ] ASF GitHub Bot commented on FLINK-1867: --- Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/612#discussion_r28943849 --- Diff: flink-tests/src/test/java/org/apache/flink/test/recovery/AbstractProcessFailureRecoveryTest.java --- @@ -112,9 +112,9 @@ public void testTaskManagerProcessFailure() { Tuple2<String, Object> localAddress = new Tuple2<String, Object>("localhost", jobManagerPort); Configuration jmConfig = new Configuration(); - jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1 s"); - jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "4 s"); - jmConfig.setInteger(ConfigConstants.AKKA_WATCH_THRESHOLD, 2); + jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1 ms"); --- End diff -- Seemed to run flawlessly, though. :smile: I ran the tests about 100 times by now without seeing another failure in the targeted tests. TaskManagerFailureRecoveryITCase causes stalled travis builds - Key: FLINK-1867 URL: https://issues.apache.org/jira/browse/FLINK-1867 Project: Flink Issue Type: Bug Components: TaskManager, Tests Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Aljoscha Krettek There are currently tests on travis failing: https://travis-ci.org/apache/flink/jobs/57943063 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1725) New Partitioner for better load balancing for skewed data
[ https://issues.apache.org/jira/browse/FLINK-1725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508652#comment-14508652 ] Gianmarco De Francisci Morales commented on FLINK-1725: --- Hi, The code looks good, but might there be a way to avoid converting the key to a string before hashing it? New Partitioner for better load balancing for skewed data - Key: FLINK-1725 URL: https://issues.apache.org/jira/browse/FLINK-1725 Project: Flink Issue Type: Improvement Components: New Components Affects Versions: 0.8.1 Reporter: Anis Nasir Assignee: Anis Nasir Labels: LoadBalancing, Partitioner Original Estimate: 336h Remaining Estimate: 336h Hi, We have recently studied the problem of load balancing in Storm [1]. In particular, we focused on key distribution of the stream for skewed data. We developed a new stream partitioning scheme (which we call Partial Key Grouping). It achieves better load balancing than key grouping while being more scalable than shuffle grouping in terms of memory. In the paper we show a number of mining algorithms that are easy to implement with partial key grouping, and whose performance can benefit from it. We think that it might also be useful for a larger class of algorithms. Partial key grouping is very easy to implement: it requires just a few lines of code in Java when implemented as a custom grouping in Storm [2]. For all these reasons, we believe it will be a nice addition to the standard Partitioners available in Flink. If the community thinks it's a good idea, we will be happy to offer support in the porting. References: [1]. https://melmeric.files.wordpress.com/2014/11/the-power-of-both-choices-practical-load-balancing-for-distributed-stream-processing-engines.pdf [2]. https://github.com/gdfm/partial-key-grouping -- This message was sent by Atlassian JIRA (v6.3.4#6332)
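A minimal, framework-free sketch of partial key grouping ("the power of both choices"): each key maps to two candidate channels, and every record is routed to whichever of the two has received less load so far. The concrete hash derivations below are illustrative stand-ins for the two independent hash functions used in the paper:

```java
// Illustrative sketch of a partial-key-grouping partitioner. Not the
// Storm or Flink implementation; hash functions are placeholder choices.
class PartialKeyPartitioner {
    private final long[] load; // records routed to each channel so far

    PartialKeyPartitioner(int numChannels) {
        this.load = new long[numChannels];
    }

    int partition(Object key) {
        // Derive two candidate channels from the key's hash.
        int h = key.hashCode();
        int first = Math.floorMod(h, load.length);
        int second = Math.floorMod(h * 31 + 17, load.length);
        // Route to the currently less loaded of the two candidates;
        // this is what splits a skewed (hot) key across two workers.
        int chosen = load[first] <= load[second] ? first : second;
        load[chosen]++;
        return chosen;
    }
}
```

Because any single key is confined to at most two channels, downstream aggregation only has to merge two partial states per key, unlike shuffle grouping.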
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user Elbehery commented on the pull request: https://github.com/apache/flink/pull/442#issuecomment-95512206 I did rebase against master before creating the PR, but that was a long time ago. Could this be the cause of the conflicts?
[jira] [Commented] (FLINK-1792) Improve TM Monitoring: CPU utilization, hide graphs by default and show summary only
[ https://issues.apache.org/jira/browse/FLINK-1792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508619#comment-14508619 ] ASF GitHub Bot commented on FLINK-1792: --- Github user bhatsachin commented on the pull request: https://github.com/apache/flink/pull/553#issuecomment-95480100 The suggested changes have been made Improve TM Monitoring: CPU utilization, hide graphs by default and show summary only Key: FLINK-1792 URL: https://issues.apache.org/jira/browse/FLINK-1792 Project: Flink Issue Type: Sub-task Components: Webfrontend Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Sachin Bhat As per https://github.com/apache/flink/pull/421 from FLINK-1501, there are some enhancements to the current monitoring required - Get the CPU utilization in % from each TaskManager process - Remove the metrics graph from the overview and only show the current stats as numbers (cpu load, heap utilization) and add a button to enable the detailed graph. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1792] TM Monitoring: CPU utilization, h...
Github user bhatsachin commented on the pull request: https://github.com/apache/flink/pull/553#issuecomment-95480100 The suggested changes have been made
[jira] [Updated] (FLINK-1745) Add exact k-nearest-neighbours algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-1745: - Summary: Add exact k-nearest-neighbours algorithm to machine learning library (was: Add k-nearest-neighbours algorithm to machine learning library) Add exact k-nearest-neighbours algorithm to machine learning library Key: FLINK-1745 URL: https://issues.apache.org/jira/browse/FLINK-1745 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Chiwan Park Labels: ML, Starter Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial it is still used as a mean to classify data and to do regression. This issue focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as proposed in [2]. Could be a starter task. Resources: [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm] [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
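The exact kNN variants mentioned here (H-BNLJ, H-BRJ) ultimately reduce to scoring candidate points against a query and keeping the k closest; a brute-force version of that inner step might look like this (an illustrative sketch, not the proposed Flink ML implementation):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Brute-force exact kNN: the innermost building block of a block-nested-loop
// kNN join. A distributed version would run this per pair of data blocks and
// then merge the per-block top-k lists.
class KnnBruteForce {
    static List<double[]> kNearest(double[] query, List<double[]> points, int k) {
        List<double[]> sorted = new ArrayList<>(points);
        // Squared distance preserves ordering, so the sqrt can be skipped.
        sorted.sort(Comparator.comparingDouble((double[] p) -> squaredDistance(query, p)));
        return sorted.subList(0, Math.min(k, sorted.size()));
    }

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```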
[jira] [Updated] (FLINK-1745) Add k-nearest-neighbours algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-1745: - Description: Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial it is still used as a mean to classify data and to do regression. This issue Could be a starter task. Resources: [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm] [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf] was: Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial it is still used as a mean to classify data and to do regression. Could be a starter task. Resources: [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm] [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf] Add k-nearest-neighbours algorithm to machine learning library -- Key: FLINK-1745 URL: https://issues.apache.org/jira/browse/FLINK-1745 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Chiwan Park Labels: ML, Starter Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial it is still used as a mean to classify data and to do regression. This issue Could be a starter task. Resources: [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm] [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (FLINK-1745) Add k-nearest-neighbours algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Till Rohrmann updated FLINK-1745: - Description: Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial it is still used as a mean to classify data and to do regression. This issue focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as proposed in [2]. Could be a starter task. Resources: [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm] [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf] was: Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial it is still used as a mean to classify data and to do regression. This issue Could be a starter task. Resources: [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm] [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf] Add k-nearest-neighbours algorithm to machine learning library -- Key: FLINK-1745 URL: https://issues.apache.org/jira/browse/FLINK-1745 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Chiwan Park Labels: ML, Starter Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite trivial it is still used as a mean to classify data and to do regression. This issue focuses on the implementation of an exact kNN (H-BNLJ, H-BRJ) algorithm as proposed in [2]. Could be a starter task. Resources: [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm] [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1867) TaskManagerFailureRecoveryITCase causes stalled travis builds
[ https://issues.apache.org/jira/browse/FLINK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508637#comment-14508637 ] ASF GitHub Bot commented on FLINK-1867: --- Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/612#discussion_r28943704 --- Diff: flink-tests/src/test/scala/org/apache/flink/api/scala/runtime/jobmanager/JobManagerFailsITCase.scala --- @@ -18,24 +18,23 @@ package org.apache.flink.api.scala.runtime.jobmanager -import akka.actor.Status.{Success, Failure} +import akka.actor.Status.Success import akka.actor.{ActorSystem, PoisonPill} import akka.testkit.{ImplicitSender, TestKit} +import org.junit.Ignore +import org.junit.runner.RunWith +import org.scalatest.junit.JUnitRunner +import org.scalatest.{BeforeAndAfterAll, Matchers, WordSpecLike} + import org.apache.flink.configuration.{ConfigConstants, Configuration} import org.apache.flink.runtime.akka.AkkaUtils -import org.apache.flink.runtime.client.JobExecutionException -import org.apache.flink.runtime.jobgraph.{JobGraph, AbstractJobVertex} -import org.apache.flink.runtime.jobmanager.Tasks.{NoOpInvokable, BlockingNoOpInvokable} +import org.apache.flink.runtime.jobgraph.{AbstractJobVertex, JobGraph} +import org.apache.flink.runtime.jobmanager.Tasks.{BlockingNoOpInvokable, NoOpInvokable} import org.apache.flink.runtime.messages.JobManagerMessages._ import org.apache.flink.runtime.testingUtils.TestingMessages.DisableDisconnect -import org.apache.flink.runtime.testingUtils.TestingTaskManagerMessages.{JobManagerTerminated, -NotifyWhenJobManagerTerminated} +import org.apache.flink.runtime.testingUtils.TestingTaskManagerMessages.{JobManagerTerminated, NotifyWhenJobManagerTerminated} import org.apache.flink.runtime.testingUtils.TestingUtils import org.apache.flink.test.util.ForkableFlinkMiniCluster -import org.junit.Ignore -import org.junit.runner.RunWith -import org.scalatest.junit.JUnitRunner -import org.scalatest.{BeforeAndAfterAll, Matchers, WordSpecLike} @Ignore("Contains a bug with Akka 2.2.1") --- End diff -- True, removing it TaskManagerFailureRecoveryITCase causes stalled travis builds - Key: FLINK-1867 URL: https://issues.apache.org/jira/browse/FLINK-1867 Project: Flink Issue Type: Bug Components: TaskManager, Tests Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Aljoscha Krettek There are currently tests on travis failing: https://travis-ci.org/apache/flink/jobs/57943063 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1867/1880] Raise test timeouts in hope ...
Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/612#discussion_r28943731 --- Diff: flink-tests/src/test/scala/org/apache/flink/api/scala/runtime/jobmanager/JobManagerFailsITCase.scala --- @@ -136,9 +135,9 @@ with WordSpecLike with Matchers with BeforeAndAfterAll { config.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numSlots) config.setInteger(ConfigConstants.LOCAL_INSTANCE_MANAGER_NUMBER_TASK_MANAGER, numTaskmanagers) config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1000 ms") -config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "4000 ms") +config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "20 s") --- End diff -- As above, it doesn't affect test execution time.
[jira] [Commented] (FLINK-1867) TaskManagerFailureRecoveryITCase causes stalled travis builds
[ https://issues.apache.org/jira/browse/FLINK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508638#comment-14508638 ] ASF GitHub Bot commented on FLINK-1867: --- Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/612#discussion_r28943731 --- Diff: flink-tests/src/test/scala/org/apache/flink/api/scala/runtime/jobmanager/JobManagerFailsITCase.scala --- @@ -136,9 +135,9 @@ with WordSpecLike with Matchers with BeforeAndAfterAll { config.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numSlots) config.setInteger(ConfigConstants.LOCAL_INSTANCE_MANAGER_NUMBER_TASK_MANAGER, numTaskmanagers) config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1000 ms") -config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "4000 ms") +config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "20 s") --- End diff -- As above, it doesn't affect test execution time. TaskManagerFailureRecoveryITCase causes stalled travis builds - Key: FLINK-1867 URL: https://issues.apache.org/jira/browse/FLINK-1867 Project: Flink Issue Type: Bug Components: TaskManager, Tests Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Aljoscha Krettek There are currently tests on travis failing: https://travis-ci.org/apache/flink/jobs/57943063 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-1789] [core] [runtime] [java-api] Allow...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/593#issuecomment-95500680 This does not work if the user uses classes that are not available on the local machine since you don't add the additional class path entries in JobWithJars.buildUserCodeClassLoader(). Correct?
[GitHub] flink pull request: [FLINK-1799][scala] Fix handling of generic ar...
Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/582
[GitHub] flink pull request: [FLINK-1615] [java api] SimpleTweetInputFormat
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/442#issuecomment-95511047 This looks good to merge. Any objections?
[jira] [Commented] (FLINK-1745) Add k-nearest-neighbours algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508606#comment-14508606 ] Till Rohrmann commented on FLINK-1745: -- I agree with grouping the nearest neighbours with respect to the queried vector. Add k-nearest-neighbours algorithm to machine learning library -- Key: FLINK-1745 URL: https://issues.apache.org/jira/browse/FLINK-1745 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Chiwan Park Labels: ML, Starter Even though the k-nearest-neighbours (kNN) [1,2] algorithm is quite simple, it is still used as a means to classify data and to do regression. Could be a starter task. Resources: [1] [http://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm] [2] [https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf]
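The exact kNN classification discussed above can be sketched as a naive brute-force search in plain Java. This is only an illustrative stand-alone sketch (class and method names are made up here, not part of Flink's ML library); its O(n)-per-query cost is exactly why the approximative variant below (FLINK-1934) is attractive for large data sets.

```java
import java.util.*;
import java.util.stream.*;

// Naive exact k-nearest-neighbours classification: for each query point,
// scan all training points, keep the k closest, and take a majority vote.
public class BruteForceKnn {

    /** Labelled training point (nested records are implicitly static). */
    public record Point(double[] features, String label) {}

    public static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    public static String classify(List<Point> training, double[] query, int k) {
        // Sort training points by distance to the query and take the k nearest.
        List<Point> nearest = training.stream()
            .sorted(Comparator.comparingDouble(p -> squaredDistance(p.features(), query)))
            .limit(k)
            .collect(Collectors.toList());
        // Majority vote over the labels of the k neighbours.
        Map<String, Long> votes = nearest.stream()
            .collect(Collectors.groupingBy(Point::label, Collectors.counting()));
        return Collections.max(votes.entrySet(), Map.Entry.comparingByValue()).getKey();
    }

    public static void main(String[] args) {
        List<Point> training = List.of(
            new Point(new double[]{0.0, 0.0}, "a"),
            new Point(new double[]{0.1, 0.1}, "a"),
            new Point(new double[]{5.0, 5.0}, "b"),
            new Point(new double[]{5.1, 4.9}, "b"));
        System.out.println(classify(training, new double[]{0.2, 0.0}, 3)); // prints a
    }
}
```

Grouping neighbour candidates per queried vector, as Till suggests, corresponds to the `groupingBy` step above applied per query point.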
[jira] [Created] (FLINK-1934) Add approximative k-nearest-neighbours (kNN) algorithm to machine learning library
Till Rohrmann created FLINK-1934: Summary: Add approximative k-nearest-neighbours (kNN) algorithm to machine learning library Key: FLINK-1934 URL: https://issues.apache.org/jira/browse/FLINK-1934 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann kNN is still a widely used algorithm for classification and regression. However, due to the computational costs of an exact implementation, it does not scale well to large amounts of data. Therefore, it is worthwhile to also add an approximative kNN implementation as proposed in [1,2]. Resources: [1] https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf [2] http://www.computer.org/csdl/proceedings/wacv/2007/2794/00/27940028.pdf
[GitHub] flink pull request: [FLINK-1867/1880] Raise test timeouts in hope ...
Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/612#discussion_r28943598 --- Diff: flink-tests/src/test/scala/org/apache/flink/api/scala/runtime/taskmanager/TaskManagerFailsITCase.scala --- @@ -231,9 +231,9 @@ with WordSpecLike with Matchers with BeforeAndAfterAll { config.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numSlots) config.setInteger(ConfigConstants.LOCAL_INSTANCE_MANAGER_NUMBER_TASK_MANAGER, numTaskmanagers) config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1000 ms") -config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "4000 ms") +config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "20 s") --- End diff -- No, it doesn't. Before and after this change the 4 tests complete in about 8 seconds. @tillrohrmann suggested that, since the actor system is local, there are some other mechanisms in play that signal failure in this case.
[jira] [Commented] (FLINK-1867) TaskManagerFailureRecoveryITCase causes stalled travis builds
[ https://issues.apache.org/jira/browse/FLINK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508648#comment-14508648 ] ASF GitHub Bot commented on FLINK-1867: --- Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/612#discussion_r28943875 --- Diff: flink-tests/src/test/java/org/apache/flink/test/recovery/AbstractProcessFailureRecoveryTest.java --- @@ -112,9 +112,9 @@ public void testTaskManagerProcessFailure() { Tuple2<String, Object> localAddress = new Tuple2<String, Object>("localhost", jobManagerPort); Configuration jmConfig = new Configuration(); - jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1 s"); - jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "4 s"); - jmConfig.setInteger(ConfigConstants.AKKA_WATCH_THRESHOLD, 2); + jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1 ms"); + jmConfig.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "20 s"); + jmConfig.setInteger(ConfigConstants.AKKA_WATCH_THRESHOLD, 20); --- End diff -- See above. TaskManagerFailureRecoveryITCase causes stalled travis builds - Key: FLINK-1867 URL: https://issues.apache.org/jira/browse/FLINK-1867 Project: Flink Issue Type: Bug Components: TaskManager, Tests Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Aljoscha Krettek There are currently tests on travis failing: https://travis-ci.org/apache/flink/jobs/57943063
[GitHub] flink pull request: [FLINK-1486] add print method for prefixing a ...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/372#issuecomment-95488183 Do we want to break backwards compatibility or include a new method for printing on the client? After all, printing on the workers is a useful tool to debug the dataflow of a program.
[jira] [Commented] (FLINK-1486) Add a string to the print method to identify output
[ https://issues.apache.org/jira/browse/FLINK-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508656#comment-14508656 ] ASF GitHub Bot commented on FLINK-1486: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/372#issuecomment-95488183 Do we want to break backwards compatibility or include a new method for printing on the client? After all, printing on the workers is a useful tool to debug the dataflow of a program. Add a string to the print method to identify output --- Key: FLINK-1486 URL: https://issues.apache.org/jira/browse/FLINK-1486 Project: Flink Issue Type: Improvement Components: Local Runtime Reporter: Maximilian Michels Assignee: Maximilian Michels Priority: Minor Labels: usability Fix For: 0.9 The output of the {{print}} method of {{DataSet}} is mainly used for debug purposes. Currently, it is difficult to identify the output. I would suggest to add another {{print(String str)}} method which allows the user to supply a String to identify the output. This could be a prefix before the actual output or a format string (which might be an overkill). {code} DataSet<Integer> data = env.fromElements(1, 2, 3, 4, 5); {code} For example, {{data.print("MyDataSet: ")}} would print {noformat} MyDataSet: 1 MyDataSet: 2 ... {noformat}
[jira] [Commented] (FLINK-1789) Allow adding of URLs to the usercode class loader
[ https://issues.apache.org/jira/browse/FLINK-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508704#comment-14508704 ] ASF GitHub Bot commented on FLINK-1789: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/593#issuecomment-95500680 This does not work if the user uses classes that are not available on the local machine since you don't add the additional class path entries in JobWithJars.buildUserCodeClassLoader(). Correct? Allow adding of URLs to the usercode class loader - Key: FLINK-1789 URL: https://issues.apache.org/jira/browse/FLINK-1789 Project: Flink Issue Type: Improvement Components: Distributed Runtime Reporter: Timo Walther Assignee: Timo Walther Priority: Minor Currently, there is no option to add custom classpath URLs to the FlinkUserCodeClassLoader. JARs always need to be shipped to the cluster even if they are already present on all nodes. It would be great if RemoteEnvironment also accepts valid classpath URLs and forwards them to BlobLibraryCacheManager.
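The mechanism under discussion in FLINK-1789, a usercode class loader that resolves classes both from shipped jars and from extra classpath entries already present on every node, boils down to `java.net.URLClassLoader`. A minimal self-contained sketch (the URLs are placeholders; Flink's actual FlinkUserCodeClassLoader differs in detail):

```java
import java.net.URL;
import java.net.URLClassLoader;

// Sketch: build a class loader over shipped jars plus extra classpath URLs.
// URLClassLoader searches all given URLs in order, so the two sets can
// simply be concatenated.
public class UserCodeClassLoaders {

    public static URLClassLoader create(URL[] jars, URL[] classpaths, ClassLoader parent) {
        URL[] all = new URL[jars.length + classpaths.length];
        System.arraycopy(jars, 0, all, 0, jars.length);
        System.arraycopy(classpaths, 0, all, jars.length, classpaths.length);
        return new URLClassLoader(all, parent);
    }

    public static void main(String[] args) throws Exception {
        URL jar = new URL("file:///tmp/job.jar");  // placeholder shipped jar
        URL extra = new URL("file:///opt/libs/");  // placeholder node-local classpath entry
        try (URLClassLoader loader =
                 create(new URL[]{jar}, new URL[]{extra},
                        UserCodeClassLoaders.class.getClassLoader())) {
            System.out.println(loader.getURLs().length); // prints 2
        }
    }
}
```

Aljoscha's objection corresponds to the case where only the first array is populated when the loader is built on the client side: classes reachable solely via the extra classpath entries would then not resolve.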
[jira] [Resolved] (FLINK-1799) Scala API does not support generic arrays
[ https://issues.apache.org/jira/browse/FLINK-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aljoscha Krettek resolved FLINK-1799. - Resolution: Fixed Fixed in https://github.com/apache/flink/commit/4672e95ef158bb5e52b1d493465d62429bbdb29e Scala API does not support generic arrays - Key: FLINK-1799 URL: https://issues.apache.org/jira/browse/FLINK-1799 Project: Flink Issue Type: Bug Reporter: Till Rohrmann Assignee: Aljoscha Krettek The Scala API does not support generic arrays at the moment. It throws a rather unhelpful error message ```InvalidTypesException: The given type is not a valid object array```. Code to reproduce the problem is given below: {code} def main(args: Array[String]) { foobar[Double] } def foobar[T: ClassTag: TypeInformation]: DataSet[Block[T]] = { val tpe = createTypeInformation[Array[T]] null } {code}
[jira] [Commented] (FLINK-1799) Scala API does not support generic arrays
[ https://issues.apache.org/jira/browse/FLINK-1799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508721#comment-14508721 ] ASF GitHub Bot commented on FLINK-1799: --- Github user asfgit closed the pull request at: https://github.com/apache/flink/pull/582 Scala API does not support generic arrays - Key: FLINK-1799 URL: https://issues.apache.org/jira/browse/FLINK-1799 Project: Flink Issue Type: Bug Reporter: Till Rohrmann Assignee: Aljoscha Krettek The Scala API does not support generic arrays at the moment. It throws a rather unhelpful error message ```InvalidTypesException: The given type is not a valid object array```. Code to reproduce the problem is given below: {code} def main(args: Array[String]) { foobar[Double] } def foobar[T: ClassTag: TypeInformation]: DataSet[Block[T]] = { val tpe = createTypeInformation[Array[T]] null } {code}
[jira] [Commented] (FLINK-1615) Introduces a new InputFormat for Tweets
[ https://issues.apache.org/jira/browse/FLINK-1615?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508741#comment-14508741 ] ASF GitHub Bot commented on FLINK-1615: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/442#issuecomment-95511047 This looks good to merge. Any objections? Introduces a new InputFormat for Tweets --- Key: FLINK-1615 URL: https://issues.apache.org/jira/browse/FLINK-1615 Project: Flink Issue Type: New Feature Components: flink-contrib Affects Versions: 0.8.1 Reporter: mustafa elbehery Priority: Minor An event-driven parser for Tweets into Java Pojos. It parses all the important parts of the tweet into Java objects. Tested on a cluster and the performance is pretty good.
[jira] [Commented] (FLINK-1398) A new DataSet function: extractElementFromTuple
[ https://issues.apache.org/jira/browse/FLINK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508774#comment-14508774 ] ASF GitHub Bot commented on FLINK-1398: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/308#issuecomment-95522921 I would put this into the `flink-contrib` module. A new DataSet function: extractElementFromTuple --- Key: FLINK-1398 URL: https://issues.apache.org/jira/browse/FLINK-1398 Project: Flink Issue Type: Wish Reporter: Felix Neutatz Assignee: Felix Neutatz Priority: Minor This is the use case: {code:xml} DataSet<Tuple2<Integer, Double>> data = env.fromElements(new Tuple2<Integer, Double>(1, 2.0)); data.map(new ElementFromTuple()); } public static final class ElementFromTuple implements MapFunction<Tuple2<Integer, Double>, Double> { @Override public Double map(Tuple2<Integer, Double> value) { return value.f1; } } {code} It would be awesome if we had something like this: {code:xml} data.extractElement(1); {code} This means that we implement a function for DataSet which extracts a certain element from a given Tuple.
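The convenience the issue wishes for, extracting one field from every tuple without writing a one-off MapFunction, can be sketched with a minimal stand-in for Flink's Tuple2 so the example stays self-contained; names here are illustrative, not Flink's actual API.

```java
import java.util.List;
import java.util.stream.Collectors;

// Self-contained sketch of the proposed extractElement() convenience:
// a tiny stand-in for org.apache.flink.api.java.tuple.Tuple2 plus a
// generic extractor for the second field.
public class ExtractElement {

    /** Minimal stand-in for Flink's Tuple2 (fields f0, f1). */
    public record Tuple2<T0, T1>(T0 f0, T1 f1) {}

    /** The effect of data.extractElement(1): map every tuple to its second field. */
    public static <T0, T1> List<T1> extractSecond(List<Tuple2<T0, T1>> data) {
        return data.stream().map(Tuple2::f1).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Tuple2<Integer, Double>> data = List.of(new Tuple2<>(1, 2.0));
        System.out.println(extractSecond(data)); // prints [2.0]
    }
}
```

In Flink itself this would presumably be a `MapFunction` generated from the field index, which is why the later comments debate whether such structured-data helpers belong in the core APIs or in the Table API.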
[jira] [Commented] (FLINK-1472) Web frontend config overview shows wrong value
[ https://issues.apache.org/jira/browse/FLINK-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508778#comment-14508778 ] ASF GitHub Bot commented on FLINK-1472: --- Github user qmlmoon commented on the pull request: https://github.com/apache/flink/pull/439#issuecomment-95523387 That would be a nice way, but 1) key/value pairs are not always distinguished by _KEY suffix, we have to format all the fields in ConfigConstants 2) some values are not in the ConfigConstants. e.g task manager slot = 1, akka_startup_time=aka_ask_timeout In summary, we need a formatted one-to-one map of the key/value pairs in ConfigConstants Web frontend config overview shows wrong value -- Key: FLINK-1472 URL: https://issues.apache.org/jira/browse/FLINK-1472 Project: Flink Issue Type: Bug Components: Webfrontend Affects Versions: master Reporter: Ufuk Celebi Assignee: Mingliang Qi Priority: Minor The web frontend shows configuration values even if they could not be correctly parsed. For example I've configured the number of buffers as 123.000, which cannot be parsed as an Integer by GlobalConfiguration and the default value is used. Still, the web frontend shows the not used 123.000.
[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable
Github user ggevay commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-95523580 OK, I see your point. I am thinking about using NetUtils.findConnectingAddress to determine which interface is used for the communication with the cluster. For this, I would need an IP of something in the cluster to connect to. What would be the best way to get this info? (Maybe I could use RemoteStreamEnvironment.host, but it is a private member, so there is probably a more standard way that I'm just overlooking.)
[jira] [Commented] (FLINK-1930) NullPointerException in vertex-centric iteration
[ https://issues.apache.org/jira/browse/FLINK-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508751#comment-14508751 ] ASF GitHub Bot commented on FLINK-1930: --- GitHub user StephanEwen opened a pull request: https://github.com/apache/flink/pull/619 [FLINK-1930] [runtime] Improve exception when bufferpools have been shutdown You can merge this pull request into a Git repository by running: $ git pull https://github.com/StephanEwen/incubator-flink buffer_exception Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/619.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #619 commit d6d7c3bcf4cf0a3f1164875a6fe3a093aeb2a68b Author: Stephan Ewen se...@apache.org Date: 2015-04-22T15:52:21Z [FLINK-1930] [runtime] Improve exception when bufferpools have been shut down. NullPointerException in vertex-centric iteration Key: FLINK-1930 URL: https://issues.apache.org/jira/browse/FLINK-1930 Project: Flink Issue Type: Bug Affects Versions: 0.9 Reporter: Vasia Kalavri Hello to my Squirrels, I came across this exception when having a vertex-centric iteration output followed by a group by. I'm not sure what is causing it, since I saw this error in a rather large pipeline, but I managed to reproduce it with [this code example | https://github.com/vasia/flink/commit/1b7bbca1a6130fbcfe98b4b9b43967eb4c61f309] and a sufficiently large dataset, e.g. [this one | http://snap.stanford.edu/data/com-DBLP.html] (I'm running this locally). It seems like a null Buffer in RecordWriter. The exception message is the following: Exception in thread "main" org.apache.flink.runtime.client.JobExecutionException: Job execution failed. 
at org.apache.flink.runtime.jobmanager.JobManager$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:319) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33) at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25) at org.apache.flink.runtime.ActorLogMessages$anon$1.apply(ActorLogMessages.scala:37) at org.apache.flink.runtime.ActorLogMessages$anon$1.apply(ActorLogMessages.scala:30) at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118) at org.apache.flink.runtime.ActorLogMessages$anon$1.applyOrElse(ActorLogMessages.scala:30) at akka.actor.Actor$class.aroundReceive(Actor.scala:465) at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:94) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516) at akka.actor.ActorCell.invoke(ActorCell.scala:487) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254) at akka.dispatch.Mailbox.run(Mailbox.scala:221) at akka.dispatch.Mailbox.exec(Mailbox.scala:231) at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260) at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) Caused by: java.lang.NullPointerException at org.apache.flink.runtime.io.network.api.serialization.SpanningRecordSerializer.setNextBuffer(SpanningRecordSerializer.java:93) at org.apache.flink.runtime.io.network.api.writer.RecordWriter.emit(RecordWriter.java:92) at org.apache.flink.runtime.operators.shipping.OutputCollector.collect(OutputCollector.java:65) at org.apache.flink.runtime.iterative.task.IterationHeadPactTask.streamSolutionSetToFinalOutput(IterationHeadPactTask.java:405) at 
org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:365) at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:360) at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:221) at java.lang.Thread.run(Thread.java:745)
[jira] [Assigned] (FLINK-1934) Add approximative k-nearest-neighbours (kNN) algorithm to machine learning library
[ https://issues.apache.org/jira/browse/FLINK-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghav Chalapathy reassigned FLINK-1934: Assignee: Raghav Chalapathy Add approximative k-nearest-neighbours (kNN) algorithm to machine learning library -- Key: FLINK-1934 URL: https://issues.apache.org/jira/browse/FLINK-1934 Project: Flink Issue Type: New Feature Components: Machine Learning Library Reporter: Till Rohrmann Assignee: Raghav Chalapathy Labels: ML kNN is still a widely used algorithm for classification and regression. However, due to the computational costs of an exact implementation, it does not scale well to large amounts of data. Therefore, it is worthwhile to also add an approximative kNN implementation as proposed in [1,2]. Resources: [1] https://www.cs.utah.edu/~lifeifei/papers/mrknnj.pdf [2] http://www.computer.org/csdl/proceedings/wacv/2007/2794/00/27940028.pdf
[GitHub] flink pull request: [runtime] Bump Netty version to 4.27.Final and...
Github user uce closed the pull request at: https://github.com/apache/flink/pull/617
[GitHub] flink pull request: [runtime] Bump Netty version to 4.27.Final and...
GitHub user uce reopened a pull request: https://github.com/apache/flink/pull/617 [runtime] Bump Netty version to 4.27.Final and add javassist @rmetzger, javassist is set in the root POM. Is it OK to leave it in flink-runtime as I have it now w/o version? You can merge this pull request into a Git repository by running: $ git pull https://github.com/uce/incubator-flink javassist Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/617.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #617 commit 6a91d722005d7f8d138e8553f92a134c52209de8 Author: Ufuk Celebi u...@apache.org Date: 2015-04-22T12:50:38Z [runtime] Bump Netty version to 4.27.Final and add javassist
[GitHub] flink pull request: [FLINK-1930] [runtime] Improve exception when ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/619#issuecomment-95517313 @uce Do you see any further implications to the network stack via this change (exception instead of null)?
[GitHub] flink pull request: [FLINK-1930] [runtime] Improve exception when ...
GitHub user StephanEwen opened a pull request: https://github.com/apache/flink/pull/619 [FLINK-1930] [runtime] Improve exception when bufferpools have been shutdown You can merge this pull request into a Git repository by running: $ git pull https://github.com/StephanEwen/incubator-flink buffer_exception Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/619.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #619 commit d6d7c3bcf4cf0a3f1164875a6fe3a093aeb2a68b Author: Stephan Ewen se...@apache.org Date: 2015-04-22T15:52:21Z [FLINK-1930] [runtime] Improve exception when bufferpools have been shut down.
[GitHub] flink pull request: [FLINK-1472] Fixed Web frontend config overvie...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/439#issuecomment-95517673 Hi, sorry for the long wait on this. I really like the feature but the implementation is not scalable: If new config values are added this needs to be updated in several places now. Could you change ConfigConstants and add a static initializer block that builds the hash maps that you manually build in DefaultConfigKeyValues using reflection? The code would just need to loop through all fields that have _KEY at the end, and then find the matching default value without the _KEY at the end. From the default value field the type of the value can be determined and it can be added to the appropriate hash map. This way, the defaults will always stay up to date with the actual config constants.
[jira] [Commented] (FLINK-1472) Web frontend config overview shows wrong value
[ https://issues.apache.org/jira/browse/FLINK-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508756#comment-14508756 ] ASF GitHub Bot commented on FLINK-1472: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/439#issuecomment-95517673 Hi, sorry for the long wait on this. I really like the feature but the implementation is not scalable: If new config values are added this needs to be updated in several places now. Could you change ConfigConstants and add a static initializer block that builds the hash maps that you manually build in DefaultConfigKeyValues using reflection? The code would just need to loop through all fields that have _KEY at the end, and then find the matching default value without the _KEY at the end. From the default value field the type of the value can be determined and it can be added to the appropriate hash map. This way, the defaults will always stay up to date with the actual config constants. Web frontend config overview shows wrong value -- Key: FLINK-1472 URL: https://issues.apache.org/jira/browse/FLINK-1472 Project: Flink Issue Type: Bug Components: Webfrontend Affects Versions: master Reporter: Ufuk Celebi Assignee: Mingliang Qi Priority: Minor The web frontend shows configuration values even if they could not be correctly parsed. For example I've configured the number of buffers as 123.000, which cannot be parsed as an Integer by GlobalConfiguration and the default value is used. Still, the web frontend shows the not used 123.000.
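The reflection-based pairing suggested above can be sketched as follows. DemoConstants below stands in for Flink's ConfigConstants, and the "same field name minus _KEY" rule is taken from the comment; it is purely illustrative (the comment right below this one points out that the real class does not follow it uniformly).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Sketch: loop over all public static fields ending in "_KEY" and pair each
// config key with the default-value field of the same name without the suffix.
public class ConfigDefaults {

    /** Stand-in for ConfigConstants with one key/default pair per setting. */
    public static class DemoConstants {
        public static final String TASK_SLOTS_KEY = "taskmanager.numberOfTaskSlots";
        public static final int TASK_SLOTS = 1;
        public static final String HEARTBEAT_PAUSE_KEY = "akka.watch.heartbeat.pause";
        public static final String HEARTBEAT_PAUSE = "20 s";
    }

    /** Builds a map from config key to its default value via reflection. */
    public static Map<String, Object> defaults(Class<?> constants) throws Exception {
        Map<String, Object> result = new HashMap<>();
        for (Field field : constants.getFields()) {
            if (Modifier.isStatic(field.getModifiers()) && field.getName().endsWith("_KEY")) {
                String key = (String) field.get(null);
                // Matching default: same field name without the "_KEY" suffix.
                String defaultName = field.getName().substring(0, field.getName().length() - 4);
                Field defaultField = constants.getField(defaultName);
                result.put(key, defaultField.get(null));
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(defaults(DemoConstants.class).size()); // prints 2
    }
}
```

The type of each default (Integer, String, ...) falls out of `field.get(null)` automatically, which is the "type can be determined from the default value field" part of the suggestion.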
[GitHub] flink pull request: [FLINK-1398] Introduce extractSingleField() in...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/308#issuecomment-95522921 I would put this into the `flink-contrib` module.
[GitHub] flink pull request: [FLINK-1398] Introduce extractSingleField() in...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/308#issuecomment-95523597 @aljoscha Those are good points. I like the idea of cleaning the Java and Scala APIs from operations that work on structured data such as projection, aggregation, etc. and support those use cases through the Table API. Not sure if the Table API can serve as a complete replacement at this point in time, but moving it there is the right thing to do, IMO. But this discussion should happen on the dev mailing list, not in some PR comment thread ;-)
[jira] [Commented] (FLINK-1670) Collect method for streaming
[ https://issues.apache.org/jira/browse/FLINK-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508781#comment-14508781 ] ASF GitHub Bot commented on FLINK-1670: --- Github user ggevay commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-95523580 OK, I see your point. I am thinking about using NetUtils.findConnectingAddress to determine which interface is used for the communication with the cluster. For this, I would need an IP of something in the cluster to connect to. What would be the best way to get this info? (Maybe I could use RemoteStreamEnvironment.host, but it is a private member, so there is probably a more standard way that I'm just overlooking.) Collect method for streaming Key: FLINK-1670 URL: https://issues.apache.org/jira/browse/FLINK-1670 Project: Flink Issue Type: New Feature Components: Streaming Affects Versions: 0.9 Reporter: Márton Balassi Assignee: Gabor Gevay Priority: Minor A convenience method for streaming back the results of a job to the client. As the client itself is a bottleneck anyway an easy solution would be to provide a socket sink with degree of parallelism 1, from which a client utility can read.
[jira] [Commented] (FLINK-1398) A new DataSet function: extractElementFromTuple
[ https://issues.apache.org/jira/browse/FLINK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508782#comment-14508782 ] ASF GitHub Bot commented on FLINK-1398: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/308#issuecomment-95523597 @aljoscha Those are good points. I like the idea of cleaning the Java and Scala APIs from operations that work on structured data such as projection, aggregation, etc. and support those use cases through the Table API. Not sure if the Table API can serve as a complete replacement at this point in time, but moving it there is the right thing to do, IMO. But this discussion should happen on the dev mailing list, not in some PR comment thread ;-) A new DataSet function: extractElementFromTuple --- Key: FLINK-1398 URL: https://issues.apache.org/jira/browse/FLINK-1398 Project: Flink Issue Type: Wish Reporter: Felix Neutatz Assignee: Felix Neutatz Priority: Minor This is the use case: {code:xml} DataSet<Tuple2<Integer, Double>> data = env.fromElements(new Tuple2<Integer, Double>(1, 2.0)); data.map(new ElementFromTuple()); } public static final class ElementFromTuple implements MapFunction<Tuple2<Integer, Double>, Double> { @Override public Double map(Tuple2<Integer, Double> value) { return value.f1; } } {code} It would be awesome if we had something like this: {code:xml} data.extractElement(1); {code} This means that we implement a function for DataSet which extracts a certain element from a given Tuple.
[GitHub] flink pull request: [FLINK-1670] Made DataStream iterable
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-95528015 `NetUtils.findConnectingAddress` is a useful util, when you know that the endpoint is up already. If you can make the assumption that the master is running already you can use that. For remote environments, this is probably a fair assumption. For local environments, this may not matter anyways, localhost should work there. Beware of the setups where you start the cluster from one machine (that starts the master on that machine) and launch the program from the same one. This whole network discovery is simply a bit tricky ;-) --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Commented] (FLINK-1670) Collect method for streaming
[ https://issues.apache.org/jira/browse/FLINK-1670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508803#comment-14508803 ] ASF GitHub Bot commented on FLINK-1670: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/581#issuecomment-95528015 `NetUtils.findConnectingAddress` is a useful util, when you know that the endpoint is up already. If you can make the assumption that the master is running already you can use that. For remote environments, this is probably a fair assumption. For local environments, this may not matter anyways, localhost should work there. Beware of the setups where you start the cluster from one machine (that starts the master on that machine) and launch the program from the same one. This whole network discovery is simply a bit tricky ;-)
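For reference, the core trick behind utilities like `NetUtils.findConnectingAddress` can be sketched with plain JDK calls: connecting a UDP socket sends no packets, but it makes the OS routing table pick the local interface that would be used to reach the given remote endpoint. This is an illustrative sketch of the technique, not Flink's actual implementation:

```java
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.SocketException;

public class ConnectingAddress {

    // Returns the local address the OS would use to talk to the given remote
    // endpoint. connect() on a UDP socket transmits nothing; it only fixes the
    // route, so the remote side does not even have to be up.
    public static InetAddress localAddressFor(String host, int port) throws SocketException {
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.connect(new InetSocketAddress(host, port));
            return socket.getLocalAddress();
        }
    }
}
```

This is why the comment above notes that the endpoint being "up already" matters mainly for TCP-based probing; the routing-table trick itself works regardless, but choosing a sensible remote address to probe against is still the open question in the thread.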
[GitHub] flink pull request: [FLINK-1789] [core] [runtime] [java-api] Allow...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/593#issuecomment-95564844 Yes, this is true, but the way it is implemented, the folders are not always added to the class loader. Maybe I'm wrong here, but JobWithJars.getUserCodeClassLoader and JobWithJars.buildUserCodeClassLoader don't add the URLs to the ClassLoader that they create.
[jira] [Commented] (FLINK-1789) Allow adding of URLs to the usercode class loader
[ https://issues.apache.org/jira/browse/FLINK-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508954#comment-14508954 ] ASF GitHub Bot commented on FLINK-1789: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/593#issuecomment-95564844 Yes, this is true, but the way it is implemented, the folders are not always added to the class loader. Maybe I'm wrong here, but JobWithJars.getUserCodeClassLoader and JobWithJars.buildUserCodeClassLoader don't add the URLs to the ClassLoader that they create. Allow adding of URLs to the usercode class loader - Key: FLINK-1789 URL: https://issues.apache.org/jira/browse/FLINK-1789 Project: Flink Issue Type: Improvement Components: Distributed Runtime Reporter: Timo Walther Assignee: Timo Walther Priority: Minor Currently, there is no option to add custom classpath URLs to the FlinkUserCodeClassLoader. JARs always need to be shipped to the cluster even if they are already present on all nodes. It would be great if RemoteEnvironment also accepted valid classpath URLs and forwarded them to BlobLibraryCacheManager.
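The point raised in the review amounts to actually handing the collected URLs to the `URLClassLoader` that gets created. A minimal sketch of such a builder (the class and method names here are hypothetical, not Flink's `JobWithJars` API):

```java
import java.net.URL;
import java.net.URLClassLoader;

public class UserCodeClassLoaders {

    // Builds a class loader that resolves classes from the given jar/classpath
    // URLs, delegating to the parent first (standard URLClassLoader behaviour).
    // The crucial detail from the review comment: the URLs must be passed to
    // the constructor; a loader created without them silently ignores them.
    public static ClassLoader buildUserCodeClassLoader(URL[] urls, ClassLoader parent) {
        return new URLClassLoader(urls, parent);
    }
}
```

With parent-first delegation, classes already on the system classpath still resolve normally, while user-code classes are looked up in the supplied jar and directory URLs.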
[GitHub] flink pull request: [FLINK-1398] Introduce extractSingleField() in...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/308#issuecomment-95566780 Yes, I think we should start a discussion there. I just wanted to give the reasons for my opinion here.
[jira] [Commented] (FLINK-1398) A new DataSet function: extractElementFromTuple
[ https://issues.apache.org/jira/browse/FLINK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508957#comment-14508957 ] ASF GitHub Bot commented on FLINK-1398: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/308#issuecomment-95566780 Yes, I think we should start a discussion there. I just wanted to give the reasons for my opinion here.
[GitHub] flink pull request: [FLINK-938] Automatically configure the jobman...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/48#issuecomment-95579798 Thanks for the PR. I've looked into the other approach (#248) and I think you can safely close this one now.
[jira] [Commented] (FLINK-1486) Add a string to the print method to identify output
[ https://issues.apache.org/jira/browse/FLINK-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509016#comment-14509016 ] ASF GitHub Bot commented on FLINK-1486: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/372#issuecomment-95580456 From what I have seen, most people expect print() to actually go to the client. It would be a break of backwards compatibility, agreed. Maybe one of the few that are necessary. Add a string to the print method to identify output --- Key: FLINK-1486 URL: https://issues.apache.org/jira/browse/FLINK-1486 Project: Flink Issue Type: Improvement Components: Local Runtime Reporter: Maximilian Michels Assignee: Maximilian Michels Priority: Minor Labels: usability Fix For: 0.9 The output of the {{print}} method of {{DataSet}} is mainly used for debug purposes. Currently, it is difficult to identify the output. I would suggest to add another {{print(String str)}} method which allows the user to supply a String to identify the output. This could be a prefix before the actual output or a format string (which might be an overkill). {code} DataSet<Integer> data = env.fromElements(1, 2, 3, 4, 5); {code} For example, {{data.print("MyDataSet: ")}} would print {noformat} MyDataSet: 1 MyDataSet: 2 ... {noformat}
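The proposed `print(String)` semantics are easy to pin down with a plain-Java sketch (no Flink involved; `printWithPrefix` is a hypothetical helper, not an existing API): each record goes on its own line, preceded by the user-supplied sink identifier.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PrefixedPrint {

    // Formats records the way data.print("MyDataSet: ") is proposed to:
    // one entry per record, each prefixed by the identifier.
    public static List<String> printWithPrefix(String prefix, List<?> records) {
        return records.stream()
                .map(r -> prefix + r)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        printWithPrefix("MyDataSet: ", Arrays.asList(1, 2, 3, 4, 5))
                .forEach(System.out::println);
    }
}
```

A format-string variant would replace the simple concatenation with `String.format(pattern, r)`, which is the "might be overkill" option mentioned in the issue.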
[GitHub] flink pull request: [FLINK-1486] add print method for prefixing a ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/372#issuecomment-95580456 From what I have seen, most people expect print() to actually go to the client. It would be a break of backwards compatibility, agreed. Maybe one of the few that are necessary.
[jira] [Commented] (FLINK-1867) TaskManagerFailureRecoveryITCase causes stalled travis builds
[ https://issues.apache.org/jira/browse/FLINK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509029#comment-14509029 ] ASF GitHub Bot commented on FLINK-1867: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/612#discussion_r28961064 --- Diff: flink-tests/src/test/scala/org/apache/flink/api/scala/runtime/taskmanager/TaskManagerFailsITCase.scala --- @@ -231,9 +231,9 @@ with WordSpecLike with Matchers with BeforeAndAfterAll { config.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numSlots) config.setInteger(ConfigConstants.LOCAL_INSTANCE_MANAGER_NUMBER_TASK_MANAGER, numTaskmanagers) config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1000 ms") -config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "4000 ms") +config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "20 s") --- End diff -- True, the heartbeats go only between actor systems, not actors. If everything is local, no heartbeats happen anyways. May as well drop this config entry then, I suppose... TaskManagerFailureRecoveryITCase causes stalled travis builds - Key: FLINK-1867 URL: https://issues.apache.org/jira/browse/FLINK-1867 Project: Flink Issue Type: Bug Components: TaskManager, Tests Affects Versions: 0.9 Reporter: Robert Metzger Assignee: Aljoscha Krettek There are currently tests on travis failing: https://travis-ci.org/apache/flink/jobs/57943063
[GitHub] flink pull request: [FLINK-1472] Fixed Web frontend config overvie...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/439#issuecomment-95580562 Yes, but then we should change this now and not build more code on top of this that can fail in the future if someone forgets to add the names to the correct hash set in some other class.
[jira] [Commented] (FLINK-1472) Web frontend config overview shows wrong value
[ https://issues.apache.org/jira/browse/FLINK-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509032#comment-14509032 ] ASF GitHub Bot commented on FLINK-1472: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/439#issuecomment-95581307 I added a Jira for this: https://issues.apache.org/jira/browse/FLINK-1936 Web frontend config overview shows wrong value -- Key: FLINK-1472 URL: https://issues.apache.org/jira/browse/FLINK-1472 Project: Flink Issue Type: Bug Components: Webfrontend Affects Versions: master Reporter: Ufuk Celebi Assignee: Mingliang Qi Priority: Minor The web frontend shows configuration values even if they could not be correctly parsed. For example, I've configured the number of buffers as 123.000, which cannot be parsed as an Integer by GlobalConfiguration, so the default value is used. Still, the web frontend shows the unused 123.000.
[jira] [Created] (FLINK-1936) Normalize keys and default values in ConfigConstants.java
Aljoscha Krettek created FLINK-1936: --- Summary: Normalize keys and default values in ConfigConstants.java Key: FLINK-1936 URL: https://issues.apache.org/jira/browse/FLINK-1936 Project: Flink Issue Type: Improvement Reporter: Aljoscha Krettek We have to find a way to normalize these. This is, for example, required as part of [FLINK-1472]. One idea would be to force a _KEY suffix for the string key and a DEFAULT_ prefix for the default value. Another idea would be to specify the default value using an Annotation.
[GitHub] flink pull request: [FLINK-1472] Fixed Web frontend config overvie...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/439#issuecomment-95581307 I added a Jira for this: https://issues.apache.org/jira/browse/FLINK-1936
[jira] [Commented] (FLINK-1828) Impossible to output data to an HBase table
[ https://issues.apache.org/jira/browse/FLINK-1828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509034#comment-14509034 ] ASF GitHub Bot commented on FLINK-1828: --- Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/571#discussion_r28961417 --- Diff: flink-staging/flink-hbase/pom.xml --- @@ -112,6 +112,12 @@ under the License. </exclusion> </exclusions> </dependency> + <dependency> --- End diff -- Fair enough. Then the dependency should not be in test scope, but in the default scope, so users get this dependency into their fat jar as well when using the HBase output format. May be worth to define a few exclusions, though, to not get the complete tail of transitive HBase dependencies (I think that even includes JRuby and so on) Impossible to output data to an HBase table --- Key: FLINK-1828 URL: https://issues.apache.org/jira/browse/FLINK-1828 Project: Flink Issue Type: Bug Components: Hadoop Compatibility Affects Versions: 0.9 Reporter: Flavio Pompermaier Labels: hadoop, hbase Fix For: 0.9 Right now it is not possible to use HBase TableOutputFormat as output format because Configurable.setConf is not called in the configure() method of the HadoopOutputFormatBase
[jira] [Commented] (FLINK-1486) Add a string to the print method to identify output
[ https://issues.apache.org/jira/browse/FLINK-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509039#comment-14509039 ] ASF GitHub Bot commented on FLINK-1486: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/372#issuecomment-95583993 @StephanEwen Don't think printing on the client can be worse if the output still contains information about the producers (e.g. by a task id). IMO, a sink identifier could still make sense when you make multiple calls to print and want to distinguish easily between the outputs.
[GitHub] flink pull request: [FLINK-938] Automatically configure the jobmana...
Github user uce commented on the pull request: https://github.com/apache/flink/pull/248#issuecomment-95586174 I've rebased this on the current master, added log messages when falling back to the default address (which I've renamed as Till suggested), and tested it. There is one remaining issue though: When starting the task manager, you use `hostname` as the default JM address. As @StephanEwen pointed out in another thread [1], this will only work for certain setups. Then the question is whether we really improve deployment with this or not. @rmetzger, you reported the original issue. Do you (and others of course) have an idea about this? [1] https://github.com/apache/flink/pull/581#issuecomment-94911176 PS: The rebased branch is here, if you have time to pick it up from there: https://github.com/uce/incubator-flink/tree/jmaddress
[jira] [Commented] (FLINK-1486) Add a string to the print method to identify output
[ https://issues.apache.org/jira/browse/FLINK-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509051#comment-14509051 ] ASF GitHub Bot commented on FLINK-1486: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/372#issuecomment-95586431 Ah, okay. You mean we have two methods: 1. `print()` which just prints everything onto the client 2. `printWithTaskid()`, which prefixes lines with the task ID?
[GitHub] flink pull request: [FLINK-924] Add automatic dependency retrieval...
Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/35#issuecomment-95590430 Hi @qmlmoon, sorry for the long wait on this PR. Could you please rebase on top of the current master and also get rid of the merge commits in the process?
[jira] [Commented] (FLINK-924) Extend JarFileCreator to automatically include dependencies
[ https://issues.apache.org/jira/browse/FLINK-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509075#comment-14509075 ] ASF GitHub Bot commented on FLINK-924: -- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/35#issuecomment-95590430 Hi @qmlmoon, sorry for the long wait on this PR. Could you please rebase on top of the current master and also get rid of the merge commits in the process? Extend JarFileCreator to automatically include dependencies --- Key: FLINK-924 URL: https://issues.apache.org/jira/browse/FLINK-924 Project: Flink Issue Type: Improvement Reporter: Ufuk Celebi Assignee: Mingliang Qi Priority: Minor We have a simple {{JarFileCreator}}, which allows adding classes to a JAR file as follows: {code:java} JarFileCreator jfc = new JarFileCreator(jarFile); jfc.addClass(X.class); jfc.addClass(Y.class); jfc.createJarFile(); {code} The created file can then be used with the remote execution environment, which requires a JAR file to ship. I propose the following improvement: use [ASM|http://asm.ow2.org/] to extract all dependencies and create the JAR file automatically. There is an [old tutorial|http://asm.ow2.org/doc/tutorial-asm-2.0.html] (for ASM 2), which implements a {{DependencyVisitor}}. Unfortunately the code does not directly work with ASM 5, but it should be a good starting point.
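The mechanism a `JarFileCreator` builds on is the standard `java.util.jar` API; the ASM-based dependency discovery proposed above would only feed additional class names into it. A self-contained sketch of that packaging step (class and method names here are hypothetical, not Flink's actual implementation):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class SimpleJarCreator {

    // Writes the .class file of each given class into a jar stream. A dependency
    // visitor (e.g. ASM's) would extend the set of classes with transitive
    // dependencies before this step runs.
    public static void createJar(OutputStream out, Class<?>... classes) throws IOException {
        try (JarOutputStream jar = new JarOutputStream(out)) {
            for (Class<?> clazz : classes) {
                String entryName = clazz.getName().replace('.', '/') + ".class";
                jar.putNextEntry(new JarEntry(entryName));
                // Read the class bytes from the classpath that loaded the class.
                try (InputStream classBytes =
                        clazz.getClassLoader().getResourceAsStream(entryName)) {
                    if (classBytes == null) {
                        throw new IOException("Cannot locate class file for " + clazz);
                    }
                    classBytes.transferTo(jar); // Java 9+; copy in a loop on older JDKs
                }
                jar.closeEntry();
            }
        }
    }
}
```

The ASM part would sit in front of this: a `ClassVisitor` records every referenced type while reading each class, and the resulting closure of class names is what gets written to the jar.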
[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address
[ https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509010#comment-14509010 ] ASF GitHub Bot commented on FLINK-938: -- Github user uce commented on the pull request: https://github.com/apache/flink/pull/48#issuecomment-95579798 Thanks for the PR. I've looked into the other approach (#248) and I think you can safely close this one now. Change start-cluster.sh script so that users don't have to configure the JobManager address --- Key: FLINK-938 URL: https://issues.apache.org/jira/browse/FLINK-938 Project: Flink Issue Type: Improvement Components: Build System Reporter: Robert Metzger Assignee: Mingliang Qi Priority: Minor Fix For: 0.9 To improve the user experience, Flink should not require users to configure the JobManager's address on a cluster. In combination with FLINK-934, this would allow running Flink with decent performance on a cluster without setting a single configuration value.
[GitHub] flink pull request: [FLINK-1486] add print method for prefixing a ...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/372#issuecomment-95580149 Can you think of a case where printing on the client is worse than printing on the worker?
[GitHub] flink pull request: [FLINK-938] Automatically configure the jobman...
Github user qmlmoon closed the pull request at: https://github.com/apache/flink/pull/48
[jira] [Commented] (FLINK-938) Change start-cluster.sh script so that users don't have to configure the JobManager address
[ https://issues.apache.org/jira/browse/FLINK-938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509013#comment-14509013 ] ASF GitHub Bot commented on FLINK-938: -- Github user qmlmoon closed the pull request at: https://github.com/apache/flink/pull/48
[jira] [Commented] (FLINK-1486) Add a string to the print method to identify output
[ https://issues.apache.org/jira/browse/FLINK-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509012#comment-14509012 ] ASF GitHub Bot commented on FLINK-1486: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/372#issuecomment-95580149 Can you think of a case where printing on the client is worse than printing on the worker?
[jira] [Commented] (FLINK-1472) Web frontend config overview shows wrong value
[ https://issues.apache.org/jira/browse/FLINK-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14509017#comment-14509017 ] ASF GitHub Bot commented on FLINK-1472: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/439#issuecomment-95580562 Yes, but then we should change this now and not build more code on top of this that can fail in the future if someone forgets to add the names to the correct hash set in some other class.
[GitHub] flink pull request: [FLINK-1867/1880] Raise test timeouts in hope ...
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/612#discussion_r28961064 --- Diff: flink-tests/src/test/scala/org/apache/flink/api/scala/runtime/taskmanager/TaskManagerFailsITCase.scala --- @@ -231,9 +231,9 @@ with WordSpecLike with Matchers with BeforeAndAfterAll { config.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numSlots) config.setInteger(ConfigConstants.LOCAL_INSTANCE_MANAGER_NUMBER_TASK_MANAGER, numTaskmanagers) config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1000 ms") -config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "4000 ms") +config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "20 s") --- End diff -- True, the heartbeats go only between actor systems, not actors. If everything is local, no heartbeats happen anyways. May as well drop this config entry then, I suppose...
[GitHub] flink pull request: Fixed Configurable HadoopOutputFormat (FLINK-1...
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/571#discussion_r28961417 --- Diff: flink-staging/flink-hbase/pom.xml --- @@ -112,6 +112,12 @@ under the License. </exclusion> </exclusions> </dependency> + <dependency> --- End diff -- Fair enough. Then the dependency should not be in test scope, but in the default scope, so users get this dependency into their fat jar as well when using the HBase output format. May be worth to define a few exclusions, though, to not get the complete tail of transitive HBase dependencies (I think that even includes JRuby and so on)
[GitHub] flink pull request: [FLINK-1867/1880] Raise test timeouts in hope ...
Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/612#discussion_r28961799 --- Diff: flink-tests/src/test/scala/org/apache/flink/api/scala/runtime/taskmanager/TaskManagerFailsITCase.scala --- @@ -231,9 +231,9 @@ with WordSpecLike with Matchers with BeforeAndAfterAll { config.setInteger(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, numSlots) config.setInteger(ConfigConstants.LOCAL_INSTANCE_MANAGER_NUMBER_TASK_MANAGER, numTaskmanagers) config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_INTERVAL, "1000 ms") -config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "4000 ms") +config.setString(ConfigConstants.AKKA_WATCH_HEARTBEAT_PAUSE, "20 s") --- End diff -- Yes, removing them also from JobManagerFailsITCase