[jira] [Resolved] (FLINK-1546) Failed job causes JobManager to shutdown due to uncatched WebFrontend exception

2015-03-17 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1546.
-
   Resolution: Fixed
Fix Version/s: 0.9

Job Archiving was fixed in 8ae0dc2d768aecfa3129df553f43d827792b65d7

 Failed job causes JobManager to shutdown due to uncatched WebFrontend 
 exception
 ---

 Key: FLINK-1546
 URL: https://issues.apache.org/jira/browse/FLINK-1546
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9
Reporter: Robert Metzger
 Fix For: 0.9


 {code}
 16:59:26,588 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Status of job ef19b2b201d4b81f031334cb76eadc78 (Basic Page Rank Example) changed to FAILED
 Cleanup job ef19b2b201d4b81f031334cb76eadc78..
 16:59:26,591 ERROR akka.actor.OneForOneStrategy - Can only archive the job from a terminal state
 java.lang.IllegalStateException: Can only archive the job from a terminal state
 	at org.apache.flink.runtime.executiongraph.ExecutionGraph.prepareForArchiving(ExecutionGraph.java:648)
 	at org.apache.flink.runtime.jobmanager.JobManager.org$apache$flink$runtime$jobmanager$JobManager$$removeJob(JobManager.scala:508)
 	at org.apache.flink.runtime.jobmanager.JobManager$$anonfun$receiveWithLogMessages$1.applyOrElse(JobManager.scala:271)
 	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
 	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
 	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
 	at org.apache.flink.yarn.YarnJobManager$$anonfun$receiveYarnMessages$1.applyOrElse(YarnJobManager.scala:70)
 	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
 	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:37)
 	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:30)
 	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
 	at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:30)
 	at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
 	at org.apache.flink.runtime.jobmanager.JobManager.aroundReceive(JobManager.scala:86)
 	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
 	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
 	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
 	at akka.dispatch.Mailbox.run(Mailbox.scala:221)
 	at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
 	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 16:59:26,595 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Stopping webserver.
 16:59:26,654 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Stopped webserver.
 16:59:26,656 INFO  org.apache.flink.yarn.ApplicationMaster$$anonfun$2$$anon$1 - Stopping job manager akka://flink/user/jobmanager.
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1635) Remove Apache Thrift dependency from Flink

2015-03-17 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366023#comment-14366023
 ] 

Stephan Ewen commented on FLINK-1635:
-

[~rmetzger] This is removed and fixed, if I understand correctly?

 Remove Apache Thrift dependency from Flink
 --

 Key: FLINK-1635
 URL: https://issues.apache.org/jira/browse/FLINK-1635
 Project: Flink
  Issue Type: Bug
  Components: Java API
Affects Versions: 0.9
Reporter: Robert Metzger

 I've added Thrift and Protobuf to Flink to support them out of the box with 
 Kryo.
 However, after trying to access an HCatalog/Hive table yesterday using Flink, I 
 found that there is a dependency conflict between Flink and Hive (on Thrift).
 Maybe it makes more sense to properly document our serialization framework 
 and provide a copy-paste solution on how to get Thrift/Protobuf et al. to 
 work with Flink.
 Please chime in if you are against removing the out-of-the-box support for 
 Protobuf and Kryo.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1584) Spurious failure of TaskManagerFailsITCase

2015-03-17 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1584.
-
Resolution: Fixed

Fixed with the switch to the newer akka version (enabled by shading away 
conflicting dependencies)

84e76f4d3274e07176f7377b7b739b6f180c6296

 Spurious failure of TaskManagerFailsITCase
 --

 Key: FLINK-1584
 URL: https://issues.apache.org/jira/browse/FLINK-1584
 Project: Flink
  Issue Type: Bug
Reporter: Till Rohrmann
Assignee: Till Rohrmann
 Fix For: 0.9


 The {{TaskManagerFailsITCase}} fails spuriously on Travis. The reason might 
 be that different test cases try to access the same {{JobManager}}.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1500) exampleScalaPrograms.EnumTriangleOptITCase does not finish on Travis

2015-03-17 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366001#comment-14366001
 ] 

Stephan Ewen commented on FLINK-1500:
-

Have we seen this again, or was this an artifact of one of the bugs we fixed in 
the last few weeks (like the intermediate result partition lookup)?

 exampleScalaPrograms.EnumTriangleOptITCase does not finish on Travis
 

 Key: FLINK-1500
 URL: https://issues.apache.org/jira/browse/FLINK-1500
 Project: Flink
  Issue Type: Bug
Reporter: Till Rohrmann

 The test case 
 org.apache.flink.test.exampleScalaPrograms.EnumTriangleOptITCase does not 
 finish on Travis. This problem is non-deterministic.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (FLINK-1535) Use usercode class loader to serialize/deserialize accumulators

2015-03-17 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-1535:

Priority: Blocker  (was: Major)

 Use usercode class loader to serialize/deserialize accumulators
 ---

 Key: FLINK-1535
 URL: https://issues.apache.org/jira/browse/FLINK-1535
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Blocker
 Fix For: 0.9


 Currently, accumulators are transferred via simple Akka Messages. Since the 
 accumulators may be user defined types, we should use the user code class 
 loader for code loading when deserializing them.
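
 The technique described above is standard Java serialization with a custom
 class-resolution step. A minimal, framework-independent sketch (this is not
 Flink's actual implementation; the class name is invented for illustration):

 ```java
 import java.io.ByteArrayInputStream;
 import java.io.IOException;
 import java.io.InputStream;
 import java.io.ObjectInputStream;
 import java.io.ObjectStreamClass;

 // Deserializes an object while resolving classes against a supplied class
 // loader instead of the default one. This mirrors the technique described
 // above: user-defined accumulator types must be looked up in the user-code
 // class loader, not the JobManager's system class loader.
 class ClassLoaderObjectInputStream extends ObjectInputStream {
     private final ClassLoader classLoader;

     ClassLoaderObjectInputStream(InputStream in, ClassLoader classLoader) throws IOException {
         super(in);
         this.classLoader = classLoader;
     }

     @Override
     protected Class<?> resolveClass(ObjectStreamClass desc)
             throws IOException, ClassNotFoundException {
         try {
             return Class.forName(desc.getName(), false, classLoader);
         } catch (ClassNotFoundException e) {
             // fall back to default resolution (e.g. for primitive type descriptors)
             return super.resolveClass(desc);
         }
     }

     // convenience helper: deserialize a byte array with the given class loader
     static Object deserialize(byte[] bytes, ClassLoader loader)
             throws IOException, ClassNotFoundException {
         try (ClassLoaderObjectInputStream ois =
                 new ClassLoaderObjectInputStream(new ByteArrayInputStream(bytes), loader)) {
             return ois.readObject();
         }
     }
 }
 ```

 In the accumulator scenario, `loader` would be the user-code class loader of
 the job whose accumulators are being reported.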



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1459) Collect DataSet to client

2015-03-17 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1459.
-
   Resolution: Fixed
Fix Version/s: 0.9
 Assignee: Stephan Ewen

This has been implemented a while back, actually. Sorry for the late update.

Implemented in 3dc2fe1dc300146e5209023274c0b0d04277f9ee

 Collect DataSet to client
 -

 Key: FLINK-1459
 URL: https://issues.apache.org/jira/browse/FLINK-1459
 Project: Flink
  Issue Type: Improvement
Reporter: John Sandiford
Assignee: Stephan Ewen
 Fix For: 0.9


 Hi, I may well have missed something obvious here but I cannot find an easy 
 way to extract the values in a DataSet to the client.  Spark has collect, 
 collectAsMap etc...  
 (I need to pass the values from a small aggregated DataSet back to a machine 
 learning library which is controlling the iterations.)
 The only way I could find to do this was to implement my own in memory 
 OutputFormat.  This is not ideal, but does work.
 Many thanks, John
   
 val env = ExecutionEnvironment.getExecutionEnvironment
   val data: DataSet[Double] = env.fromElements(1.0, 2.0, 3.0, 4.0)
   val result = data.reduce((a, b) => a)
   val valuesOnClient = result.???
   env.execute("Simple example")
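
 The in-memory OutputFormat workaround described above can be sketched without
 Flink dependencies: each parallel writer task appends records into a shared,
 thread-safe collection that the client reads back after execution. All class
 and method names below are illustrative, not Flink's API, and the trick only
 works when client and tasks share one JVM (local execution):

 ```java
 import java.util.ArrayList;
 import java.util.List;
 import java.util.Queue;
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.ConcurrentLinkedQueue;
 import java.util.concurrent.ConcurrentMap;

 // Sketch of an "in-memory output format": records written by (possibly
 // parallel) tasks land in a static map keyed by a collection id, which the
 // client reads after the job finishes.
 class InMemoryCollector<T> {
     private static final ConcurrentMap<String, Queue<Object>> RESULTS =
             new ConcurrentHashMap<>();
     private final String id;

     InMemoryCollector(String id) {
         this.id = id;
         RESULTS.putIfAbsent(id, new ConcurrentLinkedQueue<>());
     }

     // called by each task for every record it receives
     void writeRecord(T record) {
         RESULTS.get(id).add(record);
     }

     // called on the client after execution returns
     List<T> getCollectedValues() {
         List<T> out = new ArrayList<>();
         for (Object o : RESULTS.get(id)) {
             @SuppressWarnings("unchecked")
             T t = (T) o;
             out.add(t);
         }
         return out;
     }
 }
 ```

 The proper fix, per the resolution above, was a real collect-to-client
 facility in the API itself rather than this same-JVM workaround.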



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1467) Job deployment fails with NPE on JobManager, if TMs did not start properly

2015-03-17 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366026#comment-14366026
 ] 

Stephan Ewen commented on FLINK-1467:
-

The null-pointer exception was fixed in one of the TaskManager / Akka exception 
reworks.

The fix for the root cause (TaskManagers fail fast when memory initialization 
fails) is part of [FLINK-1580].

I am closing this as a duplicate.

 Job deployment fails with NPE on JobManager, if TMs did not start properly
 --

 Key: FLINK-1467
 URL: https://issues.apache.org/jira/browse/FLINK-1467
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Reporter: Robert Metzger

 I have a Flink cluster where all TaskManagers died 
 (misconfiguration). The JobManager needs more than 200 seconds to realize 
 that (on the TaskManagers overview, you see timeouts > 200). When submitting 
 a job, you'll get the following exception:
 {code}
 org.apache.flink.client.program.ProgramInvocationException: The program execution failed: java.lang.Exception: Failed to deploy the task CHAIN DataSource (Generator: class io.airlift.tpch.NationGenerator) - Map (Map at writeAsFormattedText(DataSet.java:1132)) (1/1) - execution #0 to slot SubSlot 0 (f8d11026ec5a11f0b273184c74ec4f29 (0) - ALLOCATED/ALIVE): java.lang.NullPointerException
 	at org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$submitTask(TaskManager.scala:346)
 	at org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:248)
 	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
 	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
 	at scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
 	at org.apache.flink.yarn.YarnTaskManager$$anonfun$receiveYarnMessages$1.applyOrElse(YarnTaskManager.scala:32)
 	at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
 	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:41)
 	at org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:27)
 	at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
 	at org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:27)
 	at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
 	at org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:78)
 	at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
 	at akka.actor.ActorCell.invoke(ActorCell.scala:487)
 	at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
 	at akka.dispatch.Mailbox.run(Mailbox.scala:221)
 	at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
 	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 	at org.apache.flink.runtime.executiongraph.Execution$2.onComplete(Execution.java:311)
 	at akka.dispatch.OnComplete.internal(Future.scala:247)
 	at akka.dispatch.OnComplete.internal(Future.scala:244)
 	at akka.dispatch.japi$CallbackBridge.apply(Future.scala:174)
 	at akka.dispatch.japi$CallbackBridge.apply(Future.scala:171)
 	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
 	at scala.concurrent.impl.ExecutionContextImpl$$anon$3.exec(ExecutionContextImpl.scala:107)
 	at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 	at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 	at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 	at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 	at org.apache.flink.client.program.Client.run(Client.java:345)
 	at org.apache.flink.client.program.Client.run(Client.java:304)
 	at org.apache.flink.client.program.Client.run(Client.java:298)
 	at org.apache.flink.client.program.ContextEnvironment.execute(ContextEnvironment.java:55)
 	at flink.generators.programs.TPCHGenerator.main(TPCHGenerator.java:80)
 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 	at java.lang.reflect.Method.invoke(Method.java:606)
 	at 

[jira] [Resolved] (FLINK-947) Add support for Named Datasets

2015-03-20 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-947.

   Resolution: Fixed
Fix Version/s: 0.9

Merged under {{flink-staging/flink-expressions}}

 Add support for Named Datasets
 

 Key: FLINK-947
 URL: https://issues.apache.org/jira/browse/FLINK-947
 Project: Flink
  Issue Type: New Feature
  Components: Java API
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
Priority: Minor
 Fix For: 0.9


 This would create an API that is a mix between SQL like declarativity and the 
 power of user defined functions. Example user code could look like this:
 {code:Java}
 NamedDataSet one = ...
 NamedDataSet two = ...
 NamedDataSet result = one.join(two).where("key").equalTo("otherKey")
   .project("a", "b", "c")
   .map( (UserTypeIn in) -> new UserTypeOut(...) )
   .print();
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1018) Logistic Regression deadlocks

2015-03-20 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1018.
-
   Resolution: Fixed
Fix Version/s: 0.9

Fixed via 9c77f0785e43326521da5e535f9ab1f05a9c6280

 Logistic Regression deadlocks
 -

 Key: FLINK-1018
 URL: https://issues.apache.org/jira/browse/FLINK-1018
 Project: Flink
  Issue Type: Bug
Reporter: Markus Holzemer
 Fix For: 0.9

 Attachments: LogisticRegression.java


 We are currently running our implementation of logistic regression with batch 
 gradient descent on the cluster.
 Unfortunately, for datasets > 1 GB it seems to deadlock inside of the 
 iteration. This means the first iteration never finishes.
 The iteration does a map over all points; the map gets the iteration input as a 
 broadcast variable. The result of the map is reduced, and the result of the 
 reducer (1 tuple) is crossed with the iteration input.
 There should be no reason for the deadlock, since the data is still quite 
 small compared to the cluster size (4 nodes with 32 GB each). Also, the data 
 size stays constant throughout the algorithm.
 Here is the generated plan. I will also attach the full algorithm.
 {code}
 {
     "nodes": [
     {
         "id": 2,
         "type": "source",
         "pact": "Data Source",
         "contents": "[([0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.",
         "parallelism": 1,
         "subtasks_per_instance": 1,
         "global_properties": [
             { "name": "Partitioning", "value": "RANDOM" },
             { "name": "Partitioning Order", "value": "(none)" },
             { "name": "Uniqueness", "value": "not unique" }
         ],
         "local_properties": [
             { "name": "Order", "value": "(none)" },
             { "name": "Grouping", "value": "not grouped" },
             { "name": "Uniqueness", "value": "not unique" }
         ],
         "estimates": [
             { "name": "Est. Output Size", "value": "(unknown)" },
             { "name": "Est. Cardinality", "value": "(unknown)" }
         ],
         "costs": [
             { "name": "Network", "value": "0.0 B" },
             { "name": "Disk I/O", "value": "0.0 B" },
             { "name": "CPU", "value": "0.0" },
             { "name": "Cumulative Network", "value": "0.0 B" },
             { "name": "Cumulative Disk I/O", "value": "0.0 B" },
             { "name": "Cumulative CPU", "value": "0.0" }
         ],
         "compiler_hints": [
             { "name": "Output Size (bytes)", "value": "(none)" },
             { "name": "Output Cardinality", "value": "(none)" },
             { "name": "Avg. Output Record Size (bytes)", "value": "(none)" },
             { "name": "Filter Factor", "value": "(none)" }
         ]
     },
     {
         "step_function": [
         {
             "id": 8,
             "type": "source",
             "pact": "Data Source",
             "contents": "TextInputFormat (hdfs://cloud-7:45010/tmp/input/higgs.M.txt) - UTF-8",
             "parallelism": 64,
             "subtasks_per_instance": 16,
             "global_properties": [
                 { "name": "Partitioning", "value": "RANDOM" },
                 { "name": "Partitioning Order", "value": "(none)" },
                 { "name": "Uniqueness", "value": "not unique" }
             ],
             "local_properties": [
                 { "name": "Order", "value": "(none)" },
                 { "name": "Grouping", "value": "not grouped" },
                 { "name": "Uniqueness", "value": "not unique" }
             ],
             "estimates": [
                 { "name": "Est. Output Size", "value": "8.0.31 GB" },
                 { "name": "Est. Cardinality", "value": "109.90 M" }
             ],
             "costs": [
                 { "name": "Network", "value": "0.0 B" },
                 { "name": "Disk I/O", "value": "8.0.31 GB" },
                 { "name": "CPU", "value": "0.0" },
                 { "name": "Cumulative Network", "value": "0.0 B" },
                 { "name": "Cumulative Disk I/O", "value": "8.0.31 GB" },
                 { "name": "Cumulative CPU", "value": "0.0" }
             ],
             "compiler_hints": [
                 { "name": "Output Size (bytes)", "value": "(none)" },
                 { "name": "Output Cardinality", "value": "(none)" },
                 { "name": "Avg. Output Record Size (bytes)", "value": "(none)" },
                 { "name": "Filter Factor", "value": "(none)" }
             ]
         },
         {
             "id": 7,
             "type": "pact",
             "pact": "Map",
             "contents": "de.tu_berlin.impro3.stratosphere.classification.logreg.LogisticRegression$6",

[jira] [Resolved] (FLINK-952) TypeExtractor requires the argument types of the UDF to be identical to the parameter types

2015-03-20 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-952.

   Resolution: Fixed
Fix Version/s: 0.9

Fixed with the introduction of the subclass-aware POJO and generic type 
serializers.

 TypeExtractor requires the argument types of the UDF to be identical to the 
 parameter types
 ---

 Key: FLINK-952
 URL: https://issues.apache.org/jira/browse/FLINK-952
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Reporter: Till Rohrmann
 Fix For: 0.9


 The TypeExtractor checks for each operation whether the DataSet element types 
 are valid arguments for the UDF. However, it checks for strict equality 
 instead of a subtype relationship. Thus, the following computation would not 
 work even though it should semantically be correct: 
 {{DataSet[B].map(new MapFunction[A,A](){ A map(A x) })}}, with B being a 
 subclass of A.
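
 In plain reflection terms, the fix described above amounts to replacing an
 equality check on the classes with an assignability check. A small,
 self-contained illustration (not the actual TypeExtractor code):

 ```java
 // With strict equality, a DataSet of B cannot feed a MapFunction<A, A>
 // even though B extends A; with an assignability (subtype-aware) check, it can.
 class TypeCheckExample {
     static class A {}
     static class B extends A {}  // B is a subclass of A

     // the overly strict check: element type must equal the UDF parameter type
     static boolean strictCheck(Class<?> elementType, Class<?> udfParamType) {
         return udfParamType.equals(elementType);
     }

     // the subtype-aware check: the UDF parameter may be a supertype
     static boolean subtypeAwareCheck(Class<?> elementType, Class<?> udfParamType) {
         return udfParamType.isAssignableFrom(elementType);
     }
 }
 ```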



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1090) Join deadlocks when used inside Delta iteration

2015-03-20 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1090.
-
   Resolution: Fixed
Fix Version/s: 0.9

Solved via 9c77f0785e43326521da5e535f9ab1f05a9c6280

 Join deadlocks when used inside Delta iteration
 ---

 Key: FLINK-1090
 URL: https://issues.apache.org/jira/browse/FLINK-1090
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime, Iterations, Scala API
Affects Versions: 0.6-incubating
 Environment: Ubuntu 14.04, Flink 0.6-incubating
 LocalExecutor
 JVM 1.7 with 7 GB RAM assigned
Reporter: Stefan Bunk
 Fix For: 0.9


 I have a join inside a delta iteration which hangs, i.e. I think it's 
 deadlocked.
 If I do the join without a delta iteration, it works.
 _Why I think it's a deadlock_:
 - no output in the logs
 - CPU idles
 - no IO (measured using iotop)
 - the stacktrace (when starting in debug mode and stopping at arbitrary points) 
 looks deadlock-ish !http://i.imgur.com/4TgSK3x.png!
 _Join properties_:
 - size of the operands: 6.1 GB, 257 MB
 - estimated result size: 50 MB
 - the deadlock only occurs for big inputs; if I decrease the size of the 
 first operand to something smaller, e.g. 1 MB, it works.
 I am using the Scala API.
 Let me know which further information you need. The code is basically the 
 one I posted on the [mailing 
 list|http://apache-flink-incubator-user-mailing-list-archive.2336050.n4.nabble.com/Delta-Iteration-Runtime-Error-quot-Could-not-set-up-runtime-strategy-for-input-channel-to-node-quot-td6.html],
  but I could provide a compilable version if that's necessary.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1088) Iteration head deadlock

2015-03-20 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1088.
-
   Resolution: Fixed
Fix Version/s: 0.9

Fixed via 9c77f0785e43326521da5e535f9ab1f05a9c6280

 Iteration head deadlock
 ---

 Key: FLINK-1088
 URL: https://issues.apache.org/jira/browse/FLINK-1088
 Project: Flink
  Issue Type: Bug
  Components: Iterations
Affects Versions: 0.7.0-incubating
Reporter: Márton Balassi
 Fix For: 0.9


 Flink hangs for an iterative algorithm that worked with Stratosphere 0.5.
 For the code please check out the following repo:
 https://github.com/mbalassi/als-comparison
 The stacktrace includes the following on Brokers:
 "Join(Sends the rows of p with multiple keys)) (1/1)" daemon prio=10 tid=0x7f8928014800 nid=0x998 waiting on condition [0x7f8912eed000]
    java.lang.Thread.State: WAITING (parking)
 	at sun.misc.Unsafe.park(Native Method)
 	- parking to wait for 0x0007d2668ea0 (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
 	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
 	at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
 	at java.util.concurrent.ArrayBlockingQueue.take(ArrayBlockingQueue.java:374)
 	at org.apache.flink.runtime.iterative.concurrent.Broker.get(Broker.java:63)
 	at org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:84)
 	at org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:375)
 	at org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:265)
 	at java.lang.Thread.run(Thread.java:744)
 This part waits for the iteration head, which has not been started yet, and 
 thus induces a deadlock.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1756) Rename Stream Monitoring to Stream Checkpointing

2015-03-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1756:
---

 Summary: Rename Stream Monitoring to Stream Checkpointing
 Key: FLINK-1756
 URL: https://issues.apache.org/jira/browse/FLINK-1756
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
 Fix For: 0.9


Currently, to enable the streaming checkpointing, you have to set monitoring 
on. I vote to call it checkpointing, because that describes it better and is 
more intuitive.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1710) Expression API Tests take very long

2015-03-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371133#comment-14371133
 ] 

Stephan Ewen commented on FLINK-1710:
-

Maybe there are some known tweaks / best practices to
  - speed up compilation 
  - serialize generated code and integrate it into class loaders

 Expression API Tests take very long
 ---

 Key: FLINK-1710
 URL: https://issues.apache.org/jira/browse/FLINK-1710
 Project: Flink
  Issue Type: Bug
  Components: Expression API
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Aljoscha Krettek
 Fix For: 0.9


 The tests of the Expression API take an immense amount of time, compared to 
 the other API tests.
 Is that because they execute on large (generated) data sets, because the 
 program compilation overhead is high, or because there is an inefficiency in 
 the execution still?
 Running org.apache.flink.api.scala.expressions.AggregationsITCase
 Running org.apache.flink.api.scala.expressions.SelectITCase
 Running org.apache.flink.api.scala.expressions.AsITCase
 Running org.apache.flink.api.scala.expressions.StringExpressionsITCase
 Running org.apache.flink.api.scala.expressions.CastingITCase
 Running org.apache.flink.api.scala.expressions.JoinITCase
 Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 34.652 sec - in org.apache.flink.api.scala.expressions.AsITCase
 Running org.apache.flink.api.scala.expressions.GroupedAggreagationsITCase
 Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 55.797 sec - in org.apache.flink.api.scala.expressions.StringExpressionsITCase
 Running org.apache.flink.api.scala.expressions.PageRankExpressionITCase
 Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 63.072 sec - in org.apache.flink.api.scala.expressions.SelectITCase
 Running org.apache.flink.api.scala.expressions.ExpressionsITCase
 Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 65.628 sec - in org.apache.flink.api.scala.expressions.CastingITCase
 Running org.apache.flink.api.scala.expressions.FilterITCase
 Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 74.174 sec - in org.apache.flink.api.scala.expressions.AggregationsITCase
 Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 93.878 sec - in org.apache.flink.api.scala.expressions.JoinITCase
 Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 63.4 sec - in org.apache.flink.api.scala.expressions.GroupedAggreagationsITCase
 Tests run: 12, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 44.179 sec - in org.apache.flink.api.scala.expressions.FilterITCase
 Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 53.801 sec - in org.apache.flink.api.scala.expressions.ExpressionsITCase
 Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 124.365 sec - in org.apache.flink.api.scala.expressions.PageRankExpressionITCase



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1430) Add test for streaming scala api completeness

2015-03-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371256#comment-14371256
 ] 

Stephan Ewen commented on FLINK-1430:
-

The completeness test is to ensure that the Java and Scala API are in sync, not 
the Batch and the Streaming API.

So, I think we should include these utility methods that you mentioned.

 Add test for streaming scala api completeness
 -

 Key: FLINK-1430
 URL: https://issues.apache.org/jira/browse/FLINK-1430
 Project: Flink
  Issue Type: Test
  Components: Streaming
Affects Versions: 0.9
Reporter: Márton Balassi
Assignee: Mingliang Qi

 Currently the completeness of the streaming scala api is not tested.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1759) Execution statistics for vertex-centric iterations

2015-03-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371300#comment-14371300
 ] 

Stephan Ewen commented on FLINK-1759:
-

Robert has started an effort to integrate more profiling into the system, and I 
am working (side project, will bring it onto the mailing list soon) on a new, 
extended version of the web frontend that visualizes all that information.

As part of that, we should be able to display this information.

 Execution statistics for vertex-centric iterations
 --

 Key: FLINK-1759
 URL: https://issues.apache.org/jira/browse/FLINK-1759
 Project: Flink
  Issue Type: Improvement
  Components: Gelly
Affects Versions: 0.9
Reporter: Vasia Kalavri
Assignee: Andra Lungu
Priority: Minor

 It would be nice to add an option for gathering execution statistics from 
 VertexCentricIteration.
 In particular, the following metrics could be useful:
 - total number of supersteps
 - number of messages sent (total / per superstep)
 - bytes of messages exchanged (total / per superstep)
 - execution time (total / per superstep)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-1760) Add support for building Flink with Scala 2.11

2015-03-20 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1760?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1760.
-
Resolution: Fixed

Fixed via 2cd5e93daa9dc7b1e024ec7c1f1fc665f953510a

 Add support for building Flink with Scala 2.11
 --

 Key: FLINK-1760
 URL: https://issues.apache.org/jira/browse/FLINK-1760
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Alexander Alexandrov
 Fix For: 0.9


 Pull request https://github.com/apache/flink/pull/477 is implementing this 
 feature.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1762) Make Statefulness of a Streaming Function explicit

2015-03-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1762:
---

 Summary: Make Statefulness of a Streaming Function explicit
 Key: FLINK-1762
 URL: https://issues.apache.org/jira/browse/FLINK-1762
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen


Currently, the state of streaming functions is stored in the 
{{StreamingRuntimeContext}}.

That is rather implicit: a function may or may not make use of the state. This 
also hides from the system whether a function is stateful or not.

How about we make this explicit by letting stateful functions implement a 
special interface (see below)? That would allow the stream graph to already know 
which functions are stateful. Certain vertices would not participate in the 
checkpointing if they only contain stateless functions. We can set up the 
ExecutionGraph to expect confirmations only from the participating vertices, 
saving messages.

{code}
public interface StateHandle {

    // accessors for the function's state (sketched, e.g. get/put by key)
    Object get(String key);

    void put(String key, Object value);
}

public interface Stateful {

    void setStateHandle(StateHandle handle);
}
{code}
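
To make the proposal concrete, here is a self-contained sketch of how a runtime 
could use such interfaces: it injects a map-backed state handle into functions 
that implement {{Stateful}} and skips the rest during checkpointing. All names 
here are illustrative, not an actual Flink API.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the proposal above: stateful functions implement a
// marker interface through which the runtime injects a state handle, so the
// stream graph knows up front which vertices participate in checkpointing.
class StatefulFunctionSketch {

    interface StateHandle {
        Object get(String key);
        void put(String key, Object value);
    }

    interface Stateful {
        void setStateHandle(StateHandle handle);
    }

    // a simple map-backed handle the runtime could hand to a function
    static class MapStateHandle implements StateHandle {
        private final Map<String, Object> state = new HashMap<>();
        public Object get(String key) { return state.get(key); }
        public void put(String key, Object value) { state.put(key, value); }
    }

    // a user function that counts the records it has seen
    static class CountingFunction implements Stateful {
        private StateHandle state;
        public void setStateHandle(StateHandle handle) { this.state = handle; }
        void processRecord(Object record) {
            Object c = state.get("count");
            long count = (c == null) ? 0L : (Long) c;
            state.put("count", count + 1);
        }
    }

    // the runtime wires up state only for Stateful functions; the return value
    // tells it whether this vertex must participate in checkpointing
    static boolean wireState(Object function, StateHandle handle) {
        if (function instanceof Stateful) {
            ((Stateful) function).setStateHandle(handle);
            return true;   // participates in checkpointing
        }
        return false;      // stateless: can be skipped during checkpoints
    }
}
```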



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (FLINK-974) TaskManager startup script does not respect JAVA_HOME

2015-03-20 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-974.

Resolution: Invalid

This is invalid. The TaskManager takes JAVA_HOME from its environment. In a 
local setup (ssh to localhost), this may differ from the user environment, but 
that is expected and intended.

 TaskManager startup script does not respect JAVA_HOME
 -

 Key: FLINK-974
 URL: https://issues.apache.org/jira/browse/FLINK-974
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Affects Versions: 0.6-incubating
Reporter: Stephan Ewen
Assignee: Ufuk Celebi
  Labels: Starter

 The TaskManager startup script does not respect JAVA_HOME, while the 
 JobManager startup script does.





[jira] [Resolved] (FLINK-1720) Integrate ScalaDoc in Scala sources into overall JavaDoc

2015-03-19 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1720.
-
Resolution: Fixed

Fixed via ce39c190a2befa837fd03436a8420d40edf8d713

 Integrate ScalaDoc in Scala sources into overall JavaDoc
 

 Key: FLINK-1720
 URL: https://issues.apache.org/jira/browse/FLINK-1720
 Project: Flink
  Issue Type: Improvement
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek
 Fix For: 0.9








[jira] [Created] (FLINK-1760) Add support for building Flink with Scala 2.11

2015-03-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1760:
---

 Summary: Add support for building Flink with Scala 2.11
 Key: FLINK-1760
 URL: https://issues.apache.org/jira/browse/FLINK-1760
 Project: Flink
  Issue Type: Improvement
  Components: Build System
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Alexander Alexandrov
 Fix For: 0.9


Pull request https://github.com/apache/flink/pull/477 is implementing this 
feature.





[jira] [Commented] (FLINK-1764) Rework record copying logic in streaming API

2015-03-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371673#comment-14371673
 ] 

Stephan Ewen commented on FLINK-1764:
-

The chained filter invokable does that in my case: once in the {{collect()}} 
method, once in the {{filter()}} method.

 Rework record copying logic in streaming API
 

 Key: FLINK-1764
 URL: https://issues.apache.org/jira/browse/FLINK-1764
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen

 The logic for chained tasks in the streaming API does a lot of copying of 
 records. In some cases, a record is copied multiple times before being passed 
 to a function.
 This seems unnecessary, in the general case. In any case, multiple copies 
 seem incorrect.





[jira] [Commented] (FLINK-1764) Rework record copying logic in streaming API

2015-03-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371706#comment-14371706
 ] 

Stephan Ewen commented on FLINK-1764:
-

I think that very much depends on the contract that you define:

  - After a source function has given a record to the collector, should the 
record be guaranteed to stay unchanged? If you do not promise that, you need not copy.
  - Do you want to guarantee that the value emitted by a map function is never 
changed? That is only a problem if the MapFunction retains a reference to that 
value (by storing it in a list, for example).

I am unsure whether always copying is a good way to go. The initial use cases 
here all use very small records (often with immutable types anyway), where 
copying is cheap. As soon as someone uses heavier objects, copying
can hurt performance considerably.

I am curious whether we can avoid that by making the copies optional, either 
on or off by default.
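The reference-retention hazard behind this discussion can be shown with plain Java (the mutable `Record` type is a made-up stand-in for a heavy user type; no Flink classes are involved): if a downstream function stores the emitted object and the producer then reuses and mutates it, the stored value silently changes unless a copy was handed over.

```java
import java.util.ArrayList;
import java.util.List;

public class CopyHazardDemo {
    // A mutable record, stand-in for a heavy user-defined type.
    static class Record {
        long value;
        Record(long value) { this.value = value; }
    }

    public static void main(String[] args) {
        List<Record> retained = new ArrayList<>();  // "map function" keeping a reference
        Record reused = new Record(1);

        // Without copying: downstream sees later mutations of the reused object.
        retained.add(reused);
        reused.value = 2;                           // producer reuses the record
        System.out.println(retained.get(0).value);  // prints 2, not the 1 that was "emitted"

        // With copying: the retained value stays stable, at the cost of an allocation.
        retained.clear();
        retained.add(new Record(reused.value));
        reused.value = 3;
        System.out.println(retained.get(0).value);  // prints 2
    }
}
```

This is why the contract matters: for immutable or tiny types the copy is cheap insurance, while for heavy mutable objects it is pure overhead whenever no reference is retained.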

 Rework record copying logic in streaming API
 

 Key: FLINK-1764
 URL: https://issues.apache.org/jira/browse/FLINK-1764
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen

 The logic for chained tasks in the streaming API does a lot of copying of 
 records. In some cases, a record is copied multiple times before being passed 
 to a function.
 This seems unnecessary, in the general case. In any case, multiple copies 
 seem incorrect.





[jira] [Created] (FLINK-1765) Reducer grouping is skipped when parallelism is one

2015-03-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1765:
---

 Summary: Reducer grouping is skipped when parallelism is one
 Key: FLINK-1765
 URL: https://issues.apache.org/jira/browse/FLINK-1765
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
 Fix For: 0.9


This program (note the parallelism of one) incorrectly runs a non-grouped 
reduce and fails with a NullPointerException.

{code}
StreamExecutionEnvironment env = ...
env.setDegreeOfParallelism(1);

DataStream<String> stream = env.addSource(...);

stream
.filter(...)
.map(...)
.groupBy(someField)
.reduce(new ReduceFunction<String>() {...} )
.addSink(...);

env.execute();
{code}





[jira] [Commented] (FLINK-1710) Expression API Tests take very long

2015-03-20 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371543#comment-14371543
 ] 

Stephan Ewen commented on FLINK-1710:
-

That would be a major rewrite, I guess?

Is this part not dependent on certain Scala compiler features?

 Expression API Tests take very long
 ---

 Key: FLINK-1710
 URL: https://issues.apache.org/jira/browse/FLINK-1710
 Project: Flink
  Issue Type: Bug
  Components: Expression API
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Aljoscha Krettek
 Fix For: 0.9


 The tests of the Expression API take an immense amount of time, compared to 
 the other API tests.
 Is that because they execute on large (generated) data sets, because the 
 program compilation overhead is high, or because there is an inefficiency in 
 the execution still?
 Running org.apache.flink.api.scala.expressions.AggregationsITCase
 Running org.apache.flink.api.scala.expressions.SelectITCase
 Running org.apache.flink.api.scala.expressions.AsITCase
 Running org.apache.flink.api.scala.expressions.StringExpressionsITCase
 Running org.apache.flink.api.scala.expressions.CastingITCase
 Running org.apache.flink.api.scala.expressions.JoinITCase
 Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 34.652 sec - 
 in org.apache.flink.api.scala.expressions.AsITCase
 Running org.apache.flink.api.scala.expressions.GroupedAggreagationsITCase
 Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 55.797 sec - 
 in org.apache.flink.api.scala.expressions.StringExpressionsITCase
 Running org.apache.flink.api.scala.expressions.PageRankExpressionITCase
 Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 63.072 sec - 
 in org.apache.flink.api.scala.expressions.SelectITCase
 Running org.apache.flink.api.scala.expressions.ExpressionsITCase
 Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 65.628 sec - 
 in org.apache.flink.api.scala.expressions.CastingITCase
 Running org.apache.flink.api.scala.expressions.FilterITCase
 Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 74.174 sec - 
 in org.apache.flink.api.scala.expressions.AggregationsITCase
 Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 93.878 sec - 
 in org.apache.flink.api.scala.expressions.JoinITCase
 Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 63.4 sec - in 
 org.apache.flink.api.scala.expressions.GroupedAggreagationsITCase
 Tests run: 12, Failures: 0, Errors: 0, Skipped: 4, Time elapsed: 44.179 sec - 
 in org.apache.flink.api.scala.expressions.FilterITCase
 Tests run: 12, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 53.801 sec - 
 in org.apache.flink.api.scala.expressions.ExpressionsITCase
 Tests run: 6, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 124.365 sec - 
 in org.apache.flink.api.scala.expressions.PageRankExpressionITCase





[jira] [Created] (FLINK-1763) Remove cancel from streaming SinkFunction

2015-03-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1763:
---

 Summary: Remove cancel from streaming SinkFunction
 Key: FLINK-1763
 URL: https://issues.apache.org/jira/browse/FLINK-1763
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen
 Fix For: 0.9


Since the streaming sink function is called individually for each record, it 
does not require a {{cancel()}} function. The system can cancel between calls 
to that function (which it cannot do for the source function).

Removing this method removes the need to always implement the unnecessary, and 
usually empty, method.





[jira] [Created] (FLINK-1764) Rework record copying logic in streaming API

2015-03-20 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1764:
---

 Summary: Rework record copying logic in streaming API
 Key: FLINK-1764
 URL: https://issues.apache.org/jira/browse/FLINK-1764
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 0.9
Reporter: Stephan Ewen


The logic for chained tasks in the streaming API does a lot of copying of 
records. In some cases, a record is copied multiple times before being passed 
to a function.

This seems unnecessary, in the general case. In any case, multiple copies seem 
incorrect.





[jira] [Created] (FLINK-1668) Add a config option to specify delays between restarts

2015-03-09 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1668:
---

 Summary: Add a config option to specify delays between restarts
 Key: FLINK-1668
 URL: https://issues.apache.org/jira/browse/FLINK-1668
 Project: Flink
  Issue Type: Improvement
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


The system currently introduces a short delay between a failed task execution 
and the restarted execution.

The reason is that this delay seemed to help in letting the problems surface that 
led to the failed task. As an example, if a TaskManager fails, tasks fail due 
to data transfer errors. The TaskManager is not immediately recognized as 
failed, though (takes a bit until heartbeats time out). Immediately 
re-deploying tasks has a very high chance of assigning work to the TaskManager 
that is actually not responding, causing the execution retry to fail again. The 
delay gives the system time to figure out that the TaskManager was lost and 
does not take it into account upon the retry.

Currently, the system uses the heartbeat timeout as the default delay value. 
This may make sense as a default value for critical task failures, but is 
actually quite high for other types of failures.

In any case, I would like to add an option for users to specify the delay (even 
set it to 0, if desired).

The delay is not the best solution, in my opinion; we should eventually move to 
something better. One idea is to put TaskManagers responsible for failed tasks in 
a probationary mode until they have reported back that everything is good 
(still alive, disk space available, etc.).
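As a toy illustration of the proposed option (this is not Flink's implementation; the config key and helper names are made up), a restart loop with a configurable delay between attempts:

```java
public class RestartDelayDemo {
    // Hypothetical config key, for illustration only.
    static final String RESTART_DELAY_KEY = "execution.restart-delay-ms";

    /** Runs a task, retrying up to maxRetries times with delayMs between attempts. */
    static boolean runWithRestarts(Runnable task, int maxRetries, long delayMs)
            throws InterruptedException {
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                task.run();
                return true;
            } catch (RuntimeException e) {
                if (attempt == maxRetries) {
                    return false;  // retries exhausted
                }
                // The delay gives the cluster time to detect a lost TaskManager
                // before the task is re-deployed. A value of 0 disables it.
                Thread.sleep(delayMs);
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] remainingFailures = {2};  // fail twice, then succeed
        boolean ok = runWithRestarts(() -> {
            if (remainingFailures[0]-- > 0) throw new RuntimeException("task failed");
        }, 3, 0L);
        System.out.println(ok);  // true
    }
}
```

The interesting design question is only the value of `delayMs`: the heartbeat timeout as a default is safe but slow, while 0 re-deploys immediately and risks hitting the same dead TaskManager.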






[jira] [Created] (FLINK-1667) Add tests for recovery with distributed process failure

2015-03-09 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1667:
---

 Summary: Add tests for recovery with distributed process failure
 Key: FLINK-1667
 URL: https://issues.apache.org/jira/browse/FLINK-1667
 Project: Flink
  Issue Type: Improvement
  Components: test
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


The system should have a test that actually spawns multiple TaskManager 
processes (JVMs) and tests how recovery works when one of them is killed.





[jira] [Commented] (FLINK-1690) ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure spuriously fails on Travis

2015-03-11 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357146#comment-14357146
 ] 

Stephan Ewen commented on FLINK-1690:
-

I will create a patch with increased timeout...

 ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure spuriously 
 fails on Travis
 --

 Key: FLINK-1690
 URL: https://issues.apache.org/jira/browse/FLINK-1690
 Project: Flink
  Issue Type: Bug
Reporter: Till Rohrmann
Priority: Minor

 I got the following error on Travis.
 {code}
 ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure:244 The 
 program did not finish in time
 {code}
 I think we have to increase the timeouts for this test case to make it 
 reliably run on Travis.
 The log of the failed Travis build can be found 
 [here|https://api.travis-ci.org/jobs/53952486/log.txt?deansi=true]





[jira] [Created] (FLINK-1691) Improve CountCollectITCase

2015-03-11 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1691:
---

 Summary: Improve CountCollectITCase
 Key: FLINK-1691
 URL: https://issues.apache.org/jira/browse/FLINK-1691
 Project: Flink
  Issue Type: Bug
  Components: test
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Maximilian Michels
 Fix For: 0.9


The CountCollectITCase logs heavily and does not reuse the same cluster across 
multiple tests.

Both issues can be addressed by letting it extend the MultipleProgramsTestBase.





[jira] [Created] (FLINK-1671) Add execution modes for programs

2015-03-10 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1671:
---

 Summary: Add execution modes for programs
 Key: FLINK-1671
 URL: https://issues.apache.org/jira/browse/FLINK-1671
 Project: Flink
  Issue Type: Bug
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


Currently, there is a single way that programs get executed: Pipelined. With 
the new code for batch shuffles (https://github.com/apache/flink/pull/471), we 
have much more flexibility and I would like to expose that.

I suggest adding more execution modes that can be chosen on the 
`ExecutionEnvironment`:

  - {{BATCH}} A mode where every shuffle is executed in a batch fashion, meaning 
preceding operators must finish before their successors start. Only for batch 
programs (d'oh).

  - {{PIPELINED}} The mode corresponding to the current execution: it pipelines 
where possible and batches where deadlocks would otherwise occur. Initially, I 
would make this the default (to stay close to the current behavior). Only 
available for batch programs.

  - {{PIPELINED_WITH_BATCH_FALLBACK}} Starts out with pipelined shuffles and 
falls back to batch shuffles upon failure and recovery, or once it sees that not 
enough slots are available to bring up all operators at once (a requirement 
for pipelining).

  - {{STREAMING}} This is the default and only way for streaming programs. All 
communication is pipelined, and the special streaming checkpointing code is 
activated.






[jira] [Commented] (FLINK-1690) ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure spuriously fails on Travis

2015-03-12 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1690?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357144#comment-14357144
 ] 

Stephan Ewen commented on FLINK-1690:
-

I retract my statement. After careful log defusing with [~uce], I think that 
things go as planned.

The test actually has too limited a time budget. It takes almost 8 seconds on 
my machine, so I grant that Travis may not be able to complete it in 30 
seconds.

 ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure spuriously 
 fails on Travis
 --

 Key: FLINK-1690
 URL: https://issues.apache.org/jira/browse/FLINK-1690
 Project: Flink
  Issue Type: Bug
Reporter: Till Rohrmann
Priority: Minor

 I got the following error on Travis.
 {code}
 ProcessFailureBatchRecoveryITCase.testTaskManagerProcessFailure:244 The 
 program did not finish in time
 {code}
 I think we have to increase the timeouts for this test case to make it 
 reliably run on Travis.
 The log of the failed Travis build can be found 
 [here|https://api.travis-ci.org/jobs/53952486/log.txt?deansi=true]





[jira] [Resolved] (FLINK-1648) Add a mode where the system automatically sets the parallelism to the available task slots

2015-03-06 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1648.
-
Resolution: Implemented

Implemented in d8d642fd6d7d9b8526325d4efff1015f636c5ddb

 Add a mode where the system automatically sets the parallelism to the 
 available task slots
 --

 Key: FLINK-1648
 URL: https://issues.apache.org/jira/browse/FLINK-1648
 Project: Flink
  Issue Type: New Feature
  Components: JobManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


 This is basically a port of this code from the 0.8 release:
 https://github.com/apache/flink/pull/410





[jira] [Created] (FLINK-1705) InstanceConnectionInfo returns wrong hostname when no DNS entry exists

2015-03-14 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1705:
---

 Summary: InstanceConnectionInfo returns wrong hostname when no DNS 
entry exists
 Key: FLINK-1705
 URL: https://issues.apache.org/jira/browse/FLINK-1705
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


If there is no DNS entry for an address (like 10.4.122.43), then the 
{{InstanceConnectionInfo}} returns the first octet ({{10}}) as the hostname.
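The symptom can be reproduced with plain string handling: if the code derives a "short host name" by cutting at the first dot (a plausible cause of the bug; the actual code may differ), an IP address that has no reverse-DNS entry degenerates to its first octet.

```java
public class HostnameTruncationDemo {
    /**
     * Cutting at the first dot is correct for fully qualified domain names,
     * but wrong when the "name" is actually a raw IP address.
     */
    static String shortHostname(String name) {
        int dot = name.indexOf('.');
        return dot < 0 ? name : name.substring(0, dot);
    }

    public static void main(String[] args) {
        System.out.println(shortHostname("worker1.cluster.local")); // "worker1" (intended)
        System.out.println(shortHostname("10.4.122.43"));           // "10" (the bug)
    }
}
```

A fix would first check whether the string parses as an IP literal and skip the truncation in that case.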






[jira] [Commented] (FLINK-1106) Deprecate old Record API

2015-03-10 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14355063#comment-14355063
 ] 

Stephan Ewen commented on FLINK-1106:
-

A bit of test coverage depends on the deprecated API.

We would need to port at least some of the tests to the new API.

We can probably drop some subsumed / obsolete tests.

 Deprecate old Record API
 

 Key: FLINK-1106
 URL: https://issues.apache.org/jira/browse/FLINK-1106
 Project: Flink
  Issue Type: Task
  Components: Java API
Affects Versions: 0.7.0-incubating
Reporter: Robert Metzger
Assignee: Robert Metzger
Priority: Critical
 Fix For: 0.7.0-incubating


 For the upcoming 0.7 release, we should mark all user-facing methods from the 
 old Record Java API as deprecated, with a warning that we are going to remove 
 it at some point.
 I would suggest to wait one or two releases from the 0.7 release (given our 
 current release cycle). I'll start a mailing-list discussion at some point 
 regarding this.





[jira] [Created] (FLINK-1675) Rework Accumulators

2015-03-10 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1675:
---

 Summary: Rework Accumulators
 Key: FLINK-1675
 URL: https://issues.apache.org/jira/browse/FLINK-1675
 Project: Flink
  Issue Type: Bug
  Components: JobManager, TaskManager
Affects Versions: 0.9
Reporter: Stephan Ewen
 Fix For: 0.9


The accumulators need an overhaul to address various issues:

1.  User-defined Accumulator classes crash the client, because it does not use 
the user-code classloader to decode the received message.

2.  They should be attached to the ExecutionGraph, not the dedicated 
AccumulatorManager. That makes them accessible also for archived execution 
graphs.

3.  Accumulators should be sent periodically, as part of the heartbeat that 
sends metrics. This allows them to be updated in real time.

4. Accumulators should be stored fine-grained (per ExecutionVertex, or per 
Execution) and the final value should be computed by merging all involved 
ones. This allows users to access the per-subtask accumulators, which is often 
interesting.

5. Accumulators should subsume the aggregators by allowing them to be versioned 
by superstep. The versioned ones should be redistributed to the cluster 
after each superstep.






[jira] [Commented] (FLINK-1659) Rename classes and packages that contains Pact

2015-03-13 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360185#comment-14360185
 ] 

Stephan Ewen commented on FLINK-1659:
-

I am working on getting the configurable data exchanges (shuffles / broadcasts) 
in BATCH and PIPELINED mode into the optimizer right now. After that, I would 
offer to rename the classes.

My feeling is that we should call it *Optimizer*, not *Compiler*, as the latter 
confuses people - it sounds too much like a language compiler (Java / Scala).

 Rename classes and packages that contains Pact
 --

 Key: FLINK-1659
 URL: https://issues.apache.org/jira/browse/FLINK-1659
 Project: Flink
  Issue Type: Task
Reporter: Henry Saputra
Priority: Minor

 We have several class names that contain or start with Pact.
 Pact is the previous term for Flink data model and user defined functions/ 
 operators.





[jira] [Updated] (FLINK-441) Renaming in pact-compiler

2015-03-13 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-441:
---
Issue Type: Sub-task  (was: Improvement)
Parent: FLINK-1659

 Renaming in pact-compiler
 -

 Key: FLINK-441
 URL: https://issues.apache.org/jira/browse/FLINK-441
 Project: Flink
  Issue Type: Sub-task
Reporter: GitHub Import
Priority: Minor
  Labels: github-import
 Fix For: pre-apache


 I would like to do a cleanup and renaming in the pact-compiler. Most of the 
 work is in line with the recent global renaming, but I also want to clear and 
 organize the various representation structures for the optimized plan. 
 I open this issue to keep track and discuss the suggested renaming.
 We'll have to coordinate the merging of this issue because some renamings 
 (e.g. ```PactCompiler``` ⇒ ```Compiler```) seem to affect a lot of other packages.
 ### Global Scope (Wide Dependencies)
 The following names are part of the public API of stratosphere-compiler. 
 Their renaming will probably affect a lot of other modules.
 In ```eu.stratosphere.compiler```:
 * ```PactCompiler``` ⇒ ```Compiler```
 ### Module Scope (Narrow Dependencies)
 The following names are part of the internal API of stratosphere-compiler. 
 Their renaming will probably affect only stratosphere-compiler and 
 stratosphere-tests.
 In ```eu.stratosphere.compiler```:
 * ```DataStatistics``` ⇒ ```StatsStore``` This should be developed as an API 
 for data stats over *expressions* instead of just over *data sources*.
 * ```NonCachingDataStatistics``` ⇒ *delete*. This class does not seem to be 
 used.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/441
 Created by: [aalexandrov|https://github.com/aalexandrov]
 Labels: 
 Created at: Mon Jan 27 12:33:50 CET 2014
 State: open





[jira] [Commented] (FLINK-1650) Suppress Akka's Netty Shutdown Errors through the log config

2015-03-13 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14360230#comment-14360230
 ] 

Stephan Ewen commented on FLINK-1650:
-

Is this YARN specific?

Does the shading affect the netty classes and hence change the classnames and 
the logger config?

 Suppress Akka's Netty Shutdown Errors through the log config
 

 Key: FLINK-1650
 URL: https://issues.apache.org/jira/browse/FLINK-1650
 Project: Flink
  Issue Type: Bug
  Components: other
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


 I suggest to set the logging for 
 `org.jboss.netty.channel.DefaultChannelPipeline` to error, in order to get 
 rid of the misleading stack trace caused by an akka/netty hickup on shutdown.





[jira] [Resolved] (FLINK-1668) Add a config option to specify delays between restarts

2015-03-10 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1668.
-
Resolution: Implemented

Implemented in abbb0a93ca67da17197dc5372e6d95edd8149d44

 Add a config option to specify delays between restarts
 --

 Key: FLINK-1668
 URL: https://issues.apache.org/jira/browse/FLINK-1668
 Project: Flink
  Issue Type: Improvement
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


 The system currently introduces a short delay between a failed task execution 
 and the restarted execution.
 The reason is that this delay seemed to help in letting the problems surface 
 that led to the failed task. As an example, if a TaskManager fails, tasks fail due 
 to data transfer errors. The TaskManager is not immediately recognized as 
 failed, though (takes a bit until heartbeats time out). Immediately 
 re-deploying tasks has a very high chance of assigning work to the 
 TaskManager that is actually not responding, causing the execution retry to 
 fail again. The delay gives the system time to figure out that the 
 TaskManager was lost and does not take it into account upon the retry.
 Currently, the system uses the heartbeat timeout as the default delay value. 
 This may make sense as a default value for critical task failures, but is 
 actually quite high for other types of failures.
 In any case, I would like to add an option for users to specify the delay 
 (even set it to 0, if desired).
 The delay is not the best solution, in my opinion; we should eventually move 
 to something better. One idea is to put TaskManagers responsible for failed 
 tasks in a probationary mode until they have reported back that everything 
 is good (still alive, disk space available, etc.).





[jira] [Commented] (FLINK-1776) APIs provide invalid Semantic Properties for Operators with SelectorFunction keys

2015-03-24 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377789#comment-14377789
 ] 

Stephan Ewen commented on FLINK-1776:
-

That is critical, I agree.

Can we make the assumption that all fields are shifted by one (or by as many 
positions as the key selector returns values), since the data gets wrapped in a 
Tuple2(key, value)?

 APIs provide invalid Semantic Properties for Operators with SelectorFunction 
 keys
 -

 Key: FLINK-1776
 URL: https://issues.apache.org/jira/browse/FLINK-1776
 Project: Flink
  Issue Type: Bug
  Components: Java API, Scala API
Affects Versions: 0.9
Reporter: Fabian Hueske
Assignee: Fabian Hueske
Priority: Critical
 Fix For: 0.9


 Semantic properties are defined by users and evaluated by the optimizer.
 Semantic properties such as forwarded or read fields are bound to the 
 input type of a function.
 In case of operators with selector-function keys, the user function is wrapped 
 by a wrapping function that has a different input type than the original 
 user function. However, the user-defined semantic properties are forwarded 
 verbatim to the optimizer. 
 Since the properties refer to a specific type which is changed by the 
 wrapping function, and the semantic properties are not adapted, the optimizer 
 uses wrong properties and might produce invalid plans.





[jira] [Resolved] (FLINK-1761) IndexOutOfBoundsException when receiving empty buffer at remote channel

2015-03-24 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1761.
-
   Resolution: Fixed
Fix Version/s: 0.9

Fixed in 380ef878c850f83b5e12176e465d59c737066e20 and 
925481fb1c88f3c45b289cdf5ef203190492031a

 IndexOutOfBoundsException when receiving empty buffer at remote channel
 ---

 Key: FLINK-1761
 URL: https://issues.apache.org/jira/browse/FLINK-1761
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: master
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi
 Fix For: 0.9


 Receiving buffers from remote input channels with size 0 results in an 
 {{IndexOutOfBoundsException}}.
 {code}
 Caused by: java.lang.IndexOutOfBoundsException: index: 30 (expected: range(0, 
 30))
   at io.netty.buffer.AbstractByteBuf.checkIndex(AbstractByteBuf.java:1123)
   at 
 io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:156)
   at 
 io.netty.buffer.PooledUnsafeDirectByteBuf.getBytes(PooledUnsafeDirectByteBuf.java:151)
   at io.netty.buffer.SlicedByteBuf.getBytes(SlicedByteBuf.java:179)
   at io.netty.buffer.AbstractByteBuf.readBytes(AbstractByteBuf.java:717)
   at 
 org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.decodeBufferOrEvent(PartitionRequestClientHandler.java:205)
   at 
 org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.decodeMsg(PartitionRequestClientHandler.java:164)
   at 
 org.apache.flink.runtime.io.network.netty.PartitionRequestClientHandler.channelRead(PartitionRequestClientHandler.java:118)
 {code}





[jira] [Commented] (FLINK-1769) Maven deploy is broken (build artifacts are cleaned in docs-and-sources profile)

2015-03-24 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14377791#comment-14377791
 ] 

Stephan Ewen commented on FLINK-1769:
-

Anyone looking into this already?

This effectively means that we do not have snapshots right now, correct?

 Maven deploy is broken (build artifacts are cleaned in docs-and-sources 
 profile)
 

 Key: FLINK-1769
 URL: https://issues.apache.org/jira/browse/FLINK-1769
 Project: Flink
  Issue Type: Bug
  Components: Build System
Reporter: Robert Metzger
Priority: Critical

 The issue has been introduced by FLINK-1720.
 This change broke the deployment to maven snapshots / central.
 {code}
 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-install-plugin:2.5.1:install (default-install) 
 on project flink-shaded-include-yarn: Failed to install artifact 
 org.apache.flink:flink-shaded-include-yarn:pom:0.9-SNAPSHOT: 
 /home/robert/incubator-flink/flink-shaded-hadoop/flink-shaded-include-yarn/target/dependency-reduced-pom.xml
  (No such file or directory) - [Help 1]
 {code}
 The issue is that Maven now executes {{clean}} after {{shade}}, so 
 {{install}} can no longer store the result of {{shade}} (because it has 
 been deleted).





[jira] [Created] (FLINK-1778) Improve normalized keys in composite key case

2015-03-24 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1778:
---

 Summary: Improve normalized keys in composite key case
 Key: FLINK-1778
 URL: https://issues.apache.org/jira/browse/FLINK-1778
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


Currently, if we have a key (String, long), the String takes up the entire 
normalized key space without being fully discriminating anyway. Limiting the 
String prefix in size and giving space to the second key field should improve 
the comparison efficiency in most cases.





[jira] [Created] (FLINK-1782) Change Quickstart Java version to 1.7

2015-03-24 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1782:
---

 Summary: Change Quickstart Java version to 1.7
 Key: FLINK-1782
 URL: https://issues.apache.org/jira/browse/FLINK-1782
 Project: Flink
  Issue Type: Improvement
  Components: Quickstarts
Affects Versions: 0.9
Reporter: Stephan Ewen
 Fix For: 0.9


The quickstarts refer to the outdated Java 1.6 source and bin version. We 
should upgrade this to 1.7.





[jira] [Created] (FLINK-1781) Quickstarts broken due to Scala Version Variables

2015-03-24 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1781:
---

 Summary: Quickstarts broken due to Scala Version Variables
 Key: FLINK-1781
 URL: https://issues.apache.org/jira/browse/FLINK-1781
 Project: Flink
  Issue Type: Bug
  Components: Quickstarts
Affects Versions: 0.9
Reporter: Stephan Ewen
Priority: Blocker
 Fix For: 0.9


The quickstart archetype resources refer to the scala version variables.
When creating a maven project standalone, these variables are not defined, and 
the pom is invalid.





[jira] [Created] (FLINK-1783) Quickstart shading should not create shaded jar and dependency-reduced pom

2015-03-24 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1783:
---

 Summary: Quickstart shading should not create shaded jar and 
dependency-reduced pom
 Key: FLINK-1783
 URL: https://issues.apache.org/jira/browse/FLINK-1783
 Project: Flink
  Issue Type: Improvement
  Components: Quickstarts
Affects Versions: 0.9
Reporter: Stephan Ewen
 Fix For: 0.9








[jira] [Resolved] (FLINK-1774) Remove the redundant code in try{} block

2015-03-30 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1774.
-
Resolution: Fixed

Fixed in c89c657ae16bbe89da54669a234713a3811813ee

Thank you for the patch!

 Remove the redundant code in try{} block
 

 Key: FLINK-1774
 URL: https://issues.apache.org/jira/browse/FLINK-1774
 Project: Flink
  Issue Type: Improvement
Affects Versions: master
Reporter: Sibao Hong
Assignee: Sibao Hong
Priority: Minor
 Fix For: master


 Remove the redundant {{fos.close(); fos = null;}} code in the try block, because 
 {{fos.close()}} always executes in the finally block.
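The pattern being cleaned up can be sketched like this (illustrative code, not the actual Flink class): with a close in the finally block, an extra close-and-null inside the try block adds nothing.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class CloseOnceSketch {

    /** Writes data; the finally block alone guarantees the stream is closed. */
    public static void writeAll(OutputStream fos, byte[] data) throws IOException {
        try {
            fos.write(data);
            // Before the patch, a redundant "fos.close(); fos = null;" stood here;
            // it is unnecessary because the finally block below always runs.
        } finally {
            if (fos != null) {
                fos.close(); // executed on both the success and the failure path
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        writeAll(bos, new byte[] {1, 2, 3});
        System.out.println(bos.size()); // 3
    }
}
```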





[jira] [Resolved] (FLINK-1779) Rename the function name from getCurrentyActiveConnections to getCurrentActiveConnections in org.apache.flink.runtime.blob

2015-03-30 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1779?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1779.
-
Resolution: Fixed

Fixed in fb3f3ee845a3aae295c9aae00f3d406d9f1d5813

Thank you for the patch!

 Rename the function name from getCurrentyActiveConnections to 
 getCurrentActiveConnections in  org.apache.flink.runtime.blob
 ---

 Key: FLINK-1779
 URL: https://issues.apache.org/jira/browse/FLINK-1779
 Project: Flink
  Issue Type: Improvement
Reporter: Sibao Hong
Assignee: Sibao Hong
Priority: Minor
 Fix For: master


 I think the function name getCurrentyActiveConnections in 
 'org.apache.flink.runtime.blob' is misspelled; getCurrentActiveConnections 
 would be better. I have also added some comments to the function and the tests.





[jira] [Created] (FLINK-1801) NetworkEnvironment should start without JobManager association

2015-03-30 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1801:
---

 Summary: NetworkEnvironment should start without JobManager 
association
 Key: FLINK-1801
 URL: https://issues.apache.org/jira/browse/FLINK-1801
 Project: Flink
  Issue Type: Sub-task
  Components: TaskManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


The NetworkEnvironment should be able to start without a dedicated JobManager 
association, and gain one / lose one as the TaskManager connects to different 
JobManagers.





[jira] [Resolved] (FLINK-1348) Move Stream Connector Jars from lib to Client JARs

2015-03-30 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1348.
-
   Resolution: Fixed
Fix Version/s: 0.9

Fixed with the updated dependency management in the rewrite to use shading.

 Move Stream Connector Jars from lib to Client JARs
 

 Key: FLINK-1348
 URL: https://issues.apache.org/jira/browse/FLINK-1348
 Project: Flink
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 0.9
Reporter: Márton Balassi
Assignee: Márton Balassi
 Fix For: 0.9


 Right now, the connectors and all their dependencies are put into the lib folder
 and are part of the system at startup time. This is a large bunch of
 dependencies, and they may actually conflict with the dependencies of
 custom connectors (for example, with a different version of RabbitMQ).
 We could fix that if we removed the dependencies from the lib folder and
 set up archetypes that build fat jars with the dependencies. That way, each
 job (with its custom class loader) will get the dependencies it needs and
 will not see all the other (potentially conflicting) ones in the namespace.





[jira] [Resolved] (FLINK-1754) Deadlock in job execution

2015-03-30 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1754.
-
Resolution: Not a Problem

This is actually a known bug in 0.8, fixed in 0.9.

 Deadlock in job execution
 -

 Key: FLINK-1754
 URL: https://issues.apache.org/jira/browse/FLINK-1754
 Project: Flink
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Sebastian Kruse

 I have encountered a reproducible deadlock in the execution of one of my 
 jobs. The part of the plan, where this happens, is the following:
 {code:java}
 /** Performs the reduction via creating transitive INDs and removing them
  * from the original IND set. */
 private DataSet<Tuple2<Integer, int[]>> calculateTransitiveReduction1(
         DataSet<Tuple2<Integer, int[]>> inclusionDependencies) {
     // Concatenate INDs (only one hop).
     DataSet<Tuple2<Integer, int[]>> transitiveInds = inclusionDependencies
             .flatMap(new SplitInds())
             .joinWithTiny(inclusionDependencies)
             .where(1).equalTo(0)
             .with(new ConcatenateInds());
     // Remove the concatenated INDs to come up with a transitive reduction of the INDs.
     return inclusionDependencies
             .coGroup(transitiveInds)
             .where(0).equalTo(0)
             .with(new RemoveTransitiveInds());
 }
 {code}
 Seemingly, the flatMap operator waits indefinitely for a free buffer to write 
 to.





[jira] [Resolved] (FLINK-1435) TaskManager does not log missing memory error on start up

2015-03-30 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1435.
-
Resolution: Not a Problem

 TaskManager does not log missing memory error on start up
 -

 Key: FLINK-1435
 URL: https://issues.apache.org/jira/browse/FLINK-1435
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Affects Versions: 0.7.0-incubating
Reporter: Malte Schwarzer
Priority: Minor
  Labels: memorymanager, starter

 When using bin/start-cluster.sh to start TaskManagers and a worker node 
 fails to start because of missing memory, you do not receive any error 
 messages in the log files.
 The worker node has only 15000M of memory available, but it is configured with 
 Maximum heap size: 4 MiBytes. The task manager does not join the cluster; the 
 process hangs.
 The last lines of the log look like this:
 ...
 ... - - Starting with 12 incoming and 12 outgoing connection threads.
 ... - Setting low water mark to 16384 and high water mark to 32768 bytes.
 ... - Instantiated PooledByteBufAllocator with direct arenas: 24, heap 
 arenas: 0, page size (bytes): 65536, chunk size (bytes): 16777216.
 ... - Using 0.7 of the free heap space for managed memory.
 ... - Initializing memory manager with 24447 megabytes of memory. Page size 
 is 32768 bytes.
 (END)
 An error message about the lack of memory is missing.





[jira] [Resolved] (FLINK-1801) NetworkEnvironment should start without JobManager association

2015-03-30 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1801.
-
Resolution: Implemented

Implemented in ee273dbe01e95d2b260fa690e21e2c244a2a5711

 NetworkEnvironment should start without JobManager association
 --

 Key: FLINK-1801
 URL: https://issues.apache.org/jira/browse/FLINK-1801
 Project: Flink
  Issue Type: Sub-task
  Components: TaskManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


 The NetworkEnvironment should be able to start without a dedicated JobManager 
 association, and gain one / lose one as the TaskManager connects to different 
 JobManagers.





[jira] [Resolved] (FLINK-1465) GlobalBufferPool reports negative memory allocation

2015-03-30 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1465.
-
   Resolution: Fixed
Fix Version/s: 0.9
 Assignee: Stephan Ewen  (was: Ufuk Celebi)

Fixed via ee273dbe01e95d2b260fa690e21e2c244a2a5711

 GlobalBufferPool reports negative memory allocation
 ---

 Key: FLINK-1465
 URL: https://issues.apache.org/jira/browse/FLINK-1465
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime, TaskManager
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Stephan Ewen
 Fix For: 0.9


 I've got this error message when starting Flink; it does not really help me. 
 I suspect that my configuration files (which worked with 0.8) aren't working 
 with 0.9 anymore. Still, the exception is reporting weird values:
 {code}
 11:41:02,516 INFO  
 org.apache.flink.yarn.YarnUtils$$anonfun$startActorSystemAndTaskManager$1$$anon$1
   - TaskManager successfully registered at JobManager 
 akka.tcp://fl...@cloud-18.dima.tu-berlin.de:39674/user/jo
 bmanager.
 11:41:25,230 ERROR 
 org.apache.flink.yarn.YarnUtils$$anonfun$startActorSystemAndTaskManager$1$$anon$1
   - Failed to instantiate network environment.
 java.io.IOException: Failed to instantiate network buffer pool: Could not 
 allocate enough memory segments for GlobalBufferPool (required (Mb): 0, 
 allocated (Mb): -965, missing (Mb): 965).
 at 
 org.apache.flink.runtime.io.network.NetworkEnvironment.init(NetworkEnvironment.java:81)
 at 
 org.apache.flink.runtime.taskmanager.TaskManager.setupNetworkEnvironment(TaskManager.scala:508)
 at 
 org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$finishRegistration(TaskManager.scala:479)
 at 
 org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:226)
 at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
 at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
 at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
 at 
 org.apache.flink.yarn.YarnTaskManager$$anonfun$receiveYarnMessages$1.applyOrElse(YarnTaskManager.scala:32)
 at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
 at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:41)
 at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:27)
 at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
 at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:27)
 at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
 at 
 org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:78)
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
 at akka.actor.ActorCell.invoke(ActorCell.scala:487)
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
 at akka.dispatch.Mailbox.run(Mailbox.scala:221)
 at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
 at 
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: java.lang.OutOfMemoryError: Could not allocate enough memory 
 segments for GlobalBufferPool (required (Mb): 0, allocated (Mb): -965, 
 missing (Mb): 965).
 at 
 org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.init(NetworkBufferPool.java:76)
 at 
 org.apache.flink.runtime.io.network.NetworkEnvironment.init(NetworkEnvironment.java:78)
 ... 23 more
 {code}





[jira] [Commented] (FLINK-1435) TaskManager does not log missing memory error on start up

2015-03-30 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386949#comment-14386949
 ] 

Stephan Ewen commented on FLINK-1435:
-

I think we misunderstood this issue initially. It seems like the TaskManager 
is started with a heap size that exceeds the physical memory of the machine. 
That is possible if your OS has enough swap space.
The process hangs because it is incredibly slow due to non-stop swapping.

Inside the JVM, you do not see that memory is missing, because it is not; it 
simply comes from the swap space.

This is not a Flink bug; such a misconfiguration is entirely possible.
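A startup sanity check along these lines could surface the situation (a hypothetical check, not code that exists in Flink; the com.sun.management extension used in main is HotSpot-specific):

```java
import java.lang.management.ManagementFactory;

/** Sketch: warn at startup when the configured max heap exceeds physical memory. */
public class HeapSanityCheck {

    /** Returns true if the configured max heap fits into physical memory. */
    public static boolean heapFitsPhysicalMemory(long maxHeapBytes, long physicalBytes) {
        return maxHeapBytes <= physicalBytes;
    }

    public static void main(String[] args) {
        long maxHeap = Runtime.getRuntime().maxMemory();
        // com.sun.management extension; available on HotSpot JVMs.
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean)
                        ManagementFactory.getOperatingSystemMXBean();
        long physical = os.getTotalPhysicalMemorySize();
        if (!heapFitsPhysicalMemory(maxHeap, physical)) {
            System.err.println("WARNING: max heap (" + maxHeap + " bytes) exceeds "
                    + "physical memory (" + physical + " bytes); expect heavy swapping.");
        }
    }
}
```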

 TaskManager does not log missing memory error on start up
 -

 Key: FLINK-1435
 URL: https://issues.apache.org/jira/browse/FLINK-1435
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Affects Versions: 0.7.0-incubating
Reporter: Malte Schwarzer
Priority: Minor
  Labels: memorymanager, starter

 When using bin/start-cluster.sh to start TaskManagers and a worker node 
 fails to start because of missing memory, you do not receive any error 
 messages in the log files.
 The worker node has only 15000M of memory available, but it is configured with 
 Maximum heap size: 4 MiBytes. The task manager does not join the cluster; the 
 process hangs.
 The last lines of the log look like this:
 ...
 ... - - Starting with 12 incoming and 12 outgoing connection threads.
 ... - Setting low water mark to 16384 and high water mark to 32768 bytes.
 ... - Instantiated PooledByteBufAllocator with direct arenas: 24, heap 
 arenas: 0, page size (bytes): 65536, chunk size (bytes): 16777216.
 ... - Using 0.7 of the free heap space for managed memory.
 ... - Initializing memory manager with 24447 megabytes of memory. Page size 
 is 32768 bytes.
 (END)
 An error message about the lack of memory is missing.





[jira] [Commented] (FLINK-1465) GlobalBufferPool reports negative memory allocation

2015-03-30 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14386966#comment-14386966
 ] 

Stephan Ewen commented on FLINK-1465:
-

This is actually an integer overflow issue.

I have a fix coming up...
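The kind of overflow at play can be reproduced in isolation (illustrative numbers, not the actual Flink configuration): multiplying buffer count by segment size in int arithmetic wraps around to a negative value, while widening to long first does not.

```java
public class BufferPoolOverflowSketch {

    /** Buggy variant: the int multiplication overflows before the division. */
    public static int totalMemoryMbInt(int numBuffers, int segmentSizeBytes) {
        return numBuffers * segmentSizeBytes / (1024 * 1024);
    }

    /** Fixed variant: widen to long before multiplying. */
    public static long totalMemoryMbLong(int numBuffers, int segmentSizeBytes) {
        return (long) numBuffers * segmentSizeBytes / (1024 * 1024);
    }

    public static void main(String[] args) {
        int numBuffers = 70_000;        // illustrative value
        int segmentSize = 32 * 1024;    // 32 KiB pages, as in the log above
        // 70_000 * 32_768 = 2_293_760_000 > Integer.MAX_VALUE, so the int
        // product wraps to a negative number.
        System.out.println(totalMemoryMbInt(numBuffers, segmentSize));  // negative
        System.out.println(totalMemoryMbLong(numBuffers, segmentSize)); // 2187
    }
}
```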

 GlobalBufferPool reports negative memory allocation
 ---

 Key: FLINK-1465
 URL: https://issues.apache.org/jira/browse/FLINK-1465
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime, TaskManager
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Ufuk Celebi

 I've got this error message when starting Flink; it does not really help me. 
 I suspect that my configuration files (which worked with 0.8) aren't working 
 with 0.9 anymore. Still, the exception is reporting weird values:
 {code}
 11:41:02,516 INFO  
 org.apache.flink.yarn.YarnUtils$$anonfun$startActorSystemAndTaskManager$1$$anon$1
   - TaskManager successfully registered at JobManager 
 akka.tcp://fl...@cloud-18.dima.tu-berlin.de:39674/user/jo
 bmanager.
 11:41:25,230 ERROR 
 org.apache.flink.yarn.YarnUtils$$anonfun$startActorSystemAndTaskManager$1$$anon$1
   - Failed to instantiate network environment.
 java.io.IOException: Failed to instantiate network buffer pool: Could not 
 allocate enough memory segments for GlobalBufferPool (required (Mb): 0, 
 allocated (Mb): -965, missing (Mb): 965).
 at 
 org.apache.flink.runtime.io.network.NetworkEnvironment.init(NetworkEnvironment.java:81)
 at 
 org.apache.flink.runtime.taskmanager.TaskManager.setupNetworkEnvironment(TaskManager.scala:508)
 at 
 org.apache.flink.runtime.taskmanager.TaskManager.org$apache$flink$runtime$taskmanager$TaskManager$$finishRegistration(TaskManager.scala:479)
 at 
 org.apache.flink.runtime.taskmanager.TaskManager$$anonfun$receiveWithLogMessages$1.applyOrElse(TaskManager.scala:226)
 at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply$mcVL$sp(AbstractPartialFunction.scala:33)
 at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:33)
 at 
 scala.runtime.AbstractPartialFunction$mcVL$sp.apply(AbstractPartialFunction.scala:25)
 at 
 org.apache.flink.yarn.YarnTaskManager$$anonfun$receiveYarnMessages$1.applyOrElse(YarnTaskManager.scala:32)
 at scala.PartialFunction$OrElse.apply(PartialFunction.scala:162)
 at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:41)
 at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.apply(ActorLogMessages.scala:27)
 at scala.PartialFunction$class.applyOrElse(PartialFunction.scala:118)
 at 
 org.apache.flink.runtime.ActorLogMessages$$anon$1.applyOrElse(ActorLogMessages.scala:27)
 at akka.actor.Actor$class.aroundReceive(Actor.scala:465)
 at 
 org.apache.flink.runtime.taskmanager.TaskManager.aroundReceive(TaskManager.scala:78)
 at akka.actor.ActorCell.receiveMessage(ActorCell.scala:516)
 at akka.actor.ActorCell.invoke(ActorCell.scala:487)
 at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:254)
 at akka.dispatch.Mailbox.run(Mailbox.scala:221)
 at akka.dispatch.Mailbox.exec(Mailbox.scala:231)
 at 
 scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
 at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
 at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
 at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 Caused by: java.lang.OutOfMemoryError: Could not allocate enough memory 
 segments for GlobalBufferPool (required (Mb): 0, allocated (Mb): -965, 
 missing (Mb): 965).
 at 
 org.apache.flink.runtime.io.network.buffer.NetworkBufferPool.init(NetworkBufferPool.java:76)
 at 
 org.apache.flink.runtime.io.network.NetworkEnvironment.init(NetworkEnvironment.java:78)
 ... 23 more
 {code}





[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs

2015-03-26 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382269#comment-14382269
 ] 

Stephan Ewen commented on FLINK-1319:
-

I think there is no reason they are not available in the Scala API.

They absolutely should be ;-)

I vote to move them to the core project.

 Add static code analysis for UDFs
 -

 Key: FLINK-1319
 URL: https://issues.apache.org/jira/browse/FLINK-1319
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Scala API
Reporter: Stephan Ewen
Assignee: Timo Walther
Priority: Minor

 Flink's Optimizer takes information that tells it for UDFs which fields of 
 the input elements are accessed, modified, or forwarded/copied. This 
 information frequently helps to reuse partitionings, sorts, etc. It may speed 
 up programs significantly, as it can frequently eliminate sorts and shuffles, 
 which are costly.
 Right now, users can add lightweight annotations to UDFs to provide this 
 information (such as adding {{@ConstantFields(0->3, 1, 2->1)}}).
 We worked with static code analysis of UDFs before, to determine this 
 information automatically. This is an incredible feature, as it magically 
 makes programs faster.
 For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this 
 works surprisingly well in many cases. We used the Soot toolkit for the 
 static code analysis. Unfortunately, Soot is LGPL licensed and thus we did 
 not include any of the code so far.
 I propose to add this functionality to Flink, in the form of a drop-in 
 addition, to work around the LGPL incompatibility with ASL 2.0. Users could 
 simply download a special flink-code-analysis.jar and drop it into the 
 lib folder to enable this functionality. We may even add a script to 
 tools that downloads that library automatically into the lib folder. This 
 should be legally fine, since we do not redistribute LGPL code and only 
 dynamically link it (the incompatibility with ASL 2.0 is mainly in the 
 patentability, if I remember correctly).
 Prior work on this has been done by [~aljoscha] and [~skunert], which could 
 provide a code base to start with.
 *Appendix*
 Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
 Papers on static analysis and for optimization: 
 http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and 
 http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
 Quick introduction to the Optimizer: 
 http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf 
 (Section 6)
 Optimizer for Iterations: 
 http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf 
 (Sections 4.3 and 5.3)
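The annotation mechanism described above can be illustrated with a self-contained sketch. Note that @ConstantFields here is a hypothetical stand-in declared locally, not Flink's actual annotation class; it only shows the kind of field-forwarding information the proposed static analysis would derive automatically.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class SemanticAnnotationSketch {

    /** Declares which input fields pass through unchanged to which output fields. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface ConstantFields {
        String[] value();
    }

    /** Field 0 is forwarded unchanged; field 1 is modified, so it is not listed. */
    @ConstantFields({"0->0"})
    public static class AddOneToSecond {
        public long[] map(long[] record) {
            return new long[] {record[0], record[1] + 1};
        }
    }

    public static void main(String[] args) {
        // An optimizer could read this and, e.g., keep a partitioning on field 0
        // across the operator instead of re-shuffling.
        ConstantFields info = AddOneToSecond.class.getAnnotation(ConstantFields.class);
        System.out.println(info.value()[0]); // prints "0->0"
    }
}
```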





[jira] [Commented] (FLINK-1783) Quickstart shading should not create shaded jar and dependency-reduced pom

2015-03-26 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382294#comment-14382294
 ] 

Stephan Ewen commented on FLINK-1783:
-

It does not affect anything directly; it only causes Maven to create additional 
artifacts in the {{target}} directory: a reduced pom, an original jar, and a 
reduced jar. I saw people being confused about which jar to use...

 Quickstart shading should not create shaded jar and dependency-reduced pom
 ---

 Key: FLINK-1783
 URL: https://issues.apache.org/jira/browse/FLINK-1783
 Project: Flink
  Issue Type: Improvement
  Components: Quickstarts
Affects Versions: 0.9
Reporter: Stephan Ewen
 Fix For: 0.9








[jira] [Commented] (FLINK-1783) Quickstart shading should not create shaded jar and dependency-reduced pom

2015-03-26 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382295#comment-14382295
 ] 

Stephan Ewen commented on FLINK-1783:
-

Even though they were identical in the end...

 Quickstart shading should not create shaded jar and dependency-reduced pom
 ---

 Key: FLINK-1783
 URL: https://issues.apache.org/jira/browse/FLINK-1783
 Project: Flink
  Issue Type: Improvement
  Components: Quickstarts
Affects Versions: 0.9
Reporter: Stephan Ewen
 Fix For: 0.9








[jira] [Commented] (FLINK-1789) Allow adding of URLs to the usercode class loader

2015-03-26 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382291#comment-14382291
 ] 

Stephan Ewen commented on FLINK-1789:
-

I think that is a great idea. We can even use that for lazy class loading for 
interactive jobs.

Brilliant idea, actually!

 Allow adding of URLs to the usercode class loader
 -

 Key: FLINK-1789
 URL: https://issues.apache.org/jira/browse/FLINK-1789
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime
Reporter: Timo Walther
Assignee: Timo Walther
Priority: Minor

 Currently, there is no option to add custom classpath URLs to the 
 FlinkUserCodeClassLoader. JARs always need to be shipped to the cluster, even 
 if they are already present on all nodes.
 It would be great if RemoteEnvironment also accepted valid classpath URLs and 
 forwarded them to the BlobLibraryCacheManager.





[jira] [Commented] (FLINK-1782) Change Quickstart Java version to 1.7

2015-03-26 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382276#comment-14382276
 ] 

Stephan Ewen commented on FLINK-1782:
-

I'll take a quick stab at fixing the quickstart related issues...

 Change Quickstart Java version to 1.7
 -

 Key: FLINK-1782
 URL: https://issues.apache.org/jira/browse/FLINK-1782
 Project: Flink
  Issue Type: Improvement
  Components: Quickstarts
Affects Versions: 0.9
Reporter: Stephan Ewen
 Fix For: 0.9


 The quickstarts refer to the outdated Java 1.6 source and bin version. We 
 should upgrade this to 1.7.





[jira] [Resolved] (FLINK-1796) Local mode TaskManager should have a process reaper

2015-03-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1796.
-
Resolution: Fixed

Fixed via 8c32142528590a030693529c7c8d93f194968c0a

 Local mode TaskManager should have a process reaper
 ---

 Key: FLINK-1796
 URL: https://issues.apache.org/jira/browse/FLINK-1796
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


 We use process reaper actors (a typical Akka design pattern) to shut down the 
 JVM processes when the core actors die, as this is currently unrecoverable.
 The local mode uses the process reaper only for the JobManager actor, not for 
 the TaskManager actor. This may leave stale JVMs behind on critical 
 TaskManager errors and makes debugging harder.





[jira] [Resolved] (FLINK-1795) Solution set allows duplicates upon construction.

2015-03-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1795.
-
Resolution: Fixed

Fixed via 923a2ae259bd72a2d48639ae0e64db0a04a4aa91

 Solution set allows duplicates upon construction.
 -

 Key: FLINK-1795
 URL: https://issues.apache.org/jira/browse/FLINK-1795
 Project: Flink
  Issue Type: Bug
  Components: Iterations
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


 The solution set identifies entries uniquely by key.
 During construction, it does not eliminate duplicates. The duplicates do not 
 get updated during the iterations (since only the first match is considered), 
 but are contained in the final result.
 This contradicts the definition of the solution set. It should not contain 
 duplicates to begin with.





[jira] [Resolved] (FLINK-1669) Streaming tests for recovery with distributed process failure

2015-03-29 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1669.
-
Resolution: Fixed

Fixed in c284745ee4612054339842789b0d87eb4f9a821d

 Streaming tests for recovery with distributed process failure
 -

 Key: FLINK-1669
 URL: https://issues.apache.org/jira/browse/FLINK-1669
 Project: Flink
  Issue Type: Sub-task
  Components: Streaming
Affects Versions: 0.9
Reporter: Márton Balassi
Assignee: Márton Balassi
 Fix For: 0.9


 Multi-JVM test for streaming recovery from failure.





[jira] [Resolved] (FLINK-1608) TaskManagers may pick wrong network interface when starting before JobManager

2015-03-02 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1608.
-
Resolution: Fixed
  Assignee: Stephan Ewen

Resolved in 861ebe753ff982b4cbf7c3c5b8c43fa306ac89f0

 TaskManagers may pick wrong network interface when starting before JobManager
 -

 Key: FLINK-1608
 URL: https://issues.apache.org/jira/browse/FLINK-1608
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


 The TaskManager uses a NetUtils routine to pick a network interface that lets 
 it talk to the JobManager. However, if the JobManager is not online yet, the 
 TaskManager falls back to an arbitrary non-localhost device.
 In cases where the TaskManagers start faster than the JobManager, they may 
 pick the wrong interface (and associated address and hostname).
 The later logic (that tries to connect to the JobManager actor) does several 
 retries. I think we need similar logic when looking for a suitable network 
 interface to use.





[jira] [Commented] (FLINK-1628) Strange behavior of where function during a join

2015-03-02 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14343584#comment-14343584
 ] 

Stephan Ewen commented on FLINK-1628:
-

Is this specific to the combination of tuple positions and key selector in the 
where clause, or does it also occur when using tuple positions for both inputs 
(where and equalTo clauses)?

 Strange behavior of where function during a join
 --

 Key: FLINK-1628
 URL: https://issues.apache.org/jira/browse/FLINK-1628
 Project: Flink
  Issue Type: Bug
Affects Versions: 0.9
Reporter: Daniel Bali
  Labels: batch

 Hello!
 If I use the `where` function with a field list during a join, it exhibits 
 strange behavior.
 Here is the sample code that triggers the error: 
 https://gist.github.com/balidani/d9789b713e559d867d5c
 This example joins a DataSet with itself, then counts the number of rows. If 
 I use `.where(0, 1)` the result is (22), which is not correct. If I use 
 `EdgeKeySelector`, I get the correct result (101).
 When I pass a field list to the `equalTo` function (but not `where`), 
 everything works again.
 If I don't include the `groupBy` and `reduceGroup` parts, everything works.
 Also, when working with large DataSets, passing a field list to `where` makes 
 it incredibly slow, even though I don't see any exceptions in the log (in 
 DEBUG mode).
 Does anybody know what might cause this problem?
 Thanks!





[jira] [Created] (FLINK-1631) Port collisions in ProcessReaping tests

2015-03-02 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1631:
---

 Summary: Port collisions in ProcessReaping tests
 Key: FLINK-1631
 URL: https://issues.apache.org/jira/browse/FLINK-1631
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


The process reaping tests for the JobManager spawn a process that starts a 
webserver on the default port. It may happen that this port is not available, 
due to another concurrently running task.

I suggest adding an option to not start the webserver, by setting the 
webserver port to {{-1}}, to prevent this.
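If such an option is added, disabling the webserver for these tests could be a one-line configuration entry, for example (a sketch; the exact key name is an assumption):

```yaml
# flink-conf.yaml used by the process-reaping tests (key name assumed)
jobmanager.web.port: -1
```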





[jira] [Commented] (FLINK-1616) Action list -r gives IOException when there are running jobs

2015-03-03 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345457#comment-14345457
 ] 

Stephan Ewen commented on FLINK-1616:
-

There are a few issues with the CLI frontend.

I am doing a major cleanup today.

 Action list -r gives IOException when there are running jobs
 --

 Key: FLINK-1616
 URL: https://issues.apache.org/jira/browse/FLINK-1616
 Project: Flink
  Issue Type: Bug
Affects Versions: 0.9
Reporter: Vasia Kalavri
Priority: Minor

 Here's the full exception:
 java.io.IOException: Could not retrieve running jobs from job manager.
   at org.apache.flink.client.CliFrontend.list(CliFrontend.java:528)
   at 
 org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1089)
   at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1114)
 Caused by: akka.pattern.AskTimeoutException: Ask timed out on 
 [Actor[akka.tcp://flink@10.20.0.25:13245/user/jobmanager#-1517424081]] after 
 [10 ms]
   at 
 akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)
   at akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)
   at 
 akka.actor.LightArrayRevolverScheduler$TaskHolder.run(Scheduler.scala:476)
   at 
 akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:282)
   at 
 akka.actor.LightArrayRevolverScheduler$$anonfun$close$1.apply(Scheduler.scala:281)
   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
   at akka.actor.LightArrayRevolverScheduler.close(Scheduler.scala:280)
   at akka.actor.ActorSystemImpl.stopScheduler(ActorSystem.scala:688)
   at 
 akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply$mcV$sp(ActorSystem.scala:617)
   at 
 akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply(ActorSystem.scala:617)
   at 
 akka.actor.ActorSystemImpl$$anonfun$liftedTree2$1$1.apply(ActorSystem.scala:617)
   at akka.actor.ActorSystemImpl$$anon$3.run(ActorSystem.scala:641)
   at 
 akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.runNext$1(ActorSystem.scala:808)
   at 
 akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply$mcV$sp(ActorSystem.scala:811)
   at 
 akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply(ActorSystem.scala:804)
   at 
 akka.actor.ActorSystemImpl$TerminationCallbacks$$anonfun$run$1.apply(ActorSystem.scala:804)
   at akka.util.ReentrantGuard.withGuard(LockUtil.scala:15)
   at 
 akka.actor.ActorSystemImpl$TerminationCallbacks.run(ActorSystem.scala:804)
   at 
 akka.actor.ActorSystemImpl$$anonfun$terminationCallbacks$1.apply(ActorSystem.scala:638)
   at 
 akka.actor.ActorSystemImpl$$anonfun$terminationCallbacks$1.apply(ActorSystem.scala:638)
   at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
   at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
   at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
   at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
   at 
 akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
   at 
 scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
   at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
   at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
   at 
 akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
   at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
   at 
 scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
   at 
 scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
   at 
 scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 If there are no running jobs, no exception is thrown.





[jira] [Resolved] (FLINK-1631) Port collisions in ProcessReaping tests

2015-03-03 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1631.
-
Resolution: Fixed

Fixed via 94a66d570e4bb40824813911a4f1bb47a8bf8b90

 Port collisions in ProcessReaping tests
 ---

 Key: FLINK-1631
 URL: https://issues.apache.org/jira/browse/FLINK-1631
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


 The process reaping tests for the JobManager spawn a process that starts a 
 webserver on the default port. It may happen that this port is not available, 
 due to another concurrently running task.
 I suggest adding an option to not start the webserver, by setting the 
 webserver port to {{-1}}, to prevent this.





[jira] [Created] (FLINK-1648) Add a mode where the system automatically sets the parallelism to the available task slots

2015-03-04 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1648:
---

 Summary: Add a mode where the system automatically sets the 
parallelism to the available task slots
 Key: FLINK-1648
 URL: https://issues.apache.org/jira/browse/FLINK-1648
 Project: Flink
  Issue Type: New Feature
  Components: JobManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


This is basically a port of this code from the 0.8 release:

https://github.com/apache/flink/pull/410





[jira] [Commented] (FLINK-1650) Suppress Akka's Netty Shutdown Errors through the log config

2015-03-04 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347450#comment-14347450
 ] 

Stephan Ewen commented on FLINK-1650:
-

The logged error is below.
Setting the log level to ERROR should allow us to see critical messages and 
suppress the confusing warnings, which seem to be caused by an akka/netty bug.

{code}
Feb 18, 2015 5:25:18 PM org.jboss.netty.channel.DefaultChannelPipeline
WARNING: An exception was thrown by an exception handler.
java.util.concurrent.RejectedExecutionException: Worker has already been 
shutdown
at 
org.jboss.netty.channel.socket.nio.AbstractNioSelector.registerTask(AbstractNioSelector.java:120)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:72)
at 
org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
at 
org.jboss.netty.channel.socket.nio.AbstractNioWorker.executeInIoThread(AbstractNioWorker.java:56)
at 
org.jboss.netty.channel.socket.nio.NioWorker.executeInIoThread(NioWorker.java:36)
at 
org.jboss.netty.channel.socket.nio.AbstractNioChannelSink.execute(AbstractNioChannelSink.java:34)
at 
org.jboss.netty.channel.Channels.fireExceptionCaughtLater(Channels.java:496)
at 
org.jboss.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:46)
at 
org.jboss.netty.handler.codec.oneone.OneToOneEncoder.handleDownstream(OneToOneEncoder.java:54)
at org.jboss.netty.channel.Channels.disconnect(Channels.java:781)
at 
org.jboss.netty.channel.AbstractChannel.disconnect(AbstractChannel.java:211)
at 
akka.remote.transport.netty.NettyTransport$$anonfun$gracefulClose$1.apply(NettyTransport.scala:223)
at 
akka.remote.transport.netty.NettyTransport$$anonfun$gracefulClose$1.apply(NettyTransport.scala:222)
at scala.util.Success.foreach(Try.scala:205)
at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:204)
at scala.concurrent.Future$$anonfun$foreach$1.apply(Future.scala:204)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.processBatch$1(BatchingExecutor.scala:67)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply$mcV$sp(BatchingExecutor.scala:82)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at 
akka.dispatch.BatchingExecutor$Batch$$anonfun$run$1.apply(BatchingExecutor.scala:59)
at 
scala.concurrent.BlockContext$.withBlockContext(BlockContext.scala:72)
at akka.dispatch.BatchingExecutor$Batch.run(BatchingExecutor.scala:58)
at akka.dispatch.TaskInvocation.run(AbstractDispatcher.scala:41)
at 
akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:401)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.pollAndExecAll(ForkJoinPool.java:1253)
at 
scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1346)
at 
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at 
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
{code}
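The suppression described above could be a single logger override in the log configuration, for example in log4j.properties (a sketch; the logger name is taken from the warning above):

```properties
# demote the netty pipeline's shutdown warnings: only ERROR and above are logged
log4j.logger.org.jboss.netty.channel.DefaultChannelPipeline=ERROR
```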

 Suppress Akka's Netty Shutdown Errors through the log config
 

 Key: FLINK-1650
 URL: https://issues.apache.org/jira/browse/FLINK-1650
 Project: Flink
  Issue Type: Bug
  Components: other
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


 I suggest setting the logging for 
 `org.jboss.netty.channel.DefaultChannelPipeline` to ERROR, in order to get 
 rid of the misleading stack trace caused by an akka/netty hiccup on shutdown.





[jira] [Created] (FLINK-1649) Give a good error message when a user program emits a null record

2015-03-04 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1649:
---

 Summary: Give a good error message when a user program emits a 
null record
 Key: FLINK-1649
 URL: https://issues.apache.org/jira/browse/FLINK-1649
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9








[jira] [Commented] (FLINK-1572) Output directories are created before input paths are checked

2015-03-05 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348675#comment-14348675
 ] 

Stephan Ewen commented on FLINK-1572:
-

Can we always remove the output directory? What if it was already there and 
only the files (1, 2, 3, ...) were created by Flink?

 Output directories are created before input paths are checked
 -

 Key: FLINK-1572
 URL: https://issues.apache.org/jira/browse/FLINK-1572
 Project: Flink
  Issue Type: Improvement
  Components: JobManager
Affects Versions: 0.9
Reporter: Robert Metzger
Priority: Minor

 Flink is first creating the output directories for a job before creating the 
 input splits.
 If a job's input directories are wrong, the system will have created output 
 directories for a failed job.
 It would be much better if the system created the output directories on 
 demand, right before data is actually written.
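The on-demand behavior suggested above can be sketched in plain Java (class and method names are illustrative, not Flink's output format API): the output directory is only created when the first record is actually written.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of on-demand output creation: nothing touches the file system
// until the first record arrives, so a job that fails while checking its
// inputs leaves no empty output directories behind.
public class LazyOutput implements AutoCloseable {
    private final Path file;
    private Writer out; // created lazily on first write

    public LazyOutput(Path file) {
        this.file = file;
    }

    public void write(String record) throws IOException {
        if (out == null) {
            Files.createDirectories(file.getParent()); // directory created here, on demand
            out = Files.newBufferedWriter(file);
        }
        out.write(record);
        out.write('\n');
    }

    @Override
    public void close() throws IOException {
        if (out != null) {
            out.close();
        }
    }
}
```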





[jira] [Commented] (FLINK-1601) Sometimes the YARNSessionFIFOITCase fails on Travis

2015-03-05 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14348685#comment-14348685
 ] 

Stephan Ewen commented on FLINK-1601:
-

[~rmetzger] Any insight where this may come from?

 Sometimes the YARNSessionFIFOITCase fails on Travis
 ---

 Key: FLINK-1601
 URL: https://issues.apache.org/jira/browse/FLINK-1601
 Project: Flink
  Issue Type: Bug
Reporter: Till Rohrmann
Assignee: Robert Metzger

 Sometimes the {{YARNSessionFIFOITCase}} fails on Travis with the following 
 exception.
 {code}
 Tests run: 8, Failures: 8, Errors: 0, Skipped: 0, Time elapsed: 71.375 sec 
  FAILURE! - in org.apache.flink.yarn.YARNSessionFIFOITCase
 perJobYarnCluster(org.apache.flink.yarn.YARNSessionFIFOITCase)  Time elapsed: 
 60.707 sec   FAILURE!
 java.lang.AssertionError: During the timeout period of 60 seconds the 
 expected string did not show up
   at org.junit.Assert.fail(Assert.java:88)
   at org.junit.Assert.assertTrue(Assert.java:41)
   at org.apache.flink.yarn.YarnTestBase.runWithArgs(YarnTestBase.java:315)
   at 
 org.apache.flink.yarn.YARNSessionFIFOITCase.perJobYarnCluster(YARNSessionFIFOITCase.java:185)
 testQueryCluster(org.apache.flink.yarn.YARNSessionFIFOITCase)  Time elapsed: 
 0.507 sec   FAILURE!
 java.lang.AssertionError: There is at least one application on the cluster is 
 not finished
   at org.junit.Assert.fail(Assert.java:88)
   at 
 org.apache.flink.yarn.YarnTestBase.checkClusterEmpty(YarnTestBase.java:146)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
   at 
 org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
   at 
 org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
   at 
 org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
   at 
 org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
   at 
 org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
   at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:264)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:153)
   at 
 org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:124)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.invokeProviderInSameClassLoader(ForkedBooter.java:200)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:153)
   at 
 org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:103)
 {code}
 The result is
 {code}
 Failed tests: 
   YARNSessionFIFOITCase.perJobYarnCluster:185-YarnTestBase.runWithArgs:315 
 During the timeout period of 60 seconds the expected string did not show up
   YARNSessionFIFOITCaseYarnTestBase.checkClusterEmpty:146 There is at least 
 one application on the cluster is not finished
   YARNSessionFIFOITCaseYarnTestBase.checkClusterEmpty:146 There is at least 
 one application on the cluster is not finished
   YARNSessionFIFOITCaseYarnTestBase.checkClusterEmpty:146 There is at least 
 one application on the cluster is not finished
   YARNSessionFIFOITCaseYarnTestBase.checkClusterEmpty:146 There is at least 
 one application on the cluster is not finished
   YARNSessionFIFOITCaseYarnTestBase.checkClusterEmpty:146 There is at least 
 one application on the cluster is not finished
   YARNSessionFIFOITCaseYarnTestBase.checkClusterEmpty:146 There is at least 
 one application on the cluster is not finished
   YARNSessionFIFOITCaseYarnTestBase.checkClusterEmpty:146 There is at least 
 one application on 

[jira] [Resolved] (FLINK-1649) Give a good error message when a user program emits a null record

2015-03-05 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1649.
-
Resolution: Fixed

Fixed via 482766e949d69e282ed862bd97f2a8378b2f699e

 Give a good error message when a user program emits a null record
 -

 Key: FLINK-1649
 URL: https://issues.apache.org/jira/browse/FLINK-1649
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9









[jira] [Commented] (FLINK-1635) Remove Apache Thrift dependency from Flink

2015-03-03 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14344841#comment-14344841
 ] 

Stephan Ewen commented on FLINK-1635:
-

I agree, let's remove this. The way the out-of-the-box support currently 
works ties it to a specific version, which is bad for both protobuf and thrift.

 Remove Apache Thrift dependency from Flink
 --

 Key: FLINK-1635
 URL: https://issues.apache.org/jira/browse/FLINK-1635
 Project: Flink
  Issue Type: Bug
  Components: Java API
Affects Versions: 0.9
Reporter: Robert Metzger

 I've added Thrift and Protobuf to Flink to support it out of the box with 
 Kryo.
 However, after trying to access a HCatalog/Hive table yesterday using Flink I 
 found that there is a dependency conflict between Flink and Hive (on thrift).
 Maybe it makes more sense to properly document our serialization framework 
 and provide a copy-paste solution on how to get thrift/protobuf et al. to 
 work with Flink.
 Please chime in if you are against removing the out of the box support for 
 protobuf and kryo.





[jira] [Updated] (FLINK-1608) TaskManagers may pick wrong network interface when starting before JobManager

2015-02-24 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-1608:

Description: 
The taskmanagers use a NetUtils routine to find an interface that lets them 
talk to the Jobmanager. However, if the JobManager is not online yet, they fall 
back to some non-localhost device.

In cases where the TaskManagers start faster than the JobManager, they pick the 
wrong hostname and interface.

The later logic (that tries to connect to the JobManager actor) has a logic 
with retries. I think we need a similar logic here...

  was:
The taskmanagers use a NetUtils routine to find an interface that lets them 
talk to the Jobmanager. However, if the JobManager is not online yet, they fall 
back to localhost.

In cases where the TaskManagers start faster than the JobManager, they pick the 
wrong hostname and interface.


 TaskManagers may pick wrong network interface when starting before JobManager
 -

 Key: FLINK-1608
 URL: https://issues.apache.org/jira/browse/FLINK-1608
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Affects Versions: 0.9
Reporter: Stephan Ewen
 Fix For: 0.9


 The taskmanagers use a NetUtils routine to find an interface that lets them 
 talk to the Jobmanager. However, if the JobManager is not online yet, they 
 fall back to some non-localhost device.
 In cases where the TaskManagers start faster than the JobManager, they pick 
 the wrong hostname and interface.
 The later logic (that tries to connect to the JobManager actor) has a logic 
 with retries. I think we need a similar logic here...





[jira] [Commented] (FLINK-1580) Cleanup TaskManager initialization logic

2015-02-24 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335468#comment-14335468
 ] 

Stephan Ewen commented on FLINK-1580:
-

Partially solved in ed8b26bf2e8dd7c187c24ad0d8ff3e67f6a7478c

 Cleanup TaskManager initialization logic
 

 Key: FLINK-1580
 URL: https://issues.apache.org/jira/browse/FLINK-1580
 Project: Flink
  Issue Type: Bug
Reporter: Till Rohrmann

 Currently, the TaskManager initializes many heavyweight objects upon 
 registration at the JobManager. If an exception occurs during the 
 initialization, it takes quite long until the {{JobManager}} detects the 
 {{TaskManager}} failure.
 Therefore, it would be better if we could rearrange the initialization logic 
 so that the {{TaskManager}} only registers at the {{JobManager}} if all 
 objects could be initialized successfully. Moreover, it would be worthwhile 
 to move some of the initialization work out of the {{TaskManager}}.





[jira] [Updated] (FLINK-1608) TaskManagers may pick wrong network interface when starting before JobManager

2015-02-24 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen updated FLINK-1608:

Description: 
The TaskManager uses a NetUtils routine to pick a network interface that lets 
it talk to the Jobmanager. However, if the JobManager is not online yet, the 
TaskManager falls back to an arbitrary non-localhost device.

In cases where the TaskManagers start faster than the JobManager, they may pick 
the wrong interface (and associated address and hostname)

The later logic (that tries to connect to the JobManager actor) does several 
retries. I think we need similar logic when looking for a suitable network 
interface to use.

  was:
The taskmanagers use a NetUtils routine to find an interface that lets them 
talk to the Jobmanager. However, if the JobManager is not online yet, they fall 
back to some non-localhost device.

In cases where the TaskManagers start faster than the JobManager, they pick the 
wrong hostname and interface.

The later logic (that tries to connect to the JobManager actor) has a logic 
with retries. I think we need a similar logic here...


 TaskManagers may pick wrong network interface when starting before JobManager
 -

 Key: FLINK-1608
 URL: https://issues.apache.org/jira/browse/FLINK-1608
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Affects Versions: 0.9
Reporter: Stephan Ewen
 Fix For: 0.9


 The TaskManager uses a NetUtils routine to pick a network interface that lets 
 it talk to the Jobmanager. However, if the JobManager is not online yet, the 
 TaskManager falls back to an arbitrary non-localhost device.
 In cases where the TaskManagers start faster than the JobManager, they may 
 pick the wrong interface (and associated address and hostname)
 The later logic (that tries to connect to the JobManager actor) does several 
 retries. I think we need similar logic when looking for a suitable network 
 interface to use.
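The proposed retry could look roughly like the following self-contained sketch (method and parameter names are assumptions, not Flink's actual NetUtils API): probe the JobManager address repeatedly, and only fall back after all attempts fail.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch of retrying interface selection: the local address of a socket
// that successfully connected to the JobManager belongs to the interface
// that can actually reach it.
public class InterfaceProbe {

    static InetAddress findConnectingAddress(InetSocketAddress jobManager,
                                             int maxAttempts,
                                             long backoffMillis) throws InterruptedException {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try (Socket socket = new Socket()) {
                socket.connect(jobManager, 1000);
                return socket.getLocalAddress(); // interface that reaches the JobManager
            } catch (IOException e) {
                Thread.sleep(backoffMillis); // JobManager not up yet; retry instead of guessing
            }
        }
        // Only after all retries fail, fall back (here: loopback) instead of
        // picking an arbitrary non-localhost device right away.
        return InetAddress.getLoopbackAddress();
    }
}
```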





[jira] [Resolved] (FLINK-1590) Log environment information also in YARN mode

2015-02-24 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1590?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1590.
-
   Resolution: Fixed
Fix Version/s: 0.9
 Assignee: Stephan Ewen  (was: Robert Metzger)

Fixed via ed8b26bf2e8dd7c187c24ad0d8ff3e67f6a7478c

 Log environment information also in YARN mode
 -

 Key: FLINK-1590
 URL: https://issues.apache.org/jira/browse/FLINK-1590
 Project: Flink
  Issue Type: Improvement
  Components: YARN Client
Affects Versions: 0.9
Reporter: Robert Metzger
Assignee: Stephan Ewen
Priority: Minor
 Fix For: 0.9








[jira] [Commented] (FLINK-1608) TaskManagers may pick wrong network interface when starting before JobManager

2015-02-24 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14335442#comment-14335442
 ] 

Stephan Ewen commented on FLINK-1608:
-

As a safety fallback, I suggest allowing the TaskManager hostname to be 
specified in the configuration. To make proper use of this, each TaskManager 
would need a distinct configuration. Not a standard scenario, but it is a 
fallback solution if the automatic methods fail.
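A sketch of what such a per-TaskManager override might look like (both the key name and the host value are assumptions for illustration):

```yaml
# flink-conf.yaml on one specific TaskManager (hypothetical key and value)
taskmanager.hostname: taskmanager-1.cluster.internal
```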

 TaskManagers may pick wrong network interface when starting before JobManager
 -

 Key: FLINK-1608
 URL: https://issues.apache.org/jira/browse/FLINK-1608
 Project: Flink
  Issue Type: Bug
  Components: TaskManager
Affects Versions: 0.9
Reporter: Stephan Ewen
 Fix For: 0.9


 The taskmanagers use a NetUtils routine to find an interface that lets them 
 talk to the Jobmanager. However, if the JobManager is not online yet, they 
 fall back to some non-localhost device.
 In cases where the TaskManagers start faster than the JobManager, they pick 
 the wrong hostname and interface.
 The later logic (that tries to connect to the JobManager actor) has a logic 
 with retries. I think we need a similar logic here...





[jira] [Commented] (FLINK-1610) Java docs do not build

2015-02-25 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14336993#comment-14336993
 ] 

Stephan Ewen commented on FLINK-1610:
-

May be related to referring to Scala classes (and objects) from Java. Some 
tools have hiccups there, because the Scala class file names do not match the 
class names in the code (objects, for example, append a `$`).

 Java docs do not build
 --

 Key: FLINK-1610
 URL: https://issues.apache.org/jira/browse/FLINK-1610
 Project: Flink
  Issue Type: Bug
  Components: Build System, Documentation
Affects Versions: 0.9
Reporter: Max Michels
 Fix For: 0.9


 Among a bunch of warnings, I get the following error which prevents the java 
 doc generation from finishing:
 {code}
 javadoc: error - 
 com.sun.tools.doclets.internal.toolkit.util.DocletAbortException: 
 com.sun.tools.javac.code.Symbol$CompletionFailure: class file for 
 akka.testkit.TestKit not found
 Command line was: 
 /System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK/Home/bin/javadoc
  -Xdoclint:none @options @packages
   at 
 org.apache.maven.plugin.javadoc.AbstractJavadocMojo.executeJavadocCommandLine(AbstractJavadocMojo.java:5074)
   at 
 org.apache.maven.plugin.javadoc.AbstractJavadocMojo.executeReport(AbstractJavadocMojo.java:1999)
   at 
 org.apache.maven.plugin.javadoc.JavadocReport.generate(JavadocReport.java:130)
   at 
 org.apache.maven.plugin.javadoc.JavadocReport.execute(JavadocReport.java:315)
   at 
 org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:132)
   at 
 org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
   at 
 org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
   at 
 org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
   at 
 org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:116)
   at 
 org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:80)
   at 
 org.apache.maven.lifecycle.internal.builder.singlethreaded.SingleThreadedBuilder.build(SingleThreadedBuilder.java:51)
   at 
 org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:120)
   at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:355)
   at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:155)
   at org.apache.maven.cli.MavenCli.execute(MavenCli.java:584)
   at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:216)
   at org.apache.maven.cli.MavenCli.main(MavenCli.java:160)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:483)
   at 
 org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:289)
   at 
 org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:229)
   at 
 org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:415)
   at 
 org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:356)
 {code}





[jira] [Commented] (FLINK-1162) Cannot serialize Scala classes with Avro serializer

2015-02-23 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14333167#comment-14333167
 ] 

Stephan Ewen commented on FLINK-1162:
-

Yep, this one should be resolved.

 Cannot serialize Scala classes with Avro serializer
 ---

 Key: FLINK-1162
 URL: https://issues.apache.org/jira/browse/FLINK-1162
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime, Scala API
Reporter: Till Rohrmann

 The problem occurs for class names containing a '$' dollar sign, as is 
 sometimes the case for Scala classes.





[jira] [Resolved] (FLINK-1596) FileIOChannel introduces space in temp file name

2015-02-24 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1596.
-
Resolution: Fixed

Fixed in 0.9 in 98bc7b951b30961871958a4483e0b186bfb785b8

 FileIOChannel introduces space in temp file name
 

 Key: FLINK-1596
 URL: https://issues.apache.org/jira/browse/FLINK-1596
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Affects Versions: 0.9, 0.8.1
Reporter: Johannes
Assignee: Johannes
Priority: Minor
  Labels: easyfix

 FLINK-1483 introduced separate directories for all threads.
 Unfortunately this seems to not work on windows, due to spaces in the filename
 Stacktrace
 {code}
 Caused by: java.io.IOException: Channel to path 
 '\AppData\Local\Temp\flink-io-366dee63-092c-415c-b119-a138506dec86\ 
 fa44b17b98c3b1b1e30185fd92be5d01.02.channel' could not be opened.
   at 
 org.apache.flink.runtime.io.disk.iomanager.AbstractFileIOChannel.init(AbstractFileIOChannel.java:61)
   at 
 org.apache.flink.runtime.io.disk.iomanager.AsynchronousFileIOChannel.init(AsynchronousFileIOChannel.java:77)
   at 
 org.apache.flink.runtime.io.disk.iomanager.AsynchronousBulkBlockReader.init(AsynchronousBulkBlockReader.java:46)
   at 
 org.apache.flink.runtime.io.disk.iomanager.AsynchronousBulkBlockReader.init(AsynchronousBulkBlockReader.java:39)
   at 
 org.apache.flink.runtime.io.disk.iomanager.IOManagerAsync.createBulkBlockChannelReader(IOManagerAsync.java:236)
   at 
 org.apache.flink.runtime.operators.hash.MutableHashTable.buildTableFromSpilledPartition(MutableHashTable.java:747)
   at 
 org.apache.flink.runtime.operators.hash.MutableHashTable.prepareNextPartition(MutableHashTable.java:508)
   at 
 org.apache.flink.runtime.operators.hash.ReOpenableMutableHashTable.prepareNextPartition(ReOpenableMutableHashTable.java:167)
   at 
 org.apache.flink.runtime.operators.hash.MutableHashTable.nextRecord(MutableHashTable.java:541)
   at 
 org.apache.flink.runtime.operators.hash.NonReusingBuildFirstHashMatchIterator.callWithNextKey(NonReusingBuildFirstHashMatchIterator.java:104)
   at 
 org.apache.flink.runtime.operators.AbstractCachedBuildSideMatchDriver.run(AbstractCachedBuildSideMatchDriver.java:155)
   at 
 org.apache.flink.runtime.operators.RegularPactTask.run(RegularPactTask.java:496)
   at 
 org.apache.flink.runtime.iterative.task.AbstractIterativePactTask.run(AbstractIterativePactTask.java:139)
   at 
 org.apache.flink.runtime.iterative.task.IterationIntermediatePactTask.run(IterationIntermediatePactTask.java:92)
   at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
   at 
 org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:204)
   at java.lang.Thread.run(Thread.java:745)
 Caused by: java.io.FileNotFoundException: 
 AppData\Local\Temp\flink-io-366dee63-092c-415c-b119-a138506dec86\ 
 fa44b17b98c3b1b1e30185fd92be5d01.02.channel (Das System kann die 
 angegebene Datei nicht finden)
   at java.io.RandomAccessFile.open(Native Method)
   at java.io.RandomAccessFile.init(RandomAccessFile.java:241)
   at java.io.RandomAccessFile.init(RandomAccessFile.java:122)
   at 
 org.apache.flink.runtime.io.disk.iomanager.AbstractFileIOChannel.init(AbstractFileIOChannel.java:57)
   ... 16 more
 {code}
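
The failure mode is easy to reproduce outside Flink. The sketch below is hypothetical (the names are illustrative, not the actual FileIOChannel code) and shows how a stray space in a concatenated path segment ends up inside the channel file name:

```java
import java.util.UUID;

// Hypothetical sketch, not Flink's FileIOChannel code: when path segments
// are concatenated, a stray space in the separator or prefix ends up inside
// the temp file name, and Windows then cannot open the file.
public class ChannelNameSketch {
    // Buggy variant: a space sneaks in between the directory and the file name.
    public static String buggyName(String tempDir, String id) {
        return tempDir + "/ " + id + ".channel";
    }

    // Fixed variant: join the segments without any whitespace.
    public static String fixedName(String tempDir, String id) {
        return tempDir + "/" + id + ".channel";
    }

    public static void main(String[] args) {
        String id = UUID.randomUUID().toString();
        System.out.println(buggyName("/tmp/flink-io", id).contains(" ")); // true
        System.out.println(fixedName("/tmp/flink-io", id).contains(" ")); // false
    }
}
```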





[jira] [Resolved] (FLINK-1598) Give better error messages when serializers run out of space.

2015-02-24 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1598.
-
Resolution: Fixed

Fixed via 9528a521f56e0c6b0c70d43e62ad84b19c048c36

 Give better error messages when serializers run out of space.
 -

 Key: FLINK-1598
 URL: https://issues.apache.org/jira/browse/FLINK-1598
 Project: Flink
  Issue Type: Bug
  Components: Local Runtime
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9








[jira] [Commented] (FLINK-1613) Cannot submit to remote ExecutionEnvironment from IDE

2015-02-26 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338714#comment-14338714
 ] 

Stephan Ewen commented on FLINK-1613:
-

This is somewhat expected, currently. The remote executor needs a JAR file with 
all the user-defined classes. It will ship that JAR to the cluster.

There is work in progress to collect these classes automatically. If you want, 
pick up the pull request and use the jar-file creator to automatically gather 
the classes and generate the JAR to ship.

https://github.com/apache/flink/pull/35
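
What the remote JobManager sees can be reproduced with plain Java: the quickstart's user class is simply not on the remote JVM's classpath, so resolving it during deserialization fails. The class name below is taken from the stack trace in this issue; it exists only in the user's project.

```java
// Self-contained illustration of the remote side's view: without the user
// JAR on its classpath, the JVM cannot resolve the quickstart class during
// deserialization and throws ClassNotFoundException.
public class RemoteClassLookup {
    public static boolean canResolve(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false; // what the remote executor runs into
        }
    }

    public static void main(String[] args) {
        System.out.println(canResolve("java.lang.String"));                            // true
        System.out.println(canResolve("org.myorg.quickstart.WordCount$LineSplitter")); // false
    }
}
```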

 Cannot submit to remote ExecutionEnvironment from IDE
 --

 Key: FLINK-1613
 URL: https://issues.apache.org/jira/browse/FLINK-1613
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: 0.8.1
 Environment: * Ubuntu Linux 14.04
 * Flink 0.9-SNAPSHOT or 0.8.1 running in standalone mode on localhost
Reporter: Alexander Alexandrov
 Fix For: 0.9, 0.8.2


 I am reporting this as [~rmetzler] mentioned offline that it was working in 
 the past.
 At the moment it is not possible to submit jobs directly from the IDE. Both 
 the Java and the Scala quickstart guides fail on both 0.8.1 and 0.9-SNAPSHOT 
 with ClassNotFoundException exceptions.
 To reproduce the error, run the quickstart scripts and change the 
 ExecutionEnvironment initialization:
 {code:java}
 env = ExecutionEnvironment.createRemoteEnvironment("localhost", 6123)
 {code}
 This is the cause for Java:
 {noformat}
 Caused by: java.lang.ClassNotFoundException: 
 org.myorg.quickstart.WordCount$LineSplitter
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:274)
   at 
 org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:54)
   at 
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
   at 
 java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
   at 
 org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:274)
   at 
 org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:236)
   at 
 org.apache.flink.runtime.operators.util.TaskConfig.getStubWrapper(TaskConfig.java:281)
 {noformat}
 This is for Scala:
 {noformat}
 java.lang.ClassNotFoundException: 
 org.myorg.quickstart.WordCount$$anon$2$$anon$1
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:274)
   at 
 org.apache.flink.util.InstantiationUtil$ClassLoaderObjectInputStream.resolveClass(InstantiationUtil.java:54)
   at 
 java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
   at 
 java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
   at 
 org.apache.flink.util.InstantiationUtil.deserializeObject(InstantiationUtil.java:274)
   at 
 org.apache.flink.util.InstantiationUtil.readObjectFromConfig(InstantiationUtil.java:236)
   at 
 org.apache.flink.api.java.typeutils.runtime.RuntimeSerializerFactory.readParametersFromConfig(RuntimeSerializerFactory.java:76)
   at 
 

[jira] [Resolved] (FLINK-1781) Quickstarts broken due to Scala Version Variables

2015-03-27 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1781.
-
Resolution: Fixed

Fixed in 1aba942c1fd7c4dbf1c4d4f30602d69c2cb3540e

 Quickstarts broken due to Scala Version Variables
 -

 Key: FLINK-1781
 URL: https://issues.apache.org/jira/browse/FLINK-1781
 Project: Flink
  Issue Type: Bug
  Components: Quickstarts
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
Priority: Blocker
 Fix For: 0.9


 The quickstart archetype resources refer to the Scala version variables.
 When creating a Maven project standalone, these variables are not defined, 
 and the POM is invalid.





[jira] [Commented] (FLINK-1754) Deadlock in job execution

2015-03-27 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384450#comment-14384450
 ] 

Stephan Ewen commented on FLINK-1754:
-

Is this issue resolved?

The issue with deadlocks in 0.8.1 is (I think) that the runtime does not obey 
the assumptions of the optimizer. The hash table building requires (for some 
reason) data availability on the probe side as well.

 Deadlock in job execution
 -

 Key: FLINK-1754
 URL: https://issues.apache.org/jira/browse/FLINK-1754
 Project: Flink
  Issue Type: Bug
Affects Versions: 0.8.1
Reporter: Sebastian Kruse

 I have encountered a reproducible deadlock in the execution of one of my 
 jobs. The part of the plan, where this happens, is the following:
 {code:java}
 /** Performs the reduction via creating transitive INDs and removing them
  * from the original IND set. */
 private DataSet<Tuple2<Integer, int[]>> calculateTransitiveReduction1(
         DataSet<Tuple2<Integer, int[]>> inclusionDependencies) {
     // Concatenate INDs (only one hop).
     DataSet<Tuple2<Integer, int[]>> transitiveInds = inclusionDependencies
             .flatMap(new SplitInds())
             .joinWithTiny(inclusionDependencies)
             .where(1).equalTo(0)
             .with(new ConcatenateInds());
     // Remove the concatenated INDs to come up with a transitive
     // reduction of the INDs.
     return inclusionDependencies
             .coGroup(transitiveInds)
             .where(0).equalTo(0)
             .with(new RemoveTransitiveInds());
 }
 {code}
 Seemingly, the flatMap operator waits indefinitely for a free buffer to write 
 to.
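
The stall can be modeled with a toy bounded buffer pool (this is not Flink's network buffer code): once the pool is exhausted and no downstream consumer returns a buffer, a blocking take() would wait forever; a timed poll() makes the exhaustion observable instead.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Toy model of buffer-pool exhaustion, not Flink's buffer pool: a writer
// requesting a free buffer from an empty bounded pool stalls until someone
// returns a buffer. With a blocking take() this is the infinite wait the
// flatMap operator exhibits.
public class BufferPoolSketch {
    public static boolean tryAcquire(BlockingQueue<byte[]> freeBuffers, long millis)
            throws InterruptedException {
        return freeBuffers.poll(millis, TimeUnit.MILLISECONDS) != null;
    }

    public static boolean[] demo() {
        try {
            BlockingQueue<byte[]> freeBuffers = new ArrayBlockingQueue<>(1);
            freeBuffers.put(new byte[4096]);             // the pool's only buffer
            boolean first = tryAcquire(freeBuffers, 10); // succeeds: buffer handed out
            // Nobody returns the buffer, so the next request finds the pool empty.
            boolean second = tryAcquire(freeBuffers, 10);
            return new boolean[] { first, second };
        } catch (InterruptedException e) {
            return new boolean[] { false, false };
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println(r[0]); // true
        System.out.println(r[1]); // false: this is where a blocking wait would hang
    }
}
```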





[jira] [Commented] (FLINK-1796) Local mode TaskManager should have a process reaper

2015-03-27 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384458#comment-14384458
 ] 

Stephan Ewen commented on FLINK-1796:
-

I have a fix for this coming up...

 Local mode TaskManager should have a process reaper
 ---

 Key: FLINK-1796
 URL: https://issues.apache.org/jira/browse/FLINK-1796
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


 We use process reaper actors (a typical Akka design pattern) to shut down the 
 JVM processes when the core actors die, as this is currently unrecoverable.
 The local mode uses the process reaper only for the JobManager actor, not for 
 the TaskManager actor. This may lead to dead stale JVMs on critical 
 TaskManager errors and makes debugging harder.





[jira] [Resolved] (FLINK-1783) Quickstart shading should not created shaded jar and dependency reduced pom

2015-03-27 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1783.
-
Resolution: Fixed
  Assignee: Stephan Ewen

Fixed in d11e0910880a48bbd5c452e4c76ffdca000f5614

 Quickstart shading should not created shaded jar and dependency reduced pom
 ---

 Key: FLINK-1783
 URL: https://issues.apache.org/jira/browse/FLINK-1783
 Project: Flink
  Issue Type: Improvement
  Components: Quickstarts
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9








[jira] [Commented] (FLINK-1795) Solution set allows duplicates upon construction.

2015-03-27 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384456#comment-14384456
 ] 

Stephan Ewen commented on FLINK-1795:
-

I have a fix and test coming up...

 Solution set allows duplicates upon construction.
 -

 Key: FLINK-1795
 URL: https://issues.apache.org/jira/browse/FLINK-1795
 Project: Flink
  Issue Type: Bug
  Components: Iterations
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


 The solution set identifies entries uniquely by key.
 During construction, it does not eliminate duplicates. The duplicates do not 
 get updated during the iterations (since only the first match is considered), 
 but are contained in the final result.
 This contradicts the definition of the solution set. It should not contain 
 duplicates to begin with.





[jira] [Created] (FLINK-1796) Local mode TaskManager should have a process reaper

2015-03-27 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1796:
---

 Summary: Local mode TaskManager should have a process reaper
 Key: FLINK-1796
 URL: https://issues.apache.org/jira/browse/FLINK-1796
 Project: Flink
  Issue Type: Bug
  Components: JobManager
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


We use process reaper actors (a typical Akka design pattern) to shut down the 
JVM processes when the core actors die, as this is currently unrecoverable.

The local mode uses the process reaper only for the JobManager actor, not for 
the TaskManager actor. This may lead to dead stale JVMs on critical TaskManager 
errors and makes debugging harder.
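
A minimal sketch of the reaper pattern, assuming a plain thread instead of the Akka actor Flink actually uses: a watcher joins the critical thread and runs a shutdown action once that thread dies, so the JVM does not linger as a stale process.

```java
// Minimal sketch of the process-reaper idea (plain threads, not Flink's Akka
// implementation): a daemon watcher waits for the critical thread to die and
// then triggers a shutdown action. In Flink this action would be System.exit.
public class ReaperSketch {
    public static Thread reap(Thread critical, Runnable shutdownAction) {
        Thread reaper = new Thread(() -> {
            try {
                critical.join();      // returns once the critical thread is dead
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            shutdownAction.run();     // e.g. System.exit(-1) in the real reaper
        });
        reaper.setDaemon(true);
        reaper.start();
        return reaper;
    }

    public static boolean demo() {
        try {
            final boolean[] shutDown = { false };
            Thread critical = new Thread(() -> {}); // dies immediately
            critical.start();
            Thread reaper = reap(critical, () -> shutDown[0] = true);
            reaper.join(2000);
            return shutDown[0];
        } catch (InterruptedException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // true: the reaper observed the death
    }
}
```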





[jira] [Created] (FLINK-1795) Solution set allows duplicates upon construction.

2015-03-27 Thread Stephan Ewen (JIRA)
Stephan Ewen created FLINK-1795:
---

 Summary: Solution set allows duplicates upon construction.
 Key: FLINK-1795
 URL: https://issues.apache.org/jira/browse/FLINK-1795
 Project: Flink
  Issue Type: Bug
  Components: Iterations
Affects Versions: 0.9
Reporter: Stephan Ewen
Assignee: Stephan Ewen
 Fix For: 0.9


The solution set identifies entries uniquely by key.

During construction, it does not eliminate duplicates. The duplicates do not 
get updated during the iterations (since only the first match is considered), 
but are contained in the final result.

This contradicts the definition of the solution set. It should not contain 
duplicates to begin with.
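
A toy model of the intended construction semantics (not the CompactingHashTable code): building the table with a map keyed by the join key means a later record with the same key replaces the earlier one, so the finished solution set holds at most one entry per key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of unique-key construction, not Flink's CompactingHashTable:
// put() replaces an existing entry for the same key, so duplicates cannot
// survive the build phase and the solution set stays a proper key-value set.
public class SolutionSetSketch {
    public static Map<Integer, String> build(int[] keys, String[] values) {
        Map<Integer, String> table = new LinkedHashMap<>();
        for (int i = 0; i < keys.length; i++) {
            table.put(keys[i], values[i]); // insert-or-replace: one entry per key
        }
        return table;
    }

    public static void main(String[] args) {
        Map<Integer, String> t =
                build(new int[] { 1, 2, 1 }, new String[] { "a", "b", "c" });
        System.out.println(t.size()); // 2: the duplicate key 1 was collapsed
        System.out.println(t.get(1)); // c: the later record replaced "a"
    }
}
```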





[jira] [Commented] (FLINK-1782) Change Quickstart Java version to 1.7

2015-03-27 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14384254#comment-14384254
 ] 

Stephan Ewen commented on FLINK-1782:
-

This is currently not possible, since the builds with Java 6 fail to run the 
quickstart tests when the quickstart pom specifies compiler version 1.7 (Java 7).

 Change Quickstart Java version to 1.7
 -

 Key: FLINK-1782
 URL: https://issues.apache.org/jira/browse/FLINK-1782
 Project: Flink
  Issue Type: Improvement
  Components: Quickstarts
Affects Versions: 0.9
Reporter: Stephan Ewen

 The quickstarts refer to the outdated Java 1.6 source and binary versions. We 
 should upgrade this to 1.7.





[jira] [Resolved] (FLINK-1782) Change Quickstart Java version to 1.7

2015-03-27 Thread Stephan Ewen (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stephan Ewen resolved FLINK-1782.
-
   Resolution: Won't Fix
Fix Version/s: (was: 0.9)

 Change Quickstart Java version to 1.7
 -

 Key: FLINK-1782
 URL: https://issues.apache.org/jira/browse/FLINK-1782
 Project: Flink
  Issue Type: Improvement
  Components: Quickstarts
Affects Versions: 0.9
Reporter: Stephan Ewen

 The quickstarts refer to the outdated Java 1.6 source and binary versions. We 
 should upgrade this to 1.7.





[jira] [Commented] (FLINK-1319) Add static code analysis for UDFs

2015-03-23 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14376899#comment-14376899
 ] 

Stephan Ewen commented on FLINK-1319:
-

Very nice result, a very much anticipated feature.

Can you tell us how many functions are currently analyzed by this? Does the 
basic mechanism work with record-at-a-time functions only, or also with 
group-at-a-time functions?

To proceed:
  - Do we need an extra project for this? I would actually not mind having this 
in core / java. It is sort of lightweight, and we have the ASM dependency 
anyway (closure cleaning).
  - To activate or deactivate it, I would use the ExecutionConfig in the 
ExecutionEnvironment. From my experience with users, no one bothers to call any 
of the parametrization methods ever (withForwardFields, withName, analyzeUdf, 
...). If we make it dependent on that, it will effectively not be used.
  - I would have it deactivated by default initially. Users can activate it 
globally with the ExecutionConfig. We should have it activated in all tests 
to get code coverage with our test UDFs. This can be done centrally, where 
the test context environments are created.
  - We can activate it by default in the next release, once we have given this 
some testing and exposure.

Other comments:
  - I would vote to throw an exception (or at least print a warning) if you 
detect that any path in the program returns a null value.
  - The ASM dependency version needs to be set via a variable (defined in the 
root pom; interaction with shading)
  - Can you format the POM xml like the other POMs (tabs) ?


 Add static code analysis for UDFs
 -

 Key: FLINK-1319
 URL: https://issues.apache.org/jira/browse/FLINK-1319
 Project: Flink
  Issue Type: New Feature
  Components: Java API, Scala API
Reporter: Stephan Ewen
Assignee: Timo Walther
Priority: Minor

 Flink's Optimizer takes information that tells it for UDFs which fields of 
 the input elements are accessed, modified, or forwarded/copied. This 
 information frequently helps to reuse partitionings, sorts, etc. It may speed 
 up programs significantly, as it can frequently eliminate sorts and shuffles, 
 which are costly.
 Right now, users can add lightweight annotations to UDFs to provide this 
 information (such as adding {{@ConstantFields(0-3, 1, 2-1)}}).
 We worked with static code analysis of UDFs before, to determine this 
 information automatically. This is an incredible feature, as it magically 
 makes programs faster.
 For record-at-a-time operations (Map, Reduce, FlatMap, Join, Cross), this 
 works surprisingly well in many cases. We used the Soot toolkit for the 
 static code analysis. Unfortunately, Soot is LGPL licensed and thus we did 
 not include any of the code so far.
 I propose to add this functionality to Flink, in the form of a drop-in 
 addition, to work around the LGPL incompatibility with ASL 2.0. Users could 
 simply download a special flink-code-analysis.jar and drop it into the 
 lib folder to enable this functionality. We may even add a script to 
 tools that downloads that library automatically into the lib folder. This 
 should be legally fine, since we do not redistribute LGPL code and only 
 dynamically link it (the incompatibility with ASL 2.0 is mainly in the 
 patentability, if I remember correctly).
 Prior work on this has been done by [~aljoscha] and [~skunert], which could 
 provide a code base to start with.
 *Appendix*
 Homepage of the Soot static analysis toolkit: http://www.sable.mcgill.ca/soot/
 Papers on static analysis and for optimization: 
 http://stratosphere.eu/assets/papers/EnablingOperatorReorderingSCA_12.pdf and 
 http://stratosphere.eu/assets/papers/openingTheBlackBoxes_12.pdf
 Quick introduction to the Optimizer: 
 http://stratosphere.eu/assets/papers/2014-VLDBJ_Stratosphere_Overview.pdf 
 (Section 6)
 Optimizer for Iterations: 
 http://stratosphere.eu/assets/papers/spinningFastIterativeDataFlows_12.pdf 
 (Sections 4.3 and 5.3)





[jira] [Commented] (FLINK-1916) EOFException when running delta-iteration job

2015-04-21 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505338#comment-14505338
 ] 

Stephan Ewen commented on FLINK-1916:
-

Confirmed, this is a bug in the {{CompactingHashTable}} class.

[~knub] - do you have a minimal example that is able to reproduce this bug? 
Then I'll try and fix it.

 EOFException when running delta-iteration job
 -

 Key: FLINK-1916
 URL: https://issues.apache.org/jira/browse/FLINK-1916
 Project: Flink
  Issue Type: Bug
 Environment: 0.9-milestone-1
 Exception on the cluster, local execution works
Reporter: Stefan Bunk

 The delta-iteration program in [1] ends with a
 java.io.EOFException
   at 
 org.apache.flink.runtime.operators.hash.InMemoryPartition$WriteView.nextSegment(InMemoryPartition.java:333)
   at 
 org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.advance(AbstractPagedOutputView.java:140)
   at 
 org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.writeByte(AbstractPagedOutputView.java:223)
   at 
 org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.writeLong(AbstractPagedOutputView.java:291)
   at 
 org.apache.flink.runtime.memorymanager.AbstractPagedOutputView.writeDouble(AbstractPagedOutputView.java:307)
   at 
 org.apache.flink.api.common.typeutils.base.DoubleSerializer.serialize(DoubleSerializer.java:62)
   at 
 org.apache.flink.api.common.typeutils.base.DoubleSerializer.serialize(DoubleSerializer.java:26)
   at 
 org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:89)
   at 
 org.apache.flink.api.scala.typeutils.CaseClassSerializer.serialize(CaseClassSerializer.scala:29)
   at 
 org.apache.flink.runtime.operators.hash.InMemoryPartition.appendRecord(InMemoryPartition.java:219)
   at 
 org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:536)
   at 
 org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:347)
   at 
 org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:209)
   at 
 org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:270)
   at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
   at 
 org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
   at java.lang.Thread.run(Thread.java:745)
 For logs and the accompanying mailing list discussion see below.
 When running with slightly different memory configuration, as hinted on the 
 mailing list, I sometimes also get this exception:
 19.Apr. 13:39:29 INFO  Task - IterationHead(WorksetIteration 
 (Resolved-Redirects)) (10/10) switched to FAILED : 
 java.lang.IndexOutOfBoundsException: Index: 161, Size: 161
 at java.util.ArrayList.rangeCheck(ArrayList.java:635)
 at java.util.ArrayList.get(ArrayList.java:411)
 at 
 org.apache.flink.runtime.operators.hash.InMemoryPartition$WriteView.resetTo(InMemoryPartition.java:352)
 at 
 org.apache.flink.runtime.operators.hash.InMemoryPartition$WriteView.access$100(InMemoryPartition.java:301)
 at 
 org.apache.flink.runtime.operators.hash.InMemoryPartition.appendRecord(InMemoryPartition.java:226)
 at 
 org.apache.flink.runtime.operators.hash.CompactingHashTable.insertOrReplaceRecord(CompactingHashTable.java:536)
 at 
 org.apache.flink.runtime.operators.hash.CompactingHashTable.buildTableWithUniqueKey(CompactingHashTable.java:347)
 at 
 org.apache.flink.runtime.iterative.task.IterationHeadPactTask.readInitialSolutionSet(IterationHeadPactTask.java:209)
 at 
 org.apache.flink.runtime.iterative.task.IterationHeadPactTask.run(IterationHeadPactTask.java:270)
 at 
 org.apache.flink.runtime.operators.RegularPactTask.invoke(RegularPactTask.java:362)
 at 
 org.apache.flink.runtime.execution.RuntimeEnvironment.run(RuntimeEnvironment.java:217)
 at java.lang.Thread.run(Thread.java:745)
 [1] Flink job: https://gist.github.com/knub/0bfec859a563009c1d57
 [2] Job manager logs: https://gist.github.com/knub/01e3a4b0edb8cde66ff4
 [3] One task manager's logs: https://gist.github.com/knub/8f2f953da95c8d7adefc
 [4] 
 http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/EOFException-when-running-Flink-job-td1092.html





[jira] [Commented] (FLINK-1908) JobManager startup delay isn't considered when using start-cluster.sh script

2015-04-21 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505313#comment-14505313
 ] 

Stephan Ewen commented on FLINK-1908:
-

I don't think that this issue will be fixed in 0.8.x.

@DarkKnightCZ Can you verify whether 0.9 works for you?

 JobManager startup delay isn't considered when using start-cluster.sh script
 

 Key: FLINK-1908
 URL: https://issues.apache.org/jira/browse/FLINK-1908
 Project: Flink
  Issue Type: Bug
  Components: Distributed Runtime
Affects Versions: 0.9, 0.8.1
 Environment: Linux
Reporter: Lukas Raska
Priority: Minor
   Original Estimate: 5m
  Remaining Estimate: 5m

 When starting a Flink cluster via the start-cluster.sh script, the JobManager 
 startup can be delayed (as it is started asynchronously), which can result in 
 a failed startup of several TaskManagers.
 The solution is to wait a certain amount of time and periodically check 
 whether the RPC port is accessible, then proceed with starting the 
 TaskManagers.
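
The proposed check could look like the following sketch. The real change would live in start-cluster.sh, so this Java version only illustrates the polling logic, and all names in it are hypothetical:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical sketch of the proposed startup check (the actual fix would be
// in the shell scripts): poll the JobManager's RPC port until it accepts
// connections or the deadline passes, then start the TaskManagers.
public class WaitForJobManager {
    public static boolean waitForPort(String host, int port, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), 200);
                return true;           // port is accepting connections
            } catch (IOException e) {
                Thread.sleep(100);     // JobManager not up yet, retry
            }
        }
        return false;                  // gave up: report the startup failure
    }

    // Self-check: open a local listening socket and wait for it.
    public static boolean selfTest() {
        try (ServerSocket server = new ServerSocket(0)) {
            return waitForPort("localhost", server.getLocalPort(), 2000);
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(selfTest()); // true
    }
}
```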




