[jira] [Commented] (FLINK-785) Add Chained operators for AllReduce and AllGroupReduce

2015-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311258#comment-14311258
 ] 

ASF GitHub Bot commented on FLINK-785:
--

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/370#issuecomment-73404704
  
there's something funky going on with the tests here.

i got 2 failing tests in ObjectReuseITCase:
```

ObjectReuseITCaseJavaProgramTestBase.testJobWithoutObjectReuse:168-postSubmit:68-TestBaseUtils.compareResultsByLinesInMemory:223-TestBaseUtils.compareResultsByLinesInMemory:238
 arrays first differed at element [0]; expected:a,[10]0 but was:a,[6]0


ObjectReuseITCaseJavaProgramTestBase.testJobWithObjectReuse:120-postSubmit:68-TestBaseUtils.compareResultsByLinesInMemory:223-TestBaseUtils.compareResultsByLinesInMemory:238
 arrays first differed at element [0]; expected:a,[10]0 but was:a,[6]0
```
These two tests verify the wrong behaviour that occurs when object reuse is 
enabled but not accounted for. i thought this was generally treated as 
undefined behaviour, why are there tests for that?

the other 2 tests fail with NullPointerException when accessing the 
expected result.
```

ClosureCleanerITCase.after:52-TestBaseUtils.compareResultsByLinesInMemory:223-TestBaseUtils.compareResultsByLinesInMemory:234
 » NullPointer

ClosureCleanerITCase.after:52-TestBaseUtils.compareResultsByLinesInMemory:223-TestBaseUtils.compareResultsByLinesInMemory:234
 » NullPointer
```
i can't figure out why this occurs.


 Add Chained operators for AllReduce and AllGroupReduce
 --

 Key: FLINK-785
 URL: https://issues.apache.org/jira/browse/FLINK-785
 Project: Flink
  Issue Type: Improvement
Reporter: GitHub Import
  Labels: github-import
 Fix For: pre-apache


 Because the operators `AllReduce` and `AllGroupReduce` are used both for the 
 pre-reduce (combiner side) and the final reduce, they would greatly benefit 
 from a chained version.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/785
 Created by: [StephanEwen|https://github.com/StephanEwen]
 Labels: runtime, 
 Milestone: Release 0.6 (unplanned)
 Created at: Sun May 11 17:41:12 CEST 2014
 State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-785] Chained AllReduce

2015-02-08 Thread zentol
Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/370#issuecomment-73404704
  
there's something funky going on with the tests here.

i got 2 failing tests in ObjectReuseITCase:
```

ObjectReuseITCaseJavaProgramTestBase.testJobWithoutObjectReuse:168-postSubmit:68-TestBaseUtils.compareResultsByLinesInMemory:223-TestBaseUtils.compareResultsByLinesInMemory:238
 arrays first differed at element [0]; expected:a,[10]0 but was:a,[6]0


ObjectReuseITCaseJavaProgramTestBase.testJobWithObjectReuse:120-postSubmit:68-TestBaseUtils.compareResultsByLinesInMemory:223-TestBaseUtils.compareResultsByLinesInMemory:238
 arrays first differed at element [0]; expected:a,[10]0 but was:a,[6]0
```
These two tests verify the wrong behaviour that occurs when object reuse is 
enabled but not accounted for. i thought this was generally treated as 
undefined behaviour, why are there tests for that?

the other 2 tests fail with NullPointerException when accessing the 
expected result.
```

ClosureCleanerITCase.after:52-TestBaseUtils.compareResultsByLinesInMemory:223-TestBaseUtils.compareResultsByLinesInMemory:234
 » NullPointer

ClosureCleanerITCase.after:52-TestBaseUtils.compareResultsByLinesInMemory:223-TestBaseUtils.compareResultsByLinesInMemory:234
 » NullPointer
```
i can't figure out why this occurs.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1179] Add button to JobManager web inte...

2015-02-08 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/374#discussion_r24298303
  
--- Diff: 
flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala
 ---
@@ -349,6 +349,11 @@ Actor with ActorLogMessages with ActorLogging {
 case Heartbeat(instanceID) =
   instanceManager.reportHeartBeat(instanceID)
 
+case RequestStackTrace(instanceID) =
+  val taskManager = 
instanceManager.getRegisteredInstanceById(instanceID).getTaskManager
+  val result = AkkaUtils.ask[StackTrace](taskManager, SendStackTrace)
--- End diff --

This is a blocking call within the actor thread. We should avoid this. You 
can simply forward the ```SendStackTrace``` message to the respective 
TaskManager: ```taskManager forward SendStacktrace```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1179) Add button to JobManager web interface to request stack trace of a TaskManager

2015-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311301#comment-14311301
 ] 

ASF GitHub Bot commented on FLINK-1179:
---

Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/374#discussion_r24298303
  
--- Diff: 
flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala
 ---
@@ -349,6 +349,11 @@ Actor with ActorLogMessages with ActorLogging {
 case Heartbeat(instanceID) =
   instanceManager.reportHeartBeat(instanceID)
 
+case RequestStackTrace(instanceID) =
+  val taskManager = 
instanceManager.getRegisteredInstanceById(instanceID).getTaskManager
+  val result = AkkaUtils.ask[StackTrace](taskManager, SendStackTrace)
--- End diff --

This is a blocking call within the actor thread. We should avoid this. You 
can simply forward the ```SendStackTrace``` message to the respective 
TaskManager: ```taskManager forward SendStacktrace```


 Add button to JobManager web interface to request stack trace of a TaskManager
 --

 Key: FLINK-1179
 URL: https://issues.apache.org/jira/browse/FLINK-1179
 Project: Flink
  Issue Type: New Feature
  Components: JobManager
Reporter: Robert Metzger
Assignee: Chiwan Park
Priority: Minor
  Labels: starter

 This is something I do quite often manually and I think it might be helpful 
 for users as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1179] Add button to JobManager web inte...

2015-02-08 Thread chiwanpark
Github user chiwanpark commented on the pull request:

https://github.com/apache/flink/pull/374#issuecomment-73410413
  
@tillrohrmann Thanks for your advice. I will fix it!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1486) Add a string to the print method to identify output

2015-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311318#comment-14311318
 ] 

ASF GitHub Bot commented on FLINK-1486:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/372#issuecomment-73410628
  
Good idea. So we would print `$taskId  $outputValue` if the user did not 
supply a string and `$string:$taskId  $outputValue` otherwise. If the 
parallelization degree is 1, we would just print `$string  $outputValue` if a 
string was supplied.


 Add a string to the print method to identify output
 ---

 Key: FLINK-1486
 URL: https://issues.apache.org/jira/browse/FLINK-1486
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
Reporter: Max Michels
Assignee: Max Michels
Priority: Minor
  Labels: usability

 The output of the {{print}} method of {[DataSet}} is mainly used for debug 
 purposes. Currently, it is difficult to identify the output.
 I would suggest to add another {{print(String str)}} method which allows the 
 user to supply a String to identify the output. This could be a prefix before 
 the actual output or a format string (which might be an overkill).
 {code}
 DataSet data = env.fromElements(1,2,3,4,5);
 {code}
 For example, {{data.print(MyDataSet: )}} would output print
 {noformat}
 MyDataSet: 1
 MyDataSet: 2
 ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1179] Add button to JobManager web inte...

2015-02-08 Thread tillrohrmann
Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/374#issuecomment-73409662
  
Hi Chiwan, thanks for your work. It looks really good. I had some just some 
minor remarks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1179) Add button to JobManager web interface to request stack trace of a TaskManager

2015-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311304#comment-14311304
 ] 

ASF GitHub Bot commented on FLINK-1179:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/374#issuecomment-73409662
  
Hi Chiwan, thanks for your work. It looks really good. I had some just some 
minor remarks.


 Add button to JobManager web interface to request stack trace of a TaskManager
 --

 Key: FLINK-1179
 URL: https://issues.apache.org/jira/browse/FLINK-1179
 Project: Flink
  Issue Type: New Feature
  Components: JobManager
Reporter: Robert Metzger
Assignee: Chiwan Park
Priority: Minor
  Labels: starter

 This is something I do quite often manually and I think it might be helpful 
 for users as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1179] Add button to JobManager web inte...

2015-02-08 Thread tillrohrmann
Github user tillrohrmann commented on a diff in the pull request:

https://github.com/apache/flink/pull/374#discussion_r24298309
  
--- Diff: 
flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala
 ---
@@ -300,6 +300,16 @@ import scala.collection.JavaConverters._
 case LogMemoryUsage =
   logMemoryStats()
 
+case SendStackTrace =
+  val traces = Thread.getAllStackTraces.asScala
+  val stackTraceStr = traces.map((trace: (Thread, 
Array[StackTraceElement])) = {
+val (thread, elements) = trace
+val traceStr = elements.map((trace: StackTraceElement) = 
trace.toString).mkString(\n)
--- End diff --

I think that you don't need the map operation here. 
```elements.mkString(\n)``` should do the same.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1486] add print method for prefixing a ...

2015-02-08 Thread mxm
Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/372#issuecomment-73410628
  
Good idea. So we would print `$taskId  $outputValue` if the user did not 
supply a string and `$string:$taskId  $outputValue` otherwise. If the 
parallelization degree is 1, we would just print `$string  $outputValue` if a 
string was supplied.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1493) Support for streaming jobs preserving global ordering of records

2015-02-08 Thread Matthias J. Sax (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311465#comment-14311465
 ] 

Matthias J. Sax commented on FLINK-1493:


Hi,
I had a look into this. From my point of view, the best way to implement it, is 
to provide a MutableOrderedRecordReader in addition to the MutableRecordReader. 
The new reader buffers up all received StreamRecords in seperate buffers (one 
for each InputChannel). The channel information can be provided easily from the 
AbstractRecordReader. InputHandler can instantiace one or the other depending 
on the configuration (ie, if ordering is requiered or not).

Pros:
  This design avoids any deadlocks.
Cons:
  The needed memory is consumed from the heap and each StreamRecord is eagerly 
deserialized. An implementation using MemorySegments (or a BufferPool) could be 
added later on (limiting memory usage including an naive load shedding approach 
and allowind a lazy deserialization strategy).

Pleas give some feedback.

Two more question about the usage of generics:
  - Why is the ReaderIterator created with no generics type arguments in 
InputHandler.createInputIterator()?
  - Why does StreamRecord not implement IOReadableWritable (or requieres its 
member streamObject to do so)?

 Support for streaming jobs preserving global ordering of records
 

 Key: FLINK-1493
 URL: https://issues.apache.org/jira/browse/FLINK-1493
 Project: Flink
  Issue Type: New Feature
  Components: Streaming
Reporter: Márton Balassi

 Distributed streaming jobs do not give total, global ordering guarantees for 
 records only partial ordering is provided by the system: records travelling 
 on the same exact route of the physical plan are ordered, but they aren't 
 between routes.
 It turns out that although this feature can only be implemented via merge 
 sorting in the input buffers on a timestamp field thus creating substantial 
 latency is still desired for a number of applications.
 Just a heads up for the implementation: the sorting introduces back pressure 
 in the buffers and might cause deadlocks.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1486] add print method for prefixing a ...

2015-02-08 Thread fhueske
Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/372#issuecomment-73417227
  
Sounds good to me!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1486) Add a string to the print method to identify output

2015-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311383#comment-14311383
 ] 

ASF GitHub Bot commented on FLINK-1486:
---

Github user fhueske commented on the pull request:

https://github.com/apache/flink/pull/372#issuecomment-73417227
  
Sounds good to me!


 Add a string to the print method to identify output
 ---

 Key: FLINK-1486
 URL: https://issues.apache.org/jira/browse/FLINK-1486
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
Reporter: Max Michels
Assignee: Max Michels
Priority: Minor
  Labels: usability

 The output of the {{print}} method of {[DataSet}} is mainly used for debug 
 purposes. Currently, it is difficult to identify the output.
 I would suggest to add another {{print(String str)}} method which allows the 
 user to supply a String to identify the output. This could be a prefix before 
 the actual output or a format string (which might be an overkill).
 {code}
 DataSet data = env.fromElements(1,2,3,4,5);
 {code}
 For example, {{data.print(MyDataSet: )}} would output print
 {noformat}
 MyDataSet: 1
 MyDataSet: 2
 ...
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv

2015-02-08 Thread Timo Walther (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311342#comment-14311342
 ] 

Timo Walther commented on FLINK-1388:
-

Hey Adnan,

for testing purposes you can create a PojoTypeInfo instance by using 
TypeExtractor.createTypeInfo(MyPojo.class). The PojoTypeInfo will later be 
supplied by the preceding operator.

 POJO support for writeAsCsv
 ---

 Key: FLINK-1388
 URL: https://issues.apache.org/jira/browse/FLINK-1388
 Project: Flink
  Issue Type: New Feature
  Components: Java API
Reporter: Timo Walther
Assignee: Adnan Khan
Priority: Minor

 It would be great if one could simply write out POJOs in CSV format.
 {code}
 public class MyPojo {
String a;
int b;
 }
 {code}
 to:
 {code}
 # CSV file of org.apache.flink.MyPojo: String a, int b
 Hello World, 42
 Hello World 2, 47
 ...
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv

2015-02-08 Thread Adnan Khan (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311329#comment-14311329
 ] 

Adnan Khan commented on FLINK-1388:
---

Hey Fabian, 

I've been digging through those two classes you mentioned, but I'm still not 
clear on how to use the {{PojoTypeInfo}}. So given a custom POJO, how to create 
a {{PojoTypeInfo}} instance. Specifically because it takes in a 
{{ListPojoField}} as a constructor parameter as seen here {{public 
PojoTypeInfo(ClassT typeClass, ListPojoField fields)}}

However looking at the example output in the ticket description above, I can 
generate a CSV string using Java's reflect library. I feel I've missed 
something.

 POJO support for writeAsCsv
 ---

 Key: FLINK-1388
 URL: https://issues.apache.org/jira/browse/FLINK-1388
 Project: Flink
  Issue Type: New Feature
  Components: Java API
Reporter: Timo Walther
Assignee: Adnan Khan
Priority: Minor

 It would be great if one could simply write out POJOs in CSV format.
 {code}
 public class MyPojo {
String a;
int b;
 }
 {code}
 to:
 {code}
 # CSV file of org.apache.flink.MyPojo: String a, int b
 Hello World, 42
 Hello World 2, 47
 ...
 {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons (...

2015-02-08 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/335#issuecomment-73434636
  
@rmetzger does this mean I need to do the history filtering magic again and 
open a new pr?
@andralungu thanks a lot!
@balidani have you submitted yours?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1201) Graph API for Flink

2015-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311581#comment-14311581
 ] 

ASF GitHub Bot commented on FLINK-1201:
---

Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/335#issuecomment-73434636
  
@rmetzger does this mean I need to do the history filtering magic again and 
open a new pr?
@andralungu thanks a lot!
@balidani have you submitted yours?


 Graph API for Flink 
 

 Key: FLINK-1201
 URL: https://issues.apache.org/jira/browse/FLINK-1201
 Project: Flink
  Issue Type: New Feature
Reporter: Kostas Tzoumas
Assignee: Vasia Kalavri

 This issue tracks the development of a Graph API/DSL for Flink.
 Until the code is pushed to the Flink repository, collaboration is happening 
 here: https://github.com/project-flink/flink-graph



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1201) Graph API for Flink

2015-02-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14311583#comment-14311583
 ] 

ASF GitHub Bot commented on FLINK-1201:
---

Github user balidani commented on the pull request:

https://github.com/apache/flink/pull/335#issuecomment-73434866
  
@vasia yes, just submitted mine


 Graph API for Flink 
 

 Key: FLINK-1201
 URL: https://issues.apache.org/jira/browse/FLINK-1201
 Project: Flink
  Issue Type: New Feature
Reporter: Kostas Tzoumas
Assignee: Vasia Kalavri

 This issue tracks the development of a Graph API/DSL for Flink.
 Until the code is pushed to the Flink repository, collaboration is happening 
 here: https://github.com/project-flink/flink-graph



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons (...

2015-02-08 Thread vasia
Github user vasia commented on the pull request:

https://github.com/apache/flink/pull/335#issuecomment-73434977
  
Great, thanks ^^


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (FLINK-703) Use complete element as join key.

2015-02-08 Thread Chiwan Park (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chiwan Park reassigned FLINK-703:
-

Assignee: Chiwan Park

 Use complete element as join key.
 -

 Key: FLINK-703
 URL: https://issues.apache.org/jira/browse/FLINK-703
 Project: Flink
  Issue Type: Improvement
Reporter: GitHub Import
Assignee: Chiwan Park
Priority: Trivial
  Labels: github-import
 Fix For: pre-apache


 In some situations such as semi-joins it could make sense to use a complete 
 element as join key. 
 Currently this can be done using a key-selector function, but we could offer 
 a shortcut for that.
 This is not an urgent issue, but might be helpful.
  Imported from GitHub 
 Url: https://github.com/stratosphere/stratosphere/issues/703
 Created by: [fhueske|https://github.com/fhueske]
 Labels: enhancement, java api, user satisfaction, 
 Milestone: Release 0.6 (unplanned)
 Created at: Thu Apr 17 23:40:00 CEST 2014
 State: open



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)