[jira] [Commented] (FLINK-785) Add Chained operators for AllReduce and AllGroupReduce
[ https://issues.apache.org/jira/browse/FLINK-785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311258#comment-14311258 ] ASF GitHub Bot commented on FLINK-785: -- Github user zentol commented on the pull request: https://github.com/apache/flink/pull/370#issuecomment-73404704 there's something funky going on with the tests here. i got 2 failing tests in ObjectReuseITCase: ``` ObjectReuseITCase>JavaProgramTestBase.testJobWithoutObjectReuse:168->postSubmit:68->TestBaseUtils.compareResultsByLinesInMemory:223->TestBaseUtils.compareResultsByLinesInMemory:238 arrays first differed at element [0]; expected:<a,[10]0> but was:<a,[6]0> ObjectReuseITCase>JavaProgramTestBase.testJobWithObjectReuse:120->postSubmit:68->TestBaseUtils.compareResultsByLinesInMemory:223->TestBaseUtils.compareResultsByLinesInMemory:238 arrays first differed at element [0]; expected:<a,[10]0> but was:<a,[6]0> ``` These two tests verify the wrong behaviour that occurs when object reuse is enabled but not accounted for. i thought this was generally treated as undefined behaviour, why are there tests for that? the other 2 tests fail with NullPointerException when accessing the expected result. ``` ClosureCleanerITCase.after:52->TestBaseUtils.compareResultsByLinesInMemory:223->TestBaseUtils.compareResultsByLinesInMemory:234 » NullPointer ClosureCleanerITCase.after:52->TestBaseUtils.compareResultsByLinesInMemory:223->TestBaseUtils.compareResultsByLinesInMemory:234 » NullPointer ``` i can't figure out why this occurs. Add Chained operators for AllReduce and AllGroupReduce -- Key: FLINK-785 URL: https://issues.apache.org/jira/browse/FLINK-785 Project: Flink Issue Type: Improvement Reporter: GitHub Import Labels: github-import Fix For: pre-apache Because the operators `AllReduce` and `AllGroupReduce` are used both for the pre-reduce (combiner side) and the final reduce, they would greatly benefit from a chained version. 
Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/785 Created by: [StephanEwen|https://github.com/StephanEwen] Labels: runtime, Milestone: Release 0.6 (unplanned) Created at: Sun May 11 17:41:12 CEST 2014 State: open -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] flink pull request: [FLINK-785] Chained AllReduce
Github user zentol commented on the pull request: https://github.com/apache/flink/pull/370#issuecomment-73404704 there's something funky going on with the tests here. i got 2 failing tests in ObjectReuseITCase: ``` ObjectReuseITCase>JavaProgramTestBase.testJobWithoutObjectReuse:168->postSubmit:68->TestBaseUtils.compareResultsByLinesInMemory:223->TestBaseUtils.compareResultsByLinesInMemory:238 arrays first differed at element [0]; expected:<a,[10]0> but was:<a,[6]0> ObjectReuseITCase>JavaProgramTestBase.testJobWithObjectReuse:120->postSubmit:68->TestBaseUtils.compareResultsByLinesInMemory:223->TestBaseUtils.compareResultsByLinesInMemory:238 arrays first differed at element [0]; expected:<a,[10]0> but was:<a,[6]0> ``` These two tests verify the wrong behaviour that occurs when object reuse is enabled but not accounted for. i thought this was generally treated as undefined behaviour, why are there tests for that? the other 2 tests fail with NullPointerException when accessing the expected result. ``` ClosureCleanerITCase.after:52->TestBaseUtils.compareResultsByLinesInMemory:223->TestBaseUtils.compareResultsByLinesInMemory:234 » NullPointer ClosureCleanerITCase.after:52->TestBaseUtils.compareResultsByLinesInMemory:223->TestBaseUtils.compareResultsByLinesInMemory:234 » NullPointer ``` i can't figure out why this occurs. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] flink pull request: [FLINK-1179] Add button to JobManager web inte...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/374#discussion_r24298303 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -349,6 +349,11 @@ Actor with ActorLogMessages with ActorLogging { case Heartbeat(instanceID) => instanceManager.reportHeartBeat(instanceID) +case RequestStackTrace(instanceID) => + val taskManager = instanceManager.getRegisteredInstanceById(instanceID).getTaskManager + val result = AkkaUtils.ask[StackTrace](taskManager, SendStackTrace) --- End diff -- This is a blocking call within the actor thread. We should avoid this. You can simply forward the ```SendStackTrace``` message to the respective TaskManager: ```taskManager forward SendStackTrace```
[jira] [Commented] (FLINK-1179) Add button to JobManager web interface to request stack trace of a TaskManager
[ https://issues.apache.org/jira/browse/FLINK-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311301#comment-14311301 ] ASF GitHub Bot commented on FLINK-1179: --- Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/374#discussion_r24298303 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/jobmanager/JobManager.scala --- @@ -349,6 +349,11 @@ Actor with ActorLogMessages with ActorLogging { case Heartbeat(instanceID) => instanceManager.reportHeartBeat(instanceID) +case RequestStackTrace(instanceID) => + val taskManager = instanceManager.getRegisteredInstanceById(instanceID).getTaskManager + val result = AkkaUtils.ask[StackTrace](taskManager, SendStackTrace) --- End diff -- This is a blocking call within the actor thread. We should avoid this. You can simply forward the ```SendStackTrace``` message to the respective TaskManager: ```taskManager forward SendStackTrace``` Add button to JobManager web interface to request stack trace of a TaskManager -- Key: FLINK-1179 URL: https://issues.apache.org/jira/browse/FLINK-1179 Project: Flink Issue Type: New Feature Components: JobManager Reporter: Robert Metzger Assignee: Chiwan Park Priority: Minor Labels: starter This is something I do quite often manually and I think it might be helpful for users as well.
[GitHub] flink pull request: [FLINK-1179] Add button to JobManager web inte...
Github user chiwanpark commented on the pull request: https://github.com/apache/flink/pull/374#issuecomment-73410413 @tillrohrmann Thanks for your advice. I will fix it!
[jira] [Commented] (FLINK-1486) Add a string to the print method to identify output
[ https://issues.apache.org/jira/browse/FLINK-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311318#comment-14311318 ] ASF GitHub Bot commented on FLINK-1486: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/372#issuecomment-73410628 Good idea. So we would print `$taskId $outputValue` if the user did not supply a string and `$string:$taskId $outputValue` otherwise. If the parallelization degree is 1, we would just print `$string $outputValue` if a string was supplied. Add a string to the print method to identify output --- Key: FLINK-1486 URL: https://issues.apache.org/jira/browse/FLINK-1486 Project: Flink Issue Type: Improvement Components: Local Runtime Reporter: Max Michels Assignee: Max Michels Priority: Minor Labels: usability The output of the {{print}} method of {{DataSet}} is mainly used for debug purposes. Currently, it is difficult to identify the output. I would suggest to add another {{print(String str)}} method which allows the user to supply a String to identify the output. This could be a prefix before the actual output or a format string (which might be overkill). {code} DataSet<Integer> data = env.fromElements(1, 2, 3, 4, 5); {code} For example, {{data.print("MyDataSet: ")}} would print {noformat} MyDataSet: 1 MyDataSet: 2 ... {noformat}
[GitHub] flink pull request: [FLINK-1179] Add button to JobManager web inte...
Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/374#issuecomment-73409662 Hi Chiwan, thanks for your work. It looks really good. I just had some minor remarks.
[jira] [Commented] (FLINK-1179) Add button to JobManager web interface to request stack trace of a TaskManager
[ https://issues.apache.org/jira/browse/FLINK-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311304#comment-14311304 ] ASF GitHub Bot commented on FLINK-1179: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/374#issuecomment-73409662 Hi Chiwan, thanks for your work. It looks really good. I just had some minor remarks. Add button to JobManager web interface to request stack trace of a TaskManager -- Key: FLINK-1179 URL: https://issues.apache.org/jira/browse/FLINK-1179 Project: Flink Issue Type: New Feature Components: JobManager Reporter: Robert Metzger Assignee: Chiwan Park Priority: Minor Labels: starter This is something I do quite often manually and I think it might be helpful for users as well.
[GitHub] flink pull request: [FLINK-1179] Add button to JobManager web inte...
Github user tillrohrmann commented on a diff in the pull request: https://github.com/apache/flink/pull/374#discussion_r24298309 --- Diff: flink-runtime/src/main/scala/org/apache/flink/runtime/taskmanager/TaskManager.scala --- @@ -300,6 +300,16 @@ import scala.collection.JavaConverters._ case LogMemoryUsage => logMemoryStats() +case SendStackTrace => + val traces = Thread.getAllStackTraces.asScala + val stackTraceStr = traces.map((trace: (Thread, Array[StackTraceElement])) => { +val (thread, elements) = trace +val traceStr = elements.map((trace: StackTraceElement) => trace.toString).mkString("\n") --- End diff -- I think that you don't need the map operation here. ```elements.mkString("\n")``` should do the same.
[GitHub] flink pull request: [FLINK-1486] add print method for prefixing a ...
Github user mxm commented on the pull request: https://github.com/apache/flink/pull/372#issuecomment-73410628 Good idea. So we would print `$taskId $outputValue` if the user did not supply a string and `$string:$taskId $outputValue` otherwise. If the parallelization degree is 1, we would just print `$string $outputValue` if a string was supplied.
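The prefixing scheme agreed on above can be captured as a tiny helper. This is a hypothetical sketch of the formatting rule only (the class, method, and parameter names are ours, not Flink's):

```java
// Hypothetical sketch of the discussed print-prefix scheme; not Flink code.
// - no prefix supplied:        "$taskId $outputValue"
// - prefix, parallelism > 1:   "$string:$taskId $outputValue"
// - prefix, parallelism == 1:  "$string $outputValue"
class PrintPrefixFormat {
    static String format(String prefix, int taskId, int parallelism, String value) {
        if (prefix == null) {
            return taskId + " " + value;
        }
        if (parallelism == 1) {
            return prefix + " " + value;
        }
        return prefix + ":" + taskId + " " + value;
    }
}
```

The parallelism-1 special case avoids printing a task id that carries no information when there is only one task.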
[jira] [Commented] (FLINK-1493) Support for streaming jobs preserving global ordering of records
[ https://issues.apache.org/jira/browse/FLINK-1493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311465#comment-14311465 ] Matthias J. Sax commented on FLINK-1493: Hi, I had a look into this. From my point of view, the best way to implement it is to provide a MutableOrderedRecordReader in addition to the MutableRecordReader. The new reader buffers up all received StreamRecords in separate buffers (one for each InputChannel). The channel information can easily be provided by the AbstractRecordReader. InputHandler can instantiate one or the other depending on the configuration (i.e., whether ordering is required or not). Pros: This design avoids any deadlocks. Cons: The needed memory is consumed from the heap and each StreamRecord is eagerly deserialized. An implementation using MemorySegments (or a BufferPool) could be added later on (limiting memory usage, including a naive load-shedding approach, and allowing a lazy deserialization strategy). Please give some feedback. Two more questions about the usage of generics: - Why is the ReaderIterator created with no generic type arguments in InputHandler.createInputIterator()? - Why does StreamRecord not implement IOReadableWritable (or require its member streamObject to do so)? Support for streaming jobs preserving global ordering of records Key: FLINK-1493 URL: https://issues.apache.org/jira/browse/FLINK-1493 Project: Flink Issue Type: New Feature Components: Streaming Reporter: Márton Balassi Distributed streaming jobs do not give total, global ordering guarantees for records; only partial ordering is provided by the system: records travelling on the exact same route of the physical plan are ordered, but records on different routes are not. It turns out that this feature is still desired for a number of applications, although it can only be implemented via merge sorting the input buffers on a timestamp field, thus creating substantial latency. 
Just a heads up for the implementation: the sorting introduces back pressure in the buffers and might cause deadlocks.
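The buffering approach described above amounts to a k-way merge: each input channel delivers records in timestamp order, so a global order falls out of repeatedly emitting the smallest pending timestamp across channels. A minimal sketch of that merge step (our own illustration, not Flink code; records reduced to bare timestamps):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Deque;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative k-way merge over per-channel buffers. Assumes each channel's
// queue is already timestamp-ordered, which is what partial ordering guarantees.
class OrderedMerge {
    static List<Long> merge(List<Deque<Long>> channels) {
        List<Long> out = new ArrayList<>();
        // Heap entries are {timestamp, channelIndex}, smallest timestamp first.
        PriorityQueue<long[]> heap =
            new PriorityQueue<>(Comparator.comparingLong((long[] e) -> e[0]));
        for (int i = 0; i < channels.size(); i++) {
            if (!channels.get(i).isEmpty()) {
                heap.add(new long[] { channels.get(i).poll(), i });
            }
        }
        while (!heap.isEmpty()) {
            long[] head = heap.poll();
            out.add(head[0]);
            // Refill the heap from the channel the emitted record came from.
            Deque<Long> channel = channels.get((int) head[1]);
            if (!channel.isEmpty()) {
                heap.add(new long[] { channel.poll(), (int) head[1] });
            }
        }
        return out;
    }
}
```

Note that this sketch runs over finite buffers; in a live stream the merge must wait until every channel has a pending record before emitting, which is exactly where the back-pressure and deadlock caveat above comes in.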
[GitHub] flink pull request: [FLINK-1486] add print method for prefixing a ...
Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/372#issuecomment-73417227 Sounds good to me!
[jira] [Commented] (FLINK-1486) Add a string to the print method to identify output
[ https://issues.apache.org/jira/browse/FLINK-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311383#comment-14311383 ] ASF GitHub Bot commented on FLINK-1486: --- Github user fhueske commented on the pull request: https://github.com/apache/flink/pull/372#issuecomment-73417227 Sounds good to me! Add a string to the print method to identify output --- Key: FLINK-1486 URL: https://issues.apache.org/jira/browse/FLINK-1486 Project: Flink Issue Type: Improvement Components: Local Runtime Reporter: Max Michels Assignee: Max Michels Priority: Minor Labels: usability The output of the {{print}} method of {{DataSet}} is mainly used for debug purposes. Currently, it is difficult to identify the output. I would suggest to add another {{print(String str)}} method which allows the user to supply a String to identify the output. This could be a prefix before the actual output or a format string (which might be overkill). {code} DataSet<Integer> data = env.fromElements(1, 2, 3, 4, 5); {code} For example, {{data.print("MyDataSet: ")}} would print {noformat} MyDataSet: 1 MyDataSet: 2 ... {noformat}
[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv
[ https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311342#comment-14311342 ] Timo Walther commented on FLINK-1388: - Hey Adnan, for testing purposes you can create a PojoTypeInfo instance by using TypeExtractor.createTypeInfo(MyPojo.class). The PojoTypeInfo will later be supplied by the preceding operator. POJO support for writeAsCsv --- Key: FLINK-1388 URL: https://issues.apache.org/jira/browse/FLINK-1388 Project: Flink Issue Type: New Feature Components: Java API Reporter: Timo Walther Assignee: Adnan Khan Priority: Minor It would be great if one could simply write out POJOs in CSV format. {code} public class MyPojo { String a; int b; } {code} to: {code} # CSV file of org.apache.flink.MyPojo: String a, int b Hello World, 42 Hello World 2, 47 ... {code}
[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv
[ https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311329#comment-14311329 ] Adnan Khan commented on FLINK-1388: --- Hey Fabian, I've been digging through those two classes you mentioned, but I'm still not clear on how to use the {{PojoTypeInfo}}. So given a custom POJO, how do I create a {{PojoTypeInfo}} instance? Specifically because it takes a {{List<PojoField>}} as a constructor parameter, as seen here: {{public PojoTypeInfo(Class<T> typeClass, List<PojoField> fields)}}. However, looking at the example output in the ticket description above, I can generate a CSV string using Java's reflect library. I feel I've missed something. POJO support for writeAsCsv --- Key: FLINK-1388 URL: https://issues.apache.org/jira/browse/FLINK-1388 Project: Flink Issue Type: New Feature Components: Java API Reporter: Timo Walther Assignee: Adnan Khan Priority: Minor It would be great if one could simply write out POJOs in CSV format. {code} public class MyPojo { String a; int b; } {code} to: {code} # CSV file of org.apache.flink.MyPojo: String a, int b Hello World, 42 Hello World 2, 47 ... {code}
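For a quick feel of the reflection route mentioned above, here is a minimal sketch (purely illustrative; the class and method names are ours, and Flink's actual writeAsCsv support would go through PojoTypeInfo rather than raw reflection):

```java
import java.lang.reflect.Field;
import java.util.StringJoiner;

// Purely illustrative: derive a CSV header line and data rows from a POJO's
// declared fields via java.lang.reflect, mimicking the ticket's example output.
class MyPojo {
    public String a;
    public int b;
    MyPojo(String a, int b) { this.a = a; this.b = b; }
}

class PojoCsvSketch {
    // For MyPojo above: a header such as "# CSV file of MyPojo: String a, int b"
    // (exact field order is JVM-dependent, see caveat below).
    static String header(Class<?> clazz) {
        StringJoiner fields = new StringJoiner(", ");
        for (Field f : clazz.getDeclaredFields()) {
            fields.add(f.getType().getSimpleName() + " " + f.getName());
        }
        return "# CSV file of " + clazz.getName() + ": " + fields;
    }

    // One comma-separated row of field values for a single POJO instance.
    static String row(Object pojo) {
        StringJoiner values = new StringJoiner(", ");
        for (Field f : pojo.getClass().getDeclaredFields()) {
            f.setAccessible(true);
            try {
                values.add(String.valueOf(f.get(pojo)));
            } catch (IllegalAccessException e) {
                throw new RuntimeException(e);
            }
        }
        return values.toString();
    }
}
```

One caveat of the raw-reflection route: getDeclaredFields() makes no guarantee about field order, which is one reason going through PojoTypeInfo (with its explicit List of PojoField) is the more robust path.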
[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons (...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/335#issuecomment-73434636 @rmetzger does this mean I need to do the history filtering magic again and open a new pr? @andralungu thanks a lot! @balidani have you submitted yours?
[jira] [Commented] (FLINK-1201) Graph API for Flink
[ https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311581#comment-14311581 ] ASF GitHub Bot commented on FLINK-1201: --- Github user vasia commented on the pull request: https://github.com/apache/flink/pull/335#issuecomment-73434636 @rmetzger does this mean I need to do the history filtering magic again and open a new pr? @andralungu thanks a lot! @balidani have you submitted yours? Graph API for Flink Key: FLINK-1201 URL: https://issues.apache.org/jira/browse/FLINK-1201 Project: Flink Issue Type: New Feature Reporter: Kostas Tzoumas Assignee: Vasia Kalavri This issue tracks the development of a Graph API/DSL for Flink. Until the code is pushed to the Flink repository, collaboration is happening here: https://github.com/project-flink/flink-graph
[jira] [Commented] (FLINK-1201) Graph API for Flink
[ https://issues.apache.org/jira/browse/FLINK-1201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14311583#comment-14311583 ] ASF GitHub Bot commented on FLINK-1201: --- Github user balidani commented on the pull request: https://github.com/apache/flink/pull/335#issuecomment-73434866 @vasia yes, just submitted mine Graph API for Flink Key: FLINK-1201 URL: https://issues.apache.org/jira/browse/FLINK-1201 Project: Flink Issue Type: New Feature Reporter: Kostas Tzoumas Assignee: Vasia Kalavri This issue tracks the development of a Graph API/DSL for Flink. Until the code is pushed to the Flink repository, collaboration is happening here: https://github.com/project-flink/flink-graph
[GitHub] flink pull request: [FLINK-1201] Add flink-gelly to flink-addons (...
Github user vasia commented on the pull request: https://github.com/apache/flink/pull/335#issuecomment-73434977 Great, thanks ^^
[jira] [Assigned] (FLINK-703) Use complete element as join key.
[ https://issues.apache.org/jira/browse/FLINK-703?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chiwan Park reassigned FLINK-703: - Assignee: Chiwan Park Use complete element as join key. - Key: FLINK-703 URL: https://issues.apache.org/jira/browse/FLINK-703 Project: Flink Issue Type: Improvement Reporter: GitHub Import Assignee: Chiwan Park Priority: Trivial Labels: github-import Fix For: pre-apache In some situations such as semi-joins it could make sense to use a complete element as join key. Currently this can be done using a key-selector function, but we could offer a shortcut for that. This is not an urgent issue, but might be helpful. Imported from GitHub Url: https://github.com/stratosphere/stratosphere/issues/703 Created by: [fhueske|https://github.com/fhueske] Labels: enhancement, java api, user satisfaction, Milestone: Release 0.6 (unplanned) Created at: Thu Apr 17 23:40:00 CEST 2014 State: open