[GitHub] flink pull request: [FLINK-1183] Generate gentle notification mess...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/296#issuecomment-69970228 How about setting the message to something like: Warning: You are running Flink with Java 6, which is no longer maintained by Oracle or the OpenJDK community. Flink currently supports Java 6, but that support may be dropped in future versions due to the unavailability of bug fixes and security patches. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
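The warning discussed above could be produced by a simple runtime version check at startup; a minimal sketch (the class and method names are hypothetical, not Flink's actual implementation):

```java
// Hypothetical sketch of the Java 6 startup warning discussed above.
// Class and method names are illustrative, not taken from Flink's code.
public class JavaVersionWarning {

    /** Returns a warning message for Java 6 runtimes, or null otherwise. */
    static String warningFor(String javaVersion) {
        // Java 6 reports versions of the form "1.6.0_xx".
        if (javaVersion != null && javaVersion.startsWith("1.6")) {
            return "Warning: You are running Flink with Java 6, which is no longer "
                    + "maintained by Oracle or the OpenJDK community. Flink currently "
                    + "supports Java 6, but that support may be dropped in future "
                    + "versions due to the unavailability of bug fixes and security patches.";
        }
        return null;
    }

    public static void main(String[] args) {
        String warning = warningFor(System.getProperty("java.version"));
        if (warning != null) {
            System.err.println(warning); // a real implementation would use the logger
        }
    }
}
```

A check like this would run once when the JobManager starts, so users see the notice early without it affecting job execution.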
[jira] [Commented] (FLINK-1395) Add Jodatime support to Kryo
[ https://issues.apache.org/jira/browse/FLINK-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277647#comment-14277647 ] ASF GitHub Bot commented on FLINK-1395: --- Github user aljoscha commented on the pull request: https://github.com/apache/flink/pull/304#issuecomment-69988955 Ok, I looked at the existing LICENSE and NOTICE files and they don't contain any entries for Apache-licensed projects. JodaTime and the javakaffee serializers are also Apache licensed, which is why I didn't add any entries for them either. Add Jodatime support to Kryo Key: FLINK-1395 URL: https://issues.apache.org/jira/browse/FLINK-1395 Project: Flink Issue Type: Bug Reporter: Robert Metzger Assignee: Aljoscha Krettek -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (FLINK-1400) In local mode, the default TaskManager won't listen on the data port.
[ https://issues.apache.org/jira/browse/FLINK-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277522#comment-14277522 ] Stephan Ewen commented on FLINK-1400: - I think you can also locally start a job manager with multiple task managers. In 0.8, you can use the MiniCluster and set the number of task managers there. With 2 or more, you will see that the data port is actually used. In local mode, the default TaskManager won't listen on the data port. - Key: FLINK-1400 URL: https://issues.apache.org/jira/browse/FLINK-1400 Project: Flink Issue Type: Bug Affects Versions: 0.9 Environment: Ubuntu 14.04 LTS Reporter: Sergey Dudoladov Priority: Minor The TaskManager automatically started by the JobManager (JobManager.scala, appr. line 470) in local mode does not listen on the data port. To reproduce: 1) Start Flink via ./start-local.sh 2) Look up the data port number on localhost:8081 - Task Managers tab 3) sudo netstat -taupen | grep dataport_number Or start a second TaskManager and run Flink with a degree of parallelism of 2 (assuming one slot per TaskManager): 4) ./flink run -p 2 ... TaskManagers started via ./taskmanager.sh work fine.
[GitHub] flink pull request: FLINK-1402 - Remove Serializable extends from ...
Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/306#issuecomment-69967270 Ah ok, thanks for the info Stephan, good to know it was intentional. Do you want to keep this pattern?
[jira] [Commented] (FLINK-1395) Add Jodatime support to Kryo
[ https://issues.apache.org/jira/browse/FLINK-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277644#comment-14277644 ] ASF GitHub Bot commented on FLINK-1395: --- Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/304#discussion_r22968592 --- Diff: flink-java/pom.xml --- @@ -64,6 +64,18 @@ under the License. <version>0.5.1</version> </dependency> + <dependency> --- End diff -- This is pulling in some unneeded dependencies (https://github.com/magro/kryo-serializers/blob/master/pom.xml), for example cglib and org.apache.wicket. Add Jodatime support to Kryo Key: FLINK-1395 URL: https://issues.apache.org/jira/browse/FLINK-1395 Project: Flink Issue Type: Bug Reporter: Robert Metzger Assignee: Aljoscha Krettek
[GitHub] flink pull request: [FLINK-1395] Add support for JodaTime in KryoS...
Github user rmetzger commented on a diff in the pull request: https://github.com/apache/flink/pull/304#discussion_r22968592 --- Diff: flink-java/pom.xml --- @@ -64,6 +64,18 @@ under the License. <version>0.5.1</version> </dependency> + <dependency> --- End diff -- This is pulling in some unneeded dependencies (https://github.com/magro/kryo-serializers/blob/master/pom.xml), for example cglib and org.apache.wicket.
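One way to address the transitive-dependency concern above is a Maven exclusion on the kryo-serializers dependency. A hedged sketch only: the version number and the exact exclusion coordinates below are assumptions for illustration, not taken from the actual pull request.

```xml
<!-- Illustrative sketch: version and exclusion coordinates are assumptions,
     not taken from the actual pull request. -->
<dependency>
	<groupId>de.javakaffee</groupId>
	<artifactId>kryo-serializers</artifactId>
	<version>0.27</version>
	<exclusions>
		<!-- drop transitive dependencies that Flink does not need -->
		<exclusion>
			<groupId>cglib</groupId>
			<artifactId>cglib-nodep</artifactId>
		</exclusion>
		<exclusion>
			<groupId>org.apache.wicket</groupId>
			<artifactId>wicket-core</artifactId>
		</exclusion>
	</exclusions>
</dependency>
```

Whether exclusions suffice depends on whether the kryo-serializers code paths Flink uses actually reference those libraries at runtime.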
[jira] [Commented] (FLINK-1398) A new DataSet function: extractElementFromTuple
[ https://issues.apache.org/jira/browse/FLINK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277526#comment-14277526 ] ASF GitHub Bot commented on FLINK-1398: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/308#issuecomment-69976328 +1 for some utilities. I'm not sure, however, where to put it. Should we add another Maven module? Make it part of the current flink-java? Or start it as a GitHub repo outside of the main project? A new DataSet function: extractElementFromTuple --- Key: FLINK-1398 URL: https://issues.apache.org/jira/browse/FLINK-1398 Project: Flink Issue Type: Wish Reporter: Felix Neutatz Assignee: Felix Neutatz Priority: Minor This is the use case: {code:java} DataSet<Tuple2<Integer, Double>> data = env.fromElements(new Tuple2<Integer, Double>(1, 2.0)); data.map(new ElementFromTuple()); public static final class ElementFromTuple implements MapFunction<Tuple2<Integer, Double>, Double> { @Override public Double map(Tuple2<Integer, Double> value) { return value.f1; } } {code} It would be awesome if we had something like this: {code:java} data.extractElement(1); {code} This means that we implement a function for DataSet which extracts a certain element from a given Tuple.
[jira] [Commented] (FLINK-1395) Add Jodatime support to Kryo
[ https://issues.apache.org/jira/browse/FLINK-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277635#comment-14277635 ] ASF GitHub Bot commented on FLINK-1395: --- Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/304#discussion_r22968315 --- Diff: flink-tests/src/test/scala/org/apache/flink/api/scala/runtime/KryoGenericTypeSerializerTest.scala --- @@ -125,6 +133,7 @@ class KryoGenericTypeSerializerTest { def runTests[T : ClassTag](objects: Seq[T]): Unit = { val clsTag = classTag[T] val typeInfo = new GenericTypeInfo[T](clsTag.runtimeClass.asInstanceOf[Class[T]]) + println("TPE: " + typeInfo) --- End diff -- Yes, my bad. Add Jodatime support to Kryo Key: FLINK-1395 URL: https://issues.apache.org/jira/browse/FLINK-1395 Project: Flink Issue Type: Bug Reporter: Robert Metzger Assignee: Aljoscha Krettek
[jira] [Commented] (FLINK-1183) Generate gentle notification message when Flink is started with Java 6
[ https://issues.apache.org/jira/browse/FLINK-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277455#comment-14277455 ] ASF GitHub Bot commented on FLINK-1183: --- Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/296#issuecomment-69971212 +1 gentler and informative =) Generate gentle notification message when Flink is started with Java 6 -- Key: FLINK-1183 URL: https://issues.apache.org/jira/browse/FLINK-1183 Project: Flink Issue Type: Improvement Reporter: Henry Saputra Priority: Minor With Java 6 reaching EOL, we would like to let Flink's users know that it is recommended to move to Java 7 or higher. This could be done as a logging message when the Flink JobManager is starting. This will allow us to deprecate support for Java 6 in the future by providing early notification to users.
[GitHub] flink pull request: [FLINK-1183] Generate gentle notification mess...
Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/296#issuecomment-69971212 +1 gentler and informative =)
[GitHub] flink pull request: [FLINK-1398] Introduce extractSingleField() in...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/308#issuecomment-69976328 +1 for some utilities. I'm not sure, however, where to put it. Should we add another Maven module? Make it part of the current flink-java? Or start it as a GitHub repo outside of the main project?
[jira] [Commented] (FLINK-1395) Add Jodatime support to Kryo
[ https://issues.apache.org/jira/browse/FLINK-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277634#comment-14277634 ] ASF GitHub Bot commented on FLINK-1395: --- Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/304#discussion_r22968293 --- Diff: flink-core/src/test/java/org/apache/flink/api/common/typeutils/SerializerTestBase.java --- @@ -99,6 +104,7 @@ public void testCopy() { for (T datum : testData) { T copy = serializer.copy(datum); + String str = copy.toString(); --- End diff -- Will change. Add Jodatime support to Kryo Key: FLINK-1395 URL: https://issues.apache.org/jira/browse/FLINK-1395 Project: Flink Issue Type: Bug Reporter: Robert Metzger Assignee: Aljoscha Krettek
[GitHub] flink pull request: [FLINK-1395] Add support for JodaTime in KryoS...
Github user aljoscha commented on a diff in the pull request: https://github.com/apache/flink/pull/304#discussion_r22968293 --- Diff: flink-core/src/test/java/org/apache/flink/api/common/typeutils/SerializerTestBase.java --- @@ -99,6 +104,7 @@ public void testCopy() { for (T datum : testData) { T copy = serializer.copy(datum); + String str = copy.toString(); --- End diff -- Will change.
[GitHub] flink pull request: FLINK-1402 - Remove Serializable extends from ...
Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/306#issuecomment-69992202 I don't remember if there is any best practice about this, so if we think it is useful we could keep this style and maybe document it? But I don't think it is good practice for other interfaces.
[jira] [Commented] (FLINK-1381) Only one output splitter supported per operator
[ https://issues.apache.org/jira/browse/FLINK-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277061#comment-14277061 ] Gyula Fora commented on FLINK-1381: --- Hey, So the problem is that the splitting is tied to an operator (for which the output will be split), while a DataStream can represent streams from multiple operators if they are merged. Actually, once we can define multiple output selectors for one operator, we can move the split method to the DataStream, which will apply the splitting to each operator that it corresponds to. Only one output splitter supported per operator --- Key: FLINK-1381 URL: https://issues.apache.org/jira/browse/FLINK-1381 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Gyula Fora Priority: Minor Currently the streaming API only supports output splitting once per operator. The splitting logic should be reworked to allow any number of splits.
[jira] [Created] (FLINK-1404) Trigger recycling of buffers held by historic intermediate result partitions
Ufuk Celebi created FLINK-1404: -- Summary: Trigger recycling of buffers held by historic intermediate result partitions Key: FLINK-1404 URL: https://issues.apache.org/jira/browse/FLINK-1404 Project: Flink Issue Type: Improvement Components: Distributed Runtime Reporter: Ufuk Celebi Assignee: Ufuk Celebi With blocking intermediate results (FLINK-1350) and proper partition state management (FLINK-1359) it is necessary to allow the network buffer pool to request eviction of historic intermediate results when not enough buffers are available. With the currently available pipelined intermediate partitions this is not an issue, because buffer pools can be released as soon as a partition is consumed. We need to be able to trigger the recycling of buffers held by historic intermediate results when not enough buffers are available for new local pools.
[jira] [Commented] (FLINK-1403) Streaming api doesn't support output file named with file:// prefix
[ https://issues.apache.org/jira/browse/FLINK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277073#comment-14277073 ] Márton Balassi commented on FLINK-1403: --- Hey, Thanks for posting the issue. We have just implemented it, pushing it soon. Streaming api doesn't support output file named with file:// prefix - Key: FLINK-1403 URL: https://issues.apache.org/jira/browse/FLINK-1403 Project: Flink Issue Type: Bug Reporter: Mingliang Qi Priority: Minor
[jira] [Commented] (FLINK-1381) Only one output splitter supported per operator
[ https://issues.apache.org/jira/browse/FLINK-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14276984#comment-14276984 ] Mingliang Qi commented on FLINK-1381: - Is there any problem with moving the split function from {{SingleOutputStreamOperator}} into {{DataStream}}? Only one output splitter supported per operator --- Key: FLINK-1381 URL: https://issues.apache.org/jira/browse/FLINK-1381 Project: Flink Issue Type: Bug Components: Streaming Affects Versions: 0.9 Reporter: Gyula Fora Priority: Minor Currently the streaming API only supports output splitting once per operator. The splitting logic should be reworked to allow any number of splits.
[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv
[ https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278333#comment-14278333 ] ASF GitHub Bot commented on FLINK-1388: --- GitHub user msdevanms opened a pull request: https://github.com/apache/flink/pull/312 https://issues.apache.org/jira/browse/FLINK-1388 You can merge this pull request into a Git repository by running: $ git pull https://github.com/apache/flink release-0.8 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/312.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #312 commit a33ad5d8303295e57f0e6a8df3c071f010963029 Author: mbalassi mbala...@apache.org Date: 2014-12-16T11:48:09Z [streaming] DataStream print functionality update PrintSinkFunction now explicitly states threads in output Added printToErr functionality commit e34aca7545f4900725b470ff1ab2db4b48c2275f Author: mbalassi mbala...@apache.org Date: 2014-12-16T20:00:31Z [streaming] [examples] Refactor and packaging for windowing examples The current examples show-case the API, more meaningful examples are coming for the 0.9 release. commit c8e306b6567a4b615c54e8d8f88116ff6f1a0e38 Author: Gyula Fora gyf...@apache.org Date: 2014-12-16T21:49:58Z [streaming] Updated deprecated iterative functionality and docs commit 1ac9651abff2760a2a66c6cce0e3aa2e1bf5d1dd Author: twalthr i...@twalthr.com Date: 2014-12-10T21:02:07Z Fix invalid type hierarchy creation by Pojo logic commit a835e5dfb97624f3132761c7933aaffb03b0d06f Author: Robert Metzger rmetz...@apache.org Date: 2014-12-16T10:30:52Z [FLINK-610] Replace Avro by Kryo as the GenericType serializer The performance of data-intensive jobs using Kryo is probably going to be slow.
Set correct classloader try to use Kryo.copy() with fallback to serialization copy commit 7ef04c625768515c874f3b015cf30f6631c4dade Author: Robert Metzger metzg...@web.de Date: 2014-12-16T21:00:50Z [FLINK-1333] Fixed getter/setter recognition for POJOs This closes #271 commit 02bad15318da525f6db938a41cd10c7203156314 Author: mbalassi mbala...@apache.org Date: 2014-12-17T15:46:01Z [FLINK-1325] [streaming] Added clousure cleaning to streaming This closes #273 commit 6f481ce785b9d4a9824b9d7c82e18342bbeaf897 Author: Till Rohrmann trohrm...@apache.org Date: 2014-12-18T10:37:23Z Fixes race condition in ExecutionGraph which allowed a job to go into the finished state without all job vertices having properly processed the finalizeOnMaster method. commit 15fb1da9907cc549ddb94f191fba11618f546854 Author: Stephan Ewen se...@apache.org Date: 2014-12-18T14:28:54Z [FLINK-1336] [core] Fix bug in StringValue binary copy method commit 88f38e49202354926cb4ec36390cbe34bad247a3 Author: Gyula Fora gyf...@apache.org Date: 2014-12-17T22:34:26Z [streaming] StreamInvokable rework for simpler logic and easier use commit 2ac49856ac8a3aa42912b1e74a7109792c5b93aa Author: mbalassi mbala...@apache.org Date: 2014-12-17T23:28:07Z [dist] Updated the assembly of the examples subdirectory Excluded the scala example jars Excluded the example source code subdirectories This closes #274 commit 446cc1253554f924664d7fe753f3cee46ee87c13 Author: Gyula Fora gyf...@apache.org Date: 2014-12-18T13:52:15Z [streaming] Added immutability for window and filter operators commit cadc9cce0bbfa2615ed746a6b0372f21e042a945 Author: Jonas Traub (powibol) j...@s-traub.com Date: 2014-12-18T15:11:16Z [streaming] Make windowed data stream aware of time based trigger/eviction in tumbling window situations. 
[streaming] Changed TimeEvictionPolicy to keep timestamps in the buffer instead of data-items commit 9555c827f8b354722f6865d531f3fccd400c0015 Author: mbalassi mbala...@apache.org Date: 2014-12-19T22:34:02Z [FLINK-1338] Updates necessary due to Apache graduation Removed Disclaimer file Eliminated unnecessary incubating substrings Bumped version to 0.8-SNAPSHOT commit 6b3c3a1780b1270226a76f6a81a9ad2029578a7a Author: mbalassi mbala...@apache.org Date: 2014-12-26T17:02:55Z [FLINK-1225] Fix for quickstart packaging This closes #279 commit 2467f36c80830e83b43271c89cf1ec827882b424 Author: mbalassi mbala...@apache.org Date: 2014-12-26T17:06:51Z [streaming] Temporal fix for streaming source parallelism commit b2271bd9eb3adc1770f40d452ed7fb69614ea649 Author: Gyula Fora gyf...@apache.org Date: 2015-01-02T17:33:46Z [streaming] Time trigger preNotify fix commit 6b1fd156a774a8292dd2e9227d611dcca5b9c526 Author: Gyula Fora gyf...@apache.org Date: 2014-12-11T14:22:03Z
[jira] [Commented] (FLINK-1183) Generate gentle notification message when Flink is started with Java 6
[ https://issues.apache.org/jira/browse/FLINK-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278338#comment-14278338 ] ASF GitHub Bot commented on FLINK-1183: --- Github user ajaybhat commented on the pull request: https://github.com/apache/flink/pull/296#issuecomment-70046292 Thanks for the comments. Do you think the new message is better? Generate gentle notification message when Flink is started with Java 6 -- Key: FLINK-1183 URL: https://issues.apache.org/jira/browse/FLINK-1183 Project: Flink Issue Type: Improvement Reporter: Henry Saputra Priority: Minor With Java 6 reaching EOL, we would like to let Flink's users know that it is recommended to move to Java 7 or higher. This could be done as a logging message when the Flink JobManager is starting. This will allow us to deprecate support for Java 6 in the future by providing early notification to users.
[jira] [Commented] (FLINK-1398) A new DataSet function: extractElementFromTuple
[ https://issues.apache.org/jira/browse/FLINK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277105#comment-14277105 ] ASF GitHub Bot commented on FLINK-1398: --- GitHub user FelixNeutatz opened a pull request: https://github.com/apache/flink/pull/308 [FLINK-1398] Introduce extractSingleField() in DataSet This is a prototype of how we could implement extractSingleField() for DataSet. Let's discuss :) You can merge this pull request into a Git repository by running: $ git pull https://github.com/FelixNeutatz/incubator-flink ExtractSingleField-FLINK1398 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/308.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #308 commit c3162413b2f6979595393f20d347e6e2057620fa Author: FelixNeutatz neut...@googlemail.com Date: 2015-01-14T15:50:37Z [FLINK-1398] Introduce extractSingleField() in DataSet A new DataSet function: extractElementFromTuple --- Key: FLINK-1398 URL: https://issues.apache.org/jira/browse/FLINK-1398 Project: Flink Issue Type: Wish Reporter: Felix Neutatz Priority: Minor This is the use case: {code:java} DataSet<Tuple2<Integer, Double>> data = env.fromElements(new Tuple2<Integer, Double>(1, 2.0)); data.map(new ElementFromTuple()); public static final class ElementFromTuple implements MapFunction<Tuple2<Integer, Double>, Double> { @Override public Double map(Tuple2<Integer, Double> value) { return value.f1; } } {code} It would be awesome if we had something like this: {code:java} data.extractElement(1); {code} This means that we implement a function for DataSet which extracts a certain element from a given Tuple.
[GitHub] flink pull request: [FLINK-1398] Introduce extractSingleField() in...
GitHub user FelixNeutatz opened a pull request: https://github.com/apache/flink/pull/308 [FLINK-1398] Introduce extractSingleField() in DataSet This is a prototype of how we could implement extractSingleField() for DataSet. Let's discuss :) You can merge this pull request into a Git repository by running: $ git pull https://github.com/FelixNeutatz/incubator-flink ExtractSingleField-FLINK1398 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/308.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #308 commit c3162413b2f6979595393f20d347e6e2057620fa Author: FelixNeutatz neut...@googlemail.com Date: 2015-01-14T15:50:37Z [FLINK-1398] Introduce extractSingleField() in DataSet
[jira] [Commented] (FLINK-1398) A new DataSet function: extractElementFromTuple
[ https://issues.apache.org/jira/browse/FLINK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277141#comment-14277141 ] ASF GitHub Bot commented on FLINK-1398: --- Github user zentol commented on the pull request: https://github.com/apache/flink/pull/308#issuecomment-69940866 Why did you make a new operator instead of implementing it as a simple map function? A new DataSet function: extractElementFromTuple --- Key: FLINK-1398 URL: https://issues.apache.org/jira/browse/FLINK-1398 Project: Flink Issue Type: Wish Reporter: Felix Neutatz Priority: Minor This is the use case: {code:java} DataSet<Tuple2<Integer, Double>> data = env.fromElements(new Tuple2<Integer, Double>(1, 2.0)); data.map(new ElementFromTuple()); public static final class ElementFromTuple implements MapFunction<Tuple2<Integer, Double>, Double> { @Override public Double map(Tuple2<Integer, Double> value) { return value.f1; } } {code} It would be awesome if we had something like this: {code:java} data.extractElement(1); {code} This means that we implement a function for DataSet which extracts a certain element from a given Tuple.
[jira] [Commented] (FLINK-1376) SubSlots are not properly released in case that a TaskManager fatally fails, leaving the system in a corrupted state
[ https://issues.apache.org/jira/browse/FLINK-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277304#comment-14277304 ] ASF GitHub Bot commented on FLINK-1376: --- Github user tillrohrmann commented on the pull request: https://github.com/apache/flink/pull/300#issuecomment-69958460 I found the error and fixed it. I'll close this PR and open a new one rebased on the latest 0.8 release candidate. SubSlots are not properly released in case that a TaskManager fatally fails, leaving the system in a corrupted state Key: FLINK-1376 URL: https://issues.apache.org/jira/browse/FLINK-1376 Project: Flink Issue Type: Bug Reporter: Till Rohrmann In case that the TaskManager fatally fails and some of the failing node's slots are SharedSlots, the slots are not properly released by the JobManager. This causes the corresponding job not to be properly failed, leaving the system in a corrupted state. The reason for that is that the AllocatedSlot is not aware of being treated as a SharedSlot and thus it cannot release the associated SubSlots.
[GitHub] flink pull request: Add [FLINK-1376] Add proper shared slot releas...
Github user tillrohrmann closed the pull request at: https://github.com/apache/flink/pull/300
[GitHub] flink pull request: [FLINK-1395] Add support for JodaTime in KryoS...
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/304#discussion_r22955775 --- Diff: flink-core/src/test/java/org/apache/flink/api/common/typeutils/SerializerTestBase.java --- @@ -99,6 +104,7 @@ public void testCopy() { for (T datum : testData) { T copy = serializer.copy(datum); + String str = copy.toString(); --- End diff -- This causes a lot of warnings for me. Can we change that to simply `copy.toString()` without declaring an unused variable to hold the result?
[jira] [Commented] (FLINK-1398) A new DataSet function: extractElementFromTuple
[ https://issues.apache.org/jira/browse/FLINK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14277226#comment-14277226 ] Fabian Hueske commented on FLINK-1398: -- I am not sure how useful / how much needed such an operator is. Designing an API includes finding the right trade-off between conciseness and providing built-in operators. Extracting an element can be done using a trivial MapFunction, in Scala or Java 8 even a lambda function. So this is just syntactic sugar for convenience. For that we would pay with two additional methods (one with an Integer index for tuples and another one with a field expression String for Pojo and tuple types) in an API which is already quite loaded, IMO. My feeling is that the gain is not enough for extending the API, but I am open to other arguments ;-) A new DataSet function: extractElementFromTuple --- Key: FLINK-1398 URL: https://issues.apache.org/jira/browse/FLINK-1398 Project: Flink Issue Type: Wish Reporter: Felix Neutatz Priority: Minor This is the use case: {code:java} DataSet<Tuple2<Integer, Double>> data = env.fromElements(new Tuple2<Integer, Double>(1, 2.0)); data.map(new ElementFromTuple()); public static final class ElementFromTuple implements MapFunction<Tuple2<Integer, Double>, Double> { @Override public Double map(Tuple2<Integer, Double> value) { return value.f1; } } {code} It would be awesome if we had something like this: {code:java} data.extractElement(1); {code} This means that we implement a function for DataSet which extracts a certain element from a given Tuple.
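The point that the proposed extractElement() is just sugar over a trivial map can be sketched in plain Java; note the Tuple2 and MapFunction classes below are simplified stand-ins for Flink's types, not the real API:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class ExtractSketch {

    /** Simplified stand-in for Flink's Tuple2. */
    static class Tuple2<A, B> {
        final A f0; final B f1;
        Tuple2(A f0, B f1) { this.f0 = f0; this.f1 = f1; }
    }

    /** Simplified stand-in for Flink's MapFunction interface. */
    interface MapFunction<T, R> { R map(T value); }

    public static void main(String[] args) {
        List<Tuple2<Integer, Double>> data =
                Arrays.asList(new Tuple2<>(1, 2.0), new Tuple2<>(3, 4.0));

        // The proposed data.extractElement(1) is equivalent to this trivial map,
        // which in Java 8 is just a lambda:
        MapFunction<Tuple2<Integer, Double>, Double> extractF1 = t -> t.f1;

        List<Double> extracted = data.stream()
                .map(extractF1::map)
                .collect(Collectors.toList());

        System.out.println(extracted); // prints [2.0, 4.0]
    }
}
```

Since the lambda form is a one-liner, the question in the thread is essentially whether saving it justifies two extra API methods.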
[GitHub] flink pull request: [FLINK-1395] Add support for JodaTime in KryoS...
Github user StephanEwen commented on a diff in the pull request: https://github.com/apache/flink/pull/304#discussion_r22956572 --- Diff: flink-tests/src/test/scala/org/apache/flink/api/scala/runtime/KryoGenericTypeSerializerTest.scala --- @@ -125,6 +133,7 @@ class KryoGenericTypeSerializerTest { def runTests[T : ClassTag](objects: Seq[T]): Unit = { val clsTag = classTag[T] val typeInfo = new GenericTypeInfo[T](clsTag.runtimeClass.asInstanceOf[Class[T]]) + println("TPE: " + typeInfo) --- End diff -- Is this still a debugging artifact?
[jira] [Assigned] (FLINK-1398) A new DataSet function: extractElementFromTuple
[ https://issues.apache.org/jira/browse/FLINK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Neutatz reassigned FLINK-1398: Assignee: Felix Neutatz A new DataSet function: extractElementFromTuple --- Key: FLINK-1398 URL: https://issues.apache.org/jira/browse/FLINK-1398 Project: Flink Issue Type: Wish Reporter: Felix Neutatz Assignee: Felix Neutatz Priority: Minor This is the use case: {code:java} DataSet<Tuple2<Integer, Double>> data = env.fromElements(new Tuple2<Integer, Double>(1, 2.0)); data.map(new ElementFromTuple()); public static final class ElementFromTuple implements MapFunction<Tuple2<Integer, Double>, Double> { @Override public Double map(Tuple2<Integer, Double> value) { return value.f1; } } {code} It would be awesome if we had something like this: {code:java} data.extractElement(1); {code} This means that we implement a function for DataSet which extracts a certain element from a given Tuple.
[jira] [Commented] (FLINK-1395) Add Jodatime support to Kryo
[ https://issues.apache.org/jira/browse/FLINK-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277379#comment-14277379 ] ASF GitHub Bot commented on FLINK-1395: --- Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/304#issuecomment-69964529 Good work. Here are some comments/questions: - jodatime has many classes beyond `DateTime`, such as `LocalDate`. Should we register them all? There are many, so it may be an idea to have something like a common serializer util that registers them for you. - We definitely need to list jodatime and the kaffee serializers in the LICENSE and NOTICE files of the binary distribution. Add Jodatime support to Kryo Key: FLINK-1395 URL: https://issues.apache.org/jira/browse/FLINK-1395 Project: Flink Issue Type: Bug Reporter: Robert Metzger Assignee: Aljoscha Krettek
[GitHub] flink pull request: [FLINK-1376] [runtime] Add proper shared slot ...
GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/309 [FLINK-1376] [runtime] Add proper shared slot release in case of a fatal TaskManager failure This PR introduces SharedSlots as a special Slot type, so that they are released properly in case an Instance has been marked dead. This fixes the problem that a dead instance that has not been shut down properly prevents a job from being removed from the system, because the system is not aware of the SubSlots. Adds test cases where only the task manager is killed by a Kill message (hard shutdown). @StephanEwen: Requires thorough review because it touches some delicate scheduling/slot logic. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink fixSharedSlotReleaseAkka Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/309.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #309 commit ba1dd8b2ce956eb1b14a0ca458a3ca5240da0aee Author: Till Rohrmann trohrm...@apache.org Date: 2015-01-12T09:58:45Z [FLINK-1376] [runtime] Add proper shared slot release in case of a fatal TaskManager failure. Fixes a concurrent modification exception of SharedSlot's subSlots field by synchronizing all state-changing operations through the associated assignment group. Fixes a deadlock where Instance.markDead, which first acquires the instance lock and then (by releasing the associated slots) the assignment group lock, can block against a direct releaseSlot call on a SharedSlot, which first acquires the assignment group lock and then the instance lock in order to return the slot to the instance.
[jira] [Commented] (FLINK-1376) SubSlots are not properly released in case that a TaskManager fatally fails, leaving the system in a corrupted state
[ https://issues.apache.org/jira/browse/FLINK-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277385#comment-14277385 ] ASF GitHub Bot commented on FLINK-1376: --- GitHub user tillrohrmann opened a pull request: https://github.com/apache/flink/pull/309 [FLINK-1376] [runtime] Add proper shared slot release in case of a fatal TaskManager failure This PR introduces SharedSlots as a special Slot type, so that they are released properly in case an Instance has been marked dead. This fixes the problem that a dead instance that has not been shut down properly prevents a job from being removed from the system, because the system is not aware of the SubSlots. Adds test cases where only the task manager is killed by a Kill message (hard shutdown). @StephanEwen: Requires thorough review because it touches some delicate scheduling/slot logic. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tillrohrmann/flink fixSharedSlotReleaseAkka Alternatively you can review and apply these changes as the patch at: https://github.com/apache/flink/pull/309.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #309 commit ba1dd8b2ce956eb1b14a0ca458a3ca5240da0aee Author: Till Rohrmann trohrm...@apache.org Date: 2015-01-12T09:58:45Z [FLINK-1376] [runtime] Add proper shared slot release in case of a fatal TaskManager failure. Fixes a concurrent modification exception of SharedSlot's subSlots field by synchronizing all state-changing operations through the associated assignment group. Fixes a deadlock where Instance.markDead, which first acquires the instance lock and then (by releasing the associated slots) the assignment group lock, can block against a direct releaseSlot call on a SharedSlot, which first acquires the assignment group lock and then the instance lock in order to return the slot to the instance. 
SubSlots are not properly released in case that a TaskManager fatally fails, leaving the system in a corrupted state Key: FLINK-1376 URL: https://issues.apache.org/jira/browse/FLINK-1376 Project: Flink Issue Type: Bug Reporter: Till Rohrmann In case that the TaskManager fatally fails and some of the failing node's slots are SharedSlots, the slots are not properly released by the JobManager. As a consequence, the corresponding job is not properly failed, leaving the system in a corrupted state. The reason is that the AllocatedSlot is not aware of being treated as a SharedSlot and thus it cannot release the associated SubSlots.
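The deadlock described in the commit message above is the classic inconsistent lock-ordering problem: one path takes the instance lock and then the assignment group lock, while the other path takes them in reverse. A minimal, Flink-independent sketch of the fix is to acquire both locks in one fixed order on every path; all names here are illustrative stand-ins, not Flink's actual classes:

```java
// Lock-ordering sketch: both paths take instanceLock first, then
// assignmentGroupLock, so the cyclic wait described in the issue
// cannot occur. Illustrative only; not Flink's Instance/SharedSlot code.
public class LockOrderingSketch {
    private final Object instanceLock = new Object();
    private final Object assignmentGroupLock = new Object();
    private int released = 0;

    public void markDead() {
        synchronized (instanceLock) {
            synchronized (assignmentGroupLock) {
                released++; // stand-in for releasing all sub-slots
            }
        }
    }

    public void releaseSlot() {
        // Same order as markDead - NOT assignment group lock first.
        synchronized (instanceLock) {
            synchronized (assignmentGroupLock) {
                released++; // stand-in for returning the slot to the instance
            }
        }
    }

    public int releasedCount() { return released; }

    public static void main(String[] args) throws InterruptedException {
        LockOrderingSketch s = new LockOrderingSketch();
        Thread a = new Thread(s::markDead);
        Thread b = new Thread(s::releaseSlot);
        a.start(); b.start();
        a.join(); b.join(); // with reversed lock order, this could hang
        System.out.println(s.releasedCount()); // prints 2
    }
}
```

The PR itself takes a different route, funneling state changes through the assignment group instead of nesting the locks, but the invariant it restores is the same: no two paths may acquire the two locks in opposite orders.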
[jira] [Commented] (FLINK-1402) Remove extra extend Serializable in InputFormat interface
[ https://issues.apache.org/jira/browse/FLINK-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277415#comment-14277415 ] ASF GitHub Bot commented on FLINK-1402: --- Github user hsaputra commented on the pull request: https://github.com/apache/flink/pull/306#issuecomment-69967270 Ah ok, thanks for the info Stephan, good to know it was intentional. Do you want to keep this pattern? Remove extra extend Serializable in InputFormat interface - Key: FLINK-1402 URL: https://issues.apache.org/jira/browse/FLINK-1402 Project: Flink Issue Type: Bug Reporter: Henry Saputra Assignee: Henry Saputra Priority: Minor The org.apache.flink.api.common.io.InputFormat is currently defined as: public interface InputFormat<OT, T extends InputSplit> extends InputSplitSource<T>, Serializable However, InputSplitSource already extends Serializable: public interface InputSplitSource<T extends InputSplit> extends Serializable so there is no need for InputFormat to explicitly extend the Serializable interface.
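The redundancy can be demonstrated with a plain-JDK sketch: an interface that extends an already-Serializable parent is itself Serializable without restating the bound. The names below (`SplitSource`, `Format`) are illustrative stand-ins for Flink's `InputSplitSource` and `InputFormat`, not the real interfaces:

```java
import java.io.Serializable;

public class SerializableInheritanceSketch {

    // Stand-in for InputSplitSource<T>: already extends Serializable.
    public interface SplitSource<T> extends Serializable {
        T nextSplit();
    }

    // Stand-in for InputFormat: adding an explicit ", Serializable" here
    // would be redundant, because SplitSource already carries it.
    public interface Format<T> extends SplitSource<T> {
    }

    public static void main(String[] args) {
        // Serializable is inherited transitively through SplitSource.
        System.out.println(Serializable.class.isAssignableFrom(Format.class)); // prints true
    }
}
```

Dropping the extra `Serializable` bound is purely cosmetic: the erased type hierarchy, and therefore binary compatibility, is unchanged.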
[GitHub] flink pull request: Remove dup code from RemoteExecutor#executePlan...
Github user StephanEwen commented on the pull request: https://github.com/apache/flink/pull/307#issuecomment-69967981 This change looks good to be merged...
[jira] [Commented] (FLINK-1389) Allow setting custom file extensions for files created by the FileOutputFormat
[ https://issues.apache.org/jira/browse/FLINK-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276888#comment-14276888 ] ASF GitHub Bot commented on FLINK-1389: --- Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/301#issuecomment-69913057 Thank you for the feedback. I'll rework the pull request soon. Allow setting custom file extensions for files created by the FileOutputFormat -- Key: FLINK-1389 URL: https://issues.apache.org/jira/browse/FLINK-1389 Project: Flink Issue Type: New Feature Reporter: Robert Metzger Assignee: Robert Metzger Priority: Minor A user requested the ability to name avro files with the avro extension.
[jira] [Commented] (FLINK-1399) Add support for registering Serializers with Kryo
[ https://issues.apache.org/jira/browse/FLINK-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276744#comment-14276744 ] ASF GitHub Bot commented on FLINK-1399: --- Github user uce commented on the pull request: https://github.com/apache/flink/pull/305#issuecomment-69896334 I agree with @rmetzger, but I can see why @aljoscha likes the first suggestion more ;) Add support for registering Serializers with Kryo - Key: FLINK-1399 URL: https://issues.apache.org/jira/browse/FLINK-1399 Project: Flink Issue Type: Improvement Reporter: Aljoscha Krettek Assignee: Aljoscha Krettek
[jira] [Commented] (FLINK-1400) In local mode, the default TaskManager won't listen on the data port.
[ https://issues.apache.org/jira/browse/FLINK-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276731#comment-14276731 ] Aljoscha Krettek commented on FLINK-1400: - You can, with a hack. You start a cluster with one local TaskManager using start-cluster.sh. Then you manually delete the pid file in the tmp directory and start an additional TaskManager using taskmanager.sh. On my machine I can use this to start a local cluster with 4 separate TaskManagers:
{code}
flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/bin/start-cluster.sh
rm /tmp/flink-aljoscha-taskmanager.pid
flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/bin/taskmanager.sh start
rm /tmp/flink-aljoscha-taskmanager.pid
flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/bin/taskmanager.sh start
rm /tmp/flink-aljoscha-taskmanager.pid
flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/bin/taskmanager.sh start
{code}
In local mode, the default TaskManager won't listen on the data port. - Key: FLINK-1400 URL: https://issues.apache.org/jira/browse/FLINK-1400 Project: Flink Issue Type: Bug Affects Versions: 0.9 Environment: Ubuntu 14.04 LTS Reporter: Sergey Dudoladov Priority: Minor The Task Manager automatically started by the Job Manager (JobManager.scala, appr. line 470) in the local mode does not listen on the dataport. To reproduce: 1) Start Flink via ./start-local.sh 2) Look up the data port number on localhost:8081 - Task Managers tab 3) sudo netstat -taupen | grep dataport_number Or start a second Task Manager and run Flink with a degree of parallelism of 2 (assuming one slot per Task Manager) 4) ./flink run -p 2 ... Task Managers started via ./taskmanager.sh work fine.
[GitHub] flink pull request: [FLINK-1399] Add support for registering Seria...
Github user rmetzger commented on the pull request: https://github.com/apache/flink/pull/305#issuecomment-69896090 The second option (env) is probably better because people will see the method when their IDE suggests method names.
[jira] [Commented] (FLINK-1320) Add an off-heap variant of the managed memory
[ https://issues.apache.org/jira/browse/FLINK-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276911#comment-14276911 ] ASF GitHub Bot commented on FLINK-1320: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/290#issuecomment-69917146 I made some final changes that were required for the current master. Also, I ran all integration tests with direct memory allocation enabled. Any objections to merging this pull request? Add an off-heap variant of the managed memory - Key: FLINK-1320 URL: https://issues.apache.org/jira/browse/FLINK-1320 Project: Flink Issue Type: Improvement Components: Local Runtime Reporter: Stephan Ewen Priority: Minor For (nearly) all memory that Flink accumulates (in the form of sort buffers, hash tables, caching), we use a special way of representing data serialized across a set of memory pages. The big work lies in the way the algorithms are implemented to operate on pages, rather than on objects. The core class for the memory is the {{MemorySegment}}, which has all methods to set and get primitive values efficiently. It is a somewhat simpler (and faster) variant of a HeapByteBuffer. As such, it should be straightforward to create a version where the memory segment is not backed by a heap byte[], but by memory allocated outside the JVM, in a similar way as the NIO DirectByteBuffers or the Netty direct buffers do it. This may have multiple advantages: - We reduce the size of the JVM heap (garbage collected) and the number and size of long-lived objects. For large JVM sizes, this may improve performance quite a bit. Ultimately, we would in many cases reduce the JVM size to 1/3 to 1/2 and keep the remaining memory outside the JVM. 
- We save copies when we move memory pages to disk (spilling) or through the network (shuffling / broadcasting / forward piping). The changes required to implement this are: - Add an {{UnmanagedMemorySegment}} that only stores the memory address as a long, and the segment size. It is initialized from a DirectByteBuffer. - Allow the MemoryManager to allocate these MemorySegments, instead of the current ones. - Make sure that the startup scripts pick up the mode and configure the heap size and the max direct memory properly. Since the MemorySegment is probably the most performance-critical class in Flink, we must take care that we do this right. The following are critical considerations: - If we want both solutions (heap and off-heap) to exist side-by-side (configurable), we must make the base MemorySegment abstract and implement two versions (heap and off-heap). - To get the best performance, we need to make sure that only one class gets loaded (or at least ever used), to ensure optimal JIT de-virtualization and inlining. - We should carefully measure the performance of both variants. From previous micro benchmarks, I remember that individual byte accesses in DirectByteBuffers (off-heap) were slightly slower than on-heap; any larger accesses were equally good or slightly better.
[jira] [Commented] (FLINK-1320) Add an off-heap variant of the managed memory
[ https://issues.apache.org/jira/browse/FLINK-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276923#comment-14276923 ] ASF GitHub Bot commented on FLINK-1320: --- Github user mxm commented on the pull request: https://github.com/apache/flink/pull/290#issuecomment-69920591 @uce Yes, it will be copied from off-heap to heap first. There are also other places like the `EventSerializer` where this is the case. I guess it depends on the amount of data that is being copied. If you want to operate on a byte array, then you have to copy it into the JVM heap. @uce I ran the integration tests manually. We could add the random testing in a separate pull request. Add an off-heap variant of the managed memory - Key: FLINK-1320 URL: https://issues.apache.org/jira/browse/FLINK-1320 Project: Flink Issue Type: Improvement Components: Local Runtime Reporter: Stephan Ewen Priority: Minor For (nearly) all memory that Flink accumulates (in the form of sort buffers, hash tables, caching), we use a special way of representing data serialized across a set of memory pages. The big work lies in the way the algorithms are implemented to operate on pages, rather than on objects. The core class for the memory is the {{MemorySegment}}, which has all methods to set and get primitive values efficiently. It is a somewhat simpler (and faster) variant of a HeapByteBuffer. As such, it should be straightforward to create a version where the memory segment is not backed by a heap byte[], but by memory allocated outside the JVM, in a similar way as the NIO DirectByteBuffers or the Netty direct buffers do it. This may have multiple advantages: - We reduce the size of the JVM heap (garbage collected) and the number and size of long-lived objects. For large JVM sizes, this may improve performance quite a bit. Ultimately, we would in many cases reduce the JVM size to 1/3 to 1/2 and keep the remaining memory outside the JVM. 
- We save copies when we move memory pages to disk (spilling) or through the network (shuffling / broadcasting / forward piping). The changes required to implement this are: - Add an {{UnmanagedMemorySegment}} that only stores the memory address as a long, and the segment size. It is initialized from a DirectByteBuffer. - Allow the MemoryManager to allocate these MemorySegments, instead of the current ones. - Make sure that the startup scripts pick up the mode and configure the heap size and the max direct memory properly. Since the MemorySegment is probably the most performance-critical class in Flink, we must take care that we do this right. The following are critical considerations: - If we want both solutions (heap and off-heap) to exist side-by-side (configurable), we must make the base MemorySegment abstract and implement two versions (heap and off-heap). - To get the best performance, we need to make sure that only one class gets loaded (or at least ever used), to ensure optimal JIT de-virtualization and inlining. - We should carefully measure the performance of both variants. From previous micro benchmarks, I remember that individual byte accesses in DirectByteBuffers (off-heap) were slightly slower than on-heap; any larger accesses were equally good or slightly better.
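The heap/off-heap distinction discussed in the issue can be tried out with the plain NIO buffers it references: `ByteBuffer.allocate` is backed by a `byte[]` on the garbage-collected heap, while `allocateDirect` allocates outside it, which is the mechanism a direct-memory variant of `MemorySegment` would build on. A small sketch (not Flink's actual `MemorySegment`):

```java
import java.nio.ByteBuffer;

public class OffHeapBufferSketch {
    public static void main(String[] args) {
        // Heap buffer: backed by a byte[] inside the JVM heap.
        ByteBuffer heap = ByteBuffer.allocate(16);
        // Direct buffer: memory allocated outside the JVM heap,
        // analogous to the proposed off-heap memory segment.
        ByteBuffer direct = ByteBuffer.allocateDirect(16);

        heap.putLong(0, 42L);
        direct.putLong(0, 42L);

        System.out.println(heap.isDirect());    // prints false
        System.out.println(direct.isDirect());  // prints true
        System.out.println(direct.getLong(0));  // prints 42

        // Only the heap buffer exposes a backing array; calling
        // direct.array() would throw UnsupportedOperationException.
        System.out.println(heap.hasArray());    // prints true
        System.out.println(direct.hasArray());  // prints false
    }
}
```

The missing backing array is exactly the copy issue raised in the comment above: any code that needs a `byte[]` view of a direct buffer must first copy the bytes back into the heap.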