[GitHub] flink pull request: [FLINK-1183] Generate gentle notification mess...

2015-01-14 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/296#issuecomment-69970228
  
How about setting the message to something like

 Warning: You are running Flink with Java 6, which is not maintained any 
more by Oracle or the OpenJDK community. Flink currently supports Java 6, but 
the support for Java 6 may be stopped in future versions due to the 
unavailability of bug fixes security patched.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1395) Add Jodatime support to Kryo

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277647#comment-14277647
 ] 

ASF GitHub Bot commented on FLINK-1395:
---

Github user aljoscha commented on the pull request:

https://github.com/apache/flink/pull/304#issuecomment-69988955
  
Ok, I looked at the existing LICENSE and NOTICE files and they don't 
contain any entries for apache licences projects. jodatime and the kaffee 
serialisers are also apache licenced, that's why I didn't add any entries for 
them either.


 Add Jodatime support to Kryo
 

 Key: FLINK-1395
 URL: https://issues.apache.org/jira/browse/FLINK-1395
 Project: Flink
  Issue Type: Bug
Reporter: Robert Metzger
Assignee: Aljoscha Krettek





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1400) In local mode, the default TaskManager won't listen on the data port.

2015-01-14 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277522#comment-14277522
 ] 

Stephan Ewen commented on FLINK-1400:
-

I think you can also locally start a job manager with multiple task managers. 
In 0.8, you can use the MiniCluster and set the number of taskmanagers there. 
With 2 and more, you will see that the data port is actually used.

 In local mode, the default TaskManager won't listen on the data port.
 -

 Key: FLINK-1400
 URL: https://issues.apache.org/jira/browse/FLINK-1400
 Project: Flink
  Issue Type: Bug
Affects Versions: 0.9
 Environment: Ubuntu 14.04 LTS
Reporter: Sergey Dudoladov
Priority: Minor

  The Task Manager automatically started by the Job Manager (JobManager.scala, 
 appr. line  470)  in the local mode does not listen on the dataport. 
 To reproduce:
 1) Start Flink via ./start-local.sh
 2) Look up the data port number on locahost:8081 - Task Managers tab
 3) sudo netstat -taupen | grep dataport_number 
  
 Or  start the second Task Manager and run  Flink with the degree of 
 parallelism 2 (assuming one slot per Task Manager)
 4) ./flink run -p 2 ...
 Task Managers started via ./taskmanager.sh work fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: FLINK-1402 - Remove Serializable extends from ...

2015-01-14 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/306#issuecomment-69967270
  
Ah ok, thanks for the info Stephen, good to know it was intentional.
Do you want to keep this pattern?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1395) Add Jodatime support to Kryo

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277644#comment-14277644
 ] 

ASF GitHub Bot commented on FLINK-1395:
---

Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/304#discussion_r22968592
  
--- Diff: flink-java/pom.xml ---
@@ -64,6 +64,18 @@ under the License.
version0.5.1/version
/dependency
 
+   dependency
--- End diff --

This is pulling some unneeded dependencies: 
https://github.com/magro/kryo-serializers/blob/master/pom.xml
for example cglib,org.apache.wicket, 


 Add Jodatime support to Kryo
 

 Key: FLINK-1395
 URL: https://issues.apache.org/jira/browse/FLINK-1395
 Project: Flink
  Issue Type: Bug
Reporter: Robert Metzger
Assignee: Aljoscha Krettek





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1395] Add support for JodaTime in KryoS...

2015-01-14 Thread rmetzger
Github user rmetzger commented on a diff in the pull request:

https://github.com/apache/flink/pull/304#discussion_r22968592
  
--- Diff: flink-java/pom.xml ---
@@ -64,6 +64,18 @@ under the License.
version0.5.1/version
/dependency
 
+   dependency
--- End diff --

This is pulling some unneeded dependencies: 
https://github.com/magro/kryo-serializers/blob/master/pom.xml
for example cglib,org.apache.wicket, 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1398) A new DataSet function: extractElementFromTuple

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277526#comment-14277526
 ] 

ASF GitHub Bot commented on FLINK-1398:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/308#issuecomment-69976328
  
+1 for some utilities.
I'm not sure however where to put it.
Should we add another maven module? Make it part of the current 
flink-java ? Or start it as a github repo outside of the main project?


 A new DataSet function: extractElementFromTuple
 ---

 Key: FLINK-1398
 URL: https://issues.apache.org/jira/browse/FLINK-1398
 Project: Flink
  Issue Type: Wish
Reporter: Felix Neutatz
Assignee: Felix Neutatz
Priority: Minor

 This is the use case:
 {code:xml}
 DataSetTuple2Integer, Double data =  env.fromElements(new 
 Tuple2Integer, Double(1,2.0));
 
 data.map(new ElementFromTuple());
 
 }
 public static final class ElementFromTuple implements 
 MapFunctionTuple2Integer, Double, Double {
 @Override
 public Double map(Tuple2Integer, Double value) {
 return value.f1;
 }
 }
 {code}
 It would be awesome if we had something like this:
 {code:xml}
 data.extractElement(1);
 {code}
 This means that we implement a function for DataSet which extracts a certain 
 element from a given Tuple.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1395) Add Jodatime support to Kryo

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277635#comment-14277635
 ] 

ASF GitHub Bot commented on FLINK-1395:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/304#discussion_r22968315
  
--- Diff: 
flink-tests/src/test/scala/org/apache/flink/api/scala/runtime/KryoGenericTypeSerializerTest.scala
 ---
@@ -125,6 +133,7 @@ class KryoGenericTypeSerializerTest {
   def runTests[T : ClassTag](objects: Seq[T]): Unit ={
 val clsTag = classTag[T]
 val typeInfo = new 
GenericTypeInfo[T](clsTag.runtimeClass.asInstanceOf[Class[T]])
+println(TPE:  + typeInfo)
--- End diff --

Yes, my bad.


 Add Jodatime support to Kryo
 

 Key: FLINK-1395
 URL: https://issues.apache.org/jira/browse/FLINK-1395
 Project: Flink
  Issue Type: Bug
Reporter: Robert Metzger
Assignee: Aljoscha Krettek





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1183) Generate gentle notification message when Flink is started with Java 6

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277455#comment-14277455
 ] 

ASF GitHub Bot commented on FLINK-1183:
---

Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/296#issuecomment-69971212
  
+1 gentler and informative =)


 Generate gentle notification message when Flink is started with Java 6
 --

 Key: FLINK-1183
 URL: https://issues.apache.org/jira/browse/FLINK-1183
 Project: Flink
  Issue Type: Improvement
Reporter: Henry Saputra
Priority: Minor

 With Java 6 is reaching EOL we would like to let Flink's applications to know 
 that it is recommended to move to Jav 7 or higher.
 This could be done as logging message when Flink Job Manager is starting.
 This will allow us to deprecate the support for Java 6 in the future by 
 providing early notification to the users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1183] Generate gentle notification mess...

2015-01-14 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/296#issuecomment-69971212
  
+1 gentler and informative =)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1398] Introduce extractSingleField() in...

2015-01-14 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/308#issuecomment-69976328
  
+1 for some utilities.
I'm not sure however where to put it.
Should we add another maven module? Make it part of the current 
flink-java ? Or start it as a github repo outside of the main project?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1395) Add Jodatime support to Kryo

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277634#comment-14277634
 ] 

ASF GitHub Bot commented on FLINK-1395:
---

Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/304#discussion_r22968293
  
--- Diff: 
flink-core/src/test/java/org/apache/flink/api/common/typeutils/SerializerTestBase.java
 ---
@@ -99,6 +104,7 @@ public void testCopy() {

for (T datum : testData) {
T copy = serializer.copy(datum);
+   String str = copy.toString();
--- End diff --

Will change.



 Add Jodatime support to Kryo
 

 Key: FLINK-1395
 URL: https://issues.apache.org/jira/browse/FLINK-1395
 Project: Flink
  Issue Type: Bug
Reporter: Robert Metzger
Assignee: Aljoscha Krettek





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1395] Add support for JodaTime in KryoS...

2015-01-14 Thread aljoscha
Github user aljoscha commented on a diff in the pull request:

https://github.com/apache/flink/pull/304#discussion_r22968293
  
--- Diff: 
flink-core/src/test/java/org/apache/flink/api/common/typeutils/SerializerTestBase.java
 ---
@@ -99,6 +104,7 @@ public void testCopy() {

for (T datum : testData) {
T copy = serializer.copy(datum);
+   String str = copy.toString();
--- End diff --

Will change.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: FLINK-1402 - Remove Serializable extends from ...

2015-01-14 Thread hsaputra
Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/306#issuecomment-69992202
  
I don't remember if there any best practice about this, so If we think it 
is useful we could keep this style and maybe document it?
But I don't think it is good practice for other interfaces.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1381) Only one output splitter supported per operator

2015-01-14 Thread Gyula Fora (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277061#comment-14277061
 ] 

Gyula Fora commented on FLINK-1381:
---

Hey,

So the problem is that the splitting is tied to an operator (for which the 
output will be split) while the DataStream can represents stream from multiple 
operators if they are merged. 

Actually once we can define multiple output selectors for one operator, we can 
move the split method to the DataStream, which will apply the splitting for 
each operator that it corresponds to.

 Only one output splitter supported per operator
 ---

 Key: FLINK-1381
 URL: https://issues.apache.org/jira/browse/FLINK-1381
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Gyula Fora
Priority: Minor

 Currently the streaming api only supports output splitting once per operator. 
 The splitting logic should be reworked to allow any number of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (FLINK-1404) Trigger recycling of buffers held by historic intermediate result partitions

2015-01-14 Thread Ufuk Celebi (JIRA)
Ufuk Celebi created FLINK-1404:
--

 Summary: Trigger recycling of buffers held by historic 
intermediate result partitions
 Key: FLINK-1404
 URL: https://issues.apache.org/jira/browse/FLINK-1404
 Project: Flink
  Issue Type: Improvement
  Components: Distributed Runtime
Reporter: Ufuk Celebi
Assignee: Ufuk Celebi


With blocking intermediate results (FLINK-1350) and proper partition state 
management (FLINK-1359) it is necessary to allow the network buffer pool to 
request eviction of historic intermediate results when not enough buffers are 
available. With the currently available pipelined intermediate partitions this 
is not an issue, because buffer pools can be released as soon as a partition is 
consumed.

We need to be able to trigger the recycling of buffers held by historic 
intermediate results when not enough buffers are available for new local pools.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1403) Streaming api doesn't support output file named with file:// prefix

2015-01-14 Thread JIRA

[ 
https://issues.apache.org/jira/browse/FLINK-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277073#comment-14277073
 ] 

Márton Balassi commented on FLINK-1403:
---

Hey,

Thanks for posting the issue. We have just implemented it, pushing it soon.

 Streaming api doesn't support output file named with file:// prefix
 -

 Key: FLINK-1403
 URL: https://issues.apache.org/jira/browse/FLINK-1403
 Project: Flink
  Issue Type: Bug
Reporter: Mingliang Qi
Priority: Minor





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1381) Only one output splitter supported per operator

2015-01-14 Thread Mingliang Qi (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276984#comment-14276984
 ] 

Mingliang Qi commented on FLINK-1381:
-

Is there any problem of moving the split function from 
{{SingleOutputStreamOperator}} into {{DataStream}}?

 Only one output splitter supported per operator
 ---

 Key: FLINK-1381
 URL: https://issues.apache.org/jira/browse/FLINK-1381
 Project: Flink
  Issue Type: Bug
  Components: Streaming
Affects Versions: 0.9
Reporter: Gyula Fora
Priority: Minor

 Currently the streaming api only supports output splitting once per operator. 
 The splitting logic should be reworked to allow any number of it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1388) POJO support for writeAsCsv

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278333#comment-14278333
 ] 

ASF GitHub Bot commented on FLINK-1388:
---

GitHub user msdevanms opened a pull request:

https://github.com/apache/flink/pull/312

https://issues.apache.org/jira/browse/FLINK-1388

https://issues.apache.org/jira/browse/FLINK-1388

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/apache/flink release-0.8

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/312.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #312


commit a33ad5d8303295e57f0e6a8df3c071f010963029
Author: mbalassi mbala...@apache.org
Date:   2014-12-16T11:48:09Z

[streaming] DataStream print functionality update

PrintSinkFunction now explicitly states threads in output
Added printToErr functionality

commit e34aca7545f4900725b470ff1ab2db4b48c2275f
Author: mbalassi mbala...@apache.org
Date:   2014-12-16T20:00:31Z

[streaming] [examples] Refactor and packaging for windowing examples

The current examples show-case the API, more meaningful examples are coming 
for the 0.9 release.

commit c8e306b6567a4b615c54e8d8f88116ff6f1a0e38
Author: Gyula Fora gyf...@apache.org
Date:   2014-12-16T21:49:58Z

[streaming] Updated deprecated iterative functionality and docs

commit 1ac9651abff2760a2a66c6cce0e3aa2e1bf5d1dd
Author: twalthr i...@twalthr.com
Date:   2014-12-10T21:02:07Z

Fix invalid type hierarchy creation by Pojo logic

commit a835e5dfb97624f3132761c7933aaffb03b0d06f
Author: Robert Metzger rmetz...@apache.org
Date:   2014-12-16T10:30:52Z

[FLINK-610] Replace Avro by Kryo as the GenericType serializer

The performance of data-intensive jobs using Kryo is probably going to be 
slow.

Set correct classloader

try to use Kryo.copy() with fallback to serialization copy

commit 7ef04c625768515c874f3b015cf30f6631c4dade
Author: Robert Metzger metzg...@web.de
Date:   2014-12-16T21:00:50Z

[FLINK-1333] Fixed getter/setter recognition for POJOs

This closes #271

commit 02bad15318da525f6db938a41cd10c7203156314
Author: mbalassi mbala...@apache.org
Date:   2014-12-17T15:46:01Z

[FLINK-1325] [streaming] Added clousure cleaning to streaming

This closes #273

commit 6f481ce785b9d4a9824b9d7c82e18342bbeaf897
Author: Till Rohrmann trohrm...@apache.org
Date:   2014-12-18T10:37:23Z

Fixes race condition in ExecutionGraph which allowed a job to go into the 
finished state without all job vertices having properly processed the 
finalizeOnMaster method.

commit 15fb1da9907cc549ddb94f191fba11618f546854
Author: Stephan Ewen se...@apache.org
Date:   2014-12-18T14:28:54Z

[FLINK-1336] [core] Fix bug in StringValue binary copy method

commit 88f38e49202354926cb4ec36390cbe34bad247a3
Author: Gyula Fora gyf...@apache.org
Date:   2014-12-17T22:34:26Z

[streaming] StreamInvokable rework for simpler logic and easier use

commit 2ac49856ac8a3aa42912b1e74a7109792c5b93aa
Author: mbalassi mbala...@apache.org
Date:   2014-12-17T23:28:07Z

[dist] Updated the assembly of the examples subdirectory

Excluded the scala example jars
Excluded the example source code subdirectories

This closes #274

commit 446cc1253554f924664d7fe753f3cee46ee87c13
Author: Gyula Fora gyf...@apache.org
Date:   2014-12-18T13:52:15Z

[streaming] Added immutability for window and filter operators

commit cadc9cce0bbfa2615ed746a6b0372f21e042a945
Author: Jonas Traub (powibol) j...@s-traub.com
Date:   2014-12-18T15:11:16Z

[streaming] Make windowed data stream aware of time based trigger/eviction 
in tumbling window situations.

[streaming] Changed TimeEvictionPolicy to keep timestamps in the buffer 
instead of data-items

commit 9555c827f8b354722f6865d531f3fccd400c0015
Author: mbalassi mbala...@apache.org
Date:   2014-12-19T22:34:02Z

[FLINK-1338] Updates necessary due to Apache graduation

Removed Disclaimer file
Eliminated unnecessary incubating substrings
Bumped version to 0.8-SNAPSHOT

commit 6b3c3a1780b1270226a76f6a81a9ad2029578a7a
Author: mbalassi mbala...@apache.org
Date:   2014-12-26T17:02:55Z

[FLINK-1225] Fix for quickstart packaging

This closes #279

commit 2467f36c80830e83b43271c89cf1ec827882b424
Author: mbalassi mbala...@apache.org
Date:   2014-12-26T17:06:51Z

[streaming] Temporal fix for streaming source parallelism

commit b2271bd9eb3adc1770f40d452ed7fb69614ea649
Author: Gyula Fora gyf...@apache.org
Date:   2015-01-02T17:33:46Z

[streaming] Time trigger preNotify fix

commit 6b1fd156a774a8292dd2e9227d611dcca5b9c526
Author: Gyula Fora gyf...@apache.org
Date:   2014-12-11T14:22:03Z


[jira] [Commented] (FLINK-1183) Generate gentle notification message when Flink is started with Java 6

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278338#comment-14278338
 ] 

ASF GitHub Bot commented on FLINK-1183:
---

Github user ajaybhat commented on the pull request:

https://github.com/apache/flink/pull/296#issuecomment-70046292
  
Thanks for the comments. Do you think the new message is better?


 Generate gentle notification message when Flink is started with Java 6
 --

 Key: FLINK-1183
 URL: https://issues.apache.org/jira/browse/FLINK-1183
 Project: Flink
  Issue Type: Improvement
Reporter: Henry Saputra
Priority: Minor

 With Java 6 is reaching EOL we would like to let Flink's applications to know 
 that it is recommended to move to Jav 7 or higher.
 This could be done as logging message when Flink Job Manager is starting.
 This will allow us to deprecate the support for Java 6 in the future by 
 providing early notification to the users.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1398) A new DataSet function: extractElementFromTuple

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277105#comment-14277105
 ] 

ASF GitHub Bot commented on FLINK-1398:
---

GitHub user FelixNeutatz opened a pull request:

https://github.com/apache/flink/pull/308

[FLINK-1398] Introduce extractSingleField() in DataSet

This is a prototype how we could implement extractSingleField() for 
DataSet. Let's discuss :)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/FelixNeutatz/incubator-flink 
ExtractSingleField-FLINK1398

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #308


commit c3162413b2f6979595393f20d347e6e2057620fa
Author: FelixNeutatz neut...@googlemail.com
Date:   2015-01-14T15:50:37Z

[FLINK-1398] Introduce extractSingleField() in DataSet




 A new DataSet function: extractElementFromTuple
 ---

 Key: FLINK-1398
 URL: https://issues.apache.org/jira/browse/FLINK-1398
 Project: Flink
  Issue Type: Wish
Reporter: Felix Neutatz
Priority: Minor

 This is the use case:
 {code:xml}
 DataSetTuple2Integer, Double data =  env.fromElements(new 
 Tuple2Integer, Double(1,2.0));
 
 data.map(new ElementFromTuple());
 
 }
 public static final class ElementFromTuple implements 
 MapFunctionTuple2Integer, Double, Double {
 @Override
 public Double map(Tuple2Integer, Double value) {
 return value.f1;
 }
 }
 {code}
 It would be awesome if we had something like this:
 {code:xml}
 data.extractElement(1);
 {code}
 This means that we implement a function for DataSet which extracts a certain 
 element from a given Tuple.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1398] Introduce extractSingleField() in...

2015-01-14 Thread FelixNeutatz
GitHub user FelixNeutatz opened a pull request:

https://github.com/apache/flink/pull/308

[FLINK-1398] Introduce extractSingleField() in DataSet

This is a prototype how we could implement extractSingleField() for 
DataSet. Let's discuss :)

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/FelixNeutatz/incubator-flink 
ExtractSingleField-FLINK1398

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/308.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #308


commit c3162413b2f6979595393f20d347e6e2057620fa
Author: FelixNeutatz neut...@googlemail.com
Date:   2015-01-14T15:50:37Z

[FLINK-1398] Introduce extractSingleField() in DataSet




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1398) A new DataSet function: extractElementFromTuple

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277141#comment-14277141
 ] 

ASF GitHub Bot commented on FLINK-1398:
---

Github user zentol commented on the pull request:

https://github.com/apache/flink/pull/308#issuecomment-69940866
  
Why did you make a new operator instead of implementing it as a simple map 
function?


 A new DataSet function: extractElementFromTuple
 ---

 Key: FLINK-1398
 URL: https://issues.apache.org/jira/browse/FLINK-1398
 Project: Flink
  Issue Type: Wish
Reporter: Felix Neutatz
Priority: Minor

 This is the use case:
 {code:xml}
 DataSetTuple2Integer, Double data =  env.fromElements(new 
 Tuple2Integer, Double(1,2.0));
 
 data.map(new ElementFromTuple());
 
 }
 public static final class ElementFromTuple implements 
 MapFunctionTuple2Integer, Double, Double {
 @Override
 public Double map(Tuple2Integer, Double value) {
 return value.f1;
 }
 }
 {code}
 It would be awesome if we had something like this:
 {code:xml}
 data.extractElement(1);
 {code}
 This means that we implement a function for DataSet which extracts a certain 
 element from a given Tuple.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1376) SubSlots are not properly released in case that a TaskManager fatally fails, leaving the system in a corrupted state

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277304#comment-14277304
 ] 

ASF GitHub Bot commented on FLINK-1376:
---

Github user tillrohrmann commented on the pull request:

https://github.com/apache/flink/pull/300#issuecomment-69958460
  
I found the error and fixed it. I'll close this PR and open a new one 
rebased on the latest 0.8 release candidate.


 SubSlots are not properly released in case that a TaskManager fatally fails, 
 leaving the system in a corrupted state
 

 Key: FLINK-1376
 URL: https://issues.apache.org/jira/browse/FLINK-1376
 Project: Flink
  Issue Type: Bug
Reporter: Till Rohrmann

 In case that the TaskManager fatally fails and some of the failing node's 
 slots are SharedSlots, then the slots are not properly released by the 
 JobManager. This causes that the corresponding job will not be properly 
 failed, leaving the system in a corrupted state.
 The reason for that is that the AllocatedSlot is not aware of being treated 
 as a SharedSlot and thus he cannot release the associated SubSlots.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Add [FLINK-1376] Add proper shared slot releas...

2015-01-14 Thread tillrohrmann
Github user tillrohrmann closed the pull request at:

https://github.com/apache/flink/pull/300


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] flink pull request: [FLINK-1395] Add support for JodaTime in KryoS...

2015-01-14 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/304#discussion_r22955775
  
--- Diff: 
flink-core/src/test/java/org/apache/flink/api/common/typeutils/SerializerTestBase.java
 ---
@@ -99,6 +104,7 @@ public void testCopy() {

for (T datum : testData) {
T copy = serializer.copy(datum);
+   String str = copy.toString();
--- End diff --

This causes a lot of warnings for me. Can we change that to simply 
`copy.toString()` without declaring an unused variable to hold the result?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1398) A new DataSet function: extractElementFromTuple

2015-01-14 Thread Fabian Hueske (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277226#comment-14277226
 ] 

Fabian Hueske commented on FLINK-1398:
--

I am not sure how useful / how much needed such an operator is. 
Designing an API includes finding the right trade-off of conciseness and 
providing build-in operators.

Extracting an element can be done using a trivial MapFunction, in Scala or 
Java8 even a lambda function. So this is just syntactic sugar for convenience. 
For that we would pay with two additional methods (one with an Integer index 
for tuples and another one with a field expression String for Pojo and tuple 
types) in the API which is already quite loaded, IMO.

My feeling is, that the gain is not enough for extending the API, but I am open 
for other arguments ;-)

 A new DataSet function: extractElementFromTuple
 ---

 Key: FLINK-1398
 URL: https://issues.apache.org/jira/browse/FLINK-1398
 Project: Flink
  Issue Type: Wish
Reporter: Felix Neutatz
Priority: Minor

 This is the use case:
 {code:xml}
 DataSetTuple2Integer, Double data =  env.fromElements(new 
 Tuple2Integer, Double(1,2.0));
 
 data.map(new ElementFromTuple());
 
 }
 public static final class ElementFromTuple implements 
 MapFunctionTuple2Integer, Double, Double {
 @Override
 public Double map(Tuple2Integer, Double value) {
 return value.f1;
 }
 }
 {code}
 It would be awesome if we had something like this:
 {code:xml}
 data.extractElement(1);
 {code}
 This means that we implement a function for DataSet which extracts a certain 
 element from a given Tuple.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1395] Add support for JodaTime in KryoS...

2015-01-14 Thread StephanEwen
Github user StephanEwen commented on a diff in the pull request:

https://github.com/apache/flink/pull/304#discussion_r22956572
  
--- Diff: 
flink-tests/src/test/scala/org/apache/flink/api/scala/runtime/KryoGenericTypeSerializerTest.scala
 ---
@@ -125,6 +133,7 @@ class KryoGenericTypeSerializerTest {
   def runTests[T : ClassTag](objects: Seq[T]): Unit ={
 val clsTag = classTag[T]
 val typeInfo = new 
GenericTypeInfo[T](clsTag.runtimeClass.asInstanceOf[Class[T]])
+println(TPE:  + typeInfo)
--- End diff --

Is this still a debugging artifact?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Assigned] (FLINK-1398) A new DataSet function: extractElementFromTuple

2015-01-14 Thread Felix Neutatz (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-1398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Felix Neutatz reassigned FLINK-1398:


Assignee: Felix Neutatz

 A new DataSet function: extractElementFromTuple
 ---

 Key: FLINK-1398
 URL: https://issues.apache.org/jira/browse/FLINK-1398
 Project: Flink
  Issue Type: Wish
Reporter: Felix Neutatz
Assignee: Felix Neutatz
Priority: Minor

 This is the use case:
 {code:xml}
 DataSetTuple2Integer, Double data =  env.fromElements(new 
 Tuple2Integer, Double(1,2.0));
 
 data.map(new ElementFromTuple());
 
 }
 public static final class ElementFromTuple implements 
 MapFunctionTuple2Integer, Double, Double {
 @Override
 public Double map(Tuple2Integer, Double value) {
 return value.f1;
 }
 }
 {code}
 It would be awesome if we had something like this:
 {code:xml}
 data.extractElement(1);
 {code}
 This means that we implement a function for DataSet which extracts a certain 
 element from a given Tuple.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1395) Add Jodatime support to Kryo

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277379#comment-14277379
 ] 

ASF GitHub Bot commented on FLINK-1395:
---

Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/304#issuecomment-69964529
  
Good work. Here are some comments/questions

  - jodatime has many classes beyond `DateTime`, such as for example 
`LocalDate`. Should we register them all? They are many, so it may be an idea 
to have something like a common serializer util registers them for you.

  - We definitely need to list jodatime and the kaffee serializers in the 
LICENSE and NOTICE files of the binary distribution.


 Add Jodatime support to Kryo
 

 Key: FLINK-1395
 URL: https://issues.apache.org/jira/browse/FLINK-1395
 Project: Flink
  Issue Type: Bug
Reporter: Robert Metzger
Assignee: Aljoscha Krettek





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1376] [runtime] Add proper shared slot ...

2015-01-14 Thread tillrohrmann
GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/309

[FLINK-1376] [runtime] Add proper shared slot release in case of a fatal 
TaskManager failure

This PR introduces SharedSlots as being a special Slot type and as such 
being released properly in case an Instance has been marked dead. This fixes 
the problem that a dead instance, which has not been shutdown properly, causes 
a job not being removed properly from the system, because it is not aware of 
the SubSlots.

Adds test cases where only the task manager is killed by a Kill message 
(hard shutdown)

@StephanEwen: Requires thorough review because it touches some delicate 
scheduling/slot logic.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink fixSharedSlotReleaseAkka

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/309.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #309


commit ba1dd8b2ce956eb1b14a0ca458a3ca5240da0aee
Author: Till Rohrmann trohrm...@apache.org
Date:   2015-01-12T09:58:45Z

[FLINK-1376] [runtime] Add proper shared slot release in case of a fatal 
TaskManager failure.

Fixes concurrent modification exception of SharedSlot's subSlots field by 
synchronizing all state changing operations through the associated assignment 
group. Fixes deadlock where Instance.markDead first acquires InstanceLock and 
then by releasing the associated slots the assignment group lockcan block with 
a direct releaseSlot call on a SharedSlot which first acquires the assignment 
group lock and then the instance lock in order to return the slot to the 
instance.




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1376) SubSlots are not properly released in case that a TaskManager fatally fails, leaving the system in a corrupted state

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277385#comment-14277385
 ] 

ASF GitHub Bot commented on FLINK-1376:
---

GitHub user tillrohrmann opened a pull request:

https://github.com/apache/flink/pull/309

[FLINK-1376] [runtime] Add proper shared slot release in case of a fatal 
TaskManager failure

This PR introduces SharedSlots as being a special Slot type and as such 
being released properly in case an Instance has been marked dead. This fixes 
the problem that a dead instance, which has not been shutdown properly, causes 
a job not being removed properly from the system, because it is not aware of 
the SubSlots.

Adds test cases where only the task manager is killed by a Kill message 
(hard shutdown)

@StephanEwen: Requires thorough review because it touches some delicate 
scheduling/slot logic.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tillrohrmann/flink fixSharedSlotReleaseAkka

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flink/pull/309.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #309


commit ba1dd8b2ce956eb1b14a0ca458a3ca5240da0aee
Author: Till Rohrmann trohrm...@apache.org
Date:   2015-01-12T09:58:45Z

[FLINK-1376] [runtime] Add proper shared slot release in case of a fatal 
TaskManager failure.

Fixes concurrent modification exception of SharedSlot's subSlots field by 
synchronizing all state changing operations through the associated assignment 
group. Fixes deadlock where Instance.markDead first acquires InstanceLock and 
then by releasing the associated slots the assignment group lockcan block with 
a direct releaseSlot call on a SharedSlot which first acquires the assignment 
group lock and then the instance lock in order to return the slot to the 
instance.




 SubSlots are not properly released in case that a TaskManager fatally fails, 
 leaving the system in a corrupted state
 

 Key: FLINK-1376
 URL: https://issues.apache.org/jira/browse/FLINK-1376
 Project: Flink
  Issue Type: Bug
Reporter: Till Rohrmann

 In case that the TaskManager fatally fails and some of the failing node's 
 slots are SharedSlots, then the slots are not properly released by the 
 JobManager. This causes that the corresponding job will not be properly 
 failed, leaving the system in a corrupted state.
 The reason for that is that the AllocatedSlot is not aware of being treated 
 as a SharedSlot and thus he cannot release the associated SubSlots.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1402) Remove extra extend Serializable in InputFormat interface

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14277415#comment-14277415
 ] 

ASF GitHub Bot commented on FLINK-1402:
---

Github user hsaputra commented on the pull request:

https://github.com/apache/flink/pull/306#issuecomment-69967270
  
Ah ok, thanks for the info Stephen, good to know it was intentional.
Do you want to keep this pattern?


 Remove extra extend Serializable in InputFormat interface
 -

 Key: FLINK-1402
 URL: https://issues.apache.org/jira/browse/FLINK-1402
 Project: Flink
  Issue Type: Bug
Reporter: Henry Saputra
Assignee: Henry Saputra
Priority: Minor

 The org.apache.flink.api.common.io.InputFormat currently defined as:
 public interface InputFormatOT, T extends InputSplit extends 
 InputSplitSourceT, Serializable
 however, InputSplitSource already extend Serializable:
 public interface InputSplitSourceT extends InputSplit extends Serializable
 so no need for InputFormat to explicitly extend Serializable interface.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: Remove dup code fromRemoteExecutor#executePlan...

2015-01-14 Thread StephanEwen
Github user StephanEwen commented on the pull request:

https://github.com/apache/flink/pull/307#issuecomment-69967981
  
This change looks good to be merged...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1389) Allow setting custom file extensions for files created by the FileOutputFormat

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276888#comment-14276888
 ] 

ASF GitHub Bot commented on FLINK-1389:
---

Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/301#issuecomment-69913057
  
Thank you for the feedback.
I'll rework the pull request soon.


 Allow setting custom file extensions for files created by the FileOutputFormat
 --

 Key: FLINK-1389
 URL: https://issues.apache.org/jira/browse/FLINK-1389
 Project: Flink
  Issue Type: New Feature
Reporter: Robert Metzger
Assignee: Robert Metzger
Priority: Minor

 A user requested the ability to name avro files with the avro extension.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1399) Add support for registering Serializers with Kryo

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276744#comment-14276744
 ] 

ASF GitHub Bot commented on FLINK-1399:
---

Github user uce commented on the pull request:

https://github.com/apache/flink/pull/305#issuecomment-69896334
  
I agree with @rmetzger, but I can see why @aljoscha likes the first 
suggestion more ;)


 Add support for registering Serializers with Kryo
 -

 Key: FLINK-1399
 URL: https://issues.apache.org/jira/browse/FLINK-1399
 Project: Flink
  Issue Type: Improvement
Reporter: Aljoscha Krettek
Assignee: Aljoscha Krettek





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1400) In local mode, the default TaskManager won't listen on the data port.

2015-01-14 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276731#comment-14276731
 ] 

Aljoscha Krettek commented on FLINK-1400:
-

You can, with a hack. You start a cluster with one local TaskManager using 
start-cluster.sh. Then you manually delete the pid file in the tmp directory 
and start an additional TaskManager using task manager.sh.

On my machine I can use this to start a local cluster with 4 separate 
TaskManagers:
{code}
flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/bin/start-cluster.sh
  rm /tmp/flink-aljoscha-taskmanager.pid  
flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/bin/taskmanager.sh 
start  rm /tmp/flink-aljoscha-taskmanager.pid  
flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/bin/taskmanager.sh 
start  rm /tmp/flink-aljoscha-taskmanager.pid  
flink-dist/target/flink-0.9-SNAPSHOT-bin/flink-0.9-SNAPSHOT/bin/taskmanager.sh 
start
{code}

 In local mode, the default TaskManager won't listen on the data port.
 -

 Key: FLINK-1400
 URL: https://issues.apache.org/jira/browse/FLINK-1400
 Project: Flink
  Issue Type: Bug
Affects Versions: 0.9
 Environment: Ubuntu 14.04 LTS
Reporter: Sergey Dudoladov
Priority: Minor

  The Task Manager automatically started by the Job Manager (JobManager.scala, 
 appr. line  470)  in the local mode does not listen on the dataport. 
 To reproduce:
 1) Start Flink via ./start-local.sh
 2) Look up the data port number on locahost:8081 - Task Managers tab
 3) sudo netstat -taupen | grep dataport_number 
  
 Or  start the second Task Manager and run  Flink with the degree of 
 parallelism 2 (assuming one slot per Task Manager)
 4) ./flink run -p 2 ...
 Task Managers started via ./taskmanager.sh work fine.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[GitHub] flink pull request: [FLINK-1399] Add support for registering Seria...

2015-01-14 Thread rmetzger
Github user rmetzger commented on the pull request:

https://github.com/apache/flink/pull/305#issuecomment-69896090
  
The second option (env) is probably better because people will see the 
method when their IDE suggests method names.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[jira] [Commented] (FLINK-1320) Add an off-heap variant of the managed memory

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276911#comment-14276911
 ] 

ASF GitHub Bot commented on FLINK-1320:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/290#issuecomment-69917146
  
I made some finally changes that were required for the current master. 
Also, I ran all integration tests with direct memory allocation enabled. Any 
objections for merging this pull request?


 Add an off-heap variant of the managed memory
 -

 Key: FLINK-1320
 URL: https://issues.apache.org/jira/browse/FLINK-1320
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
Reporter: Stephan Ewen
Priority: Minor

 For (nearly) all memory that Flink accumulates (in the form of sort buffers, 
 hash tables, caching), we use a special way of representing data serialized 
 across a set of memory pages. The big work lies in the way the algorithms are 
 implemented to operate on pages, rather than on objects.
 The core class for the memory is the {{MemorySegment}}, which has all methods 
 to set and get primitives values efficiently. It is a somewhat simpler (and 
 faster) variant of a HeapByteBuffer.
 As such, it should be straightforward to create a version where the memory 
 segment is not backed by a heap byte[], but by memory allocated outside the 
 JVM, in a similar way as the NIO DirectByteBuffers, or the Netty direct 
 buffers do it.
 This may have multiple advantages:
   - We reduce the size of the JVM heap (garbage collected) and the number and 
 size of long living alive objects. For large JVM sizes, this may improve 
 performance quite a bit. Utilmately, we would in many cases reduce JVM size 
 to 1/3 to 1/2 and keep the remaining memory outside the JVM.
   - We save copies when we move memory pages to disk (spilling) or through 
 the network (shuffling / broadcasting / forward piping)
 The changes required to implement this are
   - Add a {{UnmanagedMemorySegment}} that only stores the memory adress as a 
 long, and the segment size. It is initialized from a DirectByteBuffer.
   - Allow the MemoryManager to allocate these MemorySegments, instead of the 
 current ones.
   - Make sure that the startup script pick up the mode and configure the heap 
 size and the max direct memory properly.
 Since the MemorySegment is probably the most performance critical class in 
 Flink, we must take care that we do this right. The following are critical 
 considerations:
   - If we want both solutions (heap and off-heap) to exist side-by-side 
 (configurable), we must make the base MemorySegment abstract and implement 
 two versions (heap and off-heap).
   - To get the best performance, we need to make sure that only one class 
 gets loaded (or at least ever used), to ensure optimal JIT de-virtualization 
 and inlining.
   - We should carefully measure the performance of both variants. From 
 previous micro benchmarks, I remember that individual byte accesses in 
 DirectByteBuffers (off-heap) were slightly slower than on-heap, any larger 
 accesses were equally good or slightly better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-1320) Add an off-heap variant of the managed memory

2015-01-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276923#comment-14276923
 ] 

ASF GitHub Bot commented on FLINK-1320:
---

Github user mxm commented on the pull request:

https://github.com/apache/flink/pull/290#issuecomment-69920591
  
@uce Yes, it will be copied from off-heap to heap first. There are also 
other places like the `EventSerializer` where this is the case. I guess it 
depends on the amount of data that is being copied. If you want to operate on a 
byte array, then you have to copy it into the JVM heap.

@uce I ran the integration tests manually. We could add the random testing 
in a separate pull request.


 Add an off-heap variant of the managed memory
 -

 Key: FLINK-1320
 URL: https://issues.apache.org/jira/browse/FLINK-1320
 Project: Flink
  Issue Type: Improvement
  Components: Local Runtime
Reporter: Stephan Ewen
Priority: Minor

 For (nearly) all memory that Flink accumulates (in the form of sort buffers, 
 hash tables, caching), we use a special way of representing data serialized 
 across a set of memory pages. The big work lies in the way the algorithms are 
 implemented to operate on pages, rather than on objects.
 The core class for the memory is the {{MemorySegment}}, which has all methods 
 to set and get primitives values efficiently. It is a somewhat simpler (and 
 faster) variant of a HeapByteBuffer.
 As such, it should be straightforward to create a version where the memory 
 segment is not backed by a heap byte[], but by memory allocated outside the 
 JVM, in a similar way as the NIO DirectByteBuffers, or the Netty direct 
 buffers do it.
 This may have multiple advantages:
   - We reduce the size of the JVM heap (garbage collected) and the number and 
 size of long living alive objects. For large JVM sizes, this may improve 
 performance quite a bit. Utilmately, we would in many cases reduce JVM size 
 to 1/3 to 1/2 and keep the remaining memory outside the JVM.
   - We save copies when we move memory pages to disk (spilling) or through 
 the network (shuffling / broadcasting / forward piping)
 The changes required to implement this are
   - Add a {{UnmanagedMemorySegment}} that only stores the memory adress as a 
 long, and the segment size. It is initialized from a DirectByteBuffer.
   - Allow the MemoryManager to allocate these MemorySegments, instead of the 
 current ones.
   - Make sure that the startup script pick up the mode and configure the heap 
 size and the max direct memory properly.
 Since the MemorySegment is probably the most performance critical class in 
 Flink, we must take care that we do this right. The following are critical 
 considerations:
   - If we want both solutions (heap and off-heap) to exist side-by-side 
 (configurable), we must make the base MemorySegment abstract and implement 
 two versions (heap and off-heap).
   - To get the best performance, we need to make sure that only one class 
 gets loaded (or at least ever used), to ensure optimal JIT de-virtualization 
 and inlining.
   - We should carefully measure the performance of both variants. From 
 previous micro benchmarks, I remember that individual byte accesses in 
 DirectByteBuffers (off-heap) were slightly slower than on-heap, any larger 
 accesses were equally good or slightly better.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)