[jira] [Commented] (FLINK-3673) Annotations for code generation

2016-07-07 Thread Gabor Horvath (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365925#comment-15365925
 ] 

Gabor Horvath commented on FLINK-3673:
--

According to our discussion at DataArtisans, we do not want to break the 
serialization format right now. The null checks for primitive types and the 
subclass checks for final types are already eliminated from the generated code, 
but the tags are still written out. In the future, when it is desirable to 
change the format, not writing those tags out might give some performance 
advantage. This way spills to disk might happen less frequently.
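
To make the size effect of the tags concrete, here is a minimal sketch in Java. It assumes a hypothetical encoding (one boolean tag byte followed by the payload) and hypothetical class and method names; it is not Flink's actual wire format or serializer code.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; names and encoding are assumptions.
class NullTagSketch {

    // Nullable field: a 1-byte null tag is always written, then the value if present.
    static byte[] writeNullable(Integer value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeBoolean(value == null);
            if (value != null) {
                out.writeInt(value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Field known to be never null: the tag can be dropped from the format.
    static byte[] writeNeverNull(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println("with tag:    " + writeNullable(42).length + " bytes");
        System.out.println("without tag: " + writeNeverNull(42).length + " bytes");
    }
}
```

In this sketch the tag costs one extra byte per field per record, which is the kind of overhead that dropping the tags from the format would remove.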

> Annotations for code generation
> ---
>
> Key: FLINK-3673
> URL: https://issues.apache.org/jira/browse/FLINK-3673
> Project: Flink
>  Issue Type: Sub-task
>  Components: Type Serialization System
>Reporter: Gabor Horvath
>Assignee: Gabor Horvath
>  Labels: gsoc2016
>
> Annotations should be utilized to generate more efficient serialization code.
> Planned improvements:
> * Using never null annotations on a field, the serialized representation can 
> omit the 1 byte null tags and the serializer code handling this tag.
> * Using never null annotation on the POJO, we can omit the top level null 
> tag.
> * Making a POJO final we can omit the subclass tag.
> The very same annotations can be used to make the getLength method much 
> smarter.
> Code generation is a prerequisite, to avoid runtime checks which could make 
> the common codepath (without annotations) slower.
> I could also annotate some internal Flink types to make them more efficient.
> The main risk: it would break savepoints created with a Flink version that 
> did not have the annotations. We could either introduce a compatibility mode, 
> or force users to recreate those savepoints.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (FLINK-3672) Code generation for POJO comparators

2016-07-07 Thread Gabor Horvath (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365921#comment-15365921
 ] 

Gabor Horvath commented on FLINK-3672:
--

Code generation for comparators is done.
Distributing the generated code to task managers is not done yet.

> Code generation for POJO comparators
> 
>
> Key: FLINK-3672
> URL: https://issues.apache.org/jira/browse/FLINK-3672
> Project: Flink
>  Issue Type: Sub-task
>  Components: Type Serialization System
>Reporter: Gabor Horvath
>Assignee: Gabor Horvath
>  Labels: gsoc2016
>
> Runtime code generation should be used to generate the comparison methods for 
> POJOs.





[jira] [Commented] (FLINK-3671) Code generation for POJO serializer

2016-07-07 Thread Gabor Horvath (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365920#comment-15365920
 ] 

Gabor Horvath commented on FLINK-3671:
--

Code generation for POJO serialization is done. 
Distribution of the generated serializers to task managers is not done yet. 
Initial benchmarks on my local machine show about a 10% performance improvement 
on the word count example using POJOs.
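
For context, the difference between reflective field access and the direct access that generated code can use might be sketched as follows. The `WordCount` POJO and both accessor methods are hypothetical names chosen for illustration; this is not the generated serializer itself.

```java
import java.lang.reflect.Field;

// Hypothetical illustration only; not Flink's PojoSerializer or its generated code.
class FieldAccessSketch {

    static class WordCount {
        public String word;
        public int count;
    }

    // Reflection-based access: the field is looked up and read through java.lang.reflect.
    static int countViaReflection(Object pojo) {
        try {
            Field field = pojo.getClass().getDeclaredField("count");
            field.setAccessible(true);
            return field.getInt(pojo);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // What generated code can do instead: a plain, statically typed field read.
    static int countViaDirectAccess(WordCount pojo) {
        return pojo.count;
    }

    public static void main(String[] args) {
        WordCount wc = new WordCount();
        wc.word = "flink";
        wc.count = 3;
        System.out.println(countViaReflection(wc) + " " + countViaDirectAccess(wc));
    }
}
```

A real reflective serializer would normally cache the `Field` objects rather than look them up per record; the sketch only shows the two access styles side by side.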

> Code generation for POJO serializer
> ---
>
> Key: FLINK-3671
> URL: https://issues.apache.org/jira/browse/FLINK-3671
> Project: Flink
>  Issue Type: Sub-task
>  Components: Type Serialization System
>Reporter: Gabor Horvath
>Assignee: Gabor Horvath
>  Labels: gsoc2016
>
> A new serializer should be added that uses runtime code generation to 
> serialize POJOs.





[jira] [Commented] (FLINK-3673) Annotations for code generation

2016-03-30 Thread Gabor Horvath (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217940#comment-15217940
 ] 

Gabor Horvath commented on FLINK-3673:
--

Sure, I have extended the description of the ticket.

> Annotations for code generation
> ---
>
> Key: FLINK-3673
> URL: https://issues.apache.org/jira/browse/FLINK-3673
> Project: Flink
>  Issue Type: Sub-task
>  Components: Type Serialization System
>Reporter: Gabor Horvath
>Assignee: Gabor Horvath
>  Labels: gsoc2016
>
> Annotations should be utilized to generate more efficient serialization code.
> Planned improvements:
> * Using never null annotations on a field, the serialized representation can 
> omit the 1 byte null tags and the serializer code handling this tag.
> * Using never null annotation on the POJO, we can omit the top level null 
> tag.
> * Making a POJO final we can omit the subclass tag.
> The very same annotations can be used to make the getLength method much 
> smarter.
> Code generation is a prerequisite, to avoid runtime checks which could make 
> the common codepath (without annotations) slower.
> I could also annotate some internal Flink types to make them more efficient.
> The main risk: it would break savepoints created with a Flink version that 
> did not have the annotations. We could either introduce a compatibility mode, 
> or force users to recreate those savepoints.





[jira] [Updated] (FLINK-3673) Annotations for code generation

2016-03-30 Thread Gabor Horvath (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Horvath updated FLINK-3673:
-
Description: 
Annotations should be utilized to generate more efficient serialization code.

Planned improvements:
* Using never null annotations on a field, the serialized representation can 
omit the 1 byte null tags and the serializer code handling this tag.
* Using never null annotation on the POJO, we can omit the top level null tag.
* Making a POJO final we can omit the subclass tag.

The very same annotations can be used to make the getLength method much smarter.

Code generation is a prerequisite, to avoid runtime checks which could make the 
common codepath (without annotations) slower.

I could also annotate some internal Flink types to make them more efficient.

The main risk: it would break savepoints created with a Flink version that did 
not have the annotations. We could either introduce a compatibility mode, or 
force users to recreate those savepoints.

  was:
Annotations should be utilized to generate more efficient serialization code.
The very same annotations can be used to make the getLength method much smarter.


> Annotations for code generation
> ---
>
> Key: FLINK-3673
> URL: https://issues.apache.org/jira/browse/FLINK-3673
> Project: Flink
>  Issue Type: Sub-task
>  Components: Type Serialization System
>Reporter: Gabor Horvath
>Assignee: Gabor Horvath
>  Labels: gsoc2016
>
> Annotations should be utilized to generate more efficient serialization code.
> Planned improvements:
> * Using never null annotations on a field, the serialized representation can 
> omit the 1 byte null tags and the serializer code handling this tag.
> * Using never null annotation on the POJO, we can omit the top level null 
> tag.
> * Making a POJO final we can omit the subclass tag.
> The very same annotations can be used to make the getLength method much 
> smarter.
> Code generation is a prerequisite, to avoid runtime checks which could make 
> the common codepath (without annotations) slower.
> I could also annotate some internal Flink types to make them more efficient.
> The main risk: it would break savepoints created with a Flink version that 
> did not have the annotations. We could either introduce a compatibility mode, 
> or force users to recreate those savepoints.





[jira] [Created] (FLINK-3673) Annotations for code generation

2016-03-29 Thread Gabor Horvath (JIRA)
Gabor Horvath created FLINK-3673:


 Summary: Annotations for code generation
 Key: FLINK-3673
 URL: https://issues.apache.org/jira/browse/FLINK-3673
 Project: Flink
  Issue Type: Sub-task
  Components: Type Serialization System
Reporter: Gabor Horvath
Assignee: Gabor Horvath


Annotations should be utilized to generate more efficient serialization code.
The very same annotations can be used to make the getLength method much smarter.





[jira] [Updated] (FLINK-3599) GSoC: Code Generation in Serializers

2016-03-13 Thread Gabor Horvath (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Horvath updated FLINK-3599:
-
Description: 
The current implementation of the serializers can be a
performance bottleneck in some scenarios. These performance problems were
also reported on the mailing list recently [1].

E.g. the PojoSerializer uses reflection for accessing the fields, which is slow 
[2].

For the complete proposal see [3].

[1] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Tuple-performance-and-the-curious-JIT-compiler-td10666.html

[2] 
https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java#L369

[3] 
https://docs.google.com/document/d/1VC8lCeErx9kI5lCMPiUn625PO0rxR-iKlVqtt3hkVnk

  was:
The current implementation of the serializers can be a
performance bottleneck in some scenarios. These performance problems were
also reported on the mailing list recently [1].

E.g. the PojoSerializer uses reflection for accessing the fields, which is slow 
[2].

For the complete proposal see [3].

[1] 
http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Tuple-performance-and-the-curious-JIT-compiler-td10666.html

[2] 
https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java#L369

[3] 
https://docs.google.com/document/d/1VC8lCeErx9kI5lCMPiUn625PO0rxR-iKlVqtt3hkVnk


> GSoC: Code Generation in Serializers
> 
>
> Key: FLINK-3599
> URL: https://issues.apache.org/jira/browse/FLINK-3599
> Project: Flink
>  Issue Type: Improvement
>  Components: Type Serialization System
>Reporter: Márton Balassi
>Assignee: Gabor Horvath
>  Labels: gsoc2016, mentor
>
> The current implementation of the serializers can be a
> performance bottleneck in some scenarios. These performance problems were
> also reported on the mailing list recently [1].
> E.g. the PojoSerializer uses reflection for accessing the fields, which is 
> slow [2].
> For the complete proposal see [3].
> [1] 
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Tuple-performance-and-the-curious-JIT-compiler-td10666.html
> [2] 
> https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java#L369
> [3] 
> https://docs.google.com/document/d/1VC8lCeErx9kI5lCMPiUn625PO0rxR-iKlVqtt3hkVnk





[jira] [Assigned] (FLINK-3322) MemoryManager creates too much GC pressure with iterative jobs

2016-03-02 Thread Gabor Horvath (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Horvath reassigned FLINK-3322:


Assignee: Gabor Horvath

> MemoryManager creates too much GC pressure with iterative jobs
> --
>
> Key: FLINK-3322
> URL: https://issues.apache.org/jira/browse/FLINK-3322
> Project: Flink
>  Issue Type: Bug
>  Components: Distributed Runtime
>Affects Versions: 1.0.0
>Reporter: Gabor Gevay
>Assignee: Gabor Horvath
>Priority: Critical
> Fix For: 1.0.0
>
>
> When taskmanager.memory.preallocate is false (the default), released memory 
> segments are not added to a pool, but the GC is expected to take care of 
> them. This puts too much pressure on the GC with iterative jobs, where the 
> operators reallocate all memory at every superstep.
> See the following discussion on the mailing list:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Memory-manager-behavior-in-iterative-jobs-tt10066.html
> Reproducing the issue:
> https://github.com/ggevay/flink/tree/MemoryManager-crazy-gc
> The class to start is malom.Solver. If you increase the memory given to the 
> JVM from 1 to 50 GB, performance gradually degrades by more than 10 times. 
> (It will generate some lookuptables to /tmp on first run for a few minutes.) 
> (I think the slowdown might also depend somewhat on 
> taskmanager.memory.fraction, because more unused non-managed memory results 
> in rarer GCs.)
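
The pooling behaviour the description contrasts with relying on the GC can be sketched roughly as below. This is a simplified illustration using plain `byte[]` segments and hypothetical names; it is not the Flink `MemoryManager` and does not reflect its actual API.

```java
import java.util.ArrayDeque;

// Hypothetical illustration only; not Flink's MemoryManager.
class SegmentPoolSketch {

    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int segmentSize;
    private int allocations; // counts how often a fresh segment had to be allocated

    SegmentPoolSketch(int segmentSize) {
        this.segmentSize = segmentSize;
    }

    // Reuse a released segment if one is available, otherwise allocate a new one.
    byte[] acquire() {
        byte[] segment = free.poll();
        if (segment == null) {
            segment = new byte[segmentSize];
            allocations++;
        }
        return segment;
    }

    // Return the segment to the pool instead of leaving it to the garbage collector.
    void release(byte[] segment) {
        free.push(segment);
    }

    int allocations() {
        return allocations;
    }
}
```

With such a pool, an iterative job that acquires and releases the same number of segments at every superstep only allocates during the first superstep; without it, every superstep produces fresh garbage.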





[jira] [Assigned] (FLINK-3457) Link to Apache Flink meetups from the 'Community' section of the website

2016-02-23 Thread Gabor Horvath (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Horvath reassigned FLINK-3457:


Assignee: Gabor Horvath

> Link to Apache Flink meetups from the 'Community' section of the website
> 
>
> Key: FLINK-3457
> URL: https://issues.apache.org/jira/browse/FLINK-3457
> Project: Flink
>  Issue Type: Task
>  Components: Documentation
>Reporter: Slim Baltagi
>Assignee: Gabor Horvath
>Priority: Trivial
>
> Now with the number of Apache Flink meetups increasing worldwide, it is 
> helpful to add a link to Apache Flink meetups 
> http://www.meetup.com/topics/apache-flink/ to the community section of 
> https://flink.apache.org/community.html so visitors can conveniently find 
> them  right from the Apache Flink website. 





[jira] [Commented] (FLINK-3438) ExternalProcessRunner fails to detect ClassNotFound exception because of locale settings

2016-02-20 Thread Gabor Horvath (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155701#comment-15155701
 ] 

Gabor Horvath commented on FLINK-3438:
--

I had similar problems due to the following check: 
https://github.com/apache/flink/blob/master/flink-runtime/src/test/java/org/apache/flink/runtime/jobmanager/JobManagerStartupTest.java#L92

The local workaround in my case was to use LANG=en_us when I was running the 
test suite.


> ExternalProcessRunner fails to detect ClassNotFound exception because of 
> locale settings
> 
>
> Key: FLINK-3438
> URL: https://issues.apache.org/jira/browse/FLINK-3438
> Project: Flink
>  Issue Type: Bug
>  Components: Core
>Affects Versions: 1.0.0
>Reporter: Stefano Baghino
>Priority: Minor
> Fix For: 1.0.0
>
>
> ExternalProcessRunner tries to detect a ClassNotFoundException in the run 
> process by comparing its output with a fixed string of text; this means that 
> localized text reporting said exception is not interpreted as such.
> To reproduce:
> * test the `ExternalProcessRunnerTest.testClassNotFound` setting the 
> following system properties on the JVM: {{-Duser.country=IT 
> -Duser.language=it}}





[jira] [Comment Edited] (FLINK-3422) Scramble HashPartitioner hashes

2016-02-18 Thread Gabor Horvath (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152209#comment-15152209
 ] 

Gabor Horvath edited comment on FLINK-3422 at 2/18/16 12:13 PM:


Hi, I am a Master's student from Hungary and a newcomer to Flink. I plan to look 
into this issue during the weekend to get more familiar with the code. In case 
I get stuck, Márton Balassi and Gábor Gévay will help me. I am also planning to 
do a GSoC project during this summer.


was (Author: xazax):
Hi, I am a Master's student from Hungary and a newcomer to Flink. I plan to look 
into this issue during the weekend to get more familiar with the code.

> Scramble HashPartitioner hashes
> ---
>
> Key: FLINK-3422
> URL: https://issues.apache.org/jira/browse/FLINK-3422
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.10.2
>Reporter: Stephan Ewen
>Assignee: Gabor Horvath
>Priority: Critical
> Fix For: 1.0.0
>
>
> The {{HashPartitioner}} used by the streaming API does not apply any hash 
> scrambling against bad user hash functions.
> We should apply a murmur or jenkins hash on top of the hash code, similar to 
> the {{DataSet}} API.
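
As an illustration of the proposed scrambling, the sketch below applies the MurmurHash3 32-bit finalizer on top of `hashCode()` before choosing a partition. The class and method names are hypothetical, and this is only one possible scrambling function; it is not the code used by Flink's `HashPartitioner` or the `DataSet` API.

```java
// Hypothetical illustration only; not Flink's HashPartitioner.
class ScrambledPartitionerSketch {

    // MurmurHash3 32-bit finalizer: mixes the bits so that patterns in the
    // user-provided hash code do not map straight onto partition indices.
    static int scramble(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Partition chosen from the raw hash code (no scrambling).
    static int naivePartitionFor(Object key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // Partition chosen from the scrambled hash code.
    static int partitionFor(Object key, int numPartitions) {
        return Math.floorMod(scramble(key.hashCode()), numPartitions);
    }
}
```

A bad case for the unscrambled variant is a key set whose hash codes are all multiples of the number of partitions, for example `Integer` keys 0, 4, 8, ... with 4 partitions: every key then lands in partition 0, whereas the scrambled variant spreads them over several partitions.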





[jira] [Assigned] (FLINK-3422) Scramble HashPartitioner hashes

2016-02-18 Thread Gabor Horvath (JIRA)

 [ 
https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Horvath reassigned FLINK-3422:


Assignee: Gabor Horvath

> Scramble HashPartitioner hashes
> ---
>
> Key: FLINK-3422
> URL: https://issues.apache.org/jira/browse/FLINK-3422
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.10.2
>Reporter: Stephan Ewen
>Assignee: Gabor Horvath
>Priority: Critical
> Fix For: 1.0.0
>
>
> The {{HashPartitioner}} used by the streaming API does not apply any hash 
> scrambling against bad user hash functions.
> We should apply a murmur or jenkins hash on top of the hash code, similar to 
> the {{DataSet}} API.





[jira] [Commented] (FLINK-3422) Scramble HashPartitioner hashes

2016-02-18 Thread Gabor Horvath (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152209#comment-15152209
 ] 

Gabor Horvath commented on FLINK-3422:
--

Hi, I am a Master's student from Hungary and a newcomer to Flink. I plan to look 
into this issue during the weekend to get more familiar with the code.

> Scramble HashPartitioner hashes
> ---
>
> Key: FLINK-3422
> URL: https://issues.apache.org/jira/browse/FLINK-3422
> Project: Flink
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 0.10.2
>Reporter: Stephan Ewen
>Priority: Critical
> Fix For: 1.0.0
>
>
> The {{HashPartitioner}} used by the streaming API does not apply any hash 
> scrambling against bad user hash functions.
> We should apply a murmur or jenkins hash on top of the hash code, similar to 
> the {{DataSet}} API.


