[jira] [Commented] (FLINK-3673) Annotations for code generation
[ https://issues.apache.org/jira/browse/FLINK-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365925#comment-15365925 ]

Gabor Horvath commented on FLINK-3673:
--------------------------------------

According to our discussion at DataArtisans, we do not want to break the serialization format right now. The null checks for primitive types and the subclass checks for final types are already eliminated from the generated code, but the tags are still written out. In the future, when it is desirable to change the format, not writing those tags out might give some performance advantage, because spills to disk might then happen less frequently.

> Annotations for code generation
> -------------------------------
>
>                 Key: FLINK-3673
>                 URL: https://issues.apache.org/jira/browse/FLINK-3673
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Type Serialization System
>            Reporter: Gabor Horvath
>            Assignee: Gabor Horvath
>              Labels: gsoc2016
>
> Annotations should be utilized to generate more efficient serialization code.
> Planned improvements:
> * Using never null annotations on a field, the serialized representation can omit the 1 byte null tags and the serializer code handling this tag.
> * Using a never null annotation on the POJO, we can omit the top level null tag.
> * Making a POJO final, we can omit the subclass tag.
> The very same annotations can be used to make the getLength method much smarter.
> Code generation is a prerequisite, to avoid runtime checks which could make the common codepath (without annotations) slower.
> I could also annotate some internal Flink types to make them more efficient.
> The main risk: it would break savepoints created with a Flink version that did not have annotations. We could either introduce a compatibility mode, or force users to recreate those savepoints.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
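The saving from omitting the 1 byte null tag, as planned in FLINK-3673, can be illustrated with a small standalone sketch. It does not use Flink's serializers or reproduce Flink's actual serialization format; the class and method names (NullTagSketch, writeNullable, writeNeverNull) are hypothetical, and the tag layout is only an assumption made for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only, not Flink's PojoSerializer or its format.
final class NullTagSketch {

    // Nullable field: a 1-byte null tag, then the value if it is present.
    static void writeNullable(DataOutputStream out, String value) throws IOException {
        out.writeBoolean(value == null);
        if (value != null) {
            out.writeUTF(value);
        }
    }

    // Field known to be never null: no tag and no null check are needed.
    static void writeNeverNull(DataOutputStream out, String value) throws IOException {
        out.writeUTF(value);
    }

    // Number of bytes produced when the null tag is written.
    static int sizeWithTag(String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeNullable(new DataOutputStream(bytes), value);
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Number of bytes produced when the null tag is omitted.
    static int sizeWithoutTag(String value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            writeNeverNull(new DataOutputStream(bytes), value);
            return bytes.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("with tag:    " + sizeWithTag("flink"));
        System.out.println("without tag: " + sizeWithoutTag("flink"));
    }
}
```

The difference is one byte per field per record, which is why the benefit would only show up once the serialization format itself is allowed to change.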
[jira] [Commented] (FLINK-3672) Code generation for POJO comparators
[ https://issues.apache.org/jira/browse/FLINK-3672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365921#comment-15365921 ]

Gabor Horvath commented on FLINK-3672:
--------------------------------------

Code generation for comparators is done. Distributing the generated code to task managers is not done yet.

> Code generation for POJO comparators
> ------------------------------------
>
>                 Key: FLINK-3672
>                 URL: https://issues.apache.org/jira/browse/FLINK-3672
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Type Serialization System
>            Reporter: Gabor Horvath
>            Assignee: Gabor Horvath
>              Labels: gsoc2016
>
> Runtime code generation should be used to generate the comparison methods for POJOs.
[jira] [Commented] (FLINK-3671) Code generation for POJO serializer
[ https://issues.apache.org/jira/browse/FLINK-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15365920#comment-15365920 ]

Gabor Horvath commented on FLINK-3671:
--------------------------------------

Code generation for POJO serialization is done. Distribution of the generated serializers to task managers is not done yet. Initial benchmarks on my local machine show about a 10% performance improvement on the word count example using POJOs.

> Code generation for POJO serializer
> -----------------------------------
>
>                 Key: FLINK-3671
>                 URL: https://issues.apache.org/jira/browse/FLINK-3671
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Type Serialization System
>            Reporter: Gabor Horvath
>            Assignee: Gabor Horvath
>              Labels: gsoc2016
>
> A new serializer should be added that uses runtime code generation to serialize POJOs.
[jira] [Commented] (FLINK-3673) Annotations for code generation
[ https://issues.apache.org/jira/browse/FLINK-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15217940#comment-15217940 ]

Gabor Horvath commented on FLINK-3673:
--------------------------------------

Sure, I have extended the description of the ticket.

> Annotations for code generation
> -------------------------------
>
>                 Key: FLINK-3673
>                 URL: https://issues.apache.org/jira/browse/FLINK-3673
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Type Serialization System
>            Reporter: Gabor Horvath
>            Assignee: Gabor Horvath
>              Labels: gsoc2016
>
> Annotations should be utilized to generate more efficient serialization code.
> Planned improvements:
> * Using never null annotations on a field, the serialized representation can omit the 1 byte null tags and the serializer code handling this tag.
> * Using a never null annotation on the POJO, we can omit the top level null tag.
> * Making a POJO final, we can omit the subclass tag.
> The very same annotations can be used to make the getLength method much smarter.
> Code generation is a prerequisite, to avoid runtime checks which could make the common codepath (without annotations) slower.
> I could also annotate some internal Flink types to make them more efficient.
> The main risk: it would break savepoints created with a Flink version that did not have annotations. We could either introduce a compatibility mode, or force users to recreate those savepoints.
[jira] [Updated] (FLINK-3673) Annotations for code generation
[ https://issues.apache.org/jira/browse/FLINK-3673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Horvath updated FLINK-3673:
---------------------------------
    Description:
Annotations should be utilized to generate more efficient serialization code.
Planned improvements:
* Using never null annotations on a field, the serialized representation can omit the 1 byte null tags and the serializer code handling this tag.
* Using a never null annotation on the POJO, we can omit the top level null tag.
* Making a POJO final, we can omit the subclass tag.
The very same annotations can be used to make the getLength method much smarter.
Code generation is a prerequisite, to avoid runtime checks which could make the common codepath (without annotations) slower.
I could also annotate some internal Flink types to make them more efficient.
The main risk: it would break savepoints created with a Flink version that did not have annotations. We could either introduce a compatibility mode, or force users to recreate those savepoints.

  was:
Annotations should be utilized to generate more efficient serialization code.
The very same annotations can be used to make the getLength method much smarter.

> Annotations for code generation
> -------------------------------
>
>                 Key: FLINK-3673
>                 URL: https://issues.apache.org/jira/browse/FLINK-3673
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Type Serialization System
>            Reporter: Gabor Horvath
>            Assignee: Gabor Horvath
>              Labels: gsoc2016
>
> Annotations should be utilized to generate more efficient serialization code.
> Planned improvements:
> * Using never null annotations on a field, the serialized representation can omit the 1 byte null tags and the serializer code handling this tag.
> * Using a never null annotation on the POJO, we can omit the top level null tag.
> * Making a POJO final, we can omit the subclass tag.
> The very same annotations can be used to make the getLength method much smarter.
> Code generation is a prerequisite, to avoid runtime checks which could make the common codepath (without annotations) slower.
> I could also annotate some internal Flink types to make them more efficient.
> The main risk: it would break savepoints created with a Flink version that did not have annotations. We could either introduce a compatibility mode, or force users to recreate those savepoints.
[jira] [Created] (FLINK-3673) Annotations for code generation
Gabor Horvath created FLINK-3673:
------------------------------------

             Summary: Annotations for code generation
                 Key: FLINK-3673
                 URL: https://issues.apache.org/jira/browse/FLINK-3673
             Project: Flink
          Issue Type: Sub-task
          Components: Type Serialization System
            Reporter: Gabor Horvath
            Assignee: Gabor Horvath

Annotations should be utilized to generate more efficient serialization code.
The very same annotations can be used to make the getLength method much smarter.
[jira] [Updated] (FLINK-3599) GSoC: Code Generation in Serializers
[ https://issues.apache.org/jira/browse/FLINK-3599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Horvath updated FLINK-3599:
---------------------------------
    Description:
The current implementation of the serializers can be a performance bottleneck in some scenarios. These performance problems were also reported on the mailing list recently [1].
E.g. the PojoSerializer uses reflection for accessing the fields, which is slow [2].
For the complete proposal see [3].
[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Tuple-performance-and-the-curious-JIT-compiler-td10666.html
[2] https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java#L369
[3] https://docs.google.com/document/d/1VC8lCeErx9kI5lCMPiUn625PO0rxR-iKlVqtt3hkVnk

  was:
The current implementation of the serializers can be a performance bottleneck in some scenarios. These performance problems were also reported on the mailing list recently [1].
E.g. the PojoSerializer uses reflection for accessing the fields, which is slow [2].
For the complete proposal see [3].
[1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Tuple-performance-and-the-curious-JIT-compiler-td10666.html
[2] https://github.com/apache/flink/blob/master/flink-java/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java#L369
[3] https://docs.google.com/document/d/1VC8lCeErx9kI5lCMPiUn625PO0rxR-iKlVqtt3hkVnk

> GSoC: Code Generation in Serializers
> ------------------------------------
>
>                 Key: FLINK-3599
>                 URL: https://issues.apache.org/jira/browse/FLINK-3599
>             Project: Flink
>          Issue Type: Improvement
>          Components: Type Serialization System
>            Reporter: Márton Balassi
>            Assignee: Gabor Horvath
>              Labels: gsoc2016, mentor
>
> The current implementation of the serializers can be a performance bottleneck in some scenarios. These performance problems were also reported on the mailing list recently [1].
> E.g. the PojoSerializer uses reflection for accessing the fields, which is slow [2].
> For the complete proposal see [3].
> [1] http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Tuple-performance-and-the-curious-JIT-compiler-td10666.html
> [2] https://github.com/apache/flink/blob/master/flink-core/src/main/java/org/apache/flink/api/java/typeutils/runtime/PojoSerializer.java#L369
> [3] https://docs.google.com/document/d/1VC8lCeErx9kI5lCMPiUn625PO0rxR-iKlVqtt3hkVnk
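The contrast FLINK-3599 draws between reflective field access and generated code can be sketched as follows. This is a minimal illustration, not Flink code: FieldAccessSketch and WordCount are hypothetical names, and the "direct" method only stands in for what a generated serializer could emit for one specific POJO type.

```java
import java.lang.reflect.Field;

// Hypothetical illustration only; no Flink classes are used.
final class FieldAccessSketch {

    // A simple POJO with public fields, in the spirit of the word count example.
    static final class WordCount {
        public String word;
        public int count;
    }

    // Generic path: the field is looked up and read through reflection,
    // so the same code works for any POJO but pays for lookup and access checks.
    static int readCountReflectively(WordCount wc) {
        try {
            Field field = WordCount.class.getField("count");
            return field.getInt(wc);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Specialized path: what generated code for this exact type could look like,
    // a plain field read that the JIT compiler can inline.
    static int readCountDirectly(WordCount wc) {
        return wc.count;
    }

    public static void main(String[] args) {
        WordCount wc = new WordCount();
        wc.word = "flink";
        wc.count = 3;
        System.out.println(readCountReflectively(wc) == readCountDirectly(wc));
    }
}
```

Both methods return the same value; the point of code generation is that the second form can be produced per POJO type at runtime instead of being written by hand.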
[jira] [Assigned] (FLINK-3322) MemoryManager creates too much GC pressure with iterative jobs
[ https://issues.apache.org/jira/browse/FLINK-3322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Horvath reassigned FLINK-3322:
------------------------------------

    Assignee: Gabor Horvath

> MemoryManager creates too much GC pressure with iterative jobs
> --------------------------------------------------------------
>
>                 Key: FLINK-3322
>                 URL: https://issues.apache.org/jira/browse/FLINK-3322
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Runtime
>    Affects Versions: 1.0.0
>            Reporter: Gabor Gevay
>            Assignee: Gabor Horvath
>            Priority: Critical
>             Fix For: 1.0.0
>
> When taskmanager.memory.preallocate is false (the default), released memory segments are not added to a pool, but the GC is expected to take care of them. This puts too much pressure on the GC with iterative jobs, where the operators reallocate all memory at every superstep.
> See the following discussion on the mailing list:
> http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Memory-manager-behavior-in-iterative-jobs-tt10066.html
> Reproducing the issue:
> https://github.com/ggevay/flink/tree/MemoryManager-crazy-gc
> The class to start is malom.Solver. If you increase the memory given to the JVM from 1 to 50 GB, performance gradually degrades by more than 10 times. (On the first run it will spend a few minutes generating some lookup tables in /tmp.)
> (I think the slowdown might also depend somewhat on taskmanager.memory.fraction, because more unused non-managed memory results in rarer GCs.)
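The difference between leaving released segments to the GC and returning them to a pool, as described in FLINK-3322, can be sketched with a toy pool. This is not Flink's MemoryManager; SegmentPoolSketch and its methods are hypothetical, and plain byte arrays stand in for memory segments.

```java
import java.util.ArrayDeque;

// Hypothetical illustration only, not Flink's MemoryManager.
final class SegmentPoolSketch {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final int segmentSize;
    private int allocations; // counts how many segments were freshly allocated

    SegmentPoolSketch(int segmentSize) {
        this.segmentSize = segmentSize;
    }

    // Reuses a pooled segment if one is available, otherwise allocates a new one.
    byte[] allocate() {
        byte[] segment = free.poll();
        if (segment == null) {
            allocations++;
            segment = new byte[segmentSize];
        }
        return segment;
    }

    // Returns the segment to the pool instead of leaving it to the garbage collector.
    void release(byte[] segment) {
        free.push(segment);
    }

    int allocations() {
        return allocations;
    }

    public static void main(String[] args) {
        SegmentPoolSketch pool = new SegmentPoolSketch(32 * 1024);
        // Simulate supersteps that each request and release the same number of segments.
        for (int superstep = 0; superstep < 10; superstep++) {
            byte[][] held = new byte[4][];
            for (int i = 0; i < held.length; i++) {
                held[i] = pool.allocate();
            }
            for (byte[] segment : held) {
                pool.release(segment);
            }
        }
        System.out.println("fresh allocations: " + pool.allocations());
    }
}
```

With pooling, only the first superstep allocates; without it, every superstep would create new garbage of the same size, which is the pressure the ticket reports.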
[jira] [Assigned] (FLINK-3457) Link to Apache Flink meetups from the 'Community' section of the website
[ https://issues.apache.org/jira/browse/FLINK-3457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Horvath reassigned FLINK-3457:
------------------------------------

    Assignee: Gabor Horvath

> Link to Apache Flink meetups from the 'Community' section of the website
> ------------------------------------------------------------------------
>
>                 Key: FLINK-3457
>                 URL: https://issues.apache.org/jira/browse/FLINK-3457
>             Project: Flink
>          Issue Type: Task
>          Components: Documentation
>            Reporter: Slim Baltagi
>            Assignee: Gabor Horvath
>            Priority: Trivial
>
> Now with the number of Apache Flink meetups increasing worldwide, it is helpful to add a link to Apache Flink meetups http://www.meetup.com/topics/apache-flink/ to the community section of https://flink.apache.org/community.html so visitors can conveniently find them right from the Apache Flink website.
[jira] [Commented] (FLINK-3438) ExternalProcessRunner fails to detect ClassNotFound exception because of locale settings
[ https://issues.apache.org/jira/browse/FLINK-3438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15155701#comment-15155701 ]

Gabor Horvath commented on FLINK-3438:
--------------------------------------

I had similar problems due to the following check:
https://github.com/apache/flink/blob/master/flink-runtime/src/test/java/org/apache/flink/runtime/jobmanager/JobManagerStartupTest.java#L92
The local workaround in my case was to use LANG=en_us when running the test suite.

> ExternalProcessRunner fails to detect ClassNotFound exception because of locale settings
> ----------------------------------------------------------------------------------------
>
>                 Key: FLINK-3438
>                 URL: https://issues.apache.org/jira/browse/FLINK-3438
>             Project: Flink
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Stefano Baghino
>            Priority: Minor
>             Fix For: 1.0.0
>
> ExternalProcessRunner tries to detect a ClassNotFoundException in the run process by comparing its output with a fixed string of text; this means that localized text reporting said exception is not interpreted as such.
> To reproduce:
> * run the `ExternalProcessRunnerTest.testClassNotFound` test with the following system properties set on the JVM: {{-Duser.country=IT -Duser.language=it}}
[jira] [Comment Edited] (FLINK-3422) Scramble HashPartitioner hashes
[ https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152209#comment-15152209 ]

Gabor Horvath edited comment on FLINK-3422 at 2/18/16 12:13 PM:
----------------------------------------------------------------

Hi,

I am a Master's student from Hungary and a newcomer to Flink. I plan to look into this issue during the weekend to get more familiar with the code. In case I get stuck, Márton Balassi and Gábor Gévay will help me. I am also planning to do a GSoC project this summer.

was (Author: xazax):
Hi,

I am a Master's student from Hungary and a newcomer to Flink. I plan to look into this issue during the weekend to get more familiar with the code.

> Scramble HashPartitioner hashes
> -------------------------------
>
>                 Key: FLINK-3422
>                 URL: https://issues.apache.org/jira/browse/FLINK-3422
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.10.2
>            Reporter: Stephan Ewen
>            Assignee: Gabor Horvath
>            Priority: Critical
>             Fix For: 1.0.0
>
> The {{HashPartitioner}} used by the streaming API does not apply any hash scrambling against bad user hash functions.
> We should apply a murmur or jenkins hash on top of the hash code, similar to the {{DataSet}} API.
[jira] [Assigned] (FLINK-3422) Scramble HashPartitioner hashes
[ https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Horvath reassigned FLINK-3422:
------------------------------------

    Assignee: Gabor Horvath

> Scramble HashPartitioner hashes
> -------------------------------
>
>                 Key: FLINK-3422
>                 URL: https://issues.apache.org/jira/browse/FLINK-3422
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.10.2
>            Reporter: Stephan Ewen
>            Assignee: Gabor Horvath
>            Priority: Critical
>             Fix For: 1.0.0
>
> The {{HashPartitioner}} used by the streaming API does not apply any hash scrambling against bad user hash functions.
> We should apply a murmur or jenkins hash on top of the hash code, similar to the {{DataSet}} API.
[jira] [Commented] (FLINK-3422) Scramble HashPartitioner hashes
[ https://issues.apache.org/jira/browse/FLINK-3422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15152209#comment-15152209 ]

Gabor Horvath commented on FLINK-3422:
--------------------------------------

Hi,

I am a Master's student from Hungary and a newcomer to Flink. I plan to look into this issue during the weekend to get more familiar with the code.

> Scramble HashPartitioner hashes
> -------------------------------
>
>                 Key: FLINK-3422
>                 URL: https://issues.apache.org/jira/browse/FLINK-3422
>             Project: Flink
>          Issue Type: Improvement
>          Components: Streaming
>    Affects Versions: 0.10.2
>            Reporter: Stephan Ewen
>            Priority: Critical
>             Fix For: 1.0.0
>
> The {{HashPartitioner}} used by the streaming API does not apply any hash scrambling against bad user hash functions.
> We should apply a murmur or jenkins hash on top of the hash code, similar to the {{DataSet}} API.
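The scrambling requested in FLINK-3422 can be sketched by applying a murmur-style mixing step on top of the user's hashCode before picking a channel. The sketch below uses the well-known 32-bit finalizer of MurmurHash3; it is not the code of Flink's HashPartitioner or of the DataSet API, and the names HashScrambleSketch, murmurScramble and selectChannel are hypothetical.

```java
// Hypothetical illustration only, not Flink's HashPartitioner.
final class HashScrambleSketch {

    // 32-bit finalization mix of MurmurHash3: spreads differences in any
    // input bit over all output bits.
    static int murmurScramble(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Naive channel selection: uses the user's hash code as-is.
    static int selectChannelNaive(Object key, int numChannels) {
        return Math.floorMod(key.hashCode(), numChannels);
    }

    // Scrambled channel selection: mixes the hash code before taking the modulus.
    static int selectChannel(Object key, int numChannels) {
        return Math.floorMod(murmurScramble(key.hashCode()), numChannels);
    }

    public static void main(String[] args) {
        int numChannels = 16;
        // A "bad" hash: every key is a multiple of 16, so the naive modulo
        // sends all of them to the same channel.
        for (int i = 0; i < 4; i++) {
            Integer key = i * 16;
            System.out.println(key + ": naive=" + selectChannelNaive(key, numChannels)
                    + " scrambled=" + selectChannel(key, numChannels));
        }
    }
}
```

Math.floorMod keeps the channel index non-negative even when the scrambled hash is negative, which a plain % would not guarantee.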