[jira] [Commented] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord
[ https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443069#comment-16443069 ]

ASF GitHub Bot commented on FLINK-6022:
---

Github user shashank734 commented on the issue:

    https://github.com/apache/flink/pull/4943

    I think this is due to this issue: https://issues.apache.org/jira/browse/FLINK-9202

> Don't serialise Schema when serialising Avro GenericRecord
> --
>
> Key: FLINK-6022
> URL: https://issues.apache.org/jira/browse/FLINK-6022
> Project: Flink
> Issue Type: Improvement
> Components: Type Serialization System
> Reporter: Robert Metzger
> Assignee: Stephan Ewen
> Priority: Major
> Fix For: 1.5.0
>
>
> Currently, Flink is serializing the schema for each Avro GenericRecord in the
> stream.
> This leads to a lot of overhead over the wire/disk + high serialization costs.
> Therefore, I'm proposing to improve the support for GenericRecord in Flink by
> shipping the schema to each serializer through the AvroTypeInformation.
> Then, we can only support GenericRecords with the same type per stream, but
> the performance will be much better.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
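The proposal in the issue description above (hand the schema to the serializer once instead of writing it with every record) can be illustrated with a minimal sketch. This is not Flink's or Avro's implementation: the schema representation, `encode_with_schema`, and `FixedSchemaSerializer` are hypothetical names made up for this example, and only the Python standard library is used.

```python
import json
import struct

# Hypothetical schema: an ordered list of (field name, struct format) pairs.
SCHEMA = [("user_id", "q"), ("score", "d")]


def encode_with_schema(record, schema):
    """Embed the schema in every record (the costly behaviour)."""
    header = json.dumps(schema).encode("utf-8")
    body = b"".join(struct.pack(">" + fmt, record[name]) for name, fmt in schema)
    # Length-prefix the schema so a reader could split header from body.
    return struct.pack(">I", len(header)) + header + body


class FixedSchemaSerializer:
    """Holds the schema once; serialized records carry values only."""

    def __init__(self, schema):
        self.schema = schema
        self._fmt = ">" + "".join(fmt for _, fmt in schema)

    def serialize(self, record):
        # Every record in the stream must match the serializer's schema.
        if set(record) != {name for name, _ in self.schema}:
            raise ValueError("record does not match the stream's schema")
        return struct.pack(self._fmt, *(record[name] for name, _ in self.schema))

    def deserialize(self, data):
        values = struct.unpack(self._fmt, data)
        return {name: value for (name, _), value in zip(self.schema, values)}


record = {"user_id": 42, "score": 0.5}
serializer = FixedSchemaSerializer(SCHEMA)
with_schema = encode_with_schema(record, SCHEMA)
without_schema = serializer.serialize(record)
# The per-record payload shrinks because the schema is no longer repeated.
print(len(with_schema), len(without_schema))
```

The trade-off matches the one stated in the description: the serializer rejects any record whose fields differ from its schema, so a stream can carry only one record type.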
[jira] [Commented] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord
[ https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432341#comment-16432341 ]

ASF GitHub Bot commented on FLINK-6022:
---

Github user shashank734 commented on the issue:

    https://github.com/apache/flink/pull/4943

    @zentol Thanks. This may be the wrong place to ask, but we have tried to use AvroTypeInfo and the job was unable to restore from the savepoint (note that we had changed the schema and the class by adding one extra variable). That is why I was asking for a very minimal example or a hint, so that I can check whether I am doing something wrong. I am using Scala.
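For the situation described in this comment (data written with the old schema, read back with a schema that has one extra field), one general Avro rule worth checking is that a field added to the reader schema needs a default value, otherwise old data cannot be resolved. The sketch below only illustrates that rule. It is not Flink's savepoint restore logic and does not use the Avro API: the schema representation, `NO_DEFAULT`, and `resolve` are hypothetical names made up for this example.

```python
# Hypothetical schemas: each field is (name, default); NO_DEFAULT marks "no default".
NO_DEFAULT = object()

WRITER_SCHEMA = [("user_id", NO_DEFAULT), ("score", NO_DEFAULT)]
READER_SCHEMA = [
    ("user_id", NO_DEFAULT),
    ("score", NO_DEFAULT),
    ("country", "unknown"),  # the extra field, added with a default
]


def resolve(values, writer_schema, reader_schema):
    """Map values written with writer_schema onto reader_schema."""
    written = {name: value for (name, _), value in zip(writer_schema, values)}
    result = {}
    for name, default in reader_schema:
        if name in written:
            result[name] = written[name]
        elif default is not NO_DEFAULT:
            # Field is new in the reader schema: fall back to its default.
            result[name] = default
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return result


old_state = [42, 0.5]  # values written before the schema change
restored = resolve(old_state, WRITER_SCHEMA, READER_SCHEMA)
print(restored)
```

If the extra field were declared without a default, `resolve` would raise instead of restoring, which is the kind of incompatibility to rule out first when a restore fails after a schema change.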
[jira] [Commented] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord
[ https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432133#comment-16432133 ]

ASF GitHub Bot commented on FLINK-6022:
---

Github user zentol commented on the issue:

    https://github.com/apache/flink/pull/4943

    @shashank734 The commits are contained in 1.4 already. Have you read [this](https://github.com/apache/flink/pull/4943#issuecomment-342156083) comment?
[jira] [Commented] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord
[ https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432112#comment-16432112 ]

ASF GitHub Bot commented on FLINK-6022:
---

Github user shashank734 commented on the issue:

    https://github.com/apache/flink/pull/4943

    @StephanEwen Are these changes part of 1.5 or 1.4? Do you have an example of how I can use this with state and CEP? Please give me a hint. I have only seen test cases for input and output. State evolution is the main issue for us at the moment.
[jira] [Commented] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord
[ https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246015#comment-16246015 ]

ASF GitHub Bot commented on FLINK-6022:
---

Github user StephanEwen closed the pull request at:

    https://github.com/apache/flink/pull/4943
[jira] [Commented] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord
[ https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245640#comment-16245640 ]

Robert Metzger commented on FLINK-6022:
---

This whole JIRA is only about the case when people are using a DataStream<GenericRecord>. I agree it's not a good idea, but people are doing it.
As stated in the JIRA description, the schema of all records in a stream needs to be the same for this to work.

I believe we should keep this JIRA open, because the problem has not been addressed.
[jira] [Commented] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord
[ https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245577#comment-16245577 ]

Stephan Ewen commented on FLINK-6022:
---

We are not serializing the schema in the Avro Serializer. If the Avro Serializer is chosen, this is fixed.

I am wondering whether this is the case where one explicitly uses a "generic record" from Avro as the exchange data type. In my opinion, that is not a good idea in the first place. In that case, isn't it possible that each generic record is different, so that you always need a schema anyway?
[jira] [Commented] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord
[ https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243822#comment-16243822 ]

Aljoscha Krettek commented on FLINK-6022:
---

I changed the title and will move to 1.5. Is it even possible to tell Avro to not serialise the schema?

> Don't serialise Schema when serialising Avro GenericRecord
> --
>
> Key: FLINK-6022
> URL: https://issues.apache.org/jira/browse/FLINK-6022
> Project: Flink
> Issue Type: Improvement
> Components: Type Serialization System
> Reporter: Robert Metzger
> Assignee: Stephan Ewen
> Priority: Blocker
> Fix For: 1.4.0
>
>
> Currently, Flink is serializing the schema for each Avro GenericRecord in the
> stream.
> This leads to a lot of overhead over the wire/disk + high serialization costs.
> Therefore, I'm proposing to improve the support for GenericRecord in Flink by
> shipping the schema to each serializer through the AvroTypeInformation.
> Then, we can only support GenericRecords with the same type per stream, but
> the performance will be much better.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)