[jira] [Commented] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16443069#comment-16443069
 ] 

ASF GitHub Bot commented on FLINK-6022:
---

Github user shashank734 commented on the issue:

https://github.com/apache/flink/pull/4943
  
I think this is caused by the following issue:
https://issues.apache.org/jira/browse/FLINK-9202


> Don't serialise Schema when serialising Avro GenericRecord
> --
>
> Key: FLINK-6022
> URL: https://issues.apache.org/jira/browse/FLINK-6022
> Project: Flink
> Issue Type: Improvement
> Components: Type Serialization System
> Reporter: Robert Metzger
> Assignee: Stephan Ewen
> Priority: Major
> Fix For: 1.5.0
>
>
> Currently, Flink serializes the schema with each Avro GenericRecord in the 
> stream.
> This adds a lot of overhead on the wire and on disk, and makes serialization 
> expensive.
> Therefore, I'm proposing to improve the support for GenericRecord in Flink by 
> shipping the schema to each serializer through the AvroTypeInformation.
> The trade-off is that all GenericRecords in a stream must then share the same 
> schema, but the performance will be much better.
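
To make the overhead described above concrete, here is a minimal, library-free sketch. It is not Avro's wire format and not Flink's serializer: the class `SchemaOverheadSketch`, the schema text, and the two-field payload are all assumptions made up for illustration. It only compares the number of bytes produced when the schema text travels with every record against the number produced when the schema is shared once and records carry the payload alone.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only: neither Avro's binary encoding nor Flink's serializer.
final class SchemaOverheadSketch {

    // Stand-in for an Avro schema in JSON form (invented for this example).
    static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"Event\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"count\",\"type\":\"int\"}]}";

    // Encodes only the field values: 8 bytes for the long plus 4 bytes for the int.
    static byte[] encodePayload(long id, int count) {
        return ByteBuffer.allocate(Long.BYTES + Integer.BYTES)
            .putLong(id)
            .putInt(count)
            .array();
    }

    // Every record carries a length-prefixed copy of the schema text plus its payload.
    static int bytesWithSchemaPerRecord(int records) {
        byte[] schema = SCHEMA_JSON.getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < records; i++) {
            byte[] lengthPrefix = ByteBuffer.allocate(Integer.BYTES).putInt(schema.length).array();
            out.write(lengthPrefix, 0, lengthPrefix.length);
            out.write(schema, 0, schema.length);
            byte[] payload = encodePayload(i, i);
            out.write(payload, 0, payload.length);
        }
        return out.size();
    }

    // The schema is assumed to be known to the reader already, so records carry payload only.
    static int bytesWithSharedSchema(int records) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < records; i++) {
            byte[] payload = encodePayload(i, i);
            out.write(payload, 0, payload.length);
        }
        return out.size();
    }

    public static void main(String[] args) {
        int records = 1000;
        System.out.println("schema per record: " + bytesWithSchemaPerRecord(records) + " bytes");
        System.out.println("schema shared:     " + bytesWithSharedSchema(records) + " bytes");
    }
}
```

In this sketch the per-record cost of the schema copy is constant, so the gap grows linearly with the number of records. The shared-schema variant only works if every record in the stream has the same schema, which is the restriction named in the description.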



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord

2018-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432341#comment-16432341
 ] 

ASF GitHub Bot commented on FLINK-6022:
---

Github user shashank734 commented on the issue:

https://github.com/apache/flink/pull/4943
  
@zentol Thanks. This may be the wrong place to ask, but we have tried to 
use AvroTypeInfo, and the job was unable to restore from the savepoint 
(note that we changed the schema and the class by adding one extra field). 
That is why I was asking for a minimal example or a hint, so that I can check 
whether I am doing something wrong. I am using Scala.
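
For readers following the schema-change question, the sketch below illustrates the general idea behind Avro-style schema resolution when a field is added: stored values are matched to the new schema by field name, and a field the writer did not know falls back to the reader's default. This is a library-free illustration under stated assumptions, not Avro's or Flink's implementation. The class `SchemaResolutionSketch`, the field names, and the default values are hypothetical, and nothing in this thread confirms whether a given Flink version applies such resolution when restoring a savepoint.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of name-based schema resolution with defaults.
final class SchemaResolutionSketch {

    // A reader-schema field: its name and the default used if the writer lacked it.
    static final class Field {
        final String name;
        final Object defaultValue;

        Field(String name, Object defaultValue) {
            this.name = name;
            this.defaultValue = defaultValue;
        }
    }

    // Values were stored positionally in writer-schema order; map them onto the
    // reader schema by field name and fill missing fields from the defaults.
    static Map<String, Object> resolve(
            List<String> writerFields, List<Object> storedValues, List<Field> readerFields) {
        Map<String, Object> byName = new LinkedHashMap<>();
        for (int i = 0; i < writerFields.size(); i++) {
            byName.put(writerFields.get(i), storedValues.get(i));
        }
        Map<String, Object> result = new LinkedHashMap<>();
        for (Field field : readerFields) {
            Object value = byName.containsKey(field.name)
                ? byName.get(field.name)
                : field.defaultValue;
            result.put(field.name, value);
        }
        return result;
    }

    // Old schema has two fields; the new schema adds "label" with a default.
    static Map<String, Object> example() {
        List<String> oldSchema = Arrays.asList("id", "count");
        List<Object> stored = Arrays.<Object>asList(42L, 7);
        List<Field> newSchema = Arrays.asList(
            new Field("id", 0L),
            new Field("count", 0),
            new Field("label", "n/a"));
        return resolve(oldSchema, stored, newSchema);
    }

    public static void main(String[] args) {
        System.out.println(example());
    }
}
```

The point of the sketch is that the reader needs both the schema the data was written with and the schema it now expects. If only the new schema is available at restore time, the old bytes cannot be interpreted correctly.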




[jira] [Commented] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord

2018-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432133#comment-16432133
 ] 

ASF GitHub Bot commented on FLINK-6022:
---

Github user zentol commented on the issue:

https://github.com/apache/flink/pull/4943
  
@shashank734 The commits are contained in 1.4 already. Have you read 
[this](https://github.com/apache/flink/pull/4943#issuecomment-342156083) 
comment?




[jira] [Commented] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord

2018-04-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16432112#comment-16432112
 ] 

ASF GitHub Bot commented on FLINK-6022:
---

Github user shashank734 commented on the issue:

https://github.com/apache/flink/pull/4943
  
@StephanEwen Are these changes part of 1.5 or 1.4? Do you have an example 
of how I can use this with state and CEP? Please give me a hint. I have only 
seen test cases for input and output. State evolution is our main issue at the 
moment.




[jira] [Commented] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord

2017-11-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16246015#comment-16246015
 ] 

ASF GitHub Bot commented on FLINK-6022:
---

Github user StephanEwen closed the pull request at:

https://github.com/apache/flink/pull/4943




[jira] [Commented] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord

2017-11-09 Thread Robert Metzger (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245640#comment-16245640
 ] 

Robert Metzger commented on FLINK-6022:
---

This whole JIRA is only about the case where people are using a 
DataStream<GenericRecord>. 
I agree it's not a good idea, but people are doing it. As stated in the JIRA 
description, the schema of all records in a stream needs to be the same for this 
to work.
I believe we should keep this JIRA open, because the problem has not been 
addressed.



[jira] [Commented] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord

2017-11-09 Thread Stephan Ewen (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16245577#comment-16245577
 ] 

Stephan Ewen commented on FLINK-6022:
-

We are not serializing the schema in the Avro Serializer. If the Avro 
Serializer is chosen, this is fixed.

I am wondering whether this is about the case where one explicitly uses a 
"generic record" from Avro as the exchange data type. In my opinion, that is 
not a good idea in the first place. In that case, isn't it possible that each 
generic record has a different schema, so that you always need the schema anyway?



[jira] [Commented] (FLINK-6022) Don't serialise Schema when serialising Avro GenericRecord

2017-11-08 Thread Aljoscha Krettek (JIRA)

[ 
https://issues.apache.org/jira/browse/FLINK-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16243822#comment-16243822
 ] 

Aljoscha Krettek commented on FLINK-6022:
-

I changed the title and will move this to 1.5. Is it even possible to tell Avro 
not to serialise the schema?

> Don't serialise Schema when serialising Avro GenericRecord
> --
>
> Key: FLINK-6022
> URL: https://issues.apache.org/jira/browse/FLINK-6022
> Project: Flink
> Issue Type: Improvement
> Components: Type Serialization System
> Reporter: Robert Metzger
> Assignee: Stephan Ewen
> Priority: Blocker
> Fix For: 1.4.0