[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source
[ https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162944#comment-16162944 ]

Sebastian Alfers commented on FLUME-2942:
-----------------------------------------

[~fszabo] Is there any plan for this to get attention in the future?

> AvroEventDeserializer ignores header from spool source
> ------------------------------------------------------
>
>                 Key: FLUME-2942
>                 URL: https://issues.apache.org/jira/browse/FLUME-2942
>             Project: Flume
>          Issue Type: Bug
>    Affects Versions: 1.6.0
>            Reporter: Sebastian Alfers
>             Fix For: 1.8.1
>
>         Attachments: FLUME-2942-0.patch
>
>
> I have a spool file source and use Avro for de-/serialization.
> In detail, serialized events store the topic of the Kafka sink in the header.
> When I load the events from the spool directory, the headers are ignored.
> Please see:
> https://github.com/apache/flume/blob/caa64a1a6d4bc97be5993cb468516e9ffe862794/flume-ng-core/src/main/java/org/apache/flume/serialization/AvroEventDeserializer.java#L122
> As you can see, it uses the whole event as the body and does not distinguish between
> the header and the body encoded by Avro.
> Please verify that this is a bug.
> I fixed this by using the record that stores header and body separately.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source
[ https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926235#comment-15926235 ]

Sebastian Alfers commented on FLUME-2942:
-----------------------------------------

Push
[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source
[ https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826279#comment-15826279 ]

Sebastian Alfers commented on FLUME-2942:
-----------------------------------------

[~lakshman] Can you push the PR?
[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source
[ https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795182#comment-15795182 ]

Sebastian Alfers commented on FLUME-2942:
-----------------------------------------

[~lakshman] It would be the same as above, except for this value:

agent1.sources.spool.deserializer = FLUME
[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source
[ https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794265#comment-15794265 ]

Laxman commented on FLUME-2942:
-------------------------------

Thanks [~sebalf] for the update. Will review and try to verify soon.
[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source
[ https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15792976#comment-15792976 ]

Sebastian Alfers commented on FLUME-2942:
-----------------------------------------

[~lakshman] Thank you for participating. I updated my pull request. What do you think?
[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source
[ https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782068#comment-15782068 ]

Laxman commented on FLUME-2942:
-------------------------------

Answered a related query on the user mailing list. It may be useful for further discussion here.

{noformat}
In one line: FlumeEventAvroEventSerializer and AvroEventDeserializer are not in sync, and they can't be used as a serde pair.

Flume's built-in Avro serializer, FlumeEventAvroEventSerializer, serializes Flume events with a shell. It's important to note that here the actual raw event is wrapped inside the Flume shell object, and this raw object is treated as binary (which can be Thrift, Avro, or just a byte array, etc.).

Flume's built-in Avro deserializer, AvroEventDeserializer, deserializes any generic serialized event and wraps the deserialized event into another Flume shell object.

This means that, as per your configuration, the spool directory source (persistence-dev-source) will get a double-wrapped Flume event (FlumeEvent -> FlumeEvent -> raw event body).

To solve this problem, the serializer and deserializer need to be in sync. We can achieve that with either of the following approaches.

- Use a custom FlumeEventAvroEventDeserializer to extract the FlumeEvent directly, without the double wrapper, and use it with the spool directory source.
  A similar attempt has already been made by Sebastian here:
  https://issues.apache.org/jira/browse/FLUME-2942
  I personally recommend writing a new FlumeEventAvroEventDeserializer rather than modifying the existing one.

- Use a custom AvroEventSerializer to serialize the Avro event directly, and use it with the file_roll sink.
  A reference implementation is available in the HDFS sink (org.apache.flume.sink.hdfs.AvroEventSerializer).
  You may strip the HDFS dependencies from it to achieve what you want.
{noformat}
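The double wrapping described above can be illustrated with a minimal plain-Java sketch. It deliberately uses neither Flume nor Avro: `SketchEvent`, `DoubleWrapSketch`, and the toy `key=value;|body` text encoding are hypothetical stand-ins invented for this illustration, and the `topic` header with the value `HPTStream.raw` is only an example. The point is that a serializer which writes headers and body into one record needs a matching deserializer; a generic one hands the whole record back as the body with empty headers.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a Flume event: headers plus an opaque body. */
final class SketchEvent {
    final Map<String, String> headers;
    final byte[] body;

    SketchEvent(Map<String, String> headers, byte[] body) {
        this.headers = headers;
        this.body = body;
    }
}

final class DoubleWrapSketch {
    // Stand-in for a "shell" serializer: headers AND body go into one record.
    // Toy encoding (an assumption, not Avro): "k1=v1;k2=v2;|body".
    // It breaks if a header contains ';', '=' or '|'; good enough for a sketch.
    static byte[] serializeWithShell(SketchEvent e) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> h : e.headers.entrySet()) {
            sb.append(h.getKey()).append('=').append(h.getValue()).append(';');
        }
        sb.append('|').append(new String(e.body, StandardCharsets.UTF_8));
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Stand-in for a generic deserializer: the whole record becomes the body
    // and the headers stay empty, so the event ends up wrapped twice.
    static SketchEvent deserializeGeneric(byte[] record) {
        return new SketchEvent(Collections.emptyMap(), record);
    }

    // Stand-in for a matching deserializer: splits the record back into
    // headers and body, undoing serializeWithShell.
    static SketchEvent deserializeShell(byte[] record) {
        String text = new String(record, StandardCharsets.UTF_8);
        int sep = text.indexOf('|');
        Map<String, String> headers = new HashMap<>();
        for (String pair : text.substring(0, sep).split(";")) {
            if (pair.isEmpty()) {
                continue; // no headers were written
            }
            int eq = pair.indexOf('=');
            headers.put(pair.substring(0, eq), pair.substring(eq + 1));
        }
        byte[] body = text.substring(sep + 1).getBytes(StandardCharsets.UTF_8);
        return new SketchEvent(headers, body);
    }
}
```

With this sketch, `deserializeGeneric(serializeWithShell(event))` yields an event whose headers are empty and whose body still contains the encoded headers, while `deserializeShell` restores both parts.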
[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source
[ https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766433#comment-15766433 ]

ASF GitHub Bot commented on FLUME-2942:
---------------------------------------

GitHub user sebastian-alfers opened a pull request:

    https://github.com/apache/flume/pull/99

    Read header and footer if available

    This commit fixes an issue where header values cannot be restored correctly after an event was Avro-serialized to disk.

    The problem was that, when deserializing the event from disk, the body of the event contained both the binary and the header.

    See: https://issues.apache.org/jira/browse/FLUME-2942

    Discussion welcome!

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sebastian-alfers/flume FLUME-2942

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/flume/pull/99.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #99

commit 84a1783217428f683a21c6199d83035792f7d718
Author: sa
Date:   2016-12-21T08:09:26Z

    Read header and footer if available
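The idea of reading the headers and body from the record when they are available can be sketched roughly as follows. This is not the code from the pull request: `HeaderAwareSketch`, `Result`, and `toEvent` are hypothetical names, and a plain `Map<String, Object>` stands in for the decoded Avro record so the example runs without Flume or Avro on the classpath. The field names `headers` and `body` are assumed to match the record written by the `avro_event` serializer.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

final class HeaderAwareSketch {
    /** Hypothetical result holder standing in for a Flume event. */
    static final class Result {
        final Map<String, String> headers;
        final byte[] body;

        Result(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    /**
     * If the decoded record carries "headers" and "body" fields, use them as
     * the event's headers and body. Otherwise fall back to the generic
     * behaviour: empty headers and the raw record bytes as the body.
     */
    static Result toEvent(Map<String, Object> record, byte[] rawRecordBytes) {
        Object headers = record.get("headers");
        Object body = record.get("body");
        if (headers instanceof Map && body instanceof byte[]) {
            Map<String, String> copy = new HashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) headers).entrySet()) {
                // String.valueOf tolerates non-String keys and values,
                // for example a decoder's own string wrapper type.
                copy.put(String.valueOf(e.getKey()), String.valueOf(e.getValue()));
            }
            return new Result(copy, (byte[]) body);
        }
        return new Result(Collections.emptyMap(), rawRecordBytes);
    }
}
```

The fallback branch keeps files that were not written as header/body records readable in the same way as before, which is one reason to check for the fields instead of assuming them.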
[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source
[ https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764311#comment-15764311 ]

Sebastian Alfers commented on FLUME-2942:
-----------------------------------------

Added a patch.
[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source
[ https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15763715#comment-15763715 ]

Sebastian Alfers commented on FLUME-2942:
-----------------------------------------

[~mpercy] Any updates on this?
[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source
[ https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372573#comment-15372573 ]

Sebastian Alfers commented on FLUME-2942:
-----------------------------------------

Hi [~mpercy], thanks for your reply. This is our config:

# AGENT SETTINGS
agent1.channels = ch1
agent1.sources = thriftSrc spool
agent1.sinks = kafka fileroll
agent1.sinkgroups = g1

# MEMORY CHANNEL
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1
agent1.channels.ch1.transactionCapacity = 500

# THRIFT (source)
agent1.sources.thriftSrc.type = thrift
agent1.sources.thriftSrc.channels = ch1
agent1.sources.thriftSrc.bind = 0.0.0.0
agent1.sources.thriftSrc.port = 4042

# SPOOLDIR (source)
agent1.sources.spool.type = spooldir
agent1.sources.spool.channels = ch1
agent1.sources.spool.spoolDir = /opt/flume-ng/failover/spool
agent1.sources.spool.fileHeader = true
agent1.sources.spool.deserializer = AVRO

agent1.sources.thriftSrc.threads = 150

agent1.sinks.kafka.channel = ch1
agent1.sinks.kafka.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafka.batchSize = 50
agent1.sinks.kafka.brokerList = plista590.plista.com:9092,plista591.plista.com:9092
#agent1.sinks.kafka.topic = HPTStream.raw

# FILE ROLL (failover sink)
agent1.sinks.fileroll.type = file_roll
agent1.sinks.fileroll.channel = ch1
agent1.sinks.fileroll.sink.directory = /opt/flume-ng/failover/data
agent1.sinks.fileroll.sink.serializer = avro_event

# FAILOVER GROUP
agent1.sinkgroups.g1.sinks = kafka fileroll
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.kafka = 10
agent1.sinkgroups.g1.processor.priority.fileroll = 5
agent1.sinkgroups.g1.processor.maxpenalty = 1

Please look at the agent1.sources.spool.deserializer setting. It refers to the deserializer referenced above. Here, we use our own FQCN to apply the fix.
[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source
[ https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368868#comment-15368868 ]

Mike Percy commented on FLUME-2942:
-----------------------------------

Hi [~sebalf], please provide more detail such as the relevant Flume configuration.