[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source

2017-09-12 Thread Sebastian Alfers (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16162944#comment-16162944
 ] 

Sebastian Alfers commented on FLUME-2942:
-

[~fszabo] Is there any plan for this to get attention in the future?

> AvroEventDeserializer ignores header from spool source
> --
>
> Key: FLUME-2942
> URL: https://issues.apache.org/jira/browse/FLUME-2942
> Project: Flume
> Issue Type: Bug
> Affects Versions: 1.6.0
> Reporter: Sebastian Alfers
> Fix For: 1.8.1
>
> Attachments: FLUME-2942-0.patch
>
>
> I have a spool file source and use Avro for de-/serialization.
> In detail, serialized events store the topic of the Kafka sink in the header.
> When I load the events from the spool directory, the headers are ignored.
> Please see:
> https://github.com/apache/flume/blob/caa64a1a6d4bc97be5993cb468516e9ffe862794/flume-ng-core/src/main/java/org/apache/flume/serialization/AvroEventDeserializer.java#L122
> As you can see, it uses the whole event as the body and does not distinguish between
> the header and body encoded by Avro.
> Please verify that this is a bug.
> I fixed this bug by using the record that stores header and body separately.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source

2017-03-15 Thread Sebastian Alfers (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15926235#comment-15926235
 ] 

Sebastian Alfers commented on FLUME-2942:
-

Push



[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source

2017-01-17 Thread Sebastian Alfers (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826279#comment-15826279
 ] 

Sebastian Alfers commented on FLUME-2942:
-

[~lakshman] can you push the PR?



[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source

2017-01-03 Thread Sebastian Alfers (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15795182#comment-15795182
 ] 

Sebastian Alfers commented on FLUME-2942:
-

[~lakshman] It would be the same as above, but with this value:

agent1.sources.spool.deserializer = FLUME



[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source

2017-01-02 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15794265#comment-15794265
 ] 

Laxman commented on FLUME-2942:
---

Thanks [~sebalf] for the update. Will review and try to verify soon.



[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source

2017-01-02 Thread Sebastian Alfers (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15792976#comment-15792976
 ] 

Sebastian Alfers commented on FLUME-2942:
-

[~lakshman] Thank you for participating. I updated my pull request. What do you
think?



[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source

2016-12-27 Thread Laxman (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15782068#comment-15782068
 ] 

Laxman commented on FLUME-2942:
---

I answered a related query on the user mailing list. It may be useful for
further discussion here.

{noformat}
In one line: FlumeEventAvroEventSerializer and AvroEventDeserializer are not
in sync, and they can't be used as a serde pair.

Flume's built-in Avro serializer, FlumeEventAvroEventSerializer, serializes
Flume events together with their shell. It's important to note that the actual
raw event is wrapped inside the Flume shell object, and this raw object is
treated as binary (which can be Thrift, Avro, or just a byte array, etc.).
Flume's built-in Avro deserializer, AvroEventDeserializer, deserializes any
generic serialized event and wraps the deserialized event in another Flume
shell object.

This means that, with your configuration, the spool directory source
(persistence-dev-source) will get a double-wrapped Flume event (FlumeEvent ->
FlumeEvent -> raw event body).

To solve this problem, the serializer and deserializer need to be in sync.
We can achieve that with either of the following approaches.
- Use a custom FlumeEventAvroEventDeserializer to extract the FlumeEvent
directly, without the double wrapper, and use it with the spool directory source.

A similar attempt has already been made by Sebastian here.
https://issues.apache.org/jira/browse/FLUME-2942

I personally recommend writing a FlumeEventAvroEventDeserializer rather than
modifying the existing one.

- Use a custom AvroEventSerializer to serialize the Avro event directly and use
it with the file_roll sink.
A reference implementation is available in the HDFS sink
(org.apache.flume.sink.hdfs.AvroEventSerializer).
You may strip the HDFS dependencies from it to achieve what you want.
{noformat}
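
The double wrapping described above can be illustrated with a small, self-contained sketch. It does not use Flume or Avro: ToyEvent, the length-prefixed encoding, and all method names are assumptions made up for illustration. It only shows why a generic deserializer hands back an event with empty headers while a deserializer that matches the serializer restores them.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

final class DoubleWrapDemo {

    // Simplified stand-in for a Flume event: headers plus an opaque body.
    // This is NOT org.apache.flume.Event; it only models the wrapping problem.
    static final class ToyEvent {
        final Map<String, String> headers;
        final byte[] body;

        ToyEvent(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    // Plays the role of the sink-side serializer: writes headers AND body
    // into one record (toy length-prefixed encoding, not Avro).
    static byte[] serialize(ToyEvent event) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(event.headers.size());
            for (Map.Entry<String, String> h : event.headers.entrySet()) {
                out.writeUTF(h.getKey());
                out.writeUTF(h.getValue());
            }
            out.writeInt(event.body.length);
            out.write(event.body);
        }
        return bytes.toByteArray();
    }

    // Plays the role of a generic deserializer: the whole record becomes the
    // body of a new event, so the original headers stay buried inside it.
    static ToyEvent deserializeGeneric(byte[] record) {
        return new ToyEvent(new HashMap<>(), record);
    }

    // Plays the role of a matched deserializer: it knows the record layout
    // and restores headers and body separately.
    static ToyEvent deserializeMatched(byte[] record) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(record))) {
            int headerCount = in.readInt();
            Map<String, String> headers = new HashMap<>();
            for (int i = 0; i < headerCount; i++) {
                String key = in.readUTF();
                String value = in.readUTF();
                headers.put(key, value);
            }
            byte[] body = new byte[in.readInt()];
            in.readFully(body);
            return new ToyEvent(headers, body);
        }
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> headers = new HashMap<>();
        headers.put("topic", "HPTStream.raw");
        ToyEvent original = new ToyEvent(headers, "payload".getBytes(StandardCharsets.UTF_8));

        byte[] onDisk = serialize(original);
        System.out.println("generic headers: " + deserializeGeneric(onDisk).headers);
        System.out.println("matched headers: " + deserializeMatched(onDisk).headers);
    }
}
```

In this model the generic path loses the topic header because it never looks inside the record, which mirrors the FlumeEvent -> FlumeEvent -> raw event body nesting described in the comment.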



[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source

2016-12-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15766433#comment-15766433
 ] 

ASF GitHub Bot commented on FLUME-2942:
---

GitHub user sebastian-alfers opened a pull request:

https://github.com/apache/flume/pull/99

Read header and footer if available

This commit fixes an issue where header values cannot be restored
correctly after an event was Avro-serialized to disk.

The problem was that, when deserializing the event from disk, the body of the
event contained both the binary payload and the headers.

See: https://issues.apache.org/jira/browse/FLUME-2942

Discussion welcome!

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sebastian-alfers/flume FLUME-2942

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/flume/pull/99.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #99


commit 84a1783217428f683a21c6199d83035792f7d718
Author: sa 
Date:   2016-12-21T08:09:26Z

Read header and footer if available
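
As a rough illustration of the idea in this pull request (not the actual patch), the sketch below reads headers and body separately when the decoded record has them, and otherwise falls back to using the whole record as the body. A plain Map stands in for a decoded Avro record; the field names "headers" and "body", and the class and method names, are assumptions for illustration only.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

final class HeaderAwareUnwrap {

    // What a source would emit: restored headers plus the event body.
    static final class Unwrapped {
        final Map<String, String> headers;
        final byte[] body;

        Unwrapped(Map<String, String> headers, byte[] body) {
            this.headers = headers;
            this.body = body;
        }
    }

    // 'record' stands in for a decoded record (field name -> value);
    // 'rawBytes' is the same record in its encoded form.
    static Unwrapped unwrap(Map<String, Object> record, byte[] rawBytes) {
        Object headers = record.get("headers"); // assumed field name
        Object body = record.get("body");       // assumed field name

        if (headers instanceof Map && body instanceof ByteBuffer) {
            // The record looks like a wrapped event: restore both parts.
            Map<String, String> restored = new HashMap<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) headers).entrySet()) {
                restored.put(String.valueOf(e.getKey()), String.valueOf(e.getValue()));
            }
            ByteBuffer buf = ((ByteBuffer) body).duplicate(); // keep caller's position intact
            byte[] bytes = new byte[buf.remaining()];
            buf.get(bytes);
            return new Unwrapped(restored, bytes);
        }

        // Fallback: not a wrapped event, so keep the old behavior and use
        // the whole encoded record as the body with no headers.
        return new Unwrapped(new HashMap<>(), rawBytes);
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put("topic", "HPTStream.raw");
        Map<String, Object> record = new HashMap<>();
        record.put("headers", headers);
        record.put("body", ByteBuffer.wrap(new byte[] {1, 2, 3}));

        Unwrapped event = unwrap(record, new byte[0]);
        System.out.println(event.headers + " body length " + event.body.length);
    }
}
```

The fallback branch keeps records that were not written by the event serializer working as before, which is one way to read "if available" in the commit title.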






[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source

2016-12-20 Thread Sebastian Alfers (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15764311#comment-15764311
 ] 

Sebastian Alfers commented on FLUME-2942:
-

Added a patch 



[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source

2016-12-20 Thread Sebastian Alfers (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15763715#comment-15763715
 ] 

Sebastian Alfers commented on FLUME-2942:
-

[~mpercy] any updates on this?



[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source

2016-07-12 Thread Sebastian Alfers (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372573#comment-15372573
 ] 

Sebastian Alfers commented on FLUME-2942:
-

Hi [~mpercy], thanks for your reply.

This is our config:

# AGENT SETTINGS
agent1.channels = ch1
agent1.sources = thriftSrc spool
agent1.sinks = kafka fileroll
agent1.sinkgroups = g1

# MEMORY CHANNEL
agent1.channels.ch1.type = memory
agent1.channels.ch1.capacity = 1
agent1.channels.ch1.transactionCapacity = 500

# THRIFT (source)
agent1.sources.thriftSrc.type = thrift
agent1.sources.thriftSrc.channels = ch1
agent1.sources.thriftSrc.bind = 0.0.0.0
agent1.sources.thriftSrc.port = 4042

# SPOOLDIR (source)
agent1.sources.spool.type = spooldir
agent1.sources.spool.channels = ch1
agent1.sources.spool.spoolDir = /opt/flume-ng/failover/spool
agent1.sources.spool.fileHeader = true

agent1.sources.spool.deserializer = AVRO

agent1.sources.thriftSrc.threads = 150

agent1.sinks.kafka.channel = ch1 
agent1.sinks.kafka.type = org.apache.flume.sink.kafka.KafkaSink
agent1.sinks.kafka.batchSize = 50
agent1.sinks.kafka.brokerList = plista590.plista.com:9092,plista591.plista.com:9092
#agent1.sinks.kafka.topic = HPTStream.raw


# FILE ROLL (failover sink)
agent1.sinks.fileroll.type = file_roll
agent1.sinks.fileroll.channel = ch1
agent1.sinks.fileroll.sink.directory = /opt/flume-ng/failover/data
agent1.sinks.fileroll.sink.serializer = avro_event

# FAILOVER GROUP
agent1.sinkgroups.g1.sinks = kafka fileroll
agent1.sinkgroups.g1.processor.type = failover
agent1.sinkgroups.g1.processor.priority.kafka = 10
agent1.sinkgroups.g1.processor.priority.fileroll = 5
agent1.sinkgroups.g1.processor.maxpenalty = 1

Please look at the agent1.sources.spool.deserializer setting. It selects the
deserializer class linked in the issue description.

In our deployment, we set it to the FQCN of our own class to apply the fix.



[jira] [Commented] (FLUME-2942) AvroEventDeserializer ignores header from spool source

2016-07-08 Thread Mike Percy (JIRA)

[ 
https://issues.apache.org/jira/browse/FLUME-2942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15368868#comment-15368868
 ] 

Mike Percy commented on FLUME-2942:
---

Hi [~sebalf], please provide more detail such as the relevant Flume 
configuration.
