You can use a URL (on HDFS/HTTP) that points to the schema: https://github.com/apache/flume/blob/trunk/flume-ng-sinks/flume-hdfs-sink/src/main/java/org/apache/flume/sink/hdfs/AvroEventSerializer.java#L70
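
That serializer reads the schema from the flume.avro.schema.url (or flume.avro.schema.literal) event header. As a minimal sketch of the sink side (the agent/component names a1/r1/k1 and the schema location are placeholders, not from this thread), a static interceptor can stamp the header onto every event:

  # a1/r1/k1 and the schema path are placeholders
  a1.sources.r1.interceptors = i1
  a1.sources.r1.interceptors.i1.type = static
  a1.sources.r1.interceptors.i1.key = flume.avro.schema.url
  a1.sources.r1.interceptors.i1.value = hdfs://namenode:8020/schemas/event.avsc

  a1.sinks.k1.type = hdfs
  a1.sinks.k1.hdfs.fileType = DataStream
  a1.sinks.k1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder

(DataStream keeps the HDFS sink from wrapping the output in a SequenceFile, so the serializer controls the file format.)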
Use that URL to store your schema for the event, so you don't have to add it to the event itself. The Avro schema is only embedded in the files, not in the event data, so we need to make sure we write to the correct file based on the event's own schema. avro_event works because we write the events out with a fixed schema (not the event's own schema).

Thanks,
Hari

On Tue, Mar 8, 2016 at 1:05 PM, Justin Ryan <[email protected]> wrote:

> Hiya folks, still struggling with this; is anyone on the list familiar
> with AvroEventSerializer$Builder?
>
> While I have gotten past my outright failure, I’ve only done so by
> adopting a fairly inflexible schema, which seems counter to the goal of
> using Avro. Particularly frustrating is that Flume simply needs to pass
> the existing message along, though I understand it likely needs to grok it
> to separate messages. I can’t even find Kafka consumer code which is
> capable of being schema-aware.
>
> From: Justin Ryan <[email protected]>
> Reply-To: <[email protected]>
> Date: Thursday, March 3, 2016 at 2:08 PM
> To: <[email protected]>
> Subject: Re: Avro source: could not find schema for event
>
> Update:
>
> So, I changed my serializer from
> org.apache.flume.sink.hdfs.AvroEventSerializer$Builder to avro_event, and
> this started working. Well, working-ish: the data is a little funky, but
> it’s arriving, being delivered to HDFS, and I can pull a file and examine
> it manually.
>
> I seem to remember that I chose the former based on some things I read
> about not having to specify a schema, since the schema is embedded in the
> Avro data.
>
> So I’m confused: it seems that my previous configuration should have
> worked without any special attention to the schema, but I got complaints
> that the schema couldn’t be found.
>
> If anyone could shed a bit of light here, it would be much appreciated.
>
> From: Justin Ryan <[email protected]>
> Reply-To: <[email protected]>
> Date: Monday, February 29, 2016 at 2:52 PM
> To: "[email protected]" <[email protected]>
> Subject: Avro source: could not find schema for event
>
> Hiya,
>
> I’ve got a fairly simple Flume agent pulling events from Kafka and
> landing them in HDFS. For plain-text messages, this works fine.
>
> I created a topic specifically for the purpose of testing sending Avro
> messages through Kafka to land in HDFS, which I’m having some trouble
> with.
>
> I noted from
> https://thisdataguy.com/2014/07/28/avro-end-to-end-in-hdfs-part-2-flume-setup/
> the example of Flume’s default Avro schema [0], which will do for my
> testing, and set up my python-avro producer to send messages with this
> schema. Unfortunately, I still have Flume looping this message in its
> log:
>
> org.apache.flume.FlumeException: Could not find schema for event
>
> I’m running out of assumptions to rethink / verify here, and would
> appreciate any guidance on what I may be missing.
>
> Thanks in advance,
>
> Justin
>
> [0] {
>   "type": "record",
>   "name": "Event",
>   "fields": [{
>     "name": "headers",
>     "type": {
>       "type": "map",
>       "values": "string"
>     }
>   }, {
>     "name": "body",
>     "type": "bytes"
>   }]
> }
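
For reference, a producer along the lines Justin describes could look roughly like the sketch below. This is an illustration only, not code from the thread: it assumes the kafka-python and avro packages, and the broker address, topic name, and schema file (event.avsc, matching schema [0]) are placeholders. The one real constraint is that AvroEventSerializer appends each event body to the Avro container file as an already-encoded datum, so the producer should send bare Avro binary datums, not Avro container files.

  # Hypothetical producer sketch; broker, topic, and schema path are placeholders.
  import io
  import avro.schema
  from avro.io import DatumWriter, BinaryEncoder
  from kafka import KafkaProducer

  # The same schema the flume.avro.schema.url header points at.
  schema = avro.schema.parse(open("event.avsc").read())
  writer = DatumWriter(schema)

  def encode(record):
      # Bare binary datum: no container-file header, since the HDFS sink's
      # serializer appends the body as an already-encoded record.
      buf = io.BytesIO()
      writer.write(record, BinaryEncoder(buf))
      return buf.getvalue()

  producer = KafkaProducer(bootstrap_servers="localhost:9092")
  producer.send("avro-test", encode({"headers": {"origin": "test"}, "body": b"hello"}))
  producer.flush()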
