Update: So, I changed my serializer from org.apache.flume.sink.hdfs.AvroEventSerializer$Builder to avro_event, and this started working. Well, working-ish: the data is a little funky, but it's arriving, being delivered to HDFS, and I can pull a file and examine it manually.
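For reference, the sink section now looks roughly like the sketch below. The agent, sink, and channel names and the HDFS path are placeholders, not my real config; only the serializer line reflects the actual change:

    # Hypothetical names (agent1/k1/c1) and path; the serializer line is the real change.
    agent1.sinks.k1.type = hdfs
    agent1.sinks.k1.channel = c1
    agent1.sinks.k1.hdfs.path = /flume/avro-test
    agent1.sinks.k1.hdfs.fileType = DataStream
    # Previously: agent1.sinks.k1.serializer = org.apache.flume.sink.hdfs.AvroEventSerializer$Builder
    agent1.sinks.k1.serializer = avro_event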
I seem to remember that I had the former based on some things I read about not having to specify a schema, since the schema is embedded in the avro data. So I'm confused: it seems that my previous configuration should have worked without any special attention to the schema, but I got complaints that the schema couldn't be found. If anyone could shed a bit of light here, it would be much appreciated.

From: Justin Ryan <[email protected]>
Reply-To: <[email protected]>
Date: Monday, February 29, 2016 at 2:52 PM
To: "[email protected]" <[email protected]>
Subject: Avro source: could not find schema for event

Hiya,

I've got a fairly simple flume agent pulling events from kafka and landing them in HDFS. For plain text messages, this works fine.

I created a topic specifically for the purpose of testing sending avro messages through kafka to land in HDFS, which I'm having some trouble with.

I noted from https://thisdataguy.com/2014/07/28/avro-end-to-end-in-hdfs-part-2-flume-setup/ the example of flume's default avro schema[0], which will do for my testing, and set up my python-avro producer to send messages with this schema (a rough sketch of that producer follows below).

Unfortunately, I still have flume looping this message in its log:

    org.apache.flume.FlumeException: Could not find schema for event

I'm running out of assumptions to rethink / verify here, and would appreciate any guidance on what I may be missing.

Thanks in advance,

Justin

[0] {
      "type": "record",
      "name": "Event",
      "fields": [
        { "name": "headers", "type": { "type": "map", "values": "string" } },
        { "name": "body", "type": "bytes" }
      ]
    }
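For completeness, a minimal sketch of the producer side, assuming the python avro and kafka-python packages; the broker address and topic name are made up, and whether the schema should be embedded (avro's DataFileWriter container format) rather than binary-encoded per record is exactly the part I'm unsure about:

    # Minimal sketch, assuming the 'avro' and 'kafka-python' packages.
    # Broker address and topic name are placeholders.
    import io
    import avro.io
    import avro.schema
    from kafka import KafkaProducer

    # Flume's default Event schema from [0] above.
    SCHEMA = avro.schema.parse('''
    {
      "type": "record",
      "name": "Event",
      "fields": [
        {"name": "headers", "type": {"type": "map", "values": "string"}},
        {"name": "body", "type": "bytes"}
      ]
    }
    ''')

    def encode_event(headers, body):
        # Binary-encode one Event record; note this does NOT embed the
        # schema, unlike avro's DataFileWriter (container format), which does.
        buf = io.BytesIO()
        avro.io.DatumWriter(SCHEMA).write(
            {"headers": headers, "body": body}, avro.io.BinaryEncoder(buf))
        return buf.getvalue()

    producer = KafkaProducer(bootstrap_servers="localhost:9092")
    producer.send("avro-test", encode_event({"source": "test"}, b"hello world"))
    producer.flush()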
