Nicolas,

For my use case I have the schema, so I used the Maven plugin to generate the Avro classes from it. For your use case, where you do not have the schema and only have a .avro file, this looks like an easy way to recover it:

    java -jar ~/avro-tools-1.7.7.jar getschema twitter.avro > twitter.avsc

Found it here - https://github.com/miguno/avro-cli-examples

Another option, since you are not using Java, on Mac:
1. ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)" < /dev/null 2> /dev/null
2. brew install avro-tools
3. avro-tools getschema file.avro

Typing avro-tools by itself prints the usage, which should give you further details.
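If you would rather do it programmatically (which also gets you the records, not just the schema), here is a minimal Java sketch of the same idea. The class name, the hard-coded file name and the printing are placeholders for illustration only; I believe the C++ API has an analogous avro::DataFileReader, so the pattern should translate.

    import java.io.File;
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;

    public class DumpAvroFile {
        public static void main(String[] args) throws Exception {
            File avroFile = new File("twitter.avro"); // placeholder path
            GenericDatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>();
            DataFileReader<GenericRecord> fileReader =
                new DataFileReader<GenericRecord>(avroFile, datumReader);
            try {
                // The writer schema is stored in the container file header, so no .avsc is needed.
                Schema schema = fileReader.getSchema();
                System.out.println(schema.toString(true)); // same output as "avro-tools getschema"

                // The record fields are the "keys" you are after.
                for (Schema.Field field : schema.getFields()) {
                    System.out.println(field.name());
                }

                // And each record carries the values.
                while (fileReader.hasNext()) {
                    GenericRecord record = fileReader.next();
                    System.out.println(record);
                }
            } finally {
                fileReader.close();
            }
        }
    }

Running it prints the embedded schema first and then every record in the file.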
Hope this helps.

regards
Sunita

On Thu, Jul 28, 2016 at 6:29 AM, Nicolas Ranc <[email protected]> wrote:

> Dear Sunita,
>
> Thank you for your fast answer.
>
> It's not exactly what I expected. I am using Apache Avro C++ and I would
> like to deserialize an .avro file. With just that one .avro file and the
> Avro C++ functions, I am trying to extract the schema, the keys in the
> schema, and the data at the end of the file ("avroProgram2.avro").
>
> I saw different functions: for example, in
> http://avro.apache.org/docs/1.7.6/api/cpp/html/ the file resolving.cc uses
> load(...) to import a schema and then uses it in the avro::resolvingDecoder.
> In my case I cannot import such a schema for deserialization: I only have
> the .avro file, and I am looking for functions that extract the information
> from that file (the schema, the keys and the values). I need this
> information because I then have to create new objects in Matlab, using Mex
> functions (in C++).
>
> Thank you for your time,
> Nicolas Ranc
>
> 2016-07-27 19:10 GMT+02:00 Sunita Arvind <[email protected]>:
>
>> For the benefit of anyone else hitting the same issue, here is what I
>> found:
>>
>> The serializer I was using extends AbstractAvroEventSerializer. That
>> class has a lot of adoption, so the issue is not likely to be in the
>> abstract class itself. However, I got rid of the problem by overriding the
>> configure method of AbstractAvroEventSerializer in my custom serializer,
>> as below:
>>
>> public void configure(Context context) {
>>     int syncIntervalBytes =
>>         context.getInteger("syncIntervalBytes", Integer.valueOf(2048000)).intValue();
>>     String compressionCodec = context.getString("compressionCodec", "null");
>>     this.writer = new ReflectDatumWriter(this.getSchema());
>>     this.dataFileWriter = new DataFileWriter(this.writer);
>>     this.dataFileWriter.setSyncInterval(syncIntervalBytes);
>>     try {
>>         CodecFactory e = CodecFactory.fromString(compressionCodec);
>>         this.dataFileWriter.setCodec(e);
>>         this.dataFileWriter.create(schema, out); // --> added the create() call
>>     } catch (AvroRuntimeException var5) {
>>         logger.warn("Unable to instantiate avro codec with name ("
>>             + compressionCodec + "). Compression disabled. Exception follows.", var5);
>>     } catch (IOException io) {
>>         logger.warn("Could not open dataFileWriter. Exception follows.",
>>             io.getStackTrace());
>>     }
>> }
>>
>> After this, the files are getting created in hdfs just right. I was also
>> able to view the files in Spark using the spark-avro package. Hope this is
>> the right way to do it and that the solution helps someone. I would love
>> to hear if anyone in the Avro or Flume community knows of a better way to
>> do it.
>>
>> regards
>> Sunita
>>
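A side note on why the added create() call matters: DataFileWriter.append() asserts that the writer has already been opened with create() (or appendTo()), otherwise it throws the "not open" AvroRuntimeException that shows up in the stack trace further down this thread. Below is a minimal standalone sketch, outside Flume, with a toy schema made up purely for illustration:

    import java.io.ByteArrayOutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileWriter;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;

    public class DataFileWriterLifecycle {
        public static void main(String[] args) throws Exception {
            // Toy schema, for illustration only.
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Line\",\"fields\":[{\"name\":\"msg\",\"type\":\"string\"}]}");

            GenericRecord record = new GenericData.Record(schema);
            record.put("msg", "hello");

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DataFileWriter<GenericRecord> writer =
                new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
            // Without this create() call, the append() below fails with
            // org.apache.avro.AvroRuntimeException: not open
            writer.create(schema, out);
            writer.append(record);
            writer.close();
            System.out.println("wrote " + out.size() + " bytes in Avro container format");
        }
    }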
>> On Tue, Jul 26, 2016 at 12:45 PM, Sunita Arvind <[email protected]> wrote:
>>
>>> Hello Experts,
>>>
>>> I am trying to convert a custom data source received in flume into avro
>>> and push it to hdfs. What I am attempting to do is:
>>> syslog -> flume -> flume interceptor that converts the event into
>>> avroObject.toByteArray -> hdfs serializer that decodes the byte array
>>> back into Avro.
>>>
>>> The flume configuration looks like:
>>>
>>> tier1.sources.syslogsource.interceptors.i2.type=timestamp
>>> tier1.sources.syslogsource.interceptors.i2.preserveExisting=true
>>> tier1.sources.syslogsource.interceptors.i1.dataSourceType=DataSource1
>>> tier1.sources.syslogsource.interceptors.i1.type = com.flume.CustomToAvroConvertInterceptor$Builder
>>>
>>> #hdfs sink for archival and batch analysis
>>> tier1.sinks.hdfssink.type = hdfs
>>> tier1.sinks.hdfssink.hdfs.writeFormat = Text
>>> tier1.sinks.hdfssink.hdfs.fileType = DataStream
>>> tier1.sinks.hdfssink.hdfs.filePrefix=%{flumeHost}-%{host}%{customerId}-%Y%m%d-%H
>>> tier1.sinks.hdfssink.hdfs.inUsePrefix=_
>>> tier1.sinks.hdfssink.hdfs.path=/hive/rawavro/customer_id=%{customerId}/date=%Y%m%d/hr=%H
>>> tier1.sinks.hdfssink.hdfs.fileSuffix=.avro
>>> # roll file if it's been 10 * 60 seconds = 600
>>> tier1.sinks.hdfssink.hdfs.rollInterval=600
>>> # roll file if we get 50,000 log lines (~25MB)
>>> tier1.sinks.hdfssink.hdfs.rollCount=0
>>> tier1.sinks.hdfssink.hdfs.batchSize = 100
>>> tier1.sinks.hdfssink.hdfs.rollSize=0
>>> tier1.sinks.hdfssink.serializer=com.flume.RawAvroHiveSerializer$Builder
>>> tier1.sinks.hdfssink.serializer.compressionCodec=snappy
>>> tier1.sinks.hdfssink.channel = hdfsmem
>>>
>>> When I use tier1.sinks.hdfssink.serializer=avro_event, the binary data
>>> stored in hdfs is CustomToAvroConvertInterceptor.intercept(event.getBody()).toByteArray(),
>>> but this data cannot be parsed in hive, so I see nulls in all the column
>>> values.
>>> Based on
>>> https://cwiki.apache.org/confluence/display/AVRO/FAQ#FAQ-HowcanIserializedirectlyto/fromabytearray
>>> all RawAvroHiveSerializer.convert does is decode using a binary decoder
>>> (a round-trip sketch is appended at the end of this thread).
>>> The exception I get seems to be unrelated to the code itself, hence I am
>>> pasting the stack trace. I will share the code if it is required to
>>> identify the root cause:
>>>
>>> 2016-07-26 19:15:27,187 ERROR org.apache.flume.SinkRunner: Unable to deliver event. Exception follows.
>>> org.apache.flume.EventDeliveryException: org.apache.avro.AvroRuntimeException: not open
>>>     at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:463)
>>>     at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
>>>     at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:147)
>>>     at java.lang.Thread.run(Thread.java:745)
>>> Caused by: org.apache.avro.AvroRuntimeException: not open
>>>     at org.apache.avro.file.DataFileWriter.assertOpen(DataFileWriter.java:82)
>>>     at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:299)
>>>     at org.apache.flume.serialization.AbstractAvroEventSerializer.write(AbstractAvroEventSerializer.java:108)
>>>     at org.apache.flume.sink.hdfs.HDFSDataStream.append(HDFSDataStream.java:124)
>>>     at org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:550)
>>>     at org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:547)
>>>     at org.apache.flume.sink.hdfs.BucketWriter$9$1.run(BucketWriter.java:679)
>>>     at org.apache.flume.auth.SimpleAuthenticator.execute(SimpleAuthenticator.java:50)
>>>
>>> I can reproduce this on the local file system as well.
>>> In the test case, I tried setting the file open to append=true and I
>>> still encounter the same exception.
>>>
>>> I appreciate any guidance in this regard.
>>>
>>> regards
>>> Sunita
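For readers following the byte-array approach from the FAQ linked above, here is a minimal, self-contained sketch of the round trip the interceptor/serializer pair performs (record -> byte[] on the interceptor side, byte[] -> record on the serializer side). The Event schema and field below are made up for illustration and are not the actual CustomToAvroConvertInterceptor schema; in the real pipeline both sides must agree on the schema.

    import java.io.ByteArrayOutputStream;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.Decoder;
    import org.apache.avro.io.DecoderFactory;
    import org.apache.avro.io.EncoderFactory;

    public class ByteArrayRoundTrip {
        public static void main(String[] args) throws Exception {
            // Toy schema, for illustration only.
            Schema schema = new Schema.Parser().parse(
                "{\"type\":\"record\",\"name\":\"Event\",\"fields\":[{\"name\":\"body\",\"type\":\"string\"}]}");

            // Interceptor side: record -> byte[] (what would go into the Flume event body).
            GenericRecord record = new GenericData.Record(schema);
            record.put("body", "raw syslog line");
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(bos, null);
            new GenericDatumWriter<GenericRecord>(schema).write(record, encoder);
            encoder.flush();
            byte[] payload = bos.toByteArray();

            // Serializer side: byte[] -> record, using the same schema.
            Decoder decoder = DecoderFactory.get().binaryDecoder(payload, null);
            GenericRecord decoded = new GenericDatumReader<GenericRecord>(schema).read(null, decoder);
            System.out.println(decoded);
        }
    }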
