Re: Appending to .avro log files

Terry Healy Wed, 09 Jan 2013 08:22:44 -0800

Thank you Scott - that did the trick. It seems that I may need to reduce
my sync value as well.



On 01/08/2013 04:14 AM, Scott Carey wrote:
> A sync marker delimits each block in the avro file.  If you want to start
> reading data from the middle of a 100GB file, DataFileReader will seek to
> the middle and find the next sync marker.  Each block can be individually
> compressed, and by default when writing a file the writer will not
> compress the block and flush to disk until a block has gotten as large as
> the sync interval in bytes.  Alternatively, you can manually sync().
> 
> If you have a 1000000 byte sync interval, you may not see any data reach
> disk until that many bytes have been written (or sync() is called
> manually).
> 
> Your problem is likely that the first block in the file has not been
> flushed to disk yet, so the file on disk is incomplete and missing a
> trailing sync marker.
> 
> On 1/3/13 12:36 PM, "Terry Healy" <[email protected]> wrote:
> 
>> Hello-
>>
>> I'm upgrading a logging program to append GenericRecords to a .avro file
>> instead of text (.tsv). I have a working schema that is used to convert
>> existing .tsv of the same format into .avro and that works fine.
>>
>> When I run a test writing 30,000 bogus records, it runs but when I try
>> to use "avro-tools-1.7.3.jar tojson" on the output file, it reports:
>>
>> "AvroRuntimeException: java.io.IOException: Invalid sync!"
>>
>> The file is still open at this point since the logging program is
>> running. Is this expected behavior because it is still open? (getmeta
>> and getschema work fine).
>>
>> I'm not sure if it has any bearing, since I never really understood the
>> function of the Avro sync interval; in this and the working programs
>> it is set to 1000000.
>>
>> Any ideas appreciated.
>>
>> -Terry
> 
>
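
The block buffering Scott describes can be illustrated with a small toy model. This is a sketch only and is not Avro's DataFileWriter: the class name BlockBufferedWriter, the 4-byte marker, and the in-memory "disk" are all made up for illustration. Real Avro uses a 16-byte sync marker and also writes a record count and block size per block, which the model leaves out. It only shows why nothing reaches the output until the buffered block hits the sync interval or sync() is called by hand.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

/** Toy model of a block-buffered writer. NOT Avro's DataFileWriter. */
public class BlockBufferedWriter {
    // Hypothetical 4-byte marker for illustration; Avro's real marker is 16 bytes.
    static final byte[] SYNC_MARKER = {'S', 'Y', 'N', 'C'};

    private final OutputStream out;      // stands in for the file on disk
    private final int syncInterval;      // block size threshold, in bytes
    private final ByteArrayOutputStream block = new ByteArrayOutputStream();

    public BlockBufferedWriter(OutputStream out, int syncInterval) {
        this.out = out;
        this.syncInterval = syncInterval;
    }

    /** Buffers a record in memory; writes the block only once it is large enough. */
    public void append(byte[] record) throws IOException {
        block.write(record);
        if (block.size() >= syncInterval) {
            sync();
        }
    }

    /** Ends the current block: writes it, then a sync marker, then flushes. */
    public void sync() throws IOException {
        if (block.size() == 0) {
            return; // nothing buffered, so no block to end
        }
        block.writeTo(out);
        out.write(SYNC_MARKER);
        out.flush();
        block.reset();
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream disk = new ByteArrayOutputStream();
        BlockBufferedWriter writer = new BlockBufferedWriter(disk, 1000);
        byte[] record = "0123456789".getBytes(StandardCharsets.US_ASCII);

        // 30 records of 10 bytes each stay below the 1000-byte interval.
        for (int i = 0; i < 30; i++) {
            writer.append(record);
        }
        System.out.println("bytes on disk before sync(): " + disk.size());

        // A manual sync() ends the block and writes the trailing marker.
        writer.sync();
        System.out.println("bytes on disk after sync():  " + disk.size());
    }
}
```

In the real Avro Java API the corresponding knobs are DataFileWriter.setSyncInterval(int), called before the file is created, and sync() or flush() on the open writer. A logging program that needs the file to be readable while it is still open would presumably call one of these at a sensible interval rather than rely on the block filling up.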
