Thank you, Scott - that did the trick. It seems that I may need to reduce my sync interval as well.

On 01/08/2013 04:14 AM, Scott Carey wrote:
> A sync marker delimits each block in the avro file. If you want to start
> reading data from the middle of a 100GB file, DataFileReader will seek to
> the middle and find the next sync marker. Each block can be individually
> compressed, and by default when writing a file the writer will not
> compress the block and flush to disk until a block has gotten as large as
> the sync interval in bytes. Alternatively, you can manually sync().
>
> If you have a 1000000 byte sync interval, you may not see any data reach
> disk until that many bytes have been written (or sync() is called
> manually).
>
> Your problem is likely that the first block in the file has not been
> flushed to disk yet, and therefore the file is corrupt and missing a
> trailing sync marker.
>
> On 1/3/13 12:36 PM, "Terry Healy" <[email protected]> wrote:
>
>> Hello-
>>
>> I'm upgrading a logging program to append GenericRecords to a .avro file
>> instead of text (.tsv). I have a working schema that is used to convert
>> existing .tsv of the same format into .avro and that works fine.
>>
>> When I run a test writing 30,000 bogus records, it runs but when I try
>> to use "avro-tools-1.7.3.jar tojson" on the output file, it reports:
>>
>> "AvroRuntimeException: java.io.IOException: Invalid sync!"
>>
>> The file is still open at this point since the logging program is
>> running. Is this expected behavior because it is still open? (getmeta
>> and getschema work fine).
>>
>> I'm not sure if it has any bearing, since I never really understood the
>> function of the AVRO sync interval; in this and the working programs
>> it is set to 1000000.
>>
>> Any ideas appreciated.
>>
>> -Terry
>
>
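
For the archives, here is a rough sketch of the buffering behaviour Scott describes. It is a toy model in plain Java, not Avro code: the class BlockBufferSketch, its 4-byte marker, and the in-memory "disk" are all made up for illustration, and it leaves out the per-block record count, size, and compression of a real Avro block. It only shows why nothing is readable until the buffered block reaches the sync interval or sync() is called.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Toy model only: this is NOT Avro's DataFileWriter, just the buffering idea.
class BlockBufferSketch {
    // Hypothetical 4-byte marker, chosen for readability in this sketch.
    static final byte[] SYNC_MARKER = {'S', 'Y', 'N', 'C'};

    private final int syncInterval;                                  // bytes to buffer before a block is written
    private final ByteArrayOutputStream block = new ByteArrayOutputStream(); // current in-memory block
    private final ByteArrayOutputStream disk = new ByteArrayOutputStream();  // stands in for the output file

    BlockBufferSketch(int syncInterval) {
        this.syncInterval = syncInterval;
    }

    // Buffer one record; write the block out only once it is large enough.
    void append(String record) {
        byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
        block.write(bytes, 0, bytes.length);
        if (block.size() >= syncInterval) {
            sync();
        }
    }

    // Force the current block out, followed by a sync marker.
    void sync() {
        if (block.size() == 0) {
            return; // nothing buffered, nothing to write
        }
        byte[] data = block.toByteArray();
        disk.write(data, 0, data.length);
        disk.write(SYNC_MARKER, 0, SYNC_MARKER.length);
        block.reset();
    }

    int bytesOnDisk() {
        return disk.size();
    }

    public static void main(String[] args) {
        BlockBufferSketch writer = new BlockBufferSketch(1000000);
        for (int i = 0; i < 30000; i++) {
            writer.append("record-" + i + "\n");
        }
        System.out.println("on disk before sync(): " + writer.bytesOnDisk());
        writer.sync();
        System.out.println("on disk after sync():  " + writer.bytesOnDisk());
    }
}
```

With 30,000 short records and a 1000000 byte interval, the toy writer never fills a block on its own, which matches what I saw. As far as I understand it, in the real Java API the corresponding knobs are DataFileWriter.setSyncInterval() (called before create()) and DataFileWriter.sync() or flush(); check the Javadoc for your Avro version before relying on that.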
