Python version 2. I have an avro binary file. I'm not sure how to go from the "bad" version to one with redacted names, since I can't read it in python to begin with...
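(For the redaction step itself, one possible sketch, assuming the schema has first been dumped with avro-tools getschema into a file called schema.avsc; the file name and the redact helper below are only illustrative, not something from the library:)

    import json

    # Sketch: load the schema produced by
    #   java -jar avro-tools-1.7.7.jar getschema testdata.avro > schema.avsc
    # and overwrite identifying strings before sharing it.
    with open('schema.avsc') as f:
        schema = json.load(f)

    counter = [0]

    def redact(node):
        # Walk the schema tree and replace names, namespaces and docs.
        if isinstance(node, dict):
            for key in ('name', 'namespace', 'doc'):
                if key in node:
                    counter[0] += 1
                    node[key] = 'redacted_%d' % counter[0]
            for value in node.values():
                redact(value)
        elif isinstance(node, list):
            for item in node:
                redact(item)

    redact(schema)
    print json.dumps(schema, indent=2)

If the schema refers to named types by their full name elsewhere, those reference strings would need the same renaming, so a real script would keep a name-to-alias mapping rather than a plain counter.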
On Tue, Oct 27, 2015 at 2:56 PM, Sam Groth <[email protected]> wrote:
> Are you using version 2 or 3 of python avro? For a redacted schema, just
> give the schema with all field names and namespaces changed. If the schema
> is really long and complicated, you could just give the part that you
> suspect is causing issues.
>
> Sam
>
>
> On Tuesday, October 27, 2015 1:42 PM, web user <[email protected]> wrote:
>
> No. I don't think the problem is that. The same code has worked with
> reading many, many files. This particular file hit a corner case where one
> of the data structures has no records in it, and it is causing a lot of
> grief to the python avro routine. It's been generated from C++ avro
> routines...
>
> Regards,
>
> WU
>
> On Tue, Oct 27, 2015 at 2:38 PM, Sam Groth <[email protected]> wrote:
>
> I think you may be missing a "return" when you create your DataFileReader.
> I have always been able to read data in python using the standard methods,
> so I don't think there is a problem with the implementation. That said, the
> python implementation is significantly slower than Java or C.
>
> Sam
>
>
> On Tuesday, October 27, 2015 1:23 PM, web user <[email protected]> wrote:
>
> Unfortunately the company I work at has a strict policy about sharing
> data. Having said that, I don't think the file is corrupted.
>
> I ran the following command:
>
>     java -jar avro-tools-1.7.7.jar tojson testdata.avro
>
> and it generates a file of 1 byte.
>
> I also ran "java -jar avro-tools-1.7.7.jar getschema testdata.avro" and it
> gets back the correct schema.
>
> Is there any way, when using the python library, for it not to consume
> all the memory on the entire box?
>
> Regards,
>
> WU
>
>
> On Tue, Oct 27, 2015 at 2:08 PM, Sean Busbey <[email protected]> wrote:
>
> It sounds like the file you are reading is malformed. Could you share
> the file or how it was written?
> > On Tue, Oct 27, 2015 at 1:01 PM, web user <[email protected]> wrote:
> > I ran this in a vm with much less memory and it immediately failed with a
> > memory error:
> >
> > Traceback (most recent call last):
> >   File "testavro.py", line 31, in <module>
> >     for r in reader:
> >   File "/usr/local/lib/python2.7/dist-packages/avro/datafile.py", line 362, in next
> >     datum = self.datum_reader.read(self.datum_decoder)
> >   File "/usr/local/lib/python2.7/dist-packages/avro/io.py", line 445, in read
> >     return self.read_data(self.writers_schema, self.readers_schema, decoder)
> >   File "/usr/local/lib/python2.7/dist-packages/avro/io.py", line 490, in read_data
> >     return self.read_record(writers_schema, readers_schema, decoder)
> >   File "/usr/local/lib/python2.7/dist-packages/avro/io.py", line 690, in read_record
> >     field_val = self.read_data(field.type, readers_field.type, decoder)
> >   File "/usr/local/lib/python2.7/dist-packages/avro/io.py", line 484, in read_data
> >     return self.read_array(writers_schema, readers_schema, decoder)
> >   File "/usr/local/lib/python2.7/dist-packages/avro/io.py", line 582, in read_array
> >     for i in range(block_count):
> > MemoryError
> >
> >
> > On Tue, Oct 27, 2015 at 1:36 PM, web user <[email protected]> wrote:
> >>
> >> Hi,
> >>
> >> I'm doing the following:
> >>
> >>     from avro.datafile import DataFileReader
> >>     from avro.datafile import DataFileWriter
> >>     from avro.io import DatumReader
> >>     from avro.io import DatumWriter
> >>
> >>     def OpenAvroFileToRead(avro_filename):
> >>         DataFileReader(open(avro_filename, 'r'), DatumReader())
> >>
> >>     with OpenAvroFileToRead(avro_filename) as reader:
> >>         for r in reader:
> >>             ....
> >>
> >> I have an avro file which is only 500 bytes. I think there is a data
> >> structure in there which is null or empty.
> >>
> >> I put in print statements before and after "for r in reader". On that
> >> instruction it consumes about 400 gigs of memory before I have to kill
> >> the process.
> >>
> >> That is 400 gigs! I have 1 TB on my server. I have tried this with
> >> 1.6.1, 1.7.1, and 1.7.7 and get the same behavior on all three versions.
> >>
> >> Any ideas on what is causing this?
> >>
> >> Regards,
> >>
> >> WU
>
> --
> Sean
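For reference, a minimal corrected sketch of the helper from the quoted message, with the "return" that Sam pointed out added and the container file opened in binary mode ('rb'), since it is not a text file. This alone will not avoid the MemoryError if the array block count stored in the file really is bogus:

    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    def OpenAvroFileToRead(avro_filename):
        # Return the reader (the original helper dropped the return).
        return DataFileReader(open(avro_filename, 'rb'), DatumReader())

    reader = OpenAvroFileToRead('testdata.avro')
    try:
        for r in reader:
            print r
    finally:
        reader.close()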

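As a way to poke at the file from python without decoding any records (the runaway allocation happens while decoding a datum, in read_array's block_count loop), the container header can be read on its own. A sketch, assuming the get_meta accessor that the 1.7.x python DataFileReader exposes:

    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    # Only the container header is read here; no record is decoded, so the
    # huge array block count in the first datum is never reached.
    reader = DataFileReader(open('testdata.avro', 'rb'), DatumReader())
    print reader.get_meta('avro.schema')  # writer schema as JSON text
    print reader.get_meta('avro.codec')   # 'null', 'deflate', etc.
    reader.close()

If this prints the expected schema, it would at least confirm that the header is intact and that the problem is limited to the encoded record data, which is consistent with getschema working while tojson produces an essentially empty file.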