Hi, I can’t really comment on why Python Avro is slow, but you could try fastavro.
https://pypi.python.org/pypi/fastavro

-Mika

> On 09 Jan 2015, at 15:32, Han JU <[email protected]> wrote:
>
> Hi,
>
> I'm evaluating Avro to replace our CSV-based datasets, and I've noticed a
> performance problem in the Avro Python bindings.
> Basically, I've tested on a 1.8 GB dataset with 5 columns. With Scala (Avro
> Java bindings), reads and writes are fast (18s, 44s), but in Python the
> same file took nearly one hour to write and 50 minutes to read.
>
> My code is based on the Avro documentation examples, and the schema is
> relatively simple. My questions:
> - Is this performance difference a known issue?
> - Is there something I'm missing (say, a special configuration)?
>
> I've seen the fastavro project, which is much faster at reading but has no
> write support. This would prevent us from using Avro, since we have a lot of
> Python programs that need to persist data.
>
> Thanks!
> --
> JU Han
>
> Data Engineer @ Botify.com
>
> +33 0619608888
