Hi,

Thanks. I've tried this project (fastavro), and its read performance approaches that of Java/Scala. However, it seems to support only reading. We have many use cases where Python programs need to persist datasets.

2015-01-09 14:39 GMT+01:00 Mika Ristimaki <[email protected]>:

> Hi,
>
> I can’t really comment on why Python Avro is slow, but you could try fastavro.
>
> https://pypi.python.org/pypi/fastavro
>
> -Mika
>
> On 09 Jan 2015, at 15:32, Han JU <[email protected]> wrote:
>
> Hi,
>
> I'm evaluating Avro to replace our CSV-based datasets, and I noticed a
> performance problem in the Avro Python bindings.
> Basically, I've tested on a 1.8 GB dataset with 5 columns. With Scala (Avro
> Java bindings), reads and writes are fast (18 s, 44 s), but in Python, for the
> same file, it took nearly one hour to write and 50 minutes to read ...
>
> My code is based on the Avro documentation examples, and the schema is
> relatively simple. My questions:
> - Is this performance difference a known issue?
> - Is there something I'm missing (say, a special configuration)?
>
> I've seen the fastavro project, which is much faster at reading but has no
> write support. This will prevent us from using Avro, since we have a lot of
> Python-based programs that need to persist data.
>
> Thanks!
> --
> *JU Han*
>
> Data Engineer @ Botify.com
>
> +33 0619608888

--
*JU Han*

Data Engineer @ Botify.com

+33 0619608888
