Has anyone profiled the Python code or otherwise looked at the performance?
- Bruce

Sent from my iPhone

> On Jan 9, 2015, at 8:56 PM, Han JU <[email protected]> wrote:
>
> Hi,
>
> Thanks. I've tried this project and its performance approaches Java/Scala.
> But it seems that it has only read support. We do have lots of use cases
> where Python programs need to persist datasets.
>
> 2015-01-09 14:39 GMT+01:00 Mika Ristimaki <[email protected]>:
>> Hi,
>>
>> I can’t really comment on why Python Avro is slow, but you could try fastavro.
>>
>> https://pypi.python.org/pypi/fastavro
>>
>> -Mika
>>
>>> On 09 Jan 2015, at 15:32, Han JU <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I'm evaluating Avro to replace our CSV-based datasets and I've noticed a
>>> performance problem in the Avro Python bindings.
>>> Basically I've tested on a 1.8GB dataset with 5 columns. With Scala (Avro
>>> Java bindings), reads and writes are fast (18s, 44s), but in Python, for the
>>> same file, it took nearly one hour to write and 50 minutes to read ...
>>>
>>> My code is based on the Avro documentation examples, and the schema is
>>> relatively simple. My questions:
>>> - Is this performance difference a known issue?
>>> - Is there something I'm missing (say a special configuration or something)?
>>>
>>> I've seen the fastavro project and it is much faster at reading, but it has
>>> no write support. This will prevent us from using Avro since we have a lot of
>>> Python-based programs that need to persist data.
>>>
>>> Thanks!
>>> --
>>> JU Han
>>>
>>> Data Engineer @ Botify.com
>>>
>>> +33 0619608888
>
>
>
> --
> JU Han
>
> Data Engineer @ Botify.com
>
> +33 0619608888
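
On the profiling question, here is a minimal sketch of how one might look for the hot spots using only the standard library's cProfile and pstats. Nothing here comes from the thread or from the avro package: `encode_long`, `write_records`, and `profile_call` are hypothetical stand-ins. `encode_long` is a pure-Python zig-zag varint encoder (the encoding Avro uses for long values), chosen only so the example has per-field work to measure. In practice you would pass your own read or write function to `profile_call` instead.

```python
import cProfile
import io
import pstats


def encode_long(n: int) -> bytes:
    """Zig-zag varint encoding of a signed 64-bit integer (hypothetical stand-in)."""
    n = (n << 1) ^ (n >> 63)  # zig-zag: map signed values onto unsigned ones
    out = bytearray()
    while n & ~0x7F:  # more than 7 bits remain
        out.append((n & 0x7F) | 0x80)  # low 7 bits plus continuation flag
        n >>= 7
    out.append(n)
    return bytes(out)


def write_records(records):
    """Encode every field of every record into one in-memory buffer."""
    buf = io.BytesIO()
    for rec in records:
        for value in rec:
            buf.write(encode_long(value))
    return buf.getvalue()


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, text report).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args)
    profiler.disable()
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(top)
    return result, stream.getvalue()


if __name__ == "__main__":
    # Five integer columns, loosely mirroring the 5-column dataset in the thread.
    rows = [(i, i * 2, i * 3, i * 4, i * 5) for i in range(20000)]
    _, report = profile_call(write_records, rows)
    print(report)
```

If the report shows most of the cumulative time in per-field encode calls rather than in file I/O, that would point at interpreter overhead per value, but that is something to confirm by profiling the real code rather than assume from this sketch.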
