Has anyone profiled the Python code or otherwise looked at the performance?
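
For anyone who wants to try, here is a minimal sketch of how the slow read or write loop could be profiled with the standard-library cProfile module. The helper profile_call and the stand-in workload encode_records are hypothetical names used only for illustration; they are not part of Avro. The assumption is that the real Avro loop (for example, the one built on DataFileWriter or DataFileReader from the documentation examples) would take the place of encode_records.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, report_text).

    The report lists the `top` entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


def encode_records(n):
    # Hypothetical stand-in for the per-record work of a write loop.
    # Replace this with the actual Avro read or write code under test.
    out = []
    for i in range(n):
        out.append(("%d,%s" % (i, "x" * 5)).encode("utf-8"))
    return len(out)


if __name__ == "__main__":
    count, report = profile_call(encode_records, 100000)
    print(report)
```

Sorting by cumulative time should show whether the time goes into per-record encoding, schema validation, or I/O, which would help narrow down where the Python bindings lose time.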

 - Bruce


> On Jan 9, 2015, at 8:56 PM, Han JU <[email protected]> wrote:
> 
> Hi, 
> 
> Thanks. I've tried this project, and its performance approaches Java/Scala. 
> But it seems to have read support only. We do have lots of use cases 
> where Python programs need to persist datasets. 
> 
> 2015-01-09 14:39 GMT+01:00 Mika Ristimaki <[email protected]>:
>> Hi,
>> 
>> I can’t really comment on why Python Avro is slow, but you could try fastavro.
>> 
>> https://pypi.python.org/pypi/fastavro
>> 
>> -Mika
>> 
>>> On 09 Jan 2015, at 15:32, Han JU <[email protected]> wrote:
>>> 
>>> Hi,
>>> 
>>> I'm evaluating Avro to replace our CSV-based datasets, and I've noticed a 
>>> performance problem in the Avro Python bindings.
>>> Basically, I've tested on a 1.8GB dataset with 5 columns. With Scala (Avro 
>>> Java bindings), reads and writes are fast (18s, 44s), but in Python, for the 
>>> same file, it took nearly one hour to write and 50 minutes to read ...
>>> 
>>> My code is based on the avro documentation examples, and the schema is 
>>> relatively simple. My question: 
>>>   - Is this performance difference a known issue? 
>>>   - Is there something I'm missing (say, a special configuration)?
>>> 
>>> I've seen the fastavro project, which is much faster at reading but has no 
>>> write support. This will prevent us from using Avro, since we have a lot of 
>>> Python-based programs that need to persist data.
>>> 
>>> Thanks!
>>> -- 
>>> JU Han
>>> 
>>> Data Engineer @ Botify.com
>>> 
>>> +33 0619608888
> 
> 
> 
> -- 
> JU Han
> 
> Data Engineer @ Botify.com
> 
> +33 0619608888
