Hi,

I can’t really comment on why Python Avro is slow, but you could try fastavro.

https://pypi.python.org/pypi/fastavro
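
If you want to compare the libraries on a small sample before committing to one, a simple timing harness is enough. Below is a minimal sketch that uses only the standard library and takes your current CSV format as the baseline. The helper names (time_call, write_csv, read_csv), the column names, and the sample size are all made up for illustration and have nothing to do with your real dataset.

```python
import csv
import io
import time


def time_call(fn, *args):
    """Run fn(*args) once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def write_csv(rows, header):
    """Serialize rows to an in-memory CSV document and return it as a string."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


def read_csv(text):
    """Parse the CSV document back into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


# Synthetic 5-column sample; the size and column names are arbitrary.
header = ["id", "a", "b", "c", "d"]
rows = [(i, i * 2, i * 3, "x%d" % i, "y%d" % i) for i in range(10000)]

text, write_s = time_call(write_csv, rows, header)
records, read_s = time_call(read_csv, text)
print("write: %.3fs, read: %.3fs, rows: %d" % (write_s, read_s, len(records)))
```

To compare like with like, you would pass functions that wrap the avro package's DataFileWriter/DataFileReader, or fastavro.reader, to time_call with the same rows. In-memory timings on a small sample will not predict the 1.8GB numbers exactly, but they should show whether the gap is in the same order of magnitude.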

-Mika

> On 09 Jan 2015, at 15:32, Han JU <[email protected]> wrote:
> 
> Hi,
> 
> I'm evaluating Avro to replace our CSV-based datasets and I've noticed a 
> performance problem in the Avro Python bindings.
> Basically, I've tested on a 1.8GB dataset with 5 columns. With Scala (Avro 
> Java bindings), reads and writes are fast (18s, 44s), but in Python, for the 
> same file, it took nearly one hour to write and 50 minutes to read ...
> 
> My code is based on the Avro documentation examples, and the schema is 
> relatively simple. My question: 
>   - Is this performance difference a known issue? 
>   - Is there something I'm missing (say, a special configuration)?
> 
> I've seen the fastavro project, which is much faster at reading but has no 
> write support. This would prevent us from using Avro, since we have a lot of 
> Python-based programs that need to persist data.
> 
> Thanks!
> -- 
> JU Han
> 
> Data Engineer @ Botify.com
> 
> +33 0619608888
