Python avro performance

Han JU Fri, 09 Jan 2015 05:36:09 -0800

Hi,

I'm evaluating Avro to replace our CSV-based datasets, and I've noticed a
performance problem in the Avro Python bindings.
I tested on a 1.8GB dataset with 5 columns. With Scala (using the Avro
Java bindings), reads and writes are fast (18s, 44s), but in Python, for the
same file, it took nearly one hour to write and 50 minutes to read ...


My code is based on the Avro documentation examples, and the schema is
relatively simple. My questions:
  - Is this performance difference a known issue?
  - Is there something I'm missing (say, a special configuration)?

I've seen the fastavro project, which is much faster at reading but has no
write support. This would prevent us from using Avro, since we have a lot of
Python-based programs that need to persist data.

Thanks!
-- 
*JU Han*

Data Engineer @ Botify.com

+33 0619608888
