On 05/06/2011 10:34 AM, Miki Tebeka wrote:
> I'm using the avro python package (1.5.0), and it is slow.
> It takes about 1min to process 33K records file. For comparison the
> Java packages process the same file in 1sec.
>
> Any ideas on how to speed that up?
Does the schema have unions? Last I checked, the Python implementation recursively validates data in order to determine which branch of a union should be written. In the worst case (nested unions) this can lead to quadratic serialization times. It should be possible to determine the union branch to write much more efficiently.

It would be great to have some performance benchmarks for Python, as we do for Java.

Doug
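To make the quadratic cost of validate-to-pick-a-branch concrete, here is a toy model in plain Python. It is a hypothetical sketch: it does not use or reproduce the avro package's code, and the schema representation and the names `validate`, `shallow_match`, `write`, `make_chain`, and `run` are illustrative assumptions. Passing `validate` as the branch picker mimics the behavior described above: each union level fully re-validates everything nested beneath it, so the number of validation calls grows quadratically with nesting depth. Passing `shallow_match` looks only at the datum's top-level type, which is linear, but it assumes each union has at most one record branch (real Avro unions may contain several named records, which would need extra disambiguation).

```python
from typing import Any

# Toy schema model (hypothetical, NOT the avro package's API):
#   "null" | "int" | "string"             primitive
#   {"type": "record", "fields": [...]}  record; fields are (name, schema) pairs
#   [schema, schema, ...]                union
Schema = Any

calls = {"validate": 0, "shallow": 0}


def validate(schema: Schema, datum: Any) -> bool:
    """Full recursive check that datum conforms to schema."""
    calls["validate"] += 1
    if isinstance(schema, list):
        return any(validate(branch, datum) for branch in schema)
    if schema == "null":
        return datum is None
    if schema == "int":
        return isinstance(datum, int) and not isinstance(datum, bool)
    if schema == "string":
        return isinstance(datum, str)
    return isinstance(datum, dict) and all(
        name in datum and validate(sub, datum[name]) for name, sub in schema["fields"]
    )


def shallow_match(schema: Schema, datum: Any) -> bool:
    """Check only the datum's top-level type; never recurses.
    Assumption: at most one record branch per union."""
    calls["shallow"] += 1
    if schema == "null":
        return datum is None
    if schema == "int":
        return isinstance(datum, int) and not isinstance(datum, bool)
    if schema == "string":
        return isinstance(datum, str)
    return isinstance(schema, dict) and isinstance(datum, dict)


def write(schema: Schema, datum: Any, out: list, pick) -> None:
    """Append a flat encoding to out: union branch index, then the value."""
    if isinstance(schema, list):
        for i, branch in enumerate(schema):
            if pick(branch, datum):  # branch selection: the costly step
                out.append(i)
                write(branch, datum, out, pick)
                return
        raise ValueError("datum matches no union branch")
    if isinstance(schema, dict):
        for name, sub in schema["fields"]:
            write(sub, datum[name], out, pick)
    else:
        out.append(datum)


def make_chain(depth: int):
    """Nested unions: each level is ["null", record(value: int, next: <inner>)]."""
    schema: Schema = "null"
    datum: Any = None
    for k in range(depth):
        schema = ["null", {"type": "record",
                           "fields": [("value", "int"), ("next", schema)]}]
        datum = {"value": k, "next": datum}
    return schema, datum


def run(pick, depth: int):
    """Encode a chain of the given depth and return (output, call counts)."""
    schema, datum = make_chain(depth)
    for key in calls:
        calls[key] = 0
    out: list = []
    write(schema, datum, out, pick)
    return out, dict(calls)


for depth in (10, 20, 40):
    _, slow = run(validate, depth)
    _, fast = run(shallow_match, depth)
    print(depth, slow["validate"], fast["shallow"])
```

Both pickers produce the same encoding here. For nesting depth d the validating picker makes 2d(d+1) `validate` calls (220, 840, 3280 for depths 10, 20, 40), while the shallow picker makes 2d type checks.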
