Hey all,
I’ve been migrating some processes over from ingesting Avro to ingesting 
Parquet. In Spark, we’re seeing 2x-8x performance gains when using Parquet over 
Avro. In Pig, similar processes have about the same runtime with either
format (and are sometimes even slower with Parquet). We’ve enabled dictionary
filtering as well as predicate filtering/pushdown. Are there other
settings or strategies we might be missing to take advantage of Parquet?

Thanks,
Michael
