Avro vs ORC in Spark

Ryan Schachte Tue, 09 Nov 2021 08:35:17 -0800

Hi everyone, I'm looking for a better understanding of ORC compared to Avro
when leveraging a big data compute engine like Spark.


If I have a 100GB dataset in Avro and the same dataset in ORC, which consumes
10GB, would the ORC dataset perform better and consume less memory than its
Avro counterpart?

My initial assumption was no, because both datasets would be deserialized
and I'm consuming the entire dataset in both cases. I wanted to start the
conversation to check whether I'm thinking about this correctly.

Cheers,
Ryan S.
