Hi everyone, I'm trying to better understand how ORC compares to Avro
when using a big data compute engine like Spark.

If I have a 100GB dataset in Avro and the same dataset in ORC takes up
only 10GB, would the ORC dataset perform better and consume less memory
than its Avro counterpart?

My initial assumption was no, because both datasets would be fully
deserialized and I'm consuming the entire dataset in both cases, but I
wanted to have the conversation to see if I'm thinking about this correctly.
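
To make that assumption concrete, here is a toy sketch in plain Python. It
does not use real Avro, ORC, or Spark: JSON stands in for a row-oriented
encoding, and zlib-compressed column lists stand in for a compressed
columnar encoding. The column names and values are made up for
illustration. It only shows that two encodings can differ a lot in size
on disk and still decode to identical in-memory data.

```python
import json
import zlib

# Hypothetical toy records; the columns are invented for this sketch.
rows = [
    {"id": i, "country": "US" if i % 2 else "CA", "amount": i % 100}
    for i in range(10_000)
]

# Row-oriented stand-in (Avro-like in spirit only): records stored one after another.
row_bytes = json.dumps(rows).encode()

# Column-oriented stand-in (ORC-like in spirit only): each column's values
# stored together, then compressed.
columns = {name: [r[name] for r in rows] for name in rows[0]}
col_bytes = zlib.compress(json.dumps(columns).encode())

# Decode both encodings back into in-memory rows.
rows_from_row_layout = json.loads(row_bytes)
decoded_cols = json.loads(zlib.decompress(col_bytes))
rows_from_col_layout = [
    dict(zip(decoded_cols, values)) for values in zip(*decoded_cols.values())
]

print("row-oriented bytes:   ", len(row_bytes))
print("columnar+zlib bytes:  ", len(col_bytes))
print("same decoded records: ", rows_from_row_layout == rows_from_col_layout)
```

In this toy the encoded sizes differ while the decoded records are the
same. It says nothing about how Spark actually reads or holds either
format, which is the part I'm unsure about.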

Cheers,
Ryan S.
