Many thanks for this!
Ryan Blue schrieb am 17:44 Dienstag, 8.August
2017:
Joerg,
Parquet is columnar storage, but a lot of execution engines actually
operate on rows. It's common to select columns and push down filters, but
then want the rows reconstructed
Joerg,
Parquet is columnar storage, but a lot of execution engines actually
operate on rows. It's common to select columns and push down filters, but
then want the rows reconstructed because it is difficult to work with
columnar data. Spark operates on rows and when Parquet stands in for
another