Hello,

Recently a new feature was added to the Parquet file format: column indexes.

See: 
https://stackoverflow.com/questions/26909543/index-in-parquet/40714337#40714337

They have been available since parquet encoder 1.11 and parquet format 2.5.

They seem to improve I/O performance by an order of magnitude in certain 
scenarios, which is simply fantastic.

My questions are:
- Are there any plans to include this in upcoming Spark releases? Could you 
point me to an issue, if one exists?
- If not, could you suggest a way to at least write Parquet files in the new 
format and worry about the optimized reads later? Would simply forcing the 
Parquet dependencies to the versions above be enough?

Thank you!

Cheers,
Kamil Krynicki
