Hi, I am a Drill user and use Parquet as the storage format.
I see that some new features have been added to the latest Parquet format.
The new column index feature seems very attractive. Is there any plan to
support it in Drill?

Thanks very much!

Feature details:
https://github.com/apache/parquet-format/blob/master/CHANGES.md#version-250
See also https://issues.apache.org/jira/browse/PARQUET-1201

The goals are to make both range scans and point lookups I/O efficient by
allowing direct access to pages based on their min and max values. In
particular:
1. A single-row lookup in a rowgroup based on the sort column of that
rowgroup will only read one data page per retrieved column. Range scans on
the sort column will only need to read the exact data pages that contain
relevant data.
2. Make other selective scans I/O efficient: if we have a very selective
predicate on a non-sorting column, for the other retrieved columns we
should only need to access data pages that contain matching rows.
3. No additional decoding effort for scans without selective predicates,
e.g., full-row group scans. If a reader determines that it does not need to
read the index data, it does not incur any overhead.
4. Index pages for sorted columns use minimal storage by storing only the
boundary elements between pages.
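
To illustrate what I understand by "direct access to pages based on their
min and max values", here is a minimal Python sketch. It is only my reading
of the goals above, not the actual Parquet or Drill implementation: the
PageStats class and the pages_to_read function are hypothetical names, and
the column values are assumed to be plain integers.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageStats:
    # Hypothetical per-page index entry (not the real Parquet ColumnIndex layout).
    min_value: int
    max_value: int


def pages_to_read(index: List[PageStats], lo: int, hi: int) -> List[int]:
    """Return the ordinals of pages whose [min, max] range overlaps [lo, hi]."""
    return [
        i
        for i, page in enumerate(index)
        if page.max_value >= lo and page.min_value <= hi
    ]


# Three pages of a column that is sorted within the rowgroup.
index = [PageStats(0, 99), PageStats(100, 199), PageStats(200, 299)]

point_lookup = pages_to_read(index, 150, 150)  # single-row lookup, value 150
range_scan = pages_to_read(index, 90, 120)     # range scan over 90..120
```

With these made-up statistics the point lookup touches only the second page
and the range scan touches the first two, so the third page is never read.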
