Tim Armstrong has posted comments on this change.

Change subject: IMPALA-3909: [DOCS] Add general info about Parquet min/max 
optimization
......................................................................


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/7068/1/docs/topics/impala_parquet.xml
File docs/topics/impala_parquet.xml:

PS1, Line 363: data block
Not sure what "data block" means. "each row group and data page" would be more 
precise.

I feel like the current text may confuse readers about what is in Parquet files 
in general versus how Impala writes out files versus what Impala actually makes 
use of on the read path right now.

Currently both Impala and other tools write out stats at both the row group and 
data page level. Data pages are the finer granularity; row groups are much 
coarser. I think the salient fact there is that there are typically only a 
small number of row groups per file (1 for Impala).

Impala currently only uses the row group-level statistics to skip over large 
parts of the file at a time, but we have plans to use the page-level statistics.
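
To illustrate the row group-level skipping described above, here is a minimal 
sketch in Python. It is not Impala's implementation: the RowGroupStats class 
and the row_groups_to_scan function are hypothetical names made up for this 
example, and it assumes a simple integer column with a range predicate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RowGroupStats:
    """Hypothetical per-row-group min/max statistics for one integer column."""
    min_value: int
    max_value: int


def row_groups_to_scan(stats: List[RowGroupStats], lo: int, hi: int) -> List[int]:
    """Return the indices of row groups that may contain values in [lo, hi].

    A row group can be skipped when its [min, max] range does not overlap
    the predicate range, because no row in it can satisfy the predicate.
    """
    return [
        i
        for i, s in enumerate(stats)
        if s.max_value >= lo and s.min_value <= hi
    ]


# Three illustrative row groups; the values are invented for this example.
stats = [RowGroupStats(1, 100), RowGroupStats(101, 200), RowGroupStats(150, 300)]

# Predicate: column BETWEEN 120 AND 160. The first row group is skipped.
selected = row_groups_to_scan(stats, 120, 160)
print(selected)
```

The same overlap check would apply to page-level statistics, just at a finer 
granularity within each row group.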


PS1, Line 366: whether the file
"parts of each file", because it could be a data page or row group.


-- 
To view, visit http://gerrit.cloudera.org:8080/7068
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I5fd5f7b157024f6089af7feffcb538c160bb130d
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: John Russell <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-HasComments: Yes
