Pooja Nilangekar has posted comments on this change.

Change subject: Populate OffsetIndex and ColumnIndex of a row_group and Filter 
pages
......................................................................


Patch Set 2:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/7465/1/be/src/exec/hdfs-parquet-scanner.cc
File be/src/exec/hdfs-parquet-scanner.cc:

Line 617:     }
> Have you considered moving this into its own method?
Done


http://gerrit.cloudera.org:8080/#/c/7465/1/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

Line 1205:       !current_row_group_->sorting_columns.empty();
> Couldn't it happen that columns not in the sort_columns() list are sorted?
Done


Line 1218:     std::vector<parquet::OffsetIndex> 
offset_indexes(columns_.size());
> I think it may be easier to track the previous value and then decide in Pro
Done


PS1, Line 1217: rquet::RowGroup* row_group = &(file_metadata_.row_groups[0]);
              :     std::vector<parquet::OffsetIndex> 
offset_indexes(columns_.size());
              :     uint8_t* buffer = nullptr;
              :     uint32_t len = 0;
              :     for (int i = 0; i < columns_.size(); i++) {
              : 
              :       parquet::OffsetIndex offset_index;
              :       offset_index.__set_page_loca
> It's unclear to me how this works. Does it go through the min_values and ma
Done


http://gerrit.cloudera.org:8080/#/c/7465/1/be/src/exec/parquet-index-filter.cc
File be/src/exec/parquet-index-filter.cc:

Line 39:   if (!min_max_tuple_desc_) return Status::OK();
> Can you add a comment for this line? It's unclear to me why it's necessary.
Done


PS1, Line 45: ges 
> This could have a better name, e.g. skip_ranges or ranges_to_skip
Done


PS1, Line 45: o
> nit: you can remove the (0) here and in all other vector ctors. It's the de
Done


Line 46:   vector<RowRange> skip_ranges;
> move this to where it's used, think of a better name
Done


Line 150:     std::sort(skip_ranges.begin(), skip_ranges.end(), 
std::less<RowRange>());
> Describe what this variable stores. Maybe also explain in a high level that
Done


PS1, Line 152: is block computes the unio
> You could also use 'for (auto& row_range : skip) ...'
Done


PS1, Line 158:  If the skip begins before
> Is this check really needed?
Yes, it is necessary when the current skip range ends before the previous skip 
range ends. In that case, the end of the skip_range should not be updated to 
ensure correctness.


PS1, Line 178: 
> nit: use pre-increment
Done


Line 181: 
> Add a comment explaining why you need to do this. The vectors should be of 
Done


Line 196:         } else
> resize() should not be needed. You could also use http://en.cppreference.co
Done


Line 199:         FilteredPageInfo page{.page_id = page_idx,
> One variable per line makes it more readable.
Done


PS1, Line 215:           }
> add {}, here and below. This helps prevent mistakes when changing the code 
Done


Line 224:             page.ranges.push_back(std::make_pair(start_offset, 
end_offset));
> Statements that fit into a single line are ok.
Done


http://gerrit.cloudera.org:8080/#/c/7465/1/be/src/exec/parquet-index-filter.h
File be/src/exec/parquet-index-filter.h:

Line 37: /// Pair of int64_t to represent the start and end of a range of rows. 
Each boundary is
> Mention whether each boundary is inclusive. In my experience it's most comm
Done


Line 45:   int64_t offset;
> Maybe "PageReadInfo" or something similar makes it more clear that this str
Done


Line 50: typedef vector<FilteredPageInfo> FilteredPages;
> Here you could add a comment explaining the overall usage of this class: Wh
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/7465
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Idace1e57067f95973cef3567eeb84f2ad87fd3f6
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Pooja Nilangekar <pooja.nilange...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to