Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/9693 )

Change subject: IMPALA-5842: Write page index in Parquet files
......................................................................


Patch Set 13:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/9693/2/be/src/exec/hdfs-parquet-table-writer.cc@1195
PS2, Line 1195:   // Populate RowGroup::sorting_columns with all columns 
specified by the Frontend.
> I didn't follow up on this. It looks like the code only ever writes one row
Unfortunately that wouldn't work. According to the description of the Parquet
page indexes
(https://github.com/apache/parquet-format/blob/master/PageIndex.md), the page
indexes of all the row groups must be written together, just before the
footer. In other words, we have to flush all the row groups before we can
write any of the page indexes.

Currently Impala writes only one row group per Parquet file, so I just added a
DCHECK here. If Impala's behavior changes in this regard in the future, we'll
need to update the write path of the page indexes.
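
To illustrate the layout constraint, here is a minimal Python sketch of the
write order mandated by PageIndex.md. The section names and the
write_parquet_file function are made up for illustration; this is not
Impala's writer API.

```python
# Hypothetical sketch of the write order required by PageIndex.md:
# all row group data first, then the column/offset indexes of every
# row group grouped together, then the footer.

def write_parquet_file(row_groups):
    """Return the ordered list of sections as they land in the file."""
    sections = []
    # 1. Flush the data pages of every row group.
    for rg in row_groups:
        sections.append(("row_group_data", rg))
    # 2. Only now can the page indexes be written: the spec places the
    #    ColumnIndex and OffsetIndex structures together, before the footer.
    for rg in row_groups:
        sections.append(("column_index", rg))
    for rg in row_groups:
        sections.append(("offset_index", rg))
    # 3. The footer (file metadata) comes last.
    sections.append(("footer", None))
    return sections

layout = write_parquet_file(["rg0", "rg1"])
```

This is why the indexes cannot be emitted per row group as the data is
flushed: every row group's data has to land in the file first.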


http://gerrit.cloudera.org:8080/#/c/9693/13/be/src/exec/hdfs-parquet-table-writer.cc
File be/src/exec/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/9693/13/be/src/exec/hdfs-parquet-table-writer.cc@107
PS13, Line 107:       page_index_mem_tracker_(-1, "Parquet Page Index",
> We can have many HdfsTableWriters per HdfsTableSink if we're doing non-clus
I now track the memory directly against HdfsTableSink.


http://gerrit.cloudera.org:8080/#/c/9693/13/be/src/exec/hdfs-parquet-table-writer.cc@1212
PS13, Line 1212:   if (file_metadata_.row_groups.size() != 1) return 
Status::OK();
> This
I answered this at PS2.


http://gerrit.cloudera.org:8080/#/c/9693/13/be/src/exec/hdfs-parquet-table-writer.cc@1214
PS13, Line 1214:   parquet::RowGroup* row_group = 
&(file_metadata_.row_groups[0]);
> Can't we use current_row_group_ here?
We can't; the detailed answer is at PS2.


http://gerrit.cloudera.org:8080/#/c/9693/13/be/src/exec/hdfs-parquet-table-writer.cc@1216
PS13, Line 1216:   for (int i = 0; i < columns_.size(); i++) {
> I injected an error here by forgetting to write out one of the pages and th
Thanks for checking.


http://gerrit.cloudera.org:8080/#/c/9693/13/be/src/exec/hdfs-parquet-table-writer.cc@1241
PS13, Line 1241:     row_group->columns[i].__set_offset_index_offset(file_pos_);
> I added some errors to the offsets and lengths - looks like the tests caugh
Thanks


http://gerrit.cloudera.org:8080/#/c/9693/13/be/src/exec/parquet-column-stats.inline.h
File be/src/exec/parquet-column-stats.inline.h:

http://gerrit.cloudera.org:8080/#/c/9693/13/be/src/exec/parquet-column-stats.inline.h@31
PS13, Line 31:   ascending_boundary_order_ = true;
> I tried injecting a bug by initializing these to "false". The tests didn't
Currently we don't invoke Reset() on the ColumnStatsBase object where it would 
matter.

In HdfsParquetTableWriter::BaseColumnWriter there are two ColumnStatsBase 
members: page_stats_base_ and row_group_stats_base_.

For each new page, we invoke Reset() on page_stats_base_, but that doesn't
corrupt the ordering, because the ordering is determined only from the state
of row_group_stats_base_, and we never call Reset() on it.

Even in BaseColumnWriter::Reset(), we throw away the old row_group_stats_base_ 
and create a brand new one.
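
A toy model of this in plain Python (the Stats class and its methods are
illustrative only, not Impala's actual ColumnStatsBase):

```python
# Illustrative model of why the per-page Reset() cannot corrupt the
# boundary order: the ordering flags live only in the row-group-level
# stats object, which is never reset.

class Stats:
    def __init__(self):
        self.min = None
        self.max = None
        self.prev_min = None
        self.prev_max = None
        self.ascending = True
        self.descending = True

    def update(self, value):
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)

    def reset(self):
        # Clears min/max; only ever called on the per-page stats object.
        self.min = self.max = None

    def merge_page(self, page):
        # Compare this page's min/max with the previous page's bounds to
        # maintain the ascending/descending flags (row-group level only).
        if self.prev_max is not None:
            if page.min < self.prev_min or page.max < self.prev_max:
                self.ascending = False
            if page.min > self.prev_min or page.max > self.prev_max:
                self.descending = False
        self.prev_min, self.prev_max = page.min, page.max
        self.update(page.min)
        self.update(page.max)

row_group_stats = Stats()          # analogous to row_group_stats_base_
for values in ([1, 2], [3, 5], [4, 9]):
    page_stats = Stats()           # analogous to page_stats_base_
    for v in values:
        page_stats.update(v)
    row_group_stats.merge_page(page_stats)
    page_stats.reset()             # per-page reset; flags above untouched
```

Resetting page_stats after each merge has no effect on the ascending and
descending flags, because those only live in row_group_stats.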


http://gerrit.cloudera.org:8080/#/c/9693/13/be/src/exec/parquet-column-stats.inline.h@56
PS13, Line 56: false
> I tried injecting a bug by flipping the value of this and the tests caught
Thanks!


http://gerrit.cloudera.org:8080/#/c/9693/13/be/src/exec/parquet-column-stats.inline.h@60
PS13, Line 60:         if (MinMaxTrait<T>::Compare(prev_page_max_value_, 
cs->max_value_) < 0 ||
> I tried injecting a bug by removing the first condition and the tests caugh
Thanks!


http://gerrit.cloudera.org:8080/#/c/9693/13/be/src/exec/parquet-column-stats.inline.h@68
PS13, Line 68: max_value_
> I tried switching this to cs->min_value. The tests caught it.
Thanks!


http://gerrit.cloudera.org:8080/#/c/9693/13/be/src/exec/parquet-column-stats.inline.h@69
PS13, Line 69:     prev_page_min_buffer_.Clear();
> I tried removing this line. The tests didn't catch it, even under ASAN.
It turned out that the _validate_ordering function in
test_parquet_page_index.py was too loose: we could remove both
prev_page_min_buffer_ and prev_page_max_buffer_ without it noticing.

Now _validate_ordering implements the same logic from parquet-column-stats to 
determine the expected ordering.

Interestingly, the tests only catch the case where we remove
prev_page_max_buffer_.Clear(); the lack of prev_page_min_buffer_.Clear()
remained unnoticed. I tried running the tests on a couple of existing tables,
but without any luck.

The reason is a bit subtle: basically, prev_page_min_value_.ptr is "lucky"
enough to point at memory addresses that make the tests pass (most of the
time it points to the same address as cs->min_value_.ptr). I suppose the
tests would fail occasionally.

I'll try a couple of other tables to see if they fail the tests when
prev_page_min_buffer_.Clear() is missing.

For now I have specialized the Merge function for StringValue and added
DCHECKs to verify that we store the previous pages' min/max values in their
StringBuffers.
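
For illustration, the stricter check boils down to something like this
(a hypothetical Python sketch mirroring the writer's ordering rules;
expected_boundary_order is a made-up name, not the test's actual helper):

```python
# Derive the expected BoundaryOrder for a column from the (min, max)
# bounds of its pages, using the same rules the writer applies: a page
# whose min or max decreases breaks ASCENDING, one whose min or max
# increases breaks DESCENDING.

def expected_boundary_order(page_bounds):
    ascending = descending = True
    for (prev_min, prev_max), (cur_min, cur_max) in zip(page_bounds,
                                                        page_bounds[1:]):
        if cur_min < prev_min or cur_max < prev_max:
            ascending = False
        if cur_min > prev_min or cur_max > prev_max:
            descending = False
    if ascending:
        return "ASCENDING"
    if descending:
        return "DESCENDING"
    return "UNORDERED"
```

With a check shaped like this, forgetting to carry over a page's min or max
produces a boundary order that no longer matches the written pages, so the
test fails instead of silently passing.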


http://gerrit.cloudera.org:8080/#/c/9693/13/be/src/util/string-util.cc
File be/src/util/string-util.cc:

http://gerrit.cloudera.org:8080/#/c/9693/13/be/src/util/string-util.cc@29
PS13, Line 29: int
> nit: use static_cast<>
Done


http://gerrit.cloudera.org:8080/#/c/9693/13/testdata/bin/load-dependent-tables.sql
File testdata/bin/load-dependent-tables.sql:

http://gerrit.cloudera.org:8080/#/c/9693/13/testdata/bin/load-dependent-tables.sql@77
PS13, Line 77: WITH DEFERRED REBUILD IN TABLE hive_index_tbl
> missing trailing semicolon here and on some of the "DROP TABLE" statements
My bad, I thought that running a complete data load after formatting would
execute this file as well, but now I see it doesn't.

Now I run it through beeline (the same way as create-data-load.sh does), and
my lines execute fine, but I had to comment out the commands above because
the INSERT OVERWRITE TABLE command in L44 raises an error.

Since I didn't touch those lines, I must have invoked it the wrong way. How do 
you execute this file?

Anyway, I fixed the issues based on your comments.


http://gerrit.cloudera.org:8080/#/c/9693/13/testdata/bin/load-dependent-tables.sql@94
PS13, Line 94: '\\\\'
> I had to replace this with \\ to get this to work locally.
Done


http://gerrit.cloudera.org:8080/#/c/9693/13/testdata/bin/load-dependent-tables.sql@106
PS13, Line 106: "}},
> I had to replace }} with } and {{ with { to get this to run locally.
Done



--
To view, visit http://gerrit.cloudera.org:8080/9693
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbacf7fe3b7672e3ce719261ecef445b16f8dec9
Gerrit-Change-Number: 9693
Gerrit-PatchSet: 13
Gerrit-Owner: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: Anonymous Coward #248
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Lars Volker <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Mon, 07 May 2018 16:22:47 +0000
Gerrit-HasComments: Yes
