Csaba Ringhofer has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/17262 )

Change subject: WIP - IMPALA-10642: Write support for Parquet Bloom filters - 
most common types
......................................................................


Patch Set 3:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/17262/3//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/17262/3//COMMIT_MSG@16
PS3, Line 16: TBL_PROPS
TBL_PROPS will configure bloom filters per column, right?


http://gerrit.cloudera.org:8080/#/c/17262/3//COMMIT_MSG@16
PS3, Line 16:   TBL_PROPS  - write Parquet Bloom filters as set in table 
properties
            :   IF_NO_DICT - write Parquet Bloom filters if the row group is not
            :                fully dictionary encoded
What is the relation of TBL_PROPS and IF_NO_DICT? E.g. with TBL_PROPS will you 
write bloom filter even if the column dictionary encoded?


http://gerrit.cloudera.org:8080/#/c/17262/1/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17262/1/be/src/exec/parquet/hdfs-parquet-table-writer.cc@605
PS1, Line 605:   // The ParquetBloomFilter object if one is being written. If
             :   // 'ShouldInitParquetBloomFilter()' is false, the combination 
of the impala type and the
             :   // parquet type is not supported or some error occurs during 
the initialisation of the
             :   // ParquetBloomFilter object, it is set to NULL.
             :   unique_ptr<ParquetB
> I find using dst_ptr a bit risky:
I like the solution, but I disagree here:
"in case of int8 and int16 we need to pad them to int32 because parquet doesn't 
support smaller ints, so in these cases we cannot use the buffer, adding even 
more special cases."
Why can't we use the buffer? dst_ptr points to the plain encoded page buffer, 
where the i16/i8 are already converted to i32


http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc@493
PS3, Line 493: if (parent_
I would prefer to move this to a separate function like 
FlushDictionaryToBloomFilterIfNeeded()

ProcessValue() is a critical function when trying to understand how Impala 
writes Parquet files, so I think that we should try to minimize noise.


http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc@542
PS3, Line 542:     if (ShouldUpdateParquetBloomFilter()) {
I would prefer to move this to a separate function like 
UpdateBloomFilterIfNeeded()


http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/hdfs-parquet-table-writer.cc@597
PS3, Line 597: whwn
typo


http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/parquet-bloom-filter-util.cc
File be/src/exec/parquet/parquet-bloom-filter-util.cc:

http://gerrit.cloudera.org:8080/#/c/17262/3/be/src/exec/parquet/parquet-bloom-filter-util.cc@220
PS3, Line 220: {
nit: brace placement



--
To view, visit http://gerrit.cloudera.org:8080/17262
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie865efd4f0c11b9e111fb94f77d084bf6ee20792
Gerrit-Change-Number: 17262
Gerrit-PatchSet: 3
Gerrit-Owner: Daniel Becker <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Daniel Becker <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Comment-Date: Wed, 14 Apr 2021 14:03:25 +0000
Gerrit-HasComments: Yes

Reply via email to