Norbert Luksa has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14264 )

Change subject: IMPALA-8498: Write column index for floating types when NaN is 
not present
......................................................................


Patch Set 5:

(8 comments)

http://gerrit.cloudera.org:8080/#/c/14264/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc
File be/src/exec/parquet/hdfs-parquet-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/14264/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc@444
PS4, Line 444:
> std::isnan() could be put in an UNLIKELY macro.
Done


http://gerrit.cloudera.org:8080/#/c/14264/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc@438
PS4, Line 438:     T* val = CastValue(value);
             :     if (current_encoding_ == 
parquet::Encoding::PLAIN_DICTIONARY) {
             :       if (UNLIKELY(num_values_since_dict_size_check_ >=
             :                    DICTIONARY_DATA_PAGE_SIZE_CHECK_PERIOD)) {
             :         num_values_since_dict_size_check_ = 0;
             :         if (dict_encoder_->EstimatedDataEncodedSize() >= 
page_size_) return false;
             :       }
             :       ++num_values_since_dict_size_check_;
             :       *bytes_needed = dict_encoder_->Put(*val);
             :       // If the dictionary contains the maximum number of 
values, switch to plain
             :       // encoding for the next page. The current page is full 
and must be written out.
             :       if (UNLIKELY(*bytes_needed < 0)) {
             :         next_page_encoding_ = parquet::Encoding::PLAIN;
             :
> I have 3 minor issues about this block:
I have moved the block to the end, right before updating page stats.
The problem I run into with your suggestion was that I cannot use CastValue 
since it returns a T*, and the compiler can't deduce from 
std::is_floating_point that T can be passed to std::isnan.
But I understand your point regard compile/runtime checks, proposed a solution 
with only compile time checks. It separates the double and float type checks 
using std::is_same.

+: Added the cast to the beginning of the function.


http://gerrit.cloudera.org:8080/#/c/14264/4/be/src/exec/parquet/hdfs-parquet-table-writer.cc@452
PS4, Line 452:   }
> optional: a potential optimization would be to do the NaN check here only f
I would skip this optimization for now, not sure if it's worth the extra work, 
since this is only a workaround until the aforementioned Parquet issue is 
solved.


http://gerrit.cloudera.org:8080/#/c/14264/2/tests/query_test/test_parquet_page_index.py
File tests/query_test/test_parquet_page_index.py:

http://gerrit.cloudera.org:8080/#/c/14264/2/tests/query_test/test_parquet_page_index.py@429
PS2, Line 429:
> flake8: E501 line too long (93 > 90 characters)
Done


http://gerrit.cloudera.org:8080/#/c/14264/2/tests/query_test/test_parquet_page_index.py@462
PS2, Line 462:
> flake8: E116 unexpected indentation (comment)
Done


http://gerrit.cloudera.org:8080/#/c/14264/2/tests/query_test/test_parquet_page_index.py@473
PS2, Line 473:
> flake8: E116 unexpected indentation (comment)
Done


http://gerrit.cloudera.org:8080/#/c/14264/2/tests/query_test/test_parquet_page_index.py@483
PS2, Line 483:
> flake8: E116 unexpected indentation (comment)
Done


http://gerrit.cloudera.org:8080/#/c/14264/4/tests/query_test/test_parquet_page_index.py
File tests/query_test/test_parquet_page_index.py:

http://gerrit.cloudera.org:8080/#/c/14264/4/tests/query_test/test_parquet_page_index.py@461
PS4, Line 461: for col_type in ["
> I am bit worried about double being untested, and it seems simple to run th
Done



--
To view, visit http://gerrit.cloudera.org:8080/14264
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic9d367500243c8ca142a16ebfeef6c841f013434
Gerrit-Change-Number: 14264
Gerrit-PatchSet: 5
Gerrit-Owner: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Norbert Luksa <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 03 Oct 2019 07:05:57 +0000
Gerrit-HasComments: Yes

Reply via email to