Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8147 )

Change subject: IMPALA-5448: fix invalid number of splits reported in Parquet scan node
......................................................................


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8147/2/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/8147/2/be/src/exec/hdfs-scan-node-base.h@557
PS2, Line 557: __builtin_popcount
This should call BitUtil::Popcount(), which will use hardware acceleration if 
appropriate.
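
For illustration, here is a minimal standalone sketch of why routing every call
through one wrapper helps. It is not the BitUtil code: PopcountPortable and
PopcountSketch are hypothetical names, and the compile-time check below is an
assumption standing in for whatever dispatch the real helper performs.

```cpp
#include <cassert>
#include <cstdint>

// Portable fallback: each loop iteration clears the lowest set bit,
// so the loop runs once per set bit.
inline int PopcountPortable(uint32_t v) {
  int count = 0;
  while (v != 0) {
    v &= v - 1;
    ++count;
  }
  return count;
}

// Hypothetical wrapper (an assumption for illustration, not BitUtil::Popcount()).
// Call sites use this one function, so the choice between a compiler builtin
// and the fallback is made in a single place.
inline int PopcountSketch(uint32_t v) {
#if defined(__GNUC__) || defined(__clang__)
  int count = __builtin_popcount(v);
  assert(count == PopcountPortable(v));  // debug-only cross-check
  return count;
#else
  return PopcountPortable(v);
#endif
}
```

With a wrapper like this, the header only needs to call the wrapper on the
compression-type bitmap and never names the builtin directly.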


http://gerrit.cloudera.org:8080/#/c/8147/2/be/src/exec/hdfs-scan-node-base.h@579
PS2, Line 579: bit_map
We put an underscore at the end of private member names, i.e. 'bit_map_'.
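
A small sketch of the convention, using a hypothetical class (CompressionTypeSet
and its methods are made up for illustration and are not the class under
review):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical class for illustration only.
class CompressionTypeSet {
 public:
  // Records that the compression type with the given index was seen.
  // The index must fit in the 32-bit bitmap.
  void Add(int type_index) {
    assert(type_index >= 0 && type_index < 32);
    bit_map_ |= (uint32_t{1} << type_index);
  }

  // Returns true if the given type index was added earlier.
  bool Contains(int type_index) const {
    assert(type_index >= 0 && type_index < 32);
    return ((bit_map_ >> type_index) & 1u) != 0;
  }

 private:
  // The trailing underscore marks this as a private data member.
  uint32_t bit_map_ = 0;
};
```

The suffix makes it obvious inside a method body which names are members and
which are locals or parameters.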


http://gerrit.cloudera.org:8080/#/c/8147/2/be/src/exec/hdfs-scan-node-base.h@582
PS2, Line 582:   /// Mapping of file formats (file type, compression types set) to the number of
Not your change, but this comment should also mention the second entry in the 
tuple: whether the split was skipped.


http://gerrit.cloudera.org:8080/#/c/8147/2/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/8147/2/testdata/datasets/functional/functional_schema_template.sql@1581
PS2, Line 1581: -- IMPALA-5448: parquet files with multiple compression types
In many cases we have moved to loading "special" files as part of the tests 
rather than as part of data loading. I think that works better in practice, 
because any change to this template forces everyone to reload their data.

I commented on an example of the alternative approach, which we should switch 
to here.


http://gerrit.cloudera.org:8080/#/c/8147/2/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/8147/2/tests/query_test/test_scanners.py@82
PS2, Line 82:   def test_hdfs_parquet_scan_node_profile(self, vector):
This only applies to Parquet, so it should go in TestParquet below 
(TestScannersAllTableFormats runs the test for all table formats).


http://gerrit.cloudera.org:8080/#/c/8147/2/tests/query_test/test_scanners.py@337
PS2, Line 337:   def test_corrupt_rle_counts(self, vector, unique_database):
This is an example of the alternative way of loading data files as part of the 
test.



--
To view, visit http://gerrit.cloudera.org:8080/8147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Gerrit-Change-Number: 8147
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 02 Oct 2017 21:47:57 +0000
Gerrit-HasComments: Yes
