Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/8147 )
Change subject: IMPALA-5448: fix invalid number of splits reported in Parquet scan node
......................................................................

Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/8147/2/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/8147/2/be/src/exec/hdfs-scan-node-base.h@557
PS2, Line 557: __builtin_popcount
Should call BitUtil::Popcount(), which will use hardware acceleration if appropriate.

http://gerrit.cloudera.org:8080/#/c/8147/2/be/src/exec/hdfs-scan-node-base.h@579
PS2, Line 579: bit_map
We put an underscore at the end of private members, i.e. 'bit_map_'

http://gerrit.cloudera.org:8080/#/c/8147/2/be/src/exec/hdfs-scan-node-base.h@582
PS2, Line 582: /// Mapping of file formats (file type, compression types set) to the number of
Not your change, but it should mention the second entry in the tuple - whether the split was skipped.

http://gerrit.cloudera.org:8080/#/c/8147/2/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/8147/2/testdata/datasets/functional/functional_schema_template.sql@1581
PS2, Line 1581: -- IMPALA-5448: parquet files with multiple compression types
We moved to loading "special" files as part of the tests rather than part of the data loading in a lot of cases. I think that is better practically because if you change this template then everyone has to reload data. I commented on an instance of the alternative approach that we should switch to.
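
As an illustration of the two comments on hdfs-scan-node-base.h (lines 557 and 579), here is a minimal, self-contained sketch. It is not the code under review and not Impala's BitUtil: the names Popcount, PopcountNoHw and CompressionTypeSet are hypothetical stand-ins. It only shows the general idea of routing bit counting through one utility function instead of calling the compiler builtin at each call site, and of naming a private member with a trailing underscore.

```cpp
#include <cstdint>

// Hypothetical portable fallback (not Impala's implementation).
// Clears the lowest set bit on each iteration, so it loops once per set bit.
inline int PopcountNoHw(uint64_t x) {
  int count = 0;
  for (; x != 0; x &= x - 1) ++count;
  return count;
}

// Hypothetical wrapper: call sites use this instead of the builtin directly,
// so the choice of implementation lives in one place. This sketch chooses at
// compile time; a real utility could instead check CPU support at runtime.
inline int Popcount(uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
  return __builtin_popcountll(x);
#else
  return PopcountNoHw(x);
#endif
}

// Hypothetical set of compression types stored as a bitmap.
// 'type' is assumed to be in the range [0, 63].
class CompressionTypeSet {
 public:
  void Add(int type) { bit_map_ |= (uint64_t{1} << type); }
  bool Has(int type) const { return ((bit_map_ >> type) & 1) != 0; }
  int Size() const { return Popcount(bit_map_); }

 private:
  uint64_t bit_map_ = 0;  // Trailing underscore marks a private member.
};
```

With this sketch, adding the same type twice leaves Size() unchanged, since the bitmap records only membership.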

http://gerrit.cloudera.org:8080/#/c/8147/2/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/8147/2/tests/query_test/test_scanners.py@82
PS2, Line 82: def test_hdfs_parquet_scan_node_profile(self, vector):
This only applies to parquet so should go in TestParquet below (TestScannersAllTableFormats runs the test for all table formats).

http://gerrit.cloudera.org:8080/#/c/8147/2/tests/query_test/test_scanners.py@337
PS2, Line 337: def test_corrupt_rle_counts(self, vector, unique_database):
This is an example of the alternative way of loading data files as part of the test.

--
To view, visit http://gerrit.cloudera.org:8080/8147
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iaacc2d775032f5707061e704f12e0a63cde695d1
Gerrit-Change-Number: 8147
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Quanlong Huang <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Comment-Date: Mon, 02 Oct 2017 21:47:57 +0000
Gerrit-HasComments: Yes
