Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/16066 )
Change subject: IMPALA-9482 Support for BINARY columns ...................................................................... Patch Set 7: (13 comments) http://gerrit.cloudera.org:8080/#/c/16066/7//COMMIT_MSG Commit Message: PS7: I think you're missing some handling for the parquet_annotate_strings_utf8 option - it looks like we might set this for BINARY types, which is wrong. http://gerrit.cloudera.org:8080/#/c/16066/7//COMMIT_MSG@29 PS7, Line 29: - HS2/Beeswax service Do we need to update clients? Looks like impala-shell might have some logic for HS2 but not sure that it's tested. http://gerrit.cloudera.org:8080/#/c/16066/7//COMMIT_MSG@55 PS7, Line 55: - Added FE/EE tests mainly based on the ones added to the DATE type Can we add some more sanity tests for different parts of query processing. I think the fact that it's handled as string internally reduces the amount of testing required, but some sanity checks would be good. E.g. a join on BINARY, a group by on aggregation and a sort by BINARY? http://gerrit.cloudera.org:8080/#/c/16066/7//COMMIT_MSG@56 PS7, Line 56: - Ran exhaustive tests. Can we also add a shell test? http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/exec/hbase-scan-node.h File be/src/exec/hbase-scan-node.h: http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/exec/hbase-scan-node.h@84 PS7, Line 84: /// Descriptor of the HBase table. Can you mention it's set in Prepare() http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/exec/hdfs-text-table-writer.cc File be/src/exec/hdfs-text-table-writer.cc: http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/exec/hdfs-text-table-writer.cc@107 PS7, Line 107: // TODO: try to find a more efficient imlementation Maybe remove the TODO? Or file a JIRA? Not sure that it adds much value here. http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/exec/text-converter.h File be/src/exec/text-converter.h: http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/exec/text-converter.h@65 PS7, Line 65: auxType aux_type here and elsewhere http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/exec/text-converter.cc File be/src/exec/text-converter.cc: http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/exec/text-converter.cc@353 PS7, Line 353: bool TextConverter::SupportsCodegenWriteSlot(const ColumnType& col_type, I don't love adding more logic without corresponding codegen, we spent so much time filling in gaps in the past and it feels bad to add more back in. http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/runtime/descriptors.h File be/src/runtime/descriptors.h: http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/runtime/descriptors.h@209 PS7, Line 209: ( aux_type http://gerrit.cloudera.org:8080/#/c/16066/7/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java: http://gerrit.cloudera.org:8080/#/c/16066/7/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@259 PS7, Line 259: // NDV is not calculated for BINARY columns (similarly to Hive). This seems kinda arbitrary - is there a real reason not to calculate it? I guess not a big deal. http://gerrit.cloudera.org:8080/#/c/16066/7/fe/src/main/java/org/apache/impala/analysis/LikePredicate.java File fe/src/main/java/org/apache/impala/analysis/LikePredicate.java: http://gerrit.cloudera.org:8080/#/c/16066/7/fe/src/main/java/org/apache/impala/analysis/LikePredicate.java@116 PS7, Line 116: private static boolean isLikeableType(Type type) { lol at the function name (no need to change). http://gerrit.cloudera.org:8080/#/c/16066/7/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java: http://gerrit.cloudera.org:8080/#/c/16066/7/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java@192 PS7, Line 192: C nit: Column http://gerrit.cloudera.org:8080/#/c/16066/7/tests/query_test/test_scanners.py File tests/query_test/test_scanners.py: http://gerrit.cloudera.org:8080/#/c/16066/7/tests/query_test/test_scanners.py@1611 PS7, Line 1611: def add_test_dimensions(cls): It'd be good to run this for HS2 as well. -- To view, visit http://gerrit.cloudera.org:8080/16066 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582 Gerrit-Change-Number: 16066 Gerrit-PatchSet: 7 Gerrit-Owner: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Attila Jeges <[email protected]> Gerrit-Reviewer: Csaba Ringhofer <[email protected]> Gerrit-Reviewer: Gabor Kaszab <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Tim Armstrong <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Fri, 04 Sep 2020 04:20:59 +0000 Gerrit-HasComments: Yes
