Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16066 )

Change subject: IMPALA-9482 Support for BINARY columns
......................................................................


Patch Set 7:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/16066/7//COMMIT_MSG
Commit Message:

PS7:
I think you're missing some handling for the parquet_annotate_strings_utf8 
option - it looks like we might set this for BINARY types, which is wrong.


http://gerrit.cloudera.org:8080/#/c/16066/7//COMMIT_MSG@29
PS7, Line 29: - HS2/Beeswax service
Do we need to update clients? Looks like impala-shell might have some logic for 
HS2 but not sure that it's tested.


http://gerrit.cloudera.org:8080/#/c/16066/7//COMMIT_MSG@55
PS7, Line 55: - Added FE/EE tests mainly based on the ones added to the DATE 
type
Can we add some more sanity tests for different parts of query processing. I 
think the fact that it's handled as string internally reduces the amount of 
testing required, but some sanity checks would be good. E.g. a join on BINARY, 
a group by on aggregation and a sort by BINARY?


http://gerrit.cloudera.org:8080/#/c/16066/7//COMMIT_MSG@56
PS7, Line 56: - Ran exhaustive tests.
Can we also add a shell test?


http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/exec/hbase-scan-node.h
File be/src/exec/hbase-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/exec/hbase-scan-node.h@84
PS7, Line 84:   /// Descriptor of the HBase table.
Can you mention it's set in Prepare()


http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/exec/hdfs-text-table-writer.cc
File be/src/exec/hdfs-text-table-writer.cc:

http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/exec/hdfs-text-table-writer.cc@107
PS7, Line 107:               // TODO: try to find a more efficient imlementation
Maybe remove the TODO? Or file a JIRA? Not sure that it adds much value here.


http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/exec/text-converter.h
File be/src/exec/text-converter.h:

http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/exec/text-converter.h@65
PS7, Line 65: auxType
aux_type here and elsewhere


http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/exec/text-converter.cc
File be/src/exec/text-converter.cc:

http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/exec/text-converter.cc@353
PS7, Line 353: bool TextConverter::SupportsCodegenWriteSlot(const ColumnType& 
col_type,
I don't love adding more logic without corresponding codegen, we spent so much 
time filling in gaps in the past and it feels bad to add more back in.


http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/runtime/descriptors.h
File be/src/runtime/descriptors.h:

http://gerrit.cloudera.org:8080/#/c/16066/7/be/src/runtime/descriptors.h@209
PS7, Line 209: (
aux_type


http://gerrit.cloudera.org:8080/#/c/16066/7/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/16066/7/fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java@259
PS7, Line 259:         // NDV is not calculated for BINARY columns (similarly 
to Hive).
This seems kinda arbitrary - is there a real reason not to calculate it? I 
guess not a big deal.


http://gerrit.cloudera.org:8080/#/c/16066/7/fe/src/main/java/org/apache/impala/analysis/LikePredicate.java
File fe/src/main/java/org/apache/impala/analysis/LikePredicate.java:

http://gerrit.cloudera.org:8080/#/c/16066/7/fe/src/main/java/org/apache/impala/analysis/LikePredicate.java@116
PS7, Line 116:   private static boolean isLikeableType(Type type) {
lol at the function name (no need to change).


http://gerrit.cloudera.org:8080/#/c/16066/7/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java
File fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java:

http://gerrit.cloudera.org:8080/#/c/16066/7/fe/src/test/java/org/apache/impala/catalog/PartialCatalogInfoTest.java@192
PS7, Line 192: C
nit: Column


http://gerrit.cloudera.org:8080/#/c/16066/7/tests/query_test/test_scanners.py
File tests/query_test/test_scanners.py:

http://gerrit.cloudera.org:8080/#/c/16066/7/tests/query_test/test_scanners.py@1611
PS7, Line 1611:   def add_test_dimensions(cls):
It'd be good to run this for HS2 as well.



--
To view, visit http://gerrit.cloudera.org:8080/16066
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I36861a9ca6c2047b0d76862507c86f7f153bc582
Gerrit-Change-Number: 16066
Gerrit-PatchSet: 7
Gerrit-Owner: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Attila Jeges <[email protected]>
Gerrit-Reviewer: Csaba Ringhofer <[email protected]>
Gerrit-Reviewer: Gabor Kaszab <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Tim Armstrong <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Fri, 04 Sep 2020 04:20:59 +0000
Gerrit-HasComments: Yes

Reply via email to