[Impala-ASF-CR] IMPALA-8778: Support Apache Hudi Read Optimized Table

Yanjia Gary Li (Code Review) Wed, 15 Jan 2020 12:50:53 -0800

Yanjia Gary Li has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/14711 )


Change subject: IMPALA-8778: Support Apache Hudi Read Optimized Table
......................................................................


Patch Set 14:

(4 comments)

Thanks for reviewing! I still have some questions left. Once everything is 
clear I will address all the comments.

http://gerrit.cloudera.org:8080/#/c/14711/14/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java:

http://gerrit.cloudera.org:8080/#/c/14711/14/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@68
PS14, Line 68: THdfsFileFormat.HUDI_PARQUET
> I think Impala should not support inserting into HUDI_PARQUET because curre
Thanks for reviewing! I put HUDI_PARQUET here because the comment above says 
"The statement supports an optional PARTITIONED BY clause." So my guess was if 
we don't support HUDI_PARQUET here, does that mean we are not able to partition 
HUDI_PARQUET table?


http://gerrit.cloudera.org:8080/#/c/14711/14/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java:

http://gerrit.cloudera.org:8080/#/c/14711/14/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java@74
PS14, Line 74: HUDI_PARQUET
> We can only create a table from a Parquet file, it doesn't matter if the fi
Make sense. statement like "CREATE TABLE LIKE PARQUET xxx STORED AS HUDIPARQUET 
LOCATION xxx" would work


http://gerrit.cloudera.org:8080/#/c/14711/14/fe/src/main/jflex/sql-scanner.flex
File fe/src/main/jflex/sql-scanner.flex:

http://gerrit.cloudera.org:8080/#/c/14711/14/fe/src/main/jflex/sql-scanner.flex@149
PS14, Line 149: hudiparquet
> nit: I'm in favor of "hudi_parquet" because it's more readable.
I am following the naming convention of SEQUENCEFILE.
In HdfsFileFormat it was SEQUENCE_FILE but in Sql it was SEQUENCEFILE. Same for 
RCFILE. Does it make sense to keep HUDIPARQUET here?


http://gerrit.cloudera.org:8080/#/c/14711/14/tests/common/test_dimensions.py
File tests/common/test_dimensions.py:

http://gerrit.cloudera.org:8080/#/c/14711/14/tests/common/test_dimensions.py@33
PS14, Line 33: hudiparquet
> We cannot run these tests on Hudi tables because we cannot load the workloa
I am not quite sure if I understand the .test execution sequence correctly.
I first looked into test_scanner.py file and see TestParquet is using 
'functional-query' workload, so I tried to do the same for HUDIPARQUET.
Then I created a class TestHudiParquet in order to execute the queries under 
QueryTest/hudiparquet by following the same way how QueryTest/parquet gets 
executed.
If my understanding was wrong here, would you point me to the right direction?



--
To view, visit http://gerrit.cloudera.org:8080/14711
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I65e146b347714df32fe968409ef2dde1f6a25cdf
Gerrit-Change-Number: 14711
Gerrit-PatchSet: 14
Gerrit-Owner: Yanjia Gary Li <yanjia.gary...@gmail.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Norbert Luksa <norbert.lu...@cloudera.com>
Gerrit-Reviewer: Yanjia Gary Li <yanjia.gary...@gmail.com>
Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com>
Gerrit-Comment-Date: Wed, 15 Jan 2020 20:49:53 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8778: Support Apache Hudi Read Optimized Table

Reply via email to