Yanjia Gary Li has posted comments on this change. ( http://gerrit.cloudera.org:8080/14711 )
Change subject: IMPALA-8778: Support Apache Hudi Read Optimized Table ...................................................................... Patch Set 14: (4 comments) Thanks for reviewing! I still have some questions left. Once everything is clear I will address all the comments. http://gerrit.cloudera.org:8080/#/c/14711/14/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java: http://gerrit.cloudera.org:8080/#/c/14711/14/fe/src/main/java/org/apache/impala/analysis/CreateTableAsSelectStmt.java@68 PS14, Line 68: THdfsFileFormat.HUDI_PARQUET > I think Impala should not support inserting into HUDI_PARQUET because curre Thanks for reviewing! I put HUDI_PARQUET here because the comment above says "The statement supports an optional PARTITIONED BY clause." So my guess was if we don't support HUDI_PARQUET here, does that mean we are not able to partition HUDI_PARQUET table? http://gerrit.cloudera.org:8080/#/c/14711/14/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java: http://gerrit.cloudera.org:8080/#/c/14711/14/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java@74 PS14, Line 74: HUDI_PARQUET > We can only create a table from a Parquet file, it doesn't matter if the fi Make sense. statement like "CREATE TABLE LIKE PARQUET xxx STORED AS HUDIPARQUET LOCATION xxx" would work http://gerrit.cloudera.org:8080/#/c/14711/14/fe/src/main/jflex/sql-scanner.flex File fe/src/main/jflex/sql-scanner.flex: http://gerrit.cloudera.org:8080/#/c/14711/14/fe/src/main/jflex/sql-scanner.flex@149 PS14, Line 149: hudiparquet > nit: I'm in favor of "hudi_parquet" because it's more readable. I am following the naming convention of SEQUENCEFILE. In HdfsFileFormat it was SEQUENCE_FILE but in Sql it was SEQUENCEFILE. Same for RCFILE. Does it make sense to keep HUDIPARQUET here? http://gerrit.cloudera.org:8080/#/c/14711/14/tests/common/test_dimensions.py File tests/common/test_dimensions.py: http://gerrit.cloudera.org:8080/#/c/14711/14/tests/common/test_dimensions.py@33 PS14, Line 33: hudiparquet > We cannot run these tests on Hudi tables because we cannot load the workloa I am not quite sure if I understand the .test execution sequence correctly. I first looked into test_scanner.py file and see TestParquet is using 'functional-query' workload, so I tried to do the same for HUDIPARQUET. Then I created a class TestHudiParquet in order to execute the queries under QueryTest/hudiparquet by following the same way how QueryTest/parquet gets executed. If my understanding was wrong here, would you point me to the right direction? -- To view, visit http://gerrit.cloudera.org:8080/14711 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I65e146b347714df32fe968409ef2dde1f6a25cdf Gerrit-Change-Number: 14711 Gerrit-PatchSet: 14 Gerrit-Owner: Yanjia Gary Li <yanjia.gary...@gmail.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Norbert Luksa <norbert.lu...@cloudera.com> Gerrit-Reviewer: Yanjia Gary Li <yanjia.gary...@gmail.com> Gerrit-Reviewer: Zoltan Borok-Nagy <borokna...@cloudera.com> Gerrit-Comment-Date: Wed, 15 Jan 2020 20:49:53 +0000 Gerrit-HasComments: Yes