Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/22914 )
Change subject: IMPALA-14081: Support create/drop paimon table for impala ...................................................................... Patch Set 8: (29 comments) Thanks for working on it, this will be a great addition to Impala! The change is huge, so I'll need to go over it a few more times. http://gerrit.cloudera.org:8080/#/c/22914/8//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/22914/8//COMMIT_MSG@9 PS8, Line 9: t nit: capital 'T' http://gerrit.cloudera.org:8080/#/c/22914/8//COMMIT_MSG@9 PS8, Line 9: pala. nit: we limit commit text body to 72 chars. See https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147115#ContributingtoImpala-Fix http://gerrit.cloudera.org:8080/#/c/22914/8//COMMIT_MSG@37 PS8, Line 37: EXTERNAL Is it required to have the EXTERNAL keyword here? Can't Impala create a brand new Paimon table in a hadoop catalog? http://gerrit.cloudera.org:8080/#/c/22914/8/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/22914/8/bin/impala-config.sh@1244 PS8, Line 1244: echo "IMPALA_PAIMON_VERSION = $IMPALA_PAIMON_VERSION" nit: This line could come right after IMPALA_ICEBERG_VERSION, where we list the versions of libraries http://gerrit.cloudera.org:8080/#/c/22914/8/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: http://gerrit.cloudera.org:8080/#/c/22914/8/common/thrift/CatalogObjects.thrift@708 PS8, Line 708: R nit: missing space http://gerrit.cloudera.org:8080/#/c/22914/8/common/thrift/CatalogObjects.thrift@775 PS8, Line 775: // Set if this is a paimon table nit: Missing empty line before the comment http://gerrit.cloudera.org:8080/#/c/22914/8/docs/topics/impala_paimon.xml File docs/topics/impala_paimon.xml: http://gerrit.cloudera.org:8080/#/c/22914/8/docs/topics/impala_paimon.xml@40 PS8, Line 40: Impala now supports experimentally for Apache Paimon nit: Impala now adds experimental support for Apache Paimon? http://gerrit.cloudera.org:8080/#/c/22914/8/docs/topics/impala_paimon.xml@42 PS8, Line 42: h nit: capital 'H' http://gerrit.cloudera.org:8080/#/c/22914/8/docs/topics/impala_paimon.xml@77 PS8, Line 77: <li> : Append table (no primary-key) provides large scale batch & streaming processing : capability. Automatic Small File Merge. : </li> : <li> : Supports Data Compaction with z-order sorting to optimize file layout, : provides fast queries based on data skipping using indexes such as minmax. : </li> Not sure we want to mention these while Impala doesn't support these operations. http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/cup/sql-parser.cup@2302 PS8, Line 2302: | KW_STORED KW_BY storage_engine_val:storage_engine For Iceberg we also allow STORED BY ICEBERG, I think we should also allow it for PAIMON. We just need to add PAIMON to storage_engine_val. http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/analysis/paimon/PaimonAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/paimon/PaimonAnalyzer.java: http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/analysis/paimon/PaimonAnalyzer.java@92 PS8, Line 92: /** nit: missing line before the comment http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/analysis/paimon/PaimonAnalyzer.java@165 PS8, Line 165: // if (stmt.isExternalWithNoPurge() && : // catalog.equals(TPaimonCatalog.HIVE_CATALOG)) { : // throw new AnalysisException("Cannot create EXTERNAL Paimon table in the : // " + : // "Hive Catalog."); : // } Please uncomment or remove. http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/analysis/paimon/PaimonCatalogOpExecutor.java File fe/src/main/java/org/apache/impala/analysis/paimon/PaimonCatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/analysis/paimon/PaimonCatalogOpExecutor.java@18 PS8, Line 18: analysis This class fits better under the catalog package, similarly to IcebergCatalogOpExecutor. http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/analysis/paimon/PaimonCatalogOpExecutor.java@21 PS8, Line 21: nit: unnecessary empty line http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/local/LocalPaimonTable.java File fe/src/main/java/org/apache/impala/catalog/local/LocalPaimonTable.java: http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/local/LocalPaimonTable.java@35 PS8, Line 35: Iceberg Paimon http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java File fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java: http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java@48 PS8, Line 48: /** : * Return paimon catalog from table properties : */ Method is missing for this comment. http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java: http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@180 PS8, Line 180: /** : * Add virtual columns for Paimon table : */ : private void addVirtualColumns() { : addVirtualColumn(VirtualColumn.INPUT_FILE_NAME); : addVirtualColumn(VirtualColumn.FILE_POSITION); : } Will the Paimon scanners support these virtual columns? http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@190 PS8, Line 190: <p> It doesn't have a closing </p> pair. http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@221 PS8, Line 221: { storageMetadataLoadTime_ = ctxStorageLdTime.stop(); } nit: please write the body in a new line. Same goes for L224. http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java: http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@174 PS8, Line 174: /** nit: Missing empty line before comment. There are several other instances of it in this file. http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@381 PS8, Line 381: setAvgColLen You probably want to invoke setMaxColLen() here. http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@396 PS8, Line 396: setAvgColLen Same as L381. http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@564 PS8, Line 564: TODO Please file a ticket to track this. http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/service/Frontend.java@242 PS8, Line 242: import java.io.IOException; : import java.util.ArrayList; : import java.util.Arrays; : import java.util.Collection; : import java.util.Collections; : import java.util.Comparator; : import java.util.Date; : import java.util.EnumSet; : import java.util.HashMap; : import java.util.HashSet; : import java.util.Iterator; : import java.util.List; : import java.util.Set; : import java.util.concurrent.Callable; : import java.util.concurrent.ExecutionException; : import java.util.concurrent.ExecutorService; : import java.util.concurrent.Executors; : import java.util.concurrent.Future; : import java.util.concurrent.TimeUnit; : import java.util.concurrent.atomic.AtomicReference; : import java.util.stream.Collectors; : : import javax.annotation.Nullable; Seems there are still a few import relocations. http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/test/java/org/apache/impala/catalog/paimon/ImpalaTypeUtilsTest.java File fe/src/test/java/org/apache/impala/catalog/paimon/ImpalaTypeUtilsTest.java: http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/test/java/org/apache/impala/catalog/paimon/ImpalaTypeUtilsTest.java@54 PS8, Line 54: Assert.assertTrue(true); Is it needed? http://gerrit.cloudera.org:8080/#/c/22914/8/testdata/workloads/functional-query/queries/QueryTest/show-create-table-paimon.test File testdata/workloads/functional-query/queries/QueryTest/show-create-table-paimon.test: http://gerrit.cloudera.org:8080/#/c/22914/8/testdata/workloads/functional-query/queries/QueryTest/show-create-table-paimon.test@77 PS8, Line 77: PRIMARY KEY (user_id) Is it possible to define the PRIMARY KEY like this? If so, we should add a test for this syntax. http://gerrit.cloudera.org:8080/#/c/22914/8/tests/custom_cluster/test_paimon.py File tests/custom_cluster/test_paimon.py: http://gerrit.cloudera.org:8080/#/c/22914/8/tests/custom_cluster/test_paimon.py@26 PS8, Line 26: nit: we only use +2 indent in Python code We only use +4 indent for continuation lines, e.g.: invoke_foo(long_param_list, more_params) Or we can indent those to the parantheses: invoke_foo(long_param_list, more_params) http://gerrit.cloudera.org:8080/#/c/22914/8/tests/custom_cluster/test_paimon.py@32 PS8, Line 32: @pytest.mark.execute_serially The test uses a unique_database, which means it is OK to run it in parallel with other tests. http://gerrit.cloudera.org:8080/#/c/22914/8/tests/custom_cluster/test_paimon.py@34 PS8, Line 34: use_db=u nit: 'usedb=' is not necessary here. -- To view, visit http://gerrit.cloudera.org:8080/22914 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef Gerrit-Change-Number: 22914 Gerrit-PatchSet: 8 Gerrit-Owner: ji chen <[email protected]> Gerrit-Reviewer: Impala Public Jenkins <[email protected]> Gerrit-Reviewer: Riza Suminto <[email protected]> Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]> Gerrit-Comment-Date: Thu, 05 Jun 2025 15:57:53 +0000 Gerrit-HasComments: Yes
