Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/22914 )

Change subject: IMPALA-14081: Support create/drop paimon table for impala
......................................................................


Patch Set 8:

(29 comments)

Thanks for working on it, this will be a great addition to Impala! The change 
is huge, so I'll need to go over it a few more times.

http://gerrit.cloudera.org:8080/#/c/22914/8//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/22914/8//COMMIT_MSG@9
PS8, Line 9: t
nit: capital 'T'


http://gerrit.cloudera.org:8080/#/c/22914/8//COMMIT_MSG@9
PS8, Line 9: pala.
nit: we limit commit text body to 72 chars.

See 
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65147115#ContributingtoImpala-Fix


http://gerrit.cloudera.org:8080/#/c/22914/8//COMMIT_MSG@37
PS8, Line 37: EXTERNAL
Is it required to have the EXTERNAL keyword here? Can't Impala create a brand 
new Paimon table in a hadoop catalog?


http://gerrit.cloudera.org:8080/#/c/22914/8/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/22914/8/bin/impala-config.sh@1244
PS8, Line 1244: echo "IMPALA_PAIMON_VERSION   = $IMPALA_PAIMON_VERSION"
nit: This line could come right after IMPALA_ICEBERG_VERSION, where we list the 
versions of libraries


http://gerrit.cloudera.org:8080/#/c/22914/8/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/22914/8/common/thrift/CatalogObjects.thrift@708
PS8, Line 708: R
nit: missing space


http://gerrit.cloudera.org:8080/#/c/22914/8/common/thrift/CatalogObjects.thrift@775
PS8, Line 775:   // Set if this is a paimon table
nit: Missing empty line before the comment


http://gerrit.cloudera.org:8080/#/c/22914/8/docs/topics/impala_paimon.xml
File docs/topics/impala_paimon.xml:

http://gerrit.cloudera.org:8080/#/c/22914/8/docs/topics/impala_paimon.xml@40
PS8, Line 40: Impala now supports experimentally for Apache Paimon
nit: Impala now adds experimental support for Apache Paimon?


http://gerrit.cloudera.org:8080/#/c/22914/8/docs/topics/impala_paimon.xml@42
PS8, Line 42: h
nit: capital 'H'


http://gerrit.cloudera.org:8080/#/c/22914/8/docs/topics/impala_paimon.xml@77
PS8, Line 77:           <li>
            :             Append table (no primary-key) provides large scale 
batch &amp; streaming processing
            :             capability. Automatic Small File Merge.
            :           </li>
            :           <li>
            :             Supports Data Compaction with z-order sorting to 
optimize file layout,
            :             provides fast queries based on data skipping using 
indexes such as minmax.
            :           </li>
Not sure we want to mention these while Impala doesn't support these operations.


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/cup/sql-parser.cup@2302
PS8, Line 2302:   | KW_STORED KW_BY storage_engine_val:storage_engine
For Iceberg we also allow STORED BY ICEBERG, I think we should also allow it 
for PAIMON. We just need to add PAIMON to storage_engine_val.


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/analysis/paimon/PaimonAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/paimon/PaimonAnalyzer.java:

http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/analysis/paimon/PaimonAnalyzer.java@92
PS8, Line 92:   /**
nit: missing line before the comment


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/analysis/paimon/PaimonAnalyzer.java@165
PS8, Line 165:     //       if (stmt.isExternalWithNoPurge() &&
             :     //       catalog.equals(TPaimonCatalog.HIVE_CATALOG)) {
             :     //            throw new AnalysisException("Cannot create 
EXTERNAL Paimon table in the
             :     //            " +
             :     //                    "Hive Catalog.");
             :     //        }
Please uncomment or remove.


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/analysis/paimon/PaimonCatalogOpExecutor.java
File 
fe/src/main/java/org/apache/impala/analysis/paimon/PaimonCatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/analysis/paimon/PaimonCatalogOpExecutor.java@18
PS8, Line 18: analysis
This class fits better under the catalog package, similarly to 
IcebergCatalogOpExecutor.


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/analysis/paimon/PaimonCatalogOpExecutor.java@21
PS8, Line 21:
nit: unnecessary empty line


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/local/LocalPaimonTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalPaimonTable.java:

http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/local/LocalPaimonTable.java@35
PS8, Line 35: Iceberg
Paimon


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java
File fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java:

http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/FePaimonTable.java@48
PS8, Line 48:   /**
            :    * Return paimon catalog from table properties
            :    */
Method is missing for this comment.


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java
File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java:

http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@180
PS8, Line 180:   /**
             :    * Add virtual columns for Paimon table
             :    */
             :   private void addVirtualColumns() {
             :     addVirtualColumn(VirtualColumn.INPUT_FILE_NAME);
             :     addVirtualColumn(VirtualColumn.FILE_POSITION);
             :   }
Will the Paimon scanners support these virtual columns?


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@190
PS8, Line 190: <p>
It doesn't have a closing </p> pair.


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonTable.java@221
PS8, Line 221: { storageMetadataLoadTime_ = ctxStorageLdTime.stop(); }
nit: please write the body in a new line. Same goes for L224.


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java
File fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java:

http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@174
PS8, Line 174:   /**
nit: Missing empty line before comment. There are several other instances of it 
in this file.


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@381
PS8, Line 381: setAvgColLen
You probably want to invoke setMaxColLen() here.


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@396
PS8, Line 396: setAvgColLen
Same as L381.


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/catalog/paimon/PaimonUtil.java@564
PS8, Line 564: TODO
Please file a ticket to track this.


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/main/java/org/apache/impala/service/Frontend.java@242
PS8, Line 242: import java.io.IOException;
             : import java.util.ArrayList;
             : import java.util.Arrays;
             : import java.util.Collection;
             : import java.util.Collections;
             : import java.util.Comparator;
             : import java.util.Date;
             : import java.util.EnumSet;
             : import java.util.HashMap;
             : import java.util.HashSet;
             : import java.util.Iterator;
             : import java.util.List;
             : import java.util.Set;
             : import java.util.concurrent.Callable;
             : import java.util.concurrent.ExecutionException;
             : import java.util.concurrent.ExecutorService;
             : import java.util.concurrent.Executors;
             : import java.util.concurrent.Future;
             : import java.util.concurrent.TimeUnit;
             : import java.util.concurrent.atomic.AtomicReference;
             : import java.util.stream.Collectors;
             :
             : import javax.annotation.Nullable;
Seems there are still a few import relocations.


http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/test/java/org/apache/impala/catalog/paimon/ImpalaTypeUtilsTest.java
File fe/src/test/java/org/apache/impala/catalog/paimon/ImpalaTypeUtilsTest.java:

http://gerrit.cloudera.org:8080/#/c/22914/8/fe/src/test/java/org/apache/impala/catalog/paimon/ImpalaTypeUtilsTest.java@54
PS8, Line 54:  Assert.assertTrue(true);
Is it needed?


http://gerrit.cloudera.org:8080/#/c/22914/8/testdata/workloads/functional-query/queries/QueryTest/show-create-table-paimon.test
File 
testdata/workloads/functional-query/queries/QueryTest/show-create-table-paimon.test:

http://gerrit.cloudera.org:8080/#/c/22914/8/testdata/workloads/functional-query/queries/QueryTest/show-create-table-paimon.test@77
PS8, Line 77: PRIMARY KEY (user_id)
Is it possible to define the PRIMARY KEY like this? If so, we should add a test 
for this syntax.


http://gerrit.cloudera.org:8080/#/c/22914/8/tests/custom_cluster/test_paimon.py
File tests/custom_cluster/test_paimon.py:

http://gerrit.cloudera.org:8080/#/c/22914/8/tests/custom_cluster/test_paimon.py@26
PS8, Line 26:
nit: we only use +2 indent in Python code

We only use +4 indent for continuation lines, e.g.:

  invoke_foo(long_param_list,
      more_params)

Or we can indent those to the parantheses:

  invoke_foo(long_param_list,
             more_params)


http://gerrit.cloudera.org:8080/#/c/22914/8/tests/custom_cluster/test_paimon.py@32
PS8, Line 32:     @pytest.mark.execute_serially
The test uses a unique_database, which means it is OK to run it in parallel 
with other tests.


http://gerrit.cloudera.org:8080/#/c/22914/8/tests/custom_cluster/test_paimon.py@34
PS8, Line 34: use_db=u
nit: 'usedb=' is not necessary here.



--
To view, visit http://gerrit.cloudera.org:8080/22914
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I57e77f28151e4a91353ef77050f9f0cd7d9d05ef
Gerrit-Change-Number: 22914
Gerrit-PatchSet: 8
Gerrit-Owner: ji chen <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Riza Suminto <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Comment-Date: Thu, 05 Jun 2025 15:57:53 +0000
Gerrit-HasComments: Yes

Reply via email to