[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala
wangsheng has uploaded a new patch set (#8). ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
......................................................................

IMPALA-9741: Supported query Icebreg table by impala

This patch mainly realizes the query of iceberg table through impala,
we can use the following sql to create an external iceberg table:

    CREATE EXTERNAL TABLE default.iceberg_test (
        level string,
        event_time timestamp,
        message string,
    )
    STORED AS ICEBERG
    LOCATION 'hdfs://xxx'
    TBLPROPERTIES ('iceberg_file_format'='parquet');

Or just including table name and location like this:

    CREATE EXTERNAL TABLE default.iceberg_test
    STORED AS ICEBERG
    LOCATION 'hdfs://xxx'
    TBLPROPERTIES ('iceberg_file_format'='parquet');

'iceberg_file_format' is the file format in iceberg, currently only
support PARQUET, other format would be supported in the future. And
if you don't identity this property in your SQL, default file format
is PARQUET.

We achieved this function by treating the iceberg table as normal
unpartitioned hdfs table. When query iceberg table, we pushdown
partition column predicates to iceberg to decided which data files
need to be scanned, and then transformed these information to BE to
do the real scan operation.

Testing:
- Unit test for Iceberg in FileMetadataLoaderTest
- Create table tests in functional_schema_template.sql
- Iceberg table query test in custom cluster test test_iceberg.py

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
---
M be/src/runtime/descriptors.cc
M bin/rat_exclude_files.txt
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M testdata/data/README
A testdata/data/iceberg_test/iceberg_non_partitioned/data/1-100-e1a80ed6-1064-494d-9cdd-c4a30c1ab8dc-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/3-102-511427f2-85f0-43ae-9b39-a456f8dc57b6-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/4-103-00fc55e1-6ef7-4241-ace2-6d075b9737fc-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/6-105-ef9e76d5-c060-4040-8aa1-b7c275610daa-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/7-106-c09c9c8d-9478-44f9-8501-f85f53112bc3-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/9-108-3b4f06ac-dca3-4f4e-be60-bf42d9927b5b-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00011-110-1e653ccf-0963-4fb0-941c-32c9de13268b-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00012-111-dfa70658-eb4b-4fa0-9ffa-b892cf90d6ac-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00014-113-2d16e751-e2a4-4856-ab89-145996e3815e-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00015-114-0f710621-cbbf-4509-a93d-b58808978e2e-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00017-116-0b666c79-53df-4507-906c-542e65a83443-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00019-118-1bc6bc6e-e061-4da3-9d1e-a427a306c471-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00020-119-ae7b2c67-1538-4429-8246-4998960e3817-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00022-121-8db0f1e1-d88c-4aad-a8b3-24fd07329cdb-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00023-122-de57b6b0-f54b-40ac-85cd-e783505094b6-0.parquet
A
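The commit message above says that partition column predicates are pushed down to Iceberg to decide which data files need to be scanned, and that the result is then handed to the BE. The idea can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `FilePruningSketch`, `DataFile`, and `filesToScan` are hypothetical names, only equality predicates on identity-partitioned columns are modeled, and it uses neither the Iceberg API nor the code from this patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class FilePruningSketch {
  /** Hypothetical data file entry: a path plus its identity-partition values. */
  static final class DataFile {
    final String path;
    final Map<String, String> partitionValues;

    DataFile(String path, Map<String, String> partitionValues) {
      this.path = path;
      this.partitionValues = partitionValues;
    }
  }

  /** Returns the paths of files whose partition values satisfy every equality predicate. */
  static List<String> filesToScan(List<DataFile> files, Map<String, String> predicates) {
    List<String> paths = new ArrayList<>();
    for (DataFile file : files) {
      boolean matches = true;
      for (Map.Entry<String, String> predicate : predicates.entrySet()) {
        String actual = file.partitionValues.get(predicate.getKey());
        if (!predicate.getValue().equals(actual)) {
          matches = false;
          break;
        }
      }
      if (matches) paths.add(file.path);
    }
    return paths;
  }

  public static void main(String[] args) {
    List<DataFile> files = Arrays.asList(
        new DataFile("data/a.parquet", Collections.singletonMap("action", "click")),
        new DataFile("data/b.parquet", Collections.singletonMap("action", "view")));
    // Only the file written for action = 'view' stays in the scan list.
    System.out.println(filesToScan(files, Collections.singletonMap("action", "view")));
  }
}
```

With an empty predicate map every file is kept, which corresponds to a full scan of the table treated as an unpartitioned HDFS table.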
[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala
wangsheng has posted comments on this change. ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
......................................................................

Patch Set 8:

(13 comments)

Done!

http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@512
PS6, Line 512: column_to_sourc
> nit: column_to_source_id ?
Done

http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@515
PS6, Line 515: source_id_to_partition
> The mapping is reversed. Name it "source_id_to_partition" ?
Done

http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@516
PS6, Line 516: map path_md5_to_file
> Please follow the above conventions for naming maps.
Done

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java@28
PS6, Line 28:   // The id of the source column in the Iceberg table schema. The source column is
            :   // used as the input for this partition field.
> Might be worth rewording it a bit:
Done

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java@88
PS6, Line 88:       if (table_ instanceof FeIcebergTable) {
            :         if (((FeIcebergTable) table_).getSourceIdToPartitionMap().isEmpty()) {
            :           notPartitioned = true;
            :         }
> Probably we should treat all Iceberg tables as not partitioned, since its
Yes, you are right: we treat the Iceberg table as an unpartitioned HDFS table, but an Iceberg table still has its own partition info. We can get this info with 'show partitions xxx.iceberg_table_test', like this:

+--------------+-----------+----------+------------+---------------------------+
| Partition Id | Source Id | Field Id | Field Name | Field Partition Transform |
+--------------+-----------+----------+------------+---------------------------+
| 0            | 2         | 1000     | sex        | IDENTITY                  |
| 0            | 3         | 1001     | action     | IDENTITY                  |
+--------------+-----------+----------+------------+---------------------------+

If I set 'notPartitioned' to true even when getPartitionColToSourceIdMap() is not empty, how can I get the Iceberg partition info? 'show partitions xxx.iceberg_table_test' will always return AnalysisException.

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@66
PS6, Line 66: getPathMD5ToFi
> nit: getPartitionToFileDescMap
Done

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@219
PS6, Line 219: isPartitioned(Fe
> nit: isPartitioned?
Done

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@258
PS6, Line 258: PartitionColToSourceId
> It returns a mapping from source ids to partition columns, therefore please
Done

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@271
PS6, Line 271: getColumnToSourc
> nit: getColumnToSourceIdMap?
Done

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@305
PS6, Line 305:
> nit: wrong indentation
Done

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@114
PS6, Line 114:     if ("PARQUET".equalsIgnoreCase(format)) return TIcebergFileFormat.PARQUET;
             :     return null;
             :   }
             :
             :   /**
             :    * Build TIceb
> How about:
Done

http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py@193
PS6, Line 193: }
> You probably don't need to modify this file. I think adding HUDIPARQUET to
Done

http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py@766
PS6, Line 766:
> flake8: E501 line too
[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
......................................................................

Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6612/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 7
Gerrit-Owner: wangsheng
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Reviewer: wangsheng
Gerrit-Comment-Date: Thu, 16 Jul 2020 05:13:55 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala
wangsheng has uploaded a new patch set (#7). ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
......................................................................

Patch Set 6:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG@7
PS6, Line 7: Icebreg
It's still misspelled

http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG@7
PS6, Line 7: Supported query
nit: Support querying

http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG@26
PS6, Line 26: identity
specify

http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@512
PS6, Line 512: source_cols_map
nit: column_to_source_id ?

http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@515
PS6, Line 515: partition_col_to_source_id_map
The mapping is reversed. Name it "source_id_to_partition" ?

http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@516
PS6, Line 516: map file_descriptors
Please follow the above conventions for naming maps.

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java@28
PS6, Line 28:   // The id of the source field in iceberg table Schema, you can get these source
            :   // fields by Schema.columns(), the return type is List.
Might be worth rewording it a bit:

"The id of the source column in the Iceberg table schema. The source column is used as the input for this partition field."

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java@88
PS6, Line 88:       if (table_ instanceof FeIcebergTable) {
            :         if (((FeIcebergTable) table_).getPartitionColToSourceIdMap().isEmpty()) {
            :           notPartitioned = true;
            :         }
Probably we should treat all Iceberg tables as not partitioned, since its partitioning is different from other file system tables' partitioning.

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@66
PS6, Line 66: getFileDescMap
nit: getPartitionToFileDescMap

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@219
PS6, Line 219: isPartitionTable
nit: isPartitioned?

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@258
PS6, Line 258: PartitionColToSourceId
It returns a mapping from source ids to partition columns, therefore please name it "sourceIdToPartitionCol".

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@271
PS6, Line 271: getSourceColsMap
nit: getColumnToSourceIdMap?

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@305
PS6, Line 305:
nit: wrong indentation

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@114
PS6, Line 114:     if (format == null) return null;
             :     format = format.toUpperCase();
             :     if (format.equals("PARQUET")) {
             :       return TIcebergFileFormat.PARQUET;
             :     }
             :     return null;
How about:

    if ("PARQUET".equalsIgnoreCase(format)) return TIcebergFileFormat.PARQUET;
    return null;

http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py@193
PS6, Line 193: 'iceberg': 'ICEBERG'
You probably don't need to modify this file. I think adding HUDIPARQUET to this file was also unnecessary.

Probably we can do the same thing that we did for Hudi, i.e. add the Iceberg tables under the functional_parquet database.

https://gerrit.cloudera.org/c/14711/25/testdata/datasets/functional/schema_constraints.csv
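The rewrite suggested for IcebergUtil.java in this review works because `String.equalsIgnoreCase` returns false for a null argument, so calling it on the literal makes both the explicit null check and the `toUpperCase()` call unnecessary. A minimal sketch follows; `IcebergFormatSketch` and its nested `FileFormat` enum are hypothetical stand-ins for the real method and for the Thrift-generated `TIcebergFileFormat`, not the code from the patch.

```java
final class IcebergFormatSketch {
  /** Hypothetical stand-in for the Thrift-generated TIcebergFileFormat enum. */
  enum FileFormat { PARQUET }

  /**
   * Returns PARQUET for any casing of "parquet" and null otherwise.
   * equalsIgnoreCase(null) is false, so null input needs no separate check.
   */
  static FileFormat parse(String format) {
    if ("PARQUET".equalsIgnoreCase(format)) return FileFormat.PARQUET;
    return null;
  }

  public static void main(String[] args) {
    System.out.println(parse("parquet"));
    System.out.println(parse(null));
    System.out.println(parse("orc"));
  }
}
```

Putting the constant on the left is the usual way to make string comparisons null-safe in Java without an extra branch.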
[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
......................................................................

Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6604/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 6
Gerrit-Owner: wangsheng
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Reviewer: wangsheng
Gerrit-Comment-Date: Wed, 15 Jul 2020 08:32:39 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
......................................................................

Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py@766
PS6, Line 766: n
flake8: E501 line too long (94 > 90 characters)

http://gerrit.cloudera.org:8080/#/c/16143/6/tests/common/test_dimensions.py
File tests/common/test_dimensions.py:

http://gerrit.cloudera.org:8080/#/c/16143/6/tests/common/test_dimensions.py@32
PS6, Line 32: c
flake8: E501 line too long (98 > 90 characters)

--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 6
Gerrit-Owner: wangsheng
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Reviewer: wangsheng
Gerrit-Comment-Date: Wed, 15 Jul 2020 08:05:11 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala
wangsheng has uploaded a new patch set (#6). ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
......................................................................

IMPALA-9741: Supported query Icebreg table by impala

This patch mainly realizes the query of iceberg table through impala,
we can use the following sql to create an external iceberg table:

    CREATE EXTERNAL TABLE default.iceberg_test (
        level string,
        event_time timestamp,
        message string,
    )
    STORED AS ICEBERG
    LOCATION 'hdfs://xxx'
    TBLPROPERTIES ('iceberg_file_format'='parquet');

Or just including table name and location like this:

    CREATE EXTERNAL TABLE default.iceberg_test
    STORED AS ICEBERG
    LOCATION 'hdfs://xxx'
    TBLPROPERTIES ('iceberg_file_format'='parquet');

'iceberg_file_format' is the file format in iceberg, currently only
support PARQUET, other format would be supported in the future. And
if you don't identity this property in your SQL, default file format
is PARQUET.

We achieved this function by treating the iceberg table as normal
unpartitioned hdfs table. When query iceberg table, we pushdown
partition column predicates to iceberg to decided which data files
need to be scanned, and then transformed these information to BE to
do the real scan operation.

Testing:
- Unit test for Iceberg in FileMetadataLoaderTest
- Create table tests in functional_schema_template.sql
- Iceberg table query test in custom cluster test test_iceberg.py

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
---
M be/src/runtime/descriptors.cc
M bin/rat_exclude_files.txt
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M testdata/bin/generate-schema-statements.py
M testdata/data/README
A testdata/data/iceberg_test/iceberg_non_partitioned/data/1-100-e1a80ed6-1064-494d-9cdd-c4a30c1ab8dc-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/3-102-511427f2-85f0-43ae-9b39-a456f8dc57b6-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/4-103-00fc55e1-6ef7-4241-ace2-6d075b9737fc-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/6-105-ef9e76d5-c060-4040-8aa1-b7c275610daa-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/7-106-c09c9c8d-9478-44f9-8501-f85f53112bc3-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/9-108-3b4f06ac-dca3-4f4e-be60-bf42d9927b5b-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00011-110-1e653ccf-0963-4fb0-941c-32c9de13268b-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00012-111-dfa70658-eb4b-4fa0-9ffa-b892cf90d6ac-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00014-113-2d16e751-e2a4-4856-ab89-145996e3815e-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00015-114-0f710621-cbbf-4509-a93d-b58808978e2e-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00017-116-0b666c79-53df-4507-906c-542e65a83443-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00019-118-1bc6bc6e-e061-4da3-9d1e-a427a306c471-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00020-119-ae7b2c67-1538-4429-8246-4998960e3817-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00022-121-8db0f1e1-d88c-4aad-a8b3-24fd07329cdb-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00023-122-de57b6b0-f54b-40ac-85cd-e783505094b6-0.parquet
A
[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
......................................................................

Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6593/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 5
Gerrit-Owner: wangsheng
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Reviewer: wangsheng
Gerrit-Comment-Date: Tue, 14 Jul 2020 15:18:10 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala
wangsheng has uploaded a new patch set (#5). ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..

Patch Set 4:

(10 comments)

Thanks for working on this, it will be a really great addition to Impala!

http://gerrit.cloudera.org:8080/#/c/16143/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16143/4//COMMIT_MSG@7
PS4, Line 7: icebreg
Iceberg

http://gerrit.cloudera.org:8080/#/c/16143/4//COMMIT_MSG@26
PS4, Line 26: PARQUT
PARQUET

http://gerrit.cloudera.org:8080/#/c/16143/4//COMMIT_MSG@27
PS4, Line 27:
Please add a high-level description about what this patch does.

http://gerrit.cloudera.org:8080/#/c/16143/4/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/16143/4/common/thrift/CatalogObjects.thrift@510
PS4, Line 510: source_cols_map
please add some comment about the fields

http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java@86
PS4, Line 86: boolean flag = op_ == TShowStatsOp.PARTITIONS ? table_ instanceof FeIcebergTable ?
            : ((FeIcebergTable) table_).getPartitionColToSourceIdMap().isEmpty() :
            : table_.getNumClusteringCols() == 0 : false;
nit: for readability, please use if statements instead of nested ternary operators

http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@69
PS4, Line 69: transfromed
transformed

http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@71
PS4, Line 71: getFeFsTable
Maybe rename to getHdfsTable() ?

http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@115
PS4, Line 115: toUpperCase();
             : if (format.equalsIgnoreCase
nit: toUpperCase() or equalsIgnoreCase() is not needed.

http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@246
PS4, Line 246: List dataFileList = new ArrayList<>();
             : for (FileScanTask task : scan.planFiles()) {
             : dataFileList.add(task.file());
             : }
             : return dataFileList;
nit: return Lists.newArrayList(scan.planFiles());

http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/test/java/org/apache/impala/customcluster/IcebergTableQueryTest.java
File fe/src/test/java/org/apache/impala/customcluster/IcebergTableQueryTest.java:

http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/test/java/org/apache/impala/customcluster/IcebergTableQueryTest.java@42
PS4, Line 42: /**
            : * Test impala query iceberg table
            : * impala not supported insert into iceberg table now, so we construct iceberg
            : * table by iceberg api
            : */
Instead of writing the Iceberg table each time, can we just check it into the repository then copy it to the HDFS warehouse directory during data loading?

We did something similar with Apache Hudi: https://gerrit.cloudera.org/c/14711/

After that you could create end-to-end tests in Python and in ".test" files. E.g.:
https://github.com/apache/impala/blob/65722d3e9051d6a08cb1e69fd36a06684745c226/tests/query_test/test_scanners.py#L326-L340
https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/hudi-parquet.test

--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 4
Gerrit-Owner: wangsheng
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Fri, 10 Jul 2020 14:09:12 +
Gerrit-HasComments: Yes
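
For illustration, the nested ternary quoted at ShowStatsStmt.java line 86 in the review above could be unfolded into if statements roughly as follows. This is only a sketch under assumptions, not the code that landed in the patch: `Op`, `Table`, and `IcebergTable` are hypothetical stand-ins for `TShowStatsOp`, the table type behind `table_`, and `FeIcebergTable`, and the method names `nestedTernary` and `withIfStatements` are invented for the comparison.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-ins for the Impala frontend types named in the review.
class ShowStatsFlagSketch {
  enum Op { PARTITIONS, TABLE_STATS }

  static class Table {
    private final int numClusteringCols;
    Table(int numClusteringCols) { this.numClusteringCols = numClusteringCols; }
    int getNumClusteringCols() { return numClusteringCols; }
  }

  static class IcebergTable extends Table {
    private final Map<String, Integer> partitionColToSourceIdMap;
    IcebergTable(Map<String, Integer> partitionColToSourceIdMap) {
      super(0);
      this.partitionColToSourceIdMap = partitionColToSourceIdMap;
    }
    Map<String, Integer> getPartitionColToSourceIdMap() {
      return partitionColToSourceIdMap;
    }
  }

  // Shape of the expression quoted in the review (nested ternary).
  static boolean nestedTernary(Op op, Table table) {
    return op == Op.PARTITIONS
        ? table instanceof IcebergTable
            ? ((IcebergTable) table).getPartitionColToSourceIdMap().isEmpty()
            : table.getNumClusteringCols() == 0
        : false;
  }

  // Same logic with early returns, as the reviewer suggests.
  static boolean withIfStatements(Op op, Table table) {
    if (op != Op.PARTITIONS) return false;
    if (table instanceof IcebergTable) {
      return ((IcebergTable) table).getPartitionColToSourceIdMap().isEmpty();
    }
    return table.getNumClusteringCols() == 0;
  }

  public static void main(String[] args) {
    Table[] tables = {
        new Table(0),
        new Table(2),
        new IcebergTable(Collections.<String, Integer>emptyMap()),
        new IcebergTable(Collections.singletonMap("event_time", 2)),
    };
    // Check that both forms agree on every combination of inputs.
    for (Op op : Op.values()) {
      for (Table t : tables) {
        if (nestedTernary(op, t) != withIfStatements(op, t)) {
          throw new AssertionError("forms disagree for " + op);
        }
      }
    }
    System.out.println("both forms agree");
  }
}
```

The early-return form makes each case (wrong operation, Iceberg table, plain HDFS table) readable on its own line, which is the readability gain the review comment asks for.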
[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..

Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6549/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 4
Gerrit-Owner: wangsheng
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Fri, 10 Jul 2020 12:54:17 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala
wangsheng has uploaded a new patch set (#4). ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..

IMPALA-9741: Supported query icebreg table by impala

This patch mainly realizes the query of iceberg table through impala, we can use the following sql to create an external iceberg table:

    CREATE EXTERNAL TABLE default.iceberg_test (
        level string,
        event_time timestamp,
        message string,
    )
    STORED AS ICEBERG
    LOCATION 'hdfs://xxx'
    TBLPROPERTIES ('iceberg_file_format'='parquet');

Or just including table name and location like this:

    CREATE EXTERNAL TABLE default.iceberg_test
    STORED AS ICEBERG
    LOCATION 'hdfs://xxx'
    TBLPROPERTIES ('iceberg_file_format'='parquet');

'iceberg_file_format' is the file format in iceberg, currently support PARQUET and ORC. And if you don't identity this property in your SQL, default file format is PARQUT.

Testing:
- Add fe test IcebergTableQueryTest.java

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
---
M be/src/runtime/descriptors.cc
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A fe/src/test/java/org/apache/impala/customcluster/IcebergTableQueryTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg_create.test
25 files changed, 1,148 insertions(+), 158 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/16143/4

--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 4
Gerrit-Owner: wangsheng
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoltan Borok-Nagy
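
The patch set 4 commit message above says that 'iceberg_file_format' accepts PARQUET and ORC and defaults to Parquet when the property is absent. A minimal sketch of that lookup follows, assuming a plain `Map` of table properties. The class name, the `FileFormat` enum, and `resolveFileFormat` are hypothetical and are not taken from `IcebergUtil`; treating an empty value as absent and throwing on an unknown value are assumptions as well. The sketch uses `equalsIgnoreCase()` alone, since the review of this patch set notes that combining it with `toUpperCase()` is redundant.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; names are illustrative, not Impala's actual API.
class IcebergFileFormatSketch {
  enum FileFormat { PARQUET, ORC }

  // Property key as it appears in the TBLPROPERTIES clause of the commit message.
  static final String ICEBERG_FILE_FORMAT = "iceberg_file_format";

  static FileFormat resolveFileFormat(Map<String, String> tblProperties) {
    String format = tblProperties.get(ICEBERG_FILE_FORMAT);
    // Default to Parquet when the property was not specified
    // (an empty value is treated as absent; that part is an assumption).
    if (format == null || format.isEmpty()) return FileFormat.PARQUET;
    // A case-insensitive comparison is enough; no toUpperCase() needed.
    if (format.equalsIgnoreCase("PARQUET")) return FileFormat.PARQUET;
    if (format.equalsIgnoreCase("ORC")) return FileFormat.ORC;
    // Assumption: unknown formats are rejected rather than silently defaulted.
    throw new IllegalArgumentException("Unsupported Iceberg file format: " + format);
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    System.out.println(resolveFileFormat(props));
    props.put(ICEBERG_FILE_FORMAT, "parquet");
    System.out.println(resolveFileFormat(props));
    props.put(ICEBERG_FILE_FORMAT, "Orc");
    System.out.println(resolveFileFormat(props));
  }
}
```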
[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..

Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6537/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 3
Gerrit-Owner: wangsheng
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Thu, 09 Jul 2020 03:11:41 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala
wangsheng has uploaded a new patch set (#3). ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..

IMPALA-9741: Supported query icebreg table by impala

This patch mainly realizes the query of iceberg table through impala, we can use the following sql to create an external iceberg table:

    CREATE EXTERNAL TABLE default.iceberg_test (
        level string,
        event_time timestamp,
        message string,
    )
    STORED AS ICEBERG
    LOCATION 'hdfs://xxx'
    TBLPROPERTIES ('iceberg_file_format'='parquet');

Or just including table name and location like this:

    CREATE EXTERNAL TABLE default.iceberg_test
    STORED AS ICEBERG
    LOCATION 'hdfs://xxx'
    TBLPROPERTIES ('iceberg_file_format'='parquet');

'iceberg_file_format' is the file format in iceberg, currently support PARQUET and ORC. And if you don't identity this property in your SQL, default file format is PARQUT.

Testing:
- Add fe test IcebergTableQueryTest.java

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
---
M be/src/runtime/descriptors.cc
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A fe/src/test/java/org/apache/impala/customcluster/IcebergTableQueryTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg_create.test
25 files changed, 1,082 insertions(+), 155 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/16143/3

--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 3
Gerrit-Owner: wangsheng
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..

Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6496/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Mon, 06 Jul 2020 08:01:42 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..

Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6495/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests.

--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Comment-Date: Mon, 06 Jul 2020 07:49:19 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala
wangsheng has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..

IMPALA-9741: Supported query icebreg table by impala

This patch mainly realizes the query of iceberg table through impala, we can use the following sql to create an external iceberg table:

    CREATE EXTERNAL TABLE default.iceberg_test (
        level string,
        event_time timestamp,
        message string,
    )
    STORED AS ICEBERG
    LOCATION 'hdfs://xxx'
    TBLPROPERTIES ('iceberg_file_format'='parquet');

Or just including table name and location like this:

    CREATE EXTERNAL TABLE default.iceberg_test
    STORED AS ICEBERG
    LOCATION 'hdfs://xxx'
    TBLPROPERTIES ('iceberg_file_format'='parquet');

'iceberg_file_format' is the file format in iceberg, currently support PARQUET and ORC. And if you don't identity this property in your SQL, default file format is PARQUT.

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
---
M be/src/runtime/descriptors.cc
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg_create.test
23 files changed, 852 insertions(+), 135 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/16143/2

--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..

Patch Set 1:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
File fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java:

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java@80
PS1, Line 80: "SHOW FILES not applicable to a non hdfs table and non iceberg table: %s", tableName_));
line too long (98 > 90)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java@84
PS1, Line 84: // There two cases here: Non-partitioned hdfs table and non-partitioned iceberg table
line too long (91 > 90)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@249
PS1, Line 249: public static Map getPartitionColToSourceIdMap(List specs) {
line too long (103 > 90)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@286
PS1, Line 286: tIcebergTable.setPartition_col_to_source_id_map(icebergTable.getPartitionColToSourceIdMap());
line too long (99 > 90)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@309
PS1, Line 309: private static HdfsPartition.FileDescriptor getFileDescriptor(FileSystem fs, Path tableLoc,
line too long (95 > 90)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@322
PS1, Line 322: return HdfsPartition.FileDescriptor.create(fileStatus, relPath, locations, hostIndex,
line too long (91 > 90)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@329
PS1, Line 329: public static Map loadAllPartition(String location,
line too long (93 > 90)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@337
PS1, Line 337: HdfsPartition.FileDescriptor fileDesc = getFileDescriptor(new Path(file.path().toString()),
line too long (99 > 90)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@368
PS1, Line 368: partition.setFileFormat(IcebergUtil.toTHdfsFileFormat(icebergTable.getIcebergFileFormat()));
line too long (100 > 90)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java:

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@176
PS1, Line 176: table_.getMetaStoreTable().getParameters().get(IcebergTable.ICEBERG_FILE_FORMAT);
line too long (91 > 90)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@71
PS1, Line 71: Map partitionColToSourceIdMap = Utils.getPartitionColToSourceIdMap(partitionSpecs);
line too long (104 > 90)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@78
PS1, Line 78: ColumnMap cmap, List partitionSpecs, Map sourceColsMap,
line too long (100 > 90)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@161
PS1, Line 161: localFsTable_.createPrototypePartition(), CatalogObject.ThriftObjectType.DESCRIPTOR_ONLY);
line too long (98 > 90)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@162
PS1, Line 162: THdfsTable hdfsTable = new THdfsTable(localFsTable_.getHdfsBaseDir(), getColumnNames(),
line too long (91 > 90)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@163
PS1, Line 163: localFsTable_.getNullPartitionKeyValue(), FeFsTable.DEFAULT_NULL_COLUMN_VALUE, idToPartition,
line too long (101 > 90)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File
[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala
wangsheng has uploaded this change for review. ( http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..

IMPALA-9741: Supported query icebreg table by impala

This patch mainly realizes the query of iceberg table through impala, we can use the following sql to create an external iceberg table:

    CREATE EXTERNAL TABLE default.iceberg_test (
        level string,
        event_time timestamp,
        message string,
    )
    STORED AS ICEBERG
    LOCATION 'hdfs://xxx'
    TBLPROPERTIES ('iceberg_file_format'='parquet');

Or just including table name and location like this:

    CREATE EXTERNAL TABLE default.iceberg_test
    STORED AS ICEBERG
    LOCATION 'hdfs://xxx'
    TBLPROPERTIES ('iceberg_file_format'='parquet');

'iceberg_file_format' is the file format in iceberg, currently support PARQUET and ORC. And if you don't identity this property in your SQL, default file format is PARQUT.

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
---
M be/src/runtime/descriptors.cc
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg_create.test
23 files changed, 842 insertions(+), 135 deletions(-)

git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/16143/1

--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng