[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala

2020-07-16 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#8). ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
..

IMPALA-9741: Supported query Icebreg table by impala

This patch mainly realizes the query of iceberg table through impala,
we can use the following sql to create an external iceberg table:
CREATE EXTERNAL TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
)
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
Or just including table name and location like this:
CREATE EXTERNAL TABLE default.iceberg_test
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
'iceberg_file_format' is the file format in iceberg, currently only
support PARQUET, other format would be supported in the future. And
if you don't identity this property in your SQL, default file format
is PARQUET.

We achieved this function by treating the iceberg table as normal
unpartitioned hdfs table. When query iceberg table, we pushdown
partition column predicates to iceberg to decided which data files
need to be scanned, and then transformed these information to BE to
do the real scan operation.
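
To make the pruning step above concrete, here is a minimal Java sketch of choosing data files from a partition-column predicate. It is only an illustration under stated assumptions: in the patch this decision is delegated to Iceberg, and the DataFile type, the filesToScan helper, and the sample paths below are all hypothetical names that do not appear in Impala or Iceberg.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch; none of these types exist in Impala or Iceberg.
final class PartitionPruningSketch {
  // A data file together with the partition values it was written under.
  static final class DataFile {
    final String path;
    final Map<String, String> partitionValues;

    DataFile(String path, Map<String, String> partitionValues) {
      this.path = path;
      this.partitionValues = partitionValues;
    }
  }

  // Keeps only the paths of files whose partition values satisfy the predicate.
  static List<String> filesToScan(List<DataFile> files,
      Predicate<Map<String, String>> partitionPredicate) {
    List<String> result = new ArrayList<>();
    for (DataFile f : files) {
      if (partitionPredicate.test(f.partitionValues)) result.add(f.path);
    }
    return result;
  }

  // Made-up files, partitioned by an identity transform on "action".
  static List<DataFile> sampleFiles() {
    return Arrays.asList(
        new DataFile("data/a.parquet", Collections.singletonMap("action", "click")),
        new DataFile("data/b.parquet", Collections.singletonMap("action", "view")),
        new DataFile("data/c.parquet", Collections.singletonMap("action", "click")));
  }

  public static void main(String[] args) {
    // Rough equivalent of WHERE action = 'click' on the partition column.
    List<String> paths =
        filesToScan(sampleFiles(), p -> "click".equals(p.get("action")));
    System.out.println(paths);
  }
}
```

Only the surviving paths would then be handed to the backend as scan ranges, which is why the table itself can be treated as an unpartitioned HDFS table.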

Testing:
- Unit test for Iceberg in FileMetadataLoaderTest
- Create table tests in functional_schema_template.sql
- Iceberg table query test in custom cluster test test_iceberg.py

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
---
M be/src/runtime/descriptors.cc
M bin/rat_exclude_files.txt
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M testdata/data/README
A testdata/data/iceberg_test/iceberg_non_partitioned/data/1-100-e1a80ed6-1064-494d-9cdd-c4a30c1ab8dc-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/3-102-511427f2-85f0-43ae-9b39-a456f8dc57b6-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/4-103-00fc55e1-6ef7-4241-ace2-6d075b9737fc-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/6-105-ef9e76d5-c060-4040-8aa1-b7c275610daa-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/7-106-c09c9c8d-9478-44f9-8501-f85f53112bc3-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/9-108-3b4f06ac-dca3-4f4e-be60-bf42d9927b5b-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00011-110-1e653ccf-0963-4fb0-941c-32c9de13268b-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00012-111-dfa70658-eb4b-4fa0-9ffa-b892cf90d6ac-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00014-113-2d16e751-e2a4-4856-ab89-145996e3815e-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00015-114-0f710621-cbbf-4509-a93d-b58808978e2e-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00017-116-0b666c79-53df-4507-906c-542e65a83443-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00019-118-1bc6bc6e-e061-4da3-9d1e-a427a306c471-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00020-119-ae7b2c67-1538-4429-8246-4998960e3817-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00022-121-8db0f1e1-d88c-4aad-a8b3-24fd07329cdb-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00023-122-de57b6b0-f54b-40ac-85cd-e783505094b6-0.parquet
A 

[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala

2020-07-16 Thread wangsheng (Code Review)
wangsheng has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
..


Patch Set 8:

(13 comments)

Done!

http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@512
PS6, Line 512: column_to_sourc
> nit: column_to_source_id ?
Done


http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@515
PS6, Line 515: source_id_to_partition
> The mapping is reversed. Name it "source_id_to_partition" ?
Done


http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@516
PS6, Line 516: map path_md5_to_file
> Please follow the above conventions for naming maps.
Done


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java@28
PS6, Line 28:   // The id of the source column in the Iceberg table schema. The source column is
:   // used as the input for this partition field.
> Might be worth rewording it a bit:
Done


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java@88
PS6, Line 88: if (table_ instanceof FeIcebergTable) {
:   if (((FeIcebergTable) table_).getSourceIdToPartitionMap().isEmpty()) {
:     notPartitioned = true;
:   }
> Probably we should treat all Iceberg tables as not partitioned, since it's
Yes, you are right, we treat an Iceberg table as an unpartitioned HDFS table,
but an Iceberg table still has its own partition info, which we can get with
'show partitions xxx.iceberg_table_test', like this:

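+--------------+-----------+----------+------------+---------------------------+
| Partition Id | Source Id | Field Id | Field Name | Field Partition Transform |
+--------------+-----------+----------+------------+---------------------------+
| 0            | 2         | 1000     | sex        | IDENTITY                  |
| 0            | 3         | 1001     | action     | IDENTITY                  |
+--------------+-----------+----------+------------+---------------------------+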

If I set 'notPartitioned' to true even when getPartitionColToSourceIdMap() is
not empty, how can I get the Iceberg partition info? 'show partitions
xxx.iceberg_table_test' will always throw an AnalysisException.
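
The point of the reply can be sketched in Java roughly as follows: the rows quoted above are kept in a map keyed by source column id, and the map being empty is what would distinguish a table with no Iceberg partition info. This is only a sketch under stated assumptions; PartitionField, sourceIdToPartition, and hasIcebergPartitionInfo are hypothetical names and not the classes or methods from the patch.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; these are not the classes from the patch.
final class ShowPartitionsSketch {
  // One row of the 'show partitions' output quoted above.
  static final class PartitionField {
    final int sourceId;
    final int fieldId;
    final String fieldName;
    final String transform;

    PartitionField(int sourceId, int fieldId, String fieldName, String transform) {
      this.sourceId = sourceId;
      this.fieldId = fieldId;
      this.fieldName = fieldName;
      this.transform = transform;
    }
  }

  // Keyed by source column id, so the name follows the key-to-value convention.
  static Map<Integer, PartitionField> sourceIdToPartition(List<PartitionField> fields) {
    Map<Integer, PartitionField> result = new LinkedHashMap<>();
    for (PartitionField f : fields) result.put(f.sourceId, f);
    return result;
  }

  // True when there is Iceberg partition info that a SHOW statement could list.
  static boolean hasIcebergPartitionInfo(Map<Integer, PartitionField> sourceIdToPartition) {
    return !sourceIdToPartition.isEmpty();
  }

  public static void main(String[] args) {
    List<PartitionField> fields = Arrays.asList(
        new PartitionField(2, 1000, "sex", "IDENTITY"),
        new PartitionField(3, 1001, "action", "IDENTITY"));
    System.out.println(hasIcebergPartitionInfo(sourceIdToPartition(fields)));
    System.out.println(hasIcebergPartitionInfo(
        sourceIdToPartition(Collections.<PartitionField>emptyList())));
  }
}
```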


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@66
PS6, Line 66: getPathMD5ToFi
> nit: getPartitionToFileDescMap
Done


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@219
PS6, Line 219: isPartitioned(Fe
> nit: isPartitioned?
Done


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@258
PS6, Line 258: PartitionColToSourceId
> It returns a mapping from source ids to partition columns, therefore please
Done


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@271
PS6, Line 271: getColumnToSourc
> nit: getColumnToSourceIdMap?
Done


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@305
PS6, Line 305:   
> nit: wrong indentation
Done


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@114
PS6, Line 114: if ("PARQUET".equalsIgnoreCase(format)) return TIcebergFileFormat.PARQUET;
 : return null;
 :   }
 :
 :   /**
 :* Build TIceb
> How about:
Done


http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py@193
PS6, Line 193:   }
> You probably don't need to modify this file. I think adding HUDIPARQUET to
Done


http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py@766
PS6, Line 766:
> flake8: E501 line too 

[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
..


Patch Set 7:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6612/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.


--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 7
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Thu, 16 Jul 2020 05:13:55 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala

2020-07-15 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#7). ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
..

IMPALA-9741: Supported query Icebreg table by impala

This patch mainly realizes the query of iceberg table through impala,
we can use the following sql to create an external iceberg table:
CREATE EXTERNAL TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
)
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
Or just including table name and location like this:
CREATE EXTERNAL TABLE default.iceberg_test
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
'iceberg_file_format' is the file format in iceberg, currently only
support PARQUET, other format would be supported in the future. And
if you don't identity this property in your SQL, default file format
is PARQUET.

We achieved this function by treating the iceberg table as normal
unpartitioned hdfs table. When query iceberg table, we pushdown
partition column predicates to iceberg to decided which data files
need to be scanned, and then transformed these information to BE to
do the real scan operation.

Testing:
- Unit test for Iceberg in FileMetadataLoaderTest
- Create table tests in functional_schema_template.sql
- Iceberg table query test in custom cluster test test_iceberg.py

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
---
M be/src/runtime/descriptors.cc
M bin/rat_exclude_files.txt
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M testdata/data/README
A testdata/data/iceberg_test/iceberg_non_partitioned/data/1-100-e1a80ed6-1064-494d-9cdd-c4a30c1ab8dc-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/3-102-511427f2-85f0-43ae-9b39-a456f8dc57b6-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/4-103-00fc55e1-6ef7-4241-ace2-6d075b9737fc-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/6-105-ef9e76d5-c060-4040-8aa1-b7c275610daa-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/7-106-c09c9c8d-9478-44f9-8501-f85f53112bc3-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/9-108-3b4f06ac-dca3-4f4e-be60-bf42d9927b5b-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00011-110-1e653ccf-0963-4fb0-941c-32c9de13268b-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00012-111-dfa70658-eb4b-4fa0-9ffa-b892cf90d6ac-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00014-113-2d16e751-e2a4-4856-ab89-145996e3815e-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00015-114-0f710621-cbbf-4509-a93d-b58808978e2e-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00017-116-0b666c79-53df-4507-906c-542e65a83443-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00019-118-1bc6bc6e-e061-4da3-9d1e-a427a306c471-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00020-119-ae7b2c67-1538-4429-8246-4998960e3817-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00022-121-8db0f1e1-d88c-4aad-a8b3-24fd07329cdb-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00023-122-de57b6b0-f54b-40ac-85cd-e783505094b6-0.parquet
A 

[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala

2020-07-15 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
..


Patch Set 6:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG@7
PS6, Line 7: Icebreg
It's still misspelled


http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG@7
PS6, Line 7: Supported query
nit: Support querying


http://gerrit.cloudera.org:8080/#/c/16143/6//COMMIT_MSG@26
PS6, Line 26: identity
specify


http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@512
PS6, Line 512: source_cols_map
nit: column_to_source_id ?


http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@515
PS6, Line 515: partition_col_to_source_id_map
The mapping is reversed. Name it "source_id_to_partition" ?


http://gerrit.cloudera.org:8080/#/c/16143/6/common/thrift/CatalogObjects.thrift@516
PS6, Line 516: map file_descriptors
Please follow the above conventions for naming maps.


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
File fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java@28
PS6, Line 28:   // The id of the source field in iceberg table Schema, you can get these source
:   // fields by Schema.columns(), the return type is List.
Might be worth rewording it a bit:

"The id of the source column in the Iceberg table schema. The source column is 
used as the input for this partition field."


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java@88
PS6, Line 88: if (table_ instanceof FeIcebergTable) {
:   if (((FeIcebergTable) table_).getPartitionColToSourceIdMap().isEmpty()) {
:     notPartitioned = true;
:   }
Probably we should treat all Iceberg tables as not partitioned, since their
partitioning is different from other file system tables' partitioning.


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@66
PS6, Line 66: getFileDescMap
nit: getPartitionToFileDescMap


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@219
PS6, Line 219: isPartitionTable
nit: isPartitioned?


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@258
PS6, Line 258: PartitionColToSourceId
It returns a mapping from source ids to partition columns, therefore please 
name it "sourceIdToPartitionCol".


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@271
PS6, Line 271: getSourceColsMap
nit: getColumnToSourceIdMap?


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@305
PS6, Line 305:
nit: wrong indentation


http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/16143/6/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@114
PS6, Line 114: if (format == null) return null;
 : format = format.toUpperCase();
 : if (format.equals("PARQUET")) {
 :   return TIcebergFileFormat.PARQUET;
 : }
 : return null;
How about:

 if ("PARQUET".equalsIgnoreCase(format)) return TIcebergFileFormat.PARQUET;
 return null;
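
The suggestion above can be checked in isolation with a small Java sketch. It relies on the fact that String.equalsIgnoreCase returns false for a null argument, so calling it on the constant makes the separate null check and the toUpperCase() call unnecessary. The FileFormat enum and the parse method are stand-ins invented for this sketch; the real code returns TIcebergFileFormat.

```java
// Hypothetical stand-in for the helper in IcebergUtil; names are made up.
final class FileFormatParseSketch {
  // Stand-in for the Thrift-generated TIcebergFileFormat enum.
  enum FileFormat { PARQUET }

  // Returns PARQUET for any casing of "parquet", and null otherwise.
  // Calling equalsIgnoreCase on the literal keeps this safe for null input.
  static FileFormat parse(String format) {
    if ("PARQUET".equalsIgnoreCase(format)) return FileFormat.PARQUET;
    return null;
  }

  public static void main(String[] args) {
    System.out.println(parse("parquet"));
    System.out.println(parse("Parquet"));
    System.out.println(parse(null));
    System.out.println(parse("orc"));
  }
}
```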


http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py@193
PS6, Line 193:   'iceberg': 'ICEBERG'
You probably don't need to modify this file. I think adding HUDIPARQUET to this 
file was also unnecessary.

Probably we can do the same thing that we did for Hudi, i.e. add the Iceberg 
tables under the functional_parquet database.

https://gerrit.cloudera.org/c/14711/25/testdata/datasets/functional/schema_constraints.csv

[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
..


Patch Set 6:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6604/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 6
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 15 Jul 2020 08:32:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala

2020-07-15 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
..


Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py
File testdata/bin/generate-schema-statements.py:

http://gerrit.cloudera.org:8080/#/c/16143/6/testdata/bin/generate-schema-statements.py@766
PS6, Line 766: n
flake8: E501 line too long (94 > 90 characters)


http://gerrit.cloudera.org:8080/#/c/16143/6/tests/common/test_dimensions.py
File tests/common/test_dimensions.py:

http://gerrit.cloudera.org:8080/#/c/16143/6/tests/common/test_dimensions.py@32
PS6, Line 32: c
flake8: E501 line too long (98 > 90 characters)




Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 6
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Wed, 15 Jul 2020 08:05:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala

2020-07-15 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#6). ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
..

IMPALA-9741: Supported query Icebreg table by impala

This patch mainly realizes the query of iceberg table through impala,
we can use the following sql to create an external iceberg table:
CREATE EXTERNAL TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
)
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
Or just including table name and location like this:
CREATE EXTERNAL TABLE default.iceberg_test
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
'iceberg_file_format' is the file format in iceberg, currently only
support PARQUET, other format would be supported in the future. And
if you don't identity this property in your SQL, default file format
is PARQUET.

We achieved this function by treating the iceberg table as normal
unpartitioned hdfs table. When query iceberg table, we pushdown
partition column predicates to iceberg to decided which data files
need to be scanned, and then transformed these information to BE to
do the real scan operation.

Testing:
- Unit test for Iceberg in FileMetadataLoaderTest
- Create table tests in functional_schema_template.sql
- Iceberg table query test in custom cluster test test_iceberg.py

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
---
M be/src/runtime/descriptors.cc
M bin/rat_exclude_files.txt
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M testdata/bin/generate-schema-statements.py
M testdata/data/README
A testdata/data/iceberg_test/iceberg_non_partitioned/data/1-100-e1a80ed6-1064-494d-9cdd-c4a30c1ab8dc-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/3-102-511427f2-85f0-43ae-9b39-a456f8dc57b6-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/4-103-00fc55e1-6ef7-4241-ace2-6d075b9737fc-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/6-105-ef9e76d5-c060-4040-8aa1-b7c275610daa-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/7-106-c09c9c8d-9478-44f9-8501-f85f53112bc3-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/9-108-3b4f06ac-dca3-4f4e-be60-bf42d9927b5b-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00011-110-1e653ccf-0963-4fb0-941c-32c9de13268b-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00012-111-dfa70658-eb4b-4fa0-9ffa-b892cf90d6ac-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00014-113-2d16e751-e2a4-4856-ab89-145996e3815e-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00015-114-0f710621-cbbf-4509-a93d-b58808978e2e-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00017-116-0b666c79-53df-4507-906c-542e65a83443-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00019-118-1bc6bc6e-e061-4da3-9d1e-a427a306c471-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00020-119-ae7b2c67-1538-4429-8246-4998960e3817-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00022-121-8db0f1e1-d88c-4aad-a8b3-24fd07329cdb-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00023-122-de57b6b0-f54b-40ac-85cd-e783505094b6-0.parquet
A 

[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala

2020-07-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
..


Patch Set 5:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6593/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 5
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: wangsheng 
Gerrit-Comment-Date: Tue, 14 Jul 2020 15:18:10 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9741: Supported query Icebreg table by impala

2020-07-14 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#5). ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query Icebreg table by impala
..

IMPALA-9741: Supported query Icebreg table by impala

This patch mainly realizes the query of iceberg table through impala,
we can use the following sql to create an external iceberg table:
CREATE EXTERNAL TABLE default.iceberg_test (
level string,
event_time timestamp,
message string,
)
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
Or just including table name and location like this:
CREATE EXTERNAL TABLE default.iceberg_test
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
'iceberg_file_format' is the file format in iceberg, currently only
support PARQUET, other format would be supported in the future. And
if you don't identity this property in your SQL, default file format
is PARQUET.

We achieved this function by treating the iceberg table as normal
unpartitioned hdfs table. When query iceberg table, we pushdown
partition column predicates to iceberg to decided which data files
need to be scanned, and then transformed these information to BE to
do the real scan operation.

Testing:
- Unit test for Iceberg in FileMetadataLoaderTest
- Create table tests in functional_schema_template.sql
- Iceberg table query test in custom cluster test test_iceberg.py

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
---
M be/src/runtime/descriptors.cc
M bin/rat_exclude_files.txt
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M fe/src/test/java/org/apache/impala/catalog/FileMetadataLoaderTest.java
M testdata/data/README
A testdata/data/iceberg_test/iceberg_non_partitioned/data/1-100-e1a80ed6-1064-494d-9cdd-c4a30c1ab8dc-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/3-102-511427f2-85f0-43ae-9b39-a456f8dc57b6-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/4-103-00fc55e1-6ef7-4241-ace2-6d075b9737fc-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/6-105-ef9e76d5-c060-4040-8aa1-b7c275610daa-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/7-106-c09c9c8d-9478-44f9-8501-f85f53112bc3-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/9-108-3b4f06ac-dca3-4f4e-be60-bf42d9927b5b-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00011-110-1e653ccf-0963-4fb0-941c-32c9de13268b-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00012-111-dfa70658-eb4b-4fa0-9ffa-b892cf90d6ac-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00014-113-2d16e751-e2a4-4856-ab89-145996e3815e-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00015-114-0f710621-cbbf-4509-a93d-b58808978e2e-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00017-116-0b666c79-53df-4507-906c-542e65a83443-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00019-118-1bc6bc6e-e061-4da3-9d1e-a427a306c471-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00020-119-ae7b2c67-1538-4429-8246-4998960e3817-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00022-121-8db0f1e1-d88c-4aad-a8b3-24fd07329cdb-0.parquet
A testdata/data/iceberg_test/iceberg_non_partitioned/data/00023-122-de57b6b0-f54b-40ac-85cd-e783505094b6-0.parquet
A 

[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala

2020-07-10 Thread Zoltan Borok-Nagy (Code Review)
Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..


Patch Set 4:

(10 comments)

Thanks for working on this, it will be a really great addition to Impala!

http://gerrit.cloudera.org:8080/#/c/16143/4//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16143/4//COMMIT_MSG@7
PS4, Line 7: icebreg
Iceberg


http://gerrit.cloudera.org:8080/#/c/16143/4//COMMIT_MSG@26
PS4, Line 26: PARQUT
PARQUET


http://gerrit.cloudera.org:8080/#/c/16143/4//COMMIT_MSG@27
PS4, Line 27:
Please add a high-level description about what this patch does.


http://gerrit.cloudera.org:8080/#/c/16143/4/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

http://gerrit.cloudera.org:8080/#/c/16143/4/common/thrift/CatalogObjects.thrift@510
PS4, Line 510: source_cols_map
please add some comment about the fields


http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java@86
PS4, Line 86:   boolean flag = op_ == TShowStatsOp.PARTITIONS ? table_ instanceof FeIcebergTable ?
:   ((FeIcebergTable) table_).getPartitionColToSourceIdMap().isEmpty() :
:   table_.getNumClusteringCols() == 0 : false;
nit: for readability, please use if statements instead of nested ternary operators
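
For illustration, the nit above could be addressed roughly as follows. This is only a sketch: Op, Table and IcebergLikeTable are hypothetical stand-ins for TShowStatsOp, the table type and FeIcebergTable, and the method names are made up for the example. It keeps the quoted expression next to an if-statement version so the two can be compared.

```java
import java.util.Collections;
import java.util.Map;

public class ShowStatsCheck {
  // Hypothetical stand-ins for TShowStatsOp, the table type and FeIcebergTable.
  enum Op { PARTITIONS, TABLE_STATS }

  interface Table {
    int getNumClusteringCols();
  }

  interface IcebergLikeTable extends Table {
    Map<String, Integer> getPartitionColToSourceIdMap();
  }

  // The quoted expression, transcribed onto the stand-in types for comparison.
  static boolean nestedTernary(Op op, Table table) {
    return op == Op.PARTITIONS ? table instanceof IcebergLikeTable ?
        ((IcebergLikeTable) table).getPartitionColToSourceIdMap().isEmpty() :
        table.getNumClusteringCols() == 0 : false;
  }

  // The same decision written with if statements and early returns.
  static boolean withIfStatements(Op op, Table table) {
    if (op != Op.PARTITIONS) return false;
    if (table instanceof IcebergLikeTable) {
      return ((IcebergLikeTable) table).getPartitionColToSourceIdMap().isEmpty();
    }
    return table.getNumClusteringCols() == 0;
  }

  // Helpers that build the stand-in tables.
  static Table plainTable(int numClusteringCols) {
    return () -> numClusteringCols;
  }

  static Table icebergTable(Map<String, Integer> partitionCols) {
    return new IcebergLikeTable() {
      @Override public int getNumClusteringCols() { return 0; }
      @Override public Map<String, Integer> getPartitionColToSourceIdMap() {
        return partitionCols;
      }
    };
  }

  public static void main(String[] args) {
    Table[] tables = {
        plainTable(0), plainTable(2),
        icebergTable(Collections.emptyMap()),
        icebergTable(Collections.singletonMap("event_time", 2))
    };
    for (Op op : Op.values()) {
      for (Table t : tables) {
        System.out.println(op + ": if-version=" + withIfStatements(op, t)
            + ", ternary=" + nestedTernary(op, t));
      }
    }
  }
}
```

The early return for the non-PARTITIONS case removes one level of nesting, which is most of the readability gain.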


http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@69
PS4, Line 69: transfromed
transformed


http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@71
PS4, Line 71: getFeFsTable
Maybe rename to getHdfsTable() ?


http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java
File fe/src/main/java/org/apache/impala/util/IcebergUtil.java:

http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@115
PS4, Line 115: toUpperCase();
 : if (format.equalsIgnoreCase
nit: toUpperCase() or equalsIgnoreCase() is not needed.
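
A small sketch of the point, under assumptions: Format and parse() are hypothetical names, not the actual IcebergUtil code, and the PARQUET fallback follows the commit message. String.equalsIgnoreCase() already compares without regard to case, so upper-casing the input first adds nothing.

```java
public class IcebergFileFormatSketch {
  // Hypothetical enum; the real code maps to Impala's own file-format type.
  enum Format { PARQUET, ORC }

  static Format parse(String value) {
    // Fall back to PARQUET when the property is absent, as the commit message states.
    if (value == null) return Format.PARQUET;
    // equalsIgnoreCase() ignores case on its own, so no toUpperCase() is needed first.
    if ("PARQUET".equalsIgnoreCase(value)) return Format.PARQUET;
    if ("ORC".equalsIgnoreCase(value)) return Format.ORC;
    throw new IllegalArgumentException("Unsupported Iceberg file format: " + value);
  }

  public static void main(String[] args) {
    System.out.println(parse("parquet"));
    System.out.println(parse("Orc"));
    System.out.println(parse(null));
  }
}
```

The alternative is to keep toUpperCase() and compare with plain equals(); either way only one of the two calls is required.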


http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/main/java/org/apache/impala/util/IcebergUtil.java@246
PS4, Line 246: List dataFileList = new ArrayList<>();
 : for (FileScanTask task : scan.planFiles()) {
 :   dataFileList.add(task.file());
 : }
 : return dataFileList;
nit: return Lists.newArrayList(scan.planFiles());
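
One thing to keep in mind with the one-liner: Guava's Lists.newArrayList(Iterable) copies the elements unchanged, while the quoted loop calls task.file() on each element. If the iterable yields scan tasks rather than data files, that mapping step still has to happen somewhere. A possible compact form using only the JDK is sketched below; DataFile and FileScanTask here are hypothetical stand-ins defined for the example, not the Iceberg classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

public class PlanFilesSketch {
  // Hypothetical stand-ins for the types used in the quoted loop.
  static final class DataFile {
    final String path;
    DataFile(String path) { this.path = path; }
  }

  static final class FileScanTask {
    private final DataFile file;
    FileScanTask(DataFile file) { this.file = file; }
    DataFile file() { return file; }
  }

  // The quoted loop: collect the file of every task.
  static List<DataFile> withLoop(Iterable<FileScanTask> tasks) {
    List<DataFile> dataFileList = new ArrayList<>();
    for (FileScanTask task : tasks) {
      dataFileList.add(task.file());
    }
    return dataFileList;
  }

  // Compact form that keeps the task -> file mapping.
  static List<DataFile> withStream(Iterable<FileScanTask> tasks) {
    return StreamSupport.stream(tasks.spliterator(), false)
        .map(FileScanTask::file)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    List<FileScanTask> tasks = Arrays.asList(
        new FileScanTask(new DataFile("data/a.parquet")),
        new FileScanTask(new DataFile("data/b.parquet")));
    for (DataFile f : withStream(tasks)) {
      System.out.println(f.path);
    }
  }
}
```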


http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/test/java/org/apache/impala/customcluster/IcebergTableQueryTest.java
File fe/src/test/java/org/apache/impala/customcluster/IcebergTableQueryTest.java:

http://gerrit.cloudera.org:8080/#/c/16143/4/fe/src/test/java/org/apache/impala/customcluster/IcebergTableQueryTest.java@42
PS4, Line 42: /**
:  * Test impala query iceberg table
:  * impala not supported insert into iceberg table now, so we construct iceberg
:  * table by iceberg api
:  */
Instead of writing the Iceberg table each time, can we just check it into the 
repository then copy it to the HDFS warehouse directory during data loading?

We did something similar with Apache Hudi: https://gerrit.cloudera.org/c/14711/

After that you could create end-to-end tests in Python and in ".test" files. 
E.g.:

https://github.com/apache/impala/blob/65722d3e9051d6a08cb1e69fd36a06684745c226/tests/query_test/test_scanners.py#L326-L340

https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/hudi-parquet.test



--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 4
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 10 Jul 2020 14:09:12 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala

2020-07-10 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..


Patch Set 4:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6549/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 4
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Fri, 10 Jul 2020 12:54:17 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala

2020-07-10 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#4). ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..

IMPALA-9741: Supported query icebreg table by impala

This patch mainly realizes the query of iceberg table through impala,
we can use the following sql to create an external iceberg table:
CREATE EXTERNAL TABLE default.iceberg_test (
level string,
event_time timestamp,
message string
)
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
Or include just the table name and location, like this:
CREATE EXTERNAL TABLE default.iceberg_test
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
'iceberg_file_format' is the file format in iceberg; currently supported:
PARQUET and ORC. If you don't specify this property in your SQL, the
default file format is PARQUT.

Testing:
- Add fe test IcebergTableQueryTest.java

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
---
M be/src/runtime/descriptors.cc
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A fe/src/test/java/org/apache/impala/customcluster/IcebergTableQueryTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg_create.test
25 files changed, 1,148 insertions(+), 158 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/16143/4

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 4
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala

2020-07-08 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..


Patch Set 3:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6537/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 3
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Thu, 09 Jul 2020 03:11:41 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala

2020-07-08 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#3). ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..

IMPALA-9741: Supported query icebreg table by impala

This patch mainly realizes the query of iceberg table through impala,
we can use the following sql to create an external iceberg table:
CREATE EXTERNAL TABLE default.iceberg_test (
level string,
event_time timestamp,
message string
)
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
Or include just the table name and location, like this:
CREATE EXTERNAL TABLE default.iceberg_test
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
'iceberg_file_format' is the file format in iceberg; currently supported:
PARQUET and ORC. If you don't specify this property in your SQL, the
default file format is PARQUT.

Testing:
- Add fe test IcebergTableQueryTest.java

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
---
M be/src/runtime/descriptors.cc
M common/thrift/CatalogObjects.thrift
M fe/pom.xml
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
A fe/src/test/java/org/apache/impala/customcluster/IcebergTableQueryTest.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg_create.test
25 files changed, 1,082 insertions(+), 155 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/16143/3

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 3
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Zoltan Borok-Nagy 


[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala

2020-07-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..


Patch Set 2:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6496/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 06 Jul 2020 08:01:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala

2020-07-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..


Patch Set 1:

Build Successful

https://jenkins.impala.io/job/gerrit-code-review-checks/6495/ : Initial code 
review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun 
to run full precommit tests.



Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Comment-Date: Mon, 06 Jul 2020 07:49:19 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala

2020-07-06 Thread wangsheng (Code Review)
wangsheng has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..

IMPALA-9741: Supported query icebreg table by impala

This patch mainly realizes the query of iceberg table through impala,
we can use the following sql to create an external iceberg table:
CREATE EXTERNAL TABLE default.iceberg_test (
level string,
event_time timestamp,
message string
)
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
Or include just the table name and location, like this:
CREATE EXTERNAL TABLE default.iceberg_test
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
'iceberg_file_format' is the file format in iceberg; currently supported:
PARQUET and ORC. If you don't specify this property in your SQL, the
default file format is PARQUT.

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
---
M be/src/runtime/descriptors.cc
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg_create.test
23 files changed, 852 insertions(+), 135 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/16143/2

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 2
Gerrit-Owner: wangsheng 
Gerrit-Reviewer: Impala Public Jenkins 


[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala

2020-07-06 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )

Change subject: IMPALA-9741: Supported query icebreg table by impala
..


Patch Set 1:

(17 comments)

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
File fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java:

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java@80
PS1, Line 80:   "SHOW FILES not applicable to a non hdfs table and non iceberg table: %s", tableName_));
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
File fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java:

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java@84
PS1, Line 84:   // There two cases here: Non-partitioned hdfs table and non-partitioned iceberg table
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@249
PS1, Line 249: public static Map getPartitionColToSourceIdMap(List specs) {
line too long (103 > 90)


http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@286
PS1, Line 286:   tIcebergTable.setPartition_col_to_source_id_map(icebergTable.getPartitionColToSourceIdMap());
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@309
PS1, Line 309: private static HdfsPartition.FileDescriptor getFileDescriptor(FileSystem fs, Path tableLoc,
line too long (95 > 90)


http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@322
PS1, Line 322:   return HdfsPartition.FileDescriptor.create(fileStatus, relPath, locations, hostIndex,
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@329
PS1, Line 329: public static Map loadAllPartition(String location,
line too long (93 > 90)


http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@337
PS1, Line 337: HdfsPartition.FileDescriptor fileDesc = getFileDescriptor(new Path(file.path().toString()),
line too long (99 > 90)


http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java@368
PS1, Line 368: partition.setFileFormat(IcebergUtil.toTHdfsFileFormat(icebergTable.getIcebergFileFormat()));
line too long (100 > 90)


http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java:

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java@176
PS1, Line 176:   table_.getMetaStoreTable().getParameters().get(IcebergTable.ICEBERG_FILE_FORMAT);
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
File fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java:

http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@71
PS1, Line 71: Map partitionColToSourceIdMap = Utils.getPartitionColToSourceIdMap(partitionSpecs);
line too long (104 > 90)


http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@78
PS1, Line 78:   ColumnMap cmap, List partitionSpecs, Map sourceColsMap,
line too long (100 > 90)


http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@161
PS1, Line 161: localFsTable_.createPrototypePartition(), CatalogObject.ThriftObjectType.DESCRIPTOR_ONLY);
line too long (98 > 90)


http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@162
PS1, Line 162: THdfsTable hdfsTable = new THdfsTable(localFsTable_.getHdfsBaseDir(), getColumnNames(),
line too long (91 > 90)


http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java@163
PS1, Line 163: localFsTable_.getNullPartitionKeyValue(), FeFsTable.DEFAULT_NULL_COLUMN_VALUE, idToPartition,
line too long (101 > 90)


http://gerrit.cloudera.org:8080/#/c/16143/1/fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
File 

[Impala-ASF-CR] IMPALA-9741: Supported query icebreg table by impala

2020-07-06 Thread wangsheng (Code Review)
wangsheng has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/16143 )


Change subject: IMPALA-9741: Supported query icebreg table by impala
..

IMPALA-9741: Supported query icebreg table by impala

This patch mainly realizes the query of iceberg table through impala,
we can use the following sql to create an external iceberg table:
CREATE EXTERNAL TABLE default.iceberg_test (
level string,
event_time timestamp,
message string
)
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
Or include just the table name and location, like this:
CREATE EXTERNAL TABLE default.iceberg_test
STORED AS ICEBERG
LOCATION 'hdfs://xxx'
TBLPROPERTIES ('iceberg_file_format'='parquet');
'iceberg_file_format' is the file format in iceberg; currently supported:
PARQUET and ORC. If you don't specify this property in your SQL, the
default file format is PARQUT.

Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
---
M be/src/runtime/descriptors.cc
M common/thrift/CatalogObjects.thrift
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/ComputeStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionField.java
M fe/src/main/java/org/apache/impala/analysis/IcebergPartitionSpec.java
M fe/src/main/java/org/apache/impala/analysis/ShowFilesStmt.java
M fe/src/main/java/org/apache/impala/analysis/ShowStatsStmt.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/CatalogServiceCatalog.java
M fe/src/main/java/org/apache/impala/catalog/FeIcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/IcebergTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsPartition.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalFsTable.java
M fe/src/main/java/org/apache/impala/catalog/local/LocalIcebergTable.java
M fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
A fe/src/main/java/org/apache/impala/planner/IcebergScanNode.java
M fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
M fe/src/main/java/org/apache/impala/service/Frontend.java
M fe/src/main/java/org/apache/impala/util/IcebergUtil.java
M testdata/workloads/functional-query/queries/QueryTest/iceberg_create.test
23 files changed, 842 insertions(+), 135 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/43/16143/1

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 1
Gerrit-Owner: wangsheng