[Impala-ASF-CR] IMPALA-9741: Support querying Iceberg table by impala

Zoltan Borok-Nagy (Code Review) Sun, 19 Jul 2020 04:36:01 -0700

Zoltan Borok-Nagy has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/16143 )


Change subject: IMPALA-9741: Support querying Iceberg table by impala
......................................................................


Patch Set 13:

(10 comments)

Thanks for applying the changes and for the tests. Unfortunately last time the 
gerrit "New UI" hid some files from me...

http://gerrit.cloudera.org:8080/#/c/16143/13//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/16143/13//COMMIT_MSG@38
PS13, Line 38: custom cluster test test_iceberg.py
The tests are added to test_scanners.py


http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/data/README
File testdata/data/README:

http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/data/README@503
PS13, Line 503: Supported query icebreg table by impala
Please update this to reflect the current title


http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/data/README@504
PS13, Line 504: Apache Iceberg is an open table format for huge analytic 
datasets.
              : Iceberg avoids unpleasant surprises. Schema evolution works and 
won’t inadvertently
              : un-delete data. Users don’t need to know about partitioning to 
get fast queries.
              : And Iceberg was designed to solve correctness problems in 
eventually-consistent cloud
              : object stores.
Please provide a description about the tables in the iceberg directory, not 
about Iceberg itself.

Also, please mention how the data was generated, e.g. what tool and what 
version.


http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/datasets/functional/functional_schema_template.sql
File testdata/datasets/functional/functional_schema_template.sql:

http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/datasets/functional/functional_schema_template.sql@2868
PS13, Line 2868: STORED AS ICEBERG
Don't you need to provide a partition spec?

Also, please add a SHOW PARTITIONS test for it in one of the test files.


http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/workloads/functional-query/queries/QueryTest/iceberg_query.test
File testdata/workloads/functional-query/queries/QueryTest/iceberg_query.test:

http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/workloads/functional-query/queries/QueryTest/iceberg_query.test@3
PS13, Line 3: Supported query icebreg table by impala
Please update this


http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/workloads/functional-query/queries/QueryTest/iceberg_query.test@42
PS13, Line 42: 14
You could add a RUNTIME_PROFILE section to check that partition pruning worked. 
E.g.

https://github.com/apache/impala/blob/76e4a17fb379bb232618dccb4ad3504dbe5c945c/testdata/workloads/functional-query/queries/QueryTest/runtime_row_filters.test#L29


http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/workloads/functional-query/queries/QueryTest/iceberg_query.test@77
PS13, Line 77: SELECT count(*) from iceberg_non_partitioned
Please add non-count queries as well.

After you are convinced that the queries produce the correct results, you can 
use the following command to re-generate this file:

 bin/impala-py.test --update_results 
tests/query_test/test_scanners.py::TestIceberg::test_iceberg

You'll find the generated .test file in 'logs/ee_tests/'

Alternatively, you can set these in the Impala shell:

 set write_delimited=true;
 set delimiter=,;

Then Impala shell will produce an output that can be copy-pasted here.


http://gerrit.cloudera.org:8080/#/c/16143/13/testdata/workloads/functional-query/queries/QueryTest/iceberg_query.test@83
PS13, Line 83: ====
Please add SHOW FILES for each table.

Also, you could add an iceberg-negative.test that'd check we get proper error 
messages for features not supported for Iceberg. E.g.: 
https://github.com/apache/impala/blob/master/testdata/workloads/functional-query/queries/QueryTest/acid-negative.test


http://gerrit.cloudera.org:8080/#/c/16143/13/tests/custom_cluster/test_iceberg.py
File tests/custom_cluster/test_iceberg.py:

http://gerrit.cloudera.org:8080/#/c/16143/13/tests/custom_cluster/test_iceberg.py@23
PS13, Line 23: class TestCreatingIcebergTable(CustomClusterTestSuite):
This test could be put into tests/query_test, instead of tests/custom_cluster. 
And it could just inherit from ImpalaTestSuite. This way we wouldn't need to 
restart the Impala cluster whenever this test is being run.


http://gerrit.cloudera.org:8080/#/c/16143/13/tests/custom_cluster/test_iceberg.py@30
PS13, Line 30: @pytest.mark.execute_serially
I think there's no need for this annotation. Since this test is using a unique 
database it can be run in parallel with other tests.



--
To view, visit http://gerrit.cloudera.org:8080/16143
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I856cfee4f3397d1a89cf17650e8d4fbfe1f2b006
Gerrit-Change-Number: 16143
Gerrit-PatchSet: 13
Gerrit-Owner: wangsheng <[email protected]>
Gerrit-Reviewer: Impala Public Jenkins <[email protected]>
Gerrit-Reviewer: Zoltan Borok-Nagy <[email protected]>
Gerrit-Reviewer: wangsheng <[email protected]>
Gerrit-Comment-Date: Sun, 19 Jul 2020 11:34:42 +0000
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-9741: Support querying Iceberg table by impala

Reply via email to