[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/14037 )

Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

Hive depends on the COLUMN_STATS_ACCURATE property to tell whether the stored statistics are accurate. After Impala inserts data, it does not bring the statistics values (for example numRows) up to date. Impala should therefore unset COLUMN_STATS_ACCURATE to tell Hive that the stored stats are no longer accurate.

The patch implements the following. After Impala inserts data:
  Remove COLUMN_STATS_ACCURATE from table properties if it exists
  Remove COLUMN_STATS_ACCURATE from partition params if it exists
Add helper methods to handle alter table/partition for ACID tables.
Implement the stats changes above for both ACID and non-ACID tables.

Tests:
  Manual tests.
  Run core tests.
  Add ee tests to test interop with Hive for ACID/external tables.

Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
Reviewed-on: http://gerrit.cloudera.org:8080/14037
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins
---
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A testdata/workloads/functional-query/queries/QueryTest/acid-clear-statsaccurate.test
A testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test
M tests/query_test/test_acid.py
6 files changed, 340 insertions(+), 5 deletions(-)

Approvals:
  Impala Public Jenkins: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/14037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
Gerrit-Change-Number: 14037
Gerrit-PatchSet: 8
Gerrit-Owner: Yongzhi Chen
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Yongzhi Chen
Gerrit-Reviewer: Zoltan Borok-Nagy
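The behaviour described in the commit message above can be illustrated with a minimal sketch. It is not the code from CatalogOpExecutor.java: plain dicts stand in for the table and partition parameter maps, and the function names `clear_stats_accurate` and `after_insert` are hypothetical.

```python
# Hypothetical sketch: dicts stand in for HMS table/partition parameter maps.
STATS_ACCURATE_KEY = "COLUMN_STATS_ACCURATE"


def clear_stats_accurate(params):
    """Remove the flag if it exists; return True when the map was changed."""
    return params.pop(STATS_ACCURATE_KEY, None) is not None


def after_insert(table_params, touched_partition_params):
    """Clear the flag on the table and on every partition the insert touched.

    Returns (table_changed, changed_partition_names) so that a caller could
    issue alter table/partition calls only for objects that actually changed.
    """
    table_changed = clear_stats_accurate(table_params)
    changed_partitions = []
    for name, params in touched_partition_params.items():
        if clear_stats_accurate(params):
            changed_partitions.append(name)
    return table_changed, changed_partitions
```

Other parameters such as numRows are deliberately left untouched in this sketch, matching the commit message's statement that Impala does not bring those values up to date.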
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14037 ) Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/14037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb Gerrit-Change-Number: 14037 Gerrit-PatchSet: 7 Gerrit-Owner: Yongzhi Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yongzhi Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 13 Aug 2019 18:55:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14037 ) Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties .. Patch Set 6: Code-Review+2 Okay, thanks for the explanation! -- To view, visit http://gerrit.cloudera.org:8080/14037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb Gerrit-Change-Number: 14037 Gerrit-PatchSet: 6 Gerrit-Owner: Yongzhi Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yongzhi Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 13 Aug 2019 14:41:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Yongzhi Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/14037 )

Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

Patch Set 6:

(1 comment)

Because a partition's COLUMN_STATS_ACCURATE property is never shown in "show create table" or "show partitions", I added a Hive "select count(*)" query to the tests. It shows that the patch works, because Hive does not return a wrong count after an Impala insert.

http://gerrit.cloudera.org:8080/#/c/14037/6/testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test
File testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test:

http://gerrit.cloudera.org:8080/#/c/14037/6/testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test@45
PS6, Line 45:
> Again, no checks about the stats for the partitioned table.
The same reason.

--
To view, visit http://gerrit.cloudera.org:8080/14037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
Gerrit-Change-Number: 14037
Gerrit-PatchSet: 6
Gerrit-Owner: Yongzhi Chen
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Yongzhi Chen
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Tue, 13 Aug 2019 14:19:24 +
Gerrit-HasComments: Yes
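The reasoning in the comment above, that a Hive count(*) is an observable proxy for a flag that cannot be displayed directly, can be illustrated with a toy model. This is only a sketch under stated assumptions: `toy_hive_count` is a hypothetical function, the flag value is illustrative, and Hive's real decision depends on its configuration (for example whether answering queries from stats is enabled) and on the content of the flag, none of which is modelled here.

```python
STATS_ACCURATE_KEY = "COLUMN_STATS_ACCURATE"


def toy_hive_count(params, scan_row_count):
    """Toy model of a stats-backed count(*); not Hive's actual planner logic.

    `params` is a hypothetical partition parameter dict and `scan_row_count`
    is a callable standing in for a real scan of the data files.
    """
    if STATS_ACCURATE_KEY in params and "numRows" in params:
        return int(params["numRows"])
    return scan_row_count()


# Stats were computed when the partition had 4 rows (illustrative values).
partition_params = {STATS_ACCURATE_KEY: '{"BASIC_STATS":"true"}', "numRows": "4"}
actual_rows = 5  # Impala inserted one more row; numRows was not refreshed

# With the flag still set, the toy model trusts the stale numRows.
stale_answer = toy_hive_count(partition_params, lambda: actual_rows)

# After the flag is removed, the toy model falls back to scanning the data.
partition_params.pop(STATS_ACCURATE_KEY)
fresh_answer = toy_hive_count(partition_params, lambda: actual_rows)
```

In this model a wrong count after an insert would indicate that the flag was left in place, which is the property the added Hive query in the tests is meant to check.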
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Yongzhi Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/14037 )

Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

Patch Set 6:

Because a partition's COLUMN_STATS_ACCURATE property is never shown in "show create table" or "show partitions", I added a Hive "select count(*)" query to the tests. It shows that the patch works, because Hive does not return a wrong count after an Impala insert.

--
To view, visit http://gerrit.cloudera.org:8080/14037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
Gerrit-Change-Number: 14037
Gerrit-PatchSet: 6
Gerrit-Owner: Yongzhi Chen
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Yongzhi Chen
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Tue, 13 Aug 2019 14:17:20 +
Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14037 )

Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

Patch Set 6:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/14037/6/testdata/workloads/functional-query/queries/QueryTest/acid-clear-statsaccurate.test
File testdata/workloads/functional-query/queries/QueryTest/acid-clear-statsaccurate.test:

http://gerrit.cloudera.org:8080/#/c/14037/6/testdata/workloads/functional-query/queries/QueryTest/acid-clear-statsaccurate.test@52
PS6, Line 52:
Didn't you want to add a 'show create table' statement for the partitioned table to check the removal of COLUMN_STATS_ACCURATE?

http://gerrit.cloudera.org:8080/#/c/14037/6/testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test
File testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test:

http://gerrit.cloudera.org:8080/#/c/14037/6/testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test@45
PS6, Line 45:
Again, no checks about the stats for the partitioned table.

--
To view, visit http://gerrit.cloudera.org:8080/14037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
Gerrit-Change-Number: 14037
Gerrit-PatchSet: 6
Gerrit-Owner: Yongzhi Chen
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Yongzhi Chen
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Tue, 13 Aug 2019 14:00:10 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14037 ) Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties .. Patch Set 6: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4222/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb Gerrit-Change-Number: 14037 Gerrit-PatchSet: 6 Gerrit-Owner: Yongzhi Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yongzhi Chen Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Mon, 12 Aug 2019 19:24:22 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Yongzhi Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/14037 )

Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

Patch Set 5:

(2 comments)

submit patch set 6

http://gerrit.cloudera.org:8080/#/c/14037/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/14037/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3762
PS3, Line 3762: table.getDb().getName(), table.getName());
: }
: }
> You could get the write id of INSERT the same way as we get the transaction
I concluded that the write ID is the same by testing as follows. The listings below show the write ID in the PARTITIONS table after:
1. Hive computed stats for the whole table (analyze table insertonly_part_colstats compute statistics for columns;).
2. Impala inserted a row, with COLUMN_STATS_ACCURATE removed.
3. Hive computed stats for the whole table again.
You can see that the write ID in PARTITIONS increased by only 1 each time, which means no write ID numbers are wasted. This is consistent with Hive's insert statement; see the last select on PARTITIONS, which was taken after Hive inserted a row into the 2010-01-01 partition:

HMS_home_yongzhi_Impala_cdp=> select * from "PARTITIONS" where "TBL_ID"=3274;
 PART_ID | CREATE_TIME | LAST_ACCESS_TIME |   PART_NAME   | SD_ID | TBL_ID | WRITE_ID
---------+-------------+------------------+---------------+-------+--------+----------
   15991 |  1565289739 |                0 | ds=2010-01-01 | 19250 |   3274 |       18
   15992 |  1565289739 |                0 | ds=2010-01-02 | 19251 |   3274 |       18
(2 rows)

HMS_home_yongzhi_Impala_cdp=> select * from "PARTITIONS" where "TBL_ID"=3274;
 PART_ID | CREATE_TIME | LAST_ACCESS_TIME |   PART_NAME   | SD_ID | TBL_ID | WRITE_ID
---------+-------------+------------------+---------------+-------+--------+----------
   15991 |  1565289739 |                0 | ds=2010-01-01 | 19250 |   3274 |       19
   15992 |  1565289739 |                0 | ds=2010-01-02 | 19251 |   3274 |       18
(2 rows)

HMS_home_yongzhi_Impala_cdp=> select * from "PARTITION_PARAMS" where "PART_ID"=15991;
 PART_ID |       PARAM_KEY       | PARAM_VALUE
---------+-----------------------+-------------
   15991 | transient_lastDdlTime | 1565633500
   15991 | numFiles              | 4
   15991 | totalSize             | 8
   15991 | numRows               | 4
   15991 | rawDataSize           | 4
(5 rows)

HMS_home_yongzhi_Impala_cdp=> select * from "TBLS" where "TBL_NAME"='insertonly_part_colstats';

HMS_home_yongzhi_Impala_cdp=> select * from "PARTITIONS" where "TBL_ID"=3274;
 PART_ID | CREATE_TIME | LAST_ACCESS_TIME |   PART_NAME   | SD_ID | TBL_ID | WRITE_ID
---------+-------------+------------------+---------------+-------+--------+----------
   15991 |  1565289739 |                0 | ds=2010-01-01 | 19250 |   3274 |       20
   15992 |  1565289739 |                0 | ds=2010-01-02 | 19251 |   3274 |       20
(2 rows)

HMS_home_yongzhi_Impala_cdp=> select * from "PARTITIONS" where "TBL_ID"=3274;
 PART_ID | CREATE_TIME | LAST_ACCESS_TIME |   PART_NAME   | SD_ID | TBL_ID | WRITE_ID
---------+-------------+------------------+---------------+-------+--------+----------
   15991 |  1565289739 |                0 | ds=2010-01-01 | 19250 |   3274 |       21
   15992 |  1565289739 |                0 | ds=2010-01-02 | 19251 |   3274 |       20
(2 rows)

http://gerrit.cloudera.org:8080/#/c/14037/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/14037/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3718
PS5, Line 3718: if (update.isSetTransaction_id()) {
: transactionId = update.getTransaction_id();
: }
> nit: fits single line
Done

--
To view, visit http://gerrit.cloudera.org:8080/14037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
Gerrit-Change-Number: 14037
Gerrit-PatchSet: 5
Gerrit-Owner: Yongzhi Chen
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Yongzhi Chen
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Mon, 12 Aug 2019 18:43:52 +
Gerrit-HasComments: Yes
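The claim in the comment above, that the write ID advances by only 1 per operation, can be checked mechanically against the four PARTITIONS listings quoted there. The sketch below copies the WRITE_ID values from those listings in order; the helper names `high_water_marks` and `steps` are hypothetical and the script is only an illustration of the check, not part of the patch.

```python
# WRITE_ID per partition, copied from the four PARTITIONS listings in order.
snapshots = [
    {"ds=2010-01-01": 18, "ds=2010-01-02": 18},
    {"ds=2010-01-01": 19, "ds=2010-01-02": 18},
    {"ds=2010-01-01": 20, "ds=2010-01-02": 20},
    {"ds=2010-01-01": 21, "ds=2010-01-02": 20},
]


def high_water_marks(snaps):
    """Highest write ID recorded on any partition in each snapshot."""
    return [max(snap.values()) for snap in snaps]


def steps(marks):
    """Difference between consecutive high-water marks."""
    return [later - earlier for earlier, later in zip(marks, marks[1:])]


marks = high_water_marks(snapshots)
increments = steps(marks)
# If a write ID had been allocated and then discarded, some step would exceed 1.
no_wasted_ids = all(step == 1 for step in increments)
```

A step larger than 1 between two listings would have meant that an operation consumed a write ID that never ended up on a partition.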
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Hello Zoltan Borok-Nagy, Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14037

to look at the new patch set (#6).

Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

Hive depends on the COLUMN_STATS_ACCURATE property to tell whether the stored statistics are accurate. After Impala inserts data, it does not bring the statistics values (for example numRows) up to date. Impala should therefore unset COLUMN_STATS_ACCURATE to tell Hive that the stored stats are no longer accurate.

The patch implements the following. After Impala inserts data:
  Remove COLUMN_STATS_ACCURATE from table properties if it exists
  Remove COLUMN_STATS_ACCURATE from partition params if it exists
Add helper methods to handle alter table/partition for ACID tables.
Implement the stats changes above for both ACID and non-ACID tables.

Tests:
  Manual tests.
  Run core tests.
  Add ee tests to test interop with Hive for ACID/external tables.

Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
---
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A testdata/workloads/functional-query/queries/QueryTest/acid-clear-statsaccurate.test
A testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test
M tests/query_test/test_acid.py
6 files changed, 340 insertions(+), 5 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/14037/6

--
To view, visit http://gerrit.cloudera.org:8080/14037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
Gerrit-Change-Number: 14037
Gerrit-PatchSet: 6
Gerrit-Owner: Yongzhi Chen
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Yongzhi Chen
Gerrit-Reviewer: Zoltan Borok-Nagy
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Zoltan Borok-Nagy has posted comments on this change. ( http://gerrit.cloudera.org:8080/14037 )

Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

Patch Set 5:

(2 comments)

Did a quick initial pass over it. Looks good to me overall, but I'm planning to do another pass tomorrow.

http://gerrit.cloudera.org:8080/#/c/14037/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/14037/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3762
PS3, Line 3762: table.getDb().getName(), table.getName());
: }
: }
> From my test, it seems the same value. How to get Insert statement's writeI
You could get the write id of INSERT the same way as we get the transaction id, i.e. by putting it in the relevant thrift object and transferring it from the coordinator. But since allocateTableWriteId() returns the same write id, I think it's not a problem to get it this way. It's just one extra round-trip to HMS.

http://gerrit.cloudera.org:8080/#/c/14037/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/14037/5/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@3718
PS5, Line 3718: if (update.isSetTransaction_id()) {
: transactionId = update.getTransaction_id();
: }
nit: fits single line

--
To view, visit http://gerrit.cloudera.org:8080/14037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
Gerrit-Change-Number: 14037
Gerrit-PatchSet: 5
Gerrit-Owner: Yongzhi Chen
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Yongzhi Chen
Gerrit-Reviewer: Zoltan Borok-Nagy
Gerrit-Comment-Date: Mon, 12 Aug 2019 17:08:15 +
Gerrit-HasComments: Yes
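The trade-off discussed in the first comment above rests on one property: asking for a table write id a second time returns the id that was already allocated rather than consuming a new one. The toy allocator below illustrates that property only. It assumes allocation is keyed by transaction and table, which is an interpretation of the thread rather than something it states; `ToyWriteIdAllocator` and its method are hypothetical and do not model the metastore's actual implementation or API.

```python
class ToyWriteIdAllocator:
    """Toy stand-in for a metastore that hands out per-table write ids."""

    def __init__(self):
        self._next_id = {}    # table name -> next unused write id
        self._allocated = {}  # (txn_id, table name) -> write id already given

    def allocate_table_write_id(self, txn_id, table):
        """Return the write id for (txn_id, table), allocating it only once."""
        key = (txn_id, table)
        if key not in self._allocated:
            write_id = self._next_id.get(table, 1)
            self._allocated[key] = write_id
            self._next_id[table] = write_id + 1
        return self._allocated[key]


allocator = ToyWriteIdAllocator()
insert_write_id = allocator.allocate_table_write_id(7, "insertonly_part_colstats")
# A later call for the same transaction and table costs a round-trip
# but does not consume a new id in this model.
alter_write_id = allocator.allocate_table_write_id(7, "insertonly_part_colstats")
other_txn_write_id = allocator.allocate_table_write_id(8, "insertonly_part_colstats")
```

Under this assumption, re-requesting the id costs an extra call but gives the same result as carrying the id along from the coordinator, which is the point made in the comment.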
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14037 ) Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties .. Patch Set 5: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4216/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb Gerrit-Change-Number: 14037 Gerrit-PatchSet: 5 Gerrit-Owner: Yongzhi Chen Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Yongzhi Chen Gerrit-Comment-Date: Sat, 10 Aug 2019 20:20:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Yongzhi Chen has posted comments on this change. ( http://gerrit.cloudera.org:8080/14037 )

Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

Patch Set 5:

(2 comments)

Submit patch 5

http://gerrit.cloudera.org:8080/#/c/14037/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/14037/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@110
PS3, Line 110:
: /**
> This signature should be changed in the Hive2 MetastoreShim
Done

http://gerrit.cloudera.org:8080/#/c/14037/3/tests/query_test/test_acid.py
File tests/query_test/test_acid.py:

http://gerrit.cloudera.org:8080/#/c/14037/3/tests/query_test/test_acid.py@104
PS3, Line 104: pIfABFS.hive
> This sounds weird, it would be good to investigate the cause, but I am ok w
I will do more research later. I have a feeling that the Hive client and the Impala client do not use the same thread (at least sometimes), so they sometimes cannot see each other's changes because one calls before the other has really finished.

--
To view, visit http://gerrit.cloudera.org:8080/14037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
Gerrit-Change-Number: 14037
Gerrit-PatchSet: 5
Gerrit-Owner: Yongzhi Chen
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Yongzhi Chen
Gerrit-Comment-Date: Sat, 10 Aug 2019 20:08:20 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14037

to look at the new patch set (#5).

Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

Hive depends on the COLUMN_STATS_ACCURATE property to tell whether the stored statistics are accurate. After Impala inserts data, it does not bring the statistics values (for example numRows) up to date. Impala should therefore unset COLUMN_STATS_ACCURATE to tell Hive that the stored stats are no longer accurate.

The patch implements the following. After Impala inserts data:
  Remove COLUMN_STATS_ACCURATE from table properties if it exists
  Remove COLUMN_STATS_ACCURATE from partition params if it exists
Add helper methods to handle alter table/partition for ACID tables.
Implement the stats changes above for both ACID and non-ACID tables.

Tests:
  Manual tests.
  Run core tests.
  Add ee tests to test interop with Hive for ACID/external tables.

Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
---
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A testdata/workloads/functional-query/queries/QueryTest/acid-clear-statsaccurate.test
A testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test
M tests/query_test/test_acid.py
6 files changed, 342 insertions(+), 5 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/14037/5

--
To view, visit http://gerrit.cloudera.org:8080/14037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
Gerrit-Change-Number: 14037
Gerrit-PatchSet: 5
Gerrit-Owner: Yongzhi Chen
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Yongzhi Chen
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Csaba Ringhofer has posted comments on this change. ( http://gerrit.cloudera.org:8080/14037 )

Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

Patch Set 4: Code-Review+1

(2 comments)

Besides the build fix for Hive 2 builds, looks good to me.

http://gerrit.cloudera.org:8080/#/c/14037/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
File fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java:

http://gerrit.cloudera.org:8080/#/c/14037/3/fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java@110
PS3, Line 110:
: /**
> Done
This signature should be changed in the Hive2 MetastoreShim too (this led to a compilation error in the precommit build).

http://gerrit.cloudera.org:8080/#/c/14037/3/tests/query_test/test_acid.py
File tests/query_test/test_acid.py:

http://gerrit.cloudera.org:8080/#/c/14037/3/tests/query_test/test_acid.py@104
PS3, Line 104: pIfABFS.hive
> I have to use hive to create tables to make the test work for I need hive t
This sounds weird; it would be good to investigate the cause, but I am OK with leaving it as it is for this patch. It is known that without SYNC_DDL, other Impala coordinators may not see new tables for some time, but it is surprising with Hive: Impala's CREATE TABLE should wait for the table to be written to the Metastore, and Hive queries always check the Metastore for fresh data as far as I know.

--
To view, visit http://gerrit.cloudera.org:8080/14037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
Gerrit-Change-Number: 14037
Gerrit-PatchSet: 4
Gerrit-Owner: Yongzhi Chen
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Yongzhi Chen
Gerrit-Comment-Date: Sat, 10 Aug 2019 14:46:49 +
Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Hello Csaba Ringhofer, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14037

to look at the new patch set (#4).

Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

Hive depends on the COLUMN_STATS_ACCURATE property to tell whether the stored statistics are accurate. After Impala inserts data, it does not bring the statistics values (for example numRows) up to date. Impala should therefore unset COLUMN_STATS_ACCURATE to tell Hive that the stored stats are no longer accurate.

The patch implements the following. After Impala inserts data:
  Remove COLUMN_STATS_ACCURATE from table properties if it exists
  Remove COLUMN_STATS_ACCURATE from partition params if it exists
Add helper methods to handle alter table/partition for ACID tables.
Implement the stats changes above for both ACID and non-ACID tables.

Tests:
  Manual tests.
  Run core tests.
  Add ee tests to test interop with Hive for ACID/external tables.

Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
---
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A testdata/workloads/functional-query/queries/QueryTest/acid-clear-statsaccurate.test
A testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test
M tests/query_test/test_acid.py
6 files changed, 340 insertions(+), 5 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/14037/4

--
To view, visit http://gerrit.cloudera.org:8080/14037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
Gerrit-Change-Number: 14037
Gerrit-PatchSet: 4
Gerrit-Owner: Yongzhi Chen
Gerrit-Reviewer: Csaba Ringhofer
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14037 ) Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties .. Patch Set 3: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4205/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb Gerrit-Change-Number: 14037 Gerrit-PatchSet: 3 Gerrit-Owner: Yongzhi Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 09 Aug 2019 15:19:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14037

to look at the new patch set (#3).

Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

After Impala inserts data:
  Remove COLUMN_STATS_ACCURATE from table properties if it exists
  Remove COLUMN_STATS_ACCURATE from partition params if it exists
Add helper methods to handle alter table/partition for ACID tables.
Implemented the stats change for both ACID and non-ACID tables.

Tests:
  Manual tests.
  Add ee tests to test interop with Hive for ACID/external tables.

Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
---
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A testdata/workloads/functional-query/queries/QueryTest/acid-clear-statsaccurate.test
A testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test
M tests/query_test/test_acid.py
6 files changed, 338 insertions(+), 5 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/14037/3

--
To view, visit http://gerrit.cloudera.org:8080/14037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
Gerrit-Change-Number: 14037
Gerrit-PatchSet: 3
Gerrit-Owner: Yongzhi Chen
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14037 ) Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties .. Patch Set 2: Build Successful https://jenkins.impala.io/job/gerrit-code-review-checks/4197/ : Initial code review checks passed. Use gerrit-verify-dryrun-external or gerrit-verify-dryrun to run full precommit tests. -- To view, visit http://gerrit.cloudera.org:8080/14037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb Gerrit-Change-Number: 14037 Gerrit-PatchSet: 2 Gerrit-Owner: Yongzhi Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Fri, 09 Aug 2019 02:21:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Hello Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

    http://gerrit.cloudera.org:8080/14037

to look at the new patch set (#2).

Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

After Impala inserts data:
  Remove COLUMN_STATS_ACCURATE from table properties if it exists
  Remove COLUMN_STATS_ACCURATE from partition params if it exists
Add helper methods to handle alter table/partition for ACID tables.
Implemented for both ACID and non-ACID tables.

Tests:
  Manual tests.
  Add ee test to test interop with Hive for ACID tables.

Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
---
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A testdata/workloads/functional-query/queries/QueryTest/acid-clear-statsaccurate.test
A testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test
M tests/query_test/test_acid.py
6 files changed, 332 insertions(+), 5 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/14037/2

--
To view, visit http://gerrit.cloudera.org:8080/14037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
Gerrit-Change-Number: 14037
Gerrit-PatchSet: 2
Gerrit-Owner: Yongzhi Chen
Gerrit-Reviewer: Impala Public Jenkins
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14037 ) Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties .. Patch Set 1: Build Failed https://jenkins.impala.io/job/gerrit-code-review-checks/4187/ : Initial code review checks failed. See linked job for details on the failure. -- To view, visit http://gerrit.cloudera.org:8080/14037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb Gerrit-Change-Number: 14037 Gerrit-PatchSet: 1 Gerrit-Owner: Yongzhi Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 08 Aug 2019 21:58:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/14037 ) Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties .. Patch Set 1: (4 comments) http://gerrit.cloudera.org:8080/#/c/14037/1/tests/query_test/test_acid.py File tests/query_test/test_acid.py: http://gerrit.cloudera.org:8080/#/c/14037/1/tests/query_test/test_acid.py@23 PS1, Line 23: import pytest flake8: F401 'pytest' imported but unused http://gerrit.cloudera.org:8080/#/c/14037/1/tests/query_test/test_acid.py@24 PS1, Line 24: import time flake8: F401 'time' imported but unused http://gerrit.cloudera.org:8080/#/c/14037/1/tests/query_test/test_acid.py@93 PS1, Line 93: # flake8: E265 block comment should start with '# ' http://gerrit.cloudera.org:8080/#/c/14037/1/tests/query_test/test_acid.py@96 PS1, Line 96: # flake8: E265 block comment should start with '# ' -- To view, visit http://gerrit.cloudera.org:8080/14037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb Gerrit-Change-Number: 14037 Gerrit-PatchSet: 1 Gerrit-Owner: Yongzhi Chen Gerrit-Reviewer: Impala Public Jenkins Gerrit-Comment-Date: Thu, 08 Aug 2019 21:18:24 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8839: Remove COLUMN STATS ACCURATE from properties
Yongzhi Chen has uploaded this change for review. ( http://gerrit.cloudera.org:8080/14037 )

Change subject: IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

IMPALA-8839: Remove COLUMN_STATS_ACCURATE from properties

After Impala inserts data:
  Remove COLUMN_STATS_ACCURATE from table properties if it exists
  Remove COLUMN_STATS_ACCURATE from partition params if it exists
Add helper methods to handle alter table/partition for ACID tables.
Implemented for both ACID and non-ACID tables.

Tests:
  Manual tests.
  Add ee test to test interop with Hive for ACID tables.

Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
---
M fe/src/compat-hive-2/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/compat-hive-3/java/org/apache/impala/compat/MetastoreShim.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
A testdata/workloads/functional-query/queries/QueryTest/acid-clear-statsaccurate.test
A testdata/workloads/functional-query/queries/QueryTest/clear-statsaccurate.test
M tests/query_test/test_acid.py
6 files changed, 326 insertions(+), 5 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/37/14037/1

--
To view, visit http://gerrit.cloudera.org:8080/14037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I13f4a77022a7112e10a07314359f927eae083deb
Gerrit-Change-Number: 14037
Gerrit-PatchSet: 1
Gerrit-Owner: Yongzhi Chen