[Impala-ASF-CR] IMPALA-7077. Add a configuration for the max number of partitions to load
Bharath Vissapragada has abandoned this change. ( http://gerrit.cloudera.org:8080/10584 ) Change subject: IMPALA-7077. Add a configuration for the max number of partitions to load .. Abandoned Stale now. Feel free to restore if needed. -- To view, visit http://gerrit.cloudera.org:8080/10584 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: abandon Gerrit-Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51 Gerrit-Change-Number: 10584 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac
[Impala-ASF-CR] IMPALA-7077. Add a configuration for the max number of partitions to load
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10584 ) Change subject: IMPALA-7077. Add a configuration for the max number of partitions to load .. Patch Set 1: Another idea that also came up was to go with a flag like the one in this change, but base the limit on memory consumed. That would cover more scenarios for banning a table: num partitions, num files, num columns, incremental stats, or some combination. We can obtain the size estimate at the per partition granularity, so once exceeded, back out as in this change. There's still the downside of a static flag setting (less flexibility) but there is some ease-of-use that can be argued as well. -- To view, visit http://gerrit.cloudera.org:8080/10584 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51 Gerrit-Change-Number: 10584 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 05 Jun 2018 22:07:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7077. Add a configuration for the max number of partitions to load
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/10584 ) Change subject: IMPALA-7077. Add a configuration for the max number of partitions to load .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2067 PS1, Line 2067: if (curPartitionCount + allHmsPartitionsToAdd.size() > HdfsTable.MAX_PARTITIONS_PER_TABLE) { > Yea, we use 90 chars for Java too. Gerrit defaults to 100. You can re-configure Gerrit for 90 in some config option if that's your thing. http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java: http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@523 PS1, Line 523: public void testPartitionLimit() throws CatalogException { You added the exception handling in "updatePartitionsFromHms()" and in some alter table path (i.e., two places), but there seems to be only one code path tested here. Should we test the other one too here, or is it enough that we test it in the QueryTest? -- To view, visit http://gerrit.cloudera.org:8080/10584 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51 Gerrit-Change-Number: 10584 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 05 Jun 2018 22:06:36 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7077. Add a configuration for the max number of partitions to load
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10584 ) Change subject: IMPALA-7077. Add a configuration for the max number of partitions to load .. Patch Set 1: (2 comments) Todd, what are your thoughts on having a "per table" property that can blacklist that particular table from loading in Impala? For example, something like alter table foo set tblproperties ("load-in-impala" = "false") The issue with having a global max partitions is that - It is a start up flag (requires a restart) - Lets say if someone sets it to 100k, we can still have multiple tables with 90k partitions or something like that - In many cases users would like to blacklist some specific problematic tables (say with large incremental stats) and having just a partition limit may not help Vuk and I were discussing this offline today and I thought I'd bring this up in this CR. It is unclear if both these flags can co-exist or we should just have one of them. Thoughts? http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG@21 PS1, Line 21: worth the complexity to try to : somehow limit the addition of partitions during a DML > The problem is that the catalog update happens at the end of the DML, after I lean towards (2) but bubble up a meaningful error message for the DML query and not the generic message that we are not able to load the table. I prefer (2) because (1) leaves the table on HDFS in an inconsistent state. http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2067 PS1, Line 2067: if (curPartitionCount + allHmsPartitionsToAdd.size() > HdfsTable.MAX_PARTITIONS_PER_TABLE) { > its not explicitly stated for java in the style guide, but given the limits Yea, we use 90 chars for Java too. Gerrit defaults to 100. -- To view, visit http://gerrit.cloudera.org:8080/10584 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51 Gerrit-Change-Number: 10584 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 05 Jun 2018 22:00:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7077. Add a configuration for the max number of partitions to load
Vuk Ercegovac has posted comments on this change. ( http://gerrit.cloudera.org:8080/10584 ) Change subject: IMPALA-7077. Add a configuration for the max number of partitions to load .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2067 PS1, Line 2067: if (curPartitionCount + allHmsPartitionsToAdd.size() > HdfsTable.MAX_PARTITIONS_PER_TABLE) { > these are < 100 chars which is the line limit that gerrit is showing. Do yo its not explicitly stated for java in the style guide, but given the limits for python/c++, it should be 90. -- To view, visit http://gerrit.cloudera.org:8080/10584 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51 Gerrit-Change-Number: 10584 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 05 Jun 2018 18:04:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7077. Add a configuration for the max number of partitions to load
Todd Lipcon has posted comments on this change. ( http://gerrit.cloudera.org:8080/10584 ) Change subject: IMPALA-7077. Add a configuration for the max number of partitions to load .. Patch Set 1: (6 comments) http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG@19 PS1, Line 19: partitions will be created > You mean those are created on the filesystem but not in HMS or something el right, will clarify in the commit message. http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG@19 PS1, Line 19: will no longer load : following the change > will that table be purged or retained in catalogd in some sort of unusable Will clarify this http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG@21 PS1, Line 21: worth the complexity to try to : somehow limit the addition of partitions during a DML > I guess this is a simple check in CatalogOpEx#updateCatalog() unless I'm mi The problem is that the catalog update happens at the end of the DML, after the insert sinks have already created all of the directories and files on the filesystem, if I understand the flow correctly. So, unless we wanted to try to back out the changes by rm -rfing the files that were created (seems difficult and error-prone) we're kind of stuck with the following choices: 1) leave the files on the filesystem but don't create the partitions in HMS 2) create the partitions in HMS but leave the table in a state where Impala will refuse to load it (but at least Hive will load it). Neither is great but I couldn't think of a good alternative. http://gerrit.cloudera.org:8080/#/c/10584/1/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/10584/1/common/thrift/CatalogService.thrift@68 PS1, Line 68: 7: optional list warning_messages > Where is this used? Don't see it's references in the patch. ah, I originally attempted to use this to send warnings back on DML, but then reverted this part of the change. forgot to remove the thrift, sorry http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java: http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@287 PS1, Line 287: public static int MAX_PARTITIONS_PER_TABLE = > final? it's modified by a test so can't make it final. I can add a comment to that effect if you want. http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2067 PS1, Line 2067: if (curPartitionCount + allHmsPartitionsToAdd.size() > HdfsTable.MAX_PARTITIONS_PER_TABLE) { > nit: line overflow (here and below) these are < 100 chars which is the line limit that gerrit is showing. Do you try to keep to 80 chars in Java code? -- To view, visit http://gerrit.cloudera.org:8080/10584 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51 Gerrit-Change-Number: 10584 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Tue, 05 Jun 2018 18:00:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7077. Add a configuration for the max number of partitions to load
Bharath Vissapragada has posted comments on this change. ( http://gerrit.cloudera.org:8080/10584 ) Change subject: IMPALA-7077. Add a configuration for the max number of partitions to load .. Patch Set 1: (7 comments) http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG@19 PS1, Line 19: partitions will be created You mean those are created on the filesystem but not in HMS or something else? http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG@21 PS1, Line 21: worth the complexity to try to : somehow limit the addition of partitions during a DML I guess this is a simple check in CatalogOpEx#updateCatalog() unless I'm missing something. http://gerrit.cloudera.org:8080/#/c/10584/1/be/src/util/backend-gflag-util.cc File be/src/util/backend-gflag-util.cc: http://gerrit.cloudera.org:8080/#/c/10584/1/be/src/util/backend-gflag-util.cc@72 PS1, Line 72: nit: remove http://gerrit.cloudera.org:8080/#/c/10584/1/common/thrift/CatalogService.thrift File common/thrift/CatalogService.thrift: http://gerrit.cloudera.org:8080/#/c/10584/1/common/thrift/CatalogService.thrift@68 PS1, Line 68: 7: optional list warning_messages Where is this used? Don't see it's references in the patch. http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2067 PS1, Line 2067: if (curPartitionCount + allHmsPartitionsToAdd.size() > HdfsTable.MAX_PARTITIONS_PER_TABLE) { nit: line overflow (here and below) http://gerrit.cloudera.org:8080/#/c/10584/1/testdata/workloads/functional-query/queries/QueryTest/max-partitions-to-load.test File testdata/workloads/functional-query/queries/QueryTest/max-partitions-to-load.test: http://gerrit.cloudera.org:8080/#/c/10584/1/testdata/workloads/functional-query/queries/QueryTest/max-partitions-to-load.test@52 PS1, Line 52: insert into $DATABASE.t1(y) values (21) Could you add a test wherein partitions are added from Hive and user does a REFRESH and that hits the partition limit? http://gerrit.cloudera.org:8080/#/c/10584/1/testdata/workloads/functional-query/queries/QueryTest/max-partitions-to-load.test@70 PS1, Line 70: I think the patch is missing the guard rails in "ALTER TABLE RECOVER PARTITIONS" case. Could you fix that and add a test? -- To view, visit http://gerrit.cloudera.org:8080/10584 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51 Gerrit-Change-Number: 10584 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Vuk Ercegovac Gerrit-Comment-Date: Mon, 04 Jun 2018 23:34:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7077. Add a configuration for the max number of partitions to load
Todd Lipcon has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10584 Change subject: IMPALA-7077. Add a configuration for the max number of partitions to load .. IMPALA-7077. Add a configuration for the max number of partitions to load This adds a new flag --max_partitions_per_table which prevents catalogd from attempting to load any table with more than some maximum number of partitions. If set to 0 (the default) then no limit is enforced. The purpose of this change is to act as a "guard rail" for users in clusters where there may exist some tables with many hundreds of thousands of partitions that are known to cause catalogd/impalad OOMs. The behavior of this patch is slightly suboptimal in that, if the addition of partitions during a DML pushes a table over the limit, those partitions will be created but then the table will no longer load following the change. However, given that this is meant to be a 'last defense' safeguard, I didn't think it was worth the complexity to try to somehow limit the addition of partitions during a DML. Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51 --- M be/src/catalog/catalog.cc M be/src/util/backend-gflag-util.cc M common/thrift/BackendGflags.thrift M common/thrift/CatalogService.thrift M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java M fe/src/main/java/org/apache/impala/catalog/ScalarFunction.java M fe/src/main/java/org/apache/impala/service/BackendConfig.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java A testdata/workloads/functional-query/queries/QueryTest/max-partitions-to-load.test A tests/custom_cluster/test_max_partitions.py 12 files changed, 172 insertions(+), 5 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/10584/1 -- To view, visit http://gerrit.cloudera.org:8080/10584 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51 Gerrit-Change-Number: 10584 Gerrit-PatchSet: 1 Gerrit-Owner: Todd Lipcon