[Impala-ASF-CR] IMPALA-7077. Add a configuration for the max number of partitions to load

2019-03-04 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has abandoned this change. ( 
http://gerrit.cloudera.org:8080/10584 )

Change subject: IMPALA-7077. Add a configuration for the max number of 
partitions to load
..


Abandoned

Stale now. Feel free to restore if needed.
--
To view, visit http://gerrit.cloudera.org:8080/10584
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: abandon
Gerrit-Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51
Gerrit-Change-Number: 10584
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 


[Impala-ASF-CR] IMPALA-7077. Add a configuration for the max number of partitions to load

2018-06-05 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10584 )

Change subject: IMPALA-7077. Add a configuration for the max number of 
partitions to load
..


Patch Set 1:

Another idea that also came up was to go with a flag like the one in this 
change, but base the limit on memory consumed. That would cover more scenarios 
for banning a table: num partitions, num files, num columns, incremental stats, 
or some combination. We can obtain the size estimate at the per partition 
granularity, so once exceeded, back out as in this change.
There's still the downside of a static flag setting (less flexibility) but 
there is some ease-of-use that can be argued as well.


--
To view, visit http://gerrit.cloudera.org:8080/10584
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51
Gerrit-Change-Number: 10584
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 05 Jun 2018 22:07:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-7077. Add a configuration for the max number of partitions to load

2018-06-05 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10584 )

Change subject: IMPALA-7077. Add a configuration for the max number of 
partitions to load
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2067
PS1, Line 2067: if (curPartitionCount + allHmsPartitionsToAdd.size() > 
HdfsTable.MAX_PARTITIONS_PER_TABLE) {
> Yea, we use 90 chars for Java too. Gerrit defaults to 100.
You can re-configure Gerrit for 90 in some config option if that's your thing.


http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
File fe/src/test/java/org/apache/impala/catalog/CatalogTest.java:

http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/test/java/org/apache/impala/catalog/CatalogTest.java@523
PS1, Line 523:   public void testPartitionLimit() throws CatalogException {
You added the exception handling in "updatePartitionsFromHms()" and in some 
alter table path (i.e., two places), but there seems to be only one code path 
tested here. Should we test the other one too here, or is it enough that we 
test it in the QueryTest?



--
To view, visit http://gerrit.cloudera.org:8080/10584
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51
Gerrit-Change-Number: 10584
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 05 Jun 2018 22:06:36 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7077. Add a configuration for the max number of partitions to load

2018-06-05 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10584 )

Change subject: IMPALA-7077. Add a configuration for the max number of 
partitions to load
..


Patch Set 1:

(2 comments)

Todd, what are your thoughts on having a "per table" property that can 
blacklist that particular table from loading in Impala? For example, something 
like

alter table foo set tblproperties ("load-in-impala" = "false")

The issue with having a global max partitions is that

- It is a start up flag (requires a restart)
- Lets say if someone sets it to 100k, we can still have multiple tables with 
90k partitions or something like that
- In many cases users would like to blacklist some specific problematic tables 
(say with large incremental stats) and having just a partition limit may not 
help


Vuk and I were discussing this offline today and I thought I'd bring this up in 
this CR. It is unclear if both these flags can co-exist or we should just have 
one of them. Thoughts?

http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG@21
PS1, Line 21: worth the complexity to try to
: somehow limit the addition of partitions during a DML
> The problem is that the catalog update happens at the end of the DML, after
I lean towards (2) but bubble up a meaningful error message for the DML query 
and not the generic message that we are not able to load the table. I prefer 
(2) because (1) leaves the table on HDFS in an inconsistent state.


http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2067
PS1, Line 2067: if (curPartitionCount + allHmsPartitionsToAdd.size() > 
HdfsTable.MAX_PARTITIONS_PER_TABLE) {
> its not explicitly stated for java in the style guide, but given the limits
Yea, we use 90 chars for Java too. Gerrit defaults to 100.



--
To view, visit http://gerrit.cloudera.org:8080/10584
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51
Gerrit-Change-Number: 10584
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 05 Jun 2018 22:00:04 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7077. Add a configuration for the max number of partitions to load

2018-06-05 Thread Vuk Ercegovac (Code Review)
Vuk Ercegovac has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10584 )

Change subject: IMPALA-7077. Add a configuration for the max number of 
partitions to load
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2067
PS1, Line 2067: if (curPartitionCount + allHmsPartitionsToAdd.size() > 
HdfsTable.MAX_PARTITIONS_PER_TABLE) {
> these are < 100 chars which is the line limit that gerrit is showing. Do yo
its not explicitly stated for java in the style guide, but given the limits for 
python/c++, it should be 90.



--
To view, visit http://gerrit.cloudera.org:8080/10584
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51
Gerrit-Change-Number: 10584
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 05 Jun 2018 18:04:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7077. Add a configuration for the max number of partitions to load

2018-06-05 Thread Todd Lipcon (Code Review)
Todd Lipcon has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10584 )

Change subject: IMPALA-7077. Add a configuration for the max number of 
partitions to load
..


Patch Set 1:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG@19
PS1, Line 19: partitions will be created
> You mean those are created on the filesystem but not in HMS or something el
right, will clarify in the commit message.


http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG@19
PS1, Line 19: will no longer load
: following the change
> will that table be purged or retained in catalogd in some sort of unusable
Will clarify this


http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG@21
PS1, Line 21: worth the complexity to try to
: somehow limit the addition of partitions during a DML
> I guess this is a simple check in CatalogOpEx#updateCatalog() unless I'm mi
The problem is that the catalog update happens at the end of the DML, after the 
insert sinks have already created all of the directories and files on the 
filesystem, if I understand the flow correctly. So, unless we wanted to try to 
back out the changes by rm -rfing the files that were created (seems difficult 
and error-prone) we're kind of stuck with the following choices:

1) leave the files on the filesystem but don't create the partitions in HMS
2) create the partitions in HMS but leave the table in a state where Impala 
will refuse to load it (but at least Hive will load it).

Neither is great but I couldn't think of a good alternative.


http://gerrit.cloudera.org:8080/#/c/10584/1/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/10584/1/common/thrift/CatalogService.thrift@68
PS1, Line 68:   7: optional list warning_messages
> Where is this used? Don't see it's references in the patch.
ah, I originally attempted to use this to send warnings back on DML, but then 
reverted this part of the change. forgot to remove the thrift, sorry


http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
File fe/src/main/java/org/apache/impala/catalog/HdfsTable.java:

http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/catalog/HdfsTable.java@287
PS1, Line 287:   public static int MAX_PARTITIONS_PER_TABLE =
> final?
it's modified by a test so can't make it final. I can add a comment to that 
effect if you want.


http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2067
PS1, Line 2067: if (curPartitionCount + allHmsPartitionsToAdd.size() > 
HdfsTable.MAX_PARTITIONS_PER_TABLE) {
> nit: line overflow (here and below)
these are < 100 chars which is the line limit that gerrit is showing. Do you 
try to keep to 80 chars in Java code?



--
To view, visit http://gerrit.cloudera.org:8080/10584
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51
Gerrit-Change-Number: 10584
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Tue, 05 Jun 2018 18:00:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7077. Add a configuration for the max number of partitions to load

2018-06-04 Thread Bharath Vissapragada (Code Review)
Bharath Vissapragada has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10584 )

Change subject: IMPALA-7077. Add a configuration for the max number of 
partitions to load
..


Patch Set 1:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG@19
PS1, Line 19: partitions will be created
You mean those are created on the filesystem but not in HMS or something else?


http://gerrit.cloudera.org:8080/#/c/10584/1//COMMIT_MSG@21
PS1, Line 21: worth the complexity to try to
: somehow limit the addition of partitions during a DML
I guess this is a simple check in CatalogOpEx#updateCatalog() unless I'm 
missing something.


http://gerrit.cloudera.org:8080/#/c/10584/1/be/src/util/backend-gflag-util.cc
File be/src/util/backend-gflag-util.cc:

http://gerrit.cloudera.org:8080/#/c/10584/1/be/src/util/backend-gflag-util.cc@72
PS1, Line 72:
nit: remove


http://gerrit.cloudera.org:8080/#/c/10584/1/common/thrift/CatalogService.thrift
File common/thrift/CatalogService.thrift:

http://gerrit.cloudera.org:8080/#/c/10584/1/common/thrift/CatalogService.thrift@68
PS1, Line 68:   7: optional list warning_messages
Where is this used? Don't see it's references in the patch.


http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/10584/1/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@2067
PS1, Line 2067: if (curPartitionCount + allHmsPartitionsToAdd.size() > 
HdfsTable.MAX_PARTITIONS_PER_TABLE) {
nit: line overflow (here and below)


http://gerrit.cloudera.org:8080/#/c/10584/1/testdata/workloads/functional-query/queries/QueryTest/max-partitions-to-load.test
File 
testdata/workloads/functional-query/queries/QueryTest/max-partitions-to-load.test:

http://gerrit.cloudera.org:8080/#/c/10584/1/testdata/workloads/functional-query/queries/QueryTest/max-partitions-to-load.test@52
PS1, Line 52: insert into $DATABASE.t1(y) values (21)
Could you add a test wherein partitions are added from Hive and user does a 
REFRESH and that hits the partition limit?


http://gerrit.cloudera.org:8080/#/c/10584/1/testdata/workloads/functional-query/queries/QueryTest/max-partitions-to-load.test@70
PS1, Line 70:
I think the patch is missing the guard rails in "ALTER TABLE  RECOVER 
PARTITIONS" case. Could you fix that and add a test?



-- 
To view, visit http://gerrit.cloudera.org:8080/10584
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51
Gerrit-Change-Number: 10584
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Vuk Ercegovac 
Gerrit-Comment-Date: Mon, 04 Jun 2018 23:34:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-7077. Add a configuration for the max number of partitions to load

2018-06-01 Thread Todd Lipcon (Code Review)
Todd Lipcon has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10584


Change subject: IMPALA-7077. Add a configuration for the max number of 
partitions to load
..

IMPALA-7077. Add a configuration for the max number of partitions to load

This adds a new flag --max_partitions_per_table which prevents catalogd
from attempting to load any table with more than some maximum number of
partitions. If set to 0 (the default) then no limit is enforced.

The purpose of this change is to act as a "guard rail" for users in
clusters where there may exist some tables with many hundreds of
thousands of partitions that are known to cause catalogd/impalad OOMs.

The behavior of this patch is slightly suboptimal in that, if the
addition of partitions during a DML pushes a table over the limit, those
partitions will be created but then the table will no longer load
following the change. However, given that this is meant to be a 'last
defense' safeguard, I didn't think it was worth the complexity to try to
somehow limit the addition of partitions during a DML.

Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51
---
M be/src/catalog/catalog.cc
M be/src/util/backend-gflag-util.cc
M common/thrift/BackendGflags.thrift
M common/thrift/CatalogService.thrift
M fe/src/main/java/org/apache/impala/catalog/HdfsTable.java
M fe/src/main/java/org/apache/impala/catalog/ScalarFunction.java
M fe/src/main/java/org/apache/impala/service/BackendConfig.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/test/java/org/apache/impala/catalog/CatalogTest.java
A 
testdata/workloads/functional-query/queries/QueryTest/max-partitions-to-load.test
A tests/custom_cluster/test_max_partitions.py
12 files changed, 172 insertions(+), 5 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/84/10584/1
--
To view, visit http://gerrit.cloudera.org:8080/10584
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ife99a97a891ed14675303ea472abb2932a72cb51
Gerrit-Change-Number: 10584
Gerrit-PatchSet: 1
Gerrit-Owner: Todd Lipcon