[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 7: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 7
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 19 Oct 2017 04:27:34 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-18 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 7:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/8238/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8238/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1958
PS7, Line 1958:   Lists.partition(allHmsPartitionsToAdd, 
MAX_PARTITION_UPDATES_PER_RPC)) {
While you are here, can you also change bulkAlterPartitions() to use 
Lists.partition()?


http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@514
PS7, Line 514:   def test_alter_table_create_many_partitions(self, vector, 
unique_database):
This test doesn't belong in the TestLibCache class.

Don't we have existing tests that test the multi-partition add alter? Seems 
like this test belongs there.


http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@519
PS7, Line 519: self.client.execute('use {0}'.format(unique_database))
"use" can generally cause problems in tests because we sometimes run a test 
against different impalads for example to exercise sync_ddl. While I think it 
is ok in this specific instance, it's generally better practice not so issue 
"use" statements unless absolutely necessary.


http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@527
PS7, Line 527: # 473 -1  0   0B  NOT CACHED  NOT CACHED 
 TEXTfalse   
hdfs://localhost:20500/test-warehouse/test_alter_table_create_many_partitions_5b4888f5.db/t/p=473
This comment can become stale quickly. I suggest not adding it an output row 
verbatim but mentioning that show partitions contains the partition HDFS paths 
which we expect to contain p=val sub-directories that you are going to search 
for with the regex.



--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 7
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 19 Oct 2017 03:40:11 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-18 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1350/


--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 7
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 19 Oct 2017 00:27:24 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 7: Code-Review+1

Carry post rebase.


-- 
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 7
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 18 Oct 2017 21:32:58 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-18 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 6: Code-Review+1

(1 comment)

Carrying Dimitris' +1.

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974
PS3, Line 1974: // HMS because some of them may already exist there. In that 
case, we load in the
  :   // catalog the partitions that already exist in HMS but 
aren't in the catalog yet.
  :   if (allHmsPartitionsToAdd.size() != 
addedHmsPartitions.size()) {
  : List difference = 
computeDifference(allHmsPartitionsToAdd,
  : addedHmsPartitions);
  : addedHmsPartitions.addAll(
  : getPartitionsFromHms(msTbl, msClient, tableName, 
difference));
  :   }
  :
  :   for (Partition partition: addedHmsPartitions) {
  : // Create and add the HdfsPartition to catalog. Return 
the table object with an
  :
> Well, you're only calling it for the diff, right? I would expect in most ca
Yep, I agree.



--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 6
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 18 Oct 2017 21:32:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-16 Thread Philip Zeyliger (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8238

to look at the new patch set (#5).

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..

IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

This commit allows users to add more than 500 (=MAX_PARTITION_UPDATES_PER_RPC)
partitions in a single ALTER TABLE command. We batch the operations
against Hive into groups of 500.

I tested this manually, creating 1002 partitions and observing the
expected 3 API calls against the Hive Metastore in the log.  I can
confirm that there is coverage of this in some existing tests.
A new, simple, test has been added that confirms that creating 502
partitions works.

Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/metadata/test_ddl.py
4 files changed, 40 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8238/5
--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-16 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 4: Code-Review+2

(2 comments)

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974
PS3, Line 1974: // HMS because some of them may already exist there. In that 
case, we load in the
  :   // catalog the partitions that already exist in HMS but 
aren't in the catalog yet.
  :   if (allHmsPartitionsToAdd.size() != 
addedHmsPartitions.size()) {
  : List difference = 
computeDifference(allHmsPartitionsToAdd,
  : addedHmsPartitions);
  : addedHmsPartitions.addAll(
  : getPartitionsFromHms(msTbl, msClient, tableName, 
difference));
  :   }
  :
  :   for (Partition partition: addedHmsPartitions) {
  : // Create and add the HdfsPartition to catalog. Return 
the table object with an
  :
> Sure. I tightened the partitioned loop to only include the msClient call.
Well, you're only calling it for the diff, right? I would expect in most cases 
the diff to be small, so I am not really worried about this call. That said, we 
can reevaluate that decision if we have any reason to believe this is going to 
cause any problems.


http://gerrit.cloudera.org:8080/#/c/8238/4/tests/metadata/test_ddl.py
File tests/metadata/test_ddl.py:

http://gerrit.cloudera.org:8080/#/c/8238/4/tests/metadata/test_ddl.py@522
PS4, Line 522: alter_stmt = "alter table t add " + " 
".join("partition(p=%d)" % (i,) for i in xrange(MAX_PARTITION_UPDATES_PER_RPC + 
2))
nit: long line



--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 16 Oct 2017 23:15:39 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-16 Thread Philip Zeyliger (Code Review)
Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 4:

(2 comments)

Thanks for the review!

I ran the core tests. Added a python test, as you suggested.

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@a1954
PS3, Line 1954:
> Why did you remove this?
Mistake. Re-instated.

(I missed that MetaStoreClient was autocloseable...)


http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974
PS3, Line 1974: // HMS because some of them may already exist there. In that 
case, we load in the
  :   // catalog the partitions that already exist in HMS but 
aren't in the catalog yet.
  :   if (allHmsPartitionsToAdd.size() != 
addedHmsPartitions.size()) {
  : List difference = 
computeDifference(allHmsPartitionsToAdd,
  : addedHmsPartitions);
  : addedHmsPartitions.addAll(
  : getPartitionsFromHms(msTbl, msClient, tableName, 
difference));
  :   }
  :
  :   for (Partition partition: addedHmsPartitions) {
  : // Create and add the HdfsPartition to catalog. Return 
the table object with an
  :
> I think you can refactor this so that you make only one call to getPartitio
Sure. I tightened the partitioned loop to only include the msClient call.

Note that this implies that getPartitionFromHms() is ok to call with a big (> 
MAX_PARTITION_UPDATES_PER_RPC) batch. Is that against the spirit here?



--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 16 Oct 2017 18:43:01 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-16 Thread Philip Zeyliger (Code Review)
Hello Dimitris Tsirogiannis, Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8238

to look at the new patch set (#4).

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..

IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

This commit allows users to add more than 500 (=MAX_PARTITION_UPDATES_PER_RPC)
partitions in a single ALTER TABLE command. We batch the operations
against Hive into groups of 500.

I tested this manually, creating 1002 partitions and observing the
expected 3 API calls against the Hive Metastore in the log.  I can
confirm that there is coverage of this in some existing tests.
A new, simple, test has been added that confirms that creating 502
partitions works.

Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M tests/metadata/test_ddl.py
4 files changed, 39 insertions(+), 42 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8238/4
--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 4
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-12 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/8238 )

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..


Patch Set 3:

(2 comments)

Re testing, a simple python test that adds at least 
MAX_PARTITION_UPDATES_PER_SEC + 1 partitions and then validates that they were 
actually added in the HMS (e.g. by doing show partitions) is sufficient. If you 
want you can convert the analysis test in AnalyzeDDLTest.java to a positive 
test but it is kind of redundant if you add the python test I just mentioned.

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@a1954
PS3, Line 1954:
Why did you remove this?


http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974
PS3, Line 1974: if (hmsSublist.size() != addedHmsPartitions.size()) {
  : List difference = 
computeDifference(hmsSublist,
  : addedHmsPartitions);
  : addedHmsPartitions.addAll(
  : getPartitionsFromHms(msTbl, msClient, tableName, 
difference));
  :   }
  :
  :   for (Partition partition: addedHmsPartitions) {
  : // Create and add the HdfsPartition to catalog. Return 
the table object with an
  : // updated catalog version.
  : addHdfsPartition(tbl, partition);
  :   }
I think you can refactor this so that you make only one call to 
getPartitionsFromHms() outside the for loop.



--
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Comment-Date: Thu, 12 Oct 2017 16:59:32 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

2017-10-12 Thread Philip Zeyliger (Code Review)
Hello Alex Behm,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/8238

to look at the new patch set (#3).

Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
..

IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.

This commit allows users to add more than 500 (=MAX_PARTITION_UPDATES_PER_RPC)
partitions in a single ALTER TABLE command. We batch the operations
against Hive into groups of 500.

I tested this manually, creating 1002 partitions and observing the
expected 3 API calls against the Hive Metastore in the log.  I can
confirm that there is coverage of this in some existing tests.
I'm very open to additional suggestions for how to test this.
I can certainly add a query test that reproduces my manual testing,
if reviewers feel like it's the right direction.

Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
---
M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
3 files changed, 12 insertions(+), 37 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8238/3
-- 
To view, visit http://gerrit.cloudera.org:8080/8238
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f
Gerrit-Change-Number: 8238
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Alex Behm