[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8238 ) Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. Patch Set 7: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 7 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 19 Oct 2017 04:27:34 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Alex Behm has posted comments on this change. ( http://gerrit.cloudera.org:8080/8238 ) Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. Patch Set 7: (4 comments) http://gerrit.cloudera.org:8080/#/c/8238/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/8238/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1958 PS7, Line 1958: Lists.partition(allHmsPartitionsToAdd, MAX_PARTITION_UPDATES_PER_RPC)) { While you are here, can you also change bulkAlterPartitions() to use Lists.partition()? http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py File tests/metadata/test_ddl.py: http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@514 PS7, Line 514: def test_alter_table_create_many_partitions(self, vector, unique_database): This test doesn't belong in the TestLibCache class. Don't we have existing tests that test the multi-partition add alter? Seems like this test belongs there. http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@519 PS7, Line 519: self.client.execute('use {0}'.format(unique_database)) "use" can generally cause problems in tests because we sometimes run a test against different impalads for example to exercise sync_ddl. While I think it is ok in this specific instance, it's generally better practice not so issue "use" statements unless absolutely necessary. http://gerrit.cloudera.org:8080/#/c/8238/7/tests/metadata/test_ddl.py@527 PS7, Line 527: # 473 -1 0 0B NOT CACHED NOT CACHED TEXTfalse hdfs://localhost:20500/test-warehouse/test_alter_table_create_many_partitions_5b4888f5.db/t/p=473 This comment can become stale quickly. I suggest not adding it an output row verbatim but mentioning that show partitions contains the partition HDFS paths which we expect to contain p=val sub-directories that you are going to search for with the regex. -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 7 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 19 Oct 2017 03:40:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/8238 ) Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/1350/ -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 7 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 19 Oct 2017 00:27:24 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8238 ) Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. Patch Set 7: Code-Review+1 Carry post rebase. -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 7 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 18 Oct 2017 21:32:58 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8238 ) Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. Patch Set 6: Code-Review+1 (1 comment) Carrying Dimitris' +1. http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974 PS3, Line 1974: // HMS because some of them may already exist there. In that case, we load in the : // catalog the partitions that already exist in HMS but aren't in the catalog yet. : if (allHmsPartitionsToAdd.size() != addedHmsPartitions.size()) { : List difference = computeDifference(allHmsPartitionsToAdd, : addedHmsPartitions); : addedHmsPartitions.addAll( : getPartitionsFromHms(msTbl, msClient, tableName, difference)); : } : : for (Partition partition: addedHmsPartitions) { : // Create and add the HdfsPartition to catalog. Return the table object with an : > Well, you're only calling it for the diff, right? I would expect in most ca Yep, I agree. -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 6 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 18 Oct 2017 21:32:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Hello Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8238 to look at the new patch set (#5). Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. This commit allows users to add more than 500 (=MAX_PARTITION_UPDATES_PER_RPC) partitions in a single ALTER TABLE command. We batch the operations against Hive into groups of 500. I tested this manually, creating 1002 partitions and observing the expected 3 API calls against the Hive Metastore in the log. I can confirm that there is coverage of this in some existing tests. A new, simple, test has been added that confirms that creating 502 partitions works. Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f --- M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/metadata/test_ddl.py 4 files changed, 40 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8238/5 -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 5 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8238 ) Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. Patch Set 4: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974 PS3, Line 1974: // HMS because some of them may already exist there. In that case, we load in the : // catalog the partitions that already exist in HMS but aren't in the catalog yet. : if (allHmsPartitionsToAdd.size() != addedHmsPartitions.size()) { : List difference = computeDifference(allHmsPartitionsToAdd, : addedHmsPartitions); : addedHmsPartitions.addAll( : getPartitionsFromHms(msTbl, msClient, tableName, difference)); : } : : for (Partition partition: addedHmsPartitions) { : // Create and add the HdfsPartition to catalog. Return the table object with an : > Sure. I tightened the partitioned loop to only include the msClient call. Well, you're only calling it for the diff, right? I would expect in most cases the diff to be small, so I am not really worried about this call. That said, we can reevaluate that decision if we have any reason to believe this is going to cause any problems. http://gerrit.cloudera.org:8080/#/c/8238/4/tests/metadata/test_ddl.py File tests/metadata/test_ddl.py: http://gerrit.cloudera.org:8080/#/c/8238/4/tests/metadata/test_ddl.py@522 PS4, Line 522: alter_stmt = "alter table t add " + " ".join("partition(p=%d)" % (i,) for i in xrange(MAX_PARTITION_UPDATES_PER_RPC + 2)) nit: long line -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 4 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 16 Oct 2017 23:15:39 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/8238 ) Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. Patch Set 4: (2 comments) Thanks for the review! I ran the core tests. Added a python test, as you suggested. http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@a1954 PS3, Line 1954: > Why did you remove this? Mistake. Re-instated. (I missed that MetaStoreClient was autocloseable...) http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974 PS3, Line 1974: // HMS because some of them may already exist there. In that case, we load in the : // catalog the partitions that already exist in HMS but aren't in the catalog yet. : if (allHmsPartitionsToAdd.size() != addedHmsPartitions.size()) { : List difference = computeDifference(allHmsPartitionsToAdd, : addedHmsPartitions); : addedHmsPartitions.addAll( : getPartitionsFromHms(msTbl, msClient, tableName, difference)); : } : : for (Partition partition: addedHmsPartitions) { : // Create and add the HdfsPartition to catalog. Return the table object with an : > I think you can refactor this so that you make only one call to getPartitio Sure. I tightened the partitioned loop to only include the msClient call. Note that this implies that getPartitionFromHms() is ok to call with a big (> MAX_PARTITION_UPDATES_PER_RPC) batch. Is that against the spirit here? -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 4 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 16 Oct 2017 18:43:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Hello Dimitris Tsirogiannis, Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8238 to look at the new patch set (#4). Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. This commit allows users to add more than 500 (=MAX_PARTITION_UPDATES_PER_RPC) partitions in a single ALTER TABLE command. We batch the operations against Hive into groups of 500. I tested this manually, creating 1002 partitions and observing the expected 3 API calls against the Hive Metastore in the log. I can confirm that there is coverage of this in some existing tests. A new, simple, test has been added that confirms that creating 502 partitions works. Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f --- M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M tests/metadata/test_ddl.py 4 files changed, 39 insertions(+), 42 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8238/4 -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 4 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Dimitris Tsirogiannis has posted comments on this change. ( http://gerrit.cloudera.org:8080/8238 ) Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. Patch Set 3: (2 comments) Re testing, a simple python test that adds at least MAX_PARTITION_UPDATES_PER_SEC + 1 partitions and then validates that they were actually added in the HMS (e.g. by doing show partitions) is sufficient. If you want you can convert the analysis test in AnalyzeDDLTest.java to a positive test but it is kind of redundant if you add the python test I just mentioned. http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@a1954 PS3, Line 1954: Why did you remove this? http://gerrit.cloudera.org:8080/#/c/8238/3/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java@1974 PS3, Line 1974: if (hmsSublist.size() != addedHmsPartitions.size()) { : List difference = computeDifference(hmsSublist, : addedHmsPartitions); : addedHmsPartitions.addAll( : getPartitionsFromHms(msTbl, msClient, tableName, difference)); : } : : for (Partition partition: addedHmsPartitions) { : // Create and add the HdfsPartition to catalog. Return the table object with an : // updated catalog version. : addHdfsPartition(tbl, partition); : } I think you can refactor this so that you make only one call to getPartitionsFromHms() outside the for loop. -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 3 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Comment-Date: Thu, 12 Oct 2017 16:59:32 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION.
Hello Alex Behm, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/8238 to look at the new patch set (#3). Change subject: IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. .. IMPALA-4524: Batch calls to ALTER TABLE...ADD PARTITION. This commit allows users to add more than 500 (=MAX_PARTITION_UPDATES_PER_RPC) partitions in a single ALTER TABLE command. We batch the operations against Hive into groups of 500. I tested this manually, creating 1002 partitions and observing the expected 3 API calls against the Hive Metastore in the log. I can confirm that there is coverage of this in some existing tests. I'm very open to additional suggestions for how to test this. I can certainly add a query test that reproduces my manual testing, if reviewers feel like it's the right direction. Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f --- M fe/src/main/java/org/apache/impala/analysis/AlterTableAddPartitionStmt.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java 3 files changed, 12 insertions(+), 37 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/38/8238/3 -- To view, visit http://gerrit.cloudera.org:8080/8238 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I95f8221ff08c0f126f951f7d37ff5e57985f855f Gerrit-Change-Number: 8238 Gerrit-PatchSet: 3 Gerrit-Owner: Philip ZeyligerGerrit-Reviewer: Alex Behm