prakharjain09 commented on a change in pull request #26569: [SPARK-29938] [SQL] Add batching support in Alter table add partition flow
URL: https://github.com/apache/spark/pull/26569#discussion_r357100114
##########
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
##########
@@ -470,14 +470,36 @@ case class AlterTableAddPartitionCommand(
CatalogTablePartition(normalizedSpec, table.storage.copy(
locationUri = location.map(CatalogUtils.stringToURI)))
}
 -    catalog.createPartitions(table.identifier, parts, ignoreIfExists = ifNotExists)
 +
 +    // The Hive metastore may not have enough memory to handle millions of partitions in a single RPC.
 +    // The request to the metastore also times out when adding a lot of partitions in one shot,
 +    // so we should split them into smaller batches.
 +    val batchSize = 100
Review comment:
Reverted making the batch size configurable after some experiments and discussion with @srowen.
We noticed that different batch sizes don't change the overall performance, so we don't need to
make this configurable. That's probably also why it was not made configurable in
[AlterTableRecoverPartitions earlier](https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L740).
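The batching idea in the diff above can be sketched as a small self-contained Scala snippet. This is only an illustration, not the PR's actual code: the names `parts` and the `println` stand in for the real partition specs and the `catalog.createPartitions` metastore call. Scala's `grouped` does the splitting into fixed-size chunks.

```scala
// Minimal sketch of the batching approach: split a large sequence into
// chunks of at most `batchSize` elements and issue one call per chunk,
// mirroring how the PR groups partitions before each metastore RPC.
object BatchingSketch {
  def main(args: Array[String]): Unit = {
    val batchSize = 100
    // Stand-in for the Seq[CatalogTablePartition] built earlier in the command.
    val parts = (1 to 250).map(i => s"part=$i")

    // `grouped` yields chunks of at most `batchSize` elements; the last
    // chunk may be smaller (here: 100, 100, 50).
    parts.grouped(batchSize).foreach { batch =>
      // In the PR this is catalog.createPartitions(table.identifier, batch,
      // ignoreIfExists = ifNotExists) -- one RPC per batch.
      println(s"creating ${batch.size} partitions in one RPC")
    }
  }
}
```

With a fixed `batchSize` each RPC stays well below the metastore's memory and timeout limits, which is why the experiments above found no benefit in tuning it.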