prakharjain09 commented on a change in pull request #26569: [SPARK-29938] [SQL] 
Add batching support in Alter table add partition flow
URL: https://github.com/apache/spark/pull/26569#discussion_r357100114
 
 

 ##########
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala
 ##########
 @@ -470,14 +470,36 @@ case class AlterTableAddPartitionCommand(
       CatalogTablePartition(normalizedSpec, table.storage.copy(
         locationUri = location.map(CatalogUtils.stringToURI)))
     }
-    catalog.createPartitions(table.identifier, parts, ignoreIfExists = 
ifNotExists)
+
+    // The Hive metastore may not have enough memory to handle millions of
+    // partitions in a single RPC, and requests to the metastore can time out
+    // when adding a large number of partitions in one shot, so we split them
+    // into smaller batches.
+    val batchSize = 100
 
 Review comment:
   Reverted making the batch size configurable after some experiments and discussion with @srowen.
   
   We noticed that varying the batch size doesn't change the overall performance, so it doesn't need to be configurable. That's probably also why it was not made configurable in [AlterTableRecoverPartitions earlier](https://github.com/apache/spark/blob/v2.4.4/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L740).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
