[GitHub] [spark] c21 commented on pull request #32198: [SPARK-26164][SQL] Allow concurrent writers for writing dynamic partitions and bucket table

GitBox Wed, 28 Apr 2021 08:21:07 -0700


c21 commented on pull request #32198:
URL: https://github.com/apache/spark/pull/32198#issuecomment-828544330



   > One more thing, how much does this improve the write? Local sorts before 
the write are typically not too bad if you look at the cycles spent during the 
write. A much bigger target here would be to properly interleave I/O and CPU 
operations. You sort of achieve that by having multiple writers, but it IMO 
feels like quite a big hammer.
   
   I will add a benchmark for this as a followup.
   
   IMHO, how much this helps really depends on the query shape 
(cardinality of dynamic partitions and buckets). In an environment where most 
queries write a small number of partitions and users set a relatively small 
number of buckets, this feature helps more. In an environment where queries 
tend to write a lot of partitions and users set a large number of buckets, 
this feature helps less. We do see a benefit for queries internally, and 
people have raised the request on the Spark dev list as well.
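
   To make the dependence on cardinality concrete, here is a rough toy model in Python. It is not the code in this PR. It assumes a task keeps one open writer per distinct partition/bucket key up to some limit, and that every row arriving after the limit is exceeded still has to go through the sort-based path. The function name `rows_needing_sort` and the parameter `max_writers` are made up for illustration.

```python
def rows_needing_sort(keys, max_writers):
    """Toy model: count the rows left over once the writer limit is hit.

    keys        -- partition/bucket key of each row, in arrival order
    max_writers -- hypothetical cap on concurrently open writers per task
    """
    open_writers = set()
    for i, key in enumerate(keys):
        if key not in open_writers:
            if len(open_writers) >= max_writers:
                # Limit exceeded: this row and all later rows are assumed
                # to fall back to the sort-based write path.
                return len(keys) - i
            open_writers.add(key)
    # Every distinct key fit within the limit, so no sort is needed.
    return 0
```

   Under this model, a task with few distinct keys relative to the limit avoids the sort entirely, while a task with many distinct keys hits the limit early and sorts most of its rows anyway.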


