[GitHub] [spark] HyukjinKwon edited a comment on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism

GitBox Tue, 12 Nov 2019 01:58:10 -0800

HyukjinKwon edited a comment on issue #26461: [SPARK-29831][SQL] Scan Hive 
partitioned table should not dramatically increase data parallelism
URL: https://github.com/apache/spark/pull/26461#issuecomment-552819993
 
 
   > Maybe we can also rely on maxSplitBytes in Hive Scan and decide 
parallelism?
   
   This sounds fine in general, but IIRC there have been several attempts to 
merge big Hive partitions. They needed a pretty big change, which I don't think 
is worth it; see, e.g., https://github.com/apache/spark/pull/10572
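
For context, here is a minimal sketch of the kind of calculation the quoted suggestion refers to. It assumes the rule Spark's file-based data sources use to pick a split size, min(maxPartitionBytes, max(openCostInBytes, bytesPerCore)), with the default values of `spark.sql.files.maxPartitionBytes` (128 MiB) and `spark.sql.files.openCostInBytes` (4 MiB). The function and parameter names are illustrative only, and this is not the code path the Hive scan uses today.

```python
MIB = 1024 * 1024


def max_split_bytes(file_sizes, default_parallelism,
                    max_partition_bytes=128 * MIB,
                    open_cost_in_bytes=4 * MIB):
    """Illustrative split-size rule (hypothetical helper, not a Spark API).

    Every file is padded with a fixed "open cost" so that many tiny files
    are not packed into one oversized task.
    """
    total_bytes = sum(size + open_cost_in_bytes for size in file_sizes)
    bytes_per_core = total_bytes // default_parallelism
    # Never exceed the configured cap, never drop below the open cost.
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))


if __name__ == "__main__":
    # Ten 1 GiB files on 8 cores: the cap decides the split size.
    print(max_split_bytes([1024 * MIB] * 10, default_parallelism=8) // MIB)
```

Under this rule the split size, and therefore the number of tasks, follows the total data size and the available cores rather than the number of Hive partitions.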

