HyukjinKwon edited a comment on issue #26461: [SPARK-29831][SQL] Scan Hive partitioned table should not dramatically increase data parallelism
URL: https://github.com/apache/spark/pull/26461#issuecomment-552819993

> Maybe we can also rely on maxSplitBytes in Hive Scan and decide parallelism?

This sounds fine in general, but IIRC there have been several attempts to merge big Hive partitions; however, they needed a pretty big change, which I don't think is worth it. E.g. https://github.com/apache/spark/pull/10572
