GitHub user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/18866
Does this work with append? Even if you shuffle the data before writing, we
may still end up with multiple files for one bucket.
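A toy simulation of the append concern, under simplifying assumptions. It assumes each write job shuffles its rows by bucket and emits exactly one file per non-empty bucket. The bucket count, file names, and `bucket_id` function are all hypothetical, and `bucket_id` is only a stand-in for Spark's real bucketing hash. With these assumptions, a single job yields one file per bucket, but a later append adds a second file to any bucket it touches:

```python
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_id(key, num_buckets=NUM_BUCKETS):
    # Stand-in for the real bucketing hash (an assumption, not Spark's function).
    return hash(key) % num_buckets


def write_job(keys, job_id, files_by_bucket):
    """Simulate one write job: rows are shuffled by bucket first,
    so this job produces at most one file per bucket."""
    buckets = defaultdict(list)
    for k in keys:
        buckets[bucket_id(k)].append(k)
    for b in buckets:
        # Hypothetical file naming: one file per (job, bucket) pair.
        files_by_bucket[b].append(f"part-{job_id}-bucket{b}")
    return files_by_bucket


files = defaultdict(list)
write_job([1, 2, 3, 4, 5, 6, 7, 8], job_id=0, files_by_bucket=files)  # initial write
write_job([1, 5, 9], job_id=1, files_by_bucket=files)                 # append

for b in sorted(files):
    print(b, files[b])
```

Bucket 1 ends up with two files, one from each job, even though every individual job was shuffled by bucket.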
Is it possible to generalize this patch to the data source level? The current
approach looks very hacky and goes well beyond what we expect, given that Hive
is also a data source.