Github user vanzin commented on the issue:
https://github.com/apache/spark/pull/16179
> After the change these temporary data files in staging directory of
> InsertIntoHiveTable will be moved to the table location instead of copying to
> the table location. Is that right?
It depends. Before this change, the behavior depended on where the table was.
If the table was on HDFS (or any filesystem other than the local one), the
files were moved, so the behavior doesn't change. If the table was on the
local filesystem, the files were copied and then deleted when the staging
directory was removed. So in the end, it's the same thing.
With the change, the data is moved in both cases, which is also correct and
leads to the same result.
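To make the distinction concrete, here is a minimal, hypothetical sketch (not Spark's or Hive's actual code; the function name is illustrative) of why "move" and "copy then delete the staging dir" end up equivalent: a rename works within one filesystem, while across filesystems the fallback is copy followed by deleting the staged original.

```python
import os
import shutil
import tempfile

def finalize_staged_file(src, dst):
    """Promote a staged file to its final location (illustrative sketch).

    Within a single filesystem, a rename is a cheap metadata-only move.
    Across filesystems, the fallback is copy-then-delete. Either way the
    staged source is gone afterwards and the destination holds the data.
    """
    try:
        os.rename(src, dst)       # same filesystem: a move
    except OSError:
        shutil.copy2(src, dst)    # cross-filesystem: copy the bytes...
        os.remove(src)            # ...then delete the staged original

# Demo with temporary directories standing in for staging dir and table dir.
staging = tempfile.mkdtemp(prefix="staging-")
table = tempfile.mkdtemp(prefix="table-")
src = os.path.join(staging, "part-00000")
with open(src, "w") as f:
    f.write("row1\n")
dst = os.path.join(table, "part-00000")

finalize_staged_file(src, dst)
print(os.path.exists(src), os.path.exists(dst))  # False True
```

In both branches the observable outcome is identical: the staging copy no longer exists and the table location has the data, which is why the change does not alter results.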
I just want to reinforce, again, that this is not about a change in
behavior in Hive at all. This is Spark using a Hive API incorrectly.
> VersionSuite is also being used for testing end-to-end behaviors in
> #16104.
I'm not sure that's such a great idea, but in any case, the tests for this
change are the existing tests in "InsertIntoHiveTableSuite" and
"HiveCommandSuite". So basically you'd be asking to run those against all the
different versions of Hive metastores supported by Spark. That's doable, but
it's a bigger change that I don't really think is necessary here. The Hive
semantics haven't changed; Spark was depending on undocumented behavior that
happened to work, and this change fixes that.