AngersZhuuuu opened a new pull request #35608:
URL: https://github.com/apache/spark/pull/35608
### What changes were proposed in this pull request?
Currently, we verify the output path in `DataSourceAnalysis`:
```scala
// For dynamic partition overwrite, we do not delete partition directories ahead.
// We write to staging directories and move to final partition directories after
// writing job is done. So it is ok to have outputPath try to overwrite inputpath.
if (overwrite && !insertCommand.dynamicPartitionOverwrite) {
  DDLUtils.verifyNotReadPath(actualQuery, outputPath)
}

/**
 * Throws exception if outputPath tries to overwrite inputpath.
 */
def verifyNotReadPath(query: LogicalPlan, outputPath: Path): Unit = {
  val inputPaths = query.collect {
    case LogicalRelation(r: HadoopFsRelation, _, _, _) =>
      r.location.rootPaths
  }.flatten
  if (inputPaths.contains(outputPath)) {
    throw new AnalysisException(
      "Cannot overwrite a path that is also being read from.")
  }
}
```
Inserting into a static partition while reading from the same table is a common case, and this restriction troubles users a lot.
In this PR, static partition inserts use the same logic as dynamic partition overwrite (write to staging directories, then move them to the final partition directories) to avoid this issue.
### Why are the changes needed?
Support more ETL cases.
### Does this PR introduce _any_ user-facing change?
After this patch, users can:
1. Insert overwrite a static partition with data read from another partition of the same table
2. Insert overwrite a static partition with data read from the same partition of the same table
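The two cases above can be sketched in Spark SQL. The table and partition names below are illustrative only, not taken from the PR; before this patch, both statements failed with `AnalysisException: Cannot overwrite a path that is also being read from.`

```sql
-- Hypothetical partitioned table (names are assumptions for illustration).
CREATE TABLE t (id INT) PARTITIONED BY (dt STRING);

-- Case 1: overwrite a static partition from a *different* partition of t.
INSERT OVERWRITE TABLE t PARTITION (dt = '2022-02-24')
SELECT id FROM t WHERE dt = '2022-02-23';

-- Case 2: overwrite a static partition from the *same* partition of t.
INSERT OVERWRITE TABLE t PARTITION (dt = '2022-02-24')
SELECT id FROM t WHERE dt = '2022-02-24';
```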
### How was this patch tested?
Added unit tests.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]