szehon-ho commented on PR #52866: URL: https://github.com/apache/spark/pull/52866#issuecomment-3509623236
Discussed offline with @cloud-fan. The new condition unfortunately forces us to move the rule so that it is evaluated after an initial pass of ResolveReferences. Because we are checking assignment values (from the source table), it is important that these are resolved, so we can be sure they really are assignments from the source.

Overall the logic is:

1. ResolveReferences resolves all columns it can, but leaves an assignment unresolved when its key may be a column that does not exist yet in the target schema and will be added later by schema evolution.
2. ResolveMergeIntoSchemaEvolution now runs after ResolveReferences. It must unresolve all expressions, because they were resolved against the old table. This triggers another run of ResolveReferences.
3. The final run of ResolveReferences resolves all references based on the new target table.

Changes:

1. ResolveReferences: no longer eagerly throw an exception on the first run; throw on the second run instead, after schema evolution has been evaluated.
2. ResolveReferences: expand UPDATE SET * and INSERT * to fill in the missing assignments for columns that are in the source but not in the target (to trigger the ResolveMergeIntoSchemaEvolution condition).
3. ResolveMergeIntoSchemaEvolution: add a guard, MergeIntoTable.canEvaluateSchemaEvolution, so the rule does not trigger until ResolveReferences has run the first time. It checks that every assignment is either resolved or, if not, could be resolved by schema evolution.
4. ResolveMergeIntoSchemaEvolution: calculate sourceSchemaForEvolution, the columns in the source schema that will be added to the target. It is pruned to those that are directly the subject of an assignment whose key does not exist in the target but whose value references a source column/field of the same name.
5. ResolveMergeIntoSchemaEvolution: unresolve everything. Because the rule now runs after ResolveReferences and the target table changes, everything has to be re-resolved.
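To make the pass ordering easier to follow, here is a toy Python model of the flow described above. It is only a sketch and not the Catalyst code: `Assignment`, `expand_star`, `resolve_references`, `source_schema_for_evolution` and `analyze_merge` are hypothetical names, assignment values are simplified to bare source column names, and nested fields are ignored.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    key: str               # target column being assigned (may not exist yet)
    value: str             # source column supplying the value (simplified)
    resolved: bool = False


def expand_star(source_cols):
    """Model UPDATE SET * / INSERT *: one assignment per source column,
    including columns the target does not have yet."""
    return [Assignment(key=c, value=c) for c in source_cols]


def resolve_references(target_cols, source_cols, assignments, final_pass):
    """Resolve keys against the target. An unknown key is left unresolved
    on the first pass and is an error on the final pass."""
    for a in assignments:
        if a.value not in source_cols:
            raise ValueError(f"unknown source column: {a.value}")
        a.resolved = a.key in target_cols
        if not a.resolved and final_pass:
            raise ValueError(f"cannot resolve assignment key: {a.key}")


def can_evaluate_schema_evolution(assignments, source_cols):
    """Guard: each assignment is resolved, or has an unresolved key whose
    value is a source column of the same name (fixable by evolution)."""
    return all(
        a.resolved or (a.key == a.value and a.value in source_cols)
        for a in assignments
    )


def source_schema_for_evolution(target_cols, source_cols, assignments):
    """Source columns to add: only those directly assigned by name."""
    wanted = {
        a.key for a in assignments
        if a.key not in target_cols and a.key == a.value
    }
    return [c for c in source_cols if c in wanted]


def analyze_merge(target_cols, source_cols, assignments):
    # Pass 1: resolve what we can, tolerate unknown keys.
    resolve_references(target_cols, source_cols, assignments, final_pass=False)
    new_cols = []
    if can_evaluate_schema_evolution(assignments, source_cols):
        new_cols = source_schema_for_evolution(
            target_cols, source_cols, assignments)
    evolved = list(target_cols) + new_cols
    # Unresolve everything: it was resolved against the old target.
    for a in assignments:
        a.resolved = False
    # Final pass: resolve against the evolved target, now failing loudly.
    resolve_references(evolved, source_cols, assignments, final_pass=True)
    return evolved


target = ["id", "name"]
source = ["id", "name", "email", "debug_flag"]
explicit = [Assignment("name", "name"), Assignment("email", "email")]
print(analyze_merge(target, source, explicit))  # ['id', 'name', 'email']
```

In this model an explicit assignment list only pulls in `email`, while `expand_star(source)` would pull in both `email` and `debug_flag`. An assignment such as `Assignment("nickname", "name")` fails only on the final pass, which mirrors the deferred exception described above.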
