SCHJonathan commented on PR #53024:
URL: https://github.com/apache/spark/pull/53024#issuecomment-3524326196
> There are a bunch of code cleanup changes here that seem great but are
outside the critical path of the main goal of this PR (supporting `spark.sql`
inside pipelines). Would it be difficult to move those changes into a separate
PR to reduce risk?
@sryza I will polish up the PR to reflect this, but unfortunately I
think most of the changes are necessary:
1. We currently don't support eager analysis / execution outside a flow
function (e.g., `spark.sql("SELECT * FROM external_table")` outside the
flow function, or `spark.read.table("external_table")`). These queries
need to go through pipeline analysis, otherwise identifiers won't be
correctly qualified with the current catalog / schema tracked inside the
pipeline. I introduced an `ExternalQueryAnalysisContext` to handle that.
2. Currently, changing the current catalog / schema is a SQL-only concept,
and the related current catalog / schema tracking logic lives inside
`SqlGraphRegistrationContext`. I need to port that to
`GraphRegistrationContext`, since Python uses that.
3. There are a few unrelated formatting changes caused by my local
scalafmt; I will revert these before requesting formal review.
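
To illustrate point 1, here is a toy Python sketch (purely illustrative; the class and method names are stand-ins, not Spark's actual implementation) of why a bare identifier in an external query must be qualified against the catalog / schema tracked by the pipeline rather than the session defaults:

```python
# Hypothetical sketch: a context that tracks the pipeline's current
# catalog / schema and qualifies identifiers in external queries.
# Names here (ExternalQueryAnalysisContext, qualify) are illustrative
# only and do not reflect Spark's real API.
from dataclasses import dataclass


@dataclass
class ExternalQueryAnalysisContext:
    current_catalog: str
    current_schema: str

    def qualify(self, identifier: str) -> str:
        """Fully qualify a table identifier with the tracked catalog /
        schema; already-qualified names pass through unchanged."""
        parts = identifier.split(".")
        if len(parts) == 1:
            # bare table name -> catalog.schema.table
            return f"{self.current_catalog}.{self.current_schema}.{identifier}"
        if len(parts) == 2:
            # schema.table -> catalog.schema.table
            return f"{self.current_catalog}.{identifier}"
        return identifier


ctx = ExternalQueryAnalysisContext("my_catalog", "my_schema")
print(ctx.qualify("external_table"))    # my_catalog.my_schema.external_table
print(ctx.qualify("other_schema.t"))    # my_catalog.other_schema.t
```

Without routing through such a context, `spark.sql("SELECT * FROM external_table")` outside a flow function would resolve `external_table` against whatever catalog / schema the session happens to have, not the one the pipeline has set.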
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]