SCHJonathan commented on PR #53024:
URL: https://github.com/apache/spark/pull/53024#issuecomment-3524326196
> There are a bunch of code cleanup changes here that seem great but are
outside the critical path of the main goal of this PR (supporting `spark.sql`
inside pipelines). Would it be difficult to move those changes into a separate
PR to reduce risk?
@sryza I will polish up the PR to reflect this, but unfortunately I
think most of the changes are necessary:
1. We currently don't support eager analysis / execution outside a flow
function (e.g., `spark.sql("SELECT * FROM external_table")` outside the
flow function, or `spark.read.table("external_table")`). These queries
need to go through pipeline analysis, otherwise identifiers won't be
correctly qualified with the current catalog / schema tracked inside the
pipeline. I introduced an `ExternalQueryAnalysisContext` to handle that.
2. Currently, changing the current catalog / schema is a SQL-only concept,
and the related current catalog / schema tracking logic lives inside
`SqlGraphRegistrationContext`. I need to port that to
`GraphRegistrationContext`, since Python uses that.
3. There are a few unrelated formatting changes caused by my local
scalafmt; I will revert these before requesting formal review.
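
To illustrate point 1, here is a toy Python sketch (purely illustrative; the class and method names are stand-ins, not Spark's actual implementation) of why a bare identifier in an external query must be qualified against the catalog / schema tracked by the pipeline rather than the session defaults:

```python
# Hypothetical sketch: a context that tracks the pipeline's current
# catalog / schema and qualifies identifiers in external queries.
# Names here (ExternalQueryAnalysisContext, qualify) are illustrative
# only and do not reflect Spark's real API.
from dataclasses import dataclass


@dataclass
class ExternalQueryAnalysisContext:
    current_catalog: str
    current_schema: str

    def qualify(self, identifier: str) -> str:
        """Fully qualify a table identifier with the tracked catalog /
        schema; already-qualified names pass through unchanged."""
        parts = identifier.split(".")
        if len(parts) == 1:
            # bare table name -> catalog.schema.table
            return f"{self.current_catalog}.{self.current_schema}.{identifier}"
        if len(parts) == 2:
            # schema.table -> catalog.schema.table
            return f"{self.current_catalog}.{identifier}"
        return identifier


ctx = ExternalQueryAnalysisContext("my_catalog", "my_schema")
print(ctx.qualify("external_table"))    # my_catalog.my_schema.external_table
print(ctx.qualify("other_schema.t"))    # my_catalog.other_schema.t
```

Without routing through such a context, `spark.sql("SELECT * FROM external_table")` outside a flow function would resolve `external_table` against whatever catalog / schema the session happens to have, not the one the pipeline has set.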
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]