ueshin commented on PR #48820:
URL: https://github.com/apache/spark/pull/48820#issuecomment-2474723006
@cloud-fan
> In SQL, users can just write un-qualified column names to reference the outer plan. We should allow the same in DataFrame API.
In that case, `sf.col("a")` should also be allowed instead of
`sf.col("a").outer()` in the above example?
```py
l.select(
    "a",
    (
        r
        .where(sf.col("b") == sf.col("a"))
        .select(sf.sum("d"))
        .scalar()
    ),
).show()
```
Otherwise, users may not understand why `.outer()` is necessary for `a` but not for `b`.
As long as we need at least one `.outer()` to make the analysis lazy, it should have a consistent meaning; otherwise we need another way to make the analysis lazy.
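For contrast, a minimal sketch of the same query written with the current `.outer()` marking (assuming, as in the snippet above, that `l` and `r` are DataFrames and `sf` is `pyspark.sql.functions`):
```py
l.select(
    "a",
    (
        r
        # `a` is explicitly marked as a reference to the outer plan `l`,
        # while `b` resolves against the inner plan `r` without any marker.
        .where(sf.col("b") == sf.col("a").outer())
        .select(sf.sum("d"))
        .scalar()
    ),
).show()
```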
cc @allisonwang-db who suggested the current `outer()` behavior.