[PR] [SPARK-49027][CONNECT][SQL] Share Column API between Class and Connect [spark]

via GitHub Wed, 21 Aug 2024 19:06:07 -0700


hvanhovell opened a new pull request, #47839:
URL: https://github.com/apache/spark/pull/47839


   ### What changes were proposed in this pull request?
   This PR moves the SQL `Column` class to `sql/api`. A (foreseen) consequence 
is that the Spark Connect Scala Client must also move to the shared API, which 
makes the PR quite large.
   
   ### Why are the changes needed?
   We want to create a Scala Client interface that is shared between Classic 
and Connect.
   
   ### Does this PR introduce _any_ user-facing change?
   Connect does not support untyped Scala UDFs until we figure out a solution 
for the conf system.
   
   ### How was this patch tested?
   Mostly existing tests. I had to change a couple of tests on the connect side:
   - UDAF-related tests used a feature that was not supposed to work in the 
first place: you could use a typed aggregator on an untyped Dataset (Row). I 
have added a test for the scenario where you use an untyped aggregator on an 
untyped Dataset.
   - In some cases the shared Column API generates slightly different 
structures (more/fewer arguments, different names, or different structure). The 
verbatims used by `PlanGenerationTestSuite` and `ProtoToParsedPlanTestSuite` 
were updated to reflect this. The protos that had non-trivial changes were 
renamed with the `_orphaned` suffix to make sure we keep testing plans 
produced by older clients.
   
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


