dtenedor opened a new pull request, #46918:
URL: https://github.com/apache/spark/pull/46918

   ### What changes were proposed in this pull request?
   
   This PR fixes a bug that caused an internal error for some combinations of the "select" and "partitionBy" options returned by a Python UDTF's "analyze" method.
   
   To reproduce:
   
   ```
   from pyspark.sql.functions import (
       AnalyzeArgument,
       AnalyzeResult,
       PartitioningColumn,
       SelectedColumn,
       udtf
   )
   
   from pyspark.sql.types import (
       DoubleType,
       StringType,
       StructType,
   )
   
   @udtf
   class TestTvf:
       @staticmethod
       def analyze(observed: AnalyzeArgument) -> AnalyzeResult:
           out_schema = StructType()
           out_schema.add("partition_col", StringType())
           out_schema.add("double_col", DoubleType())
   
           return AnalyzeResult(
               schema=out_schema,
               partitionBy=[PartitioningColumn("partition_col")],
               select=[
                   SelectedColumn("partition_col"),
                   SelectedColumn("double_col"),
               ],
           )
   
       def eval(self, *args, **kwargs):
           pass
   
       def terminate(self):
           for _ in range(10):
               yield {
                   "partition_col": None,
                   "double_col": 1.0,
               }
   
   
   spark.udtf.register("serialize_test", TestTvf)
   
   # Failed with an internal error before this fix
   (
       spark
       .sql(
           """
           SELECT * FROM serialize_test(
               TABLE(
                   SELECT
                       5 AS unused_col,
                       'hi' AS partition_col,
                       1.0 AS double_col
                   
                   UNION ALL
   
                   SELECT
                       4 AS unused_col,
                       'hi' AS partition_col,
                       1.0 AS double_col
               )
           )
           """
       )
       .toPandas()
   )
   ```
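
For readers without a Spark session at hand, the following is a minimal plain-Python sketch of what the reproduction is expected to yield once the query succeeds. It assumes the documented UDTF behavior that each distinct `partitionBy` value is handled by its own instance, with `eval` called once per row and `terminate` called once at the end of the partition. `TvfLogic` and `simulate` are hypothetical names introduced here for illustration; they are not part of PySpark or of this PR.

```python
from itertools import groupby
from typing import Any, Dict, Iterator, List


class TvfLogic:
    """Plain-Python copy of the eval/terminate logic from the reproduction."""

    def eval(self, *args, **kwargs) -> None:
        # Consumes each input row without emitting anything.
        pass

    def terminate(self) -> Iterator[Dict[str, Any]]:
        # Emits ten identical rows once the partition is exhausted.
        for _ in range(10):
            yield {"partition_col": None, "double_col": 1.0}


def simulate(rows: List[Dict[str, Any]], partition_key: str) -> List[Dict[str, Any]]:
    """Hypothetical driver (an assumption, not Spark code):
    one instance per partition, eval per row, then terminate."""
    out: List[Dict[str, Any]] = []
    ordered = sorted(rows, key=lambda r: r[partition_key])
    for _, group in groupby(ordered, key=lambda r: r[partition_key]):
        instance = TvfLogic()
        for row in group:
            instance.eval(row)
        out.extend(instance.terminate())
    return out


# The two input rows after the "select" projection drops unused_col.
input_rows = [
    {"partition_col": "hi", "double_col": 1.0},
    {"partition_col": "hi", "double_col": 1.0},
]
output_rows = simulate(input_rows, "partition_col")
```

Both input rows share the partition value `'hi'`, so the sketch forms a single partition and `output_rows` holds the 10 rows emitted by one `terminate` call.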
   
   ### Why are the changes needed?
   
   Before this change, the query above failed with an internal error; with the fix, it runs successfully.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   ### How was this patch tested?
   
   Additional golden file coverage
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Some light GitHub Copilot usage


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

