zhengruifeng opened a new pull request, #46948:
URL: https://github.com/apache/spark/pull/46948

   ### What changes were proposed in this pull request?
   Fix the string representation (`__repr__`) of `LambdaFunction` in the Spark Connect Python client, which raised a `TypeError`.
   
   ### Why are the changes needed?
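   I happened to hit this bug: displaying a `Column` built from a lambda function raises a `TypeError`, because `LambdaFunction.__repr__` passes non-string arguments to `str.join`.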
   
   
   ### Does this PR introduce _any_ user-facing change?
   Yes. The `repr` of a column that contains a lambda function no longer raises a `TypeError`.
   
   before
   ```
   In [2]: array_sort("data", lambda x, y: when(x.isNull() | y.isNull(), lit(0)).otherwise(length(y) - length(x)))
   Out[2]: 
   ---------------------------------------------------------------------------
   TypeError                                 Traceback (most recent call last)
   File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/IPython/core/formatters.py:711, in PlainTextFormatter.__call__(self, obj)
       704 stream = StringIO()
       705 printer = pretty.RepresentationPrinter(stream, self.verbose,
       706     self.max_width, self.newline,
       707     max_seq_length=self.max_seq_length,
       708     singleton_pprinters=self.singleton_printers,
       709     type_pprinters=self.type_printers,
       710     deferred_pprinters=self.deferred_printers)
   --> 711 printer.pretty(obj)
       712 printer.flush()
       713 return stream.getvalue()
   
   File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/IPython/lib/pretty.py:411, in RepresentationPrinter.pretty(self, obj)
       408                         return meth(obj, self, cycle)
       409                 if cls is not object \
       410                         and callable(cls.__dict__.get('__repr__')):
   --> 411                     return _repr_pprint(obj, self, cycle)
       413     return _default_pprint(obj, self, cycle)
       414 finally:
   
   File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/IPython/lib/pretty.py:779, in _repr_pprint(obj, p, cycle)
       777 """A pprint that just redirects to the normal repr function."""
       778 # Find newlines and replace them with p.break_()
   --> 779 output = repr(obj)
       780 lines = output.splitlines()
       781 with p.group():
   
   File ~/Dev/spark/python/pyspark/sql/connect/column.py:441, in Column.__repr__(self)
       440 def __repr__(self) -> str:
   --> 441     return "Column<'%s'>" % self._expr.__repr__()
   
   File ~/Dev/spark/python/pyspark/sql/connect/expressions.py:626, in UnresolvedFunction.__repr__(self)
       624     return f"{self._name}(distinct {', '.join([str(arg) for arg in self._args])})"
       625 else:
   --> 626     return f"{self._name}({', '.join([str(arg) for arg in self._args])})"
   
   File ~/Dev/spark/python/pyspark/sql/connect/expressions.py:962, in LambdaFunction.__repr__(self)
       961 def __repr__(self) -> str:
   --> 962     return f"(LambdaFunction({str(self._function)}, {', '.join(self._arguments)})"
   
   TypeError: sequence item 0: expected str instance, UnresolvedNamedLambdaVariable found
   ```
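
The traceback ends in `', '.join(self._arguments)`, where the arguments are expression objects rather than strings. The sketch below reproduces that failure and shows one plausible fix, converting each argument with `str()` before joining. It is not the actual patch: `NamedLambdaVariable` and this `LambdaFunction` are hypothetical stand-ins for the PySpark Connect classes, and `broken_repr` is a made-up name for the old behavior.

```python
class NamedLambdaVariable:
    """Hypothetical stand-in for UnresolvedNamedLambdaVariable."""

    def __init__(self, name: str) -> None:
        self._name = name

    def __repr__(self) -> str:
        return self._name


class LambdaFunction:
    """Hypothetical stand-in for the Connect LambdaFunction expression."""

    def __init__(self, function, arguments) -> None:
        self._function = function
        self._arguments = list(arguments)

    def broken_repr(self) -> str:
        # Mirrors the failing line from the traceback:
        # str.join() only accepts str items, so this raises TypeError.
        return f"(LambdaFunction({str(self._function)}, {', '.join(self._arguments)})"

    def __repr__(self) -> str:
        # Assumed fix: stringify every argument before joining.
        args = ", ".join(str(arg) for arg in self._arguments)
        return f"LambdaFunction({str(self._function)}, {args})"


x, y = NamedLambdaVariable("x_0"), NamedLambdaVariable("y_1")
fn = LambdaFunction("length(x_0)", [x, y])

error_message = ""
try:
    fn.broken_repr()
except TypeError as exc:
    error_message = str(exc)

fixed = repr(fn)
print(error_message)
print(fixed)
```

The stand-in also drops the unbalanced leading `(` from the old format string, which matches the shape of the output shown under "after".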
   
   after
   ```
   In [2]: array_sort("data", lambda x, y: when(x.isNull() | y.isNull(), lit(0)).otherwise(length(y) - length(x)))
   Out[2]: Column<'array_sort(data, LambdaFunction(CASE WHEN or(isNull(x_0), isNull(y_1)) THEN 0 ELSE -(length(y_1), length(x_0)) END, x_0, y_1))'>
   ```
   
   ### How was this patch tested?
   CI, plus a newly added test.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

