zhengruifeng opened a new pull request, #46948:
URL: https://github.com/apache/spark/pull/46948
### What changes were proposed in this pull request?
Fix the string representation (`__repr__`) of `LambdaFunction` in the Spark Connect Python client (`pyspark/sql/connect/expressions.py`).
### Why are the changes needed?
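I happened to hit this bug: `LambdaFunction.__repr__` passes its `UnresolvedNamedLambdaVariable` arguments directly to `str.join`, which raises a `TypeError`.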
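The failure mode can be reproduced without Spark. The sketch below uses hypothetical stand-in classes (`NamedLambdaVariable` and `LambdaFunctionSketch` are not the real pyspark classes) to show why `', '.join(...)` over non-string objects fails, and one plausible way to fix it by converting each argument with `str()` first. The actual patch may differ in detail.

```python
class NamedLambdaVariable:
    """Hypothetical stand-in for UnresolvedNamedLambdaVariable."""

    def __init__(self, name: str) -> None:
        self._name = name

    def __repr__(self) -> str:
        return self._name


class LambdaFunctionSketch:
    """Hypothetical stand-in for LambdaFunction; illustration only."""

    def __init__(self, function, arguments) -> None:
        self._function = function
        self._arguments = arguments

    def broken_repr(self) -> str:
        # str.join accepts only str items, so passing the variable
        # objects directly raises TypeError.
        return f"(LambdaFunction({str(self._function)}, {', '.join(self._arguments)})"

    def __repr__(self) -> str:
        # Assumed fix: convert every argument to str before joining.
        # The stray leading "(" is also dropped so parentheses balance.
        args = ", ".join(str(arg) for arg in self._arguments)
        return f"LambdaFunction({str(self._function)}, {args})"


fn = LambdaFunctionSketch(
    "length(x_0)",
    [NamedLambdaVariable("x_0"), NamedLambdaVariable("y_1")],
)
print(repr(fn))
```

Calling `fn.broken_repr()` raises the same kind of `TypeError` as in the traceback below, while `repr(fn)` succeeds.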
### Does this PR introduce _any_ user-facing change?
Yes.
Before this change, displaying the resulting column raised a `TypeError`:
```
In [2]: array_sort("data", lambda x, y: when(x.isNull() | y.isNull(), lit(0)).otherwise(length(y) - length(x)))
Out[2]:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/IPython/core/formatters.py:711, in PlainTextFormatter.__call__(self, obj)
704 stream = StringIO()
705 printer = pretty.RepresentationPrinter(stream, self.verbose,
706 self.max_width, self.newline,
707 max_seq_length=self.max_seq_length,
708 singleton_pprinters=self.singleton_printers,
709 type_pprinters=self.type_printers,
710 deferred_pprinters=self.deferred_printers)
--> 711 printer.pretty(obj)
712 printer.flush()
713 return stream.getvalue()
File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/IPython/lib/pretty.py:411, in RepresentationPrinter.pretty(self, obj)
408 return meth(obj, self, cycle)
409 if cls is not object \
410 and callable(cls.__dict__.get('__repr__')):
--> 411 return _repr_pprint(obj, self, cycle)
413 return _default_pprint(obj, self, cycle)
414 finally:
File ~/.dev/miniconda3/envs/spark_dev_312/lib/python3.12/site-packages/IPython/lib/pretty.py:779, in _repr_pprint(obj, p, cycle)
777 """A pprint that just redirects to the normal repr function."""
778 # Find newlines and replace them with p.break_()
--> 779 output = repr(obj)
780 lines = output.splitlines()
781 with p.group():
File ~/Dev/spark/python/pyspark/sql/connect/column.py:441, in Column.__repr__(self)
440 def __repr__(self) -> str:
--> 441 return "Column<'%s'>" % self._expr.__repr__()
File ~/Dev/spark/python/pyspark/sql/connect/expressions.py:626, in UnresolvedFunction.__repr__(self)
624 return f"{self._name}(distinct {', '.join([str(arg) for arg in self._args])})"
625 else:
--> 626 return f"{self._name}({', '.join([str(arg) for arg in self._args])})"
File ~/Dev/spark/python/pyspark/sql/connect/expressions.py:962, in LambdaFunction.__repr__(self)
961 def __repr__(self) -> str:
--> 962 return f"(LambdaFunction({str(self._function)}, {', '.join(self._arguments)})"
TypeError: sequence item 0: expected str instance, UnresolvedNamedLambdaVariable found
```
After this change, the column's repr is displayed as expected:
```
In [2]: array_sort("data", lambda x, y: when(x.isNull() | y.isNull(), lit(0)).otherwise(length(y) - length(x)))
Out[2]: Column<'array_sort(data, LambdaFunction(CASE WHEN or(isNull(x_0), isNull(y_1)) THEN 0 ELSE -(length(y_1), length(x_0)) END, x_0, y_1))'>
```
### How was this patch tested?
CI, plus a newly added test.
### Was this patch authored or co-authored using generative AI tooling?
No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.