AgenticSpark opened a new pull request, #56726:
URL: https://github.com/apache/spark/pull/56726
### What changes were proposed in this pull request?
This adds a `_name` property to the PySpark `Column` class that returns the
column's name, alias, or expression as a string, i.e. the same string shown
inside `Column.__repr__` (`Column<'...'>`). It is implemented for both Spark
Classic (`self._jc.toString()`) and Spark Connect (`self._expr.__repr__()`).
```python
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(2, "Alice")], ["age", "name"])
>>> df.age._name
'age'
>>> sf.col("value")._name
'value'
>>> sf.col("a").cast("int")._name
'CAST(a AS INT)'
```
The leading underscore intentionally avoids a collision with the existing
`Column.name` method, which is an alias for `Column.alias`.
### Why are the changes needed?
Requested in [SPARK-38483](https://issues.apache.org/jira/browse/SPARK-38483).
Having the name available as an attribute enables convenient patterns, e.g.
re-aliasing an expression with the source column's name, or branching on a
column's name inside a helper function:
```python
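from pyspark.sql import functions as sf

# Re-alias a derived expression with the source column's name.
values = sf.col("values")
distinct_values = sf.array_distinct(values).alias(values._name)

# Branch on a column's name inside a helper function.
def custom_function(col):
    return col.cast("int") if col._name == "my_column" else col.cast("string")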
```
Previously the name was only obtainable by parsing `repr(col)`.
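For comparison, the `repr`-parsing workaround might look roughly like the sketch below. It assumes only the `Column<'...'>` format described above; `name_from_repr` is a hypothetical helper written for illustration and is not part of PySpark or of this PR.

```python
import re


def name_from_repr(column_repr: str) -> str:
    """Extract the text between Column<' and '> from a Column repr string.

    Hypothetical helper: it assumes the repr has the exact shape
    Column<'...'> and raises ValueError for anything else.
    """
    match = re.fullmatch(r"Column<'(.*)'>", column_repr, re.DOTALL)
    if match is None:
        raise ValueError("unexpected Column repr: %r" % column_repr)
    return match.group(1)
```

With a real column this would be called as `name_from_repr(repr(col))`, which is the indirection that `col._name` is meant to replace.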
### Does this PR introduce _any_ user-facing change?
Yes -- a new `Column._name` property is available. There is no change to any
existing behavior.
### How was this patch tested?
Added `test_name_property` to `ColumnTestsMixin`, so it runs under both the
classic (`pyspark.sql.tests.test_column`) and Spark Connect parity
(`pyspark.sql.tests.connect.test_parity_column`) suites. It checks concrete
values and the invariant `repr(col) == "Column<'%s'>" % col._name`. Doctests
were also added on the new property.
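The invariant can be illustrated without a Spark session using a stand-in class. This is only a sketch of the relationship being tested: `StandInColumn` and `check_repr_invariant` are hypothetical names, and the class does not reproduce the real `Column` implementation or the actual test code.

```python
class StandInColumn:
    """Hypothetical stand-in for pyspark.sql.Column; not the real class."""

    def __init__(self, expression: str) -> None:
        self._expression = expression

    @property
    def _name(self) -> str:
        # Assumption: the property returns the same string that repr embeds.
        return self._expression

    def __repr__(self) -> str:
        return "Column<'%s'>" % self._name


def check_repr_invariant(col) -> bool:
    # The invariant quoted above: repr(col) == "Column<'%s'>" % col._name
    return repr(col) == "Column<'%s'>" % col._name
```

Any object exposing `_name` and `__repr__` can be passed to `check_repr_invariant`, so the same check would apply to a real `Column` in either the classic or the Connect suite.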
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: GitHub Copilot CLI (Claude Opus 4.8)