zero323 edited a comment on pull request #34354:
URL: https://github.com/apache/spark/pull/34354#issuecomment-949961121
> > I think we might have to redefine `ColumnOrName` to fully support these
>
> What's your idea like?
Long story short, I've been looking into different scenarios for using
aliases and types. Adding inline hints definitely introduced use cases that
we didn't have before, most notably `casts` (which further split into cases
where we have generics, types bound from the function signature, and none of
the above). And there are variance issues, which pop up here and there.

I suspect that some of the cases where invariant generics hit us might be
addressed with bounded type vars:
```python
from typing import List, TypeVar, Union

from pyspark.sql import Column
from pyspark.sql.functions import col

ColumnOrName = Union[str, Column]
# Bound TypeVar: solves to str or Column per call site, sidestepping
# the invariance of List[Union[str, Column]].
ColumnOrName_ = TypeVar("ColumnOrName_", bound=ColumnOrName)

def array(__cols: List[ColumnOrName_]) -> Column: ...

column_names = ["a", "b", "c"]
array(column_names)  # OK: ColumnOrName_ solves to str

columns = [col(x) for x in column_names]
array(columns)  # OK: ColumnOrName_ solves to Column
```
but these are not universal, and there might be some caveats that I don't
see at the moment.

I hope there will be an opportunity to discuss this stuff in a more
interactive manner.
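For context, the invariance problem the bound TypeVar works around can be sketched as follows (a stand-in `Column` class is used here so the snippet runs without Spark installed; the real class is `pyspark.sql.Column`):

```python
from typing import List, Union


class Column:
    """Stand-in for pyspark.sql.Column (assumption, for illustration only)."""

    def __init__(self, name: str) -> None:
        self.name = name


ColumnOrName = Union[str, Column]


def array(cols: List[ColumnOrName]) -> List[Column]:
    # Normalize every element to a Column.
    return [c if isinstance(c, Column) else Column(c) for c in cols]


column_names: List[str] = ["a", "b", "c"]

# mypy rejects this call: List[str] is not a subtype of
# List[Union[str, Column]] because List is invariant in its element
# type, even though every str is a valid ColumnOrName. At runtime it
# works fine, which is exactly the mismatch the bound TypeVar removes.
result = array(column_names)
print([c.name for c in result])  # ['a', 'b', 'c']
```

(Using `Sequence[ColumnOrName]`, which is covariant, is the other standard workaround, at the cost of rejecting in-place mutation inside the function.)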
(_Note_: `ColumnOrName` is still needed for `casts` and other annotations in
contexts where `ColumnOrName_` would be unbound, like functions without
`ColumnOrName_` in their arguments.)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]