[GitHub] [spark] zero323 commented on a change in pull request #34354: [WIP][SPARK-37085][PYTHON][SQL] Add list/tuple overloads to array, struct, create_map, map_concat
zero323 commented on a change in pull request #34354:
URL: https://github.com/apache/spark/pull/34354#discussion_r756336353

## File path: python/pyspark/sql/functions.py

```diff
@@ -3514,7 +3538,19 @@ def map_from_arrays(col1: "ColumnOrName", col2: "ColumnOrName") -> Column:
     return Column(sc._jvm.functions.map_from_arrays(_to_java_column(col1), _to_java_column(col2)))


+@overload
 def array(*cols: "ColumnOrName") -> Column:
+    ...
+
+
+@overload
+def array(__cols: Union[List["ColumnOrName"], Tuple["ColumnOrName", ...]]) -> Column:
+    ...
+
+
+def array(
+    *cols: Union["ColumnOrName", Union[List["ColumnOrName"], Tuple["ColumnOrName", ...]]]
```

Review comment:

In general, nested `Union`s are flattened automatically:

```python
>>> Union["ColumnOrName", Union[List["ColumnOrName"], Tuple["ColumnOrName", ...]]]
typing.Union[ForwardRef('ColumnOrName'), typing.List[ForwardRef('ColumnOrName')], typing.Tuple[ForwardRef('ColumnOrName'), ...]]
```

Personally, I find a structure that clearly groups similar categories useful (for a similar problem, see https://github.com/apache/spark/pull/34671#discussion_r754532969), but I am not very attached to this idea.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
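The flattening can be checked with a small self-contained snippet; here `str` stands in for the `ColumnOrName` alias purely so the example runs on its own (that substitution is an assumption, not Spark code):

```python
from typing import List, Tuple, Union, get_args

# Stand-in for pyspark's ColumnOrName alias, only to keep this runnable.
ColumnOrName = str

# A Union nested inside another Union is flattened automatically,
# so both spellings denote the same type:
nested = Union[ColumnOrName, Union[List[ColumnOrName], Tuple[ColumnOrName, ...]]]
flat = Union[ColumnOrName, List[ColumnOrName], Tuple[ColumnOrName, ...]]

print(nested == flat)    # True
print(get_args(nested))  # three flat members, no inner Union left
```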
[GitHub] [spark] zero323 commented on a change in pull request #34354: [WIP][SPARK-37085][PYTHON][SQL] Add list/tuple overloads to array, struct, create_map, map_concat
zero323 commented on a change in pull request #34354:
URL: https://github.com/apache/spark/pull/34354#discussion_r734849802

## File path: python/pyspark/sql/functions.py

```diff
@@ -1652,7 +1652,19 @@ def expr(str: str) -> Column:
     return Column(sc._jvm.functions.expr(str))


+@overload
 def struct(*cols: "ColumnOrName") -> Column:
+    ...
+
+
+@overload
+def struct(__cols: Union[List["ColumnOrName"], Tuple["ColumnOrName", ...]]) -> Column:
```

Review comment:

> I know it backfires in some contexts, but maybe not here.

But we'd need explicit checks for strings, like

```python
if len(cols) == 1 and isinstance(cols[0], Sequence) and not isinstance(cols[0], str):
    cols = cols[0]
...
```
[GitHub] [spark] zero323 commented on a change in pull request #34354: [WIP][SPARK-37085][PYTHON][SQL] Add list/tuple overloads to array, struct, create_map, map_concat
zero323 commented on a change in pull request #34354:
URL: https://github.com/apache/spark/pull/34354#discussion_r734825579

## File path: python/pyspark/sql/functions.py

```diff
@@ -1652,7 +1652,19 @@ def expr(str: str) -> Column:
     return Column(sc._jvm.functions.expr(str))


+@overload
 def struct(*cols: "ColumnOrName") -> Column:
+    ...
+
+
+@overload
+def struct(__cols: Union[List["ColumnOrName"], Tuple["ColumnOrName", ...]]) -> Column:
```

Review comment:

In general, I think we have a bigger problem with current aliases outliving their usefulness, but that's a topic for a longer discussion and maybe a formal design document. Sigh.
[GitHub] [spark] zero323 commented on a change in pull request #34354: [WIP][SPARK-37085][PYTHON][SQL] Add list/tuple overloads to array, struct, create_map, map_concat
zero323 commented on a change in pull request #34354:
URL: https://github.com/apache/spark/pull/34354#discussion_r734824096

## File path: python/pyspark/sql/functions.py

```diff
@@ -1652,7 +1652,19 @@ def expr(str: str) -> Column:
     return Column(sc._jvm.functions.expr(str))


+@overload
 def struct(*cols: "ColumnOrName") -> Column:
+    ...
+
+
+@overload
+def struct(__cols: Union[List["ColumnOrName"], Tuple["ColumnOrName", ...]]) -> Column:
```

Review comment:

In general, I think we have a bigger problem with current aliases outliving their usefulness, but that's a topic for a longer discussion and maybe a formal design document. Sigh.
[GitHub] [spark] zero323 commented on a change in pull request #34354: [WIP][SPARK-37085][PYTHON][SQL] Add list/tuple overloads to array, struct, create_map, map_concat
zero323 commented on a change in pull request #34354:
URL: https://github.com/apache/spark/pull/34354#discussion_r734823494

## File path: python/pyspark/sql/functions.py

```diff
@@ -1652,7 +1652,19 @@ def expr(str: str) -> Column:
     return Column(sc._jvm.functions.expr(str))


+@overload
 def struct(*cols: "ColumnOrName") -> Column:
+    ...
+
+
+@overload
+def struct(__cols: Union[List["ColumnOrName"], Tuple["ColumnOrName", ...]]) -> Column:
```

Review comment:

> How about using a more general type, like `Sequence` or `Iterable`?

Yeah, this is something that has been bothering me for a couple of days now. A more general type would be great (assuming we'd also modify the code, which wouldn't be a bad idea anyway), if it weren't for the fact that `str` is recursively `Sequence[str]` / `Iterable[str]`:

```python
from typing import Sequence, Iterable

x: Sequence[str] = "abc"
y: Iterable[str] = "abc"
```

I know it backfires in some contexts, but maybe not here.
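The runtime picture matches the static one, which is why a bare `Sequence`/`Iterable` annotation cannot by itself exclude a single column name:

```python
from collections.abc import Iterable, Sequence

# A plain string is itself a Sequence/Iterable (of its characters), so
# an argument annotated Sequence[...] would silently accept a bare
# column name and then iterate over it one character at a time:
print(isinstance("abc", Sequence))  # True
print(isinstance("abc", Iterable))  # True
print(list("abc"))                  # ['a', 'b', 'c']
```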
[GitHub] [spark] zero323 commented on a change in pull request #34354: [WIP][SPARK-37085][PYTHON][SQL] Add list/tuple overloads to array, struct, create_map, map_concat
zero323 commented on a change in pull request #34354:
URL: https://github.com/apache/spark/pull/34354#discussion_r734483147

## File path: python/pyspark/sql/functions.py

```diff
@@ -3455,7 +3469,19 @@ def translate(srcCol: "ColumnOrName", matching: str, replace: str) -> Column:
 # -- Collection functions --


+@overload
 def create_map(*cols: "ColumnOrName") -> Column:
+    ...
+
+
+@overload
+def create_map(__cols: Union[List["ColumnOrName"], Tuple["ColumnOrName", ...]]) -> Column:
```

Review comment:

This indicates that the argument should be treated as positional-only.
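For context: a parameter name beginning with two underscores is the PEP 484 convention that type checkers read as positional-only, and on Python 3.8+ the same intent can be spelled with PEP 570 syntax. A minimal sketch, with an illustrative function name (`take_only_positionally` is not Spark API):

```python
from typing import List

def take_only_positionally(cols: List[str], /) -> List[str]:
    # Everything before `/` is positional-only — the same constraint a
    # double-underscore parameter name signals to type checkers.
    return list(cols)

print(take_only_positionally(["k", "v"]))  # ['k', 'v']
try:
    take_only_positionally(cols=["k", "v"])  # keyword form is rejected
except TypeError as exc:
    print("rejected:", exc)
```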