HyukjinKwon commented on a change in pull request #27406:
[SPARK-30681][PYSPARK][SQL] Add higher order functions API to PySpark
URL: https://github.com/apache/spark/pull/27406#discussion_r373896752
##########
File path: python/pyspark/sql/column.py
##########
@@ -129,6 +129,103 @@ def _(self, other):
return _
+def _unresolved_named_lambda_variable(*name_parts):
+ """
+ Create o.a.s.sql.expressions.UnresolvedNamedLambdaVariable and
+ convert it to o.s.sql.Column
+
+ :param name_parts: str
+ """
+ sc = SparkContext._active_spark_context
+ name_parts_seq = _to_seq(sc, name_parts)
+ expressions = sc._jvm.org.apache.spark.sql.catalyst.expressions
+ return Column(
+ sc._jvm.Column(
+ expressions.UnresolvedNamedLambdaVariable(name_parts_seq)
+ )
+ )
+
+
+def _get_lambda_parameters(f):
+ import inspect
+
+ signature = inspect.signature(f)
+ parameters = signature.parameters.values()
+
+ # We should exclude functions that use
+ # variable args and keyword argnames
+ # as well as keyword only args
+ supported_parmeter_types = {
+ inspect.Parameter.POSITIONAL_OR_KEYWORD,
+ inspect.Parameter.POSITIONAL_ONLY,
+ }
+
+ # Validate that
+ # function arity is between 1 and 3
+ if not (1 <= len(parameters) <= 3):
Review comment:
I am not sure if this is a good way to duplicately handle error cases. I
mean, for simple cechking, I am good with that but if it needs some
considerable codes like the current, I am not sure yet.
I fully understand the concern about JVM stracktrace being exposed to Python
users; however, I don't think we should add such checking to every API in
PySpark. It will end up with duplicating whole
[CheckAnalysis.scala](https://github.com/apache/spark/blob/290a528bff7bcb449714c1c6f1885bd0f804358d/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala)
and
[TypeCoercion.scala](https://github.com/apache/spark/blob/297f406425d410e5c450a9fbe24679b49f00a553/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala).
I admit that I don't have a good idea except just fixing
[utils.py#L76-L92](https://github.com/apache/spark/blob/0a95eb08003a115f59495b30aacaaa832940e977/python/pyspark/sql/utils.py#L76-L92)
somehow better.
Hope we can discuss separately in a separate PR.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]