zhengruifeng commented on PR #38841:
URL: https://github.com/apache/spark/pull/38841#issuecomment-1331687981
> > 1, pyspark and scala api only accept string `*str` / `string*`
>
> @zhengruifeng can you elaborate? I tested the same code in PySpark and it
works as well.
>
> > 2, pyspark and scala api will check the schema, if the datatype is
unexpected, it fails;
>
> What do you mean?
>
> > 3, if the no input column, it will check the schema and select all the
numeric columns.
>
> This is more missing functionality than this particular bug correct?
1, PySpark and Scala don't take an expression or `Column` as input:
```
In [11]: df = spark.createDataFrame([(10, 80, "Alice"), (5, None, "Bob"),
(None, 10, "Tom"), (None, None, None)], schema=["age", "height", "name"])
In [12]: df.show()
+----+------+-----+
| age|height| name|
+----+------+-----+
| 10| 80|Alice|
| 5| null| Bob|
|null| 10| Tom|
|null| null| null|
+----+------+-----+
In [13]: df.groupBy("age").min(df.height)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[13], line 1
----> 1 df.groupBy("age").min(df.height)
File ~/Dev/spark/python/pyspark/sql/group.py:49, in
df_varargs_api.<locals>._api(self, *cols)
47 def _api(self: "GroupedData", *cols: str) -> DataFrame:
48 name = f.__name__
---> 49 jdf = getattr(self._jgd, name)(_to_seq(self.session._sc, cols))
50 return DataFrame(jdf, self.session)
...
TypeError: Column is not iterable
```
2, if the input columns contain unsupported data types, it fails like
```
In [14]: df.groupBy("age").min("name")
---------------------------------------------------------------------------
AnalysisException Traceback (most recent call last)
Cell In[14], line 1
----> 1 df.groupBy("age").min("name")
File ~/Dev/spark/python/pyspark/sql/group.py:49, in
df_varargs_api.<locals>._api(self, *cols)
47 def _api(self: "GroupedData", *cols: str) -> DataFrame:
48 name = f.__name__
---> 49 jdf = getattr(self._jgd, name)(_to_seq(self.session._sc, cols))
50 return DataFrame(jdf, self.session)
File ~/Dev/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1322,
in JavaMember.__call__(self, *args)
1316 command = proto.CALL_COMMAND_NAME +\
1317 self.command_header +\
1318 args_command +\
1319 proto.END_COMMAND_PART
1321 answer = self.gateway_client.send_command(command)
-> 1322 return_value = get_return_value(
1323 answer, self.gateway_client, self.target_id, self.name)
1325 for temp_arg in temp_args:
1326 if hasattr(temp_arg, "_detach"):
File ~/Dev/spark/python/pyspark/sql/utils.py:205, in
capture_sql_exception.<locals>.deco(*a, **kw)
201 converted = convert_exception(e.java_exception)
202 if not isinstance(converted, UnknownException):
203 # Hide where the exception came from that shows a non-Pythonic
204 # JVM exception message.
--> 205 raise converted from None
206 else:
207 raise
AnalysisException: "name" is not a numeric column. Aggregation function can
only be applied on a numeric column.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]