HyukjinKwon commented on a change in pull request #35454:
URL: https://github.com/apache/spark/pull/35454#discussion_r805332697
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -1749,7 +1749,11 @@ def describe(self, *cols: Union[str, List[str]]) -> "DataFrame":
| min| 2|
| max| 5|
+-------+------------------+
- >>> df.describe().show()
Review comment:
Can we improve this example by showing other supported types only?
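For example, something along these lines, so both string and numeric
columns are covered (a sketch; the exact `show()` output is illustrative):

    >>> df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ["name", "age"])
    >>> df.describe(["name", "age"]).show()
    +-------+-----+------------------+
    |summary| name|               age|
    +-------+-----+------------------+
    |  count|    2|                 2|
    |   mean| null|               3.5|
    | stddev| null|2.1213203435596424|
    |    min|Alice|                 2|
    |    max|  Bob|                 5|
    +-------+-----+------------------+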
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -1791,7 +1795,12 @@ def summary(self, *statistics: str) -> "DataFrame":
Examples
--------
- >>> df.summary().show()
Review comment:
ditto
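Same idea for `summary()`, e.g. restricting the statistics so the output
stays small (a sketch; output is illustrative):

    >>> df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ["name", "age"])
    >>> df.summary("count", "min", "25%", "75%", "max").show()
    +-------+-----+---+
    |summary| name|age|
    +-------+-----+---+
    |  count|    2|  2|
    |    min|Alice|  2|
    |    25%| null|  2|
    |    75%| null|  5|
    |    max|  Bob|  5|
    +-------+-----+---+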
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2040,8 +2040,16 @@ def hour(col: "ColumnOrName") -> Column:
Examples
--------
>>> df = spark.createDataFrame([('2015-04-08 13:08:15',)], ['ts'])
Review comment:
Can we fix this example to use `datetime.datetime` instead?
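i.e. something like this (a sketch; the same change applies to the
`minute` and `second` examples below):

    >>> import datetime
    >>> df = spark.createDataFrame([(datetime.datetime(2015, 4, 8, 13, 8, 15),)], ['ts'])
    >>> df.select(hour('ts').alias('hour')).collect()
    [Row(hour=13)]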
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2055,8 +2063,16 @@ def minute(col: "ColumnOrName") -> Column:
Examples
--------
>>> df = spark.createDataFrame([('2015-04-08 13:08:15',)], ['ts'])
Review comment:
ditto
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2070,8 +2086,16 @@ def second(col: "ColumnOrName") -> Column:
Examples
--------
>>> df = spark.createDataFrame([('2015-04-08 13:08:15',)], ['ts'])
Review comment:
ditto
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2573,10 +2597,20 @@ def window(
Examples
--------
>>> df = spark.createDataFrame([("2016-03-11 09:00:07", 1)]).toDF("date",
"val")
Review comment:
ditto
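For `window`, the same change would look roughly like (a sketch):

    >>> import datetime
    >>> df = spark.createDataFrame(
    ...     [(datetime.datetime(2016, 3, 11, 9, 0, 7), 1)]
    ... ).toDF("date", "val")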
##########
File path: python/pyspark/sql/tests/test_types.py
##########
@@ -118,8 +118,9 @@ def test_infer_schema(self):
with self.tempView("test"):
df.createOrReplaceTempView("test")
- result = self.spark.sql("SELECT l[0].a from test where d['key'].d = '2'")
- self.assertEqual(1, result.head()[0])
+ with self.sql_conf({"spark.sql.ansi.enabled": False}):
Review comment:
Can you explain why we set this to `False`? And can we create another
DataFrame that fails only when `spark.sql.ansi.enabled` is `False`, instead
of disabling it for the whole test?
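Roughly something like this (a hypothetical sketch; `df2`/`test2` are
made-up names, and it assumes only the rows with empty `l`/`d` are
ANSI-sensitive):

    # No empty arrays/maps here, so the lookup below behaves the same
    # regardless of spark.sql.ansi.enabled.
    df2 = self.spark.createDataFrame([Row(l=[Row(a=1, b="b")], d={"key": Row(c=1.0, d="2")})])
    df2.createOrReplaceTempView("test2")
    result = self.spark.sql("SELECT l[0].a FROM test2 WHERE d['key'].d = '2'")
    self.assertEqual(1, result.head()[0])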
##########
File path: python/pyspark/sql/functions.py
##########
@@ -3661,12 +3695,21 @@ def element_at(col: "ColumnOrName", extraction: Any) -> Column:
Examples
--------
- >>> df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data'])
+ >>> df = spark.createDataFrame([(["a", "b", "c"],)], ['data'])
>>> df.select(element_at(df.data, 1)).collect()
+ [Row(element_at(data, 1)='a')]
+
+ >>> df = spark.createDataFrame([({"a": 1.0, "b": 2.0},)], ['data'])
+ >>> df.select(element_at(df.data, lit("a"))).collect()
+ [Row(element_at(data, a)=1.0)]
+
+ When "spark.sql.ansi.enabled" is True, it raises excepton if element_at
returns null.
+
+ >>> df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data']) #
doctest: +SKIP
[Row(element_at(data, 1)='a'), Row(element_at(data, 1)=None)]
>>> df = spark.createDataFrame([({"a": 1.0, "b": 2.0},), ({},)], ['data'])
- >>> df.select(element_at(df.data, lit("a"))).collect()
+ >>> df.select(element_at(df.data, lit("a"))).collect() # doctest: +SKIP
Review comment:
After rethinking, I think we can just remove these all for now.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]