HyukjinKwon commented on a change in pull request #35454:
URL: https://github.com/apache/spark/pull/35454#discussion_r805332697
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -1749,7 +1749,11 @@ def describe(self, *cols: Union[str, List[str]]) -> "DataFrame":
| min| 2|
| max| 5|
+-------+------------------+
- >>> df.describe().show()
Review comment:
Can we improve this example by showing other supported types only?
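For example, something along these lines, so both string and numeric
columns are covered (a sketch; the exact `show()` output is illustrative):

    >>> df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ["name", "age"])
    >>> df.describe(["name", "age"]).show()
    +-------+-----+------------------+
    |summary| name|               age|
    +-------+-----+------------------+
    |  count|    2|                 2|
    |   mean| null|               3.5|
    | stddev| null|2.1213203435596424|
    |    min|Alice|                 2|
    |    max|  Bob|                 5|
    +-------+-----+------------------+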
##########
File path: python/pyspark/sql/dataframe.py
##########
@@ -1791,7 +1795,12 @@ def summary(self, *statistics: str) -> "DataFrame":
Examples
--------
- >>> df.summary().show()
Review comment:
ditto
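Same idea for `summary()`, e.g. restricting the statistics so the output
stays small (a sketch; output is illustrative):

    >>> df = spark.createDataFrame([("Alice", 2), ("Bob", 5)], ["name", "age"])
    >>> df.summary("count", "min", "25%", "75%", "max").show()
    +-------+-----+---+
    |summary| name|age|
    +-------+-----+---+
    |  count|    2|  2|
    |    min|Alice|  2|
    |    25%| null|  2|
    |    75%| null|  5|
    |    max|  Bob|  5|
    +-------+-----+---+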
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2040,8 +2040,16 @@ def hour(col: "ColumnOrName") -> Column:
Examples
--------
>>> df = spark.createDataFrame([('2015-04-08 13:08:15',)], ['ts'])
Review comment:
Can we fix this example to use `datetime.datetime` instead?
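i.e. something like this (a sketch; the same change applies to the
`minute` and `second` examples below):

    >>> import datetime
    >>> df = spark.createDataFrame([(datetime.datetime(2015, 4, 8, 13, 8, 15),)], ['ts'])
    >>> df.select(hour('ts').alias('hour')).collect()
    [Row(hour=13)]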
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2055,8 +2063,16 @@ def minute(col: "ColumnOrName") -> Column:
Examples
--------
>>> df = spark.createDataFrame([('2015-04-08 13:08:15',)], ['ts'])
Review comment:
ditto
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2070,8 +2086,16 @@ def second(col: "ColumnOrName") -> Column:
Examples
--------
>>> df = spark.createDataFrame([('2015-04-08 13:08:15',)], ['ts'])
Review comment:
ditto
##########
File path: python/pyspark/sql/functions.py
##########
@@ -2573,10 +2597,20 @@ def window(
Examples
--------
>>> df = spark.createDataFrame([("2016-03-11 09:00:07", 1)]).toDF("date",
"val")
Review comment:
ditto
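For `window`, the same change would look roughly like (a sketch):

    >>> import datetime
    >>> df = spark.createDataFrame(
    ...     [(datetime.datetime(2016, 3, 11, 9, 0, 7), 1)]
    ... ).toDF("date", "val")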
##########
File path: python/pyspark/sql/tests/test_types.py
##########
@@ -118,8 +118,9 @@ def test_infer_schema(self):
with self.tempView("test"):
df.createOrReplaceTempView("test")
- result = self.spark.sql("SELECT l[0].a from test where d['key'].d = '2'")
- self.assertEqual(1, result.head()[0])
+ with self.sql_conf({"spark.sql.ansi.enabled": False}):
Review comment:
Can you explain why we set this to `False`? And can we create another
DataFrame that fails only when `spark.sql.ansi.enabled` is `False`, instead
of disabling it for the whole test?
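Roughly something like this (a hypothetical sketch; `df2`/`test2` are
made-up names, and it assumes only the rows with empty `l`/`d` are
ANSI-sensitive):

    # No empty arrays/maps here, so the lookup below behaves the same
    # regardless of spark.sql.ansi.enabled.
    df2 = self.spark.createDataFrame([Row(l=[Row(a=1, b="b")], d={"key": Row(c=1.0, d="2")})])
    df2.createOrReplaceTempView("test2")
    result = self.spark.sql("SELECT l[0].a FROM test2 WHERE d['key'].d = '2'")
    self.assertEqual(1, result.head()[0])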
##########
File path: python/pyspark/sql/functions.py
##########
@@ -3661,12 +3695,21 @@ def element_at(col: "ColumnOrName", extraction: Any) -> Column:
Examples
--------
- >>> df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data'])
+ >>> df = spark.createDataFrame([(["a", "b", "c"],)], ['data'])
>>> df.select(element_at(df.data, 1)).collect()
+ [Row(element_at(data, 1)='a')]
+
+ >>> df = spark.createDataFrame([({"a": 1.0, "b": 2.0},)], ['data'])
+ >>> df.select(element_at(df.data, lit("a"))).collect()
+ [Row(element_at(data, a)=1.0)]
+
+ When "spark.sql.ansi.enabled" is True, it raises excepton if element_at
returns null.
+
+ >>> df = spark.createDataFrame([(["a", "b", "c"],), ([],)], ['data']) #
doctest: +SKIP
[Row(element_at(data, 1)='a'), Row(element_at(data, 1)=None)]
>>> df = spark.createDataFrame([({"a": 1.0, "b": 2.0},), ({},)], ['data'])
- >>> df.select(element_at(df.data, lit("a"))).collect()
+ >>> df.select(element_at(df.data, lit("a"))).collect() # doctest: +SKIP
Review comment:
After rethinking, I think we can just remove these all for now.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]