[GitHub] [spark] amaliujia commented on a diff in pull request #38723: [SPARK-41201][CONNECT][PYTHON] Implement `DataFrame.SelectExpr` in Python client

GitBox Tue, 22 Nov 2022 17:19:09 -0800


amaliujia commented on code in PR #38723:
URL: https://github.com/apache/spark/pull/38723#discussion_r1029951172



##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -302,6 +301,31 @@ def test_to_pandas(self):
             self.spark.sql(query).toPandas(),
         )
 
+    def test_select_expr(self):
+        # SPARK-41201: test selectExpr API.
+        self.assert_eq(
+            self.connect.read.table(self.tbl_name).selectExpr("id * 2").toPandas(),
+            self.spark.read.table(self.tbl_name).selectExpr("id * 2").toPandas(),
+        )
+        self.assert_eq(
+            self.connect.read.table(self.tbl_name)
+            .selectExpr(["id * 2", "cast(name as long) as name"])
+            .toPandas(),
+            self.spark.read.table(self.tbl_name)
+            .selectExpr(["id * 2", "cast(name as long) as name"])
+            .toPandas(),
+        )
+
+        self.assert_eq(
+            self.connect.read.table(self.tbl_name)
+            .selectExpr("id * 2", "cast(name as long) as name")
+            .toPandas(),
+            self.spark.read.table(self.tbl_name)
+            .selectExpr("id * 2", "cast(name as long) as name")
+            .toPandas(),
+        )
+
+    @unittest.skip("test_fill_na is flaky")

Review Comment:
   I am pretty sure I removed this after conflict resolution.
   
   Actually, Martin pointed out another case: 
https://github.com/apache/spark/pull/38723#discussion_r1028942189
   
   Basically, it seems to have happened more than once that, after resolving 
code conflicts, the code I wanted to keep was gone.
   
   Maybe I should always squash my commits with an interactive rebase (`-i`) 
first, in case rebasing more than one commit causes unexpected results.



##########
python/pyspark/sql/tests/connect/test_connect_basic.py:
##########
@@ -302,6 +301,31 @@ def test_to_pandas(self):
             self.spark.sql(query).toPandas(),
         )
 
+    def test_select_expr(self):
+        # SPARK-41201: test selectExpr API.
+        self.assert_eq(
+            self.connect.read.table(self.tbl_name).selectExpr("id * 2").toPandas(),
+            self.spark.read.table(self.tbl_name).selectExpr("id * 2").toPandas(),
+        )
+        self.assert_eq(
+            self.connect.read.table(self.tbl_name)
+            .selectExpr(["id * 2", "cast(name as long) as name"])
+            .toPandas(),
+            self.spark.read.table(self.tbl_name)
+            .selectExpr(["id * 2", "cast(name as long) as name"])
+            .toPandas(),
+        )
+
+        self.assert_eq(
+            self.connect.read.table(self.tbl_name)
+            .selectExpr("id * 2", "cast(name as long) as name")
+            .toPandas(),
+            self.spark.read.table(self.tbl_name)
+            .selectExpr("id * 2", "cast(name as long) as name")
+            .toPandas(),
+        )
+
+    @unittest.skip("test_fill_na is flaky")

Review Comment:
   I will follow up on this soon.
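
   For context, `test_select_expr` in the hunk above exercises three call 
forms: a single string, a single list of strings, and several strings passed 
as varargs. A minimal sketch of how such arguments could be normalized is 
shown below. `normalize_exprs` is a hypothetical helper written for 
illustration only; it is not the code in this PR and not part of the PySpark 
API.

```python
from typing import List, Union


def normalize_exprs(*expr: Union[str, List[str]]) -> List[str]:
    """Hypothetical helper: flatten selectExpr-style arguments into a list of SQL strings."""
    # A single list argument, e.g. f(["a", "b"]), is unwrapped so that it
    # behaves the same as f("a", "b").
    if len(expr) == 1 and isinstance(expr[0], list):
        items = list(expr[0])
    else:
        items = list(expr)

    # Reject anything that is not a SQL expression string.
    for item in items:
        if not isinstance(item, str):
            raise TypeError(f"expected str, got {type(item).__name__}")
    return items
```

   With this sketch, `normalize_exprs(["id * 2", "cast(name as long) as name"])` 
and `normalize_exprs("id * 2", "cast(name as long) as name")` return the same 
list, which mirrors why the second and third assertions in the test are 
expected to agree.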



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

