Re: [PR] [SPARK-47365][PYTHON] Add _toArrow() DataFrame method to PySpark [spark]

2024-05-07 Thread via GitHub


HyukjinKwon commented on code in PR #45481:
URL: https://github.com/apache/spark/pull/45481#discussion_r1592194102


##
python/pyspark/sql/connect/dataframe.py:
##
@@ -1775,6 +1775,10 @@ def _to_table(self) -> Tuple["pa.Table", 
Optional[StructType]]:
 assert table is not None
 return (table, schema)
 
+def _toArrow(self) -> "pa.Table":

Review Comment:
   Yeah, the current state looks a bit weird. We should either explicitly add 
an developer API, or just don't. My only question is that we can work around 
via `SparkSession.client.to_table()`, and we're going forward to Spark Connect 
default. Considering this is a developer API, I wonder if we should add this 
into DataFrame at this moment.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47365][PYTHON] Add _toArrow() DataFrame method to PySpark [spark]

2024-05-07 Thread via GitHub


grundprinzip commented on code in PR #45481:
URL: https://github.com/apache/spark/pull/45481#discussion_r1592074090


##
python/pyspark/sql/connect/dataframe.py:
##
@@ -1775,6 +1775,10 @@ def _to_table(self) -> Tuple["pa.Table", 
Optional[StructType]]:
 assert table is not None
 return (table, schema)
 
+def _toArrow(self) -> "pa.Table":

Review Comment:
   @HyukjinKwon I think it would be fine to make this a regular public method 
instead of the `_toArrow` maybe change it to `toArrowTable` to give a better 
indication of the return type.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47365][PYTHON] Add _toArrow() DataFrame method to PySpark [spark]

2024-05-07 Thread via GitHub


grundprinzip commented on code in PR #45481:
URL: https://github.com/apache/spark/pull/45481#discussion_r1592071117


##
python/pyspark/sql/connect/dataframe.py:
##
@@ -1775,6 +1775,10 @@ def _to_table(self) -> Tuple["pa.Table", 
Optional[StructType]]:
 assert table is not None
 return (table, schema)
 
+def _toArrow(self) -> "pa.Table":
+table = self._to_table()[0]

Review Comment:
   nit: Using this notion, makes the code a bit more readable and makes sure it 
fails if _to_table ever changes the tuple size, and thus behavior.
   
   ```suggestion
   table, _ = self._to_table()
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47365][PYTHON] Add _toArrow() DataFrame method to PySpark [spark]

2024-04-29 Thread via GitHub


ianmcook commented on code in PR #45481:
URL: https://github.com/apache/spark/pull/45481#discussion_r1583675813


##
python/pyspark/sql/connect/dataframe.py:
##
@@ -1775,6 +1775,12 @@ def _to_table(self) -> Tuple["pa.Table", 
Optional[StructType]]:
 assert table is not None
 return (table, schema)
 
+def _toArrow(self) -> "pa.Table":
+table = self._to_table()[0]
+return table
+
+_toArrow.__doc__ = PySparkDataFrame._toArrow.__doc__

Review Comment:
   I assume this line is no longer needed. This is what's causing the CI to 
fail. I will remove it.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47365][PYTHON] Add _toArrow() DataFrame method to PySpark [spark]

2024-04-23 Thread via GitHub


ianmcook commented on code in PR #45481:
URL: https://github.com/apache/spark/pull/45481#discussion_r1576718516


##
python/pyspark/sql/connect/dataframe.py:
##
@@ -1775,6 +1775,12 @@ def _to_table(self) -> Tuple["pa.Table", 
Optional[StructType]]:
 assert table is not None
 return (table, schema)
 
+def _toArrow(self) -> "pa.Table":
+table = self._to_table()[0]
+return table
+
+_toArrow.__doc__ = PySparkDataFrame._toArrow.__doc__

Review Comment:
   @HyukjinKwon is this line no longer needed after 
https://github.com/apache/spark/pull/46129?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-47365][PYTHON] Add _toArrow() DataFrame method to PySpark [spark]

2024-03-26 Thread via GitHub


ianmcook commented on PR #45481:
URL: https://github.com/apache/spark/pull/45481#issuecomment-2020635088

   cc @xinrong-meng @itholic @dongjoon-hyun


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org