Re: [PR] [SPARK-47365][PYTHON] Add _toArrow() DataFrame method to PySpark [spark]
HyukjinKwon commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1592194102 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1775,6 +1775,10 @@ def _to_table(self) -> Tuple["pa.Table", Optional[StructType]]: assert table is not None return (table, schema) +def _toArrow(self) -> "pa.Table": Review Comment: Yeah, the current state looks a bit weird. We should either explicitly add an developer API, or just don't. My only question is that we can work around via `SparkSession.client.to_table()`, and we're going forward to Spark Connect default. Considering this is a developer API, I wonder if we should add this into DataFrame at this moment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47365][PYTHON] Add _toArrow() DataFrame method to PySpark [spark]
grundprinzip commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1592074090 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1775,6 +1775,10 @@ def _to_table(self) -> Tuple["pa.Table", Optional[StructType]]: assert table is not None return (table, schema) +def _toArrow(self) -> "pa.Table": Review Comment: @HyukjinKwon I think it would be fine to make this a regular public method instead of the `_toArrow` maybe change it to `toArrowTable` to give a better indication of the return type. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47365][PYTHON] Add _toArrow() DataFrame method to PySpark [spark]
grundprinzip commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1592071117 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1775,6 +1775,10 @@ def _to_table(self) -> Tuple["pa.Table", Optional[StructType]]: assert table is not None return (table, schema) +def _toArrow(self) -> "pa.Table": +table = self._to_table()[0] Review Comment: nit: Using this notion, makes the code a bit more readable and makes sure it fails if _to_table ever changes the tuple size, and thus behavior. ```suggestion table, _ = self._to_table() ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47365][PYTHON] Add _toArrow() DataFrame method to PySpark [spark]
ianmcook commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1583675813 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1775,6 +1775,12 @@ def _to_table(self) -> Tuple["pa.Table", Optional[StructType]]: assert table is not None return (table, schema) +def _toArrow(self) -> "pa.Table": +table = self._to_table()[0] +return table + +_toArrow.__doc__ = PySparkDataFrame._toArrow.__doc__ Review Comment: I assume this line is no longer needed. This is what's causing the CI to fail. I will remove it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47365][PYTHON] Add _toArrow() DataFrame method to PySpark [spark]
ianmcook commented on code in PR #45481: URL: https://github.com/apache/spark/pull/45481#discussion_r1576718516 ## python/pyspark/sql/connect/dataframe.py: ## @@ -1775,6 +1775,12 @@ def _to_table(self) -> Tuple["pa.Table", Optional[StructType]]: assert table is not None return (table, schema) +def _toArrow(self) -> "pa.Table": +table = self._to_table()[0] +return table + +_toArrow.__doc__ = PySparkDataFrame._toArrow.__doc__ Review Comment: @HyukjinKwon is this line no longer needed after https://github.com/apache/spark/pull/46129? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Re: [PR] [SPARK-47365][PYTHON] Add _toArrow() DataFrame method to PySpark [spark]
ianmcook commented on PR #45481: URL: https://github.com/apache/spark/pull/45481#issuecomment-2020635088 cc @xinrong-meng @itholic @dongjoon-hyun -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org