WweiL commented on code in PR #41026:
URL: https://github.com/apache/spark/pull/41026#discussion_r1184008423
##########
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala:
##########
@@ -534,6 +552,8 @@ final class DataStreamWriter[T] private[sql](ds:
Dataset[T]) {
private var foreachWriter: ForeachWriter[T] = null
+ private var pythonForeachWriter: PythonForeachWriter = null
+
Review Comment:
I definitely think a discussion is needed here. The reason I added this field is
that `foreachWriter` cannot be used here: its type parameter must match the
`DataStreamWriter`'s, which is `Row`, but `PythonForeachWriter` extends
`ForeachWriter[UnsafeRow]`:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonForeachWriter.scala#L33-L34
Therefore I can't simply create a `PythonForeachWriter` and call
`writer.foreach(pythonForeachWriter)` in `SparkConnectPlanner`.
The original caller, in Python, doesn't do this additional step and just
calls `foreach`:
https://github.com/apache/spark/blob/master/python/pyspark/sql/streaming/readwriter.py#L1309C24-L1315
Maybe that path circumvents the type check somehow; I'd also be very interested
if anyone knows why that works.
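The type mismatch above can be sketched without any Spark dependency. The class names below are simplified stand-ins for the real `DataStreamWriter`, `ForeachWriter`, `Row`, `UnsafeRow`, and `PythonForeachWriter` (the real signatures are more involved); the point is only that `ForeachWriter[T]` is invariant in `T`, so a `ForeachWriter[UnsafeRow]` is not accepted where a `ForeachWriter[Row]` is required:

```scala
// Stand-in for org.apache.spark.sql.ForeachWriter[T] (invariant in T).
abstract class ForeachWriter[T] {
  def process(value: T): Unit
}

// Stand-ins for Row / UnsafeRow; UnsafeRow is a subtype of Row, as in Spark.
class Row
class UnsafeRow extends Row

// Stand-in for DataStreamWriter[T]: foreach only accepts ForeachWriter[T].
class DataStreamWriter[T] {
  private var foreachWriter: ForeachWriter[T] = null
  def foreach(writer: ForeachWriter[T]): DataStreamWriter[T] = {
    foreachWriter = writer
    this
  }
}

// Stand-in for PythonForeachWriter, fixed to UnsafeRow as in the real class.
class PythonForeachWriter extends ForeachWriter[UnsafeRow] {
  override def process(value: UnsafeRow): Unit = ()
}

object InvarianceDemo extends App {
  val writer = new DataStreamWriter[Row]

  // writer.foreach(new PythonForeachWriter)
  // ^ does not compile:
  //   found:    ForeachWriter[UnsafeRow]
  //   required: ForeachWriter[Row]
  // even though UnsafeRow <: Row, because T is invariant in ForeachWriter.

  // A ForeachWriter[Row] compiles fine:
  writer.foreach(new ForeachWriter[Row] {
    override def process(value: Row): Unit = ()
  })
}
```

This is why a separate `pythonForeachWriter` field (or some other workaround, such as an adapter or a cast) is needed rather than reusing `foreachWriter` directly.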
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]