WweiL commented on code in PR #41026:
URL: https://github.com/apache/spark/pull/41026#discussion_r1184008423
##########
sql/core/src/main/scala/org/apache/spark/sql/streaming/DataStreamWriter.scala:
##########
@@ -534,6 +552,8 @@ final class DataStreamWriter[T] private[sql](ds:
Dataset[T]) {
private var foreachWriter: ForeachWriter[T] = null
+ private var pythonForeachWriter: PythonForeachWriter = null
+
Review Comment:
I definitely think a discussion is needed here. The reason I added this field is
that `foreachWriter` cannot be used here: its type parameter must match the
`DataStreamWriter`'s, which is `Row`, but `PythonForeachWriter` extends
`ForeachWriter[UnsafeRow]`:
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonForeachWriter.scala#L33-L34
Therefore I can't simply create a `PythonForeachWriter` and call
`writer.foreach(pythonForeachWriter)` in `SparkConnectPlanner`.
The original caller, in Python, doesn't do this additional step and just
calls `foreach`:
https://github.com/apache/spark/blob/master/python/pyspark/sql/streaming/readwriter.py#L1309C24-L1315
Maybe that path circumvents the type check somehow; I'd also be very interested
if anyone knows why that works.
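The type mismatch above can be sketched without any Spark dependency. The class names below are simplified stand-ins for the real `DataStreamWriter`, `ForeachWriter`, `Row`, `UnsafeRow`, and `PythonForeachWriter` (the real signatures are more involved); the point is only that `ForeachWriter[T]` is invariant in `T`, so a `ForeachWriter[UnsafeRow]` is not accepted where a `ForeachWriter[Row]` is required:

```scala
// Stand-in for org.apache.spark.sql.ForeachWriter[T] (invariant in T).
abstract class ForeachWriter[T] {
  def process(value: T): Unit
}

// Stand-ins for Row / UnsafeRow; UnsafeRow is a subtype of Row, as in Spark.
class Row
class UnsafeRow extends Row

// Stand-in for DataStreamWriter[T]: foreach only accepts ForeachWriter[T].
class DataStreamWriter[T] {
  private var foreachWriter: ForeachWriter[T] = null
  def foreach(writer: ForeachWriter[T]): DataStreamWriter[T] = {
    foreachWriter = writer
    this
  }
}

// Stand-in for PythonForeachWriter, fixed to UnsafeRow as in the real class.
class PythonForeachWriter extends ForeachWriter[UnsafeRow] {
  override def process(value: UnsafeRow): Unit = ()
}

object InvarianceDemo extends App {
  val writer = new DataStreamWriter[Row]

  // writer.foreach(new PythonForeachWriter)
  // ^ does not compile:
  //   found:    ForeachWriter[UnsafeRow]
  //   required: ForeachWriter[Row]
  // even though UnsafeRow <: Row, because T is invariant in ForeachWriter.

  // A ForeachWriter[Row] compiles fine:
  writer.foreach(new ForeachWriter[Row] {
    override def process(value: Row): Unit = ()
  })
}
```

This is why a separate `pythonForeachWriter` field (or some other workaround, such as an adapter or a cast) is needed rather than reusing `foreachWriter` directly.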
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]