WweiL commented on code in PR #46996:
URL: https://github.com/apache/spark/pull/46996#discussion_r1649448417
##########
python/pyspark/sql/connect/dataframe.py:
##########
@@ -206,7 +208,10 @@ def _repr_html_(self) -> Optional[str]:
@property
def write(self) -> "DataFrameWriter":
- return DataFrameWriter(self._plan, self._session)
+ def cb(qe: "ExecutionInfo") -> None:
+ self._execution_info = qe
+
+ return DataFrameWriter(self._plan, self._session, cb)
Review Comment:
Looks like writeStream is not overridden here, so I imagine streaming queries are
not supported yet.
In streaming, a query can have multiple DataFrames; what we do in Scala is to
access it with query.explain(), which uses lastExecution:
https://github.com/apache/spark/blob/94763438943e21ee8156a4f1a3facddbf8d45797/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala#L192
That is, as its name suggests, the QueryExecution (`IncrementalExecution`) of the
last execution.
We could also add a similar mechanism to the StreamingQuery object. This sounds
like an interesting follow-up that I'm interested in.
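For illustration, a writeStream override could thread the same callback through the streaming writer, mirroring the `write` property in the diff above. The sketch below is hypothetical: `ExecutionInfo`, `DataStreamWriter`, and `DataFrame` are stand-in classes demonstrating only the callback wiring, not the real pyspark.sql.connect types.

```python
# Hypothetical sketch of the callback wiring suggested above.
# The classes here are stand-ins, NOT the real pyspark.sql.connect API.
from typing import Callable, Optional


class ExecutionInfo:
    """Stand-in for the query-execution metadata object."""

    def __init__(self, metrics: str) -> None:
        self.metrics = metrics


class DataStreamWriter:
    """Stand-in writer that reports ExecutionInfo via a callback."""

    def __init__(
        self,
        plan: object,
        session: object,
        callback: Optional[Callable[[ExecutionInfo], None]] = None,
    ) -> None:
        self._plan = plan
        self._session = session
        self._callback = callback

    def start(self) -> None:
        # A real implementation would execute the streaming query;
        # here we only invoke the callback with placeholder info.
        if self._callback is not None:
            self._callback(ExecutionInfo(metrics="placeholder"))


class DataFrame:
    def __init__(self) -> None:
        self._plan = object()
        self._session = object()
        self._execution_info: Optional[ExecutionInfo] = None

    @property
    def writeStream(self) -> DataStreamWriter:
        # Same pattern as the `write` property in the diff:
        # the writer calls back into the DataFrame after execution.
        def cb(qe: ExecutionInfo) -> None:
            self._execution_info = qe

        return DataStreamWriter(self._plan, self._session, cb)


df = DataFrame()
df.writeStream.start()
# After start(), the callback has populated df._execution_info.
```

Exposing the last execution's info on a StreamingQuery object (the follow-up suggested above) would just mean storing the most recent `ExecutionInfo` the callback receives, analogous to `lastExecution` in StreamExecution.scala.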
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]