HyukjinKwon commented on a change in pull request #30835:
URL: https://github.com/apache/spark/pull/30835#discussion_r545818496
##########
File path: python/pyspark/sql/streaming.py
##########
@@ -1464,11 +1491,82 @@ def start(self, path=None, format=None, outputMode=None, partitionBy=None, query
else:
return self._sq(self._jwrite.start(path))
+ def toTable(self, tableName, format=None, outputMode=None, partitionBy=None, queryName=None,
+ **options):
+ """
+ Streams the contents of the :class:`DataFrame` to the output table.
+
+ A new table will be created if the table not exists. The returned [[StreamingQuery]]
+ object can be used to interact with the stream.
+
+ .. versionadded:: 3.2.0
+
+ Parameters
+ ----------
+ tableName : str
+ string, for the name of the table.
+ format : str, optional
+ the format used to save.
+ outputMode : str, optional
+ specifies how data of a streaming DataFrame/Dataset is written to a
+ streaming sink.
+
+ * `append`: Only the new rows in the streaming DataFrame/Dataset will be written to the
+ sink
+ * `complete`: All the rows in the streaming DataFrame/Dataset will be written to the
+ sink every time these is some updates
+ * `update`: only the rows that were updated in the streaming DataFrame/Dataset will be
+ written to the sink every time there are some updates. If the query doesn't contain
+ aggregations, it will be equivalent to `append` mode.
+ partitionBy : str or list, optional
+ names of partitioning columns
+ queryName : str, optional
+ unique name for the query
+ **options : dict
+ All other string options. You may want to provide a `checkpointLocation`.
+
+ Notes
+ -----
+ This API is evolving.
+
+ Examples
+ --------
+ >>> sq = sdf.writeStream.format('parquet').queryName('this_query').option(
+ ... 'checkpointLocation', '/tmp/checkpoint').toTable(output_table_name)
+ >>> sq.isActive
+ True
+ >>> sq.name
+ 'this_query'
+ >>> sq.stop()
+ >>> sq.isActive
+ False
+ >>> sq = sdf.writeStream.trigger(processingTime='5 seconds').toTable(
+ ... output_table_name, queryName='that_query', outputMode="append", format='parquet',
+ ... checkpointLocation='/tmp/checkpoint')
+ >>> sq.name
+ 'that_query'
+ >>> sq.isActive
+ True
+ >>> sq.stop()
+ """
+ # FIXME: address SPARK-33659 here as well
Review comment:
I would write it as `TODO(SPARK-33659): blah blah`, which I believe is
the more usual form.
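
To make the suggested form concrete, here is a minimal sketch that contrasts the comment under review with a ticket-tagged TODO. The regular expression, the helper name `find_tagged_todos`, and the description text after the colon are all assumptions made for illustration; the review itself leaves the description as "blah blah".

```python
import re

# Hypothetical pattern for the suggested form:
#   # TODO(SPARK-<number>): <description>
TODO_PATTERN = re.compile(r"#\s*TODO\((SPARK-\d+)\):\s*(.+)")


def find_tagged_todos(source):
    """Return (ticket, description) pairs for every ticket-tagged TODO comment."""
    return [m.groups() for m in TODO_PATTERN.finditer(source)]


# The comment as it appears in the diff.
before = "# FIXME: address SPARK-33659 here as well"
# One possible rewrite in the suggested form (the description is illustrative).
after = "# TODO(SPARK-33659): address this here as well"

print(find_tagged_todos(before))
print(find_tagged_todos(after))
```

Putting the ticket ID in parentheses right after `TODO` gives the comment a fixed shape, so a simple search like the one above can list every open follow-up by ticket.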