HyukjinKwon commented on a change in pull request #30835:
URL: https://github.com/apache/spark/pull/30835#discussion_r545816971



##########
File path: python/pyspark/sql/streaming.py
##########
@@ -1464,11 +1491,82 @@ def start(self, path=None, format=None, outputMode=None, partitionBy=None, query
         else:
             return self._sq(self._jwrite.start(path))
 
+    def toTable(self, tableName, format=None, outputMode=None, partitionBy=None, queryName=None,
+                **options):
+        """
+        Streams the contents of the :class:`DataFrame` to the output table.
+
+        A new table will be created if the table does not exist. The returned
+        :class:`StreamingQuery` object can be used to interact with the stream.
+
+        .. versionadded:: 3.2.0
+
+        Parameters
+        ----------
+        tableName : str
+            the name of the table.
+        format : str, optional
+            the format used to save.
+        outputMode : str, optional
+            specifies how data of a streaming DataFrame/Dataset is written to a
+            streaming sink.
+
+            * `append`: Only the new rows in the streaming DataFrame/Dataset will be written to the
+              sink
+            * `complete`: All the rows in the streaming DataFrame/Dataset will be written to the
+              sink every time there are some updates
+            * `update`: Only the rows that were updated in the streaming DataFrame/Dataset will be
+              written to the sink every time there are some updates. If the query doesn't contain
+              aggregations, it will be equivalent to `append` mode.
+        partitionBy : str or list, optional
+            names of partitioning columns
+        queryName : str, optional
+            unique name for the query
+        **options : dict
+            All other string options. You may want to provide a `checkpointLocation`.
+
+        Notes
+        -----
+        This API is evolving.
+
+        Examples
+        --------
+        >>> sq = sdf.writeStream.format('parquet').queryName('this_query').option(
+        ...      'checkpointLocation', '/tmp/checkpoint').toTable(output_table_name)

Review comment:
       @HeartSaVioR I think we should avoid using the `/tmp` directory directly, see also
       SPARK-24126. Probably we should use `tempfile.mkdtemp()` instead.
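
       As a minimal sketch of what the doctest could look like with that change (it assumes
       `sdf` and `output_table_name` are defined in the doctest setup, as in the current
       example; `checkpoint_dir` is just an illustrative local name):

       ```python
       >>> import tempfile
       >>> checkpoint_dir = tempfile.mkdtemp()  # unique temp directory instead of a hard-coded /tmp path
       >>> sq = sdf.writeStream.format('parquet').queryName('this_query').option(
       ...     'checkpointLocation', checkpoint_dir).toTable(output_table_name)
       ```

       A fresh directory per run also keeps doctest runs from sharing a fixed checkpoint path.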




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
