viirya commented on code in PR #37461:
URL: https://github.com/apache/spark/pull/37461#discussion_r942081038


##########
python/pyspark/sql/streaming/readwriter.py:
##########
@@ -908,22 +1168,34 @@ def foreach(self, f: Union[Callable[[Row], None], "SupportsProcess"]) -> "DataSt
 
         Examples
         --------
-        >>> # Print every row using a function
+        >>> import time
+        >>> df = spark.readStream.format("rate").load()
+
+        Print every row using a function
+
         >>> def print_row(row):
         ...     print(row)
         ...
-        >>> writer = sdf.writeStream.foreach(print_row)
-        >>> # Print every row using a object with process() method
+        >>> q = df.writeStream.foreach(print_row).start()
+        >>> time.sleep(3)
+        >>> q.stop()
+
        Print every row using an object with a process() method
+
         >>> class RowPrinter:
         ...     def open(self, partition_id, epoch_id):
         ...         print("Opened %d, %d" % (partition_id, epoch_id))
         ...         return True
+        ...
         ...     def process(self, row):
         ...         print(row)
+        ...
         ...     def close(self, error):
         ...         print("Closed with error: %s" % str(error))
         ...
-        >>> writer = sdf.writeStream.foreach(RowPrinter())
+        >>> q = df.writeStream.trigger(once=True).foreach(print_row).start()

Review Comment:
   ditto



##########
python/pyspark/sql/streaming/readwriter.py:
##########
@@ -775,14 +1022,27 @@ def trigger(
 
         Examples
         --------
-        >>> # trigger the query for execution every 5 seconds
-        >>> writer = sdf.writeStream.trigger(processingTime='5 seconds')
-        >>> # trigger the query for just once batch of data
-        >>> writer = sdf.writeStream.trigger(once=True)
-        >>> # trigger the query for execution every 5 seconds
-        >>> writer = sdf.writeStream.trigger(continuous='5 seconds')
-        >>> # trigger the query for reading all available data with multiple batches
-        >>> writer = sdf.writeStream.trigger(availableNow=True)
+        >>> df = spark.readStream.format("rate").load()
+
+        Trigger the query for execution every 5 seconds
+
+        >>> df.writeStream.trigger(processingTime='5 seconds')
+        <pyspark.sql.streaming.readwriter.DataStreamWriter object ...>
+
        Trigger the query for just one batch of data
+
+        >>> df.writeStream.trigger(once=True)

Review Comment:
   I think `once` is deprecated?
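   If `once` is indeed deprecated, the doctest would surface a `DeprecationWarning` when run. A stdlib-only sketch of that pattern (the `trigger` function below is a hypothetical stand-in, not PySpark's actual `DataStreamWriter.trigger` implementation):

   ```python
   import warnings

   def trigger(processingTime=None, once=None, availableNow=None):
       """Hypothetical stand-in illustrating the deprecation pattern."""
       if once is not None:
           # Deprecated path: warn and fall through to the old behavior
           warnings.warn(
               "trigger(once=True) is deprecated; use trigger(availableNow=True)",
               DeprecationWarning,
           )
           return "once"
       if availableNow is not None:
           return "availableNow"
       return "processingTime=%s" % processingTime

   # The deprecated keyword emits a DeprecationWarning; the replacement does not
   with warnings.catch_warnings(record=True) as caught:
       warnings.simplefilter("always")
       mode = trigger(once=True)
   print(mode, len(caught))
   ```

   A doctest exercising the deprecated keyword may need to suppress or match that warning to stay deterministic.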



##########
python/pyspark/sql/streaming/readwriter.py:
##########
@@ -638,7 +808,25 @@ def format(self, source: str) -> "DataStreamWriter":
 
         Examples
         --------
-        >>> writer = sdf.writeStream.format('json')
+        >>> df = spark.readStream.format("rate").load()
+        >>> df.writeStream.format("text")
+        <pyspark.sql.streaming.readwriter.DataStreamWriter object ...>

Review Comment:
   Is this redundant? Looks not related to the example below.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

