[GitHub] [spark] viirya commented on a change in pull request #34333: [SPARK-37062][SS] Introduce a new data source for providing consistent set of rows per microbatch

GitBox Sun, 31 Oct 2021 15:30:46 -0700


viirya commented on a change in pull request #34333:
URL: https://github.com/apache/spark/pull/34333#discussion_r739885325




##########
File path: docs/structured-streaming-programming-guide.md
##########
@@ -517,6 +517,8 @@ There are a few built-in sources.
 
   - **Rate source (for testing)** - Generates data at the specified number of rows per second, each output row contains a `timestamp` and `value`. Where `timestamp` is a `Timestamp` type containing the time of message dispatch, and `value` is of `Long` type containing the message count, starting from 0 as the first row. This source is intended for testing and benchmarking.

Review comment:
       I have a little concern about possible confusion between "rate" and "rate
per micro-batch". The original "rate" data source is described as "Generates
data at the specified number of rows per second", which sounds like the rate of
data generation is fixed, so users will mostly assume that the number of rows
per micro-batch is the same or close. Could we add some wording to the "rate"
or "rate per micro-batch" description that clarifies the difference for end
users?
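To make the distinction concrete, here is a simplified model (not Spark's implementation) of how many rows each micro-batch would pick up under the two semantics, assuming three micro-batches that happen to take 1.0 s, 2.5 s and 0.5 s. `rows_per_second` loosely mirrors the rate source's `rowsPerSecond` option; `rows_per_batch` and both function names are hypothetical, and ramp-up and partitioning are ignored.

```python
from typing import List


def rate_source_rows(rows_per_second: int, batch_durations_s: List[float]) -> List[int]:
    """Simplified "rate" model: rows accrue with elapsed time, so each
    micro-batch picks up whatever accumulated since the previous batch.
    Ramp-up and partitioning are deliberately ignored (assumption)."""
    counts = []
    elapsed = 0.0
    emitted = 0
    for duration in batch_durations_s:
        elapsed += duration
        total_so_far = int(rows_per_second * elapsed)  # rows generated up to now
        counts.append(total_so_far - emitted)           # rows new to this batch
        emitted = total_so_far
    return counts


def rate_per_micro_batch_rows(rows_per_batch: int, num_batches: int) -> List[int]:
    """Simplified per-micro-batch model (hypothetical): every batch gets
    exactly rows_per_batch rows, regardless of how long it took."""
    return [rows_per_batch] * num_batches


durations = [1.0, 2.5, 0.5]
print(rate_source_rows(10, durations))                   # -> [10, 25, 5]
print(rate_per_micro_batch_rows(10, len(durations)))     # -> [10, 10, 10]
```

Under the first model the per-batch row count depends on how long each batch took, which is exactly the assumption end users might get wrong; under the second it is fixed by construction.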



