[GitHub] [spark] HeartSaVioR commented on a change in pull request #34333: [SPARK-37062][SS] Introduce a new data source for providing consistent set of rows per microbatch

GitBox Sun, 31 Oct 2021 21:12:25 -0700


HeartSaVioR commented on a change in pull request #34333:
URL: https://github.com/apache/spark/pull/34333#discussion_r739946197




##########
File path: docs/structured-streaming-programming-guide.md
##########
@@ -517,6 +517,8 @@ There are a few built-in sources.
 
   - **Rate source (for testing)** - Generates data at the specified number of 
rows per second, each output row contains a `timestamp` and `value`. Where 
`timestamp` is a `Timestamp` type containing the time of message dispatch, and 
`value` is of `Long` type containing the message count, starting from 0 as the 
first row. This source is intended for testing and benchmarking.

Review comment:
       I'd love to address it, but honestly I have no better idea than the 
representation below:
   
   * rate: Generates data at the specified number of rows per second
   * rate per micro-batch: Generates data at the specified number of rows per 
micro-batch
   
   The phrase "specified number of rows per XXX" says that the specified number 
of rows will be produced per XXX, so the main thing readers should check is the 
unit (per XXX). IMHO, "per second" vs. "per micro-batch" is unlikely to cause 
confusion.
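
   To make the difference in units concrete, here is a rough sketch in plain 
Python. It is a toy model, not Spark's implementation: the function names, the 
`batch_durations_s` input, and the rounding rule are all assumptions made for 
illustration only.

```python
from typing import List


def rows_per_second_model(rows_per_second: int,
                          batch_durations_s: List[float]) -> List[int]:
    """Toy model of a "per second" source (hypothetical helper).

    The number of rows in each micro-batch depends on how much
    wall-clock time has elapsed, so it varies with batch duration.
    """
    counts = []
    emitted = 0      # rows handed out so far
    elapsed = 0.0    # total elapsed seconds
    for duration in batch_durations_s:
        elapsed += duration
        total_due = int(elapsed * rows_per_second)
        counts.append(total_due - emitted)
        emitted = total_due
    return counts


def rows_per_batch_model(rows_per_batch: int,
                         batch_durations_s: List[float]) -> List[int]:
    """Toy model of a "per micro-batch" source (hypothetical helper).

    Every micro-batch gets the same number of rows, regardless of
    how long the batch took.
    """
    return [rows_per_batch for _ in batch_durations_s]


# Three micro-batches that took 1.0 s, 2.5 s and 0.5 s respectively.
durations = [1.0, 2.5, 0.5]
per_second = rows_per_second_model(10, durations)
per_batch = rows_per_batch_model(10, durations)
print(per_second)
print(per_batch)
```

   With the same configured number (10), the first model yields a row count 
that changes from batch to batch, while the second yields a consistent count 
per micro-batch. Only the unit differs.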




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


