HeartSaVioR opened a new pull request #27209: [SPARK-29450][SS][2.4] Measure the number of output rows for streaming aggregation with append mode
URL: https://github.com/apache/spark/pull/27209

### What changes were proposed in this pull request?

This patch adds the missing metric, the number of output rows, for streaming aggregation with append mode. The other output modes already measure it correctly.

### Why are the changes needed?

Without the patch, the value of this metric is always 0.

### Does this PR introduce any user-facing change?

No.

### How was this patch tested?

Unit test added. Also manually tested with the query below:

> query

```scala
import spark.implicits._
import org.apache.spark.sql.functions._

spark.conf.set("spark.sql.shuffle.partitions", "5")

val df = spark.readStream
  .format("rate")
  .option("rowsPerSecond", 1000)
  .load()
  .withWatermark("timestamp", "5 seconds")
  .selectExpr("timestamp", "mod(value, 100) as mod", "value")
  .groupBy(window($"timestamp", "10 seconds"), $"mod")
  .agg(max("value").as("max_value"), min("value").as("min_value"), avg("value").as("avg_value"))

val query = df
  .writeStream
  .format("memory")
  .option("queryName", "test")
  .outputMode("append")
  .start()

query.awaitTermination()
```

> before the patch

(screenshot omitted)

> after the patch

(screenshot omitted)
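For reference, one way to watch streaming metrics while the manual query above runs is the `StreamingQuery` progress API; the fixed "number of output rows" metric itself appears on the aggregation's state store save node in the SQL tab of the Spark UI. The snippet below is a minimal sketch, not part of the patch; the `dumpProgress` helper is hypothetical and just pretty-prints the latest progress for the `query` started above.

```scala
import org.apache.spark.sql.streaming.StreamingQuery

// Hypothetical helper: print the most recent progress of a running query.
// lastProgress is null until the first micro-batch completes, hence Option().
def dumpProgress(query: StreamingQuery): Unit = {
  Option(query.lastProgress).foreach { progress =>
    // prettyJson includes batch id, input rate, and state operator stats.
    println(progress.prettyJson)
  }
}

// Example usage while the query is running (e.g. from another thread or
// before awaitTermination): dumpProgress(query)
```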
