GitHub user tdas commented on a diff in the pull request:
https://github.com/apache/spark/pull/16258#discussion_r92072609
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryStatusAndProgressSuite.scala
---
@@ -38,13 +38,18 @@ class StreamingQueryStatusAndProgressSuite extends SparkFunSuite {
| "id" : "${testProgress1.id.toString}",
| "runId" : "${testProgress1.runId.toString}",
| "name" : "myName",
- | "timestamp" : "2016-12-05T20:54:20.827Z",
+ | "triggerTimestamp" : "2016-12-05T20:54:20.827Z",
| "numInputRows" : 678,
| "inputRowsPerSecond" : 10.0,
| "durationMs" : {
| "total" : 0
| },
- | "currentWatermark" : 3,
+ | "queryTimestamps" : {
+ | "eventTime.avg" : "2016-12-05T20:54:20.827Z",
--- End diff --
My view is that the `StreamingQueryProgress` class is not just for
monitoring but for debugging as well. The batchProcessingTime may be important
for debugging why a batch generated certain results in the 1% of cases where
the trigger time differs from the processing time. In those cases, there
is no way to find out what the batchProcessingTime of that batch was unless
it is exposed through the Progress API.
That said, we could leave batchProcessingTime out for now and expose only
eventTime. But it may then be more complex to add another new field to
Progress later to expose the processing time (it cannot be added to the map
once we name the map `eventTime`).
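
To make the trade-off concrete, here is a minimal sketch of the map-based
shape. It is not the actual `StreamingQueryProgress` class: `ProgressSketch`,
the `batchProcessingTime` key name, and the second timestamp value are
assumptions made up for illustration. Only `triggerTimestamp`,
`queryTimestamps`, and `eventTime.avg` come from the diff above.

```scala
// Hypothetical stand-in for the progress object; not Spark's real class.
final case class ProgressSketch(
    triggerTimestamp: String,
    // Generic map: new kinds of timestamps can be added as new keys.
    queryTimestamps: Map[String, String])

object ProgressSketch {
  // Key taken from the diff above.
  val EventTimeAvg = "eventTime.avg"
  // Assumed key name, used here only for illustration.
  val BatchProcessingTime = "batchProcessingTime"
}

object Demo {
  // Expose only the event time for now.
  val eventTimeOnly: ProgressSketch = ProgressSketch(
    triggerTimestamp = "2016-12-05T20:54:20.827Z",
    queryTimestamps =
      Map(ProgressSketch.EventTimeAvg -> "2016-12-05T20:54:20.827Z"))

  // Exposing the processing time later is one more map entry,
  // with no new field on the class. The value is made up.
  val withProcessingTime: ProgressSketch = eventTimeOnly.copy(
    queryTimestamps = eventTimeOnly.queryTimestamps +
      (ProgressSketch.BatchProcessingTime -> "2016-12-05T20:54:21.000Z"))
}
```

If the map were instead named `eventTime`, the second entry would not belong
in it, and the processing time would need its own new field.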