Github user frreiss commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r81684775
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala
---
@@ -136,16 +139,30 @@ class StreamExecution(
/** Whether the query is currently active or not */
override def isActive: Boolean = state == ACTIVE
+ override def queryStatus: StreamingQueryInfo = {
+ this.toInfo
+ }
+
/** Returns current status of all the sources. */
override def sourceStatuses: Array[SourceStatus] = {
val localAvailableOffsets = availableOffsets
sources.map(s =>
- new SourceStatus(s.toString,
- localAvailableOffsets.get(s).map(_.toString))).toArray
+ new SourceStatus(
--- End diff --
Actually, you can probably drop most of the synchronization if you keep two
`StreamMetrics` objects and preallocate the slots for counters. At least the
way things are now, each counter in `StreamMetrics` is written once per batch.
If you tweak `sourceStatuses()` to return the metrics from the most recent
completed batch (i.e. the `StreamMetrics` object that's not currently being
written to), there should be no overlap between readers and writers. Eventually
you'll want to have more than one `StreamMetrics` object anyway, since the
scheduler will need to pipeline multiple batches to reach latencies below the
50-100ms level.
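
To make the idea concrete, here is a minimal sketch of the double-buffering scheme described above. The names (`MetricsBuffer`, `DoubleBufferedMetrics`) are hypothetical stand-ins, not the actual `StreamMetrics` API: slots are preallocated, the single batch-runner thread writes into one buffer, and a `@volatile` index flip at batch completion publishes it so readers such as `sourceStatuses()` only ever see the last completed batch.

```scala
import java.util.concurrent.atomic.AtomicLongArray

// Hypothetical stand-in for StreamMetrics: a fixed set of counter slots,
// preallocated so writers never resize shared state.
final class MetricsBuffer(numSlots: Int) {
  private val counters = new AtomicLongArray(numSlots)
  def set(slot: Int, value: Long): Unit = counters.set(slot, value)
  def get(slot: Int): Long = counters.get(slot)
}

// Two buffers: the batch runner writes one while readers see the other.
// No locks are needed because the only shared mutable state is the
// @volatile index that says which buffer is "completed".
final class DoubleBufferedMetrics(numSlots: Int) {
  private val buffers =
    Array(new MetricsBuffer(numSlots), new MetricsBuffer(numSlots))
  @volatile private var completedIdx = 0
  private def writingIdx: Int = 1 - completedIdx

  /** Called by the batch runner (single writer), once per counter per batch. */
  def record(slot: Int, value: Long): Unit =
    buffers(writingIdx).set(slot, value)

  /** Publish the in-progress buffer at batch completion; a single volatile write. */
  def completeBatch(): Unit =
    completedIdx = writingIdx

  /** Readers (e.g. sourceStatuses()) see only the last completed batch. */
  def read(slot: Int): Long =
    buffers(completedIdx).get(slot)
}
```

One caveat with only two buffers: after a flip, the new writing buffer still holds values from two batches ago, so any counter the runner skips in a batch must be explicitly reset. Going to a ring of more than two buffers, as suggested above for pipelined batches, removes that coupling.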
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]