Github user frreiss commented on a diff in the pull request:

    https://github.com/apache/spark/pull/15307#discussion_r81684775
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
    @@ -136,16 +139,30 @@ class StreamExecution(
       /** Whether the query is currently active or not */
       override def isActive: Boolean = state == ACTIVE
     
    +  override def queryStatus: StreamingQueryInfo = {
    +    this.toInfo
    +  }
    +
       /** Returns current status of all the sources. */
       override def sourceStatuses: Array[SourceStatus] = {
         val localAvailableOffsets = availableOffsets
         sources.map(s =>
     -      new SourceStatus(s.toString, localAvailableOffsets.get(s).map(_.toString))).toArray
    +      new SourceStatus(
    --- End diff --
    
    Actually, you can probably drop most of the synchronization if you keep two `StreamMetrics` objects and preallocate the slots for the counters. As things stand now, each counter in `StreamMetrics` is written once per batch. If you tweak `sourceStatuses()` to return the metrics from the most recently completed batch (i.e. the `StreamMetrics` object that is not currently being written to), readers and writers will never touch the same object. Eventually you'll want more than one `StreamMetrics` object anyway, since the scheduler will need to pipeline multiple batches to reach latencies below the 50-100ms level.
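
    The double-buffering idea above could be sketched roughly as follows. This is an illustrative standalone sketch, not Spark code: `BatchMetrics` and `MetricsBuffer` are hypothetical names, and it assumes (per the comment) a single writer thread that updates each counter once per batch.

    ```scala
    import java.util.concurrent.atomic.AtomicReference

    // One batch's worth of metrics, with counter slots preallocated so the
    // writer never has to resize the map while readers might be looking.
    final class BatchMetrics(counterNames: Seq[String]) {
      private val counters = scala.collection.mutable.Map(counterNames.map(_ -> 0L): _*)
      def set(name: String, value: Long): Unit = counters(name) = value
      def snapshot: Map[String, Long] = counters.toMap
    }

    // Two preallocated BatchMetrics objects; the single writer thread fills
    // the unpublished one and swaps it in at the end of each batch, so
    // readers only ever see a fully written, completed batch.
    final class MetricsBuffer(counterNames: Seq[String]) {
      private val a = new BatchMetrics(counterNames)
      private val b = new BatchMetrics(counterNames)
      private val published = new AtomicReference[BatchMetrics](a)

      // The writer's target is whichever object is not currently published.
      private def writable: BatchMetrics = if (published.get eq a) b else a

      def record(name: String, value: Long): Unit = writable.set(name, value)

      // Called once per batch, by the writer thread, when the batch completes.
      def publish(): Unit = published.set(writable)

      // What a lock-free sourceStatuses() analogue would read.
      def latest: Map[String, Long] = published.get.snapshot
    }
    ```

    With this arrangement the status-query path reads `latest` without taking any lock; the only synchronization left is the single atomic reference swap per batch.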

