Github user tdas commented on a diff in the pull request:
https://github.com/apache/spark/pull/15307#discussion_r82857011
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamExecution.scala ---
@@ -516,12 +563,127 @@ class StreamExecution(
""".stripMargin
}
- private def toInfo: StreamingQueryInfo = {
- new StreamingQueryInfo(
- this.name,
- this.id,
- this.sourceStatuses,
- this.sinkStatus)
+ /**
+ * Report row metrics of the executed trigger
+ * @param triggerExecutionPlan Execution plan of the trigger
+ * @param triggerLogicalPlan Logical plan of the trigger, generated from the query logical plan
+ * @param sourceToDF Source to DataFrame returned by the source.getBatch in this trigger
+ */
+ private def reportNumRows(
+ triggerExecutionPlan: SparkPlan,
+ triggerLogicalPlan: LogicalPlan,
+ sourceToDF: Map[Source, DataFrame]): Unit = {
+ // We want to associate execution plan leaves to sources that generate them, so that we match
+ // their metrics (e.g. numOutputRows) to the sources. To do this, we do the following.
+ // Consider the translation from the streaming logical plan to the final executed plan.
+ //
+ // streaming logical plan (with sources) <==> trigger's logical plan <==> executed plan
+ //
+ // 1. We keep track of streaming sources associated with each leaf in the trigger's logical plan
+ //    - Each logical plan leaf will be associated with a single streaming source.
+ //    - There can be multiple logical plan leaves associated with a streaming source.
--- End diff --
fixed.
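The leaf-to-source attribution described in the quoted diff can be sketched with a toy model. This is a hypothetical, self-contained illustration, not Spark's actual API: `MockSource`, `LogicalLeaf`, `ExecLeaf`, and `numInputRows` are invented names. It assumes the logical plan leaves and the executed plan leaves line up one-to-one and in order, which is the matching the diff's comment relies on; per-leaf metrics are then summed per source, and multiple leaves may map to the same source (e.g. a self-join on one stream).

```scala
object NumRowsSketch {
  // Stand-in for a streaming Source.
  case class MockSource(name: String)
  // A logical plan leaf tagged with the source that generated it.
  case class LogicalLeaf(source: MockSource)
  // An executed-plan leaf carrying its numOutputRows metric.
  case class ExecLeaf(numOutputRows: Long)

  /** Attribute each executed leaf's numOutputRows to its source and
    * sum per source, assuming the two leaf sequences correspond
    * positionally (same count, same order). */
  def numInputRows(
      logicalLeaves: Seq[LogicalLeaf],
      execLeaves: Seq[ExecLeaf]): Map[MockSource, Long] = {
    require(logicalLeaves.size == execLeaves.size,
      "leaf counts must match for metrics to be attributed safely")
    logicalLeaves.zip(execLeaves)
      .groupBy { case (logical, _) => logical.source }
      .map { case (src, pairs) =>
        src -> pairs.map { case (_, exec) => exec.numOutputRows }.sum
      }
  }

  def main(args: Array[String]): Unit = {
    val kafka = MockSource("kafka")
    val file = MockSource("file")
    // Two logical leaves point at the same source (e.g. a self-join).
    val logical = Seq(LogicalLeaf(kafka), LogicalLeaf(kafka), LogicalLeaf(file))
    val exec = Seq(ExecLeaf(10L), ExecLeaf(5L), ExecLeaf(7L))
    println(numInputRows(logical, exec))
  }
}
```

The `require` guard reflects why the real code needs the careful bookkeeping the comment describes: if the leaf counts diverge, positional matching would silently attribute rows to the wrong source, so it is safer to skip reporting than to guess.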
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]