Re: [SQL] Why no numOutputRows metric for LocalTableScanExec in webUI?
Hi Ryan,

That makes a lot of sense! Thanks for steering me in the right direction.

Quoting SQLMetric [1]:

> Updates on the driver side must be explicitly posted using SQLMetrics.postDriverMetricUpdates().

Why is LocalTableScanExec not following the "must" requirement? FileSourceScanExec does (and so does BroadcastExchangeExec, but that is not a data source, so it may have different reasons).

[1] https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/metric/SQLMetrics.scala#L31-L32

Regards,
Jacek Laskowski

https://about.me/JacekLaskowski
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Fri, Nov 17, 2017 at 2:30 AM, Shixiong(Ryan) Zhu wrote:
> SQL metrics are collected using SparkListener. If there are no
> tasks, org.apache.spark.sql.execution.ui.SQLListener cannot collect any
> metrics.
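The distinction raised in this message, between metric values that arrive with task events and values the driver has to post itself, can be sketched with a toy model. This is plain Scala, not Spark code: `DriverMetricsModel`, `TaskEnd`, `DriverUpdate`, `collect`, and `localScan` are hypothetical names invented for illustration, and the model says nothing about why LocalTableScanExec behaves as it does.

```scala
// Toy model (NOT Spark code): two routes by which a metric value can reach a UI listener.
object DriverMetricsModel {
  sealed trait Event
  // Hypothetical: values reported when a task finishes on an executor.
  final case class TaskEnd(updates: Map[String, Long]) extends Event
  // Hypothetical: values the driver posts explicitly, with no task involved.
  final case class DriverUpdate(updates: Map[String, Long]) extends Event

  // Folds all received events into the per-metric totals a UI could display.
  def collect(events: Seq[Event]): Map[String, Long] =
    events.foldLeft(Map.empty[String, Long]) { (acc, event) =>
      val updates = event match {
        case TaskEnd(u)      => u
        case DriverUpdate(u) => u
      }
      updates.foldLeft(acc) { case (m, (name, value)) =>
        m.updated(name, m.getOrElse(name, 0L) + value)
      }
    }

  // An operator that produces its rows on the driver: it runs no tasks,
  // so it emits no TaskEnd events. Its metric is visible only if it posts explicitly.
  def localScan(rows: Seq[String], postFromDriver: Boolean): Seq[Event] =
    if (postFromDriver) Seq(DriverUpdate(Map("numOutputRows" -> rows.size.toLong)))
    else Seq.empty
}
```

In this model, `collect(localScan(rows, postFromDriver = false))` is empty, while passing `postFromDriver = true` makes `numOutputRows` appear. The explicit post plays the role that the quoted comment assigns to SQLMetrics.postDriverMetricUpdates().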
Re: [SQL] Why no numOutputRows metric for LocalTableScanExec in webUI?
SQL metrics are collected using SparkListener. If there are no tasks, org.apache.spark.sql.execution.ui.SQLListener cannot collect any metrics.

On Thu, Nov 16, 2017 at 1:53 AM, Jacek Laskowski wrote:
> Hi,
>
> I seem to have figured out why the metric is not in the web UI for the
> query, but wish I knew how to explain it for any metric and operator.
>
> It seems that numOutputRows metric won't be displayed in web UI when a
> query uses no Spark jobs.
>
> val names = Seq("Jacek", "Agata").toDF("name")
>
> // no numOutputRows metric in web UI
> names.show
>
> // The query gives numOutputRows metric in web UI's Details for Query (SQL tab)
> scala> names.groupBy(length($"name")).count.show
>
> That must be somewhat generic and I think has nothing to do with
> LocalTableScanExec. Could anyone explain it in more detail? I'd appreciate.
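The explanation in this reply can be illustrated with a small model of a listener that learns metric values only from task-end events. It is a sketch in plain Scala, not Spark's implementation: `ListenerModel`, `TaskEnd`, `UiListener`, and `run` are hypothetical names, and the real SQLListener is considerably more involved.

```scala
// Toy model (NOT Spark code): a listener that only sees metrics attached to task-end events.
object ListenerModel {
  // Hypothetical stand-in for a task-end event carrying metric updates.
  final case class TaskEnd(updates: Map[String, Long])

  // Hypothetical stand-in for the UI listener: it sums whatever task events report.
  final class UiListener {
    private var collected = Map.empty[String, Long]

    def onTaskEnd(event: TaskEnd): Unit =
      event.updates.foreach { case (name, value) =>
        collected = collected.updated(name, collected.getOrElse(name, 0L) + value)
      }

    // What the UI would be able to show for the query.
    def metrics: Map[String, Long] = collected
  }

  // Feeds a sequence of task-end events to a fresh listener and returns what it collected.
  def run(events: Seq[TaskEnd]): Map[String, Long] = {
    val listener = new UiListener
    events.foreach(listener.onTaskEnd)
    listener.metrics
  }
}
```

A query that runs no tasks corresponds to `ListenerModel.run(Nil)`, which yields an empty map: the listener has nothing to display, regardless of which metrics the operator defines.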
Re: [SQL] Why no numOutputRows metric for LocalTableScanExec in webUI?
Hi,

I seem to have figured out why the metric is not in the web UI for the query, but I wish I knew how to explain it for any metric and operator.

It seems that the numOutputRows metric is not displayed in the web UI when a query runs no Spark jobs.

val names = Seq("Jacek", "Agata").toDF("name")

// no numOutputRows metric in web UI
names.show

// The query gives numOutputRows metric in web UI's Details for Query (SQL tab)
scala> names.groupBy(length($"name")).count.show

That must be somewhat generic, and I think it has nothing to do with LocalTableScanExec. Could anyone explain it in more detail? I'd appreciate it.

Regards,
Jacek Laskowski

On Wed, Nov 15, 2017 at 10:14 PM, Jacek Laskowski wrote:
> Hi,
>
> I've been playing with LocalTableScanExec and noticed that it
> defines numOutputRows metric, but I couldn't find it in the diagram in web
> UI's Details for Query in SQL tab. Why?
>
> scala> spark.version
> res1: String = 2.3.0-SNAPSHOT
>
> scala> val hello = udf { s: String => s"Hello $s" }
> hello: org.apache.spark.sql.expressions.UserDefinedFunction =
> UserDefinedFunction(<function1>,StringType,Some(List(StringType)))
>
> scala> Seq("Jacek").toDF("name").select(hello($"name")).show
> +-----------+
> |  UDF(name)|
> +-----------+
> |Hello Jacek|
> +-----------+
>
> http://localhost:4040/SQL/execution/?id=0 shows no metrics for
> LocalTableScan. Is this intended?
[SQL] Why no numOutputRows metric for LocalTableScanExec in webUI?
Hi,

I've been playing with LocalTableScanExec and noticed that it defines a numOutputRows metric, but I couldn't find it in the diagram in the web UI's Details for Query page in the SQL tab. Why?

scala> spark.version
res1: String = 2.3.0-SNAPSHOT

scala> val hello = udf { s: String => s"Hello $s" }
hello: org.apache.spark.sql.expressions.UserDefinedFunction = UserDefinedFunction(<function1>,StringType,Some(List(StringType)))

scala> Seq("Jacek").toDF("name").select(hello($"name")).show
+-----------+
|  UDF(name)|
+-----------+
|Hello Jacek|
+-----------+

http://localhost:4040/SQL/execution/?id=0 shows no metrics for LocalTableScan. Is this intended?

Regards,
Jacek Laskowski