Re: Custom Metric Sink on Executor Always ClassNotFound
Just curious - is this HttpSink your own custom sink or a Dropwizard configuration? If it is your own custom code, I would suggest looking at / trying out Dropwizard first. See:

http://spark.apache.org/docs/latest/monitoring.html#metrics
https://metrics.dropwizard.io/4.0.0/

Also, from what I know, the metrics from the tasks/executors are sent as accumulator values to the driver, and the driver makes them available to the desired sink.

Furthermore, even without a custom HttpSink, there is already a built-in REST API that provides metrics. See:

http://spark.apache.org/docs/latest/monitoring.html#rest-api

While you can surely create your own custom sink (code), I would say try out custom configuration first, as it will make Spark upgrades easier.

On 12/20/18, 3:53 PM, "Marcelo Vanzin" wrote:

First, it's really weird to use "org.apache.spark" for a class that is not in Spark.

For executors, the jar file of the sink needs to be in the system classpath; the application jar is not in the system classpath, so that does not work. There are different ways for you to get it there, most of them manual (YARN is, I think, the only RM supported in Spark where the application itself can do it).

On Thu, Dec 20, 2018 at 1:48 PM prosp4300 wrote:
>
> Hi, Spark Users
>
> I'm playing with Spark metric monitoring and want to add a custom sink, an HttpSink that sends metrics through a RESTful API.
> A subclass of Sink, "org.apache.spark.metrics.sink.HttpSink", is created and packaged within the application jar.
>
> It works for the driver instance, but once enabled for the executor instance, the following ClassNotFoundException is thrown. This seems to be because the MetricsSystem is started very early for the executor, before the application jar is loaded.
>
> I wonder, is there any way or best practice to add a custom sink for the executor instance?
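As an illustration of the configuration-only route, a minimal metrics.properties can enable one of the built-in sinks with no custom code. This is a sketch following the monitoring docs; the Graphite host/port values are placeholders:

```properties
# metrics.properties - built-in sinks only, no custom classes
# (host/port are placeholders)

# Send metrics from all instances to a Graphite server
*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds

# Expose JVM metrics from driver and executors as a source
driver.source.jvm.class=org.apache.spark.metrics.source.JvmSource
executor.source.jvm.class=org.apache.spark.metrics.source.JvmSource
```

Spark picks this file up from $SPARK_HOME/conf/metrics.properties by default, or from the path given in the spark.metrics.conf property.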
> > [stack trace snipped]
Re: Re: Custom Metric Sink on Executor Always ClassNotFound
Thanks a lot for the explanation.

Spark declares the Sink trait as package-private, which is why the package name looks weird; the metrics system does not seem intended to be extended:

package org.apache.spark.metrics.sink
private[spark] trait Sink

Making the custom sink class available on every executor's system classpath is what an application developer wants to avoid, because the sink is only required for a specific application, and it can be difficult to maintain. If it were possible to get the MetricsSystem at the executor level and register the custom sink there, the problem could be resolved in a better way; I'm not sure how to achieve this.

Thanks a lot

At 2018-12-21 05:53:31, "Marcelo Vanzin" wrote:
>First, it's really weird to use "org.apache.spark" for a class that is
>not in Spark.
>
>For executors, the jar file of the sink needs to be in the system
>classpath; the application jar is not in the system classpath, so that
>does not work. There are different ways for you to get it there, most
>of them manual (YARN is, I think, the only RM supported in Spark where
>the application itself can do it).
>
>On Thu, Dec 20, 2018 at 1:48 PM prosp4300 wrote:
>>
>> Hi, Spark Users
>>
>> I'm playing with Spark metric monitoring and want to add a custom sink, an
>> HttpSink that sends metrics through a RESTful API.
>> A subclass of Sink, "org.apache.spark.metrics.sink.HttpSink", is created and
>> packaged within the application jar.
>>
>> It works for the driver instance, but once enabled for the executor instance,
>> the following ClassNotFoundException is thrown. This seems to be because the
>> MetricsSystem is started very early for the executor, before the application
>> jar is loaded.
>>
>> I wonder, is there any way or best practice to add a custom sink for the
>> executor instance?
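For reference, this is roughly the shape such a sink subclass has to take: because the Sink trait is private[spark], the class must sit in the org.apache.spark.metrics.sink package and expose the three-argument constructor that MetricsSystem instantiates reflectively. The HTTP-posting parts below are a hypothetical sketch (the "url" property name is made up), and it only compiles against spark-core:

```scala
package org.apache.spark.metrics.sink

import java.util.Properties
import com.codahale.metrics.MetricRegistry
import org.apache.spark.SecurityManager

// Hypothetical sketch of a custom sink. MetricsSystem looks the class up by
// name and invokes this (Properties, MetricRegistry, SecurityManager)
// constructor via reflection.
private[spark] class HttpSink(
    val property: Properties,
    val registry: MetricRegistry,
    securityMgr: SecurityManager) extends Sink {

  // Endpoint would come from metrics.properties, e.g.
  //   *.sink.http.url=http://collector.example.com/metrics   (placeholder)
  private val url = property.getProperty("url")

  override def start(): Unit = {
    // e.g. schedule a reporter that serializes `registry` and POSTs to `url`
  }

  override def stop(): Unit = {
    // cancel the scheduled reporter
  }

  override def report(): Unit = {
    // one-shot flush, invoked on shutdown
  }
}
```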
>> [stack trace snipped]
>
>--
>Marcelo
Re: Custom Metric Sink on Executor Always ClassNotFound
First, it's really weird to use "org.apache.spark" for a class that is not in Spark.

For executors, the jar file of the sink needs to be in the system classpath; the application jar is not in the system classpath, so that does not work. There are different ways for you to get it there, most of them manual (YARN is, I think, the only RM supported in Spark where the application itself can do it).

On Thu, Dec 20, 2018 at 1:48 PM prosp4300 wrote:
>
> Hi, Spark Users
>
> I'm playing with Spark metric monitoring and want to add a custom sink, an
> HttpSink that sends metrics through a RESTful API.
> A subclass of Sink, "org.apache.spark.metrics.sink.HttpSink", is created and
> packaged within the application jar.
>
> It works for the driver instance, but once enabled for the executor instance,
> the following ClassNotFoundException is thrown. This seems to be because the
> MetricsSystem is started very early for the executor, before the application
> jar is loaded.
>
> I wonder, is there any way or best practice to add a custom sink for the
> executor instance?
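On YARN specifically, one manual way to get the sink jar onto the executors' system classpath is to ship it with the application and point spark.executor.extraClassPath at the localized file name. This is an untested sketch; the jar and class names are placeholders:

```shell
# --jars ships http-sink.jar into each YARN container's working directory;
# extraClassPath then adds it to the executor JVM's *system* classpath,
# so it is visible before the application jar is loaded.
spark-submit \
  --master yarn \
  --jars http-sink.jar \
  --files metrics.properties \
  --conf spark.metrics.conf=metrics.properties \
  --conf spark.executor.extraClassPath=http-sink.jar \
  --class com.example.MyApp \
  my-app.jar
```

Outside YARN, the jar has to be pre-staged on every worker node and extraClassPath set to its absolute path there.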
> [stack trace snipped]

--
Marcelo
Custom Metric Sink on Executor Always ClassNotFound
Hi, Spark Users

I'm playing with Spark metric monitoring and want to add a custom sink, an HttpSink that sends metrics through a RESTful API. A subclass of Sink, "org.apache.spark.metrics.sink.HttpSink", is created and packaged within the application jar.

It works for the driver instance, but once enabled for the executor instance, the following ClassNotFoundException is thrown. This seems to be because the MetricsSystem is started very early for the executor, before the application jar is loaded.

I wonder, is there any way or best practice to add a custom sink for the executor instance?

18/12/21 04:58:32 ERROR MetricsSystem: Sink class org.apache.spark.metrics.sink.HttpSink cannot be instantiated
18/12/21 04:58:32 WARN UserGroupInformation: PriviledgedActionException as:yarn (auth:SIMPLE) cause:java.lang.ClassNotFoundException: org.apache.spark.metrics.sink.HttpSink
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1933)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:284)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.metrics.sink.HttpSink
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:230)
    at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:198)
    at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:194)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:194)
    at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:102)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:366)
    at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:201)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:223)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    ... 4 more