Re: Custom Metric Sink on Executor Always ClassNotFound

2018-12-21 Thread Thakrar, Jayesh
Just curious - is this HttpSink your own custom sink or a Dropwizard
configuration?

If it is your own custom code, I would suggest first looking into and trying out
the Dropwizard-based sinks that Spark ships with.
See 
http://spark.apache.org/docs/latest/monitoring.html#metrics
https://metrics.dropwizard.io/4.0.0/
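
For example, those built-in sinks are enabled purely through metrics.properties,
with no custom code on the classpath. A sketch for the built-in Graphite sink
(the host, port, and period values below are placeholders):

*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.graphite.host=graphite.example.com
*.sink.graphite.port=2003
*.sink.graphite.period=10
*.sink.graphite.unit=seconds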

Also, from what I know, the metrics from the tasks/executors are sent as
accumulator values to the driver, and the driver makes them available to the
desired sink.

Furthermore, even without a custom HttpSink, there is already a built-in REST
API that exposes metrics.
See http://spark.apache.org/docs/latest/monitoring.html#rest-api
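
For example, executor-level metrics for a running application can be fetched
from the driver's UI port (4040 by default; host and application id below are
placeholders):

http://<driver-host>:4040/api/v1/applications/<app-id>/executors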

While you can certainly create your own custom sink (code), I would suggest
trying the configuration route first, as it will make Spark upgrades easier.

On 12/20/18, 3:53 PM, "Marcelo Vanzin"  wrote:

First, it's really weird to use "org.apache.spark" for a class that is
not in Spark.

For executors, the jar file of the sink needs to be in the system
classpath; the application jar is not in the system classpath, so that
does not work. There are different ways for you to get it there, most
of them manual (YARN is, I think, the only RM supported in Spark where
the application itself can do it).

On Thu, Dec 20, 2018 at 1:48 PM prosp4300  wrote:
>
> Hi, Spark Users
>
> I'm playing with Spark metric monitoring and want to add a custom sink, an
> HttpSink that sends the metrics through a RESTful API.
> A subclass of Sink, "org.apache.spark.metrics.sink.HttpSink", is created and
> packaged within the application jar.
>
> It works for the driver instance, but once enabled for the executor instance,
> the following ClassNotFoundException is thrown. This seems to be because the
> MetricsSystem is started very early on the executor, before the application
> jar is loaded.
>
> Is there any way or best practice to add a custom sink for executor
> instances?
>
> [...]

Re: Re: Custom Metric Sink on Executor Always ClassNotFound

2018-12-20 Thread prosp4300



Thanks a lot for the explanation.
Spark declares the Sink trait as package-private, which is why the package
looks weird; the metrics system does not seem intended to be extended:

package org.apache.spark.metrics.sink
private[spark] trait Sink
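
For reference, a sink written against that trait ends up looking roughly like
the sketch below. This is a minimal sketch, assuming the Spark 2.x reflection
contract of a (Properties, MetricRegistry, SecurityManager) constructor;
Dropwizard's ConsoleReporter is just a stand-in where a real HttpSink would
plug in a reporter that POSTs metric snapshots to its REST endpoint.

package org.apache.spark.metrics.sink

import java.util.Properties
import java.util.concurrent.TimeUnit

import com.codahale.metrics.{ConsoleReporter, MetricRegistry}

import org.apache.spark.SecurityManager

private[spark] class HttpSink(
    val property: Properties,
    val registry: MetricRegistry,
    securityMgr: SecurityManager) extends Sink {

  // Poll period read from this sink's metrics.properties entries,
  // defaulting to 10 seconds.
  private val pollPeriod =
    Option(property.getProperty("period")).map(_.toInt).getOrElse(10)

  // Stand-in reporter; a real HttpSink would POST registry snapshots
  // to a REST endpoint here instead.
  private val reporter = ConsoleReporter.forRegistry(registry).build()

  override def start(): Unit = reporter.start(pollPeriod, TimeUnit.SECONDS)

  override def stop(): Unit = reporter.stop()

  override def report(): Unit = reporter.report()
}
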
Making the custom sink class available on every executor's system classpath is
exactly what an application developer wants to avoid, because the sink is only
required by a specific application, and a cluster-wide classpath change can be
difficult to maintain.
If it were possible to get hold of the MetricsSystem at the executor level and
register the custom sink there, the problem could be resolved in a better way,
but I am not sure how to achieve this.
Thanks a lot

At 2018-12-21 05:53:31, "Marcelo Vanzin"  wrote:
>First, it's really weird to use "org.apache.spark" for a class that is
>not in Spark.
>
>For executors, the jar file of the sink needs to be in the system
>classpath; the application jar is not in the system classpath, so that
>does not work. There are different ways for you to get it there, most
>of them manual (YARN is, I think, the only RM supported in Spark where
>the application itself can do it).
>
>On Thu, Dec 20, 2018 at 1:48 PM prosp4300  wrote:
>> [...]
>
>
>
>-- 
>Marcelo


Re: Custom Metric Sink on Executor Always ClassNotFound

2018-12-20 Thread Marcelo Vanzin
First, it's really weird to use "org.apache.spark" for a class that is
not in Spark.

For executors, the jar file of the sink needs to be in the system
classpath; the application jar is not in the system classpath, so that
does not work. There are different ways for you to get it there, most
of them manual (YARN is, I think, the only RM supported in Spark where
the application itself can do it).
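
For example, on YARN one common approach is to ship the sink jar with --jars
(which localizes it into each container's working directory) and add its bare
file name to the executor classpath, so it is on the executor JVM's launch
classpath before the MetricsSystem starts. A sketch, with a placeholder jar
name:

spark-submit \
  --master yarn \
  --jars /local/path/custom-metrics-sink.jar \
  --conf spark.executor.extraClassPath=custom-metrics-sink.jar \
  ...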

On Thu, Dec 20, 2018 at 1:48 PM prosp4300  wrote:
>
> Hi, Spark Users
>
> I'm playing with Spark metric monitoring and want to add a custom sink, an
> HttpSink that sends the metrics through a RESTful API.
> A subclass of Sink, "org.apache.spark.metrics.sink.HttpSink", is created and
> packaged within the application jar.
>
> It works for the driver instance, but once enabled for the executor instance,
> the following ClassNotFoundException is thrown. This seems to be because the
> MetricsSystem is started very early on the executor, before the application
> jar is loaded.
>
> Is there any way or best practice to add a custom sink for executor
> instances?
>
> [...]



-- 
Marcelo




Custom Metric Sink on Executor Always ClassNotFound

2018-12-20 Thread prosp4300
Hi, Spark Users

I'm playing with Spark metric monitoring and want to add a custom sink, an
HttpSink that sends the metrics through a RESTful API.
A subclass of Sink, "org.apache.spark.metrics.sink.HttpSink", is created and
packaged within the application jar.

It works for the driver instance, but once enabled for the executor instance,
the following ClassNotFoundException is thrown. This seems to be because the
MetricsSystem is started very early on the executor, before the application
jar is loaded.
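
For context, the sink is enabled through metrics.properties entries along
these lines (the "http" sink name is just what I picked):

driver.sink.http.class=org.apache.spark.metrics.sink.HttpSink
executor.sink.http.class=org.apache.spark.metrics.sink.HttpSink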


Is there any way or best practice to add a custom sink for executor
instances?


18/12/21 04:58:32 ERROR MetricsSystem: Sink class org.apache.spark.metrics.sink.HttpSink cannot be instantiated
18/12/21 04:58:32 WARN UserGroupInformation: PriviledgedActionException as:yarn (auth:SIMPLE) cause:java.lang.ClassNotFoundException: org.apache.spark.metrics.sink.HttpSink
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1933)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:66)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:284)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.metrics.sink.HttpSink
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:230)
    at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:198)
    at org.apache.spark.metrics.MetricsSystem$$anonfun$registerSinks$1.apply(MetricsSystem.scala:194)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at org.apache.spark.metrics.MetricsSystem.registerSinks(MetricsSystem.scala:194)
    at org.apache.spark.metrics.MetricsSystem.start(MetricsSystem.scala:102)
    at org.apache.spark.SparkEnv$.create(SparkEnv.scala:366)
    at org.apache.spark.SparkEnv$.createExecutorEnv(SparkEnv.scala:201)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:223)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:67)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:66)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
    ... 4 more
18/12/21 04:58:00 ERROR org.apache.spark.metrics.MetricsSystem.logError:70 - Sink class org.apache.spark.metrics.sink.HttpSink cannot be instantiated