Hi All,

What is the best way to instrument metrics for a Spark application from both
the driver and the executors?

I am trying to send my Spark application metrics into Kafka. I found two
approaches.

*Approach 1:* Implement a custom Source and Sink, and use the Source for
instrumenting from both driver and executors (via SparkEnv.get.metricsSystem).

*Approach 2:* Write a dropwizard/gobblin KafkaReporter and use it for
instrumentation from both driver and executors.

Which approach is better?

I tried Approach 1, but when I launch my application all the
containers are killed.

The steps I followed are below:
1. As there is no KafkaSink in org.apache.spark.metrics.sink, I
implemented my own KafkaSink and KafkaReporter as suggested in
https://github.com/erikerlandson/spark-kafka-sink
2. Implemented SparkMetricsSource by extending
org.apache.spark.metrics.source.Source
3. Registered the source:
      val sparkMetricsSource = new SparkMetricsSource("spark.xyz.app.prefix")
      SparkEnv.get.metricsSystem.registerSource(sparkMetricsSource)
4. Instrumented the metrics:
      sparkMetricsSource.registerGauge(spark.sparkContext.applicationId,
        schema, "app-start", System.currentTimeMillis)
5. Configured the sink through Spark metrics properties
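For reference, here is a minimal, self-contained sketch of steps 2-4. All names are hypothetical, and the Gauge/MetricRegistry/Source types below are stand-ins for com.codahale.metrics.Gauge, com.codahale.metrics.MetricRegistry, and org.apache.spark.metrics.source.Source, so the snippet runs without Spark on the classpath; in a real job you would import the real types instead:

```scala
// Stand-in for com.codahale.metrics.Gauge: a read-only metric value.
trait Gauge[T] { def getValue: T }

object MetricRegistry {
  // Dropwizard joins name parts with dots, e.g. "appId.schema.app-start".
  def name(parts: String*): String = parts.mkString(".")
}

// Stand-in for com.codahale.metrics.MetricRegistry.
class MetricRegistry {
  private val gauges = scala.collection.mutable.Map.empty[String, Gauge[_]]
  def register[T](name: String, gauge: Gauge[T]): Gauge[T] = {
    gauges(name) = gauge
    gauge
  }
  def getGauges: Map[String, Gauge[_]] = gauges.toMap
}

// Stand-in for org.apache.spark.metrics.source.Source.
trait Source {
  def sourceName: String
  def metricRegistry: MetricRegistry
}

// Step 2: a custom Source; the prefix becomes part of every metric key.
class SparkMetricsSource(prefix: String) extends Source {
  override val sourceName: String = prefix
  override val metricRegistry: MetricRegistry = new MetricRegistry

  // Step 4 helper: register a constant gauge under a dotted metric name.
  def registerGauge(appId: String, schema: String,
                    metric: String, value: Long): Unit =
    metricRegistry.register(
      MetricRegistry.name(appId, schema, metric),
      new Gauge[Long] { override def getValue: Long = value })
}
```

With the real Spark types, the same class is what you would pass to SparkEnv.get.metricsSystem.registerSource in step 3.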
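And for step 5, a sketch of the metrics configuration. The sink class name follows the spark-kafka-sink repo's package convention, and the broker/topic values are placeholders for your environment:

```
# metrics.properties (or set each key via --conf spark.metrics.conf.*)
*.sink.kafka.class=org.apache.spark.metrics.sink.KafkaSink
*.sink.kafka.broker=kafka-broker:9092
*.sink.kafka.topic=spark-metrics
*.sink.kafka.period=10
*.sink.kafka.unit=seconds
```

The `*.` prefix applies the sink to all instances (driver and executors), which is what Approach 1 needs.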


Thanks & Regards,
B Anil Kumar.
