Hi all,

What is the best way to instrument metrics for a Spark application from both the driver and the executors?
I am trying to send my Spark application metrics to Kafka. I found two approaches.

*Approach 1:* Implement a custom Source and Sink, and use the Source to instrument from both the driver and the executors (via SparkEnv.get.metricsSystem).

*Approach 2:* Write a dropwizard/gobblin KafkaReporter and use it for instrumentation from both the driver and the executors.

Which would be the better approach?

I tried Approach 1, but when I launch my application all the containers get killed. The steps I took are as follows:

1. As there is no KafkaSink under org.apache.spark.metrics.sink, I implemented my own KafkaSink and KafkaReporter, as suggested in https://github.com/erikerlandson/spark-kafka-sink
2. Implemented SparkMetricsSource by extending org.apache.spark.metrics.source.Source
3. Registered the source:

   val sparkMetricsSource = new SparkMetricsSource("spark.xyz.app.prefix")
   SparkEnv.get.metricsSystem.registerSource(sparkMetricsSource)

4. Instrumented the metrics:

   sparkMetricsSource.registerGauge(spark.sparkContext.applicationId, schema, "app-start", System.currentTimeMillis)

5. Configured the sink through Spark properties.

Thanks & Regards,
B Anil Kumar.
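P.S. For reference, steps 2-4 can be sketched roughly as below. This is only an illustrative sketch: the class name, prefix, and the registerGauge helper signature are my own simplifications, not a confirmed implementation. Note also that the Source trait is package-private (private[spark]) in some Spark versions, so the class may need to live under an org.apache.spark package to compile.

```scala
import com.codahale.metrics.{Gauge, MetricRegistry}
import org.apache.spark.SparkEnv
import org.apache.spark.metrics.source.Source

// Step 2: a custom source backed by a dropwizard MetricRegistry.
class SparkMetricsSource(prefix: String) extends Source {
  override val sourceName: String = prefix
  override val metricRegistry: MetricRegistry = new MetricRegistry

  // Illustrative helper: register a constant-valued gauge under `name`.
  def registerGauge(name: String, value: Long): Unit =
    metricRegistry.register(MetricRegistry.name(name), new Gauge[Long] {
      override def getValue: Long = value
    })
}

// Steps 3-4: register the source with the running metrics system,
// then record an application-start timestamp as a gauge.
val sparkMetricsSource = new SparkMetricsSource("spark.xyz.app.prefix")
SparkEnv.get.metricsSystem.registerSource(sparkMetricsSource)
sparkMetricsSource.registerGauge("app-start", System.currentTimeMillis)
```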
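The sink wiring from step 5 would look something like the following, either in metrics.properties or as spark.metrics.conf.* properties. The property names below follow the spark-kafka-sink example linked above and are assumptions, not verified against it; the broker and topic values are placeholders. One thing to check when containers die at launch: the sink class and its Kafka client dependencies must be on the classpath of every container (driver and executors), since each JVM instantiates the sink from this config at startup.

```
# Hypothetical sink configuration for step 5 (placeholder values).
*.sink.kafka.class=org.apache.spark.metrics.sink.KafkaSink
*.sink.kafka.broker=localhost:9092
*.sink.kafka.topic=spark-metrics
*.sink.kafka.period=10
*.sink.kafka.unit=seconds
```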
