Is there a way to customize which metric sources are routed to which sinks?

If I understood the docs <https://spark.apache.org/docs/latest/monitoring.html> 
correctly, there are only global switches for enabling sources, e.g. 
spark.metrics.staticSources.enabled and 
spark.metrics.executorMetricsSource.enabled.
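
For concreteness, this is what I believe those switches look like in 
spark-defaults.conf; they toggle sources for the whole application, with no 
notion of which sink the sources feed:

```properties
# Global source toggles (sketch, based on my reading of the monitoring docs):
# they enable/disable a source everywhere, not per sink.
spark.metrics.staticSources.enabled=true
spark.metrics.executorMetricsSource.enabled=true
```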

We would like to specify Source -> Sink routing on a per-namespace basis. The 
use case is the following: we would like to have Prometheus monitoring for our 
Spark jobs. The large majority of metrics are exposed through the experimental 
Prometheus endpoint for direct scraping. However, we would like to expose a 
select set of metrics through a push gateway, as we want to guarantee that 
these metrics are scraped; for example, a counter for the number of rows 
written to each inserted table. These are reported mostly at the end of a 
batch ingestion job, so a push model is a better fit. We created a dedicated 
DropWizard MetricRegistry for these custom metrics and are using 
https://github.com/banzaicloud/spark-metrics for pushing them to the PGW. 
However, pushing all the metrics to the gateway overloads it, and duplicating 
them there is unnecessary.
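
To illustrate, our sink configuration in metrics.properties looks roughly like 
the following (the gateway address is a placeholder). As far as I can tell, 
sink entries are keyed only by instance (driver, executor, *), so every source 
enabled for an instance reports to every configured sink:

```properties
# Sketch of our current metrics.properties (instance-level only, no
# source -> sink routing): the banzaicloud PrometheusSink pushes
# everything from all instances to the push gateway.
*.sink.prometheus.class=com.banzaicloud.spark.metrics.sink.PrometheusSink
*.sink.prometheus.pushgateway-address=pgw.example.com:9091
*.sink.prometheus.pushgateway-address-protocol=http
```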

Ideally there would be a way to route the batch-like metrics to the push 
gateway sink and have the rest of the gauges exposed through the normal 
Prometheus sink.

Is this something that can currently be solved with configuration, or does it 
require custom code on the plugin side?

Thanks,
David Szakallas


