I’ve been trying to set up monitoring for our Spark 3.0.1 cluster running in K8s. We are using Prometheus as our monitoring system. We require both executor and driver metrics. My initial approach was to use the following configuration, to expose both metrics on the Spark UI:
{
'spark.ui.prometheus.enabled': 'true'
}
I was able to scrape http://<driver_hostname>:4040/metrics/prometheus/ for
driver metrics and http://<driver_hostname>:4040/metrics/executors/prometheus/
for executor metrics. However, the executor endpoint only contains the metrics
defined here, which are referred to as ExecutorSummary:
https://spark.apache.org/docs/latest/monitoring.html#executor-metrics
What I would like to get instead is all metrics from the Executor instance of
the metrics system:
https://spark.apache.org/docs/latest/monitoring.html#component-instance--executor
I am not sure if these are available on the driver at all, so I've been
thinking of scraping the executors directly instead. It seems PrometheusServlet
is meant for this purpose, but the executors aren't running web servers. I
also can't find a configuration setting to open up a port on the executor
container so that it can be scraped. So what I have in mind right now is
writing a custom sink that exports the metrics in the Prometheus format to a
local file, and running a sidecar container with an nginx that serves that
static file. The nginx endpoint can then be scraped by Prometheus. Am I
overcomplicating this? Is there a simpler approach?
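To make the idea concrete, here is a rough sketch of the file-writing half of
that approach. It is written in Python only for illustration; a real sink would
have to be a JVM class plugged into Spark's metrics system. The metric names,
the label, and the output path are all hypothetical placeholders, not actual
Spark output. It renders a snapshot of metrics in the Prometheus text format
and replaces the target file atomically, so the sidecar never serves a
half-written file.

```python
import os
import tempfile


def render_prometheus(metrics, labels):
    """Render a {name: value} snapshot as Prometheus text exposition lines."""
    label_str = ",".join('{}="{}"'.format(k, v) for k, v in sorted(labels.items()))
    lines = []
    for name, value in sorted(metrics.items()):
        # Prometheus metric names may only contain [a-zA-Z0-9_:],
        # so dots and dashes in the source names become underscores.
        safe = "".join(
            c if (c.isascii() and c.isalnum()) or c in "_:" else "_" for c in name
        )
        if label_str:
            lines.append("{}{{{}}} {}".format(safe, label_str, value))
        else:
            lines.append("{} {}".format(safe, value))
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write to a temp file in the same directory, then rename over the target."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as f:
            f.write(text)
        os.replace(tmp_path, path)  # atomic when both paths are on one filesystem
    except BaseException:
        # Clean up the temp file if the write or the rename failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


if __name__ == "__main__":
    # Hypothetical snapshot; a real sink would read these from the metric registry.
    snapshot = {"executor.threadpool.activeTasks": 3, "executor.bytesRead": 1024}
    out_path = os.path.join(tempfile.gettempdir(), "metrics.prom")  # assumed path
    write_atomically(out_path, render_prometheus(snapshot, {"executor_id": "1"}))
```

The sidecar would then only need to serve that one file from a volume shared
with the executor container.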
Thanks,
David Szakallas
