Opening up metrics interfaces
Hi,

I was wondering if there are any plans to open up the API for Spark's metrics system. I want to write custom sources and sinks, but these interfaces aren't public right now. I saw that there is also an issue open for this (https://issues.apache.org/jira/browse/SPARK-5630), but it hasn't been addressed. Is there a reason why these interfaces are kept private?

Thanks,
Atsu
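For context, the interfaces in question are quite small. A rough sketch of the Source and Sink traits as they live under org.apache.spark.metrics (reconstructed, not copied verbatim; the private[spark] modifiers are exactly the restriction being discussed):

    import com.codahale.metrics.MetricRegistry

    // A source exposes a named Codahale metric registry to the metrics system.
    private[spark] trait Source {
      def sourceName: String
      def metricRegistry: MetricRegistry
    }

    // A sink is a reporting destination that the metrics system starts,
    // stops, and polls for reports.
    private[spark] trait Sink {
      def start(): Unit
      def stop(): Unit
      def report(): Unit
    }

Being private[spark], neither trait can be implemented outside the org.apache.spark package without workarounds such as placing user code inside that package.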
Re: Opening up metrics interfaces
I'd like this to happen, but it hasn't been a high priority for anyone. There are a couple of things that would be good to do:

1. At the application level: consolidate task metrics and accumulators. They have substantial overlap and, at a high level, should simply be consolidated. There may be some differences in semantics w.r.t. retries or fault tolerance, but those can just be modes in the consolidated interface/implementation. Once that is done, users can effectively use the new consolidated interface to add new metrics.

2. At the process/service monitoring level: expose an internal metrics interface that makes it easier to create new metrics and publish them via a REST interface. Last time I looked at this (~4 weeks ago), publication of the current metrics was not as straightforward as I had hoped. We use the Codahale library only in some places (IIRC just the cluster manager, but not the actual executors). It would make sense to create a simple wrapper around the Codahale library that makes it easier to create new metrics; a sketch of what that could look like follows below.

On Thu, Aug 27, 2015 at 12:21 PM, Atsu Kakitani atkakit...@groupon.com wrote:
> I was wondering if there are any plans to open up the API for Spark's metrics system. [...]
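A minimal sketch of such a wrapper, assuming only a plain com.codahale.metrics.MetricRegistry (all names here are hypothetical, not Spark API):

    import com.codahale.metrics.{Gauge, MetricRegistry}

    // Hypothetical thin facade over the Codahale registry, so call sites can
    // create namespaced metrics without touching the library types directly.
    class SimpleMetrics(registry: MetricRegistry, prefix: String) {
      private def full(name: String) = MetricRegistry.name(prefix, name)

      def counter(name: String) = registry.counter(full(name))
      def timer(name: String) = registry.timer(full(name))
      def gauge[T](name: String)(value: => T): Unit =
        registry.register(full(name), new Gauge[T] {
          override def getValue: T = value
        })
    }

    object SimpleMetricsExample {
      def main(args: Array[String]): Unit = {
        val m = new SimpleMetrics(new MetricRegistry(), "executor")
        m.counter("tasks.completed").inc()
        val ctx = m.timer("task.run").time()
        try Thread.sleep(10) finally ctx.stop()
        m.gauge("heap.used")(
          Runtime.getRuntime.totalMemory - Runtime.getRuntime.freeMemory)
      }
    }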
Re: Opening up metrics interfaces
+1. I'd love to be able to simply define a timer in my own code (maybe via metrics-scala?) against Spark's metrics registry. Also, maybe switch to the newer version of the library (io.dropwizard.metrics)?

On Thu, Aug 27, 2015 at 4:42 PM, Reynold Xin r...@databricks.com wrote:
> I'd like this to happen, but it hasn't been a high priority for anyone. [...]
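For illustration, this is roughly the usage being asked for, written against the plain Dropwizard/Codahale API (metrics-scala is a thin Scala wrapper over the same classes; note that even the newer io.dropwizard.metrics artifacts keep the com.codahale.metrics package name). The standalone registry and metric names here are placeholders; the request in this thread is to reach Spark's own registry instead:

    import java.util.concurrent.TimeUnit
    import com.codahale.metrics.{ConsoleReporter, MetricRegistry}

    object TimerExample {
      def main(args: Array[String]): Unit = {
        // Placeholder standalone registry; ideally this would be Spark's.
        val registry = new MetricRegistry()
        val timer = registry.timer(MetricRegistry.name("myapp", "stage.process"))

        // Time a unit of work.
        val ctx = timer.time()
        try Thread.sleep(25) finally ctx.stop()

        // Publish once to stdout; a real deployment would report via a sink.
        val reporter = ConsoleReporter.forRegistry(registry)
          .convertDurationsTo(TimeUnit.MILLISECONDS)
          .build()
        reporter.report()
      }
    }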