Opening up metrics interfaces

2015-08-27 Thread Atsu Kakitani
Hi,

I was wondering if there are any plans to open up the API for Spark's
metrics system. I want to write custom sources and sinks, but these
interfaces aren't public right now. I saw that there is also an open issue
for this (https://issues.apache.org/jira/browse/SPARK-5630), but it hasn't
been addressed - is there a reason why these interfaces are kept private?
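
For reference, the trait behind metric sources looks roughly like this in
current Spark (note the private[spark] modifier, which is what blocks
third-party implementations), and the QueueDepthSource below is a
hypothetical example of the kind of source I'd like to be able to write:

import com.codahale.metrics.{Gauge, MetricRegistry}

// Spark's trait, roughly (it lives in org.apache.spark.metrics.source):
//   private[spark] trait Source {
//     def sourceName: String
//     def metricRegistry: MetricRegistry
//   }

// Hypothetical custom source; this only compiles if Source is made public.
class QueueDepthSource(queue: java.util.Queue[_])
    extends org.apache.spark.metrics.source.Source {
  override val sourceName: String = "myapp.queue"
  override val metricRegistry: MetricRegistry = new MetricRegistry

  metricRegistry.register(MetricRegistry.name("depth"), new Gauge[Int] {
    override def getValue: Int = queue.size()
  })
}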

Thanks,
Atsu


Re: Opening up metrics interfaces

2015-08-27 Thread Reynold Xin
I'd like this to happen, but it hasn't been a high priority for anybody.

There are a couple of things that would be good to do:

1. At the application level: consolidate task metrics and accumulators.
They have substantial overlap, and at a high level they should just be
consolidated. There may be some differences in semantics w.r.t. retries
or fault tolerance, but those can just be modes in the consolidated
interface/implementation.

Once we do that, users can effectively use the new consolidated interface
to add new metrics.
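
To make that concrete, here is a rough sketch of what a consolidated
interface might look like; every name here is made up, and the retry
semantics become a mode rather than a separate mechanism:

object CountMode extends Enumeration {
  // Whether values from failed/retried task attempts are included.
  val AllAttempts, SuccessfulTasksOnly = Value
}

// Hypothetical consolidated metric/accumulator; nothing here exists in
// Spark today.
trait TaskMetric[T] extends Serializable {
  def name: String
  def mode: CountMode.Value
  def add(value: T): Unit  // called on executors, like an accumulator
  def value: T             // merged result, read on the driver
}

A metric with SuccessfulTasksOnly would behave like today's task metrics,
while AllAttempts would match accumulator semantics, where retried tasks
can count twice.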

2. At the process/service monitoring level: expose an internal metrics
interface to make it easier to create new metrics and publish them via a
REST interface. Last time I looked at this (~4 weeks ago), publication of
the current metrics was not as straightforward as I was hoping. We use
the Codahale library only in some places (IIRC just the cluster manager,
but not the actual executors). It'd make sense to create a simple wrapper
around the Codahale library to make it easier to create new metrics.
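
As a sketch of what that wrapper might look like (the class and method
names below are illustrative, not an existing Spark API):

import com.codahale.metrics.{Counter, MetricRegistry, Timer}

// Hypothetical thin wrapper over the Codahale registry.
class SparkMetrics(namespace: String) {
  private val registry = new MetricRegistry

  def counter(name: String): Counter =
    registry.counter(MetricRegistry.name(namespace, name))

  def timer(name: String): Timer =
    registry.timer(MetricRegistry.name(namespace, name))

  // Exposed so a REST endpoint can serialize it, e.g. with metrics-json.
  def underlying: MetricRegistry = registry
}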



Re: Opening up metrics interfaces

2015-08-27 Thread Thomas Dudziak
+1. I'd love to be able to simply define a timer in my code (maybe via
metrics-scala?) using Spark's metrics registry. Also, maybe switch to the
newer version of the library (io.dropwizard.metrics)?
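
For example, with the plain Dropwizard API the usage I have in mind looks
like the sketch below (assuming Spark exposed its registry; metrics-scala
would add timer.time { ... } sugar on top):

import com.codahale.metrics.MetricRegistry

object TimerExample {
  def main(args: Array[String]): Unit = {
    val registry = new MetricRegistry  // ideally Spark's own registry, if exposed
    val loadTimer = registry.timer(MetricRegistry.name("myapp", "load"))

    val ctx = loadTimer.time()  // start the clock
    try {
      Thread.sleep(100)         // stand-in for the work being measured
    } finally {
      ctx.stop()                // record the elapsed duration
    }
    println(s"timed ${loadTimer.getCount} call(s)")
  }
}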
