Hi all

I wanted to share some details about something I've been working on with
the folks on the ADAM project: performance instrumentation for Spark jobs.

We've added a module to the bdg-utils project (
https://github.com/bigdatagenomics/bdg-utils) that lets Spark users
instrument RDD operations and their associated function calls, as well as
arbitrary Scala function calls.

See the "Instrumentation" section in the documentation for more details.
There's some reasonably detailed documentation there.
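
To give a flavour of the API, here's a rough sketch of what usage can look
like. I'm writing this from memory, so the package name
(org.bdgenomics.utils.instrumentation), Metrics.initialize / Metrics.print,
and the timer() factory should be checked against the README; the timer
names and file path below are purely illustrative:

import java.io.PrintWriter
import org.apache.spark.{ SparkConf, SparkContext }
// Package name assumed from the docs; see the README for the exact import
import org.bdgenomics.utils.instrumentation.Metrics

// Timers for arbitrary Scala code are declared in an object extending Metrics
object Timers extends Metrics {
  val LoadReads = timer("Load Reads")
}

object MetricsExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("metrics-example"))

    // Hook the metrics machinery into the SparkContext before running any jobs
    Metrics.initialize(sc)

    // Wrap any block of code in a timer to record how long it takes
    val reads = Timers.LoadReads.time {
      sc.textFile("reads.txt")
    }
    reads.filter(_.nonEmpty).count()

    // Print the collected timings; the second argument is for Spark stage
    // timings, which I'm leaving out of this sketch
    Metrics.print(new PrintWriter(System.out), None)
  }
}

The README also shows how to instrument the RDD operations themselves, so
that the functions you pass to map, filter, etc. are timed; I've left that
out here since it's best to follow the exact example in the docs.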

To get started, just add the bdg-utils-metrics artifact to your project.
For example, in Maven:

<dependency>
  <groupId>org.bdgenomics.bdg-utils</groupId>
  <artifactId>bdg-utils-metrics</artifactId>
  <version>0.1.2</version>
</dependency>
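
If you're using sbt rather than Maven, the same coordinates should translate
to something like the following (assuming the artifact is published without a
Scala-version suffix, as the artifactId above suggests):

libraryDependencies += "org.bdgenomics.bdg-utils" % "bdg-utils-metrics" % "0.1.2"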

Hope this is useful, and let me know if you have any questions.

Neil