Hi all,

I am thinking of starting work on a profiler for Spark clusters. The current
idea is that it would collect jstacks from executor nodes and put them into
a central index (either a database or Elasticsearch), and present them in a
UI that lets people slice and dice the jstacks based on what job was running
at the time and which executor node they came from. In addition, the UI
would show time spent on non-computational work, such as shuffling and I/O.
As a future extension, we might support reading from JMX and/or a JVM agent
to get more precise data.
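
For concreteness, here's a rough sketch (assumptions: Scala, a made-up
ThreadDumpSketch/captureThreadDump helper, no Spark integration yet) of how
an executor could take a jstack-style sample in-process via the standard
ThreadMXBean before shipping it off to the central index:

    import java.lang.management.ManagementFactory

    object ThreadDumpSketch {
      // Capture a jstack-style dump of all live threads using JMX,
      // rather than shelling out to the jstack binary.
      def captureThreadDump(): String = {
        val bean = ManagementFactory.getThreadMXBean
        val infos = bean.dumpAllThreads(false, false)
        infos.map { info =>
          val header = "\"" + info.getThreadName + "\" " + info.getThreadState
          val frames = info.getStackTrace.map(f => "\tat " + f).mkString("\n")
          header + "\n" + frames
        }.mkString("\n\n")
      }

      def main(args: Array[String]): Unit = {
        // In the real profiler, each sample would also carry the executor ID,
        // a timestamp, and the job/stage running at the time.
        println(captureThreadDump())
      }
    }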

I know that it's already possible to use YourKit to profile individual
processes, but YourKit costs money, requires a desktop client to be
installed, and doesn't present its data in a context relevant to a Spark
cluster.

Does something like this already exist (or is such a project already in
progress)? Do you have any feedback or recommendations for how to go about
it?

Thanks!
Punya


