Hi all, I am thinking of starting work on a profiler for Spark clusters. The current idea is that it would collect jstacks from executor nodes and put them into a central index (either a database or Elasticsearch), then present them in a UI that lets people slice and dice the jstacks by which job was running at the time and which executor node it ran on. The UI would also show time spent on non-computational work, such as shuffling and I/O. As a future extension, we might support reading from JMX and/or a JVM agent to get more precise data.
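To make the collection side concrete, here is a minimal sketch of the kind of collector I have in mind. It just grabs an in-process, jstack-style thread dump via the standard ThreadMXBean; the tagging by job and executor, and shipping to the index, are assumptions about the eventual design and aren't shown here:

    import java.lang.management.ManagementFactory

    // Sketch: capture a jstack-like dump from inside the JVM via
    // ThreadMXBean, rather than shelling out to the jstack binary.
    object ThreadDumpCollector {
      def collect(): String = {
        val bean = ManagementFactory.getThreadMXBean
        // The two flags request locked-monitor and locked-synchronizer
        // info. Note ThreadInfo.toString truncates deep stacks, so a
        // real implementation would format the StackTraceElements
        // itself before indexing.
        bean.dumpAllThreads(true, true).map(_.toString).mkString("\n")
      }

      def main(args: Array[String]): Unit =
        println(collect())
    }

A small agent like this could run on each executor on a timer and post dumps to the central index, avoiding any need to SSH into nodes or run jstack externally.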
I know it's already possible to use YourKit to profile individual processes, but YourKit costs money, requires a desktop client to be installed, and doesn't present its data in a Spark-cluster context (per job, per executor). Does something like this already exist, or is such a project already in progress? Do you have any feedback or recommendations on how to go about it? Thanks!

Punya