What I'd like is a way to capture the information shown on the stages page (i.e. cluster:4040/stages, rendered by IndexPage). Looking through the Spark code, it doesn't seem possible to query directly for specific facts, such as how many tasks have succeeded or how many tasks there are in total for a given active stage. Instead, it looks like all the data for the page is generated at once from information held by the JobProgressListener, and I don't seem to have any way to access that information programmatically. I can't even instantiate my own JobProgressListener because it is private to the spark package. I could implement my own SparkListener and gather the information myself, but that feels a bit awkward since classes like Task and TaskInfo are also private to the spark package. It does seem possible to collect what I need this way, but it seems like this sort of information should just be available without implementing a custom SparkListener (or worse, screen-scraping the HTML generated by StageTable!).

I was hoping to find the answer in MetricsServlet, which is turned on by default. It seems that visiting http://cluster:4040/metrics/json/ should give me everything I want, but I don't see the basic stage/task progress information I would expect. Are there special metrics properties that I should set to get this info? I think this would be the best solution (just give it the right URL and parse the resulting JSON), but I can't figure out how to do this, or whether it is possible at all.
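
To make that concrete, here is a minimal sketch (in Python, purely for illustration) of the client side I have in mind. It assumes the servlet returns a JSON object with a top-level "gauges" map whose entries each carry a "value" field; that layout is my assumption, not something I have confirmed against the Spark source. The gauge names in the sample payload are made up, and fetch_metrics and gauges_matching are hypothetical helpers of my own, not part of any Spark API.

```python
import json
from urllib.request import urlopen


def fetch_metrics(base_url, timeout=5.0):
    """Fetch and decode the JSON served under <base_url>/metrics/json/."""
    url = base_url.rstrip("/") + "/metrics/json/"
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def gauges_matching(metrics, fragment):
    """Return {gauge name: value} for gauges whose name contains `fragment`.

    Assumes `metrics` looks like {"gauges": {name: {"value": ...}}}.
    """
    gauges = metrics.get("gauges", {})
    return {
        name: entry.get("value")
        for name, entry in gauges.items()
        if fragment in name
    }


# Illustrative payload only: these gauge names are assumptions, not real output.
sample = {
    "gauges": {
        "app.driver.DAGScheduler.stage.runningStages": {"value": 1},
        "app.driver.DAGScheduler.job.activeJobs": {"value": 1},
        "app.driver.BlockManager.memory.memUsed_MB": {"value": 0},
    }
}

print(gauges_matching(sample, "DAGScheduler"))
```

Against a live driver this would be something like gauges_matching(fetch_metrics("http://cluster:4040"), "DAGScheduler"). The filtering is the easy part; what I can't tell is whether any metrics configuration makes per-stage task counts show up in that JSON in the first place.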

Any advice is appreciated.


On 04/01/2014 09:43 AM, Philip Ogren wrote:
Hi DB,

Just wondering if you ever got an answer to your question about monitoring progress - either offline or through your own investigation. Any findings would be appreciated.


On 01/30/2014 10:32 PM, DB Tsai wrote:
Hi guys,

When we're running a very long job, we would like to show users the current progress of the map and reduce jobs. After looking at the API documentation, I didn't find anything for this. However, in the Spark UI, I can see the progress of the tasks. Is there anything I'm missing?



DB Tsai
Machine Learning Engineer
Alpine Data Labs
Web: http://alpinenow.com/
