What I'd like is a way to capture the information provided on the stages
page (i.e. cluster:4040/stages via IndexPage). Looking through the
Spark code, it doesn't seem like it is possible to directly query for
specific facts such as how many tasks have succeeded or how many total
tasks there are for a given active stage. Instead, it looks like all
the data for the page is generated at once using information from the
JobProgressListener. It doesn't seem like I have any way to
programmatically access this information myself. I can't even
instantiate my own JobProgressListener because it is private to the
spark package. I could implement my own SparkListener and gather up the
information myself, but that feels a bit awkward since classes like
Task and TaskInfo are also private to the spark package. It does seem
possible to gather up what I need, but it seems like this sort of
information should just be available without implementing a custom
SparkListener (or worse, screen-scraping the HTML generated by
StageTable!).

I was hoping that I would find the answer in MetricsServlet, which is
turned on by default. It seems that when I visit
http://cluster:4040/metrics/json/ I should be able to get everything I
want, but I don't see the basic stage/task progress information I would
expect. Are there special metrics properties that I should set to get
this info? I think this would be the best solution - just give it the
right URL and parse the resulting JSON - but I can't figure out how to
do this, or whether it is possible at all.
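
For reference, the "give it the right URL and parse the resulting JSON"
idea might look roughly like the Python sketch below. It is only a
sketch under assumptions: that the servlet returns a JSON object with a
top-level "gauges" map of the form name -> {"value": ...}, and that the
scheduler-related gauge names contain "DAGScheduler". The helper names
fetch_metrics and gauges_matching are made up for illustration. It can
only report whatever the servlet actually exposes, so it does not by
itself provide per-stage task counts.

```python
import json
from urllib.request import urlopen


def fetch_metrics(url, timeout=5.0):
    """Download and decode the JSON document served at `url`."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def gauges_matching(metrics, fragment):
    """Return {name: value} for every gauge whose name contains `fragment`.

    Assumes a top-level "gauges" object mapping each gauge name to a
    small object with a "value" field; a missing "gauges" key yields {}.
    """
    gauges = metrics.get("gauges", {})
    return {
        name: body.get("value")
        for name, body in gauges.items()
        if fragment in name
    }


def print_scheduler_gauges(url):
    """Print every gauge whose name mentions DAGScheduler (assumed naming)."""
    metrics = fetch_metrics(url)
    for name, value in sorted(gauges_matching(metrics, "DAGScheduler").items()):
        print(name, value)
```

Calling print_scheduler_gauges("http://cluster:4040/metrics/json/")
against a running application would list whichever scheduler gauges are
present, which is a quick way to check what the servlet reports before
deciding whether a custom SparkListener is still needed.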

Any advice is appreciated.

Thanks,
Philip

On 04/01/2014 09:43 AM, Philip Ogren wrote:

Hi DB,

Just wondering if you ever got an answer to your question about
monitoring progress - either offline or through your own
investigation. Any findings would be appreciated.

Thanks,
Philip

On 01/30/2014 10:32 PM, DB Tsai wrote:

Hi guys,

When we're running a very long job, we would like to show users the
current progress of the map and reduce stages. After looking at the API
documentation, I couldn't find anything for this. However, in the Spark
UI, I can see the progress of the tasks. Is there anything I'm missing?

Thanks.
Sincerely,
DB Tsai
Machine Learning Engineer
Alpine Data Labs
--------------------------------------
Web: http://alpinenow.com/