I can appreciate the reluctance to expose something like the
JobProgressListener as a public interface. It's exactly the sort of
thing that you want to deprecate as soon as something better comes
along, and it can be a real pain when trying to maintain the level of
backwards compatibility that we all expect from commercial-grade
software. Instead of simply marking it private, and therefore
unavailable to Spark developers, it might be worth incorporating
something like a @Beta annotation
<http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/annotations/Beta.html>
which you could sprinkle liberally throughout Spark to communicate
"hey, use this if you want to, 'cause it's here now" and "don't come
crying if we rip it out or change it later." This might be better than
marking so many useful functions and classes as private. I bet such an
annotation could also generate a compile warning or error for those who
don't want to risk using them.
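A minimal sketch of what a Guava-style @Beta marker might look like in
Scala. The Beta annotation and the StageProgressView class it decorates
are hypothetical, for illustration only. Note that a plain
StaticAnnotation like this only documents intent; getting the compiler
to warn about uses of it would take extra tooling, such as a linter or
compiler plugin.

```scala
import scala.annotation.StaticAnnotation

/** Hypothetical marker for APIs that are public but not yet stable.
  * Like Guava's @Beta, it only documents intent: by itself it does not
  * make the compiler emit any warning.
  */
final class Beta(reason: String = "") extends StaticAnnotation

/** Hypothetical read-only view of one stage's progress, exposed to users
  * but flagged as subject to change.
  */
@Beta("API may change or be removed in a future release")
class StageProgressView(val stageId: Int, val completedTasks: Int, val totalTasks: Int) {
  // Fraction of tasks completed; 0.0 when the stage has no tasks yet.
  def fractionDone: Double =
    if (totalTasks == 0) 0.0 else completedTasks.toDouble / totalTasks
}
```

Callers could use StageProgressView normally; the annotation just tells
them, in the docs and to any tool that inspects it, that the class may
change.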
On 04/02/2014 06:40 PM, Patrick Wendell wrote:
Hey Philip,
Right now there is no mechanism for this. You have to go in through
the low level listener interface.
We could consider exposing the JobProgressListener directly. I think
it's been factored nicely, so it's fairly decoupled from the UI. The
concern is that this is a semi-internal piece of functionality, and
something whose API we might, e.g., want to change over time.
- Patrick
On Wed, Apr 2, 2014 at 3:39 PM, Philip Ogren <philip.og...@oracle.com>
wrote:
What I'd like is a way to capture the information provided on the
stages page (i.e., cluster:4040/stages, via IndexPage). Looking
through the Spark code, it doesn't seem possible to directly query
for specific facts such as how many tasks have succeeded or how many
total tasks there are for a given active stage. Instead, it looks like
all the data for the page is generated at once using information from
the JobProgressListener. I don't seem to have any way to access this
information programmatically myself. I can't even instantiate my own
JobProgressListener, because it is Spark package-private. I could
implement my own SparkListener and gather up the information myself,
but that feels a bit awkward, since classes like Task and TaskInfo are
also Spark package-private. It does seem possible to gather up what I
need, but it seems like this sort of information should just be
available without implementing a custom SparkListener (or, worse,
screen-scraping the HTML generated by StageTable!).
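A rough sketch of the custom-listener approach, written in plain Scala
so it compiles without Spark on the classpath. Spark's SparkListener
trait has callbacks such as onStageSubmitted and onTaskEnd, but the
fields on its event classes have changed between releases, so the event
types here (StageSubmitted, TaskEnded, StageCompleted) and the
ProgressTracker class are hypothetical stand-ins; a real listener would
translate Spark's events into these calls. The tracker keeps succeeded
and failed counts per active stage, roughly the figures the stages page
shows.

```scala
import scala.collection.mutable

// Hypothetical, simplified events standing in for Spark's listener events.
sealed trait ProgressEvent
final case class StageSubmitted(stageId: Int, numTasks: Int) extends ProgressEvent
final case class TaskEnded(stageId: Int, succeeded: Boolean) extends ProgressEvent
final case class StageCompleted(stageId: Int) extends ProgressEvent

/** Per-stage counters (hypothetical), similar in spirit to the stages page. */
final case class StageProgress(totalTasks: Int, succeeded: Int = 0, failed: Int = 0) {
  def finished: Int = succeeded + failed
}

/** Accumulates task counts for active stages from a stream of events.
  * Listener callbacks and readers may run on different threads, so all
  * access to the map is synchronized.
  */
class ProgressTracker {
  private val active = mutable.Map.empty[Int, StageProgress]

  def onEvent(event: ProgressEvent): Unit = synchronized {
    event match {
      case StageSubmitted(id, n) =>
        active(id) = StageProgress(totalTasks = n)
      case TaskEnded(id, ok) =>
        // Ignore task events for stages we never saw submitted.
        active.get(id).foreach { p =>
          active(id) =
            if (ok) p.copy(succeeded = p.succeeded + 1)
            else p.copy(failed = p.failed + 1)
        }
      case StageCompleted(id) =>
        // Drop finished stages so the snapshot only lists active ones.
        active -= id
    }
  }

  /** Immutable copy of the active stages: stageId -> progress. */
  def snapshot: Map[Int, StageProgress] = synchronized { active.toMap }
}
```

A UI thread could poll snapshot periodically and display, for example,
succeeded / totalTasks for each stage. Copying into an immutable map
under the lock keeps readers from seeing a half-updated state.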
I was hoping to find the answer in MetricsServlet, which is turned on
by default. It seems that visiting http://cluster:4040/metrics/json/
should give me everything I want, but I don't see the basic stage/task
progress information I would expect. Are there special metrics
properties I should set to get this info? I think this would be the
best solution (just hit the right URL and parse the resulting JSON),
but I can't figure out how to do this, or whether it is possible.
Any advice is appreciated.
Thanks,
Philip
On 04/01/2014 09:43 AM, Philip Ogren wrote:
Hi DB,
Just wondering if you ever got an answer to your question
about monitoring progress, either offline or through your own
investigation. Any findings would be appreciated.
Thanks,
Philip
On 01/30/2014 10:32 PM, DB Tsai wrote:
Hi guys,
When we're running a very long job, we would like to show
users the current progress of the map and reduce jobs. After
looking at the API documentation, I couldn't find anything for
this. However, in the Spark UI, I can see the progress of
the tasks. Is there anything I'm missing?
Thanks.
Sincerely,
DB Tsai
Machine Learning Engineer
Alpine Data Labs
--------------------------------------
Web: http://alpinenow.com/