I can appreciate the reluctance to expose something like the JobProgressListener as a public interface. It's exactly the sort of thing you want to deprecate as soon as something better comes along, and it can be a real pain when trying to maintain the level of backwards compatibility that we all expect from commercial-grade software. Instead of simply marking it private, and therefore unavailable to Spark developers, it might be worth introducing something like a @Beta annotation <http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/annotations/Beta.html> that you could sprinkle liberally throughout Spark to communicate "hey, use this if you want to, because it's here now" and "don't come crying if we rip it out or change it later." This might be better than marking so many useful functions/classes as private. I bet such an annotation could even be made to generate a compile warning/error for those who don't want to risk using them.
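
Scala and Java annotations are checked at compile time, which a short standalone sketch can't demonstrate, so here is a rough runtime analog of the @Beta idea in Python. The `unstable_api` decorator and `UnstableApiWarning` class are hypothetical names, not part of Spark or Guava. Callers get a warning by default and can escalate it to an error if they don't want to depend on unstable APIs.

```python
import functools
import warnings


class UnstableApiWarning(FutureWarning):
    """Hypothetical warning category for APIs that may change or disappear."""


def unstable_api(func):
    """Hypothetical @Beta-style marker: every call emits an UnstableApiWarning."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{func.__qualname__} is an unstable API and may change or be removed",
            UnstableApiWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return func(*args, **kwargs)

    wrapper.__unstable_api__ = True  # lets tools detect the marker
    return wrapper


@unstable_api
def stage_fraction_done(completed, total):
    """Example of a 'semi-internal' helper exposed with a warning."""
    return completed / total if total else 0.0
```

A caller who wants the "error" behavior can opt in with `warnings.simplefilter("error", UnstableApiWarning)`; everyone else just sees a warning.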

On 04/02/2014 06:40 PM, Patrick Wendell wrote:
Hey Philip,

Right now there is no mechanism for this. You have to go in through the low level listener interface.

We could consider exposing the JobProgressListener directly - I think it's been factored nicely, so it's fairly decoupled from the UI. The concern is that this is a semi-internal piece of functionality whose API we might, e.g., want to change over time.
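
Patrick's point that the UI is just one consumer of listener events can be illustrated with a minimal observer-pattern sketch. This is a hypothetical Python model of the idea, not Spark's API. In Spark the corresponding interface is `SparkListener` (with callbacks such as `onStageSubmitted` and `onTaskEnd`), registered via `SparkContext.addSparkListener`; every class and method name below is made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSubmitted:
    stage_id: int
    num_tasks: int


@dataclass(frozen=True)
class TaskEnd:
    stage_id: int
    succeeded: bool


class ProgressListener:
    """Hypothetical listener interface: callbacks default to no-ops, so a
    subclass overrides only the events it cares about."""

    def on_stage_submitted(self, event: StageSubmitted) -> None:
        pass

    def on_task_end(self, event: TaskEnd) -> None:
        pass


class ListenerBus:
    """Hypothetical event bus: the scheduler posts events, and every
    registered listener receives each one in registration order."""

    def __init__(self):
        self._listeners = []

    def add_listener(self, listener: ProgressListener) -> None:
        self._listeners.append(listener)

    def post(self, event) -> None:
        # Dispatch each event to the matching callback on every listener.
        for listener in self._listeners:
            if isinstance(event, StageSubmitted):
                listener.on_stage_submitted(event)
            elif isinstance(event, TaskEnd):
                listener.on_task_end(event)
```

Because the UI would be just another registered listener, a progress-tracking listener can live entirely outside the UI code, which is the decoupling Patrick describes.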

- Patrick


On Wed, Apr 2, 2014 at 3:39 PM, Philip Ogren <philip.og...@oracle.com> wrote:

    What I'd like is a way to capture the information provided on the
    stages page (i.e. cluster:4040/stages via IndexPage).  Looking
    through the Spark code, it doesn't seem possible to directly
    query for specific facts such as how many tasks have succeeded
    or how many total tasks there are for a given active stage.
    Instead, it looks like all the data for the page is generated at
    once using information from the JobProgressListener.  I don't
    seem to have any way to access this information programmatically
    myself.  I can't even instantiate my own JobProgressListener
    because it is spark package private.  I could implement my own
    SparkListener and gather up the information myself, but it feels
    a bit awkward since classes like Task and TaskInfo are also spark
    package private.  It does seem possible to gather up what I need,
    but it seems like this sort of information should just be
    available without implementing a custom SparkListener (or, worse,
    screen-scraping the HTML generated by StageTable!).

    I was hoping that I would find the answer in MetricsServlet,
    which is turned on by default.  It seems that when I visit
    http://cluster:4040/metrics/json/ I should be able to get
    everything I want, but I don't see the basic stage/task progress
    information I would expect.  Are there special metrics properties
    that I should set to get this info?  I think this would be the
    best solution - just give it the right URL and parse the resulting
    JSON - but I can't seem to figure out how to do this or whether it
    is possible.
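
If some endpoint did expose stage progress as JSON, the client side of the "give it the right URL and parse the resulting JSON" approach would be small. The payload schema here (`stages`, `id`, `numTasks`, `numCompleted`) is entirely hypothetical, since /metrics/json/ does not include this information; only `json` and `urllib.request` from the Python standard library are real.

```python
import json
from urllib.request import urlopen


def parse_stage_progress(payload: str):
    """Parse a HYPOTHETICAL document shaped like
    {"stages": [{"id": 3, "numTasks": 10, "numCompleted": 7}, ...]}
    into {stage_id: (completed, total)}."""
    doc = json.loads(payload)
    return {
        s["id"]: (s["numCompleted"], s["numTasks"])
        for s in doc.get("stages", [])
    }


def fetch_stage_progress(url: str, timeout: float = 5.0):
    """Fetch a (hypothetical) progress endpoint and parse the response body."""
    with urlopen(url, timeout=timeout) as resp:
        return parse_stage_progress(resp.read().decode("utf-8"))
```

Separating parsing from fetching keeps the schema assumption in one place, so it can be adjusted to whatever an actual endpoint returns.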

    Any advice is appreciated.

    Thanks,
    Philip



    On 04/01/2014 09:43 AM, Philip Ogren wrote:

        Hi DB,

        Just wondering if you ever got an answer to your question
        about monitoring progress - either offline or through your own
        investigation.  Any findings would be appreciated.

        Thanks,
        Philip

        On 01/30/2014 10:32 PM, DB Tsai wrote:

            Hi guys,

            When we're running a very long job, we would like to show
            users the current progress of the map and reduce jobs.
            After looking at the API documentation, I didn't find
            anything for this. However, in the Spark UI, I can see the
            progress of the tasks. Is there anything I'm missing?
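
Once per-stage (completed, total) task counts are being collected (for example by a listener), showing users a job's progress is mostly formatting. A minimal sketch, assuming hypothetical helpers `overall_progress` and `render_progress` that are not part of any Spark API:

```python
def overall_progress(stages):
    """Combine per-stage (completed, total) task counts, e.g. for the map
    and reduce stages of one job, into a single (completed, total) pair."""
    completed = sum(c for c, _ in stages)
    total = sum(t for _, t in stages)
    return completed, total


def render_progress(completed: int, total: int, width: int = 20) -> str:
    """Render a text progress bar of '#' and '.' followed by a percentage.
    A total of zero (e.g. no tasks known yet) is shown as 0%."""
    if total <= 0:
        fraction = 0.0
    else:
        fraction = min(max(completed / total, 0.0), 1.0)  # clamp to [0, 1]
    filled = int(fraction * width)
    return f"[{'#' * filled}{'.' * (width - filled)}] {fraction * 100:3.0f}%"
```

Summing task counts across stages weights each stage by its number of tasks, which is a simplification: a reduce stage with few but slow tasks will look further along than it really is.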

            Thanks.

            Sincerely,

            DB Tsai
            Machine Learning Engineer
            Alpine Data Labs
            --------------------------------------
            Web: http://alpinenow.com/




