Github user chesterxgchen commented on the pull request:

    https://github.com/apache/spark/pull/4405#issuecomment-73642035
  
    Thanks for the detailed reply.
    
    >>bq. I don't understand why the standalone mode or Mesos mode won't
    >>need to have a job delegation token?
    
    >>Because they don't need it. They don't work with kerberos, and you don't
    >>need delegation tokens without kerberos.
    
    >>bq. If you see in oozie's implementation, you can see that before the MR
    >>job is submitted
    
    >>Not sure how that's related to Spark. Spark gets the needed delegation
    >>tokens, there's nothing else to be done.
    
    You answered the two points above as though they were separate questions,
    but they are actually one question. I am not sure you understood my
    original question.
    
    Spark submits a job just as other applications, such as Oozie, submit an
    MR job. In a secured cluster, you need to obtain a delegation token before
    submitting. I am only using Oozie as an example; it could be Oozie, Hue,
    Knox or Tajo. I was under the assumption that Spark would need a
    delegation token as well, even in standalone mode, like other applications.
    
    I did not get a clear picture from your answer: is it that Spark
    standalone mode does not support Kerberos? Or is it that Spark already has
    the delegation token?
    
    
    >>Both Oozie and Hive need to fork external processes to run Spark. When
    >>those processes fork, they'll be running with the kerberos credentials of
    >>the user running those services, not as the "proxy user". So the forked
    >>process needs to know which user to impersonate. loginUserFromKeytab is
    >>irrelevant here.
    
    If you fork the process, of course, that's a different story. I am
    thinking about applications that run a Spark job without forking a
    process. Our application is like this, and I am sure there are many other
    applications like it, which don't need to run kinit from the command line.
    
    Thanks for clarifying what the PR is intended for.
    
    Chester
    
    On Mon, Feb 9, 2015 at 7:54 PM, Marcelo Vanzin <[email protected]>
    wrote:
    
    > bq. Did you test with the secured Hadoop Cluster or just normal cluster ?
    >
    > Both. In kerberos mode you have to be logged in before you submit the app,
    > but that's true before this change. If you're not logged in, you can't
    > submit. So this change is not changing any assumptions.
    >
    > bq. I don't understand why the standalone mode or Mesos mode won't need
    > to have a job delegation token?
    >
    > Because they don't need it. They don't work with kerberos, and you don't
    > need delegation tokens without kerberos.
    >
    > bq. If you see in oozie's implementation, you can see that before the MR
    > job is submitted
    >
    > Not sure how that's related to Spark. Spark gets the needed delegation
    > tokens, there's nothing else to be done.
    >
    > bq. For applications (for example, programs that submit the Spark job
    > directly, not from the command line), this approach doesn't seem to help
    > much.
    >
    > Both Oozie and Hive need to fork external processes to run Spark. When
    > those processes fork, they'll be running with the kerberos credentials of
    > the user running those services, not as the "proxy user". So the forked
    > process needs to know which user to impersonate. loginUserFromKeytab is
    > irrelevant here.
    >
    > bq. So is the approach only intended for command line use? Does it
    > make sense to push more logic into Spark?
    >
    > Yes, this is only for command line use (or, in other words, running Spark
    > as a separate process). Anything else would be a lot more complicated and
    > probably a much larger project, that is really not needed for the use case
    > at hand (Hive and, eventually, Oozie).
    >
    > —
    > Reply to this email directly or view it on GitHub
    > <https://github.com/apache/spark/pull/4405#issuecomment-73640354>.
    >


