Michael Weiser <[email protected]> writes:

> AFAIK SGE once had built-in support for Kerberos 4 but it has been
> broken for a long time now.

[It was never really usable
<http://arc.liv.ac.uk/repos/darcs/sge/source/security/security.html> and
wouldn't solve the basic batch problem.  You also really want a GSSAPI
implementation rather than raw Kerberos.]

>> Otherwise, could it be done using startup scripts?
>
> Yes, there are prolog and epilog scripts as well as hooks meant for AFS
> called set_token_cmd, pag_cmd and token_extend_time for doing just that.
> They're documented in sge_conf(5). There is a collection of scripts
> for AFS (https://dvinfo.ifh.de/SGEwithAFS). But as far as I know there
> is no ready-to-use solution for Kerberos V.

The GSS security mechanism should work, within its limitations.
However, the hooks for calling the sub-programs don't work properly and
I'm not sure whether that can be fixed with the same interface.  They
should probably be replaced with a loadable module.

The AFS stuff stores credentials for queued jobs in the spool area, and
without an authentication mechanism those credentials can be stolen by
other jobs on the compute hosts.

>> Is there any support for automatically renewing tickets for
>> long-running jobs?
>
> set_token_cmd looks right for that, but will only be called while
> the job is running. This is inconvenient for Kerberos because tickets
> need to be renewed while the job is waiting as well.

I think the AFS mechanism (with the coshepherd) is somewhat special
because of the AFS PAG.  Otherwise it might be convenient to co-opt load
sensors for babysitting tickets, to the extent that works, but the
usual, general way to do this is to run the job under the krenew wrapper.
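
As a rough sketch of the krenew approach (not part of SGE, and only an
illustration), a job launcher could wrap the real command like this.
The helper name krenew_command, the 60-minute check interval and
my_job.sh are hypothetical; -K (check interval in minutes) and -t (run
aklog after renewal) are krenew options from the kstart package.

```python
import shlex


def krenew_command(job_argv, check_minutes=60, run_aklog=False):
    """Return an argv list that runs job_argv under krenew.

    Hypothetical helper for illustration only; it builds the command
    line and does not run anything itself.
    """
    if not job_argv:
        raise ValueError("job_argv must not be empty")
    # -K N: wake up every N minutes and renew the ticket if needed.
    argv = ["krenew", "-K", str(check_minutes)]
    if run_aklog:
        # -t: also run aklog after each renewal (only useful with AFS).
        argv.append("-t")
    # "--" keeps the job's own options from being parsed by krenew.
    argv.append("--")
    argv.extend(job_argv)
    return argv


if __name__ == "__main__":
    # Print the wrapped command line for a hypothetical job script.
    print(shlex.join(krenew_command(["./my_job.sh", "input.dat"])))
```

Note that this only helps up to the ticket's renewable lifetime, and
only once the job is running with a valid ticket cache, so it doesn't
address tickets expiring while the job is still queued.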

> This problem (and others) can be avoided using Constrained Delegation
> or Protocol Transition. I gave a talk on that at this year's DFN CERT
> workshop in Hamburg. The slides (German) explaining various options can
> be found here: 
>
> http://www.dfn-cert.de/dokumente/workshop/2012/Folien_Weiser.pdf

I'm not sure how much sense I can make of that, I'm sorry to say.  I
don't understand how it would solve the basic issue of authentication
with long batch jobs, even if the Microsoft mechanisms are available,
but I'm keen to hear of better solutions.  Could you explain briefly?

> There was a longer paper published in the conference proceedings as
> well. There was no real resonance however which is why I couldn't get
> any traction to actually implement some software.

I can sympathize.

> So, as of now there is
> no software for actually doing what is proposed in the talk.
> Implementing a proof-of-concept using shell or Python scripts should
> however be a matter of man hours, not months.

Does it need any more than getting the GSS hooks working properly
<http://arc.liv.ac.uk/repos/darcs/sge/source/security/gss/doc/gss_customer.html>?
I kludged a proof-of-concept for MUNGE with them.

>> If I have the ticket on the host that runs the job script, the problem
>> should be solved for MPI as its children are started using SSH, and I
>> could just change the login method of SSH from pubkey to KRB5. Is that
>> correct?
>
> Yep, and forward the user's TGT to the other execution hosts using the
> -K or GSSAPIDelegateCredentials options.

That shouldn't be necessary with SGE forwarding.

> There are patches for OpenSSH to extend Kerberos support
> (http://www.sxw.org.uk/computing/patches/openssh.html). I'm not
> up-to-date on how much of them ever made it into mainline OpenSSH.

[They're in all the main GNU/Linux distributions as far as I know.]

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/