This sounds to me like a bug in how PBS works with LDAP authentication.
The default OSCAR setup assumes you are using the head node as your
accounts server and simply propagates /etc/passwd and the related
files to the nodes.  So if PBS is referring to those files for some
reason, I imagine they will be fairly meaningless.  Have you checked
that /etc/nsswitch.conf on the nodes and on the head node is sane?
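One quick way to follow up on the nsswitch.conf check is to run the
same kinds of lookups that the failing programs seem to make, on each
node.  Below is just a sketch: the script name, function names and
arguments are made up, and I'm only assuming from the wording of the
errors ("No Password Entry", "Could not gethostbyname") that pbs_mom
and p4 depend on these libc lookups.  Python's pwd.getpwnam() and
socket.gethostbyname() both go through the C library, so they should
follow the passwd: and hosts: lines of /etc/nsswitch.conf (on an LDAP
client the passwd: line typically lists ldap after files).

```python
import pwd
import socket
import sys


def user_resolves(username):
    """Return True if a passwd lookup for `username` succeeds.

    pwd.getpwnam() goes through the C library's passwd lookup, so it
    follows the 'passwd:' line of /etc/nsswitch.conf (files, ldap, ...).
    """
    try:
        pwd.getpwnam(username)
        return True
    except KeyError:  # raised when no passwd entry is found
        return False


def host_resolves(hostname):
    """Return True if `hostname` resolves to an IPv4 address.

    socket.gethostbyname() uses the system resolver, which follows the
    'hosts:' line of /etc/nsswitch.conf.
    """
    try:
        socket.gethostbyname(hostname)
        return True
    except (OSError, UnicodeError):  # socket.gaierror is an OSError subclass
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python nss_check.py tmac0501 rhel4.ehpctc.intern
    user, host = sys.argv[1], sys.argv[2]
    print("passwd lookup for %s: %s" % (user, "ok" if user_resolves(user) else "FAILED"))
    print("host lookup for %s: %s" % (host, "ok" if host_resolves(host) else "FAILED"))
```

If the passwd lookup fails on a node, that would fit the "No Password
Entry" lines in mom_logs; if the host lookup fails, that would fit the
p4 gethostbyname error.  Running "getent passwd tmac0501" and
"getent hosts rhel4.ehpctc.intern" from a shell on each node should
tell you the same thing without any script.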

I haven't gotten around to it yet, but I am planning on using LDAP at
our new facility, so I am anxious to get this resolved.

On 2/16/06, Jim Summers <[EMAIL PROTECTED]> wrote:
> Bernard Li wrote:
> > Hi Jim:
> >
> > As that user, can you ssh to 192.168.0.6?
>
> Yes.
>
> [EMAIL PROTECTED] ~]$ ssh node6
> [EMAIL PROTECTED] ~]$ ls -al
> >
> > The error message "No Password Entry for User tmac0501" sounds fishy...
>
> Strangest thing I ever saw.
>
> > Do /etc/passwd and /etc/shadow look okay on that node?
>
> Looks fine.
>
> Regarding my local user test, I was wrong.  Using a local user that can
> ssh to all nodes and can run mpirun without any errors, the PBS-submitted
> jobs now return the following:
> ----------
> p0_4722: (61.563956) Procgroup:
> p0_4722: (61.564118)     entry 0: node15.oscardomain 0 0 /admin/localpbs/clib/node-test localpbs
> p0_4722: (61.564145)     entry 1: rhel4.ehpctc.intern 1 1 /admin/localpbs/clib/node-test localpbs
> p0_4722:  p4_error: Could not gethostbyname for host rhel4.ehpctc.intern; may be invalid name
> : 62
> ----------
> This is repeatable even with a varying number of nodes.  I am not sure
> where it is getting the rhel4.ehpctc.intern entry.  The machines.LINUX
> file is fine, and mpirun works.  It seems like there may be a problem
> in the scheduler?
>
> >
>
> > Cheers,
> >
> > Bernard
> >
> > P.S. Did you run through the "Test OSCAR Setup" step, and did all tests pass?
>
> Yes, most of the tests passed.  I think one did fail, either the
> ganglia test or some of the PVM tests.  I have configured ganglia and it
> works fine.  Since we aren't planning on using PVM, I disregarded it.
>
>
> > in the queue.  I am seeing the following in the mom_logs on the nodes
> > involved:
> > =========================
> > 02/15/2006 15:06:22;0008;   pbs_mom;Job;6.master;No Password Entry for User tmac0501
> > 02/15/2006 15:10:26;0008;   pbs_mom;Job;6.master;ERROR:    received request 'ABORT_JOB' from 192.168.0.6:1023 for job '6.master' (job does not exist locally)
> > ========================
> >
> > Not sure what I have to configure.  I haven't seen anything in the PBS
> > docs regarding authentication yet.
> >
> > The ldap users can ssh to each node and their home is mounted.
> >
>
> TIA
>
> --
> Jim Summers
> School of Computer Science-University of Oklahoma
> -------------------------------------------------
>
>
> -------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
> for problems?  Stop!  Download the new AJAX search engine that makes
> searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=103432&bid=230486&dat=121642
> _______________________________________________
> Oscar-users mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/oscar-users
>

