Greetings,

        I have run into strange behavior on two separate installs of Oscar 2.3
on top of Red Hat 8.0. In both cases RH8 was updated to current as of Aug
29th. The same behaviors were noted on both installs, which were done on
two separate clusters.

The first was during step 1, download additional packages. After
selecting this step a progress bar is displayed and the install gui
becomes unresponsive. This condition lasts for over a half hour during
which perl (according to top) runs as high as 90%, takes 2GB of RAM and
dips into swap before the gui dies. Starting the gui again works fine as
long as step 1 is bypassed.

The other issue is the failure of test_cluster. The first pvm test would
time out and subsequent tests would complain about not having enough pbs
nodes free. Watching the queue during the process showed the pvm job
hanging in the queue long after it had timed out; it remained there while
subsequent tests were submitted and failed. Digging into this deeper by
manipulating the time factor of the test scripts and performing some
manual tests, I found the cause was that every ssh login had a 4-6 second
delay before proceeding. In troubleshooting this I found that if,
in /etc/pam.d/sshd, the line:

        auth required /lib/security/pam_stack.so service=system-auth

gets changed to read:

        auth required /lib/security/pam_stack.so shadow nodelay

the delay disappears and all of the tests (pvm/mpich/lam/hdf) run
perfectly within the default time factor of 3. The ssh login delay is a
killer when spread across 36 nodes.
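
For anyone who wants to reproduce the measurement, here is a minimal
sketch in modern Python 3 (not something shipped with Oscar). It times
a non-interactive ssh login and works out the accumulated wait. The
function names are made up for illustration, and the total assumes the
logins happen one after another, which I have not confirmed for
test_cluster.

```python
import subprocess
import time


def time_command(cmd, timeout=30):
    """Return the wall-clock seconds taken to run cmd (a list of arguments)."""
    start = time.monotonic()
    subprocess.run(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,
    )
    return time.monotonic() - start


def ssh_login_time(host):
    """Time a login to host that runs the remote command 'true' and exits.

    BatchMode=yes makes ssh fail instead of prompting for a password,
    so the number reflects login overhead rather than typing time.
    """
    return time_command(["ssh", "-o", "BatchMode=yes", host, "true"])


def serial_overhead(delay_seconds, nodes):
    """Total wait if every node is logged into in turn (an assumption)."""
    return delay_seconds * nodes


# Under the serial assumption, a 4-6 second delay over 36 nodes costs
# serial_overhead(4, 36) to serial_overhead(6, 36) seconds in total.
```

Calling ssh_login_time("node1") before and after the pam change (the
host name is a placeholder) should show whether the delay is gone.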

Pam has never been an issue with RH7.3 and previous versions of Oscar.
These installations are not using ldap or nis+ for authentication and
nsswitch.conf is set to refer to files first. I followed the pam chain
from /etc/pam.d/sshd into /etc/pam.d/system-auth and I cannot determine
what condition pam is trying to satisfy that makes it wait 4-6 seconds
during an ssh login.
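
Following the chain by hand can be sketched in code as well. The Python 3
snippet below is a rough illustration only, with hypothetical function
names. It splits pam config text into its fields and reports which
other service files a pam_stack.so line points at. It assumes the
simple one-word control field used above and does not handle bracketed
controls such as [success=1 default=ignore] or backslash line
continuations.

```python
def parse_pam_config(text):
    """Parse /etc/pam.d/<service> text into (type, control, module, args) tuples."""
    entries = []
    for line in text.splitlines():
        # Drop comments (including the "#%PAM-1.0" header) and blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # not a complete "type control module" line
        mod_type, control, module = fields[:3]
        entries.append((mod_type, control, module, fields[3:]))
    return entries


def stacked_services(entries):
    """Return the service names referenced through pam_stack.so service=NAME."""
    names = []
    for _mod_type, _control, module, args in entries:
        if module.endswith("pam_stack.so"):
            for arg in args:
                if arg.startswith("service="):
                    names.append(arg[len("service="):])
    return names
```

Fed the original sshd line, stacked_services() reports system-auth as
the next file to read. Fed the modified line it reports nothing,
because that line no longer carries a service= argument.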

I do not consider the above change anything more than a band-aid while I
try to find what pam is looking for and why the delay is occurring.

Any suggestions?

Jeff 


-- 
Jeff Johnson <[EMAIL PROTECTED]>
Western Scientific, Inc

"Rome did not create a great Empire by holding meetings. They did it by
killing all those who opposed them."



_______________________________________________
Oscar-users mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-users
