This belongs to the "keeping our users from directly
SSHing into compute nodes" category.
On a test cluster (pauli), I have the following set up:
0. 1 Front end and 2 compute nodes
Each compute node has 4 cpu cores
1. Rocks 5.4 (service pack 2) - all rolls except
bio, condor and xen; runs SGE queuing system
6.2u5
[root@pauli ~]# rocks list roll
NAME VERSION ARCH ENABLED
area51: 5.4 x86_64 yes
base: 5.4 x86_64 yes
ganglia: 5.4 x86_64 yes
hpc: 5.4 x86_64 yes
kernel: 5.4 x86_64 yes
os: 5.4 x86_64 yes
sge: 5.4 x86_64 yes
web-server: 5.4 x86_64 yes
service-pack: 5.4.2 x86_64 yes
2. MPICH2 (1.4), compiled with GCC 4.1.2, is in
/share/apps/mpich2/1.4/gcc/4.1.2
Configure & make/make install commands were as
follows
export CC="/usr/bin/gcc"
export CXX="/usr/bin/g++"
export FC="/usr/bin/gfortran"
export F77="/usr/bin/gfortran"
./configure --prefix=/share/apps/mpich2/1.4/gcc/4.1.2
make
make install
I compiled a simple 'hello, world' C program
mpicc -g -Wall hello_world.c -o hello_world.x
and 'hello_world.x' runs fine.
3. There are two groups on this cluster
pauli-users : all users belong to this group
pauli-admins : only administrators belong to this one,
in addition to being part of pauli-users
I created 3 user accounts (all belonging to
pauli-users) and one more account that belongs to
pauli-users & pauli-admins
These groups & users were created before any compute
node was added to the cluster
4. The extend-compute.xml had the following lines in
<post> section
<file name="/etc/ssh/sshd_config" mode="append">
# Block non-root, non-pauli-admins users from directly
# accessing this compute node
AllowGroups root pauli-admins
</file>
xmllint -noout extend-compute.xml was run and
no errors were found.
rocks distribution was rebuilt and the compute
nodes were added via the usual insert-ethers
I ran 'rocks sync users'
When I check the '/etc/ssh/sshd_config' file
in compute nodes, I do see the line
AllowGroups root pauli-admins
The '/etc/group' file on the compute nodes has lines
corresponding to 'pauli-users' and 'pauli-admins'
pauli-users:x:500:
pauli-admins:x:501:john
5. 'john' can SSH into the compute nodes, while
'greg' (just a pauli-user) is blocked
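A minimal sketch of how the group membership behind the AllowGroups
line can be checked on a node. The helper name 'user_in_group' is
hypothetical, and it only reads the member list (4th field) of a
group(5)-format file; it does not look at a user's primary group in
/etc/passwd, which sshd also takes into account:

```shell
#!/bin/bash
# Hypothetical helper: succeeds if USER appears in the member list of GROUP
# in a group(5)-format file (defaults to /etc/group).
# Assumption: only supplementary membership is checked, not the primary GID.
user_in_group() {
    local user="$1" group="$2" file="${3:-/etc/group}"
    awk -F: -v u="$user" -v g="$group" '
        $1 == g {
            # Field 4 is a comma-separated list of member names (may be empty)
            n = split($4, members, ",")
            for (i = 1; i <= n; i++)
                if (members[i] == u) found = 1
        }
        END { exit(found ? 0 : 1) }
    ' "$file"
}
```

For example, 'user_in_group john pauli-admins && echo allowed' should
print 'allowed' with the /etc/group lines shown above, while the same
check for 'greg' should print nothing.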
6. Now comes SGE
I run 'hello_world.x' on 8 processors (spanning
both compute nodes) via an SGE script, sge_test.sh:
#! /bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -pe mpich 8
#
# Run 'Hello, World!'
/share/apps/mpich2/1.4/gcc/4.1.2/bin/mpirun -n $NSLOTS \
-f $TMP/machines /share/apps/bin/hello_world.x
It produces the desired output when I run this as 'john'
(a pauli-admin user)
When I run it as a regular pauli-users account (e.g. 'greg'),
the job hangs in 'r' state. 'sge_test.sh.po12' contains
-catch_rsh
/opt/gridengine/default/spool/compute-0-0/active_jobs/12.1/pe_hostfile
compute-0-0
compute-0-0
compute-0-0
compute-0-0
compute-0-1
compute-0-1
compute-0-1
compute-0-1
'sge_test.sh.o12' contains
Permission denied, please try again.
Permission denied, please try again.
Permission denied (publickey,gssapi-with-mic,password).
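For reference, the host list in 'sge_test.sh.po12' is what one would
expect from expanding the pe_hostfile into one line per slot, which is
presumably how $TMP/machines is built. A rough sketch of that
expansion follows; the function name 'expand_pe_hostfile' is
hypothetical, and I am assuming the usual pe_hostfile layout of
"host slots queue processors" per line with the domain stripped from
the host name:

```shell
#!/bin/bash
# Sketch: expand an SGE pe_hostfile into one short host name per slot.
# Assumed input line format: "<host> <slots> <queue> <processor range>"
expand_pe_hostfile() {
    awk '{
        split($1, parts, ".")          # keep the short name (drop the domain)
        for (i = 0; i < $2; i++)       # one output line per granted slot
            print parts[1]
    }' "$1"
}
```

Running 'expand_pe_hostfile $PE_HOSTFILE' inside the job script and
comparing the result with $TMP/machines would show whether the machines
file matches the slots SGE actually granted.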
Can someone please tell me whether I am doing something wrong
or missing something?
Thanks,
g
--
Gowtham
Advanced IT Research Support
Michigan Technological University
(906) 487/3593