Good day,

I am trying to bring up a new cluster to replace our aging system, and want to 
duplicate our basic setup, which is to say statelite provisioning on Dell 
PowerEdge servers.  After working through the documentation and looking at our 
old setup, I've gotten close (I think) to success, but I have now been beating 
my head against a wall for a few weeks.

In short, when I boot the node, it seems to come up successfully, at least as 
far as networking and syslog go, but I can neither ssh to the node nor get a 
console login.  I'm afraid I'm missing something obvious, and it's starting to 
drive me crazy.  Data to follow:


The node is a Dell C6420, the OS is Scientific Linux 7.4, and I'm trying for a 
statelite boot.

lsdef on the node:
Object name: node-i01
    addkcmdline=bond=bond0:eth0,eth1:mode=4
    arch=x86_64
    bmc=<redactBMCIP>
    bmcpassword=<redact>
    bmcusername=<redact>
    cons=ipmi
    consoleenabled=1
    currstate=statelite SL7-x86_64-compute
    groups=all,node-i,c6420
    hostnames=node-i01<domain>
    initrd=xcat/netboot/SL7/x86_64/compute-v1/initrd-statelite.gz
    ip=<redactNodeIP>
    kcmdline=root=nfs:<redactMasterIP>:/export/install/netboot/SL7/x86_64/compute-v1/rootimg:ro STATEMNT=<redactMasterIP>:/state XCAT=!myipfn!:3001 console=tty0 console=ttyS1,115200n8r MNTOPTS=
    kernel=xcat/netboot/SL7/x86_64/compute-v1/kernel
    mac=<redact>
    mgt=ipmi
    netboot=pxe
    nfsserver=<redactMasterIP>
    nodetype=osi
    os=SL7
    postbootscripts=otherpkgs
    postscripts=syslog,remoteshell,syncfiles
    primarynic=mac
    profile=compute
    provmethod=SL7.4-compute-v1-201804
    serialflow=hard
    serialport=1
    serialspeed=115200
    status=netbooting
    statustime=05-15-2018 10:01:01

lsdef on the osimage:
Object name: SL7.4-compute-v1-201804
    exlist=/opt/xcat/share/xcat/netboot/SL/compute.centos7.exlist
    imagetype=linux
    osarch=x86_64
    osdistroname=SL
    osname=Linux
    osvers=SL7
    otherpkgdir=/install/post/otherpkgs/SL7/x86_64
    permission=755
    pkgdir=/install/SL7.x/x86_64
    pkglist=/opt/xcat/share/xcat/netboot/SL/compute.centos7.pkglist
    profile=compute
    provmethod=statelite
    rootimgdir=/install/netboot/SL7/x86_64/compute-v1

When I boot with rcons running to the node, I get the usual boot data and then:
PXELINUX 4.05 0x581bd748  Copyright (C) 1994-2011 H. Peter Anvin et al
!PXE entry point found (we hope) at 9878:0106 via plan A
UNDI code segment at 9878 len 4A10
UNDI data segment at 90FF len 7790
Getting cached packet  01 02 03
My IP address seems to be <redact>
ip=<redact>
BOOTIF=<redact>
SYSUUID=<redact>
TFTP prefix:
Trying to load: pxelinux.cfg/<redact>                               ok
Loading xcat/osimage/SL7.4-compute-v1-201804/kernel........
Loading xcat/osimage/SL7.4-compute-v1-201804/initrd-statelite.gz................
....................ready.

So that looks good, but then it hangs.

From the iDRAC console itself, I get that data and then a lot of other boot 
data before it hangs.  Useful snippets from that data:
the kernel command line is being passed properly.
console tty0 and ttyS1 are enabled
the console USB Keyboard/Mouse say they are found.
systemd says no hostname, random generator for ID
swap is started
"Started Dispatch Password Requests to Console Directory Watch"
"Starting dracut cmdline hook ..."
"random: crng init done"
the network comes up then ... and is pingable
dns_resolver registered
then hangs, and eventually systemd-journald crashes and restarts.
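
One way to double-check the "command line is being passed properly" point is to compare the kcmdline stored in the node definition with the line the kernel logs at boot, parameter by parameter. What follows is only a minimal sketch, assuming Python 3 on the management node; parse_cmdline and diff_cmdlines are hypothetical helper names, not xCAT tools, and the second string is abridged from the logged command line further down. On the values in this post it would list root= among the differences (the node definition reads /export/install/..., the logged line reads /install/...); whether that matters, or is just a redaction slip, is not established here. XCAT= differs too, which looks like the expected !myipfn! substitution.

```python
# Sketch only: hypothetical helpers for comparing two kernel command lines.

def parse_cmdline(cmdline):
    """Split a kernel command line into {key: [values]}; repeated keys keep every value."""
    params = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # Flags without '=' (for example rd.shell) are stored as None.
        params.setdefault(key, []).append(value if sep else None)
    return params


def diff_cmdlines(configured, booted):
    """Return {key: (configured_values, booted_values)} for every key that differs."""
    a, b = parse_cmdline(configured), parse_cmdline(booted)
    return {k: (a.get(k), b.get(k))
            for k in sorted(set(a) | set(b))
            if a.get(k) != b.get(k)}


# kcmdline as shown by lsdef in this post.
configured = (
    "root=nfs:<redactMasterIP>:/export/install/netboot/SL7/x86_64/compute-v1/rootimg:ro "
    "STATEMNT=<redactMasterIP>:/state XCAT=!myipfn!:3001 console=tty0 "
    "console=ttyS1,115200n8r MNTOPTS="
)
# Abridged from the command line the kernel logged in this post.
booted = (
    "root=nfs:<redactMasterIP>:/install/netboot/SL7/x86_64/compute-v1/rootimg:ro "
    "STATEMNT=<redactMasterIP>:/state XCAT=<redactMasterIP>:3001 console=tty0 "
    "console=ttyS1,115200n8r MNTOPTS= bond=bond0:eth0,eth1:mode=4 selinux=0 "
    "rd.shell rd.debug"
)

for key, (want, got) in diff_cmdlines(configured, booted).items():
    print(f"{key}: configured={want} booted={got}")
```

Parameters that only show up on the booted side (bond=, selinux=, rd.shell, rd.debug) are reported with configured=None, so additions made at boot time stand out from real mismatches.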


The log messages below suggest that the node is not getting the correct time 
settings (the node's time stamps are about an hour behind the master's), which 
may or may not be related.

/var/log/messages gives me this:
May 15 09:34:54 master02 xcat[188522]: xCAT: Allowing rpower to node-i01 status for root from localhost
May 15 09:35:03 master02 xcat[188529]: xCAT: Allowing rpower to node-i01 reset for root from localhost
May 15 09:35:03 master02 xcat[188530]: node-i01 status: powering-on statustime: 05-15-2018 09:35:03
May 15 09:35:15 master02 xcat[188560]: xCAT: Allowing tabdump site for root from localhost
May 15 09:35:16 master02 xcat[188570]: xCAT: Allowing tabdump site for root from localhost
May 15 09:35:16 master02 xcat[188591]: xCAT: Allowing nodels to node-i01 nodehm.conserver for root from localhost
May 15 09:36:15 master02 in.tftpd[188658]: RRQ from <redactNodeIP> filename xcat/osimage/SL7.4-compute-v1-201804/kernel
May 15 09:36:15 master02 in.tftpd[188658]: Client <redactNodeIP> finished xcat/osimage/SL7.4-compute-v1-201804/kernel
May 15 09:36:15 master02 in.tftpd[188659]: RRQ from <redactNodeIP> filename xcat/osimage/SL7.4-compute-v1-201804/initrd-statelite.gz
May 15 09:36:17 master02 in.tftpd[188659]: Client <redactNodeIP> finished xcat/osimage/SL7.4-compute-v1-201804/initrd-statelite.gz
May 15 08:58:11 node-i01 kernel: [    0.000000] Command line: initrd=xcat/osimage/SL7.4-compute-v1-201804/initrd-statelite.gz root=nfs:<redactMasterIP>:/install/netboot/SL7/x86_64/compute-v1/rootimg:ro STATEMNT=<redactMasterIP>:/state XCAT=<redactMasterIP>:3001 NODE=node-i01 LOGSERVER=<redactMasterIP>syslog.server=<redactMasterIP>syslog.type=rsyslogd syslog.filter=*.* xcatdebugmode=1 console=tty0 console=ttyS1,115200n8r MNTOPTS= bond=bond0:eth0,eth1:mode=4 selinux=0 rd.shell rd.debug BOOT_IMAGE=xcat/osimage/SL7.4-compute-v1-201804/kernel BOOTIF=<redact>
May 15 10:01:01 master02 xcat[140913]: INFO xcatd received a connection request from <redactNodeIP>
May 15 10:01:01 master02 xcat[140913]: node-i01 status: netbooting statustime: 05-15-2018 10:01:01
May 15 09:01:08 node-i01 xcat: ready
May 15 09:01:08 node-i01 xcat: done
May 15 10:08:01 master02 xcat[189323]: xCAT: Allowing litefile from node-i01
May 15 10:08:01 master02 xcat[189325]: xCAT: Allowing litetree from node-i01
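
To put a number on the time difference, here is a minimal sketch (Python 3 assumed; parse_syslog_time and offset_seconds are hypothetical helper names) that subtracts the node's stamp from the master's for two adjacent lines of the excerpt above. The two events are not simultaneous, so the result is only a rough estimate of the clock offset. Syslog stamps carry no year, so 2018 is assumed, and month parsing with %b expects an English/C locale.

```python
from datetime import datetime


def parse_syslog_time(line, year=2018):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line (year is assumed)."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def offset_seconds(master_line, node_line):
    """Seconds by which the master's stamp is ahead of the node's stamp."""
    delta = parse_syslog_time(master_line) - parse_syslog_time(node_line)
    return delta.total_seconds()


# Two adjacent lines from the /var/log/messages excerpt in this post.
master = ("May 15 10:01:01 master02 xcat[140913]: node-i01 status: netbooting "
          "statustime: 05-15-2018 10:01:01")
node = "May 15 09:01:08 node-i01 xcat: ready"

print(offset_seconds(master, node))  # 3593.0, i.e. just under one hour
```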

/var/log/xcat/cluster.log attached.

nmap indicates that port 22 is closed and ssh returns connection refused.
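
For reference, the difference between "refused" and "no answer" can be checked without nmap. This is a minimal sketch assuming Python 3 on the management node; probe_tcp is a hypothetical helper name. A "refused" result means the node's network stack answered and rejected the connection, which usually points to nothing listening on the port (or a firewall rule that rejects), while a timeout would point to the node or the path to it.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify a TCP port as 'open', 'refused', or 'no-answer'."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        try:
            # connect_ex returns an errno value instead of raising on connect errors.
            code = sock.connect_ex((host, port))
        except OSError:
            # Name resolution failures and similar errors still raise.
            return "no-answer"
    if code == 0:
        return "open"
    if code == errno.ECONNREFUSED:
        return "refused"
    # Timeouts, unreachable hosts, and anything else end up here.
    return "no-answer"


# Example call (node name taken from this post):
# print(probe_tcp("node-i01", 22))
```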

It feels like I'm missing something obvious, but (clearly) I don't know what 
... any pointers?

Jeff Berry
jeff.be...@mrc-cbu.cam.ac.uk

Attached /var/log/xcat/cluster.log:
May 15 09:34:45 master02 xcat[188517]: DEBUG xcatd: connection from root@localhost
May 15 09:34:45 master02 xcat[188517]: DEBUG xcatd: open new process : xcatd SSL: rpower to status for root@localhost
May 15 09:34:45 master02 xcat[188517]: DEBUG xcatd: close connection with root@localhost
May 15 09:34:54 master02 xcat[188522]: DEBUG xcatd: connection from root@localhost
May 15 09:34:54 master02 xcat[188522]: DEBUG xcatd: open new process : xcatd SSL: rpower to node-i01 for root@localhost
May 15 09:34:54 master02 xcat[188522]: xCAT: Allowing rpower to node-i01 status for root from localhost
May 15 09:34:54 master02 xcat[188523]: DEBUG xcatd: dispatch request 'rpower node-i01 status' to plugin 'ipmi'
May 15 09:34:54 master02 xcat[188523]: DEBUG xcatd: handle request 'rpower' by plugin 'ipmi''s preprocess_request
May 15 09:34:54 master02 xcat[188523]: DEBUG xcatd: handle request 'rpower' by plugin 'ipmi''s process_request
May 15 09:34:54 master02 xcat[188522]: DEBUG xcatd: close connection with root@localhost
May 15 09:35:02 master02 xcat[188529]: DEBUG xcatd: connection from root@localhost
May 15 09:35:02 master02 xcat[188529]: DEBUG xcatd: open new process : xcatd SSL: rpower to node-i01 for root@localhost
May 15 09:35:03 master02 xcat[188529]: xCAT: Allowing rpower to node-i01 reset for root from localhost
May 15 09:35:03 master02 xcat[188530]: DEBUG xcatd: dispatch request 'rpower node-i01 reset' to plugin 'ipmi'
May 15 09:35:03 master02 xcat[188530]: DEBUG xcatd: handle request 'rpower' by plugin 'ipmi''s preprocess_request
May 15 09:35:03 master02 xcat[188530]: DEBUG xcatd: handle request 'rpower' by plugin 'ipmi''s process_request
May 15 09:35:03 master02 xcat[188530]: node-i01 status: powering-on statustime: 05-15-2018 09:35:03
May 15 09:35:03 master02 xcat[188529]: DEBUG xcatd: close connection with root@localhost
May 15 09:35:15 master02 xcat[188560]: DEBUG xcatd: connection from root@localhost
May 15 09:35:15 master02 xcat[188560]: DEBUG xcatd: open new process : xcatd SSL: tabdump for root@localhost
May 15 09:35:15 master02 xcat[188560]: xCAT: Allowing tabdump site for root from localhost
May 15 09:35:15 master02 xcat[188561]: DEBUG xcatd: dispatch request 'tabdump site' to plugin 'tabutils'
May 15 09:35:15 master02 xcat[188561]: DEBUG xcatd: handle request 'tabdump' by plugin 'tabutils''s process_request
May 15 09:35:16 master02 xcat[188560]: DEBUG xcatd: close connection with root@localhost
May 15 09:35:16 master02 xcat[188570]: DEBUG xcatd: connection from root@localhost
May 15 09:35:16 master02 xcat[188570]: DEBUG xcatd: open new process : xcatd SSL: tabdump for root@localhost
May 15 09:35:16 master02 xcat[188570]: xCAT: Allowing tabdump site for root from localhost
May 15 09:35:16 master02 xcat[188571]: DEBUG xcatd: dispatch request 'tabdump site' to plugin 'tabutils'
May 15 09:35:16 master02 xcat[188571]: DEBUG xcatd: handle request 'tabdump' by plugin 'tabutils''s process_request
May 15 09:35:16 master02 xcat[188570]: DEBUG xcatd: close connection with root@localhost
May 15 09:35:16 master02 xcat[188591]: DEBUG xcatd: connection from root@localhost
May 15 09:35:16 master02 xcat[188591]: DEBUG xcatd: open new process : xcatd SSL: nodels to node-i01 for root@localhost
May 15 09:35:16 master02 xcat[188591]: xCAT: Allowing nodels to node-i01 nodehm.conserver for root from localhost
May 15 09:35:16 master02 xcat[188592]: DEBUG xcatd: dispatch request 'nodels node-i01 nodehm.conserver' to plugin 'tabutils'
May 15 09:35:16 master02 xcat[188592]: DEBUG xcatd: handle request 'nodels' by plugin 'tabutils''s process_request
May 15 09:35:16 master02 xcat[188591]: DEBUG xcatd: close connection with root@localhost
May 15 10:01:01 master02 xcat[140913]: INFO xcatd received a connection request from <redactNodeIP>
May 15 10:01:01 master02 xcat[140913]: DEBUG xcatd: dispatch request 'updatenodestat netbooting' to plugin 'updatenode'
May 15 10:01:01 master02 xcat[140913]: DEBUG xcatd: handle request 'updatenodestat' by plugin 'updatenode''s preprocess_request
May 15 10:01:01 master02 xcat[140913]: DEBUG xcatd: handle request 'updatenodestat' by plugin 'updatenode''s process_request
May 15 10:01:01 master02 xcat[140913]: node-i01 status: netbooting statustime: 05-15-2018 10:01:01
May 15 10:01:01 node-i01 xcat:  ready
May 15 10:01:01 node-i01 xcat:  done
May 15 10:08:01 master02 xcat[189323]: DEBUG xcatd: connection from node-i01
May 15 10:08:01 master02 xcat[189323]: DEBUG xcatd: open new process : xcatd SSL: litefile for node-i01
May 15 10:08:01 master02 xcat[189323]: xCAT: Allowing litefile from node-i01
May 15 10:08:01 master02 xcat[189324]: DEBUG xcatd: dispatch request 'litefile ' to plugin 'litetree'
May 15 10:08:01 master02 xcat[189324]: DEBUG xcatd: handle request 'litefile' by plugin 'litetree''s process_request
May 15 10:08:01 master02 xcat[189323]: DEBUG xcatd: close connection with node-i01
May 15 10:08:01 master02 xcat[189325]: DEBUG xcatd: connection from node-i01
May 15 10:08:01 master02 xcat[189325]: DEBUG xcatd: open new process : xcatd SSL: litetree for node-i01
May 15 10:08:01 master02 xcat[189325]: xCAT: Allowing litetree from node-i01
May 15 10:08:01 master02 xcat[189326]: DEBUG xcatd: dispatch request 'litetree ' to plugin 'litetree'
May 15 10:08:01 master02 xcat[189326]: DEBUG xcatd: handle request 'litetree' by plugin 'litetree''s process_request
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user