Dear Wang Xiaopeng,

Thank you for your support.

I can see only the following messages on the console of the failed node.

****************************************
  (e.g. the error message on one of the failed nodes)
    Unable to download the kickstart file. Please modify the kickstart
    parameter below or press Cancel to proceed as an interactive installation.
    http://10.35.37.27:80/install/autoinst/testnode005
****************************************

In syslog I can't find any error messages, and the failed nodes do get an
IP address from DHCP (but then stop at the error message above).

(xcat01 is the xCAT server; testnode05 and 10.35.33.85 are the failed node's
hostname and IP address)
May  4 12:15:09 xcat01 dhcpd: DHCPOFFER on 10.35.33.85 to 40:f2:e9:c6:7c:e6 via bond0
May  4 12:15:14 xcat01 dhcpd: Dynamic and static leases present for 10.35.33.85.
May  4 12:15:14 xcat01 dhcpd: Remove host declaration testnode05 or remove 10.35.33.85
May  4 12:15:14 xcat01 dhcpd: DHCPREQUEST for 10.35.33.85 (10.35.37.27) from 40:f2:e9:c6:7c:e6 via bond0
May  4 12:15:14 xcat01 dhcpd: DHCPACK on 10.35.33.85 to 40:f2:e9:c6:7c:e6 via bond0
May  4 12:15:14 xcat01 in.tftpd[8123]: RRQ from 10.35.33.85 filename xcat/xnba.kpxe
May  4 12:15:14 xcat01 in.tftpd[8124]: RRQ from 10.35.33.85 filename xcat/xnba.kpxe
May  4 12:15:14 xcat01 dhcpd: DHCPOFFER on 10.35.33.85 to 40:f2:e9:c6:7c:e6 via bond0
May  4 12:15:14 xcat01 dhcpd: Dynamic and static leases present for 10.35.33.85.
May  4 12:15:14 xcat01 dhcpd: Remove host declaration testnode05 or remove 10.35.33.85
May  4 12:15:14 xcat01 dhcpd: DHCPREQUEST for 10.35.33.85 (10.35.37.27) from 40:f2:e9:c6:7c:e6 via bond0
May  4 12:15:14 xcat01 dhcpd: DHCPACK on 10.35.33.85 to 40:f2:e9:c6:7c:e6 via bond0
May  4 12:15:53 xcat01 dhcpd: DHCPOFFER on 10.35.33.85 to 40:f2:e9:c6:7c:e6 via bond0



(xcat01 is the xCAT server; testnode10 and 10.35.33.96 are the successful
node's hostname and IP address)
May  4 12:21:32 xcat01 dhcpd: DHCPOFFER on 10.35.33.96 to 40:f2:e9:c6:80:be via bond0
May  4 12:21:37 xcat01 dhcpd: Dynamic and static leases present for 10.35.33.96.
May  4 12:21:37 xcat01 dhcpd: Remove host declaration testnode10 or remove 10.35.33.96
May  4 12:21:37 xcat01 dhcpd: DHCPREQUEST for 10.35.33.96 (10.35.37.27) from 40:f2:e9:c6:80:be via bond0
May  4 12:21:37 xcat01 dhcpd: DHCPACK on 10.35.33.96 to 40:f2:e9:c6:80:be via bond0
May  4 12:21:37 xcat01 in.tftpd[8224]: RRQ from 10.35.33.96 filename xcat/xnba.kpxe
May  4 12:21:37 xcat01 in.tftpd[8225]: RRQ from 10.35.33.96 filename xcat/xnba.kpxe
May  4 12:21:37 xcat01 dhcpd: DHCPOFFER on 10.35.33.96 to 40:f2:e9:c6:80:be via bond0
May  4 12:21:37 xcat01 dhcpd: Dynamic and static leases present for 10.35.33.96.
May  4 12:21:37 xcat01 dhcpd: Remove host declaration testnode10 or remove 10.35.33.96
May  4 12:21:37 xcat01 dhcpd: DHCPREQUEST for 10.35.33.96 (10.35.37.27) from 40:f2:e9:c6:80:be via bond0
May  4 12:21:37 xcat01 dhcpd: DHCPACK on 10.35.33.96 to 40:f2:e9:c6:80:be via bond0

May  4 12:22:14 xcat01 dhcpd: DHCPOFFER on 10.35.33.96 to 40:f2:e9:c6:80:be via bond0
May  4 12:22:14 xcat01 dhcpd: Dynamic and static leases present for 10.35.33.96.
May  4 12:22:14 xcat01 dhcpd: Remove host declaration testnode10 or remove 10.35.33.96
May  4 12:22:14 xcat01 dhcpd: DHCPREQUEST for 10.35.33.96 (10.35.37.27) from 40:f2:e9:c6:80:be via bond0
May  4 12:22:14 xcat01 dhcpd: DHCPACK on 10.35.33.96 to 40:f2:e9:c6:80:be via bond0
May  4 12:22:21 xcat01 dhcpd: DHCPOFFER on 10.35.33.96 to 40:f2:e9:c6:80:be via bond0
May  4 12:22:21 xcat01 dhcpd: Dynamic and static leases present for 10.35.33.96.
May  4 12:22:21 xcat01 dhcpd: Remove host declaration testnode10 or remove 10.35.33.96
May  4 12:22:21 xcat01 dhcpd: DHCPREQUEST for 10.35.33.96 (10.35.37.27) from 40:f2:e9:c6:80:be via bond0
May  4 12:22:21 xcat01 dhcpd: DHCPACK on 10.35.33.96 to 40:f2:e9:c6:80:be via bond0

Best Regards,
Keiichi Suzuki.
----------------------------------------------------------------------

Then it is very likely that the root cause is that the failed node could not
get an IP address. What do you see in rcons for the failed node? You could
enable debug logging in syslog to see whether the dhcp and tftp logs give
any clue.

Thanks
Best Regards
----------------------------------------------------------------------
Wang Xiaopeng
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: wxp@...
Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
Haidian District Beijing P.R.China 100193



    ----- Original message -----
    From: "Keiichi Suzuki" <KEISUZU@...>
    To: xcat-user@...
    Cc:
    Subject: Re: [xcat-user] Fail to rinstall node on NextScale nx360m5
    Date: Tue, May 3, 2016 3:03 PM


    Dear Wang Xiaopeng,

    Thank you for your support.

    >You could ping the failed node, right? If yes, that means the
    >installer has got an IP and anaconda started working.

    No, I couldn't ping the failed node. At the same time, I could ping a
    node that installed successfully.

    >Then I am afraid there is an issue in your httpd config. Did you see
    >any error message in the httpd log?

    I checked /var/log/httpd/access_log and /var/log/httpd/error_log, but I
    couldn't find anything that looks like an error.

    The following are my httpd.conf settings.

    - /etc/httpd/conf/httpd.conf

    Timeout 120

    KeepAlive Off

    MaxKeepAliveRequests 100

    KeepAliveTimeout 15

    <IfModule prefork.c>
    StartServers 80
    MinSpareServers 50
    MaxSpareServers 200
    ServerLimit 512
    MaxClients 512
    MaxRequestsPerChild 8000
    </IfModule>

    <IfModule worker.c>
    StartServers 80
    MaxClients 512
    MinSpareThreads 50
    MaxSpareThreads 200
    ThreadsPerChild 50
    MaxRequestsPerChild 0
    </IfModule>

    Best Regards,
    Keiichi Suzuki.
    ---------------------------------------------------------------------
    You could ping the failed node, right? If yes, that means the installer
    has got an IP and anaconda started working.

    Then I am afraid there is an issue in your httpd config. Did you see any
    error message in the httpd log?

    Thanks
    Best Regards
    ----------------------------------------------------------------------
    Wang Xiaopeng
    IBM China System Technology Laboratory
    Tel: 86-10-82453455
    Email: wxp@...
    Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
Haidian District Beijing P.R.China 100193


    ----- Original message -----
    From: "Keiichi Suzuki" <KEISUZU@...>
    To: xcat-user@...
    Cc:
    Subject: Re: [xcat-user] Fail to rinstall node on NextScale nx360m5
    Date: Mon, May 2, 2016 9:42 AM


    Dear Wang Xiaopeng,

    Thank you for your support.

    >On every try, do exactly 30 nodes succeed while the rest fail?
    Yes, exactly: 33 nodes succeed and the rest (27 nodes) fail.

    >Could you try to manually select the install server from rcons on a
    >failed node to see whether it can continue?
    No, I can't continue, because the failed nodes stop at the following
    message during boot.


***************************************************************************************************
    Error downloading kickstart file
    Unable to download the kickstart file. Please modify the kickstart
    parameter below or press Cancel to proceed as an interactive installation.

    http://10.35.37.27:80/install/autoinst/testnode005

***************************************************************************************************
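
To check whether httpd itself drops requests under parallel load, one option is to request the kickstart URL many times at once and tally the results. The following is a minimal Python sketch under stated assumptions: the helper names `fetch_status` and `probe` are made up for this example, and proxies are bypassed on purpose because the install network is assumed to be reached directly. Run from the management node it only exercises httpd, not the network path to the compute nodes, so a clean result would point away from httpd rather than prove where the fault is.

```python
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Opener that ignores any http_proxy settings (assumption: the install
# server is reached directly, without a proxy).
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code of one GET, or the exception class name."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status such as 404 or 503.
        return exc.code
    except Exception as exc:
        # Connection refused or reset, timeout, DNS failure, and so on.
        return type(exc).__name__


def probe(url, parallel=50, timeout=10.0):
    """Issue `parallel` simultaneous GETs and count each outcome."""
    with ThreadPoolExecutor(max_workers=parallel) as pool:
        outcomes = pool.map(lambda _: fetch_status(url, timeout), range(parallel))
        return Counter(outcomes)


# Example use with the URL from the error message above:
# print(probe("http://10.35.37.27:80/install/autoinst/testnode005", parallel=50))
```

If every request comes back as 200, httpd is probably serving the kickstart file fine at that concurrency. Timeouts or connection errors in the tally would be worth comparing against the httpd error_log.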

    Best Regards,
    Keiichi Suzuki.
    ----------------------------------------------------------------------
    The configuration of your xCAT server is good enough to support 50 nodes.

    On every try, do exactly 30 nodes succeed while the rest fail?

    Could you try to manually select the install server from rcons on a
    failed node to see whether it can continue?

    Thanks
    Best Regards
    ----------------------------------------------------------------------
    Wang Xiaopeng
    IBM China System Technology Laboratory
    Tel: 86-10-82453455
    Email: wxp@...
    Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
Haidian District Beijing P.R.China 100193


    ----- Original message -----
    From: "Keiichi Suzuki" <KEISUZU@...>
    To: xcat-user@...
    Cc:
    Subject: Re: [xcat-user] Fail to rinstall node on NextScale nx360m5
    Date: Thu, Apr 28, 2016 6:48 PM


    Dear Wang Xiaopeng,

    Thank you for your support.

    My xCAT server's environment is as follows.

    - lenovo x3650 m5
    - Xeon E5-2620v3 2.4GHz 12cores
    - Memory 64GB

    - network bandwidth of install network
    - 4GbE (1GbE*4) bond0 (mode: 802.3ad (eth0, eth1, eth2, eth3))

    Is this machine's performance insufficient to install more than 50 nodes?

    Best Regards,
    Keiichi Suzuki.

    ---------------------------------------------------------------------

    Then it is a performance issue.

    What is the network bandwidth of your install network?
    What was the CPU and memory usage during the OS deployment?

    Thanks
    Best Regards
    ----------------------------------------------------------------------
    Wang Xiaopeng
    IBM China System Technology Laboratory
    Tel: 86-10-82453455
    Email: wxp@...
    Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
Haidian District Beijing P.R.China 100193



    ----- Original message -----
    From: "Keiichi Suzuki" <KEISUZU@...>
    To: xcat-user@...
    Cc:
    Subject: Re: [xcat-user] Fail to rinstall node on NextScale nx360m5
    Date: Thu, Apr 28, 2016 11:54 AM


    Dear Wang Xiaopeng,

    Thank you for your support.

    I was installing 50 nodes in parallel. (If feasible, I would like to
    install 300 nodes in parallel.)
    A node that failed in the parallel run installs successfully when it is
    installed individually (one node at a time).

    Best Regards,
    Keiichi Suzuki.

    ----------------------------------------------------------------------

    How many nodes were you installing in parallel?
    Does it succeed if you run the OS install for a failed node individually
    (one node at a time)?


    Thanks
    Best Regards
    ----------------------------------------------------------------------
    Wang Xiaopeng
    IBM China System Technology Laboratory
    Tel: 86-10-82453455
    Email: wxp@...
    Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road,
Haidian District Beijing P.R.China 100193



    ----- Original message -----
    From: "Keiichi Suzuki" <KEISUZU@...>
    To: xcat-user@...
    Cc:
    Subject: [xcat-user] Fail to rinstall node on NextScale nx360m5
    Date: Wed, Apr 27, 2016 11:47 AM


    I want to install SL6.7 on NextScale nx360m5 nodes.

    I can install SL6.7 on 1-30 nodes successfully, but installing more than
    30 nodes fails with the following message.

    (e.g. the error message on one of the failed nodes)
    Unable to download the kickstart file. Please modify the kickstart
    parameter below or press Cancel to proceed as an interactive installation.
    http://10.35.37.27:80/install/autoinst/testnode005

    Please advise what I should do to install successfully.

    The details follow.

    ***********************************************************
    1) Background / what I want to do
    - I want to install SL6.7 on 300 NextScale nx360m5 nodes using xCAT.
    - Installing the OS on 1-30 nodes succeeded.
    - Installing the OS on more than 30 nodes failed.

    2) Environment (xCAT server)
    - Lenovo system x3550 (Mem.64GB)
    - xCAT 2.11 (OS is RHEL7.2)

    I customized the following parameters.

    - cat /etc/sysconfig/nfs
    > RPCNFSDCOUNT=100


    - /proc/sys/net/core/rmem_default
    16777216

    - /proc/sys/net/core/rmem_max
    16777216

    - /proc/sys/net/core/wmem_default
    16777216

    - /proc/sys/net/core/wmem_max
    16777216


    - ulimit -a
    core file size (blocks, -c) unlimited
    data seg size (kbytes, -d) unlimited
    scheduling priority (-e) 0
    file size (blocks, -f) unlimited
    pending signals (-i) 16371
    max locked memory (kbytes, -l) unlimited
    max memory size (kbytes, -m) unlimited
    open files (-n) 65536
    pipe size (512 bytes, -p) 8
    POSIX message queues (bytes, -q) 819200
    real-time priority (-r) 0
    stack size (kbytes, -s) unlimited
    cpu time (seconds, -t) unlimited
    max user processes (-u) 16371
    virtual memory (kbytes, -v) unlimited
    file locks (-x) unlimited


    - /etc/httpd/conf/httpd.conf

    Timeout 120

    KeepAlive Off

    MaxKeepAliveRequests 100

    KeepAliveTimeout 15

    <IfModule prefork.c>
    StartServers 80
    MinSpareServers 50
    MaxSpareServers 200
    ServerLimit 512
    MaxClients 512
    MaxRequestsPerChild 8000
    </IfModule>

    <IfModule worker.c>
    StartServers 80
    MaxClients 512
    MinSpareThreads 50
    MaxSpareThreads 200
    ThreadsPerChild 50
    MaxRequestsPerChild 0
    </IfModule>
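
As a rough cross-check of the worker block above, the Python sketch below works out how many concurrent connections those values would allow. It rests on assumptions that should be verified against the Apache documentation for the installed version: that the worker MPM lowers MaxClients to the nearest multiple of ThreadsPerChild, and that ServerLimit keeps its default of 16 because the worker block does not set it. The function name `worker_capacity` is made up for this example. Also note that httpd 2.4 (the version normally shipped with RHEL 7) calls this directive MaxRequestWorkers and still accepts the old MaxClients name, and that only the block for the MPM actually loaded takes effect.

```python
def worker_capacity(max_clients, threads_per_child, server_limit=16):
    """Estimate the effective thread and process counts for the worker MPM.

    Assumption: MaxClients is rounded down to a multiple of ThreadsPerChild,
    and the number of child processes must not exceed ServerLimit.
    """
    processes = max_clients // threads_per_child
    effective_clients = processes * threads_per_child
    fits_server_limit = processes <= server_limit
    return effective_clients, processes, fits_server_limit


# Values from the worker.c block above: MaxClients 512, ThreadsPerChild 50.
effective, processes, fits = worker_capacity(512, 50)
print(effective, processes, fits)  # prints: 500 10 True
```

Under these assumptions the worker settings allow about 500 simultaneous connections, and the prefork block (ServerLimit 512, MaxClients 512) allows 512 processes. Either is well above the 50 nodes being installed, which is consistent with nothing showing up in the httpd logs.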




    3) Environment(xCAT client)
    - Lenovo NextScale nx360m5
    - SL6.7


    4) xCAT settings

    [root@... ~]# tabdump site
    #key,value,comments,disable
    blademaxp,"64",,
    domain,"test.jp",,
    fsptimeout,"0",,
    installdir,"/install",,
    ipmimaxp,"64",,
    ipmiretries,"3",,
    ipmitimeout,"2",,
    consoleondemand,"no",,
    master,"192.168.37.27",,
    nameservers,"192.168.37.27",,
    maxssh,"8",,
    ppcmaxp,"64",,
    ppcretry,"3",,
    ppctimeout,"0",,
    sharedtftp,"1",,
    SNsyncfiledir,"/var/xcat/syncfiles",,
    tftpdir,"/tftpboot",,
    xcatdport,"3001",,
    xcatiport,"3002",,
    xcatconfdir,"/etc/xcat",,
    timezone,"Asia/Tokyo",,
    useNmapfromMN,"no",,
    enableASMI,"no",,
    db2installloc,"/mntdb2",,
    databaseloc,"/var/lib",,
    sshbetweennodes,"ALLGROUPS",,
    dnshandler,"ddns",,
    vsftp,"y",,
    cleanupxcatpost,"no",,
    dhcpinterfaces,"bond0",,

    ***********************************************************

    Best Regards,
    Keiichi Suzuki.

------------------------------------------------------------------------------
    Find and fix application performance issues faster with Applications Manager
    Applications Manager provides deep performance insights into multiple tiers of
    your business applications. It resolves application problems quickly and
    reduces your MTTR. Get your free trial!
    https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
    _______________________________________________
    xCAT-user mailing list
    xCAT-user@...
    https://lists.sourceforge.net/lists/listinfo/xcat-user