Re: [xcat-user] Unrecognized directive (dest=) during nodediscovery

2019-05-23 Thread Andrew Loftus
Thanks for all your help.

For the "rinstall" method, I have to manually enter the MAC for the node.
Then all works fine and it drops me to a shell. Interestingly, the "doxcat"
parts "getcert" and "getdestiny" still run, but this time they return valid
data (and still no log of any activity recorded on the master node during
either "getcert" or "getdestiny").  Seeing getdestiny work as expected in
this case, I now think the network is all fine and perhaps I'm missing some
key part of the nodediscovery setup on the master node.

Recall, I'm trying to get sequential nodediscovery working properly, which,
I think, means xcat gets the MAC address and assigns it to the node being
discovered (as opposed to manually adding the MAC).

I'm suspicious as to why "getdestiny" returns data when using the
"rinstall" method, but doesn't return data when doing nodediscovery.

What should the "nodediscoverstart" command do?

How can I check that nodediscoverstart did (or did not do) the right thing
on the master node?
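
One way to inspect the discovery state on the master node is sketched below. It assumes the stock xCAT commands `nodediscoverstatus` and `nodediscoverls` are installed and on the PATH. The wrapper name `show_discovery_state` is made up for this example, and the exact output wording depends on the xCAT version.

```shell
#!/usr/bin/env bash
# show_discovery_state
# Hypothetical wrapper: print the current discovery status and the nodes
# discovered so far. Returns 2 if the xCAT commands are not available.
show_discovery_state() {
    local cmd
    for cmd in nodediscoverstatus nodediscoverls; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing: $cmd (run this on the xCAT management node)"
            return 2
        fi
    done
    # Reports whether a discovery process is currently running.
    nodediscoverstatus
    # Lists nodes found by sequential discovery, with details.
    nodediscoverls -t seq -l
}
```

Running `show_discovery_state` right after `nodediscoverstart` should show whether sequential discovery is reported as running, and the list should grow as nodes are discovered.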

On Thu, May 23, 2019 at 9:57 PM Song BJ Yang  wrote:

> Hi,
>
> You can run `rinstall [CN] shell` to provision the node to genesis shell,
> and then run `/bin/dodiscovery` manually.

Re: [xcat-user] Unrecognized directive (dest=) during nodediscovery

2019-05-23 Thread Song BJ Yang
Hi,
 
You can run `rinstall [CN] shell` to provision the node into the genesis shell, and then run `/bin/dodiscovery` manually.

--
YANG Song (杨嵩)
IBM China System Technology Laboratory
Tel: 86-10-82452903
Email: yang...@cn.ibm.com
Address: Building 28, ZhongGuanCun Software Park,
No.8, Dong Bei Wang West Road, Haidian District Beijing 100193, PRC

北京市海淀区东北旺西路8号中关村软件园28号楼
邮编: 100193
 
 
> - Original message -
> From: Andrew Loftus
> To: xCAT Users Mailing list
> Cc:
> Subject: [EXTERNAL] Re: [xcat-user] Unrecognized directive (dest=) during nodediscovery
> Date: Fri, May 24, 2019 4:10 AM
>
> I copied the XML request from the getdestiny bash script and built the node by hand (from ISO) and sent this request to the master node.
> On the master node, in /var/log/xcat/cluster.log, I see:
> DEBUG xcatd: connection from testvm30
> DEBUG xcatd: open new process : xcatd SSL: getdestiny for testvm30
> DEBUG xcatd: dispatch request 'getdestiny ' to plugin 'destiny'
> DEBUG xcatd: handle request 'getdestiny' by plugin 'destiny''s process_request
> DEBUG destiny->process_request: starting getdestiny...
> DEBUG destiny->process_request: getdestiny...
> INFO xcat.updatestatus - testvm30: changing status=shell
> DEBUG destiny->process_request: processing is finished for getdestiny
> DEBUG xcatd: close connection with testvm30
>
> For me, this confirms networking and firewalls are okay.
> It also looks like discovery is okay on the master node.
> It seems to me that something is not working in the genesis kernel.
>
> How can I customize the "doxcat" script (or customize the genesis kernel) so I can force it to drop to a shell and then troubleshoot from within the live genesis kernel?



___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user


Re: [xcat-user] confignetwork and localhost hostname

2019-05-23 Thread Yuan Y Bai
Hi Roosen, Nicolas
 
I am glad that `confignetwork` can work well in your postbootscripts.
 
`confignetwork` has some restrictions when run from postscripts in 2.14.6, but it works well in postbootscripts. I have enhanced it to run in postscripts in the 2.15 branch. If you want to run it in postscripts, you can download the latest `confignetwork`, `configeth`, and `nicutils.sh` from the 2.15 master branch.
 
 
Best Regards
--
Yuan Bai (白媛)

CSTL HPC System Management Development
Tel:86-10-82451401
E-mail: by...@cn.ibm.com
Address: IBM ZGC Campus. Ring Building 28,
ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District,
Beijing P.R.China 100193

IBM环宇大厦
北京市海淀区东北旺西路8号,中关村软件园28号楼
邮编:100193
 
 
> - Original message -
> From: "Roosen, Nicolas"
> To: "xcat-user@lists.sourceforge.net"
> Cc:
> Subject: [EXTERNAL] Re: [xcat-user] confignetwork and localhost hostname
> Date: Thu, May 23, 2019 6:46 PM
>
> Hi,
>
> On 5/23/19 12:28 PM, Yuan Y Bai wrote:
> > Hi Nicolas
> >
> > Based on your log, you use `confignics`. `confignics` cannot configure bond.
> >
> > Could you try to use like the following command to configure bond?
> >
> >     chdef cn1  postbootscripts="otherpkgs,confignetwork -s" postscripts="syslog,remoteshell,syncfiles"
>
> yes you're right, I was mixing the two methods.
>
> It still doesn't work when I use the "confignetwork" in the postscripts.
> But postbootscript is fine for me.
>
> So I did some tests recently, and it almost works (see below why) when
> provisioning the node using the "postbootscript" method.
>
> Now I have an issue with the Ethernet driver (i40e / i40iw on a Intel
> X722 card) which does a core dump, but that's another story :-/
>
> Thanks for you help.
> Nicolas

Re: [xcat-user] Unrecognized directive (dest=) during nodediscovery

2019-05-23 Thread Andrew Loftus
I copied the XML request from the getdestiny bash script and built the node
by hand (from ISO) and sent this request to the master node.
On the master node, in /var/log/xcat/cluster.log, I see:
DEBUG xcatd: connection from testvm30
DEBUG xcatd: open new process : xcatd SSL: getdestiny for testvm30
DEBUG xcatd: dispatch request 'getdestiny ' to plugin 'destiny'
DEBUG xcatd: handle request 'getdestiny' by plugin 'destiny''s process_request
DEBUG destiny->process_request: starting getdestiny...
DEBUG destiny->process_request: getdestiny...
INFO xcat.updatestatus - testvm30: changing status=shell
DEBUG destiny->process_request: processing is finished for getdestiny
DEBUG xcatd: close connection with testvm30

For me, this confirms networking and firewalls are okay.
It also looks like discovery is okay on the master node.
It seems to me that something is not working in the genesis kernel.

How can I customize the "doxcat" script (or customize the genesis kernel)
so I can force it to drop to a shell and then troubleshoot from within the
live genesis kernel?
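
To repeat this check for other nodes, the relevant lines can be filtered out of the log. This is a minimal sketch that assumes the log format shown in the excerpt above. The function name `destiny_events` is made up for this example, and the default path is the one mentioned in this message.

```shell
#!/usr/bin/env bash
# destiny_events NODE [LOGFILE]
# Print the log lines that show a getdestiny request for NODE and the
# status change that followed it. The patterns come from the excerpt in
# this message and may need adjusting for other xCAT versions. The match
# is a plain substring match, so "testvm3" would also match "testvm30".
destiny_events() {
    local node=$1
    local log=${2:-/var/log/xcat/cluster.log}
    grep -E "getdestiny for ${node}|${node}: changing status=" "$log"
}

# Example:
#   destiny_events testvm30
```

For the excerpt above, `destiny_events testvm30` should print the `getdestiny for testvm30` line and the `changing status=shell` line. No output for a node would be consistent with the master never receiving its request.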

On Thu, May 23, 2019 at 1:50 PM Loftus, Andrew J 
wrote:

> Thank you for pointing me to ` journalctl -u xcatd -f`, that shows me
> that xcat/xnba.kpxe is requested (and I assume provided to the booting
> node).
> However, after that, I see no more activity on the xcat master. I would
> expect activity recorded on the xcat master at the steps:
> xcat.genesis.doxcat: Getting initial certificate --> 172.30.18.6:3001
> xcat.genesis.doxcat: Running getdestiny --> 172.30.18.6:3001
>
> Both of those commands above seem to timeout with no apparent response
> from the xcat master. The ip address and port specified are correct for the
> xcat master.
> I have ensured that iptables is open for those ports.
> I have ensure that there is a process listening on port 3001.
>
> What else should I check?
>
> Can I get more debug info from the genesis kernel?
>
> Can I get more debug info from the xcat master?


Re: [xcat-user] Unrecognized directive (dest=) during nodediscovery

2019-05-23 Thread Andrew Loftus
Thank you for pointing me to `journalctl -u xcatd -f`. It shows me that
xcat/xnba.kpxe is requested (and, I assume, provided to the booting node).
However, after that, I see no more activity on the xcat master. I would
expect activity recorded on the xcat master at the steps:
xcat.genesis.doxcat: Getting initial certificate --> 172.30.18.6:3001
xcat.genesis.doxcat: Running getdestiny --> 172.30.18.6:3001

Both of those commands seem to time out with no apparent response from
the xcat master. The IP address and port specified are correct for the xcat
master.
I have ensured that iptables is open for those ports.
I have ensured that there is a process listening on port 3001.

What else should I check?

Can I get more debug info from the genesis kernel?

Can I get more debug info from the xcat master?
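
A listening process and an open iptables rule on the master do not by themselves show that the port is reachable from the provisioning network. Below is a minimal sketch of a reachability check to run from another host on that network. It assumes bash (for its `/dev/tcp` redirection) and coreutils `timeout` are available; the function name `check_tcp` is made up for this example.

```shell
#!/usr/bin/env bash
# check_tcp HOST PORT [SECONDS]
# Try to open a TCP connection and report the result. A failure can mean
# the connection was refused, filtered, or timed out.
check_tcp() {
    local host=$1 port=$2 wait=${3:-3}
    # In a redirection, bash treats /dev/tcp/HOST/PORT as "connect to HOST:PORT".
    if timeout "$wait" bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "reachable: ${host}:${port}"
    else
        echo "unreachable: ${host}:${port}"
        return 1
    fi
}

# Example, using the address and port from the log lines above:
#   check_tcp 172.30.18.6 3001
```

If this reports the port as reachable from the same subnet as the booting node, the timeouts are less likely to be a plain firewall problem.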

On Thu, May 23, 2019 at 4:51 AM Song BJ Yang  wrote:

> Hi,
>
> You can watch the output of `journalctl -u xcatd -f` in another session
> during nodediscovery
>
> there are some similar issues reported before, see
> https://sourceforge.net/p/xcat/mailman/search/?q=Unrecognized+directive+ ,
> you can simply go through them for any hint


Re: [xcat-user] confignetwork and localhost hostname

2019-05-23 Thread Roosen, Nicolas
Hi,

On 5/23/19 12:28 PM, Yuan Y Bai wrote:
> Hi Nicolas
>  
> Based on your log, you use `confignics`. `confignics` cannot configure bond.
>  
> Could you try to use like the following command to configure bond?
> 
>     chdef cn1  postbootscripts="otherpkgs,confignetwork -s"
> postscripts="syslog,remoteshell,syncfiles"
>  
>  

Yes, you're right, I was mixing the two methods.

It still doesn't work when I use "confignetwork" in the postscripts.
But postbootscripts is fine for me.

So I did some tests recently, and it almost works (see below why) when
provisioning the node using the "postbootscripts" method.

Now I have an issue with the Ethernet driver (i40e / i40iw on an Intel
X722 card) which does a core dump, but that's another story :-/

Thanks for your help.
Nicolas


Re: [xcat-user] confignetwork and localhost hostname

2019-05-23 Thread Yuan Y Bai
Hi Nicolas
 
Based on your log, you are using `confignics`. `confignics` cannot configure a bond.

Could you try a command like the following to configure the bond?
 
    chdef cn1  postbootscripts="otherpkgs,confignetwork -s" postscripts="syslog,remoteshell,syncfiles"
 
 
 
 
 
> - Original message -
> From: "Roosen, Nicolas"
> To: "xcat-user@lists.sourceforge.net"
> Cc:
> Subject: [EXTERNAL] Re: [xcat-user] confignetwork and localhost hostname
> Date: Wed, May 22, 2019 3:51 PM
>
> On 5/22/19 8:52 AM, Yuan Y Bai wrote:
> > Hi Roosen,
> >
> > Could you try to use `confignetwork -s` instead of `confignetwork` in
> > your postscripts?
> >
> > You can use this command to change your postscripts:
> >
> >     chdef node1 postscripts="syslog,remoteshell,syncfiles,confignetwork -s"
> >
> > I think you use install NIC as one of bond slaves. `confignetwork -s`
> > can configure hostname during configure install NIC, after that, it
> > start to create bond.
>
> Thanks for the suggestion. I added the "-s" switch, still the issue is
> the same.
>
> In the logs I see that the "bonding" module fails to load when
> provisioning, maybe I have to add this module somewhere (initramfs ?).
>
> rt..: confignics
> confignics on node1: config install nic:0, remove: 0, iba ports: 1
> ib0!10.148.251.11
> bond0!1.2.3.4
> bond1!6.7.8.9
> confignics on node1: unknown nic type for bond0: 1.2.3.4 .
> confignics on node1: unknown nic type for bond1: 6.7.8.9 .
> confignics on node1: executed script: configib for nics: ib0, ports: 1
> bond0!BONDING_OPTS=mode=2
> bond1!MTU=9000
> ...: confignics return with 1
> [...]
> configure nic and its device : bond0 enp195s0f0@enp195s0f1
> type=ethernet
> ond0".
> [E]:Error: Fail to load kernel module "bonding"
> [I]: >>
> ./nicutils.sh: line 1391: /sys/class/net/bonding_masters: Permission denied
> [E]:Error: stage 0: Fail to create bond device "bond0"
>
> Thanks.
> Nicolas
>
> >     - Original message -
> >     From: "Roosen, Nicolas"
> >     To: "xcat-user@lists.sourceforge.net"
> >     Cc:
> >     Subject: [EXTERNAL] [xcat-user] confignetwork and localhost hostname
> >     Date: Tue, May 21, 2019 8:34 PM
> >
> >     Hello, on xCAT 2.14.6 w/ RHEL 7.6, I have this weird issue: after
> >     provisioning the hostname is not correctly set, it stays to "localhost".
> >
> >     This happened since I added the "confignetwork" script to the node
> >     definition (to setup bonded interfaces).
> >
> >     Here are some details:
> >
> >     lsdef -t node -o node1
> >
> >     Object name: node1
> >         arch=x86_64
> >         currchain=boot
> >         currstate=install rhels7.6-x86_64-compute
> >         groups=all
> >         installnic=mac
> >         ip=1.2.3.4
> >         mac=08:00:00:00:00:00
> >         mgt=none
> >         netboot=xnba
> >         nicdevices.bond0=enp195s0f0|enp195s0f1
> >         nicdevices.bond1=enP2p193s0|enP3p65s0
> >         nicextraparams.bond0=BONDING_OPTS=mode=2
> >         nicextraparams.bond1=MTU=9000
> >         nicips.bond0=1.2.3.4
> >         nicips.bond1=6.7.8.9
> >         nicnetworks.bond0=1_1_3_4-255_255_0_0
> >         nicnetworks.bond1=6_7_8_9-255_255_255_0
> >         nictypes.enP3p65s0=ethernet
> >         nictypes.bond0=bond
> >         nictypes.bond1=bond
> >         nictypes.enp195s0f1=ethernet
> >         nictypes.enP2p193s0=ethernet
> >         nictypes.enp195s0f0=ethernet
> >         os=rhels7.6
> >         postbootscripts=otherpkgs
> >         postscripts=syslog,remoteshell,syncfiles,confignetwork
> >         primarynic=mac
> >         profile=""
> >         provmethod=rhels7.6-x86_64-install-node
> >         routenames=defaultroute
> >         status=booted
> >         statustime=05-20-2019 17:42:15
> >         updatestatus=synced
> >         updatestatustime=05-20-2019 14:31:18
> >
> >     lsdef -t osimage -o rhels7.6-x86_64-install-node
> >
> >     Object name: rhels7.6-x86_64-install-node
> >         addkcmdline=earlyprintk=ttyS0,115200 console=tty0 console=ttyS0,115200
> >         imagetype=linux
> >         osarch=x86_64
> >         osdistroname=rhels7.6-x86_64
> >         osname=Linux
> >         osvers=rhels7.6
> >         otherpkgdir=/install/post/otherpkgs/rhels7.6/x86_64
> >         otherpkglist=/install/custom/install/rh/sdflex.rhels7.otherpkgs.pkglist
> >         partitionfile=/install/custom/install/rh/sdflexparitions
> >         pkgdir=/install/rhels7.6/x86_64,/install/post/otherpkgs/rhels7.6/x86_64
> >         pkglist=/opt/xcat/share/xcat/install/rh/sdflex.rhels7.pkglist
> >         profile=""
> >         provmethod=install
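
The `Fail to load kernel module "bonding"` error quoted above can be narrowed down on the node itself. This is a minimal sketch, assuming a Linux node where `/proc/modules` lists the loaded modules. The function name `module_loaded` is made up for this example, and a driver built into the kernel would not appear in that list.

```shell
#!/usr/bin/env bash
# module_loaded NAME [MODULES_FILE]
# Succeed if NAME appears as a loaded module. Each line of /proc/modules
# starts with the module name followed by a space.
module_loaded() {
    local name=$1 list=${2:-/proc/modules}
    grep -q "^${name} " "$list"
}

# Example on the provisioned node:
#   module_loaded bonding || modprobe bonding
```

Running the check from within the postscript environment and again after first boot may show whether the module is only unavailable at postscript time.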

Re: [xcat-user] Unrecognized directive (dest=) during nodediscovery

2019-05-23 Thread Song BJ Yang
Hi,
 
You can watch the output of `journalctl -u xcatd -f` in another session during nodediscovery.

Some similar issues have been reported before; see https://sourceforge.net/p/xcat/mailman/search/?q=Unrecognized+directive+ . You can go through them for hints.

> - Original message -
> From: Andrew Loftus
> To: xCAT Users Mailing list
> Cc:
> Subject: [EXTERNAL] [xcat-user] Unrecognized directive (dest=) during nodediscovery
> Date: Thu, May 23, 2019 6:58 AM
>
> Setting up a new test cluster and getting the following on the compute node during nodediscovery:
>
> xcat.genesis.doxcat: Running getdestiny --> 172.30.18.6:3001
>
> xcat.genesis.doxcat: Received destiny=
> xcat.genesis.doxcat: The destiny=, destiny parameters=
> xcat.genesis.doxcat: Unrecognized directive (dest=)
> xcat.genesis.doxcat: ... Will retry xCAT in 40 seconds
> xcat.genesis.doxcat: ... Will retry xCAT in 30 seconds
>
> There are no "discovery" type messages in /var/log/messages or in /var/log/xcat/cluster.log.
>
> Where to look for more troubleshooting information?
>
> Version info:
> # lsxcatd -a
> Version 2.14.6 (git commit 7c787c499ebca7a98d6fcdb3577f701ac16d7ab3, built Fri Mar 29 03:19:08 EDT 2019)
> This is a Management Node
> dbengine=SQLite

