Re: [xcat-user] makedns issue
Arif, yes, it makes sense. It's something of a new requirement, because the current xcat dns implementation cannot meet this case, and running the command directly on the SN is normally not recommended, though it works for you :)

To Bruce: for the implementation that sets up an individual dns server on the SN, I think this could be something like site.disjointdhcps. It would default to 0, meaning set up the dns server on the MN and a dns forwarder cache on the SNs; setting it to 1 would mean set up individual dns servers on the MN and the SNs. However, if the SN does not have an external network connection, then maybe it does not make sense to set site.forwarders to the dns server on the SN?

Another potential issue might be passing the /etc/hosts content from the MN to the SNs via XML in a scaling cluster. This does not seem to be the xcat strategy - the SN should get the resources from database on a file.

I'm now investigating whether there is any method inside dns to sync between dns servers, for example dig, or rndc, or something else. If so, we can keep the logic that sets up the dns server on the MN and then sync on the SN without /etc/hosts. Will try more.

Thx.
Jing

On 2013/10/26 02:02, Arif Ali m...@arif-ali.co.uk wrote:

Bruce/Jing,

Let me elaborate on another scenario for having multiple nameservers, with the SN being the primary choice. I run makehosts and makedns on all service nodes; in essence I was making the SN a temporary failover for as many of the services that the xCAT MN was providing as possible. The only services unable to fail over were the commands tied to the database, which is most of them.

In fact, at one of our customers, owing to the nature of the remote access, both the customer and we accidentally rebooted the MN, and jobs were still happily able to resolve names because the SN was the shadow host for the scheduler and had a named configuration.
We tend to find that the MN also tends to be the login node, so having a failover like this can be imperative.

I hope that makes sense.

--
Arif Ali
IRC: arif-ali at freenode
LinkedIn: http://uk.linkedin.com/in/arifali

On 25 October 2013 18:01, Bruce M Potter b...@us.ibm.com wrote:

Sun Jing, Arif's requirement is valid and I have heard it many times, so we need to come up with a solution. Maybe we need an option on makedns to configure all of the SN DNS instances the same way we configure the MN DNS: have all of the mappings from /etc/hosts (from the MN), and be configured with site.forwarders. In this mode makedns would be similar to makedhcp, in that it would spawn an xcatd request to each of the SNs to configure the DNS on the SNs.

Thoughts? Other ideas?

Bruce Potter
STSM, Linux AIX Cluster Development, IBM, Poughkeepsie, NY
Email: b...@us.ibm.com
Phone: external: 845-433-7073, internal: TL 293-7073

From: Jing CDL Sun sj...@cn.ibm.com
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 10/25/2013 11:39 AM
Subject: Re: [xcat-user] makedns issue

Arif, In xcat dns, the dns server is set up on the management node, while the service node only acts as a dns forwarder. That means that if a new host needs to be resolved, the service node forwards the dns request to the management node, so if the management node goes down, that will not work.

BTW, I have fixed bug #3853; the fix will be included in the xcat 2.8.3 release.
On 2013/10/25 16:57, Arif Ali m...@arif-ali.co.uk wrote:

Jing,

The purpose of having multiple entries, including the SN, in networks.nameservers is so that if the MN goes down, we still have an SN available for DNS and host resolution. I hope that makes sense. Would it make sense to allow this?

regards,
Arif

--
Arif Ali
IRC: arif-ali at freenode
LinkedIn: http://uk.linkedin.com/in/arifali

On 24 October 2013 04:24, Jing CDL Sun sj...@cn.ibm.com wrote:

[root@defiant ~]# tabdump -d networks|grep xcatmaster
gateway:The network gateway. It can be set to an ip address or the keyword xcatmaster, the keyword xcatmaster indicates the cluster-facing ip address configured on this management node or service node. Leaving this field blank means that there is no gateway for this network.
nameservers:A comma delimited list of DNS servers that each node in this network should use. This value will end up in the nameserver settings of the /etc/resolv.conf on each node in this network. If this attribute value is set to the IP address of an
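As a rough illustration of the nameservers description in the tabdump output above, the sketch below turns a comma-delimited value into the nameserver lines that would end up in a node's /etc/resolv.conf. The function name is hypothetical, the second address (172.20.0.2, standing in for an SN) is made up, and this is not xCAT's own code; it only shows why listing an SN after the MN gives each node a second resolver to try.

```python
def render_resolv_conf(nameservers: str) -> str:
    """Render resolv.conf 'nameserver' lines from a comma-delimited list."""
    ordered = []
    for entry in nameservers.split(","):
        address = entry.strip()
        # Skip empty entries and duplicates while keeping the original order,
        # because a resolver normally tries the listed servers in order.
        if address and address not in ordered:
            ordered.append(address)
    return "".join(f"nameserver {address}\n" for address in ordered)


if __name__ == "__main__":
    # Hypothetical value: MN first, then an SN as the fallback resolver.
    print(render_resolv_conf("172.20.0.1, 172.20.0.2"), end="")
```

With the MN listed first and an SN second, a node would only fall back to the SN when the MN stops answering, which is the failover behaviour Arif describes. It only helps if the SN can answer on its own rather than forwarding to the MN.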
[xcat-user] xnba/genesis dhcp timeout issue
I'm getting a dhcp timeout issue with xnba/genesis. On discovery, a node boots from the network, gets the xnba image, and then times out getting network information via dhcp. I'm running xCAT version 2.8.2, if it helps.

On the xcat management server side, I see the initial sequence of DHCP messages for the node: DHCPDISCOVER, DHCPOFFER, DHCPREQUEST, then DHCPACK. After the node downloads the xnba image and loads it, it tries to do another dhcp request, and that request times out. On the xcat side, I then see just a DHCPDISCOVER and a DHCPOFFER, but nothing further.

Roland

___
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user
Re: [xcat-user] re-discovering node after motherboard replacement
Should be ready to be nodeset to do something else. 'standby' in this case means 'completed everything that was supposed to happen, awaiting instructions'.

If you put in hard drives with the os still working: nodeset node boot
If the hard drive needs a reinstall: nodeset node osimage
If stateless: nodeset node netboot

From: Damir Krstic damir.krs...@gmail.com
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 10/28/2013 10:42 AM
Subject: [xcat-user] re-discovering node after motherboard replacement

One of our GPU nodes had a bad motherboard and we had it replaced a few days ago. After the motherboard was replaced we ran the rmnodecfg script and the node was re-discovered:

mgt xCAT node discovery: qgpu0020 has been discovered

I can see the new MAC address in the mac table. However, we are running into issues reprogramming the BMC. It never finishes. The console screen displays:

Received request to retry in a bit, will call xCAT back in amount seconds.

lsdef on this node shows that the node is in standby mode (not sure what that means):

chain=runcmd=bmcsetup,standby
currchain=standby
currstate=standby

Here is the content of the pxelinux file for this node:

#standby
DEFAULT xCAT
LABEL xCAT
KERNEL xcat/genesis.kernel.x86_64
APPEND initrd=xcat/genesis.fs.x86_64.gz quiet console=tty0 console=ttyS0,115200 xcatd=172.20.0.1:3001 destiny=standby nouveau.modeset=0
IPAPPEND 2

I hope you can help.
Damir
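To see at a glance what a node's pxelinux file is telling genesis to do, the APPEND line can be split into its kernel parameters. The following is a minimal sketch, not part of xCAT; the function name is hypothetical, and the input is the APPEND line quoted in the message above.

```python
def parse_append(line: str) -> dict:
    """Split a pxelinux APPEND line into {parameter: [values]}.

    Parameters that appear more than once (such as console=) keep every
    value in order; bare flags (such as quiet) map to an empty list.
    """
    tokens = line.split()
    if tokens and tokens[0].upper() == "APPEND":
        tokens = tokens[1:]
    params = {}
    for token in tokens:
        key, sep, value = token.partition("=")
        values = params.setdefault(key, [])
        if sep:
            values.append(value)
    return params


if __name__ == "__main__":
    append_line = (
        "APPEND initrd=xcat/genesis.fs.x86_64.gz quiet console=tty0 "
        "console=ttyS0,115200 xcatd=172.20.0.1:3001 destiny=standby "
        "nouveau.modeset=0"
    )
    params = parse_append(append_line)
    print("destiny:", params["destiny"])
    print("xcatd:", params["xcatd"])
```

For the file shown in the thread, destiny is standby, which matches the currstate reported by lsdef.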
Re: [xcat-user] re-discovering node after motherboard replacement
OK - I'll try booting from the hard drive and see if that works, but... the BMC never got programmed. I can't reach this node with any of the rcons/rpower commands, and if I try to telnet to its bmc port it fails. I'll keep poking to see if there are any other errors related to the programming of the BMC.

Thanks,
Damir

On Mon, Oct 28, 2013 at 9:46 AM, Jarrod B Johnson jbjoh...@us.ibm.com wrote:

Should be ready to be nodeset to do something else. 'standby' in this case means 'completed everything that was supposed to happen, awaiting instructions'.

If you put in hard drives with the os still working: nodeset node boot
If the hard drive needs a reinstall: nodeset node osimage
If stateless: nodeset node netboot
Re: [xcat-user] re-discovering node after motherboard replacement
Perhaps the cmos settings need to be adjusted. If you have a shared nic (eth0/imm), make sure the appropriate settings in cmos indicate a shared nic. What is the server machine type?

Thomas Alandt
WW Test Engineer Complex Solutions
IBM-ISC

From: Damir Krstic damir.krs...@gmail.com
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 10/28/2013 11:01 AM
Subject: Re: [xcat-user] re-discovering node after motherboard replacement

OK - I'll try booting from the hard drive and see if that works, but... the BMC never got programmed. I can't reach this node with any of the rcons/rpower commands, and if I try to telnet to its bmc port it fails. I'll keep poking to see if there are any other errors related to the programming of the BMC.

Thanks,
Damir
Re: [xcat-user] re-discovering node after motherboard replacement
I'll bet that ipmi.bmcport is not set. If that is the case, set:

nodech noderange ipmi.bmcport=0

And from now on, the IMM will move to the right port automatically during bmcsetup.

From: Damir Krstic damir.krs...@gmail.com
To: xCAT Users Mailing list xcat-user@lists.sourceforge.net
Date: 10/28/2013 11:18 AM
Subject: Re: [xcat-user] re-discovering node after motherboard replacement

Machine type is 7912AC1. Shared nic... I can ssh to it while genesis is running but can't telnet to the shared bmc port. I'll go and look at the cmos settings to see if anything out of the ordinary is set in the IMM section.

Damir

On Mon, Oct 28, 2013 at 10:05 AM, Jarrod B Johnson jbjoh...@us.ibm.com wrote:

oh how is it wired? can you ssh to the node? is bmc dedicated or shared?

nodels node ipmi.bmcport
ssh node ipmitool lan print 1
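When checking a BMC with the ipmitool lan print 1 command suggested above, it can help to pull the "key : value" fields into a dictionary so the address settings can be compared against what xCAT expects. This is only a sketch: the function name is hypothetical, and the sample output below is invented for illustration (it is not output from the node in this thread), though the field labels follow the usual ipmitool layout.

```python
def parse_lan_print(output: str) -> dict:
    """Parse 'ipmitool lan print' style output into a {field: value} dict."""
    fields = {}
    for line in output.splitlines():
        # Split on the first colon only, so MAC addresses stay intact.
        key, sep, value = line.partition(":")
        key = key.strip()
        # Continuation lines have an empty key; keep the first value per field.
        if sep and key:
            fields.setdefault(key, value.strip())
    return fields


# Invented sample output, for illustration only.
SAMPLE_OUTPUT = """\
IP Address Source       : Static Address
IP Address              : 172.29.0.20
Subnet Mask             : 255.255.0.0
MAC Address             : 00:11:22:33:44:55
"""

if __name__ == "__main__":
    lan = parse_lan_print(SAMPLE_OUTPUT)
    print(lan["IP Address"], lan["MAC Address"])
```

If the IP Address field is still unset or at a factory default after bmcsetup has run, that would be consistent with the BMC never having been programmed.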
[xcat-user] dhcp timeout in xnba
I'm getting a dhcp timeout issue with xnba/genesis. On discovery, a node boots from the network and gets the xnba image. It then times out getting network information via dhcp from within xnba. I'm running xCAT version 2.8.2, if it helps.

On the xcat management server side, I see the initial sequence of DHCP messages for the node: DHCPDISCOVER, DHCPOFFER, DHCPREQUEST, then DHCPACK. After the node downloads the xnba image and loads it, it tries to do another dhcp request, and that request times out. On the xcat side, I then see just a DHCPDISCOVER and a DHCPOFFER, but nothing further.

Any help would be appreciated.
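The symptom described above (a second exchange that stops after DHCPOFFER) can be spotted in the server log with a small script. The sketch below assumes ISC dhcpd-style syslog lines in which the message type appears before the client MAC address; the log lines and MAC addresses in the example are made up, and the function name is hypothetical.

```python
import re

# Message type followed, later on the same line, by a client MAC address.
MSG_RE = re.compile(
    r"\b(DHCPDISCOVER|DHCPOFFER|DHCPREQUEST|DHCPACK)\b.*?"
    r"\b((?:[0-9a-f]{2}:){5}[0-9a-f]{2})\b",
    re.IGNORECASE,
)


def stalled_handshakes(lines):
    """Return MACs whose most recent DHCP message never got past the offer."""
    last_seen = {}
    for line in lines:
        match = MSG_RE.search(line)
        if match:
            last_seen[match.group(2).lower()] = match.group(1).upper()
    return sorted(
        mac for mac, msg in last_seen.items()
        if msg in ("DHCPDISCOVER", "DHCPOFFER")
    )


if __name__ == "__main__":
    # Invented log excerpt: the first node completes once, then stalls.
    sample = [
        "Oct 28 13:30:01 mn dhcpd: DHCPDISCOVER from 00:11:22:33:44:55 via eth0",
        "Oct 28 13:30:02 mn dhcpd: DHCPOFFER on 172.20.1.5 to 00:11:22:33:44:55 via eth0",
        "Oct 28 13:30:02 mn dhcpd: DHCPREQUEST for 172.20.1.5 from 00:11:22:33:44:55 via eth0",
        "Oct 28 13:30:02 mn dhcpd: DHCPACK on 172.20.1.5 to 00:11:22:33:44:55 via eth0",
        "Oct 28 13:30:20 mn dhcpd: DHCPDISCOVER from 00:11:22:33:44:55 via eth0",
        "Oct 28 13:30:21 mn dhcpd: DHCPOFFER on 172.20.1.5 to 00:11:22:33:44:55 via eth0",
    ]
    print(stalled_handshakes(sample))
```

A MAC that shows up in the result has sent a DHCPDISCOVER and been offered an address, but the server never saw the matching DHCPREQUEST, which points at the client side or the network path rather than at the dhcpd configuration.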
Re: [xcat-user] dhcp timeout in xnba
We had a problem where the switch ports had been properly configured with spanning-tree edge mode for the first 32 ports, but when we added new nodes, we didn't notice that ports 33-48 on the switch had been left at the default. Edge mode comes up quickly, but the default took too long to begin forwarding on the port.

This might not be your problem, but we spent weeks trying different firmware before someone suggested we look at the switch configuration.

-- ddj

On Oct 28, 2013, at 1:35 PM, Roland Santos wrote:

I'm getting a dhcp timeout issue with xnba/genesis. On discovery, a node boots from the network and gets the xnba image. It then times out getting network information via dhcp from within xnba. I'm running xCAT version 2.8.2, if it helps.

On the xcat management server side, I see the initial sequence of DHCP messages for the node: DHCPDISCOVER, DHCPOFFER, DHCPREQUEST, then DHCPACK. After the node downloads the xnba image and loads it, it tries to do another dhcp request, and that request times out. On the xcat side, I then see just a DHCPDISCOVER and a DHCPOFFER, but nothing further.

Any help would be appreciated.
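To put rough numbers on the spanning-tree point above: with classic spanning-tree defaults a port spends about 15 seconds listening and 15 seconds learning before it forwards traffic, so any DHCP attempt made in that window is lost. The sketch below is illustrative only. The function names and the client retry offsets are hypothetical (they are not measured xnba values), and treating an edge port as forwarding immediately is a simplification.

```python
def forwarding_delay(listening_s=15.0, learning_s=15.0, edge_port=False):
    """Seconds after link-up before the switch port forwards traffic."""
    # Simplification: an edge-mode port is treated as forwarding at once.
    return 0.0 if edge_port else listening_s + learning_s


def attempts_after_forwarding(attempt_offsets_s, delay_s):
    """Return the DHCP attempts (seconds after link-up) that can get through."""
    return [t for t in attempt_offsets_s if t >= delay_s]


if __name__ == "__main__":
    # Hypothetical retry schedule for a network boot client.
    retries = [0, 4, 12, 28]
    print("default port:", attempts_after_forwarding(retries, forwarding_delay()))
    print("edge port:   ", attempts_after_forwarding(
        retries, forwarding_delay(edge_port=True)))
```

Under these assumed numbers every attempt on a default port falls inside the 30 second window, while all of them get through on an edge port, which matches the difference ddj saw between ports 1-32 and ports 33-48.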