We have had similar issues in the past, and the cause is often networking, particularly DNS. The two examples that come to mind are:
- DHCP handing out the wrong address for DNS. We have had a problem where it
  gives out the 'outside' IP address of the xCAT server, rather than the
  'inside' (i.e. node-facing) interface.
- A node has multiple interfaces in different networks, with the interfaces
  defined in the nics table, but no nichostnamesuffixes or
  nichostnameprefixes are set. When makehosts is run, you get multiple
  entries for the node, on different IPs. When makedns is run, you end up
  with the host resolving to the last entry.

For example, you might have your ethernet (for management and provisioning)
and IPoIB interfaces defined in the nics table, but no prefixes or suffixes.
When you run makehosts, you get:

10.20.1.10 node1 (corresponds to ethernet)
10.30.1.10 node1 (corresponds to IB)

When you run 'makedns node1', you find that 'host node1' resolves to the IB
interface. With the right suffix, you instead get:

10.20.1.10 node1 (corresponds to ethernet)
10.30.1.10 node1-ib (corresponds to IB)

and the problem goes away.

On Wed, 10 Apr 2019 at 14:38, Victor Hu <[email protected]> wrote:

> Hi Iman,
>
> Can you let us know what the issue was with your networks table? Typo or
> something else?
> This would help us have more data and also look into whether we need to
> have some specific log message that would help the end user detect this
> problem.
>
> We have also seen similar `dest=` in the past and having more data about
> this would be helpful.
>
> Thanks,
> Victor
>
>
> ----- Original message -----
> From: Imam Toufique <[email protected]>
> To: xCAT Users Mailing list <[email protected]>
> Cc:
> Subject: Re: [xcat-user] nodiscovery issues - please help!
> Date: Wed, Apr 10, 2019 12:55 AM
>
> No, thank you! You guys are one of the best open source tool support I
> have experienced!
>
> --imam
>
> On Tue, Apr 9, 2019 at 8:58 PM Bin XA Xu <[email protected]> wrote:
>
> That's great, and thanks for using xcat and contribute to it with your
> questions and requirements.
>
> Bin Xu
> HPC Software Development
> Software Defined Infrastructure, IBM Systems
> Phone: 86-010-82454067
> E-mail: [email protected]
>
>
>
> ----- Original message -----
> From: Imam Toufique <[email protected]>
> To: xCAT Users Mailing list <[email protected]>
> Cc:
> Subject: Re: [xcat-user] nodiscovery issues - please help!
> Date: Wed, Apr 10, 2019 9:19 AM
>
> Hi,
>
> problem resolved, I had an issue with the networks table. All is well and
> good.
>
> thanks for the tip on upgrading xcat to 2.14.6.
>
> --imam
>
> On Tue, Apr 9, 2019 at 12:30 PM Imam Toufique <[email protected]> wrote:
>
> Ok, so, upgrading to 2.14.6 fixed all the malformed URL's in dhcpd.conf
> file, but discovery seems to be still broken.
>
> I get the following when I attempt a node build, after the initial xnba
> HTTP download through iPXE:
>
> [root@headnode Downloads]# tail -f /var/log/xcat/computes.log
> Apr 9 11:50:25 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 10 seconds
> Apr 9 11:50:35 10.1.250.231 [localhost] INFO xcat.genesis.doxcat:
> Running getdestiny --> 10.1.250.1:3001
> Apr 9 11:50:45 10.1.250.231 [localhost] INFO xcat.genesis.doxcat:
> Received destiny=
> Apr 9 11:50:45 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: The
> destiny=, destiny parameters=
> Apr 9 11:50:45 10.1.250.231 [localhost] ERR xcat.genesis.doxcat:
> Unrecognized directive (dest=)
> Apr 9 11:50:48 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 170 seconds
> Apr 9 11:50:58 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 160 seconds
> Apr 9 11:51:08 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 150 seconds
> Apr 9 11:51:18 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 140 seconds
> Apr 9 11:51:28 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 130 seconds
> Apr 9 11:51:38 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 120 seconds
>
> Here is the DHCP lease file.
>
> # The format of this file is documented in the dhcpd.leases(5) manual page.
> # This lease file was written by isc-dhcp-4.2.5
>
> lease 10.1.250.233 {
>   starts 1 2019/04/08 21:19:20;
>   ends 2 2019/04/09 09:19:20;
>   tstp 2 2019/04/09 09:19:20;
>   cltt 1 2019/04/08 21:19:20;
>   binding state free;
>   hardware ethernet 84:2b:2b:a7:80:46;
>   uid "\001\204++\247\200F";
> }
> lease 10.1.250.232 {
>   starts 1 2019/04/08 23:47:03;
>   ends 2 2019/04/09 11:47:03;
>   tstp 2 2019/04/09 11:47:03;
>   cltt 1 2019/04/08 23:47:03;
>   binding state free;
>   hardware ethernet b8:2a:72:ad:ff:31;
> }
> lease 10.1.250.230 {
>   starts 2 2019/04/09 18:44:25;
>   ends 3 2019/04/10 06:44:25;
>   tstp 3 2019/04/10 06:44:25;
>   cltt 2 2019/04/09 18:44:25;
>   binding state active;
>   next binding state free;
>   rewind binding state free;
>   hardware ethernet 84:2b:2b:9c:a1:db;
>   uid "\001\204++\234\241\333";
> }
> lease 10.1.250.231 {
>   starts 2 2019/04/09 18:45:08;
>   ends 3 2019/04/10 06:45:08;
>   tstp 3 2019/04/10 06:45:08;
>   cltt 2 2019/04/09 18:45:08;
>   binding state active;
>   next binding state free;
>   rewind binding state free;
>   hardware ethernet 84:2b:2b:9c:a1:db;
> }
> server-duid "\000\001\000\001$;\020[pq\274E\0351";
>
> here is the xcat version in this new installation:
>
> [root@headnode xcat]# lsxcatd -a
>
>
> Version 2.14.6 (git commit 7c787c499ebca7a98d6fcdb3577f701ac16d7ab3, built
> Fri Mar 29 03:19:08 EDT 2019)
>
> This is a Management Node
>
>
> dbengine=SQLite
>
> Am I missing anything else? discovery was never a problem before with
> the 2.14.2 version that I am running in production.
>
> thanks.
>
> On Mon, Apr 8, 2019 at 7:14 PM Er Tao Zhao <[email protected]> wrote:
>
> Hi, Imam
>
> Can you provide us the log file /var/log/xcat/cluster.log?
> It seems there is node matched but with no `chain` attribute set.
> So, will you pls provide also the /var/lib/dhcpd/dhcpd.leases file?
>
> Thx!
>
> Best Regards,
> -----------------------------------
> Zhao Er Tao
>
> IBM China System and Technology Laboratory, Beijing
> Tel:(86-10)82450485
> Email: [email protected]
> Address: 1/F, 28 Building,ZhongGuanCun Software Park,
> No.8 DongBeiWang West Road, Haidian District,
> Beijing, 100193, P.R.China
>
>
>
> ----- Original message -----
> From: "Bin XA Xu" <[email protected]>
> To: [email protected]
> Cc:
> Subject: Re: [xcat-user] nodiscovery issues - please help!
> Date: Tue, Apr 9, 2019 9:48 AM
>
> As you are using 2.14.5, not sure if it is impacted by
> https://github.com/xcat2/xcat-core/issues/5905
> But it will impact node discovery, here is a patch for it (
> https://github.com/xcat2/xcat-core/issues/5905#issuecomment-448165125)
>
> Or you can just try with 2.14.6, xcat latest GA release.
>
> Bin Xu
> HPC Software Development
> Software Defined Infrastructure, IBM Systems
> Phone: 86-010-82454067
> E-mail: [email protected]
>
>
>
> ----- Original message -----
> From: Imam Toufique <[email protected]>
> To: "xcat2/xcat-core" <[email protected]>, xCAT Users Mailing
> list <[email protected]>
> Cc:
> Subject: [xcat-user] nodiscovery issues - please help!
> Date: Tue, Apr 9, 2019 9:06 AM
>
> hi,
>
> i am trying to bring a new instance of xcat ( version 2.14.5 ) and I get
> this when I attempted to install a client. usually 'nodediscoverls' would
> show some undiscovered node(s), but in my case there is nothing. I am not
> sure what's so different in this version of xcat.
>
> Can anyone shed some light on this?
>
> thanks.
>
> Apr 8 16:01:56 10.1.250.231 [localhost] INFO xcat.genesis.doxcat:
> Getting initial certificate --> 10.1.250.1:3001
> Apr 8 16:02:06 10.1.250.231 [localhost] INFO xcat.genesis.doxcat:
> Running getdestiny --> 10.1.250.1:3001
> Apr 8 16:02:16 10.1.250.231 [localhost] INFO xcat.genesis.doxcat:
> Received destiny=
> Apr 8 16:02:16 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: The
> destiny=, destiny parameters=
> Apr 8 16:02:16 10.1.250.231 [localhost] ERR xcat.genesis.doxcat:
> Unrecognized directive (dest=)
> Apr 8 16:02:22 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 220 seconds
> Apr 8 16:02:32 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 210 seconds
> Apr 8 16:02:42 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 200 seconds
> Apr 8 16:02:52 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 190 seconds
> Apr 8 16:03:02 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 180 seconds
> Apr 8 16:03:12 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 170 seconds
> Apr 8 16:03:22 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 160 seconds
> Apr 8 16:03:32 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 150 seconds
> Apr 8 16:03:42 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 140 seconds
> Apr 8 16:03:52 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 130 seconds
> Apr 8 16:04:02 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 120 seconds
> Apr 8 16:04:12 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 110 seconds
> Apr 8 16:04:22 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 100 seconds
> Apr 8 16:04:32 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 90 seconds
> Apr 8 16:04:42 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 80 seconds
> Apr 8 16:04:52 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 70 seconds
> Apr 8 16:05:02 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 60 seconds
> Apr 8 16:05:12 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 50 seconds
> Apr 8 16:05:22 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 40 seconds
> Apr 8 16:05:33 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 30 seconds
> Apr 8 16:05:43 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 20 seconds
> Apr 8 16:05:53 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 10 seconds
> Apr 8 16:06:03 10.1.250.231 [localhost] INFO xcat.genesis.doxcat:
> Running getdestiny --> 10.1.250.1:3001
> Apr 8 16:06:13 10.1.250.231 [localhost] INFO xcat.genesis.doxcat:
> Received destiny=
> Apr 8 16:06:13 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: The
> destiny=, destiny parameters=
> Apr 8 16:06:13 10.1.250.231 [localhost] ERR xcat.genesis.doxcat:
> Unrecognized directive (dest=)
> Apr 8 16:06:16 10.1.250.231 [localhost] INFO xcat.genesis.doxcat: ...
> Will retry xCAT in 290 seconds
>
> _______________________________________________
> xCAT-user mailing list
> [email protected]
> https://lists.sourceforge.net/lists/listinfo/xcat-user
>
> --
> Regards,
> *Imam Toufique*
> *213-700-5485*

--
*Nathan Harper* // IT Systems Lead

*e: *[email protected] *t*: 0117 906 1104 *m*: 0787 551 0891 *w: *www.cfms.org.uk
CFMS Services Ltd // Bristol & Bath Science Park // Dirac Crescent // Emersons Green // Bristol // BS16 7FR

CFMS Services Ltd is registered in England and Wales No 05742022 - a subsidiary of CFMS Ltd
CFMS Services Ltd registered office // 43 Queens Square // Bristol // BS1 4QP
_______________________________________________
xCAT-user mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/xcat-user
