Hi, Thomas
 
Yeah, I agree with Kevin: if you do not plan to bond eth0 and eth2, you had better put them in different VLANs.
 
But if eth0 and eth2 are in the same VLAN, xCAT versions after 2.13 can deal with it.
 
Could you please show me the output of `lsdef tars-113`?
Also, do `tars-113-eth0` or `tars-113-eth2` resolve to the same IP as `tars-113` in your DNS, and can they be resolved from the MN?
For example, do `ping tars-113-eth0` or `ping tars-113-eth2` show the same IP when run on node `xcat-tars`?
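To answer the DNS question without reading ping output by eye, a minimal Python sketch could compare what each name resolves to on the MN. The names come from the question above; the helpers `resolve_all` and `aliases_match` are hypothetical. The default lookup goes through `socket.gethostbyname`, i.e. the MN's normal resolver (hosts file and DNS), so it reports what the MN itself sees.

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional

Resolver = Callable[[str], str]

def resolve_all(names: Iterable[str],
                resolver: Resolver = socket.gethostbyname) -> Dict[str, Optional[str]]:
    """Resolve each name to an IPv4 address, or None if it does not resolve."""
    result: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            result[name] = resolver(name)
        except OSError:  # socket.gaierror and socket.herror subclass OSError
            result[name] = None
    return result

def aliases_match(primary: str, aliases: Iterable[str],
                  resolver: Resolver = socket.gethostbyname) -> bool:
    """True if the primary name resolves and every alias resolves to the same IP."""
    alias_list: List[str] = list(aliases)
    ips = resolve_all([primary, *alias_list], resolver)
    return ips[primary] is not None and all(ips[a] == ips[primary] for a in alias_list)
```

On the MN, `aliases_match("tars-113", ["tars-113-eth0", "tars-113-eth2"])` would use the system resolver; the `resolver` parameter only exists so the check can be exercised without a live DNS.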
 
Thx!
Best Regards,
-----------------------------------
Zhao Er Tao

IBM China System and Technology Laboratory, Beijing
Tel:(86-10)82450485
Email: erta...@cn.ibm.com
Address: 1/F, 28 Building,ZhongGuanCun Software Park,
No.8 DongBeiWang West Road, Haidian District,
Beijing, 100193, P.R.China
 
 
----- Original message -----
From: Kevin Keane <kke...@sandiego.edu>
To: xCAT Users Mailing list <xcat-user@lists.sourceforge.net>
Cc:
Subject: Re: [xcat-user] Discovery race condition ?
Date: Sat, Feb 23, 2019 4:36 AM
 
Basically, having two NICs on the same subnet is a Bad Thing (tm) unless you bond them. It's doubly bad if you use DHCP, because then both NICs have the same default gateway, which means that your system has two routes to the same destination.
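The "two NICs on the same subnet" condition is easy to check mechanically. A minimal Python sketch using the standard `ipaddress` module groups interfaces by the network their address belongs to. The /20 prefix in the example is an assumption: the thread never states the netmask, though 192.168.128.115 and the 192.168.134.x dynamic addresses in the logs would both fit in one /20. The function name is hypothetical.

```python
import ipaddress
from collections import defaultdict
from typing import Dict, List

def shared_subnets(nics: Dict[str, str]) -> Dict[str, List[str]]:
    """Return {subnet: [nic, ...]} for every subnet reached by more than one NIC.

    `nics` maps interface name -> address in CIDR form, e.g. "192.168.134.252/20".
    """
    by_net: Dict[str, List[str]] = defaultdict(list)
    for name, cidr in nics.items():
        iface = ipaddress.ip_interface(cidr)  # address plus prefix length
        by_net[str(iface.network)].append(name)
    return {net: sorted(names) for net, names in by_net.items() if len(names) > 1}

# Assumed /20 netmask for the two DHCP addresses seen in the logs.
print(shared_subnets({"eth0": "192.168.134.250/20", "eth2": "192.168.134.252/20"}))
```

Any non-empty result means the kernel sees two connected routes to the same network, which is the situation described above. On a real node, the input could be filled in from `ip -o -4 addr show`.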
 
That said, the way I read your log files:
 
In step 1, genesis shouldn't be involved yet. The BIOS should get IP addresses for the NICs - apparently for both.
 
 
Step 2 should be several separate steps:
2.a) The BIOS PXE boots via one of your NICs
2.b) After a few more back-and-forth exchanges, the node boots genesis. It's important to realize that genesis is simply a Linux live OS (I believe it's Fedora-based unless you build your own). Genesis can see all the NICs, just as any Linux OS can.
2.c) Genesis then runs discovery. Eventually, genesis will update the node object in xCAT with *one* MAC address. Important here: which MAC address it picks is not inherently related to which NIC was used to boot genesis. Normally, only the bootup NIC would have an IP address (because none of the others are configured yet, and they shouldn't be connected to a subnet with DHCP). But in your case, there are two NICs that have IP addresses via DHCP. I don't know how genesis picks the "right" NIC in that scenario. It might actually be random chance. Based on your observation in step 5, genesis might actually try to discover BOTH NICs.
 
In step 3, the BMC is changed from DHCP to a static IP address. Note that it's not clear *which* NIC ends up with the static IP.
 
In step 4, the leases are released. That's expected since the BMC is no longer using DHCP.
 
Step 5a: xCAT confirms that for one of the IP addresses, it found a corresponding predefined node object, and completed discovery.
Step 5b: xCAT tells you "there is no predefined node object for this IP address". Of course not - because Step 5a used up the predefined node object.
 
Step 6: eth2 has been reconfigured with a static IP in step 3, so eth0 is the only one that still has DHCP enabled. Remember, these DHCP requests are generated by the BIOS; they have nothing to do with discovery. Check your /var/lib/dhcpd/dhcpd.leases file to see if there are reservations for one or both MAC addresses. My guess is that xCAT created a reservation for eth0's MAC address with eth2's IP address.
 
Step 7: I'm not sure what this means. My hunch is that eth0 may have received via DHCP the same IP address that was statically assigned to eth2 in step 3. eth2 detects the conflict and falls back to DHCP.
 
Step 8: because of step 7, eth0 got the IP address via DHCP that xCAT thought it had statically assigned to eth2. So xCAT is talking to the wrong NIC on the node.
 
I'm not certain that this is what happens; it is simply a plausible reading.
 
The solution should be not to have two NICs on the same subnet. As an aside, if I recall correctly, one of the NICs was 1 Gb/s and the other was 10 Gb/s. By putting them on the same switch and the same VLAN, you may actually force the 10 Gb/s NIC down to 1 Gb/s throughput (depending on the switch and on which IP addresses talk to each other).

_______________________________________________________________________
Kevin Keane | Systems Architect | University of San Diego ITS | kke...@sandiego.edu
Maher Hall, 192 |5998 Alcalá Park | San Diego, CA 92110-2492 | 619.260.6859

REMEMBER! No one from IT at USD will ever ask to confirm or supply your password.
These messages are an attempt to steal your username and password. Please do not reply to, click the links within, or open the attachments of these messages. Delete them!

 

 
On Fri, Feb 22, 2019 at 10:34 AM Thomas HUMMEL <thomas.hum...@pasteur.fr> wrote:
Hello,

I apologize for opening yet another thread, but I think what I'll
detail here deserves one.

In short, I'm trying to figure out what happens when discovery is
done on a node which has BOTH eth0 (switch A) and eth2 (switch B)
connected to switch ports configured on the same VLAN. My nodes are
configured with only one IP (in the hosts table) and have the switch
and switchport attributes set so that switch-based discovery uses
switch B.
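To make this setup concrete, here is a toy Python model of switch-based matching. It is not xCAT's code: the `NodeDef` type, the function, and the switch and port values are all hypothetical. It only captures the idea that a node matches when its switch/switchport equal the location where the requesting MAC was seen, and that a node already claimed by one request is not available to another.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set, Tuple

@dataclass(frozen=True)
class NodeDef:
    """Hypothetical stand-in for a predefined node's switch attributes."""
    name: str
    switch: str
    switchport: str

def match_by_switch_port(nodes: Iterable[NodeDef], seen_at: Tuple[str, str],
                         discovered: Set[str]) -> Optional[str]:
    """Toy switch-based lookup, not xCAT's implementation.

    seen_at is the (switch, port) where the requesting MAC was found. A node
    matches only if its switch/switchport equal that location and it has not
    already been claimed by an earlier discovery request.
    """
    for node in nodes:
        if (node.switch, node.switchport) == seen_at and node.name not in discovered:
            return node.name
    return None
```

Under this model, the request coming through eth2 on switch B claims tars-113, while a request arriving through eth0 on switch A finds nothing, which would be one way to end up with a "Could not find any node" error. Whether xCAT actually processes both NICs like this is exactly the open question in this thread.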

When the BIOS/Intel Boot Agent PXE-boots on eth2, here's what the log says:

1) genesis gets the NICs configured, and each NIC retrieves an IP from
the dynamic range:

eth2: 2019-02-22T18:16:32.004867+01:00 xcat-tars dhcpd: DHCPACK on 192.168.134.252 to 0c:c4:7a:58:c7:6a via eth1

eth0: 2019-02-22T18:16:39.003422+01:00 xcat-tars dhcpd: DHCPACK on 192.168.134.250 to 0c:c4:7a:4d:85:a8 via eth1

2) discovery begins on eth2:

2019-02-22T18:16:44.320279+01:00 xcat-tars xcat[60918]: xcatd: Processing discovery request from 192.168.134.252

3) discovery determines that the request is for the tars-113 node:

2019-02-22T18:16:57.162396+01:00 xcat-tars xcat[60918]: Discover info: configure static BMC ip:10.6.96.115 for host_node:tars-113.

4) the eth0 and eth2 leases are released:

2019-02-22T18:16:59.621541+01:00 xcat-tars dhcpd: DHCPRELEASE of 192.168.134.250 from 0c:c4:7a:4d:85:a8 via eth1 (found)

2019-02-22T18:16:59.664557+01:00 xcat-tars dhcpd: DHCPRELEASE of 192.168.134.252 from 0c:c4:7a:58:c7:6a via eth1 (found)

-> because discovery has completed?

5) I get these messages:

2019-02-22T18:16:59.752892+01:00 xcat-tars xcatd: Discovery worker: switch instance: nodediscover instance[60918]: tars-113 has been discovered
2019-02-22T18:16:59.754145+01:00 xcat-tars xcat[60918]: Discovery Error: Could not find any node.

-> I'm not really sure whether "tars-113 has been discovered" is related
to step 3 (the messages may be out of order in the logs) or to
something else.

-> I'm also not sure why I get the "Could not find any node" message.

-> Are there two discovery processes running and racing against each
other (one per NIC)?

6) eth0 gets the desired IP (the IP set in the node definition):

2019-02-22T18:17:00.000095+01:00 xcat-tars dhcpd: DHCPDISCOVER from 0c:c4:7a:4d:85:a8 via eth1
2019-02-22T18:17:00.000116+01:00 xcat-tars dhcpd: DHCPOFFER on 192.168.128.115 to 0c:c4:7a:4d:85:a8 via eth1
2019-02-22T18:17:00.000433+01:00 xcat-tars dhcpd: DHCPREQUEST for 192.168.128.115 (192.168.132.2) from 0c:c4:7a:4d:85:a8 via eth1
2019-02-22T18:17:00.000442+01:00 xcat-tars dhcpd: DHCPACK on 192.168.128.115 to 0c:c4:7a:4d:85:a8 via eth1

-> Why, when discovery seems to have been done on eth2?

7) eth2 asks for an IP but is not allowed to boot:

2019-02-22T18:17:01.001030+01:00 xcat-tars dhcpd: DHCPDISCOVER from 0c:c4:7a:58:c7:6a via eth1: booting disallowed

8) I get this message, which I don't fully understand either:

2019-02-22T18:17:01.500387+01:00 xcat-tars xcatd: Discovery worker: switch instance: nodediscover instance[60918]: Failed to notify 192.168.134.252 that it's actually tars-113.
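Because the two NICs' DHCP events interleave in the log, it can help to regroup the dhcpd lines by client MAC. A minimal Python sketch, assuming the syslog format shown above with each entry on a single line; the regular expressions only cover the message forms quoted here, and the function name is hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# timestamp, syslog host, "dhcpd:", message type, rest of the line
LINE_RE = re.compile(r"(\d{4}-\d{2}-\d{2}T\S+)\s+\S+\s+dhcpd:\s+(DHCP[A-Z]+)(.*)$")
MAC_RE = re.compile(r"\b([0-9a-f]{2}(?::[0-9a-f]{2}){5})\b", re.I)
IPV4_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})\b")

Event = Tuple[str, str, Optional[str], bool]  # (timestamp, type, ip, booting disallowed)

def dhcp_timeline(lines: Iterable[str]) -> Dict[str, List[Event]]:
    """Group dhcpd log events by client MAC, preserving input order."""
    events: Dict[str, List[Event]] = defaultdict(list)
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue  # not a dhcpd line (e.g. xcatd discovery messages)
        ts, kind, rest = m.groups()
        mac = MAC_RE.search(rest)
        if not mac:
            continue
        ip = IPV4_RE.search(rest)  # first IPv4 after the message type
        events[mac.group(1).lower()].append(
            (ts, kind, ip.group(1) if ip else None, "booting disallowed" in rest))
    return dict(events)
```

Fed with the lines above, this puts eth0's dynamic lease, its release and the fixed-address exchange in one list, and eth2's lease, release and refused DHCPDISCOVER in another, which makes the ordering easier to compare with the xcatd messages.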

So, granted, my setup is bogus, but I am keeping it that way for
learning purposes, in the hope that I'll be able to figure out how the
NIC-related attributes (installnic, nicips) are meant to be used.

Thanks!

--
Thomas Hummel




_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user