On 2/22/19 9:36 PM, Kevin Keane wrote:
Basically, having two NICs on the same subnet is a Bad Thing (tm) unless
you bond them. It's doubly bad if you use DHCP, because then both NICs
have the same default gateway, which means that your system has two
routes to the same destination.
Hello,
Sorry for the delay; I've been on vacation and then busy.
Yeah, I know it's bad.
A quick reminder:
Nodes are normally physically configured like this:
conf 1:
a) eth0 -> switch A (1G) : ipmi (ipmi subnet) traffic allowed
b) eth2 -> switch B (10G) : only data (data subnet) traffic allowed
and are logically configured for switch-based discovery through switch B.
I deliberately configured one node/port like this:
conf 2:
a) eth0 -> switch A (1G) : data (data subnet) + ipmi (ipmi subnet)
traffic allowed
b) eth2 -> switch B (10G) : data (data subnet) only traffic allowed
precisely to be able to handle a switch misconfiguration that would result
in conf 2 above (by knowing what's happening and/or forcing the install over
eth2) [and, at the time I wrote my initial message, a BIOS boot order
misconfiguration too].
[I will requote the steps I provided that you refer to.]
That said, the way I read your log files:
--
Step 1 was:
1) genesis configures the NICs, and each one retrieves an IP from the
dynamic range:
eth2 : 2019-02-22T18:16:32.004867+01:00 xcat-tars dhcpd: DHCPACK on
192.168.134.252 to 0c:c4:7a:58:c7:6a via eth1
eth0 : 2019-02-22T18:16:39.003422+01:00 xcat-tars dhcpd: DHCPACK on
192.168.134.250 to 0c:c4:7a:4d:85:a8 via eth1
--
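To follow which MAC address held which lease through the steps, the dhcpd lines can be parsed mechanically. A minimal Python sketch, assuming the ISC dhcpd syslog format shown above with each entry un-wrapped onto a single line; `ACK_RE`, `SAMPLE_LOG` and `acked_leases` are hypothetical helper names, not part of xCAT:

```python
import re

# Matches dhcpd syslog lines such as
# "... xcat-tars dhcpd: DHCPACK on 192.168.134.252 to 0c:c4:7a:58:c7:6a via eth1"
# Assumption: each log entry sits on one line (the email wraps them).
ACK_RE = re.compile(
    r"^(?P<ts>\S+) \S+ dhcpd: DHCPACK on (?P<ip>[\d.]+) "
    r"to (?P<mac>[0-9a-f:]{17}) via (?P<iface>\S+)"
)

# The two step 1 lines, un-wrapped.
SAMPLE_LOG = [
    "2019-02-22T18:16:32.004867+01:00 xcat-tars dhcpd: DHCPACK on "
    "192.168.134.252 to 0c:c4:7a:58:c7:6a via eth1",
    "2019-02-22T18:16:39.003422+01:00 xcat-tars dhcpd: DHCPACK on "
    "192.168.134.250 to 0c:c4:7a:4d:85:a8 via eth1",
]


def acked_leases(lines):
    """Return {mac: ip}, keeping only the latest DHCPACK seen for each MAC."""
    leases = {}
    for line in lines:
        m = ACK_RE.match(line)
        if m:
            leases[m.group("mac")] = m.group("ip")
    return leases


if __name__ == "__main__":
    for mac, ip in acked_leases(SAMPLE_LOG).items():
        print(mac, ip)
```

Because only the latest ACK per MAC is kept, feeding it the whole log would replace eth0's entry with 192.168.128.115 once the step 6 DHCPACK is reached.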
In step 1, genesis shouldn't be involved yet. The BIOS should get IP
addresses for the NICs - apparently for both.
I don't see why. I was talking about what is seen after the 'Beginning
discovery process' message on the console.
On the console, we can see one DHCP request for each NIC.
Important here: which MAC
address it picks is not inherently related to which NIC was used to boot
genesis. Normally, only the bootup NIC would have an IP address (because
none of the others are configured yet, and they shouldn't be connected
to a subnet with DHCP). But in your case, there are two NICs that have
IP addresses via DHCP. I don't know how genesis picks the "right" NIC in
that scenario. It might actually be random chance. Based on your
observation in step 5, genesis might actually try to discover BOTH NICs.
My first guess is that xCAT does indeed try to discover BOTH NICs, but I'm
not sure.
---
Step 2 was:
2) discovery begins on eth2
2019-02-22T18:16:44.320279+01:00 xcat-tars xcat[60918]: xcatd:
Processing discovery request from 192.168.134.252
---
Step 3 was:
---
3) discovery has determined that the request is for node tars-113:
2019-02-22T18:16:57.162396+01:00 xcat-tars xcat[60918]: Discover info:
configure static BMC ip:10.6.96.115 for host_node:tars-113.
---
In step 3, the BMC is changed from DHCP to a static IP address. Note
that it's not clear *which* NIC ends up with the static IP.
I don't see the problem: the BMC, even though it physically shares the
eth0 port, has its own MAC address.
Step 4 was:
---
4) eth0 and eth2 leases are released:
2019-02-22T18:16:59.621541+01:00 xcat-tars dhcpd: DHCPRELEASE of
192.168.134.250 from 0c:c4:7a:4d:85:a8 via eth1 (found)
2019-02-22T18:16:59.664557+01:00 xcat-tars dhcpd: DHCPRELEASE of
192.168.134.252 from 0c:c4:7a:58:c7:6a via eth1 (found)
---
In step 4, the leases are released. That's expected since the BMC is no
longer using DHCP.
I don't think it's related to the BMC, since the two MAC addresses in the
DHCPRELEASE messages are those of eth0 and eth2, not the MAC address of
the BMC.
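The argument that the released leases belong to the NICs rather than the BMC can be checked by mapping each DHCPRELEASE back to the NIC MACs seen in step 1. A minimal sketch, assuming single-line log entries; `NIC_MACS` is filled in by hand from the step 1 DHCPACK lines, and since the BMC's MAC does not appear in these logs, a `None` result would only flag a MAC that is neither eth0 nor eth2. All names here are hypothetical:

```python
import re

# Matches "... dhcpd: DHCPRELEASE of <ip> from <mac> via <iface> (found)".
# Assumption: each log entry sits on one line.
RELEASE_RE = re.compile(
    r"dhcpd: DHCPRELEASE of (?P<ip>[\d.]+) from (?P<mac>[0-9a-f:]{17}) via \S+"
)

# MAC -> NIC name, taken from the step 1 DHCPACK lines.
NIC_MACS = {
    "0c:c4:7a:4d:85:a8": "eth0",
    "0c:c4:7a:58:c7:6a": "eth2",
}

# The two step 4 lines, un-wrapped.
SAMPLE_LOG = [
    "2019-02-22T18:16:59.621541+01:00 xcat-tars dhcpd: DHCPRELEASE of "
    "192.168.134.250 from 0c:c4:7a:4d:85:a8 via eth1 (found)",
    "2019-02-22T18:16:59.664557+01:00 xcat-tars dhcpd: DHCPRELEASE of "
    "192.168.134.252 from 0c:c4:7a:58:c7:6a via eth1 (found)",
]


def released_by(lines):
    """Return (nic_name_or_None, mac, ip) for every DHCPRELEASE line."""
    out = []
    for line in lines:
        m = RELEASE_RE.search(line)
        if m:
            mac = m.group("mac")
            # None means the MAC is not one of the two known host NICs.
            out.append((NIC_MACS.get(mac), mac, m.group("ip")))
    return out


if __name__ == "__main__":
    for nic, mac, ip in released_by(SAMPLE_LOG):
        print(nic, mac, ip)
```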
Step 5 was:
---
5) I got these messages:
2019-02-22T18:16:59.752892+01:00 xcat-tars xcatd: Discovery worker:
switch instance: nodediscover instance[60918]: tars-113 has been discovered
2019-02-22T18:16:59.754145+01:00 xcat-tars xcat[60918]: Discovery Error:
Could not find any node.
---
Step 5a: xCAT confirms that for one of the IP addresses, it found a
corresponding predefined node object, and completed discovery.
Step 5b: xCAT tells you "there is no predefined node object for this IP
address". Of course not - because Step 5a used up the predefined node
object.
I agree something like this must be happening. And it must be switch-based
discovery, since I removed the tars-113 entry from discoverydata, so I
don't think MTMS discovery could have found anything.
Step 6 was:
---
6) eth0 gets the desired IP (the IP set in the node definition)
2019-02-22T18:17:00.000095+01:00 xcat-tars dhcpd: DHCPDISCOVER from
0c:c4:7a:4d:85:a8 via eth1
2019-02-22T18:17:00.000116+01:00 xcat-tars dhcpd: DHCPOFFER on
192.168.128.115 to 0c:c4:7a:4d:85:a8 via eth1
2019-02-22T18:17:00.000433+01:00 xcat-tars dhcpd: DHCPREQUEST for
192.168.128.115 (192.168.132.2) from 0c:c4:7a:4d:85:a8 via eth1
2019-02-22T18:17:00.000442+01:00 xcat-tars dhcpd: DHCPACK on
192.168.128.115 to 0c:c4:7a:4d:85:a8 via eth1
---
Step 6: eth2 has been reconfigured with a static IP in step 3, so eth0
is the only one that still has DHCP enabled. Remember, these DHCP
requests are generated by the BIOS; they have nothing to do with
discovery.
I don't get why. The BIOS only does the initial native PXE boot, doesn't it?
Check your /var/lib/dhcpd/dhcpd.leases file to see if there
are reservations for one or both MAC addresses. My guess is that xCAT
created a reservation for eth0's mac address with eth2's IP address.
eth0's MAC gets what I'd want for eth2, and eth2 gets:
host tars-113-noip0cc47a58c76a {
dynamic;
hardware ethernet 0c:c4:7a:58:c7:6a;
supersede server.allow-booting = 00;
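To list the host reservations in the leases file and spot entries like the one above, a small parser is enough. A minimal Python sketch; it assumes every host entry is closed by `}` in the real file (the excerpt quoted above is cut off before it), that a host removed via OMAPI shows up as a later entry for the same name containing `deleted;`, and that the last entry for a name is the current one. `current_hosts` and `SAMPLE_ENTRY` are hypothetical names:

```python
import re

# Assumption: each host entry is "host <name> { ... }" with no nested braces.
HOST_RE = re.compile(r"\bhost\s+(\S+)\s*\{([^}]*)\}")
MAC_RE = re.compile(r"hardware ethernet\s+([0-9a-fA-F:]{17})\s*;")
ADDR_RE = re.compile(r"fixed-address\s+([^;\s]+)\s*;")
NOBOOT_RE = re.compile(r"supersede\s+server\.allow-booting\s*=\s*00\s*;")
DELETED_RE = re.compile(r"\bdeleted\s*;")

# The entry quoted above, with an assumed closing brace.
SAMPLE_ENTRY = """host tars-113-noip0cc47a58c76a {
  dynamic;
  hardware ethernet 0c:c4:7a:58:c7:6a;
  supersede server.allow-booting = 00;
}"""


def current_hosts(text):
    """Map host name -> details of its most recent declaration in `text`."""
    hosts = {}
    for name, body in HOST_RE.findall(text):
        if DELETED_RE.search(body):
            hosts.pop(name, None)  # a later "deleted;" entry removes the host
            continue
        mac = MAC_RE.search(body)
        addr = ADDR_RE.search(body)
        hosts[name] = {
            "mac": mac.group(1).lower() if mac else None,
            "fixed_address": addr.group(1) if addr else None,
            # False when the entry carries "supersede server.allow-booting = 00;"
            "booting_allowed": NOBOOT_RE.search(body) is None,
        }
    return hosts


if __name__ == "__main__":
    for name, info in current_hosts(SAMPLE_ENTRY).items():
        print(name, info)
```

To scan the real file, pass it the contents of the leases file dhcpd is using. An entry for eth2's MAC with `booting_allowed` set to `False` would appear consistent with the "booting disallowed" message in step 7.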
Step 7 was:
---
7) eth2 asks for an IP but is not allowed to boot:
2019-02-22T18:17:01.001030+01:00 xcat-tars dhcpd: DHCPDISCOVER from
0c:c4:7a:58:c7:6a via eth1: booting disallowed
---
Step 7: I'm not sure what this means. My hunch is that eth0 may have
received the same IP address via DHCP that has been statically assigned
to eth2 in step 3. eth2 detects the conflict, and falls back to DHCP.
Step 8 was:
---
8) I got this message, which I don't fully understand either:
2019-02-22T18:17:01.500387+01:00 xcat-tars xcatd: Discovery worker:
switch instance: nodediscover instance[60918]: Failed to notify
192.168.134.252 that it's actually tars-113.
---
Step 8: because of step 7, eth0 got the IP address via DHCP that xCAT
thought it had statically assigned to eth2. So xCAT is talking to the
wrong NIC on the node.
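One way to test the step 8 explanation is to ask which MAC address last held the IP that xCAT tried to notify, and whether that lease was still active at the time. A minimal sketch, assuming single-line entries with the ISO 8601 timestamps shown above and Python 3.7+ for `datetime.fromisoformat`; `lease_state` and `SAMPLE_LOG` are hypothetical names, and the sample only contains the DHCPACK/DHCPRELEASE lines quoted in this thread:

```python
import re
from datetime import datetime

# One dhcpd ACK or RELEASE entry per line (assumption: log lines un-wrapped).
EVENT_RE = re.compile(
    r"^(?P<ts>\S+) \S+ dhcpd: "
    r"(?:DHCPACK on (?P<ack_ip>[\d.]+) to|DHCPRELEASE of (?P<rel_ip>[\d.]+) from) "
    r"(?P<mac>[0-9a-f:]{17})"
)

SAMPLE_LOG = [
    "2019-02-22T18:16:32.004867+01:00 xcat-tars dhcpd: DHCPACK on "
    "192.168.134.252 to 0c:c4:7a:58:c7:6a via eth1",
    "2019-02-22T18:16:39.003422+01:00 xcat-tars dhcpd: DHCPACK on "
    "192.168.134.250 to 0c:c4:7a:4d:85:a8 via eth1",
    "2019-02-22T18:16:59.621541+01:00 xcat-tars dhcpd: DHCPRELEASE of "
    "192.168.134.250 from 0c:c4:7a:4d:85:a8 via eth1 (found)",
    "2019-02-22T18:16:59.664557+01:00 xcat-tars dhcpd: DHCPRELEASE of "
    "192.168.134.252 from 0c:c4:7a:58:c7:6a via eth1 (found)",
    "2019-02-22T18:17:00.000442+01:00 xcat-tars dhcpd: DHCPACK on "
    "192.168.128.115 to 0c:c4:7a:4d:85:a8 via eth1",
]


def lease_state(lines, ip, when):
    """Return (event, mac) for the latest ACK/RELEASE of `ip` before `when`, else None."""
    last = None  # (timestamp, event, mac)
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue
        ts = datetime.fromisoformat(m.group("ts"))
        if ts >= when:
            continue  # ignore events at or after the moment of interest
        if m.group("ack_ip") == ip:
            event = "ACK"
        elif m.group("rel_ip") == ip:
            event = "RELEASE"
        else:
            continue
        # Compare timestamps so the input does not need to be sorted.
        if last is None or ts >= last[0]:
            last = (ts, event, m.group("mac"))
    return None if last is None else last[1:]


if __name__ == "__main__":
    notify = datetime.fromisoformat("2019-02-22T18:17:01.500387+01:00")
    print(lease_state(SAMPLE_LOG, "192.168.134.252", notify))
```

With the lines quoted here, the result is `("RELEASE", "0c:c4:7a:58:c7:6a")`: 192.168.134.252 was last released by eth2's MAC in step 4, and no later DHCPACK for that address appears before the notification.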
So, to sum up, what's unclear is:
- does genesis try to discover BOTH NICs?
- which NIC does genesis register?
What I observe is that the node ends up with the eth0 MAC address
in the MAC table.
Again, the original goal was a way to make sure that the eth2 MAC, and
only that one, is discovered, even in the case of a switch misconfiguration
or a BIOS boot order misconfiguration (although you have pretty much
convinced me that the BIOS boot order may have nothing to do with which
MAC address ends up discovered).
Thanks
--
Thomas H.
_______________________________________________
xCAT-user mailing list
xCAT-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/xcat-user