Basically, having two NICs on the same subnet is a Bad Thing (tm) unless
you bond them. It's doubly bad if you use DHCP, because then both NICs have
the same default gateway, which means that your system has two routes to
the same destination.
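
To make the overlap concrete, here is a minimal Python sketch (my own illustration, not something xCAT ships) that groups interface addresses by subnet and reports any subnet with more than one NIC on it. The two addresses come from the DHCPACK lines in your log; the /21 prefix length is an assumption on my part, since the netmask doesn't appear anywhere in the thread.

```python
import ipaddress
from collections import defaultdict

def nics_sharing_subnet(addresses):
    """Map each subnet to the NICs on it, keeping only subnets with 2+ NICs.

    `addresses` maps an interface name to an address in CIDR form.
    """
    by_subnet = defaultdict(list)
    for nic, cidr in addresses.items():
        # ip_interface keeps the host bits; .network gives the enclosing subnet.
        network = ipaddress.ip_interface(cidr).network
        by_subnet[network].append(nic)
    return {str(net): sorted(nics) for net, nics in by_subnet.items() if len(nics) > 1}

# Addresses from the DHCPACK lines in the report. The /21 prefix length is
# an ASSUMPTION for illustration; substitute the real netmask.
example = {
    "eth0": "192.168.134.250/21",
    "eth2": "192.168.134.252/21",
}
print(nics_sharing_subnet(example))
```

An empty result means no two NICs share a subnet, which is the state you want unless the NICs are bonded.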

That said, the way I read your log files:

In step 1, genesis shouldn't be involved yet. The BIOS should get IP
addresses for the NICs - apparently for both.

Step 2 should be several separate steps:
2.a) The BIOS PXE boots via one of your NICs
2.b) After some more back-and-forth, the node boots genesis. It's
important to realize that genesis is simply a Linux live OS (I believe it's
Fedora-based unless you build your own). Genesis can see all the NICs, just
as any Linux OS can.
2.c) Genesis then runs discovery. Eventually, genesis will update the node
object in xCAT with *one* MAC address. Important here: which MAC address it
picks is not inherently related to which NIC was used to boot genesis.
Normally, only the bootup NIC would have an IP address (because none of the
others are configured yet, and they shouldn't be connected to a subnet with
DHCP). But in your case, there are two NICs that have IP addresses via
DHCP. I don't know how genesis picks the "right" NIC in that scenario. It
might actually be random chance. Based on your observation in step 5,
genesis might actually try to discover BOTH NICs.

In step 3, the BMC is changed from DHCP to a static IP address. Note that
it's not clear *which* NIC ends up with the static IP.

In step 4, the leases are released. That's expected since the BMC is no
longer using DHCP.

Step 5a: xCAT confirms that for one of the IP addresses, it found a
corresponding predefined node object, and completed discovery.
Step 5b: xCAT tells you "there is no predefined node object for this IP
address". Of course not - because Step 5a used up the predefined node
object.

Step 6: eth2 has been reconfigured with a static IP in step 3, so eth0 is
the only one that still has DHCP enabled. Remember, these DHCP requests are
generated by the BIOS; they have nothing to do with discovery. Check your
/var/lib/dhcpd/dhcpd.leases file to see if there are reservations for one or
both MAC addresses. My guess is that xCAT created a reservation for eth0's
MAC address with eth2's IP address.
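
If you'd rather not eyeball the leases file, something like the following Python sketch could pull out every entry that mentions a given MAC address. It assumes the usual ISC dhcpd layout of top-level "lease <ip> { ... }" and "host <name> { ... }" blocks with the closing brace at the start of a line, and it does not handle nested braces. SAMPLE is a hand-written stand-in that illustrates my guess, not the contents of your actual file; the function name is mine.

```python
import re

# Top-level "lease <ip> { ... }" and "host <name> { ... }" blocks.
# Assumes the closing brace sits at the start of its own line.
BLOCK_RE = re.compile(r"^(lease|host)\s+(\S+)\s*\{(.*?)^\}", re.MULTILINE | re.DOTALL)

def entries_for_mac(leases_text, mac):
    """Return (kind, name, fixed_address) for every block that names `mac`."""
    mac = mac.lower()
    hits = []
    for kind, name, body in BLOCK_RE.findall(leases_text):
        hw = re.search(r"hardware ethernet\s+([0-9a-fA-F:]+);", body)
        if hw is None or hw.group(1).lower() != mac:
            continue
        # Only host reservations carry a fixed-address; dynamic leases do not.
        fixed = re.search(r"fixed-address\s+([^;]+);", body)
        hits.append((kind, name, fixed.group(1).strip() if fixed else None))
    return hits

# HYPOTHETICAL sample, written by hand to illustrate the guess above.
SAMPLE = """\
lease 192.168.134.252 {
  binding state free;
  hardware ethernet 0c:c4:7a:58:c7:6a;
}
host tars-113 {
  dynamic;
  hardware ethernet 0c:c4:7a:4d:85:a8;
  fixed-address 192.168.128.115;
}
"""

for mac in ("0c:c4:7a:4d:85:a8", "0c:c4:7a:58:c7:6a"):
    print(mac, entries_for_mac(SAMPLE, mac))
```

On the management node you would pass the real file's contents instead of SAMPLE. A "host" entry whose fixed-address is the node's IP but whose MAC belongs to eth0 would support the guess.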

Step 7: I'm not sure what this means. My hunch is that eth0 may have
received the same IP address via DHCP that has been statically assigned to
eth2 in step 3. eth2 detects the conflict, and falls back to DHCP.

Step 8: because of step 7, eth0 got the IP address via DHCP that xCAT
thought it had statically assigned to eth2. So xCAT is talking to the wrong
NIC on the node.

I'm not certain that this is what happens; it's simply a plausible reading.
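
One part of this reading can be checked directly against the dhcpd log. The Python sketch below (again just an illustration, with a function name of my choosing) collects the addresses dhcpd acknowledged for each MAC, in log order. The input is the three DHCPACK lines from your message, unwrapped onto single lines.

```python
import re
from collections import defaultdict

# Matches the "DHCPACK on <ip> to <mac>" part of a dhcpd syslog line.
ACK_RE = re.compile(r"DHCPACK on (\S+) to ([0-9a-f:]{17})")

def acks_by_mac(log_lines):
    """Collect, in log order, the addresses dhcpd acknowledged for each MAC."""
    acks = defaultdict(list)
    for line in log_lines:
        match = ACK_RE.search(line)
        if match:
            ip, mac = match.groups()
            acks[mac].append(ip)
    return dict(acks)

# The DHCPACK lines quoted in the original report, unwrapped.
LOG = [
    "2019-02-22T18:16:32.004867+01:00 xcat-tars dhcpd: DHCPACK on 192.168.134.252 to 0c:c4:7a:58:c7:6a via eth1",
    "2019-02-22T18:16:39.003422+01:00 xcat-tars dhcpd: DHCPACK on 192.168.134.250 to 0c:c4:7a:4d:85:a8 via eth1",
    "2019-02-22T18:17:00.000442+01:00 xcat-tars dhcpd: DHCPACK on 192.168.128.115 to 0c:c4:7a:4d:85:a8 via eth1",
]

for mac, ips in acks_by_mac(LOG).items():
    print(mac, "->", ", ".join(ips))
```

The MAC you labelled eth0 (0c:c4:7a:4d:85:a8) is the one that moves from the dynamic range to 192.168.128.115, the IP in the node definition, which is consistent with xCAT having tied that address to eth0's MAC rather than eth2's.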

The solution should be not to have two NICs on the same subnet. As an aside,
if I recall correctly, one of the NICs was 1 Gb/s and the other was 10 Gb/s.
By putting them on the same switch and the same VLAN, you may actually force
the 10 Gb/s NIC down to 1 Gb/s throughput (depends on the switch and on which
IP addresses talk to each other).

_______________________________________________________________________
Kevin Keane | Systems Architect | University of San Diego ITS |
kke...@sandiego.edu
Maher Hall, 192 |5998 Alcalá Park | San Diego, CA 92110-2492 | 619.260.6859

On Fri, Feb 22, 2019 at 10:34 AM Thomas HUMMEL <thomas.hum...@pasteur.fr>
wrote:

> Hello,
>
> I apologize for opening yet another thread but I think what I'll detail
> in it deserves it.
>
> In short, I'm trying to figure out what's happening when discovery is
> done on a node which has BOTH eth0 (switch A) and eth2 (switch B) on
> switch ports configured on the same vlan. My nodes are only configured
> with one ip (in hosts table) and have the switch and switchport
> attributes to get used by switch-based discovery on switchB.
>
> When BIOS/Intel Boot Agent PXE on eth2, here's what the log says :
>
> 1) genesis gets the nics configured and retrieves an ip from the dynamic
> range :
>
> eth2 : 2019-02-22T18:16:32.004867+01:00 xcat-tars dhcpd: DHCPACK on
> 192.168.134.252 to 0c:c4:7a:58:c7:6a via eth1
>
> eth0 : 2019-02-22T18:16:39.003422+01:00 xcat-tars dhcpd: DHCPACK on
> 192.168.134.250 to 0c:c4:7a:4d:85:a8 via eth1
>
> 2) discovery begins on eth2
>
> 2019-02-22T18:16:44.320279+01:00 xcat-tars xcat[60918]: xcatd:
> Processing discovery request from 192.168.134.252
>
> 3) discovery has seen that it was for the tars-113 node :
>
> 2019-02-22T18:16:57.162396+01:00 xcat-tars xcat[60918]: Discover info:
> configure static BMC ip:10.6.96.115 for host_node:tars-113.
>
> 4) eth0 and eth2 leases are released :
>
> 2019-02-22T18:16:59.621541+01:00 xcat-tars dhcpd: DHCPRELEASE of
> 192.168.134.250 from 0c:c4:7a:4d:85:a8 via eth1 (found)
>
> 2019-02-22T18:16:59.664557+01:00 xcat-tars dhcpd: DHCPRELEASE of
> 192.168.134.252 from 0c:c4:7a:58:c7:6a via eth1 (found)
>
> -> because discovery is resolved ?
>
> 5) I've got those messages
>
> 2019-02-22T18:16:59.752892+01:00 xcat-tars xcatd: Discovery worker:
> switch instance: nodediscover instance[60918]: tars-113 has been discovered
> 2019-02-22T18:16:59.754145+01:00 xcat-tars xcat[60918]: Discovery Error:
> Could not find any node.
>
> -> I'm not really sure if the "tars-113 has been discovered" is related
> to point number 3 (order of messages mixed in logs) or something else : ?
>
> -> I'm not really sure why the "Could not find any node" message :
>
> -> Are there 2 discovery process running and racing against each other
> (one per nic) ?
>
> 6) eth0 gets the desired ip (the ip set in the node def)
>
> 2019-02-22T18:17:00.000095+01:00 xcat-tars dhcpd: DHCPDISCOVER from
> 0c:c4:7a:4d:85:a8 via eth1
> 2019-02-22T18:17:00.000116+01:00 xcat-tars dhcpd: DHCPOFFER on
> 192.168.128.115 to 0c:c4:7a:4d:85:a8 via eth1
> 2019-02-22T18:17:00.000433+01:00 xcat-tars dhcpd: DHCPREQUEST for
> 192.168.128.115 (192.168.132.2) from 0c:c4:7a:4d:85:a8 via eth1
> 2019-02-22T18:17:00.000442+01:00 xcat-tars dhcpd: DHCPACK on
> 192.168.128.115 to 0c:c4:7a:4d:85:a8 via eth1
>
>
> -> why as discovery seems to have been done on eth2 ?
>
> 7) eth2 asks for an ip but is denied to boot :
>
>
> 2019-02-22T18:17:01.001030+01:00 xcat-tars dhcpd: DHCPDISCOVER from
> 0c:c4:7a:58:c7:6a via eth1: booting disallowed
>
> 8) I've got this message I don't fully understand either :
>
> 2019-02-22T18:17:01.500387+01:00 xcat-tars xcatd: Discovery worker:
> switch instance: nodediscover instance[60918]: Failed to notify
> 192.168.134.252 that it's actually tars-113.
>
> So, granted, my setup is bogus but I keep it that way for learning
> purpose and in the hope I'll be able to figure out how nic related
> attributes (installnic, nicips) are meant to be used.
>
> Thanks!
>
> --
> Thomas Hummel
>
>
>
>
> _______________________________________________
> xCAT-user mailing list
> xCAT-user@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/xcat-user
>