Hello,

I'm using xCAT 2.11 in production (I know it's an old release) to provision HPC multi-interface compute nodes with stateless images.

Basically, the nodes have one 1Gb/s NIC and one 10Gb/s NIC, each connected to a different switch. Obviously, I want them to be provisioned through the 10Gb/s link.

I'm using switch-based discovery and, provided that the BIOS PXE priorities are correctly configured, everything works as expected. But I was wondering what would happen if the BIOS was wrongly configured to PXE on the 1Gb/s NIC first : would discovery happen ? If yes, what would it result in ? I first thought it would be similar to a serial discovery situation but it doesn't seem that simple, here's why :

The compute nodes are physical nodes equipped with

- 2 onboard 1Gb/s NICs (eth0, eth1)
- 1 additional PCI-E 10Gb/s NIC (eth2)

and set up like this :

- eth2 is connected to a 10Gb/s switch, which is the one used by switch-based discovery (the 'switch' and 'switchport' attributes point to this switch and its ports)

- eth0 is connected to a 1Gb/s switch and is used for IPMI traffic from/to the BMC; it could also carry the data traffic (a legacy from the time we had neither a 10Gb/s switch nor a 10Gb/s NIC)

- BIOS is set up to PXE boot first and PXE is set up to boot through eth2 first.

Here's what I tested :

Starting conditions :

- Let's say nodeA was discovered and netbooted through the method above (thus its mac in the mac table is the eth2 (10Gb/s) address).

- Let's say I don't have any undiscovered or unprovisioned node

To see what would happen through eth0, I changed the configuration of the eth2-facing switch to forbid traffic coming from this interface, just to be sure PXE on eth2 could not work no matter what. So now the node does PXE on eth0 (1Gb/s)

I know about the 3 types of automatic discovery as described in documentation (MTMS, switch-based, serial) and here's what I tested :


1) nodeA :

- remove its mac address in the mac table
- remove all its leases in /var/lib/dhcpd/dhcpd.leases
- makedhcp -n

reboot nodeA

-> it is discovered and provisioned with its osimage but through the eth0 interface. hostname is nodeA as expected

in the mac table we can see

"nodeA",,"<eth0 mac-address>|<eth2 mac-address>!*NOIP*",,

-> seems strange to me (see the end of my test description)
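
For reference, here is a minimal Python sketch of how I read that mac field. It assumes the mac table value is a "|" delimited list of "mac" or "mac!hostname" items, and that `*NOIP*` in the hostname position marks a MAC that should not be given an IP; that last point is my understanding, not something I have confirmed. The function name and the MAC addresses are placeholders for illustration only.

```python
def parse_mac_field(node, field):
    """Split a mac table value into one dict per MAC address.

    Assumed format: "mac1|mac2!hostname|mac3!*NOIP*".
    A MAC without "!hostname" is taken to be bound to the node name.
    """
    entries = []
    for item in field.split("|"):
        item = item.strip()
        if not item:
            continue
        mac, _, host = item.partition("!")
        noip = host == "*NOIP*"
        entries.append({
            "mac": mac.lower(),
            # No hostname is reported for a *NOIP* entry.
            "hostname": None if noip else (host or node),
            "noip": noip,
        })
    return entries


# Placeholder addresses standing in for <eth0 mac-address> and <eth2 mac-address>.
entries = parse_mac_field("nodeA", "AA:BB:CC:00:00:01|aa:bb:cc:00:00:02!*NOIP*")
for entry in entries:
    print(entry)
```

Read this way, the entry from test 1 would mean that the eth0 MAC is now the one bound to nodeA, while the eth2 MAC is kept but flagged `*NOIP*`.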

2) nodeA

- rmdef nodeA
- remove all its leases in /var/lib/dhcpd/dhcpd.leases
- makedhcp -n
- add a new node with similar properties (same chain, ...) :

nodeadd foobar groups=<same as nodeA>
tabedit hosts : insert a specific entry with an ip for foobar
tabedit ipmi : insert a specific entry with a BMC IP for foobar
makedns foobar

reboot nodeA

-> it gets stalled ("Network configuration complete, commencing transmit of discovery packets") and PXE loops after getting an IP from the dhcpd dynamic range

-> seems pretty normal to me, since I consider myself to be in a serial-discovery-like scenario at that point

3) same as 2) but with a nodediscoverstart noderange=foobar AFTER node reboot : seems to loop in PXE/discovery (even if nodediscoverstop is run)

-> seems to me it might be normal, as I was maybe supposed to run nodediscoverstart before the node booted.

4) same as 2) but with a nodediscoverstart noderange=foobar BEFORE node reboot

-> foobar (same physical node as nodeA) gets discovered and provisioned. hostname is foobar as expected

in the mac table we can see

"foobar",,"0c:c4:7a:4d:85:a8",,

-> seems pretty normal to me

[I don't quite understand what is happening in 3) but it is a bit off topic]

So, basically, I don't understand the conceptual difference between 1) and 4), and thus why in 1) nodeA gets discovered without the need to run nodediscoverstart (granted, it has switch and switchport attributes, but they do not refer to the 1Gb/s switch)...

Again, I'd like to understand the above in order to be sure that a node won't get provisioned through the eth0/1Gb/s link if the PXE order gets mixed up... Or even to make sure (worst case scenario, if many nodes had their PXE order mixed up) that node hostnames won't be mixed up (in a serial-discovery-like scenario).
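
In the meantime, one way I could at least detect the problem after the fact is to compare the MAC that the mac table binds to each node with the MAC I expect for its 10Gb/s NIC. The Python sketch below is only an illustration under assumptions: the mac field is a "|" delimited list of "mac" or "mac!hostname" items, an item flagged `*NOIP*` is not the one used for provisioning, and the function names, the `expected_10g` inventory and all MAC addresses are hypothetical.

```python
def bound_mac(field):
    """Return the first MAC in a mac table value that is not flagged *NOIP*."""
    for item in field.split("|"):
        mac, _, host = item.strip().partition("!")
        if mac and host != "*NOIP*":
            return mac.lower()
    return None


def wrong_interface_nodes(mac_table, expected_10g):
    """List nodes whose bound MAC differs from their expected 10Gb/s MAC.

    mac_table:    node name -> mac table value (as exported from xCAT)
    expected_10g: node name -> eth2 MAC from a separate inventory (hypothetical)
    """
    suspects = []
    for node, field in sorted(mac_table.items()):
        expected = expected_10g.get(node)
        if expected is None:
            continue  # nothing to compare against for this node
        if bound_mac(field) != expected.lower():
            suspects.append(node)
    return suspects


# Placeholder data: nodeA mimics the entry seen in test 1, nodeB is healthy.
mac_table = {
    "nodeA": "aa:bb:cc:00:00:01|aa:bb:cc:00:00:02!*NOIP*",
    "nodeB": "aa:bb:cc:00:00:12",
}
expected_10g = {
    "nodeA": "aa:bb:cc:00:00:02",
    "nodeB": "AA:BB:CC:00:00:12",
}
print(wrong_interface_nodes(mac_table, expected_10g))
```

This only reports nodes that were already discovered through the wrong interface; it does not prevent it, which is what I am really after.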

Thanks

--
Thomas HUMMEL

