What is your bandwidth threshold for the network used for VM migration? Can you set a 90 Mbit/s threshold (yes, less than 100 Mbit/s) and try to migrate a small (1 GB RAM) VM?
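For a sense of scale, you can estimate how long that test migration needs just to copy the guest's RAM once at the capped rate. The sketch below is a hypothetical helper (`full_copy_seconds` is not an oVirt tool). It assumes the threshold is in megabits per second (10^6 bit/s) and treats "1 GB" as 1024 MiB. Actual migrations can take longer, because pages the guest dirties during the copy are resent. They can also finish faster, because the hypervisor may send all-zero pages compactly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: seconds needed to copy a guest's full RAM exactly once
# at a fixed migration bandwidth cap. It ignores resent dirty pages and any
# pages the hypervisor may skip or compress.
#
# $1: guest RAM in MiB (1024 for the "1 GB" test VM)
# $2: bandwidth cap in Mbit/s (assumed to mean 10^6 bit/s)
full_copy_seconds() {
  local ram_mib=$1 cap_mbit=$2
  local bits=$(( ram_mib * 1024 * 1024 * 8 ))
  local bits_per_sec=$(( cap_mbit * 1000000 ))
  # Bash arithmetic is integer-only, so the result is rounded down.
  echo $(( bits / bits_per_sec ))
}

full_copy_seconds 1024 90
```

With the values from the message above, this prints 95, or roughly a minute and a half of sustained traffic. Raising the cap shortens the estimate proportionally.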
Do you see disconnects? If not, raise the threshold a little and check again.

Best Regards,
Strahil Nikolov

On Aug 23, 2019 23:19, "Curtis E. Combs Jr." <[email protected]> wrote:
>
> It took a while for my servers to come back on the network this time.
> I think it's due to ovirt continuing to try to migrate the VMs around
> like I requested. The 3 servers' names are "swm-01, swm-02 and
> swm-03". Eventually (about 2-3 minutes ago) they all came back online.
>
> So I disabled and stopped the lldpad service.
>
> Nope. Started some more migrations and swm-02 and swm-03 disappeared
> again. No ping, SSH hung, same as before - almost as soon as the
> migration started.
>
> If you all have any ideas what switch-level setting might be enabled,
> let me know, cause I'm stumped. I can add it to the ticket that's
> requesting the port configurations. I've already added the port
> numbers and switch name that I got from CDP.
>
> Thanks again, I really appreciate the help!
> cecjr
>
> On Fri, Aug 23, 2019 at 3:28 PM Dominik Holler <[email protected]> wrote:
> >
> > On Fri, Aug 23, 2019 at 9:19 PM Dominik Holler <[email protected]> wrote:
> >>
> >> On Fri, Aug 23, 2019 at 8:03 PM Curtis E. Combs Jr. <[email protected]>
> >> wrote:
> >>>
> >>> This little cluster isn't in production or anything like that yet.
> >>>
> >>> So, I went ahead and used your ethtool commands to disable pause
> >>> frames on both interfaces of each server. I then chose a few VMs to
> >>> migrate around at random.
> >>>
> >>> swm-02 and swm-03 both went out again. Unreachable. Can't ping, can't
> >>> ssh, and the SSH session that I had open was unresponsive.
> >>>
> >>> Any other ideas?
> >>>
> >>
> >> Sorry, no. Looks like two different NICs with different drivers and
> >> firmware go down together.
> >> This is a strong indication that the root cause is related to the switch.
> >> Maybe you can get some information about the switch config by
> >> 'lldptool get-tlv -n -i em1'
> >>
> >
> > Another guess:
> > After the optional 'lldptool get-tlv -n -i em1',
> > run 'systemctl stop lldpad'
> > and try the migration again.
> >
> >>
> >>>
> >>> On Fri, Aug 23, 2019 at 1:50 PM Dominik Holler <[email protected]>
> >>> wrote:
> >>> >
> >>> > On Fri, Aug 23, 2019 at 6:45 PM Curtis E. Combs Jr.
> >>> > <[email protected]> wrote:
> >>> >>
> >>> >> Unfortunately, I can't check on the switch. Trust me, I've tried.
> >>> >> These servers are in a Co-Lo and I've put 5 tickets in asking about
> >>> >> the port configuration. They just get ignored - but that's par for the
> >>> >> course for IT here. Only about 2 out of 10 of our tickets get any
> >>> >> response and usually the response doesn't help. Then the system they
> >>> >> use auto-closes the ticket. That was why I was suspecting STP before.
> >>> >>
> >>> >> I can do ethtool. I do have root on these servers, though. Are you
> >>> >> trying to get me to turn off link-speed auto-negotiation? Would you
> >>> >> like me to try that?
> >>> >>
> >>> >
> >>> > It is just a suspicion that the reason is pause frames.
> >>> > Let's start on a NIC which is not used for ovirtmgmt, I guess em1.
> >>> > Does 'ethtool -S em1 | grep pause' show something?
> >>> > Does 'ethtool em1 | grep pause' indicate support for pause?
> >>> > The current config is shown by 'ethtool -a em1'.
> >>> > '-A autoneg' "Specifies whether pause autonegotiation should be
> >>> > enabled." according to the ethtool doc.
> >>> > Assuming flow control is enabled by default, I would try to disable it
> >>> > via
> >>> > 'ethtool -A em1 autoneg off rx off tx off'
> >>> > and check if it is applied via
> >>> > 'ethtool -a em1'
> >>> > and check if the behavior under load changes.
> >>> >
> >>> >>
> >>> >> On Fri, Aug 23, 2019 at 12:24 PM Dominik Holler <[email protected]>
> >>> >> wrote:
> >>> >> >
> >>> >> > On Fri, Aug 23, 2019 at 5:49 PM Curtis E. Combs Jr.
> >>> >> > <[email protected]> wrote:
> >>> >> >>
> >>> >> >> Sure! Right now, I only have a 5

_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: https://www.ovirt.org/community/about/community-guidelines/
List Archives: https://lists.ovirt.org/archives/list/[email protected]/message/FY5I6PZNROOB5GTQCORQWO27PBLG2JK7/
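The pause-frame test suggested in the thread (disable flow control with `ethtool -A`, then confirm with `ethtool -a`) is easy to get wrong on one of several hosts. A small check that reads the `ethtool -a` output helps catch that. The sketch below is hypothetical: `pause_frames_disabled` is not part of ethtool. It assumes the usual `ethtool -a` layout, with `Autonegotiate:`, `RX:` and `TX:` lines, and succeeds only when all three are `off`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ethtool -a <iface>` output on stdin and exit 0
# only if pause autonegotiation, RX pause and TX pause are all "off".
pause_frames_disabled() {
  awk -F':' '
    # Drop the spaces/tabs ethtool uses to align the value column.
    { gsub(/[ \t]/, "", $2) }
    # Match exact field names, so lines such as "RX negotiated:" are ignored.
    $1 == "Autonegotiate" || $1 == "RX" || $1 == "TX" {
      seen[$1] = 1
      if ($2 != "off") bad = 1
    }
    END {
      # Fail if any field is missing or reports something other than "off".
      if (bad || !("Autonegotiate" in seen) || !("RX" in seen) || !("TX" in seen)) exit 1
      exit 0
    }'
}

# Usage on a host, as root (em1 is the NIC named in the thread):
#   ethtool -A em1 autoneg off rx off tx off
#   ethtool -a em1 | pause_frames_disabled && echo "em1: pause frames off"
```

The check fails when a field is missing, so an unexpected driver output is reported rather than silently treated as "off".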
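Dominik's other suggestion can be scripted per host: read what the switch advertises over LLDP, stop lldpad, and retry the migration. The sketch below also includes the disable step Curtis mentions. It is hypothetical: `lldp_then_stop` and the `DRY_RUN` switch are illustrative names. The interfaces to pass depend on each host's layout; em1 is the only one named in the thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: dump the LLDP TLVs the switch (neighbour) advertises
# on each given interface, then stop and disable lldpad before the next
# migration attempt. Set DRY_RUN=1 to print the commands instead of running them.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

lldp_then_stop() {
  local iface
  for iface in "$@"; do
    # -n selects the neighbour's TLVs rather than the locally configured ones.
    run lldptool get-tlv -n -i "$iface"
  done
  run systemctl stop lldpad
  run systemctl disable lldpad
}

# Usage on a host, as root:
#   lldp_then_stop em1
```

The dry-run mode makes it possible to review the exact command sequence before running it on a host that is already dropping off the network.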

