What is your bandwidth threshold for the network used for VM migration?
Can you set a 90 Mbit/s threshold (yes, less than 100 Mbit/s) and try to migrate
a small (1 GB RAM) VM?

Do you see disconnects?

If not, raise the threshold a little and check again.
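
To make "do you see disconnects" easier to answer, something like the following could log the moment a host stops or resumes answering pings while the migration runs. This is only a rough sketch: the function names are made up for illustration, the host names are the ones mentioned in this thread, and the one-second interval and the Linux iputils `ping` options are assumptions.

```shell
#!/bin/sh
# Hypothetical helper (not part of oVirt): log reachability changes of a host.

# Map a ping exit code to a state name: 0 means a reply was received.
state_from_rc() {
    if [ "$1" -eq 0 ]; then echo up; else echo down; fi
}

# Ping HOST once per second and print a timestamped line on every change.
watch_host() {
    host=$1
    prev=unknown
    while :; do
        # -c 1: send one probe, -W 1: wait at most 1 second for the reply
        ping -c 1 -W 1 "$host" >/dev/null 2>&1
        cur=$(state_from_rc $?)
        if [ "$cur" != "$prev" ]; then
            echo "$(date '+%F %T') $host $prev -> $cur"
            prev=$cur
        fi
        sleep 1
    done
}

# Example (run from a machine that stays reachable, stop with Ctrl-C):
#   watch_host swm-02 & watch_host swm-03 & wait
```

Comparing the logged timestamps with the migration start time for each threshold value should show whether the drop-outs track the migration bandwidth.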

Best Regards,
Strahil Nikolov

On Aug 23, 2019 23:19, "Curtis E. Combs Jr."
<[email protected]> wrote:
>
> It took a while for my servers to come back on the network this time. 
> I think it's due to ovirt continuing to try to migrate the VMs around 
> like I requested. The 3 servers' names are "swm-01, swm-02 and 
> swm-03". Eventually (about 2-3 minutes ago) they all came back online. 
>
> So I disabled and stopped the lldpad service. 
>
> Nope. Started some more migrations and swm-02 and swm-03 disappeared 
> again. No ping, SSH hung, same as before - almost as soon as the 
> migration started. 
>
> If you all have any ideas what switch-level setting might be enabled,
> let me know, cause I'm stumped. I can add it to the ticket that's 
> requesting the port configurations. I've already added the port 
> numbers and switch name that I got from CDP. 
>
> Thanks again, I really appreciate the help! 
> cecjr 
>
>
>
> On Fri, Aug 23, 2019 at 3:28 PM Dominik Holler <[email protected]> wrote: 
> > 
> > 
> > 
> > On Fri, Aug 23, 2019 at 9:19 PM Dominik Holler <[email protected]> wrote: 
> >> 
> >> 
> >> 
> >> On Fri, Aug 23, 2019 at 8:03 PM Curtis E. Combs Jr. <[email protected]> 
> >> wrote: 
> >>> 
> >>> This little cluster isn't in production or anything like that yet. 
> >>> 
> >>> So, I went ahead and used your ethtool commands to disable pause 
> >>> frames on both interfaces of each server. I then chose a few VMs to
> >>> migrate around at random. 
> >>> 
> >>> swm-02 and swm-03 both went out again. Unreachable. Can't ping, can't 
> >>> ssh, and the SSH session that I had open was unresponsive. 
> >>> 
> >>> Any other ideas? 
> >>> 
> >> 
> >> Sorry, no. Looks like two different NICs with different drivers and 
> >> firmware go down together.
> >> This is a strong indication that the root cause is related to the switch. 
> >> Maybe you can get some information about the switch config by 
> >> 'lldptool get-tlv -n -i em1' 
> >> 
> > 
> > Another guess: 
> > After the optional 'lldptool get-tlv -n -i em1' 
> > 'systemctl stop lldpad' 
> > another try to migrate. 
> > 
> > 
> >> 
> >> 
> >>> 
> >>> On Fri, Aug 23, 2019 at 1:50 PM Dominik Holler <[email protected]> 
> >>> wrote: 
> >>> > 
> >>> > 
> >>> > 
> >>> > On Fri, Aug 23, 2019 at 6:45 PM Curtis E. Combs Jr. 
> >>> > <[email protected]> wrote: 
> >>> >> 
> >>> >> Unfortunately, I can't check on the switch. Trust me, I've tried. 
> >>> >> These servers are in a Co-Lo and I've put 5 tickets in asking about 
> >>> >> the port configuration. They just get ignored - but that's par for the 
> >>> >> course for IT here. Only about 2 out of 10 of our tickets get any
> >>> >> response and usually the response doesn't help. Then the system they 
> >>> >> use auto-closes the ticket. That was why I was suspecting STP before. 
> >>> >> 
> >>> >> I can do ethtool. I do have root on these servers, though. Are you 
> >>> >> trying to get me to turn off link-speed auto-negotiation? Would you 
> >>> >> like me to try that? 
> >>> >> 
> >>> > 
> >>> > It is just a suspicion that the reason is pause frames.
> >>> > Let's start on a NIC which is not used for ovirtmgmt, I guess em1. 
> >>> > Does 'ethtool -S em1  | grep pause' show something? 
> >>> > Does 'ethtool em1 | grep pause' indicate support for pause?
> >>> > The current config is shown by 'ethtool -a em1'. 
> >>> > '-A autoneg' "Specifies whether pause autonegotiation should be 
> >>> > enabled." according to ethtool doc. 
> >>> > Assuming flow control is enabled by default, I would try to disable it
> >>> > via 
> >>> > 'ethtool -A em1 autoneg off rx off tx off' 
> >>> > and check if it is applied via 
> >>> > 'ethtool -a em1' 
> >>> > and check if the behavior under load changes. 
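
The "check if it is applied" step above could be scripted roughly as follows. This is a sketch, not a tested procedure: `pause_all_off` is a made-up helper name, and it assumes `ethtool -a` prints the usual `Autonegotiate:`, `RX:` and `TX:` lines, each followed by `on` or `off`.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -a <nic>` output on stdin and succeed
# only if Autonegotiate, RX and TX are all reported as "off".
pause_all_off() {
    awk '
        # Match only the three setting lines, not e.g. "RX negotiated:".
        /^(Autonegotiate|RX|TX):/ { seen++; if ($2 != "off") bad = 1 }
        # Exit 0 only if all three lines were seen and none was "on".
        END { exit (seen == 3 && !bad) ? 0 : 1 }
    '
}

# Example, as root, with em1 as the NIC named above:
#   ethtool -A em1 autoneg off rx off tx off
#   ethtool -a em1 | pause_all_off && echo "pause frames disabled on em1"
```

Some drivers refuse or silently ignore parts of the `-A` request, so checking the reported state is worthwhile before repeating the migration test.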
> >>> > 
> >>> > 
> >>> > 
> >>> >> 
> >>> >> On Fri, Aug 23, 2019 at 12:24 PM Dominik Holler <[email protected]> 
> >>> >> wrote: 
> >>> >> > 
> >>> >> > 
> >>> >> > 
> >>> >> > On Fri, Aug 23, 2019 at 5:49 PM Curtis E. Combs Jr. 
> >>> >> > <[email protected]> wrote: 
> >>> >> >> 
> >>> >> >> Sure! Right now, I only have a 5
_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/[email protected]/message/FY5I6PZNROOB5GTQCORQWO27PBLG2JK7/