On Wed, 2015-04-22 at 11:12 +0200, Maikel vd Mosselaar wrote:
> Our pool is configured as RAIDZ1 with a ZIL (regular SSD); the sync
> parameter is on the default setting (standard), so "sync" is on.
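(For reference, the pool layout and the effective sync setting being discussed
here can be read back on the Nexenta with something like the commands below;
"tank" and "tank/nfs" are only placeholder pool/dataset names:)

  zfs get sync tank/nfs   # "standard" means synchronous NFS writes go through the ZIL
  zpool status tank       # shows the raidz1 vdev and whether a separate "logs" (SLOG) device is attached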
# zpool status ?

/K

> When the issue happens, the oVirt event viewer does indeed show latency
> warnings. Not always, but most of the time they are followed by an I/O
> storage error linked to random VMs, and those VMs get paused when that
> happens.
>
> All the nodes use mode 4 bonding. The interfaces on the nodes don't show
> any drops or errors. I checked two of the VMs that got paused the last
> time it happened; they do have dropped packets on their interfaces.
>
> We don't have a subscription with Nexenta (anymore).
>
> On 04/21/2015 04:41 PM, InterNetX - Juergen Gotteswinter wrote:
> > On 21.04.2015 at 16:19, Maikel vd Mosselaar wrote:
> >> Hi Juergen,
> >>
> >> The load on the nodes rises to well over 200 during the event. The load
> >> on the Nexenta stays normal and there is nothing strange in its logs.
> > ZFS + NFS could still be the root of this. Is your pool configuration
> > RAIDZx or mirror, with or without a ZIL? Is the sync parameter of the
> > exported ZFS subvolume kept at its default ("standard")?
> >
> > http://christopher-technicalmusings.blogspot.de/2010/09/zfs-and-nfs-performance-with-zil.html
> >
> > Since oVirt reacts very sensitively to storage latency (it throws VMs
> > into an unresponsive or unknown state), it might be worth a try to do
> > "zfs set sync=disabled pool/volume" to see if this changes things. But be
> > aware that this makes the NFS export vulnerable to data loss in case of
> > power loss etc., comparable to async NFS on Linux.
> >
> > If disabling the sync setting helps and you don't use a separate ZIL
> > flash drive yet -> adding one would very likely help to get rid of this.
> >
> > Also, if you run a subscribed version of Nexenta, it might be helpful to
> > involve them.
> >
> > Do you see any messages about high latency in the oVirt Events panel?
> >
> >> For our storage interfaces on the nodes we use bonding in mode 4
> >> (802.3ad), 2x 1 Gb. The Nexenta has a 4x 1 Gb bond in mode 4 as well.
> > This should be fine, as long as no node uses mode 0 / round-robin, which
> > would lead to out-of-order TCP packets. The interfaces themselves don't
> > show any drops or errors - on the VM hosts as well as on the switch
> > itself?
> >
> > Jumbo frames?
> >
> >> Kind regards,
> >>
> >> Maikel
> >>
> >>
> >> On 04/21/2015 02:51 PM, InterNetX - Juergen Gotteswinter wrote:
> >>> Hi,
> >>>
> >>> How about load, latency, or strange dmesg messages on the Nexenta? Are
> >>> you using bonded Gbit networking? If yes, which mode?
> >>>
> >>> Cheers,
> >>>
> >>> Juergen
> >>>
> >>> On 20.04.2015 at 14:25, Maikel vd Mosselaar wrote:
> >>>> Hi,
> >>>>
> >>>> We are running oVirt 3.5.1 with 3 nodes and a separate engine.
> >>>>
> >>>> All on CentOS 6.6:
> >>>> 3 x nodes
> >>>> 1 x engine
> >>>>
> >>>> 1 x storage: Nexenta with NFS
> >>>>
> >>>> For multiple weeks we have been experiencing issues where our nodes
> >>>> cannot access the storage at random moments (at least that's what the
> >>>> nodes think).
> >>>>
> >>>> When the nodes complain about unavailable storage, the load rises to
> >>>> over 200 on all three nodes, which makes all running VMs inaccessible.
> >>>> During this, the oVirt event viewer shows some I/O storage error
> >>>> messages; when that happens, random VMs get paused and are not resumed
> >>>> anymore (this happens almost every time, but not all the VMs get
> >>>> paused).
> >>>>
> >>>> During the event we tested the accessibility of the storage from the
> >>>> nodes, and it seems to work normally; at least we can do a normal "ls"
> >>>> on the storage without any delay in showing the contents.
> >>>>
> >>>> We tried multiple things that we thought might be causing this issue,
> >>>> but nothing has worked so far:
> >>>> * rebooting storage / nodes / engine
> >>>> * disabling offsite rsync backups
> >>>> * moving the biggest VMs with the highest load to a different platform
> >>>>   outside of oVirt
> >>>> * checking the wsize and rsize on the NFS mounts; storage and nodes are
> >>>>   correct according to the "NFS troubleshooting page" on ovirt.org
> >>>>
> >>>> The environment is running in production, so we are not free to test
> >>>> everything.
> >>>>
> >>>> I can provide log files if needed.
> >>>>
> >>>> Kind regards,
> >>>>
> >>>> Maikel
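For reference, the temporary test Juergen suggests above would look roughly
like this on the Nexenta side (again with a placeholder dataset name); as he
notes, running with sync=disabled leaves the NFS export exposed to data loss
on power failure, so it is only a diagnostic step:

  zfs set sync=disabled tank/nfs   # NFS writes are acknowledged before they reach stable storage
  # (watch whether the load spikes and VM pauses still occur)
  zfs set sync=standard tank/nfs   # revert to the default once the test is done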
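Likewise, the node-side points raised in the thread (bond mode and health,
interface errors, and the effective NFS mount options) can be checked with
standard CentOS 6 tools; "bond0" is a placeholder interface name:

  cat /proc/net/bonding/bond0   # confirms 802.3ad (mode 4) and shows per-slave link and LACP state
  ip -s link                    # per-interface RX/TX error and drop counters
  nfsstat -m                    # shows the rsize/wsize, timeo and retrans actually in effect on each NFS mount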