On Wed, 2015-04-22 at 11:12 +0200, Maikel vd Mosselaar wrote:
> Our pool is configured as RAID-Z1 with a ZIL (an ordinary SSD); the sync
> parameter is at its default setting ("standard"), so sync is on.

# zpool status ?
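Alongside `zpool status`, a few more commands are worth capturing when the issue hits; a rough sketch (the pool/dataset name "tank" is a placeholder, not from the thread):

```shell
# Pool health: look for DEGRADED/FAULTED vdevs and non-zero error counters
zpool status -v

# Per-vdev I/O and latency, sampled every 5 seconds; a single slow disk
# in a raidz vdev can stall the whole pool
zpool iostat -v tank 5

# Confirm the sync setting on the exported dataset
zfs get sync tank
```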

/K

> 
> When the issue happens, the oVirt event viewer does indeed show latency
> warnings. Not always, but most of the time this is followed by an I/O
> storage error linked to random VMs, and those VMs are paused when that happens.
> 
> All the nodes use mode 4 bonding. The interfaces on the nodes don't show
> any drops or errors. I checked two of the VMs that were paused the last
> time it happened; they do have dropped packets on their interfaces.
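The node-side checks described above can be sketched as follows (interface names `bond0`/`eth0` are examples, not taken from the thread):

```shell
# LACP (mode 4) status: every slave should show "MII Status: up" and
# all slaves should share the same aggregator ID
cat /proc/net/bonding/bond0

# Per-interface RX/TX error and drop counters on the host
ip -s link show bond0

# Inside a paused VM, the same counters show whether the drops are on
# the guest side
ip -s link show eth0
```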
> 
> We no longer have a subscription with Nexenta.
> 
> On 04/21/2015 04:41 PM, InterNetX - Juergen Gotteswinter wrote:
> > On 21.04.2015 at 16:19, Maikel vd Mosselaar wrote:
> >> Hi Juergen,
> >>
> >> The load on the nodes rises to well over 200 during the event. Load on the
> >> Nexenta stays normal and there is nothing strange in the logging.
> > ZFS + NFS could still be the root of this. Is your pool configuration
> > RAID-ZX or mirror, with or without a ZIL? Is the sync parameter of the
> > exported ZFS subvolume still at its default of "standard"?
> >
> > http://christopher-technicalmusings.blogspot.de/2010/09/zfs-and-nfs-performance-with-zil.html
> >
> > Since oVirt is very sensitive to storage latency (it throws VMs into an
> > unresponsive or unknown state), it might be worth trying "zfs set
> > sync=disabled pool/volume" to see if this changes things. But be aware
> > that this makes the NFS export vulnerable to data loss in case of
> > power loss etc., comparable to async NFS on Linux.
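A minimal sketch of that test, including the revert (the dataset name `tank/nfs-export` is an example; substitute the actual exported subvolume):

```shell
# Check the current sync setting on the exported dataset
zfs get sync tank/nfs-export

# Temporarily disable synchronous writes for the test; note that
# acknowledged-but-unflushed writes can be lost on power failure
# while this is set
zfs set sync=disabled tank/nfs-export

# Revert to the default once the test is done
zfs set sync=standard tank/nfs-export
```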
> >
> > If disabling the sync setting helps and you don't use a separate ZIL
> > flash drive yet, adding one would very likely get rid of this.
> >
> > Also, if you run a subscribed version of Nexenta, it might be helpful to
> > involve them.
> >
> > Do you see any messages about high latency in the oVirt events panel?
> >
> >> For our storage interfaces on our nodes we use bonding in mode 4
> >> (802.3ad) 2x 1Gb. The nexenta has 4x 1Gb bond in mode 4 also.
> > This should be fine, as long as no node uses mode 0 / round-robin, which
> > would lead to out-of-order TCP packets. The interfaces themselves don't
> > show any drops or errors, either on the VM hosts or on the switch itself?
> >
> > Jumbo Frames?
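If jumbo frames are in use, the MTU has to match end to end; a quick way to check this (interface name and storage IP are placeholders):

```shell
# MTU must match on hosts, storage, and every switch port in between;
# a mismatch shows up as fragmentation or silently dropped frames
ip link show bond0 | grep -o 'mtu [0-9]*'

# Test a 9000-byte path without fragmentation; 8972 = 9000 minus
# 28 bytes of IP + ICMP headers. Replace the address with the storage IP.
ping -M do -s 8972 -c 3 192.168.1.100
```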
> >
> >> Kind regards,
> >>
> >> Maikel
> >>
> >>
> >> On 04/21/2015 02:51 PM, InterNetX - Juergen Gotteswinter wrote:
> >>> Hi,
> >>>
> >>> How about load, latency, or strange dmesg messages on the Nexenta? Are you
> >>> using bonded Gbit networking? If yes, which mode?
> >>>
> >>> Cheers,
> >>>
> >>> Juergen
> >>>
> >>>> On 20.04.2015 at 14:25, Maikel vd Mosselaar wrote:
> >>>> Hi,
> >>>>
> >>>> We are running oVirt 3.5.1 with 3 nodes and a separate engine.
> >>>>
> >>>> All on CentOS 6.6:
> >>>> 3 x nodes
> >>>> 1 x engine
> >>>>
> >>>> 1 x storage nexenta with NFS
> >>>>
> >>>> For multiple weeks we have been experiencing issues where our nodes cannot
> >>>> access the storage at random moments (at least, that's what the nodes
> >>>> think).
> >>>>
> >>>> When the nodes complain about unavailable storage, the load rises to
> >>>> over 200 on all three nodes, which makes all running VMs
> >>>> inaccessible. During this the oVirt event viewer shows some I/O
> >>>> storage error messages; when this happens, random VMs get paused and are
> >>>> not resumed anymore (this happens almost every time, but not all the
> >>>> VMs get paused).
> >>>>
> >>>> During the event we tested accessibility from the nodes to the
> >>>> storage and it appears to work normally; at least we can do a normal
> >>>> "ls" on the storage without any delay in showing the contents.
> >>>>
> >>>> We tried multiple things that we thought might be causing this issue, but
> >>>> nothing has worked so far:
> >>>> * rebooting storage / nodes / engine.
> >>>> * disabling offsite rsync backups.
> >>>> * moved the biggest VMs with the highest load to a different platform
> >>>> outside of oVirt.
> >>>> * checked the wsize and rsize on the NFS mounts; storage and nodes are
> >>>> correct according to the "NFS troubleshooting page" on ovirt.org.
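For that last check, the effective mount options can be read on each node like this (nothing node-specific assumed):

```shell
# Show the negotiated NFS mount options (rsize/wsize, timeo, retrans)
# as actually in effect, which may differ from what fstab requests
nfsstat -m

# Alternatively, straight from the kernel's mount table
grep nfs /proc/mounts
```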
> >>>>
> >>>> The environment is running in production so we are not free to test
> >>>> everything.
> >>>>
> >>>> I can provide log files if needed.
> >>>>
> >>>> Kind Regards,
> >>>>
> >>>> Maikel
> >>>>
> >>>>
> >>>> _______________________________________________
> >>>> Users mailing list
> >>>> Users@ovirt.org
> >>>> http://lists.ovirt.org/mailman/listinfo/users
> 
