You have got 4 spare disks and can take them out of your pool to create a temporary pool in parallel to the existing one, then zfs send/receive to migrate the data. This shouldn't take much time if you are not using huge drives?
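Roughly like this, as an untested sketch using the spare disk IDs from your zpool status below; "tmppool" and "z2pool/nfs" are only placeholder names for the temporary pool and for whatever dataset you actually export:

    # free the four hot spares so they can be reused
    zpool remove z2pool c0t5000C50041A6B737d0 c0t5000C50041AC3F07d0 \
        c0t5000C50041AD48DBd0 c0t5000C50041ADD727d0

    # build a temporary 2x2 mirror pool out of them
    zpool create tmppool \
        mirror c0t5000C50041A6B737d0 c0t5000C50041AC3F07d0 \
        mirror c0t5000C50041AD48DBd0 c0t5000C50041ADD727d0

    # initial copy while everything keeps running
    zfs snapshot -r z2pool/nfs@migrate1
    zfs send -R z2pool/nfs@migrate1 | zfs receive -F tmppool/nfs

    # later, with the VMs paused: one short incremental send, then switch
    # the NFS export over, rebuild z2pool as mirrors, and send the data back
    zfs snapshot -r z2pool/nfs@migrate2
    zfs send -R -i @migrate1 z2pool/nfs@migrate2 | zfs receive -F tmppool/nfs

Keep in mind that two mirrored pairs only give you roughly two disks worth of capacity, so the used data has to fit there first.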
On 22.04.2015 11:54, Maikel vd Mosselaar wrote:
> Yes, we are aware of that; the problem is that it's running production, so it is not very easy to change the pool.
>
> On 04/22/2015 11:48 AM, InterNetX - Juergen Gotteswinter wrote:
>> I expect you are aware of the fact that you only get the write performance of a single disk in that configuration? I would drop that pool configuration, drop the spare drives and go for a mirror pool.
>>
>> On 22.04.2015 11:39, Maikel vd Mosselaar wrote:
>>>   pool: z2pool
>>>  state: ONLINE
>>>   scan: scrub canceled on Sun Apr 12 16:33:38 2015
>>> config:
>>>
>>>         NAME                       STATE     READ WRITE CKSUM
>>>         z2pool                     ONLINE       0     0     0
>>>           raidz1-0                 ONLINE       0     0     0
>>>             c0t5000C5004172A87Bd0  ONLINE       0     0     0
>>>             c0t5000C50041A59027d0  ONLINE       0     0     0
>>>             c0t5000C50041A592AFd0  ONLINE       0     0     0
>>>             c0t5000C50041A660D7d0  ONLINE       0     0     0
>>>             c0t5000C50041A69223d0  ONLINE       0     0     0
>>>             c0t5000C50041A6ADF3d0  ONLINE       0     0     0
>>>         logs
>>>           c0t5001517BB2845595d0    ONLINE       0     0     0
>>>         cache
>>>           c0t5001517BB2847892d0    ONLINE       0     0     0
>>>         spares
>>>           c0t5000C50041A6B737d0    AVAIL
>>>           c0t5000C50041AC3F07d0    AVAIL
>>>           c0t5000C50041AD48DBd0    AVAIL
>>>           c0t5000C50041ADD727d0    AVAIL
>>>
>>> errors: No known data errors
>>>
>>> On 04/22/2015 11:17 AM, Karli Sjöberg wrote:
>>>> On Wed, 2015-04-22 at 11:12 +0200, Maikel vd Mosselaar wrote:
>>>>> Our pool is configured as Z1 with ZIL (normal SSD), the sync parameter is on the default setting (standard), so "sync" is on.
>>>> # zpool status ?
>>>>
>>>> /K
>>>>
>>>>> When the issue happens, the oVirt event viewer does indeed show latency warnings. Not always, but most of the time this is followed by an i/o storage error linked to random VMs, and they get paused when that happens.
>>>>>
>>>>> All the nodes use mode 4 bonding. The interfaces on the nodes don't show any drops or errors; I checked 2 of the VMs that got paused the last time it happened and they have dropped packets on their interfaces.
>>>>>
>>>>> We don't have a subscription with Nexenta (anymore).
>>>>>
>>>>> On 04/21/2015 04:41 PM, InterNetX - Juergen Gotteswinter wrote:
>>>>>> On 21.04.2015 16:19, Maikel vd Mosselaar wrote:
>>>>>>> Hi Juergen,
>>>>>>>
>>>>>>> The load on the nodes rises far over 200 during the event. Load on the nexenta stays normal and there is nothing strange in the logging.
>>>>>> ZFS + NFS could still be the root of this. Is your pool configuration raidzX or mirror, with or without ZIL? Is the sync parameter of the ZFS subvolume which gets exported kept at the default "standard"?
>>>>>>
>>>>>> http://christopher-technicalmusings.blogspot.de/2010/09/zfs-and-nfs-performance-with-zil.html
>>>>>>
>>>>>> Since oVirt reacts very sensitively to storage latency (it throws VMs into unresponsive or unknown state), it might be worth a try to do "zfs set sync=disabled pool/volume" to see if this changes things. But be aware that this makes the NFS export vulnerable to data loss in case of power loss etc., comparable to async NFS in Linux.
>>>>>>
>>>>>> If disabling the sync setting helps and you don't use a separate ZIL flash drive yet, adding one would very likely help to get rid of this.
>>>>>>
>>>>>> Also, if you run a subscribed version of Nexenta, it might be helpful to involve them.
>>>>>>
>>>>>> Do you see any messages about high latency in the oVirt Events panel?
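Quick reference on the sync suggestion quoted above; "z2pool/nfs" is just a placeholder for whatever dataset you actually export via NFS:

    zfs get sync z2pool/nfs
    zfs set sync=disabled z2pool/nfs    # revert with: zfs set sync=standard z2pool/nfs

    # and watch whether the log SSD actually takes writes while the VMs run
    zpool iostat -v z2pool 5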
>>>>>>> For our storage interfaces on our nodes we use bonding in mode 4 (802.3ad), 2x 1Gb. The nexenta has a 4x 1Gb bond in mode 4 as well.
>>>>>> This should be fine, as long as no node uses mode 0 / round-robin, which would lead to out-of-order TCP packets. The interfaces themselves don't show any drops or errors, on the VM hosts as well as on the switch itself?
>>>>>>
>>>>>> Jumbo frames?
>>>>>>
>>>>>>> Kind regards,
>>>>>>>
>>>>>>> Maikel
>>>>>>>
>>>>>>> On 04/21/2015 02:51 PM, InterNetX - Juergen Gotteswinter wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> how about load, latency, or strange dmesg messages on the Nexenta? Are you using bonded Gbit networking? If yes, which mode?
>>>>>>>>
>>>>>>>> Cheers,
>>>>>>>>
>>>>>>>> Juergen
>>>>>>>>
>>>>>>>> On 20.04.2015 14:25, Maikel vd Mosselaar wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> We are running oVirt 3.5.1 with 3 nodes and a separate engine.
>>>>>>>>>
>>>>>>>>> All on CentOS 6.6:
>>>>>>>>> 3 x nodes
>>>>>>>>> 1 x engine
>>>>>>>>>
>>>>>>>>> 1 x storage nexenta with NFS
>>>>>>>>>
>>>>>>>>> For multiple weeks we have been experiencing issues where our nodes cannot access the storage at random moments (at least that's what the nodes think).
>>>>>>>>>
>>>>>>>>> When the nodes are complaining about unavailable storage, the load rises up to +200 on all three nodes, which causes all running VMs to become inaccessible. During this the oVirt event viewer shows some i/o storage error messages; when this happens, random VMs get paused and will not be resumed anymore (this happens almost every time, but not all the VMs get paused).
>>>>>>>>>
>>>>>>>>> During the event we tested the accessibility from the nodes to the storage and it looks like it is working normally; at least we can do a normal "ls" on the storage without any delay in showing the contents.
>>>>>>>>>
>>>>>>>>> We tried multiple things that we thought were causing this issue, but nothing worked so far:
>>>>>>>>> * rebooting storage / nodes / engine.
>>>>>>>>> * disabling offsite rsync backups.
>>>>>>>>> * moved the biggest VMs with the highest load to a different platform outside of oVirt.
>>>>>>>>> * checked the wsize and rsize on the NFS mounts; storage and nodes are correct according to the "NFS troubleshooting page" on ovirt.org.
>>>>>>>>>
>>>>>>>>> The environment is running in production, so we are not free to test everything.
>>>>>>>>>
>>>>>>>>> I can provide log files if needed.
>>>>>>>>>
>>>>>>>>> Kind Regards,
>>>>>>>>>
>>>>>>>>> Maikel
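Regarding the wsize/rsize point in the list above: the options the nodes are actually using can be read directly on the clients, for example:

    nfsstat -m              # negotiated per-mount NFS options (rsize, wsize, timeo, ...)
    grep nfs /proc/mounts   # the same information, raw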
_______________________________________________
Users mailing list
[email protected]
http://lists.ovirt.org/mailman/listinfo/users

