On 04/03/2015 10:04 PM, Alastair Neil wrote:
> Any follow-up on this?
>
> Are there known issues using a replica 3 gluster datastore with LVM
> thin-provisioned bricks?
>
> On 20 March 2015 at 15:22, Alastair Neil <[email protected]> wrote:
>
>     CentOS 6.6
>
>     vdsm-4.16.10-8.gitc937927.el6
>     glusterfs-3.6.2-1.el6
>     2.6.32-504.8.1.el6.x86_64
>
>     I moved to 3.6 specifically to get the snapshotting feature, hence
>     my desire to migrate to thinly provisioned LVM bricks.
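[Since the thread refers to thin-provisioned LVM bricks without showing how one is built, here is a minimal sketch using standard lvm2 commands; the device, VG, pool, and brick names and all sizes are hypothetical placeholders, not taken from this thread.]

```shell
# Hypothetical names: /dev/sdb, vg_bricks, thinpool, brick1.
pvcreate /dev/sdb
vgcreate vg_bricks /dev/sdb

# Create the thin pool; gluster snapshots are allocated from the same
# pool, so leave headroom and give the pool metadata generous space.
lvcreate -L 900G --poolmetadatasize 16G --chunksize 256k \
    -T vg_bricks/thinpool

# Carve a thin LV out of the pool and format it XFS (512-byte inodes,
# as commonly recommended for gluster bricks).
lvcreate -V 1T -T vg_bricks/thinpool -n brick1
mkfs.xfs -i size=512 /dev/vg_bricks/brick1
mkdir -p /bricks/brick1
mount /dev/vg_bricks/brick1 /bricks/brick1
```

Gluster's volume snapshot feature (the reason given above for moving to 3.6) requires exactly this kind of thin-LV backend for each brick.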
Well, on the glusterfs mailing list there have been discussions:

> 3.6.2 is a major release and introduces some new features at the
> cluster-wide level. Additionally, it is not yet stable.

> On 20 March 2015 at 14:57, Darrell Budic <[email protected]> wrote:
>
>     What version of gluster are you running on these?
>
>     I've seen high load during heals bounce my hosted engine around
>     due to overall system load, but never pause anything else.
>     CentOS 7 combined storage/host systems, gluster 3.5.2.
>
>> On Mar 20, 2015, at 9:57 AM, Alastair Neil <[email protected]> wrote:
>>
>> Pranith
>>
>> I have run a pretty straightforward test. I created a two-brick 50 GB
>> replica volume with normal LVM bricks and installed two servers, one
>> CentOS 6.6 and one CentOS 7.0. I kicked off bonnie++ on both to
>> generate some file-system activity and then made the volume replica 3.
>> I saw no issues on the servers.
>>
>> It is not clear whether this is a sufficiently rigorous test; the
>> volume I have had issues on is a 3 TB volume with about 2 TB used.
>>
>> -Alastair
>>
>> On 19 March 2015 at 12:30, Alastair Neil <[email protected]> wrote:
>>
>>     I don't think I have the resources to test it meaningfully. I have
>>     about 50 VMs on my primary storage domain. I might be able to set
>>     up a small 50 GB volume and provision 2 or 3 VMs running test
>>     loads, but I'm not sure it would be comparable. I'll give it a try
>>     and let you know if I see similar behaviour.
>>
>>     On 19 March 2015 at 11:34, Pranith Kumar Karampuri
>>     <[email protected]> wrote:
>>
>>         Without thinly provisioned LVM.
>>
>>         Pranith
>>
>>         On 03/19/2015 08:01 PM, Alastair Neil wrote:
>>>         do you mean raw partitions as bricks, or simply without
>>>         thin-provisioned LVM?
>>> On 19 March 2015 at 00:32, Pranith Kumar Karampuri
>>> <[email protected]> wrote:
>>>
>>>     Could you let me know if you see this problem without LVM as well?
>>>
>>>     Pranith
>>>
>>>     On 03/18/2015 08:25 PM, Alastair Neil wrote:
>>>>     I am in the process of replacing the bricks with thinly
>>>>     provisioned LVs, yes.
>>>>
>>>>     On 18 March 2015 at 09:35, Pranith Kumar Karampuri
>>>>     <[email protected]> wrote:
>>>>
>>>>         hi,
>>>>             Are you using a thin-LVM-based backend on which the
>>>>         bricks are created?
>>>>
>>>>         Pranith
>>>>
>>>>         On 03/18/2015 02:05 AM, Alastair Neil wrote:
>>>>> I have an oVirt cluster with 6 VM hosts and 4 gluster nodes. There
>>>>> are two virtualisation clusters, one with two Nehalem nodes and one
>>>>> with four Sandy Bridge nodes. My master storage domain is a
>>>>> GlusterFS domain backed by a replica 3 gluster volume from 3 of the
>>>>> gluster nodes. The engine is a hosted engine 3.5.1 on 3 of the
>>>>> Sandy Bridge nodes, with storage provided by NFS from a different
>>>>> gluster volume. All the hosts are CentOS 6.6.
>>>>>
>>>>> vdsm-4.16.10-8.gitc937927.el6
>>>>> glusterfs-3.6.2-1.el6
>>>>> 2.6.32-504.8.1.el6.x86_64
>>>>>
>>>>> Problems happen when I try to add a new brick or replace a brick;
>>>>> eventually the self-heal will kill the VMs. In the VMs' logs I see
>>>>> kernel hung-task messages:
>>>>>
>>>>> Mar 12 23:05:16 static1 kernel: INFO: task nginx:1736 blocked for more than 120 seconds.
>>>>> Mar 12 23:05:16 static1 kernel:       Not tainted 2.6.32-504.3.3.el6.x86_64 #1
>>>>> Mar 12 23:05:16 static1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> Mar 12 23:05:16 static1 kernel: nginx         D 0000000000000001     0  1736   1735 0x00000080
>>>>> Mar 12 23:05:16 static1 kernel: ffff8800778b17a8 0000000000000082 0000000000000000 00000000000126c0
>>>>> Mar 12 23:05:16 static1 kernel: ffff88007e5c6500 ffff880037170080 0006ce5c85bd9185 ffff88007e5c64d0
>>>>> Mar 12 23:05:16 static1 kernel: ffff88007a614ae0 00000001722b64ba ffff88007a615098 ffff8800778b1fd8
>>>>> Mar 12 23:05:16 static1 kernel: Call Trace:
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8152a885>] schedule_timeout+0x215/0x2e0
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8152a503>] wait_for_common+0x123/0x180
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff81064b90>] ? default_wake_function+0x0/0x20
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0210a76>] ? _xfs_buf_read+0x46/0x60 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] ? xfs_trans_read_buf+0x197/0x410 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8152a61d>] wait_for_completion+0x1d/0x20
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa020ff5b>] xfs_buf_iowait+0x9b/0x100 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] ? xfs_trans_read_buf+0x197/0x410 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0210a76>] _xfs_buf_read+0x46/0x60 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0210b3b>] xfs_buf_read+0xab/0x100 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa02063c7>] xfs_trans_read_buf+0x197/0x410 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa01ee6a4>] xfs_imap_to_bp+0x54/0x130 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa01f077b>] xfs_iread+0x7b/0x1b0 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff811ab77e>] ? inode_init_always+0x11e/0x1c0
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa01eb5ee>] xfs_iget+0x27e/0x6e0 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa01eae1d>] ? xfs_iunlock+0x5d/0xd0 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0209366>] xfs_lookup+0xc6/0x110 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0216024>] xfs_vn_lookup+0x54/0xa0 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8119dc65>] do_lookup+0x1a5/0x230
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8119e8f4>] __link_path_walk+0x7a4/0x1000
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff811738e7>] ? cache_grow+0x217/0x320
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8119f40a>] path_walk+0x6a/0xe0
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8119f61b>] filename_lookup+0x6b/0xc0
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff811a0747>] user_path_at+0x57/0xa0
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa0204e74>] ? _xfs_trans_commit+0x214/0x2a0 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffffa01eae3e>] ? xfs_iunlock+0x7e/0xd0 [xfs]
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff81193bc0>] vfs_fstatat+0x50/0xa0
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff811aaf5d>] ? touch_atime+0x14d/0x1a0
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff81193d3b>] vfs_stat+0x1b/0x20
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff81193d64>] sys_newstat+0x24/0x50
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff810e5c87>] ? audit_syscall_entry+0x1d7/0x200
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff810e5a7e>] ? __audit_syscall_exit+0x25e/0x290
>>>>> Mar 12 23:05:16 static1 kernel: [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b
>>>>>
>>>>> I am wondering if my volume settings are causing this.
>>>>> Can anyone with more knowledge take a look and let me know?
>>>>>
>>>>> network.remote-dio: on
>>>>> performance.stat-prefetch: off
>>>>> performance.io-cache: off
>>>>> performance.read-ahead: off
>>>>> performance.quick-read: off
>>>>> nfs.export-volumes: on
>>>>> network.ping-timeout: 20
>>>>> cluster.self-heal-readdir-size: 64KB
>>>>> cluster.quorum-type: auto
>>>>> cluster.data-self-heal-algorithm: diff
>>>>> cluster.self-heal-window-size: 8
>>>>> cluster.heal-timeout: 500
>>>>> cluster.self-heal-daemon: on
>>>>> cluster.entry-self-heal: on
>>>>> cluster.data-self-heal: on
>>>>> cluster.metadata-self-heal: on
>>>>> cluster.readdir-optimize: on
>>>>> cluster.background-self-heal-count: 20
>>>>> cluster.rebalance-stats: on
>>>>> cluster.min-free-disk: 5%
>>>>> cluster.eager-lock: enable
>>>>> storage.owner-uid: 36
>>>>> storage.owner-gid: 36
>>>>> auth.allow: *
>>>>> user.cifs: disable
>>>>> cluster.server-quorum-ratio: 51%
>>>>>
>>>>> Many Thanks, Alastair
>>>>>
>>>>> _______________________________________________
>>>>> Users mailing list
>>>>> [email protected]
>>>>> http://lists.ovirt.org/mailman/listinfo/users

With kind regards,

Jorick Astrego

Netbulae Virtualization Experts

----------------
Tel: 053 20 30 270    [email protected]    Staalsteden 4-3A    KvK 08198180
Fax: 053 20 30 271    www.netbulae.eu     7547 TA Enschede    BTW NL821234584B01
----------------
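[For reference, the replica 2 to replica 3 conversion exercised in the test described earlier in the thread maps onto the standard gluster CLI sequence below; the volume name and host/brick paths are illustrative, not taken from the thread.]

```shell
# Illustrative: a two-brick replica volume (hosts/paths are placeholders)
gluster volume create testvol replica 2 \
    server1:/bricks/brick1/testvol server2:/bricks/brick1/testvol
gluster volume start testvol

# Grow it to replica 3 by adding a third brick; the self-heal daemon
# then copies the existing data onto the new brick, which is the heal
# load being discussed in this thread.
gluster volume add-brick testvol replica 3 server3:/bricks/brick1/testvol

# Watch heal progress on the new brick
gluster volume heal testvol info
```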
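[The options listed above are applied one at a time with `gluster volume set`; a sketch against a hypothetical volume name follows. The `group virt` shortcut applies oVirt's shipped set of recommended options for VM-image volumes.]

```shell
# Illustrative: set a few of the options from the list above
# ("testvol" is a placeholder). Changes take effect immediately.
gluster volume set testvol network.ping-timeout 20
gluster volume set testvol cluster.quorum-type auto
gluster volume set testvol cluster.data-self-heal-algorithm diff
gluster volume set testvol cluster.self-heal-window-size 8

# Apply oVirt's recommended option group for virtualization workloads
gluster volume set testvol group virt

# Verify the resulting configuration
gluster volume info testvol
```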

