Re: [Gluster-users] [ovirt-users] Re: Gluster problems, cluster performance issues
Adding Ravi to look into the heal issue.

As for the fsync hang and subsequent IO errors, it seems a lot like
https://bugzilla.redhat.com/show_bug.cgi?id=1497156, and Paolo Bonzini from
qemu had pointed out that this would be fixed by the following commit:

    commit e72c9a2a67a6400c8ef3d01d4c461dbbbfa0e1f0
    Author: Paolo Bonzini
    Date:   Wed Jun 21 16:35:46 2017 +0200

        scsi: virtio_scsi: let host do exception handling

        virtio_scsi tries to do exception handling after the default 30
        seconds timeout expires. However, it's better to let the host
        control the timeout, otherwise with a heavy I/O load it is likely
        that an abort will also timeout. This leads to fatal errors like
        filesystems going offline.

        Disable the 'sd' timeout and allow the host to do exception
        handling, following the precedent of the storvsc driver.

        Hannes has a proposal to introduce timeouts in virtio, but this
        provides an immediate solution for stable kernels too.

        [mkp: fixed typo]

        Reported-by: Douglas Miller
        Cc: "James E.J. Bottomley"
        Cc: "Martin K. Petersen"
        Cc: Hannes Reinecke
        Cc: linux-s...@vger.kernel.org
        Cc: sta...@vger.kernel.org
        Signed-off-by: Paolo Bonzini
        Signed-off-by: Martin K. Petersen

Adding Paolo/Kevin to comment.

As for the poor gluster performance, could you disable cluster.eager-lock
and see if that makes any difference:

    # gluster volume set <volname> cluster.eager-lock off

Do also capture the volume profile again if you still see performance
issues after disabling eager-lock; see the example commands after the log
excerpt below.

-Krutika

On Wed, May 30, 2018 at 6:55 AM, Jim Kusznir wrote:

> I also finally found the following in my system log on one server:
>
> [10679.524491] INFO: task glusterclogro:14933 blocked for more than 120 seconds.
> [10679.525826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [10679.527144] glusterclogro   D 97209832bf40     0 14933      1 0x0080
> [10679.527150] Call Trace:
> [10679.527161] [] schedule+0x29/0x70
> [10679.527218] [] _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [10679.527225] [] ? wake_up_state+0x20/0x20
> [10679.527254] [] xfs_file_fsync+0x107/0x1e0 [xfs]
> [10679.527260] [] do_fsync+0x67/0xb0
> [10679.527268] [] ? system_call_after_swapgs+0xbc/0x160
> [10679.527271] [] SyS_fsync+0x10/0x20
> [10679.527275] [] system_call_fastpath+0x1c/0x21
> [10679.527279] [] ? system_call_after_swapgs+0xc8/0x160
> [10679.527283] INFO: task glusterposixfsy:14941 blocked for more than 120 seconds.
> [10679.528608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [10679.529956] glusterposixfsy D 972495f84f10     0 14941      1 0x0080
> [10679.529961] Call Trace:
> [10679.529966] [] schedule+0x29/0x70
> [10679.530003] [] _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [10679.530008] [] ? wake_up_state+0x20/0x20
> [10679.530038] [] xfs_file_fsync+0x107/0x1e0 [xfs]
> [10679.530042] [] do_fsync+0x67/0xb0
> [10679.530046] [] ? system_call_after_swapgs+0xbc/0x160
> [10679.530050] [] SyS_fdatasync+0x13/0x20
> [10679.530054] [] system_call_fastpath+0x1c/0x21
> [10679.530058] [] ? system_call_after_swapgs+0xc8/0x160
> [10679.530062] INFO: task glusteriotwr13:15486 blocked for more than 120 seconds.
> [10679.531805] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [10679.533732] glusteriotwr13  D 9720a83f         0 15486      1 0x0080
> [10679.533738] Call Trace:
> [10679.533747] [] schedule+0x29/0x70
> [10679.533799] [] _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [10679.533806] [] ? wake_up_state+0x20/0x20
> [10679.533846] [] xfs_file_fsync+0x107/0x1e0 [xfs]
> [10679.533852] [] do_fsync+0x67/0xb0
> [10679.533858] [] ? system_call_after_swapgs+0xbc/0x160
> [10679.533863] [] SyS_fdatasync+0x13/0x20
> [10679.533868] [] system_call_fastpath+0x1c/0x21
> [10679.533873] [] ? system_call_after_swapgs+0xc8/0x160
> [10919.512757] INFO: task glusterclogro:14933 blocked for more than 120 seconds.
> [10919.514714] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [10919.516663] glusterclogro   D 97209832bf40     0 14933      1 0x0080
> [10919.516677] Call Trace:
> [10919.516690] [] schedule+0x29/0x70
> [10919.516696] [] schedule_timeout+0x239/0x2c0
> [10919.516703] [] ? blk_finish_plug+0x14/0x40
> [10919.516768] [] ? _xfs_buf_ioapply+0x334/0x460 [xfs]
> [10919.516774] [] wait_for_completion+0xfd/0x140
> [10919.516782] [] ? wake_up_state+0x20/0x20
> [10919.516821] [] ? _xfs_buf_read+0x23/0x40 [xfs]
> [10919.516859] [] xfs_buf_submit_wait+0xf9/0x1d0 [xfs]
> [10919.516902] [] ? xfs_trans_read_buf_map+0x199/0x400 [xfs]
> [10919.516940] [] _xfs_buf_read+0x23/0x40 [xfs]
> [10919.516977] [] xfs_buf_read_map+0xf9/0x160 [xfs]
> [10919.517022] [] xfs_trans_read_buf_map+0x199/0x400 [xfs]
> [10919.517057] [] xfs_da_read_buf+0xd4/0x100 [xfs]
> [10919.517091] [] xfs_da3_node_read+0x23/0xd0 [xfs]
> [...]
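For reference, the two suggestions above amount to the following sketch,
assuming a hypothetical volume named "data" (substitute your real volume
name):

    # disable eager-lock on the volume
    gluster volume set data cluster.eager-lock off

    # capture a fresh profile while the workload is running
    gluster volume profile data start
    # ... let the workload run for a few minutes ...
    gluster volume profile data info > /tmp/data-profile.txt
    gluster volume profile data stop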
Re: [Gluster-users] shard corruption bug
https://docs.gluster.org/en/latest/release-notes/3.12.6/

The major issue in 3.12.6 is not present in 3.12.7. The Bugzilla ID is
listed in the link.

On May 29, 2018 8:50:56 PM EDT, Dan Lavu wrote:

> What shard corruption bug? bugzilla url? I'm running into some odd
> behavior in my lab with shards and RHEV/KVM data, trying to figure out
> if it's related.
>
> Thanks.
>
> On Fri, May 4, 2018 at 11:13 AM, Jim Kinney wrote:
>
>> I upgraded my ovirt stack to 3.12.9, added a brick to a volume and
>> left it to settle. No problems. I am now running replica 4 (preparing
>> to remove a brick and host to go back to replica 3).
>>
>> On Fri, 2018-05-04 at 14:24, Gandalf Corvotempesta wrote:
>>
>> Il giorno ven 4 mag 2018 alle ore 14:06 Jim Kinney ha scritto:
>>
>> It stopped being an outstanding issue at 3.12.7. I think it's now
>> fixed.
>>
>> So, is it not possible to extend and rebalance a working cluster with
>> sharded data? Can someone confirm this? Maybe the ones that hit the
>> bug in the past can.
>>
>> --
>>
>> James P. Kinney III
>> Every time you stop a school, you will have to build a jail. What you
>> gain at one end you lose at the other. It's like feeding a dog on his
>> own tail. It won't fatten the dog.
>> - Speech 11/23/1900 Mark Twain
>> http://heretothereideas.blogspot.com/
>>
--
Sent from my Android device with K-9 Mail. All tyopes are thumb related and reflect authenticity.
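To check where a given deployment stands, something along these lines
should work ("gv0" is a placeholder volume name):

    # gluster version on each node
    gluster --version | head -n1

    # is sharding enabled on the volume, and at what block size?
    gluster volume get gv0 features.shard
    gluster volume get gv0 features.shard-block-size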
Re: [Gluster-users] RDMA inline threshold?
Forgot to mention: sometimes I have to force-start other volumes as well;
it's hard to determine which brick process is locked up from the logs.

Status of volume: rhev_vms_primary
Gluster process                                                    TCP Port  RDMA Port  Online  Pid
---------------------------------------------------------------------------------------------------
Brick spidey.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary    0         49157      Y       15666
Brick deadpool.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary  0         49156      Y       2542
Brick groot.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary     0         49156      Y       2180
Self-heal Daemon on localhost                                      N/A       N/A        N       N/A   << Brick process is not running on any node.
Self-heal Daemon on spidey.ib.runlevelone.lan                      N/A       N/A        N       N/A
Self-heal Daemon on groot.ib.runlevelone.lan                       N/A       N/A        N       N/A

Task Status of Volume rhev_vms_primary
---------------------------------------------------------------------------------------------------
There are no active volume tasks

 3081  gluster volume start rhev_vms_noshards force
 3082  gluster volume status
 3083  gluster volume start rhev_vms_primary force
 3084  gluster volume status
 3085  gluster volume start rhev_vms_primary rhev_vms
 3086  gluster volume start rhev_vms_primary rhev_vms force

Status of volume: rhev_vms_primary
Gluster process                                                    TCP Port  RDMA Port  Online  Pid
---------------------------------------------------------------------------------------------------
Brick spidey.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary    0         49157      Y       15666
Brick deadpool.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary  0         49156      Y       2542
Brick groot.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary     0         49156      Y       2180
Self-heal Daemon on localhost                                      N/A       N/A        Y       8343
Self-heal Daemon on spidey.ib.runlevelone.lan                      N/A       N/A        Y       22381
Self-heal Daemon on groot.ib.runlevelone.lan                       N/A       N/A        Y       20633

Finally..

Dan

On Tue, May 29, 2018 at 8:47 PM, Dan Lavu wrote:

> Stefan,
>
> Sounds like a brick process is not running. I have noticed some
> strangeness in my lab when using RDMA; I often have to forcibly restart
> the brick process, often as in every single time I do a major
> operation: add a new volume, remove a volume, stop a volume, etc.
>
> gluster volume status
>
> Do any of the self-heal daemons show N/A? If that's the case, try
> forcing a restart on the volume:
>
> gluster volume start <volname> force
>
> This would also explain why your volumes aren't being replicated
> properly.
>
> On Tue, May 29, 2018 at 5:20 PM, Stefan Solbrig wrote:
>
>> Dear all,
>>
>> I faced a problem with a glusterfs volume (pure distributed, _not_
>> dispersed) over RDMA transport. One user had a directory with a large
>> number of files (50,000 files), and just doing an "ls" in this
>> directory yields a "Transport endpoint is not connected" error. The
>> effect is that "ls" only shows some files, but not all.
>>
>> The respective log file shows this error message:
>>
>> [2018-05-20 20:38:25.114978] W [MSGID: 114031] [client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-0: remote operation failed [Transport endpoint is not connected]
>> [2018-05-20 20:38:27.732796] W [MSGID: 103046] [rdma.c:4089:gf_rdma_process_recv] 0-rpc-transport/rdma: peer (10.100.245.18:49153), couldn't encode or decode the msg properly or write chunks were not provided for replies that were bigger than RDMA_INLINE_THRESHOLD (2048)
>> [2018-05-20 20:38:27.732844] W [MSGID: 114031] [client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-3: remote operation failed [Transport endpoint is not connected]
>> [2018-05-20 20:38:27.733181] W [fuse-bridge.c:2897:fuse_readdirp_cbk] 0-glusterfs-fuse: 72882828: READDIRP => -1 (Transport endpoint is not connected)
>>
>> I already set the memlock limit for glusterd to unlimited, but the
>> problem persists.
>>
>> Only going from RDMA transport to TCP transport solved the problem.
>> (I'm running the volume now in mixed mode, config.transport=tcp,rdma.)
>> Mounting with transport=rdma shows this error; mounting with
>> transport=tcp is fine.
>>
>> However, this problem does not arise on all large directories; I
>> haven't recognized a pattern yet.
>>
>> I'm using glusterfs v3.12.6 on the servers, with QDR Infiniband HCAs.
>>
>> Is this a known issue with RDMA transport?
>>
>> best wishes,
>> Stefan
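A rough sketch of the workaround described in this thread: walk every
volume and force-start any whose self-heal daemon shows offline. The
awk/grep column handling is an assumption; the status layout varies
between gluster versions, so verify before relying on it:

    # force-start every volume whose self-heal daemon's Online column is "N"
    for vol in $(gluster volume list); do
        if gluster volume status "$vol" | grep 'Self-heal Daemon' \
                | awk '{print $(NF-1)}' | grep -q '^N$'; then
            gluster volume start "$vol" force
        fi
    done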
Re: [Gluster-users] shard corruption bug
What shard corruption bug? bugzilla url? I'm running into some odd
behavior in my lab with shards and RHEV/KVM data, trying to figure out if
it's related.

Thanks.

On Fri, May 4, 2018 at 11:13 AM, Jim Kinney wrote:

> I upgraded my ovirt stack to 3.12.9, added a brick to a volume and left
> it to settle. No problems. I am now running replica 4 (preparing to
> remove a brick and host to go back to replica 3).
>
> On Fri, 2018-05-04 at 14:24, Gandalf Corvotempesta wrote:
>
> Il giorno ven 4 mag 2018 alle ore 14:06 Jim Kinney ha scritto:
>
> It stopped being an outstanding issue at 3.12.7. I think it's now fixed.
>
> So, is it not possible to extend and rebalance a working cluster with
> sharded data? Can someone confirm this? Maybe the ones that hit the bug
> in the past can.
>
> --
>
> James P. Kinney III
> Every time you stop a school, you will have to build a jail. What you
> gain at one end you lose at the other. It's like feeding a dog on his
> own tail. It won't fatten the dog.
> - Speech 11/23/1900 Mark Twain
> http://heretothereideas.blogspot.com/
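For context, the extend-and-rebalance sequence being asked about looks
like this in sketch form (volume name, host, and brick path are
placeholders):

    # extend the volume with a new distribute brick
    gluster volume add-brick gv0 newhost:/bricks/brick1/gv0
    # then rebalance so existing (sharded) data spreads onto it
    gluster volume rebalance gv0 start
    gluster volume rebalance gv0 status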
Re: [Gluster-users] RDMA inline threshold?
Stefan,

Sounds like a brick process is not running. I have noticed some
strangeness in my lab when using RDMA; I often have to forcibly restart
the brick process, often as in every single time I do a major operation:
add a new volume, remove a volume, stop a volume, etc.

gluster volume status

Do any of the self-heal daemons show N/A? If that's the case, try forcing
a restart on the volume:

gluster volume start <volname> force

This would also explain why your volumes aren't being replicated properly.

On Tue, May 29, 2018 at 5:20 PM, Stefan Solbrig wrote:

> Dear all,
>
> I faced a problem with a glusterfs volume (pure distributed, _not_
> dispersed) over RDMA transport. One user had a directory with a large
> number of files (50,000 files), and just doing an "ls" in this
> directory yields a "Transport endpoint is not connected" error. The
> effect is that "ls" only shows some files, but not all.
>
> The respective log file shows this error message:
>
> [2018-05-20 20:38:25.114978] W [MSGID: 114031] [client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-0: remote operation failed [Transport endpoint is not connected]
> [2018-05-20 20:38:27.732796] W [MSGID: 103046] [rdma.c:4089:gf_rdma_process_recv] 0-rpc-transport/rdma: peer (10.100.245.18:49153), couldn't encode or decode the msg properly or write chunks were not provided for replies that were bigger than RDMA_INLINE_THRESHOLD (2048)
> [2018-05-20 20:38:27.732844] W [MSGID: 114031] [client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-3: remote operation failed [Transport endpoint is not connected]
> [2018-05-20 20:38:27.733181] W [fuse-bridge.c:2897:fuse_readdirp_cbk] 0-glusterfs-fuse: 72882828: READDIRP => -1 (Transport endpoint is not connected)
>
> I already set the memlock limit for glusterd to unlimited, but the
> problem persists.
>
> Only going from RDMA transport to TCP transport solved the problem.
> (I'm running the volume now in mixed mode, config.transport=tcp,rdma.)
> Mounting with transport=rdma shows this error; mounting with
> transport=tcp is fine.
>
> However, this problem does not arise on all large directories; I
> haven't recognized a pattern yet.
>
> I'm using glusterfs v3.12.6 on the servers, with QDR Infiniband HCAs.
>
> Is this a known issue with RDMA transport?
>
> best wishes,
> Stefan
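To verify whether replication actually catches up after a force start,
the usual check is the pending-heal view (volume name is a placeholder):

    # list entries still waiting to be healed
    gluster volume heal gv0 info
    # per-brick counts of pending heals
    gluster volume heal gv0 statistics heal-count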
[Gluster-users] RDMA inline threshold?
Dear all,

I faced a problem with a glusterfs volume (pure distributed, _not_
dispersed) over RDMA transport. One user had a directory with a large
number of files (50,000 files), and just doing an "ls" in this directory
yields a "Transport endpoint is not connected" error. The effect is that
"ls" only shows some files, but not all.

The respective log file shows this error message:

[2018-05-20 20:38:25.114978] W [MSGID: 114031] [client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-0: remote operation failed [Transport endpoint is not connected]
[2018-05-20 20:38:27.732796] W [MSGID: 103046] [rdma.c:4089:gf_rdma_process_recv] 0-rpc-transport/rdma: peer (10.100.245.18:49153), couldn't encode or decode the msg properly or write chunks were not provided for replies that were bigger than RDMA_INLINE_THRESHOLD (2048)
[2018-05-20 20:38:27.732844] W [MSGID: 114031] [client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-3: remote operation failed [Transport endpoint is not connected]
[2018-05-20 20:38:27.733181] W [fuse-bridge.c:2897:fuse_readdirp_cbk] 0-glusterfs-fuse: 72882828: READDIRP => -1 (Transport endpoint is not connected)

I already set the memlock limit for glusterd to unlimited, but the
problem persists.

Only going from RDMA transport to TCP transport solved the problem. (I'm
running the volume now in mixed mode, config.transport=tcp,rdma.)
Mounting with transport=rdma shows this error; mounting with
transport=tcp is fine.

However, this problem does not arise on all large directories; I haven't
recognized a pattern yet.

I'm using glusterfs v3.12.6 on the servers, with QDR Infiniband HCAs.

Is this a known issue with RDMA transport?

best wishes,
Stefan
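For anyone trying to reproduce the workarounds above: the memlock and
transport changes look roughly like the following sketch. The systemd
drop-in path is an assumption (typical for CentOS 7), and "glurch" is the
volume name taken from the logs:

    # raise glusterd's memlock limit via a systemd drop-in
    mkdir -p /etc/systemd/system/glusterd.service.d
    printf '[Service]\nLimitMEMLOCK=infinity\n' \
        > /etc/systemd/system/glusterd.service.d/99-memlock.conf
    systemctl daemon-reload && systemctl restart glusterd

    # run the volume with both transports, then mount over TCP
    # (a volume restart may be needed for the transport change to apply)
    gluster volume set glurch config.transport tcp,rdma
    mount -t glusterfs -o transport=tcp gluster-server:/glurch /mnt/glurch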
Re: [Gluster-users] glustefs as vmware datastore in production
On Tue, May 29, 2018 at 09:03:04AM +0900, 김경표 wrote:
> Sometimes the OS disk hung and got re-mounted read-only in the VM guest
> (CentOS 6) when storage was busy.

I had similar problems in the early days of running my gluster volume;
then I switched the gluster mounts from fuse to libgfapi and haven't had a
problem since, even when running the volume harder than I had been
previously.

But then, I'm running kvm/qemu virtualization rather than vmware, and I
don't know whether vmware supports libgfapi or not. (I noticed that it
wasn't on the list of options you mentioned for how to access the volume.)

--
Dave Sherohman
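To make the fuse-vs-libgfapi distinction concrete for the kvm/qemu case:
instead of pointing the disk at a file on a fuse mount, qemu can open the
image directly with a gluster:// URL. A minimal sketch (host, volume, and
image names are placeholders):

    # fuse: the image lives on a mounted gluster volume
    #   -drive file=/mnt/gv0/vm1.qcow2,format=qcow2,if=virtio
    # libgfapi: qemu opens the image through the gluster client library
    qemu-system-x86_64 -m 2048 \
        -drive file=gluster://gluster1/gv0/vm1.qcow2,format=qcow2,if=virtio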
Re: [Gluster-users] glustefs as vmware datastore in production
Hi,

I've gone through a bit of testing around using Gluster as a VMware
datastore. Here are my findings:

Running VMware vSphere 6.5 with ESXi nodes; Gluster running on Supermicro
kit, 6 SAS disks with 2 SSDs for caching, all carved up using LVM on top
of CentOS 7. I set up a 4-node cluster, ultimately to scale to 12 should
this become usable, initially just using 3 nodes to satisfy quorum
requirements without playing too much with default values.

The idea was to present via NFS, with a cluster VIP and heartbeat.
Obviously NFS is no longer a viable option direct from Gluster, so the
route I took was down the NFS-Ganesha route. I had no end of troubles
using the Ganesha tools with gluster 3.10, so I ended up doing the whole
thing manually, which seems to stand up ok. For this I used Gluster,
NFS-Ganesha with VFS connections, Pacemaker and Corosync (still not 100%
happy with the config and it needs tweaking, but it does what it needs so
far).

Next job was connecting from VMware. I had to ensure that the relevant
read/write permissions were set both on the filesystem and in the Ganesha
config, but the connection succeeded.

From here is where I'm suffering a few issues. If you migrate a VM into
the NFS volume and then away again, it doesn't clear the parent directory;
it removes everything else but that. SSHing onto the ESXi server and
manually trying to remove it results in 2 errors:

rm directory - cannot remove as it is a directory
rm -rf directory - cannot remove as it isn't a directory

The second issue is to do with failover. The VIP fails over extremely
fast, but VMware always loses connection to the volume, and a weird oddity
of VMware seems to be that it will not remount an NFS volume, so it stays
unavailable.

I have not got over these hurdles as yet, but will persevere, as
performance-wise and cost-wise it's brilliant.

Hope this helps, or if anyone has any clues to the above issues I'd be
most grateful.

Regards

Jon

On Friday, 25 May 2018, 18:33:09 BST, wrote:

Hi,

Does anyone have glusterfs as a vmware datastore working in production in
a real-world case? How to serve the glusterfs cluster? As iscsi, NFS?

Thanks in advance
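For reference, a manually built Ganesha export of the kind described above
might look like this sketch, assuming FSAL_VFS over a gluster fuse mount
at /mnt/gluster/vmware; all ids, paths, and options are placeholders to
adapt:

    EXPORT {
        Export_Id = 1;
        Path = /mnt/gluster/vmware;   # gluster volume fuse-mounted here
        Pseudo = /vmware;             # NFSv4 pseudo path
        Access_Type = RW;
        Squash = No_Root_Squash;      # ESXi mounts NFS as root
        Protocols = 3, 4;
        Transports = TCP;
        FSAL { Name = VFS; }
    }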