Re: [Gluster-users] [ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-29 Thread Krutika Dhananjay
Adding Ravi to look into the heal issue.

As for the fsync hang and subsequent IO errors, it looks a lot like
https://bugzilla.redhat.com/show_bug.cgi?id=1497156, and Paolo Bonzini from
QEMU pointed out that this would be fixed by the following commit:

  commit e72c9a2a67a6400c8ef3d01d4c461dbbbfa0e1f0
Author: Paolo Bonzini 
Date:   Wed Jun 21 16:35:46 2017 +0200

scsi: virtio_scsi: let host do exception handling

virtio_scsi tries to do exception handling after the default 30 seconds
timeout expires.  However, it's better to let the host control the
timeout, otherwise with a heavy I/O load it is likely that an abort will
also timeout.  This leads to fatal errors like filesystems going
offline.

Disable the 'sd' timeout and allow the host to do exception handling,
following the precedent of the storvsc driver.

Hannes has a proposal to introduce timeouts in virtio, but this provides
an immediate solution for stable kernels too.

[mkp: fixed typo]

Reported-by: Douglas Miller 
Cc: "James E.J. Bottomley" 
Cc: "Martin K. Petersen" 
Cc: Hannes Reinecke 
Cc: linux-s...@vger.kernel.org
Cc: sta...@vger.kernel.org
Signed-off-by: Paolo Bonzini 
Signed-off-by: Martin K. Petersen 
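
For guests still running kernels without this patch, one commonly used
stopgap (not something verified in this thread, and the device name below is
only an example) is to raise the SCSI command timeout inside the guest so
that aborts are less likely to fire under heavy I/O:

   # cat /sys/block/sda/device/timeout
   30
   # echo 180 > /sys/block/sda/device/timeout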


Adding Paolo/Kevin to comment.

As for the poor gluster performance, could you disable cluster.eager-lock
and see if that makes any difference:

# gluster volume set <VOLNAME> cluster.eager-lock off

Do also capture the volume profile again if you still see performance
issues after disabling eager-lock.
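
A sketch of one way to capture that profile over a short test window (the
volume name is a placeholder and the output file is arbitrary):

# gluster volume profile <VOLNAME> start
  (run the VM workload for a few minutes)
# gluster volume profile <VOLNAME> info > /tmp/profile-eager-lock-off.txt
# gluster volume profile <VOLNAME> stop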

-Krutika


On Wed, May 30, 2018 at 6:55 AM, Jim Kusznir  wrote:

> I also finally found the following in my system log on one server:
>
> [10679.524491] INFO: task glusterclogro:14933 blocked for more than 120
> seconds.
> [10679.525826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [10679.527144] glusterclogro   D 97209832bf40 0 14933  1
> 0x0080
> [10679.527150] Call Trace:
> [10679.527161]  [] schedule+0x29/0x70
> [10679.527218]  [] _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [10679.527225]  [] ? wake_up_state+0x20/0x20
> [10679.527254]  [] xfs_file_fsync+0x107/0x1e0 [xfs]
> [10679.527260]  [] do_fsync+0x67/0xb0
> [10679.527268]  [] ? system_call_after_swapgs+0xbc/0x160
> [10679.527271]  [] SyS_fsync+0x10/0x20
> [10679.527275]  [] system_call_fastpath+0x1c/0x21
> [10679.527279]  [] ? system_call_after_swapgs+0xc8/0x160
> [10679.527283] INFO: task glusterposixfsy:14941 blocked for more than 120
> seconds.
> [10679.528608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [10679.529956] glusterposixfsy D 972495f84f10 0 14941  1
> 0x0080
> [10679.529961] Call Trace:
> [10679.529966]  [] schedule+0x29/0x70
> [10679.530003]  [] _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [10679.530008]  [] ? wake_up_state+0x20/0x20
> [10679.530038]  [] xfs_file_fsync+0x107/0x1e0 [xfs]
> [10679.530042]  [] do_fsync+0x67/0xb0
> [10679.530046]  [] ? system_call_after_swapgs+0xbc/0x160
> [10679.530050]  [] SyS_fdatasync+0x13/0x20
> [10679.530054]  [] system_call_fastpath+0x1c/0x21
> [10679.530058]  [] ? system_call_after_swapgs+0xc8/0x160
> [10679.530062] INFO: task glusteriotwr13:15486 blocked for more than 120
> seconds.
> [10679.531805] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [10679.533732] glusteriotwr13  D 9720a83f 0 15486  1
> 0x0080
> [10679.533738] Call Trace:
> [10679.533747]  [] schedule+0x29/0x70
> [10679.533799]  [] _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [10679.533806]  [] ? wake_up_state+0x20/0x20
> [10679.533846]  [] xfs_file_fsync+0x107/0x1e0 [xfs]
> [10679.533852]  [] do_fsync+0x67/0xb0
> [10679.533858]  [] ? system_call_after_swapgs+0xbc/0x160
> [10679.533863]  [] SyS_fdatasync+0x13/0x20
> [10679.533868]  [] system_call_fastpath+0x1c/0x21
> [10679.533873]  [] ? system_call_after_swapgs+0xc8/0x160
> [10919.512757] INFO: task glusterclogro:14933 blocked for more than 120
> seconds.
> [10919.514714] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [10919.516663] glusterclogro   D 97209832bf40 0 14933  1
> 0x0080
> [10919.516677] Call Trace:
> [10919.516690]  [] schedule+0x29/0x70
> [10919.516696]  [] schedule_timeout+0x239/0x2c0
> [10919.516703]  [] ? blk_finish_plug+0x14/0x40
> [10919.516768]  [] ? _xfs_buf_ioapply+0x334/0x460 [xfs]
> [10919.516774]  [] wait_for_completion+0xfd/0x140
> [10919.516782]  [] ? wake_up_state+0x20/0x20
> [10919.516821]  [] ? _xfs_buf_read+0x23/0x40 [xfs]
> [10919.516859]  [] xfs_buf_submit_wait+0xf9/0x1d0 [xfs]
> [10919.516902]  [] ? xfs_trans_read_buf_map+0x199/0x400
> [xfs]
> [10919.516940]  [] _xfs_buf_read+0x23/0x40 [xfs]
> [10919.516977]  [] xfs_buf_read_map+0xf9/0x160 [xfs]
> [10919.517022]  [] xfs_trans_read_buf_map+0x199/0x400
> [xfs]
> [10919.517057]  [] xfs_da_read_buf+0xd4/0x100 [xfs]
> [10919.517091]  [] xfs_da3_node_read+0x23/0xd0 [xfs]
> 

Re: [Gluster-users] shard corruption bug

2018-05-29 Thread Jim Kinney
https://docs.gluster.org/en/latest/release-notes/3.12.6/

The major issue in 3.12.6 is not present in 3.12.7. Bugzilla ID listed in link.
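
To confirm what you're actually running on each node before leaning on that fix, a quick check is:

# gluster --version
# gluster volume get all cluster.op-version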

On May 29, 2018 8:50:56 PM EDT, Dan Lavu  wrote:
>What shard corruption bug? bugzilla url? I'm running into some odd behavior
>in my lab with shards and RHEV/KVM data, trying to figure out if it's related.
>
>Thanks.
>
>On Fri, May 4, 2018 at 11:13 AM, Jim Kinney  wrote:
>
>> I upgraded my ovirt stack to 3.12.9, added a brick to a volume and left it
>> to settle. No problems. I am now running replica 4 (preparing to remove a
>> brick and host to replica 3).
>>
>> On Fri, 2018-05-04 at 14:24 +, Gandalf Corvotempesta wrote:
>>
>> On Fri, 4 May 2018 at 14:06, Jim Kinney wrote:
>>
>>
>> It stopped being an outstanding issue at 3.12.7. I think it's now fixed.
>>
>>
>>
>> So, is it not possible to extend and rebalance a working cluster with
>> sharded data?
>> Can someone confirm this? Maybe the ones that hit the bug in the past.
>>
>> --
>>
>> James P. Kinney III
>> Every time you stop a school, you will have to build a jail. What you
>> gain at one end you lose at the other. It's like feeding a dog on his
>> own tail. It won't fatten the dog. - Speech 11/23/1900, Mark Twain
>> http://heretothereideas.blogspot.com/
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>

-- 
Sent from my Android device with K-9 Mail. All tyopes are thumb related and 
reflect authenticity.
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] RDMA inline threshold?

2018-05-29 Thread Dan Lavu
Forgot to mention, sometimes I have to force start other volumes as well;
it's hard to determine which brick process is locked up from the logs.


Status of volume: rhev_vms_primary
Gluster process                                                    TCP Port  RDMA Port  Online  Pid
---------------------------------------------------------------------------------------------------
Brick spidey.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary    0         49157      Y       15666
Brick deadpool.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary  0         49156      Y       2542
Brick groot.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary     0         49156      Y       2180
Self-heal Daemon on localhost                                      N/A       N/A        N       N/A   << Brick process is not running on any node.
Self-heal Daemon on spidey.ib.runlevelone.lan                      N/A       N/A        N       N/A
Self-heal Daemon on groot.ib.runlevelone.lan                       N/A       N/A        N       N/A

Task Status of Volume rhev_vms_primary
---------------------------------------------------------------------------------------------------
There are no active volume tasks


 3081  gluster volume start rhev_vms_noshards force
 3082  gluster volume status
 3083  gluster volume start rhev_vms_primary force
 3084  gluster volume status
 3085  gluster volume start rhev_vms_primary rhev_vms
 3086  gluster volume start rhev_vms_primary rhev_vms force

Status of volume: rhev_vms_primary
Gluster process                                                    TCP Port  RDMA Port  Online  Pid
---------------------------------------------------------------------------------------------------
Brick spidey.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary    0         49157      Y       15666
Brick deadpool.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary  0         49156      Y       2542
Brick groot.ib.runlevelone.lan:/gluster/brick/rhev_vms_primary     0         49156      Y       2180
Self-heal Daemon on localhost                                      N/A       N/A        Y       8343
Self-heal Daemon on spidey.ib.runlevelone.lan                      N/A       N/A        Y       22381
Self-heal Daemon on groot.ib.runlevelone.lan                       N/A       N/A        Y       20633

Finally..
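
For what it's worth, a rough sketch of how that hunt could be scripted rather
than eyeballing each volume; it assumes any brick or self-heal daemon showing
'N' in the Online column means the volume needs a force start, which is a
simplification:

for vol in $(gluster volume list); do
    # look at the Brick / Self-heal Daemon rows and pull the Online column
    if gluster volume status "$vol" | grep -E 'Brick|Self-heal' \
            | awk '{print $(NF-1)}' | grep -qx N; then
        echo "force starting $vol"
        gluster volume start "$vol" force
    fi
done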

Dan




On Tue, May 29, 2018 at 8:47 PM, Dan Lavu  wrote:

> Stefan,
>
> Sounds like a brick process is not running. I have noticed some strangeness
> in my lab when using RDMA; I often have to forcibly restart the brick
> process, often as in every single time I do a major operation: add a new
> volume, remove a volume, stop a volume, etc.
>
> gluster volume status <volname>
>
> Do any of the self-heal daemons show N/A? If that's the case, try
> forcing a restart on the volume.
>
> gluster volume start <volname> force
>
> This will also explain why your volumes aren't being replicated properly.
>
> On Tue, May 29, 2018 at 5:20 PM, Stefan Solbrig 
> wrote:
>
>> Dear all,
>>
>> I faced a problem with a glusterfs volume (pure distributed, _not_
>> dispersed) over RDMA transport.  One user had a directory with a large
>> number of files (50,000 files) and just doing an "ls" in this directory
>> yields a "Transport endpoint not connected" error. The effect is that "ls"
>> only shows some files, but not all.
>>
>> The respective log file shows this error message:
>>
>> [2018-05-20 20:38:25.114978] W [MSGID: 114031]
>> [client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-0:
>> remote operation failed [Transport endpoint is not connected]
>> [2018-05-20 20:38:27.732796] W [MSGID: 103046]
>> [rdma.c:4089:gf_rdma_process_recv] 0-rpc-transport/rdma: peer (
>> 10.100.245.18:49153), couldn't encode or decode the msg properly or
>> write chunks were not provided for replies that were bigger than
>> RDMA_INLINE_THRESHOLD (2048)
>> [2018-05-20 20:38:27.732844] W [MSGID: 114031]
>> [client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-3:
>> remote operation failed [Transport endpoint is not connected]
>> [2018-05-20 20:38:27.733181] W [fuse-bridge.c:2897:fuse_readdirp_cbk]
>> 0-glusterfs-fuse: 72882828: READDIRP => -1 (Transport endpoint is not
>> connected)
>>
>> I already set the memlock limit for glusterd to unlimited, but the
>> problem persists.
>>
>> Only going from RDMA transport to TCP transport solved the problem.  (I'm
>> running the volume now in mixed mode, config.transport=tcp,rdma).  Mounting
>> with transport=rdma shows this error, mounting with transport=tcp is fine.
>>
>> However, this problem does not arise on all large directories, only on
>> some. I haven't recognized a pattern yet.
>>
>> I'm using glusterfs v3.12.6 on the servers, with QDR Infiniband HCAs.
>>
>> Is this a known issue with RDMA transport?
>>
>> best wishes,
>> Stefan
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] shard corruption bug

2018-05-29 Thread Dan Lavu
What shard corruption bug? bugzilla url? I'm running into some odd behavior
in my lab with shards and RHEV/KVM data, trying to figure out if it's
related.

Thanks.

On Fri, May 4, 2018 at 11:13 AM, Jim Kinney  wrote:

> I upgraded my ovirt stack to 3.12.9, added a brick to a volume and left it
> to settle. No problems. I am now running replica 4 (preparing to remove a
> brick and host to replica 3).
>
> On Fri, 2018-05-04 at 14:24 +, Gandalf Corvotempesta wrote:
>
> On Fri, 4 May 2018 at 14:06, Jim Kinney wrote:
>
>
> It stopped being an outstanding issue at 3.12.7. I think it's now fixed.
>
>
>
> So, is it not possible to extend and rebalance a working cluster with sharded
> data?
> Can someone confirm this? Maybe the ones that hit the bug in the past.
>
> --
>
> James P. Kinney III
> Every time you stop a school, you will have to build a jail. What you gain
> at one end you lose at the other. It's like feeding a dog on his own tail.
> It won't fatten the dog. - Speech 11/23/1900, Mark Twain
> http://heretothereideas.blogspot.com/
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] RDMA inline threshold?

2018-05-29 Thread Dan Lavu
Stefan,

Sounds like a brick process is not running. I have noticed some strangeness
in my lab when using RDMA; I often have to forcibly restart the brick
process, often as in every single time I do a major operation: add a new
volume, remove a volume, stop a volume, etc.

gluster volume status <volname>

Do any of the self-heal daemons show N/A? If that's the case, try forcing
a restart on the volume.

gluster volume start <volname> force

This will also explain why your volumes aren't being replicated properly.

On Tue, May 29, 2018 at 5:20 PM, Stefan Solbrig 
wrote:

> Dear all,
>
> I faced a problem with a glusterfs volume (pure distributed, _not_
> dispersed) over RDMA transport.  One user had a directory with a large
> number of files (50,000 files) and just doing an "ls" in this directory
> yields a "Transport endpoint not connected" error. The effect is that "ls"
> only shows some files, but not all.
>
> The respective log file shows this error message:
>
> [2018-05-20 20:38:25.114978] W [MSGID: 114031] 
> [client-rpc-fops.c:2578:client3_3_readdirp_cbk]
> 0-glurch-client-0: remote operation failed [Transport endpoint is not
> connected]
> [2018-05-20 20:38:27.732796] W [MSGID: 103046]
> [rdma.c:4089:gf_rdma_process_recv] 0-rpc-transport/rdma: peer (
> 10.100.245.18:49153), couldn't encode or decode the msg properly or write
> chunks were not provided for replies that were bigger than
> RDMA_INLINE_THRESHOLD (2048)
> [2018-05-20 20:38:27.732844] W [MSGID: 114031] 
> [client-rpc-fops.c:2578:client3_3_readdirp_cbk]
> 0-glurch-client-3: remote operation failed [Transport endpoint is not
> connected]
> [2018-05-20 20:38:27.733181] W [fuse-bridge.c:2897:fuse_readdirp_cbk]
> 0-glusterfs-fuse: 72882828: READDIRP => -1 (Transport endpoint is not
> connected)
>
> I already set the memlock limit for glusterd to unlimited, but the problem
> persists.
>
> Only going from RDMA transport to TCP transport solved the problem.  (I'm
> running the volume now in mixed mode, config.transport=tcp,rdma).  Mounting
> with transport=rdma shows this error, mounting with transport=tcp is fine.
>
> However, this problem does not arise on all large directories, only on some.
> I haven't recognized a pattern yet.
>
> I'm using glusterfs v3.12.6 on the servers, with QDR Infiniband HCAs.
>
> Is this a known issue with RDMA transport?
>
> best wishes,
> Stefan
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] RDMA inline threshold?

2018-05-29 Thread Stefan Solbrig
Dear all,

I faced a problem with a glusterfs volume (pure distributed, _not_ dispersed) 
over RDMA transport.  One user had a directory with a large number of files 
(50,000 files) and just doing an "ls" in this directory yields a "Transport 
endpoint not connected" error. The effect is that "ls" only shows some files,
but not all.

The respective log file shows this error message:

[2018-05-20 20:38:25.114978] W [MSGID: 114031] 
[client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-0: remote 
operation failed [Transport endpoint is not connected]
[2018-05-20 20:38:27.732796] W [MSGID: 103046] 
[rdma.c:4089:gf_rdma_process_recv] 0-rpc-transport/rdma: peer 
(10.100.245.18:49153), couldn't encode or decode the msg properly or write 
chunks were not provided for replies that were bigger than 
RDMA_INLINE_THRESHOLD (2048)
[2018-05-20 20:38:27.732844] W [MSGID: 114031] 
[client-rpc-fops.c:2578:client3_3_readdirp_cbk] 0-glurch-client-3: remote 
operation failed [Transport endpoint is not connected]
[2018-05-20 20:38:27.733181] W [fuse-bridge.c:2897:fuse_readdirp_cbk] 
0-glusterfs-fuse: 72882828: READDIRP => -1 (Transport endpoint is not connected)

I already set the memlock limit for glusterd to unlimited, but the problem 
persists. 

Only going from RDMA transport to TCP transport solved the problem.  (I'm 
running the volume now in mixed mode, config.transport=tcp,rdma).  Mounting 
with transport=rdma shows this error, mounting with transport=tcp is fine.
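
For reference, the knobs involved look roughly like this (volume name, server
and mount point are placeholders):

# gluster volume set <volname> config.transport tcp,rdma
# mount -t glusterfs -o transport=tcp  myserver:/<volname> /mnt/<volname>
# mount -t glusterfs -o transport=rdma myserver:/<volname> /mnt/<volname>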

However, this problem does not arise on all large directories, only on some. I
haven't recognized a pattern yet.

I'm using glusterfs v3.12.6 on the servers, with QDR Infiniband HCAs.

Is this a known issue with RDMA transport?

best wishes,
Stefan

___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] glustefs as vmware datastore in production

2018-05-29 Thread Dave Sherohman
On Tue, May 29, 2018 at 09:03:04AM +0900, 김경표 wrote:
> Sometimes an OS disk hang occurred and it was re-mounted ro in the VM guest
> (CentOS 6) when storage was busy.

I had similar problems in the early days of running my gluster volume,
then I switched the gluster mounts from fuse to libgfapi and haven't had
a problem since, even when running the volume harder than I had been
previously.
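
For anyone curious what the libgfapi path looks like in practice (the host and
volume names below are made up, and qemu has to be built with gluster
support), the disk image is addressed directly on the volume instead of
through a FUSE mountpoint:

  qemu-img create -f qcow2 gluster://gluster1.example.com/myvol/vm01.qcow2 20G
  qemu-img info gluster://gluster1.example.com/myvol/vm01.qcow2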

But, then, I'm running kvm/qemu virtualization rather than vmware and I
don't know whether vmware supports libgfapi or not.  (I noticed that it
wasn't on the list of options you mentioned for how to access the
volume.)


-- 
Dave Sherohman
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] glustefs as vmware datastore in production

2018-05-29 Thread Jonathan Archer
Hi,

I've gone through a bit of testing around using Gluster as a VMware datastore; here are my findings.

I'm running VMware vSphere 6.5 with ESXi nodes. Gluster runs on Supermicro kit: 6 SAS disks with 2 SSDs for caching, all carved up using LVM on top of CentOS 7.

I set up a 4-node cluster, ultimately to scale to 12 should this become usable, initially just using 3 nodes to satisfy quorum requirements without playing too much with default values. The idea was to present the storage via NFS, with a cluster VIP and heartbeat. Obviously NFS direct from Gluster is no longer a viable option, so the route I took was NFS-Ganesha. I had no end of trouble using the Ganesha tools with Gluster 3.10, so I ended up doing the whole thing manually, which seems to stand up OK. For this I used Gluster, NFS-Ganesha with VFS connections, Pacemaker and Corosync (still not 100% happy with the config and it needs tweaking, but it does what it needs so far).

The next job was connecting from VMware. I had to ensure that the relevant read/write permissions were set both on the filesystem and in the Ganesha config, but the connection succeeded. From here is where I'm suffering a few issues.

If you migrate a VM into the NFS volume and then away again, it doesn't clear the parent directory; it removes everything else but that. SSHing onto the ESXi server and manually trying to remove it results in two errors:

rm directory     - cannot remove as it is a directory
rm -rf directory - cannot remove as it isn't a directory

The second issue is to do with failover. The VIP fails over extremely fast, but VMware always loses its connection to the volume, and a weird oddity of VMware seems to be that it will not remount an NFS volume, so it stays unavailable.

I have not got over these hurdles as yet, but will persevere, as performance-wise and cost-wise it's brilliant.
Hope this helps, or if anyone has any clues to the above issues I'd be most 
grateful.
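
One thing still on my list to try for the failover problem (just a sketch, all
names and addresses are placeholders) is dropping and re-adding the datastore
from the ESXi shell once the VIP has settled:

  esxcli storage nfs list
  esxcli storage nfs remove -v gluster_ds
  esxcli storage nfs add -H 192.168.10.50 -s /vmstore -v gluster_ds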
Regards
Jon
On Friday, 25 May 2018, 18:33:09 BST,  wrote:

Hi,

Does anyone have glusterfs working as a VMware datastore in production, in a
real-world case? How do you serve the glusterfs cluster? As iSCSI or NFS?

Thanks in advance
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users