Re: [Gluster-users] File Corruption with shards - 100% reproducable

2015-11-13 Thread Lindsay Mathieson
On 13 November 2015 at 20:01, Humble Devassy Chirammal <
humble.deva...@gmail.com> wrote:

> Can you please share which 'cache' option (none, writeback,
> writethrough, etc.) has been set for I/O on this problematic VM? This
> can be fetched either from the process output or from the VM's XML definition.
>

Writeback. I believe I did test it with writethrough, but I will test again.


-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] mixed 3.7 and 3.6 environment

2015-11-13 Thread Niels de Vos
On Thu, Nov 12, 2015 at 05:11:32PM +, David Robinson wrote:
> Is there anyway to force a mount of a 3.6 server using a 3.7.6 FUSE client?
> My production machine is 3.6.6 and my test platform is 3.7.6.  I would like
> to test the 3.7.6 FUSE client but would need for this client to be able to
> mount both a 3.6.6 and a 3.7.6 server.

A 3.7.6 client will try to connect from an unprivileged port (> 1024).
Gluster servers with 3.6.x require connections from privileged ports
(< 1024) by default. The GETSPEC procedure that you see failing in the
log most likely fails because the GlusterD process terminates
connections coming from high port numbers.

It seems we've missed including release notes for the last 3.6.x
updates, but the "Known Issues" section of 3.6.3 still applies. It
contains the steps to allow connecting clients to use unprivileged
ports:

  
https://github.com/gluster/glusterfs/blob/release-3.6/doc/release-notes/3.6.3.md#known-issues

Now, I expect you would like the 3.7.6 client to use privileged ports
instead. Unfortunately I am not sure if that is possible. I gave it a
quick try, but could not find a way to force the connection to GlusterD
to use a privileged port. Connecting to the bricks from a privileged
port is possible, though, when mounting with

  # mount -t glusterfs -o xlator-option=*.client-bind-insecure=no ...

This would then only require the change to /etc/glusterfs/glusterd.vol,
and not the server.allow-insecure option from the release notes.
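
Putting the pieces together, roughly (option names as in those release
notes; please double-check them there):

  # On each 3.6.x server, let glusterd accept clients on unprivileged ports:
  # add the following inside the 'volume management' section of
  # /etc/glusterfs/glusterd.vol, then restart glusterd.
  #   option rpc-auth-allow-insecure on
  systemctl restart glusterd    # or: service glusterd restart

  # Only needed if you do NOT mount with client-bind-insecure=no:
  gluster volume set <volname> server.allow-insecure on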

HTH,
Niels


> 
> When I try to mount the 3.6.6 server using a 3.7.6 client, I get the
> following:
> 
> [root@ff01bkp glusterfs]# cat homegfs.log
> [2015-11-12 16:55:56.860663] I [MSGID: 100030] [glusterfsd.c:2318:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.6
> (args: /usr/sbin/glusterfs --volfile-server=gfsib01a.corvidtec.com
> --volfile-server-transport=tcp --volfile-id=/homegfs.tcp /homegfs)
> [2015-11-12 16:55:56.868032] I [MSGID: 101190]
> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread with
> index 1
> [2015-11-12 16:55:56.871923] W [socket.c:588:__socket_rwv] 0-glusterfs:
> readv on 10.200.70.1:24007 failed (No data available)
> [2015-11-12 16:55:56.872236] E [rpc-clnt.c:362:saved_frames_unwind] (-->
> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f9e5507ea82] (-->
> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f9e54e49a3e] (-->
> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f9e54e49b4e] (-->
> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f9e54e4b4da] (-->
> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f9e54e4bd08] )
> 0-glusterfs: forced unwinding frame type(GlusterFS Handshake) op(GETSPEC(2))
> called at 2015-11-12 16:55:56.871822 (xid=0x1)
> [2015-11-12 16:55:56.872254] E [glusterfsd-mgmt.c:1603:mgmt_getspec_cbk]
> 0-mgmt: failed to fetch volume file (key:/homegfs.tcp)
> [2015-11-12 16:55:56.872283] W [glusterfsd.c:1236:cleanup_and_exit]
> (-->/lib64/libgfrpc.so.0(saved_frames_unwind+0x205) [0x7f9e54e49a65]
> -->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x490) [0x7f9e6450]
> -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f9e06d9] ) 0-:
> received signum (0), shutting down
> [2015-11-12 16:55:56.872299] I [fuse-bridge.c:5683:fini] 0-fuse: Unmounting
> '/homegfs'.
> [2015-11-12 16:55:56.872616] W [glusterfsd.c:1236:cleanup_and_exit]
> (-->/lib64/libpthread.so.0(+0x7df3) [0x7f9e53ee5df3]
> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f9e0855]
> -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f9e06d9] ) 0-:
> received signum (15), shutting down
> David

> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel



___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] File Corruption with shards - 100% reproducable

2015-11-13 Thread Lindsay Mathieson
On 13 November 2015 at 20:01, Humble Devassy Chirammal <
humble.deva...@gmail.com> wrote:

> Can you please share which 'cache' option (none, writeback,
> writethrough, etc.) has been set for I/O on this problematic VM? This
> can be fetched either from the process output or from the VM's XML definition.
>

I tried it with Cache = off and Cache = Sync. Both times the image was
corrupted.


-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] rsync to gluster mount: self-heal and bad performance

2015-11-13 Thread Tiemen Ruiten
Hello Ernie, list,

No, that's not the case. The volume is mounted through glusterfs-fuse - on
the same server running one of the bricks. The fstab:

# /etc/fstab
# Created by anaconda on Tue Aug 18 18:10:49 2015
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'
# See man pages fstab(5), findfs(8), mount(8) and/or blkid(8) for more info
#
UUID=56778fed-bf3f-435e-8c32-edaa8c707f29 /              xfs        defaults     0 0
UUID=a44e32ed-cfbe-4ba0-896f-1efff9397ba1 /boot          xfs        defaults     0 0
UUID=a344d2bc-266d-4905-85b1-fbb7fe927659 swap           swap       defaults     0 0
/dev/vdb1                                 /data/brick    xfs        defaults     1 2
iron2:/lpxassets                          /mnt/lpxassets glusterfs  _netdev,acl  0 0
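
A quick way to quantify the heals while the rsync is running (a sketch;
the volume name is taken from the fstab above, and the statistics
sub-command may not be available on all releases):

  gluster volume heal lpxassets info
  gluster volume heal lpxassets statistics heal-count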




On 12 November 2015 at 22:50, Ernie Dunbar  wrote:

> Hi Tiemen
>
> It sounds like you're trying to rsync files onto your Gluster server,
> rather than to the Gluster filesystem. You want to copy these files into
> the mounted filesystem (typically on some other system than the Gluster
> servers), because Gluster is designed to handle it that way.
>
> I can't remember the nitty gritty details about why this is, but I've made
> this mistake before as well. Hope that helps. :)
>
>
> On 2015-11-12 11:31, Tiemen Ruiten wrote:
>
>> Hello,
>>
>> While rsyncing to a directory mounted through glusterfs fuse,
>> performance is very bad and it appears every synced file generates a
>> (metadata) self-heal.
>>
>> The volume is mounted with option acl and acl's are set on a
>> subdirectory.
>>
>> Setup is as follows:
>>
>> Two CentOS 7 VMs (KVM) with Gluster 3.7.6, and one physical CentOS 6
>> node, also Gluster 3.7.6. The physical node functions as the arbiter,
>> so it's a replica 3 arbiter 1 volume. The bricks are LVM logical
>> volumes with an XFS filesystem.
>>
>> While I don't think I should expect top performance for rsync on
>> Gluster, I wouldn't expect every file synced to trigger a self-heal.
>> Anything I can do to improve this? Should I file a bug?
>>
>> Another thing that looks related, I see a lot of these messages,
>> especially when doing IO:
>>
>> [2015-11-12 19:25:42.185904] I [dict.c:473:dict_get]
>>
>> (-->/usr/lib64/glusterfs/3.7.6/xlator/debug/io-stats.so(io_stats_lookup_cbk+0x121)
>> [0x7fdcc2d31161]
>>
>> -->/usr/lib64/glusterfs/3.7.6/xlator/system/posix-acl.so(posix_acl_lookup_cbk+0x242)
>> [0x7fdcc2b1b212] -->/lib64/libglusterfs.so.0(dict_get+0xac)
>> [0x7fdcd5e770cc] ) 0-dict: !this || key=system.posix_acl_default
>> [Invalid argument]
>>
>> --
>>
>> Tiemen Ruiten
>> Systems Engineer
>> R Media
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://www.gluster.org/mailman/listinfo/gluster-users
>>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>



-- 
Tiemen Ruiten
Systems Engineer
R Media
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] mixed 3.7 and 3.6 environment

2015-11-13 Thread Bipin Kunal
Hi David,

I don't think that is possible or recommended.

A client is only compatible with a server running the same or a newer
version; in other words, the client must not be newer than the server.

Thanks,
Bipin Kunal

On Thu, Nov 12, 2015 at 10:41 PM, David Robinson <
david.robin...@corvidtec.com> wrote:

> Is there anyway to force a mount of a 3.6 server using a 3.7.6 FUSE
> client?
> My production machine is 3.6.6 and my test platform is 3.7.6.  I would
> like to test the 3.7.6 FUSE client but would need for this client to be
> able to mount both a 3.6.6 and a 3.7.6 server.
>
> When I try to mount the 3.6.6 server using a 3.7.6 client, I get the
> following:
>
>
>
>
>
>
>
>
>
> *[root@ff01bkp glusterfs]# cat homegfs.log [2015-11-12 16:55:56.860663] I
> [MSGID: 100030] [glusterfsd.c:2318:main] 0-/usr/sbin/glusterfs: Started
> running /usr/sbin/glusterfs version 3.7.6 (args: /usr/sbin/glusterfs
> --volfile-server=gfsib01a.corvidtec.com 
> --volfile-server-transport=tcp --volfile-id=/homegfs.tcp
> /homegfs)[2015-11-12 16:55:56.868032] I [MSGID: 101190]
> [event-epoll.c:632:event_dispatch_epoll_worker] 0-epoll: Started thread
> with index 1[2015-11-12 16:55:56.871923] W [socket.c:588:__socket_rwv]
> 0-glusterfs: readv on 10.200.70.1:24007  failed
> (No data available)[2015-11-12 16:55:56.872236] E
> [rpc-clnt.c:362:saved_frames_unwind] (-->
> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f9e5507ea82] (-->
> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f9e54e49a3e] (-->
> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f9e54e49b4e] (-->
> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x7a)[0x7f9e54e4b4da] (-->
> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x88)[0x7f9e54e4bd08] )
> 0-glusterfs: forced unwinding frame type(GlusterFS Handshake)
> op(GETSPEC(2)) called at 2015-11-12 16:55:56.871822 (xid=0x1)[2015-11-12
> 16:55:56.872254] E [glusterfsd-mgmt.c:1603:mgmt_getspec_cbk] 0-mgmt: failed
> to fetch volume file (key:/homegfs.tcp)[2015-11-12 16:55:56.872283] W
> [glusterfsd.c:1236:cleanup_and_exit]
> (-->/lib64/libgfrpc.so.0(saved_frames_unwind+0x205) [0x7f9e54e49a65]
> -->/usr/sbin/glusterfs(mgmt_getspec_cbk+0x490) [0x7f9e6450]
> -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f9e06d9] ) 0-:
> received signum (0), shutting down[2015-11-12 16:55:56.872299] I
> [fuse-bridge.c:5683:fini] 0-fuse: Unmounting '/homegfs'.[2015-11-12
> 16:55:56.872616] W [glusterfsd.c:1236:cleanup_and_exit]
> (-->/lib64/libpthread.so.0(+0x7df3) [0x7f9e53ee5df3]
> -->/usr/sbin/glusterfs(glusterfs_sigwaiter+0xe5) [0x7f9e0855]
> -->/usr/sbin/glusterfs(cleanup_and_exit+0x69) [0x7f9e06d9] ) 0-:
> received signum (15), shutting down*
> David
>
>
> ___
> Gluster-devel mailing list
> gluster-de...@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] File Corruption with shards - 100% reproducable

2015-11-13 Thread Humble Devassy Chirammal
Hi Lindsay,

>
- start the vm, open a console to it.

- live migrate the VM to a another node

- It will rapidly barf itself with disk errors

>

Can you please share which 'cache' option (none, writeback,
writethrough, etc.) has been set for I/O on this problematic VM? This
can be fetched either from the process output or from the VM's XML definition.
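
For example (a sketch; the second form assumes the guest is managed
through libvirt, which may not be the case on a Proxmox node):

  # from the running QEMU process:
  ps -ef | grep '[q]emu' | grep -o 'cache=[a-z]*'

  # from the libvirt XML definition, if applicable:
  virsh dumpxml <vm-name> | grep -i 'cache='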

--Humble


On Fri, Nov 13, 2015 at 11:57 AM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:

>
> On 12 November 2015 at 15:46, Krutika Dhananjay 
> wrote:
>
>> OK. What do the client logs say?
>>
>
> Dumb question - Which logs are those?
>
>
> Could you share the exact steps to recreate this, and I will try it
>> locally on my setup?
>>
>
> I'm running this on a 3 node proxmox cluster, which makes the vm creation
> & migration easy to test.
>
> Steps:
> - Create 3 node gluster datastore using proxmox vm host nodes
>
> - Add gluster datastore as a storage device to proxmox
>   * qemu vms use the gfapi to access the datastore
>   * proxmox also adds a fuse mount for easy access
>
> - create a VM on the gluster storage, QCOW2 format. I just created a
> simple Debian MATE VM
>
> - start the vm, open a console to it.
>
> - live migrate the VM to a another node
>
> - It will rapidly barf itself with disk errors
>
> - stop the VM
>
> - qemu will show file corruption (many many errors)
>   * qemu-img check <image file>
>   * qemu-img info <image file>
>
>
> Repeating the process with sharding off has no errors.
>
>
>
>>
>> Also, want to see the output of 'gluster volume info'.
>>
>
>
> I've trimmed settings down to a bare minimum. This is a test gluster
> cluster so I can do with it as I wish.
>
>
>
> gluster volume info
>
> Volume Name: datastore1
> Type: Replicate
> Volume ID: 238fddd0-a88c-4edb-8ac5-ef87c58682bf
> Status: Started
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: vnb.proxmox.softlog:/mnt/ext4
> Brick2: vng.proxmox.softlog:/mnt/ext4
> Brick3: vna.proxmox.softlog:/mnt/ext4
> Options Reconfigured:
> performance.strict-write-ordering: on
> performance.readdir-ahead: off
> cluster.quorum-type: auto
> features.shard: on
>
>
>
> --
> Lindsay
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] yum install glusterfs-server install failed - dependency issue

2015-11-13 Thread Rao, Uthra R. (GSFC-672.0)[ADNET SYSTEMS INC]
On my RHEL7.1 system I have installed the following packages from the 
glusterfs-epel.repo:

# rpm -qa | grep gluster
glusterfs-libs-3.6.0.29-2.el7.x86_64
glusterfs-fuse-3.6.0.29-2.el7.x86_64
glusterfs-3.6.0.29-2.el7.x86_64
glusterfs-api-3.6.0.29-2.el7.x86_64

Now when I try to install the glusterfs-server package I am getting the 
following dependency error:


# yum -y install glusterfs-server
Loaded plugins: langpacks, product-id, protectbase, subscription-manager
46 packages excluded due to repository protections
Resolving Dependencies
--> Running transaction check
---> Package glusterfs-server.x86_64 0:3.7.6-1.el7 will be installed
--> Processing Dependency: glusterfs-libs = 3.7.6-1.el7 for package:
--> glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: glusterfs-fuse = 3.7.6-1.el7 for package:
--> glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: glusterfs-client-xlators = 3.7.6-1.el7 for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: glusterfs-cli = 3.7.6-1.el7 for package:
--> glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: glusterfs = 3.7.6-1.el7 for package:
--> glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: libgfapi.so.0(GFAPI_PRIVATE_3.7.0)(64bit) for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: libgfapi.so.0(GFAPI_PRIVATE_3.4.0)(64bit) for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: libgfapi.so.0(GFAPI_3.7.4)(64bit) for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: libgfapi.so.0(GFAPI_3.7.0)(64bit) for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: libgfapi.so.0(GFAPI_3.6.0)(64bit) for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: libgfapi.so.0(GFAPI_3.5.1)(64bit) for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: libgfapi.so.0(GFAPI_3.4.2)(64bit) for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: libgfapi.so.0(GFAPI_3.4.0)(64bit) for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: liburcu-cds.so.1()(64bit) for package:
--> glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: liburcu-bp.so.1()(64bit) for package:
--> glusterfs-server-3.7.6-1.el7.x86_64
--> Running transaction check
---> Package glusterfs-client-xlators.x86_64 0:3.7.6-1.el7 will be installed
---> Package glusterfs-server.x86_64 0:3.7.6-1.el7 will be installed
--> Processing Dependency: glusterfs-libs = 3.7.6-1.el7 for package:
--> glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: glusterfs-fuse = 3.7.6-1.el7 for package:
--> glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: glusterfs-cli = 3.7.6-1.el7 for package:
--> glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: glusterfs = 3.7.6-1.el7 for package:
--> glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: libgfapi.so.0(GFAPI_PRIVATE_3.7.0)(64bit) for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: libgfapi.so.0(GFAPI_PRIVATE_3.4.0)(64bit) for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: libgfapi.so.0(GFAPI_3.7.4)(64bit) for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: libgfapi.so.0(GFAPI_3.7.0)(64bit) for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: libgfapi.so.0(GFAPI_3.6.0)(64bit) for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: libgfapi.so.0(GFAPI_3.5.1)(64bit) for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: libgfapi.so.0(GFAPI_3.4.2)(64bit) for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
--> Processing Dependency: libgfapi.so.0(GFAPI_3.4.0)(64bit) for
--> package: glusterfs-server-3.7.6-1.el7.x86_64
---> Package userspace-rcu.x86_64 0:0.7.9-1.el7 will be installed
--> Finished Dependency Resolution
Error: Package: glusterfs-server-3.7.6-1.el7.x86_64 (glusterfs-epel)
   Requires: libgfapi.so.0(GFAPI_PRIVATE_3.4.0)(64bit)
Error: Package: glusterfs-server-3.7.6-1.el7.x86_64 (glusterfs-epel)
   Requires: libgfapi.so.0(GFAPI_3.5.1)(64bit)
Error: Package: glusterfs-server-3.7.6-1.el7.x86_64 (glusterfs-epel)
   Requires: libgfapi.so.0(GFAPI_3.7.0)(64bit)
Error: Package: glusterfs-server-3.7.6-1.el7.x86_64 (glusterfs-epel)
   Requires: glusterfs-libs = 3.7.6-1.el7
   Installed: glusterfs-libs-3.6.0.29-2.el7.x86_64 (@rhel-7-server-rpms)
   glusterfs-libs = 3.6.0.29-2.el7
   Available: glusterfs-libs-3.4.0.59rhs-1.el7.x86_64 (rhel-7-server-rpms)
   glusterfs-libs = 3.4.0.59rhs-1.el7
Error: Package: glusterfs-server-3.7.6-1.el7.x86_64 (glusterfs-epel)
   Requires: libgfapi.so.0(GFAPI_3.4.2)(64bit)
Error: Package: glusterfs-server-3.7.6-1.el7.x86_64 (glusterfs-epel)
   
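
The 'protectbase' plugin and the '46 packages excluded due to repository
protections' line above suggest that yum is refusing to replace the
installed 3.6.0.29 packages from the protected rhel-7-server-rpms repo
with the 3.7.6 builds from glusterfs-epel, which is why the versioned
dependencies of glusterfs-server cannot be satisfied. One possible
workaround, as a sketch only (it bypasses the base-repo protection for
this transaction, so verify that is what you want first):

  yum --disableplugin=protectbase install glusterfs-server
  # yum should then also pull glusterfs, glusterfs-libs, glusterfs-fuse,
  # glusterfs-api, glusterfs-cli and glusterfs-client-xlators up to
  # 3.7.6-1.el7 from glusterfs-epel as dependencies.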

Re: [Gluster-users] 'No data available' at clients, brick xattr ops errors on small I/O -- XFS stripe issue or repeat bug?

2015-11-13 Thread LaGarde, Owen M ERDC-RDE-ITL-MS Contractor
Looks like the errors occur only when the gfid-to-path translation [volume 
option] is on.  Is anyone else seeing this?  Anyone using 3.6.6-1 with 
XFS-formatted bricks?



From: LaGarde, Owen M ERDC-RDE-ITL-MS Contractor
Sent: Tuesday, November 10, 2015 4:24 PM
To: gluster-users@gluster.org; LaGarde, Owen M ERDC-RDE-ITL-MS Contractor
Subject: RE: 'No data available' at clients, brick xattr ops errors on small 
I/O -- XFS stripe issue or repeat bug?

Update:  I've tried a second cluster with AFAIK identical backing storage 
configuration from the LUNs up, identical gluster/xfsprogs/kernel on the 
servers, identical volume setup, and identical kernel/gluster on the clients.  
The reproducer does not fail on the new system.  So far I can't find any delta 
between the two clusters' setup other than the brick count (28 bricks across 8 
servers on the failing one, 14 bricks across 4 servers on the new one).
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] 'No data available' at clients, brick xattr ops errors on small I/O -- XFS stripe issue or repeat bug?

2015-11-13 Thread LaGarde, Owen M ERDC-RDE-ITL-MS Contractor
I've now tried the same reproducer scenario against EXT2-, EXT3-, EXT4-, and
XFS-formatted bricks.  There's no change in behavior; the discriminating
detail is still only whether the build-pgfid volume option is on.  Number of
bricks, distribution over servers, transport protocol, etc., can all be
changed over a wide range without affecting the scope or nature of the failure.

Is there anyone using build-pgfid=on, doing any fine-grained small-file I/O
(such as building a sizable project from source), and *not* getting xattr
errors in the brick logs / undeletable files due to incomplete xattr ops?
Anyone?  Anyone?  Bueller?  Bueller?
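
For anyone who wants to compare notes, a sketch of how to check and flip the
option in question (the volume name is a placeholder; the full option key
should be storage.build-pgfid, but verify with 'gluster volume set help'):

  gluster volume info <volname>                        # reconfigured options show up here
  gluster volume set <volname> storage.build-pgfid off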



From: LaGarde, Owen M ERDC-RDE-ITL-MS Contractor
Sent: Friday, November 13, 2015 4:36 PM
To: gluster-users@gluster.org; LaGarde, Owen M ERDC-RDE-ITL-MS Contractor
Subject: RE: 'No data available' at clients, brick xattr ops errors on small 
I/O -- XFS stripe issue or repeat bug?

Looks like the errors occur only when the gfid-to-path translation [volume 
option] is on.  Is anyone else seeing this?  Anyone using 3.6.6-1 with 
XFS-formatted bricks?



From: LaGarde, Owen M ERDC-RDE-ITL-MS Contractor
Sent: Tuesday, November 10, 2015 4:24 PM
To: gluster-users@gluster.org; LaGarde, Owen M ERDC-RDE-ITL-MS Contractor
Subject: RE: 'No data available' at clients, brick xattr ops errors on small 
I/O -- XFS stripe issue or repeat bug?

Update:  I've tried a second cluster with AFAIK identical backing storage 
configuration from the LUNs up, identical gluster/xfsprogs/kernel on the 
servers, identical volume setup, and identical kernel/gluster on the clients.  
The reproducer does not fail on the new system.  So far I can't find any delta 
between the two clusters' setup other than the brick count (28 bricks across 8 
servers on the failing one, 14 bricks across 4 servers on the new one).
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] File Corruption with shards - 100% reproducable

2015-11-13 Thread Lindsay Mathieson
gluster volume set datastore1 group virt
Unable to open file '/var/lib/glusterd/groups/virt'. Error: No such file or 
directory

Not sure I understand this one – couldn’t find any docs for it.

Sent from Mail for Windows 10



From: Krutika Dhananjay
Sent: Saturday, 14 November 2015 1:45 PM
To: Lindsay Mathieson
Cc: gluster-users
Subject: Re: [Gluster-users] File Corruption with shards - 100% reproducable


The logs are at /var/log/glusterfs/<mount-point>.log

OK. So what do you observe when you set group virt to on?

# gluster volume set <volname> group virt

-Krutika


From: "Lindsay Mathieson" 
To: "Krutika Dhananjay" 
Cc: "gluster-users" 
Sent: Friday, November 13, 2015 11:57:15 AM
Subject: Re: [Gluster-users] File Corruption with shards - 100% reproducable


On 12 November 2015 at 15:46, Krutika Dhananjay  wrote:
OK. What do the client logs say?

Dumb question - Which logs are those?  


Could you share the exact steps to recreate this, and I will try it locally on 
my setup?

I'm running this on a 3 node proxmox cluster, which makes the vm creation & 
migration easy to test.

Steps:
- Create 3 node gluster datastore using proxmox vm host nodes

- Add gluster datastore as a storage device to proxmox
  * qemu vms use the gfapi to access the datastore
  * proxmox also adds a fuse mount for easy access

- create a VM on the gluster storage, QCOW2 format. I just created a simple
Debian MATE VM

- start the vm, open a console to it.

- live migrate the VM to a another node

- It will rapidly barf itself with disk errors

- stop the VM

- qemu will show file corruption (many many errors)
  * qemu-img check <image file>
  * qemu-img info <image file>


Repeating the process with sharding off has no errors.

 

Also, want to see the output of 'gluster volume info'.


I've trimmed settings down to a bare minimum. This is a test gluster cluster so 
I can do with it as I wish.



gluster volume info
 
Volume Name: datastore1
Type: Replicate
Volume ID: 238fddd0-a88c-4edb-8ac5-ef87c58682bf
Status: Started
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: vnb.proxmox.softlog:/mnt/ext4
Brick2: vng.proxmox.softlog:/mnt/ext4
Brick3: vna.proxmox.softlog:/mnt/ext4
Options Reconfigured:
performance.strict-write-ordering: on
performance.readdir-ahead: off
cluster.quorum-type: auto
features.shard: on



-- 
Lindsay



___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] File Corruption with shards - 100% reproducable

2015-11-13 Thread Krutika Dhananjay
You should be able to find a file named group-virt.example under
/etc/glusterfs/. Copy it to /var/lib/glusterd/groups/virt.

Then execute `gluster volume set datastore1 group virt`. 
Now with this configuration, could you try your test case and let me know 
whether the file corruption still exists? 
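
In concrete terms, a sketch of those steps on the node where the gluster
command is run (paths taken from the messages above):

  cp /etc/glusterfs/group-virt.example /var/lib/glusterd/groups/virt
  gluster volume set datastore1 group virt
  gluster volume info datastore1   # the virt group options should now show
                                   # under 'Options Reconfigured'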

-Krutika 

- Original Message -

> From: "Lindsay Mathieson" 
> To: "Krutika Dhananjay" 
> Cc: "gluster-users" 
> Sent: Saturday, November 14, 2015 10:51:26 AM
> Subject: RE: [Gluster-users] File Corruption with shards - 100% reproducable

> gluster volume set datastore1 group virt

> Unable to open file '/var/lib/glusterd/groups/virt'. Error: No such file or
> directory

> Not sure I understand this one – couldn’t find any docs for it.

> Sent from Mail for Windows 10

> From: Krutika Dhananjay
> Sent: Saturday, 14 November 2015 1:45 PM
> To: Lindsay Mathieson
> Cc: gluster-users
> Subject: Re: [Gluster-users] File Corruption with shards - 100% reproducable

> The logs are at /var/log/glusterfs/<mount-point>.log

> OK. So what do you observe when you set group virt to on?

> # gluster volume set <volname> group virt

> -Krutika

> > From: "Lindsay Mathieson" 
> 
> > To: "Krutika Dhananjay" 
> 
> > Cc: "gluster-users" 
> 
> > Sent: Friday, November 13, 2015 11:57:15 AM
> 
> > Subject: Re: [Gluster-users] File Corruption with shards - 100%
> > reproducable
> 

> > On 12 November 2015 at 15:46, Krutika Dhananjay < kdhan...@redhat.com >
> > wrote:
> 
> > > OK. What do the client logs say?
> > 
> 

> > Dumb question - Which logs are those?
> 

> > > Could you share the exact steps to recreate this, and I will try it
> > > locally
> > > on my setup?
> > 
> 

> > I'm running this on a 3 node proxmox cluster, which makes the vm creation &
> > migration easy to test.
> 

> > Steps:
> 

> > - Create 3 node gluster datastore using proxmox vm host nodes
> 

> > - Add gluster datastore as a storage device to proxmox
> 

> > * qemu vms use the gfapi to access the datastore
> 

> > * proxmox also adds a fuse mount for easy access
> 

> > - create a VM on the gluster storage, QCOW2 format. I just created a simple
> > Debian MATE VM
> 

> > - start the vm, open a console to it.
> 

> > - live migrate the VM to a another node
> 

> > - It will rapidly barf itself with disk errors
> 

> > - stop the VM
> 

> > - qemu will show file corruption (many many errors)
> 

> > * qemu-img check <image file>
> 
> > * qemu-img info <image file>
> 

> > Repeating the process with sharding off has no errors.
> 

> > > Also, want to see the output of 'gluster volume info'.
> > 
> 

> > I've trimmed settings down to a bare minimum. This is a test gluster
> > cluster
> > so I can do with it as I wish.
> 

> > gluster volume info
> 

> > Volume Name: datastore1
> 
> > Type: Replicate
> 
> > Volume ID: 238fddd0-a88c-4edb-8ac5-ef87c58682bf
> 
> > Status: Started
> 
> > Number of Bricks: 1 x 3 = 3
> 
> > Transport-type: tcp
> 
> > Bricks:
> 
> > Brick1: vnb.proxmox.softlog:/mnt/ext4
> 
> > Brick2: vng.proxmox.softlog:/mnt/ext4
> 
> > Brick3: vna.proxmox.softlog:/mnt/ext4
> 
> > Options Reconfigured:
> 
> > performance.strict-write-ordering: on
> 
> > performance.readdir-ahead: off
> 
> > cluster.quorum-type: auto
> 
> > features.shard: on
> 

> > --
> 

> > Lindsay
> 
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] File Corruption with shards - 100% reproducable

2015-11-13 Thread Humble Devassy Chirammal
If possible, can you please check the result with 'cache=none'?

--Humble


On Fri, Nov 13, 2015 at 3:51 PM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:

>
> On 13 November 2015 at 20:01, Humble Devassy Chirammal <
> humble.deva...@gmail.com> wrote:
>
>> Can you please share which 'cache' option (none, writeback,
>> writethrough, etc.) has been set for I/O on this problematic VM? This
>> can be fetched either from the process output or from the VM's XML definition.
>>
>
> Writeback. I believe I did test it with writethrough, but I will test
> again.
>
>
> --
> Lindsay
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] File Corruption with shards - 100% reproducable

2015-11-13 Thread Lindsay Mathieson
The command used to launch the VM:

/usr/bin/kvm -id 910 -chardev
socket,id=qmp,path=/var/run/qemu-server/910.qmp,server,nowait -mon
chardev=qmp,mode=control -vnc
unix:/var/run/qemu-server/910.vnc,x509,password -pidfile
/var/run/qemu-server/910.pid -daemonize -smbios
type=1,uuid=f415789d-d92c-44ef-9bfc-44c448eff562 -name gluster-test -smp
2,sockets=1,cores=2,maxcpus=2 -nodefaults -boot
menu=on,strict=on,reboot-timeout=1000 -vga qxl -cpu
kvm64,+lahf_lm,+sep,+kvm_pv_unhalt,+kvm_pv_eoi,-kvm_steal_time,enforce -m
2048 -k en-us -device pci-bridge,id=pci.1,chassis_nr=1,bus=pci.0,addr=0x1e
-device pci-bridge,id=pci.2,chassis_nr=2,bus=pci.0,addr=0x1f -device
piix3-usb-uhci,id=uhci,bus=pci.0,addr=0x1.0x2 -spice
tls-port=61006,addr=localhost,tls-ciphers=DES-CBC3-SHA,seamless-migration=on
-device virtio-serial,id=spice,bus=pci.0,addr=0x9 -chardev
spicevmc,id=vdagent,name=vdagent -device
virtserialport,chardev=vdagent,name=com.redhat.spice.0 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 -iscsi
initiator-name=iqn.1993-08.org.debian:01:cb8a28cc6f1e -drive
if=none,id=drive-ide2,media=cdrom,aio=threads -device
ide-cd,bus=ide.1,unit=0,drive=drive-ide2,id=ide2,bootindex=200 -drive
file=gluster://vnb.proxmox.softlog/datastore1/images/910/vm-910-disk-1.qcow2,if=none,id=drive-virtio0,cache=none,format=qcow2,aio=native,detect-zeroes=on
-device
virtio-blk-pci,drive=drive-virtio0,id=virtio0,bus=pci.0,addr=0xa,bootindex=100
-netdev
type=tap,id=net0,ifname=tap910i0,script=/var/lib/qemu-server/pve-bridge,downscript=/var/lib/qemu-server/pve-bridgedown,vhost=on
-device
virtio-net-pci,mac=66:39:35:35:34:65,netdev=net0,bus=pci.0,addr=0x12,id=net0,bootindex=300
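
For completeness, a sketch of the corruption check mentioned earlier in the
thread, run against this disk after a failed migration. The FUSE path is an
assumption based on Proxmox's usual /mnt/pve/<storage> layout; adjust it to
wherever the gluster volume is actually mounted:

  qemu-img info  /mnt/pve/datastore1/images/910/vm-910-disk-1.qcow2
  qemu-img check /mnt/pve/datastore1/images/910/vm-910-disk-1.qcow2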

On 13 November 2015 at 21:02, Lindsay Mathieson  wrote:

>
> On 13 November 2015 at 20:41, Humble Devassy Chirammal <
> humble.deva...@gmail.com> wrote:
>
>> If possible, can you please check the result with 'cache=none' ?
>
>
> Corrupted with that too I'm afraid.
>
>
> --
> Lindsay
>



-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] File Corruption with shards - 100% reproducable

2015-11-13 Thread Lindsay Mathieson
On 13 November 2015 at 20:41, Humble Devassy Chirammal <
humble.deva...@gmail.com> wrote:

> If possible, can you please check the result with 'cache=none' ?


Corrupted with that too I'm afraid.


-- 
Lindsay
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users