Re: [Gluster-users] [ovirt-users] Re: [ANN] oVirt 4.3.7 Third Release Candidate is now available for testing

2019-12-01 Thread Krutika Dhananjay
Sorry about the late response.

I looked at the logs. These errors are originating from posix-acl
translator -



[2019-11-17 07:55:47.090065] E [MSGID: 115050]
[server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 162496:
LOOKUP /.shard/5985adcb-0f4d-4317-8a26-1652973a2350.6
(be318638-e8a0-4c6d-977d-7a937aa84806/5985adcb-0f4d-4317-8a26-1652973a2350.6),
client:
CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
error-xlator: data_fast-access-control [Permission denied]

[2019-11-17 07:55:47.090174] I [MSGID: 139001]
[posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control:
client:
CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
gfid: be318638-e8a0-4c6d-977d-7a937aa84806,
req(uid:36,gid:36,perm:1,ngrps:3),
ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-)
[Permission denied]

[2019-11-17 07:55:47.090209] E [MSGID: 115050]
[server-rpc-fops_v2.c:158:server4_lookup_cbk] 0-data_fast-server: 162497:
LOOKUP /.shard/5985adcb-0f4d-4317-8a26-1652973a2350.7
(be318638-e8a0-4c6d-977d-7a937aa84806/5985adcb-0f4d-4317-8a26-1652973a2350.7),
client:
CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
error-xlator: data_fast-access-control [Permission denied]

[2019-11-17 07:55:47.090299] I [MSGID: 139001]
[posix-acl.c:263:posix_acl_log_permit_denied] 0-data_fast-access-control:
client:
CTX_ID:8bff2d95-4629-45cb-a7bf-2412e48896bc-GRAPH_ID:0-PID:13394-HOST:ovirt1.localdomain-PC_NAME:data_fast-client-0-RECON_NO:-0,
gfid: be318638-e8a0-4c6d-977d-7a937aa84806,
req(uid:36,gid:36,perm:1,ngrps:3),
ctx(uid:0,gid:0,in-groups:0,perm:000,updated-fop:INVALID, acl:-)
[Permission denied]
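(As a quick cross-check of the permission denial above: the ownership and mode of the affected shard can be inspected directly on one of the brick backends. The brick path below is a placeholder and needs to be adjusted to the actual brick root:

# stat /path/to/brick/.shard/5985adcb-0f4d-4317-8a26-1652973a2350.6
# getfacl /path/to/brick/.shard/5985adcb-0f4d-4317-8a26-1652973a2350.6

If the shard on the brick really is owned by root with mode 000, that would match the denial; if it is owned by vdsm:kvm (36:36), then the ctx(uid:0,gid:0,...,perm:000) above more likely points at a stale or unpopulated ACL context on the server side.)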

Jiffin/Raghavendra Talur,
Can you help?

-Krutika

On Wed, Nov 27, 2019 at 2:11 PM Strahil Nikolov 
wrote:

> Hi Nir,All,
>
> it seems that 4.3.7 RC3 (and even RC4) are not the problem here (attached
> screenshot of oVirt running on v7 gluster).
> It seems strange that both of my serious issues with oVirt are related to a
> gluster issue (first the gluster v3 to v5 migration and now this one).
>
> I have just updated to gluster v7.0 (Centos 7 repos), and rebooted all
> nodes.
> Now both the Engine and all my VMs are back online - so if you hit issues with
> 6.6, you should give 7.0 a try (and 7.1 is coming soon) before
> deciding to wipe everything.
>
> @Krutika,
>
> I guess you will ask for the logs, so let's switch to gluster-users for
> this one?
>
> Best Regards,
> Strahil Nikolov
>
> On Monday, November 25, 2019 at 16:45:48 GMT-5, Strahil Nikolov <
> hunter86...@yahoo.com> wrote:
>
>
> Hi Krutika,
>
> I have enabled TRACE log level for the volume data_fast,
> but the issue is still not clear:
> FUSE reports:
>
> [2019-11-25 21:31:53.478130] I [MSGID: 133022]
> [shard.c:3674:shard_delete_shards] 0-data_fast-shard: Deleted shards of
> gfid=6d9ed2e5-d4f2-4749-839b-2f13a68ed472 from backend
> [2019-11-25 21:32:43.564694] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-0:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (----) [Permission denied]
> [2019-11-25 21:32:43.565653] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-1:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (----) [Permission denied]
> [2019-11-25 21:32:43.565689] W [MSGID: 114031]
> [client-rpc-fops_v2.c:2634:client4_0_lookup_cbk] 0-data_fast-client-2:
> remote operation failed. Path:
> /.shard/b0af2b81-22cf-482e-9b2f-c431b6449dae.79
> (----) [Permission denied]
> [2019-11-25 21:32:43.565770] E [MSGID: 133010]
> [shard.c:2327:shard_common_lookup_shards_cbk] 0-data_fast-shard: Lookup on
> shard 79 failed. Base file gfid = b0af2b81-22cf-482e-9b2f-c431b6449dae
> [Permission denied]
> [2019-11-25 21:32:43.565858] W [fuse-bridge.c:2830:fuse_readv_cbk]
> 0-glusterfs-fuse: 279: READ => -1 gfid=b0af2b81-22cf-482e-9b2f-c431b6449dae
> fd=0x7fbf40005ea8 (Permission denied)
>
>
> While the BRICK logs on ovirt1/gluster1 report:
> [2019-11-25 21:32:43.564177] D [MSGID: 0] [io-threads.c:376:iot_schedule]
> 0-data_fast-io-threads: LOOKUP scheduled as fast priority fop
> [2019-11-25 21:32:43.564194] T [MSGID: 0]
> [defaults.c:2008:default_lookup_resume] 0-stack-trace: stack-address:
> 0x7fc02c00bbf8, winding from data_fast-io-threads to data_fast-upcall
> [2019-11-25 21:32:43.564206] T [MSGID: 0] [upcall.c:790:up_lookup]
> 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-upcall
> to data_fast-leases
> [2019-11-25 21:32:43.564215] T [MSGID: 0] [defaults.c:2766:default_lookup]
> 0-stack-trace: stack-address: 0x7fc02c00bbf8, winding from data_fast-leases
> to 

Re: [Gluster-users] Reg Performance issue in GlusterFS

2019-10-07 Thread Krutika Dhananjay
Hi Pratik,

What's the version of gluster you're running?

Also, what would help is the volume profile output. Here's what you should
do to capture it:
# gluster volume profile <volname> start
# run_your_workload
# gluster volume profile <volname> info > brick-profile.out

And attach brick-profile.out here.

-Krutika

On Mon, Oct 7, 2019 at 11:09 AM Satheesaran Sundaramoorthi <
sasun...@redhat.com> wrote:

> On Mon, Oct 7, 2019 at 1:24 AM Soumya Koduri  wrote:
>
>> Hi Pratik,
>>
>> Offhand I do not see any issue with the configuration. But I think for a
>> VM image store, using gfapi may give better performance compared
>> to fuse. CC'ing Krutika and Gobinda, who have been working on this
>> use-case and may be able to guide you.
>>
>> Thanks,
>> Soumya
>>
>> On 10/5/19 11:25 AM, Pratik Chandrakar wrote:
>> > Hello Soumya,
>> >
>> > This is Pratik from India. I am writing this mail because I am facing a
>> performance issue in my cluster and have searched the net a lot for tuning
>> advice without success. It would be great if you could suggest whether I should
>> stick with GlusterFS or move to another technology. Currently I am using
>> GlusterFS with fuse on CentOS for storing Virtual Machine images in a
>> CloudStack setup. The majority of the workload is SQL Server & MariaDB database
>> servers, and some web servers. The issue is slow booting and slow UI
>> response of the VMs, and also a lot of timeouts in the SQL Server database even on
>> small databases. I have a dedicated 10G network for storage in my setup.
>> >
>> > Request you to please guide me on whether I have misconfigured the
>> cluster or need to change the storage layer.
>> >
>> > Below is the configuration for your reference...
>> >
>> > Volume Name: vmstore5152-v2
>> > Type: Replicate
>> > Volume ID: aa27a2cb-c0f5-41b9-a50f-fdce4d4d8358
>> > Status: Started
>> > Snapshot Count: 0
>> > Number of Bricks: 1 x (2 + 1) = 3
>> > Transport-type: tcp
>> > Bricks:
>> > Brick1: storagenode51:/datav2/brick51-v2/brick
>> > Brick2: storagenode52:/datav2/brick52-v2/brick
>> > Brick3: indphyserver2:/arbitator/arbrick5152-v2/brick (arbiter)
>> > Options Reconfigured:
>> > cluster.choose-local: off
>> > user.cifs: off
>> > features.shard: on
>> > cluster.shd-wait-qlength: 1
>> > cluster.shd-max-threads: 8
>> > cluster.locking-scheme: granular
>> > cluster.data-self-heal-algorithm: full
>> > cluster.server-quorum-type: server
>> > cluster.quorum-type: auto
>> > cluster.eager-lock: enable
>> > network.remote-dio: enable
>>
>
> Hi Krutika,
>
> Do you think turning off remote-dio and enabling strict-o-direct will
> improve performance?
>
> @Sahina, @Gobinda, are you aware of performance optimization for the
> DB workload in the VMs ?
>
> -- Satheesaran
>
>> > performance.low-prio-threads: 32
>> > performance.io-cache: off
>> > performance.read-ahead: off
>> > performance.quick-read: off
>> > storage.owner-gid: 107
>> > storage.owner-uid: 107
>> > cluster.lookup-optimize: on
>> > client.event-threads: 4
>> > transport.address-family: inet
>> > nfs.disable: on
>> > performance.client-io-threads: on
>> >
>> >
>> > --
>> > प्रतीक चंद्राकर | Pratik Chandrakar
>> > वैज्ञानिक - सी | Scientist-C
>> > एन.आई.सी - छत्तीसगढ़ राज्य केंद्र | NIC - Chhattisgarh State Centre
>> > हॉल क्र. एडी2-14 , मंत्रालय | Hall no.-AD2-14, Mantralaya
>> > महानदी भवन | Mahanadi Bhavan
>> > नवा रायपुर अटल नगर | Nava Raipur Atal Nagar
>> >
>> >
>> >
>


Community Meeting Calendar:

APAC Schedule -
Every 2nd and 4th Tuesday at 11:30 AM IST
Bridge: https://bluejeans.com/118564314

NA/EMEA Schedule -
Every 1st and 3rd Tuesday at 01:00 PM EDT
Bridge: https://bluejeans.com/118564314

Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users


Re: [Gluster-users] General questions

2019-06-21 Thread Krutika Dhananjay
Adding (back) gluster-users.

-Krutika

On Fri, Jun 21, 2019 at 1:09 PM Krutika Dhananjay 
wrote:

>
>
> On Fri, Jun 21, 2019 at 12:43 PM Cristian Del Carlo <
> cristian.delca...@targetsolutions.it> wrote:
>
>> Thanks Strahil,
>>
>> in this link
>> https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.4/html/administration_guide/sect-creating_replicated_volumes
>> i see:
>>
>>
>> *Sharding has one supported use case: in the context of providing Red Hat
>> Gluster Storage as a storage domain for Red Hat Enterprise Virtualization,
>> to provide storage for live virtual machine images. Note that sharding is
>> also a requirement for this use case, as it provides significant
>> performance improvements over previous implementations. *
>>
>> The default setting in GlusterFS 6.1 appears to be:
>>
>> features.shard-block-size   64MB
>>
>> features.shard-lru-limit   16384
>>
>> features.shard-deletion-rate   100
>>
>
> That's right. Based on the tests we'd conducted internally, we'd found
> 64MB to be a good number both in terms of self-heal and IO performance. 4MB
> is a little on the lower side in that sense. The benefits of some features
> like eager-locking are lost if the shard size is too small. You can perhaps
> run some tests with 64MB shard-block-size to begin with, and tune it if it
> doesn't fit your needs.
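(As a small illustration of the tuning above, and assuming <volname> stands for the volume name, the current value can be checked and changed with:

# gluster volume get <volname> features.shard-block-size
# gluster volume set <volname> features.shard-block-size 64MB

Note that shard-block-size only applies to files created after the change; existing images keep the shard size they were written with.)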
>
> -Krutika
>
>
>> Bricks in my case are on an xfs filesystem. I'll try different
>> block sizes, but if I understand correctly, small block sizes are preferable
>> to big block sizes, and if I have doubts I will use 4M.
>>
>> Many thanks for the warning, message received! :-)
>>
>> Best Regards,
>>
>> Cristian
>>
>>
>> On Thu, Jun 20, 2019 at 22:13 Strahil Nikolov <
>> hunter86...@yahoo.com> wrote:
>>
>>> Sharding is complex. It helps to heal faster - as only the shards that
>>> got changed will be replicated - but imagine a 1GB shard that got only 512k
>>> updated: in such a case you will copy the whole shard to the other replicas.
>>> RHV & oVirt use a default shard size of 4M, which is the exact size of
>>> the default PE in LVM.
>>>
>>> On the other side, it speeds things up, as gluster can balance the shards
>>> properly across the replicas and thus you can evenly distribute the load on the
>>> cluster.
>>> It is not a coincidence that RHV and oVirt use sharding by default.
>>>
>>> Just a warning.
>>> NEVER, EVER, DISABLE SHARDING!!! ONCE ENABLED - STAYS ENABLED!
>>> Don't ask how I learnt that :)
>>>
>>> Best Regards,
>>> Strahil Nikolov
>>>
>>>
>>>
>>> On Thursday, June 20, 2019 at 18:32:00 GMT+3, Cristian Del Carlo <
>>> cristian.delca...@targetsolutions.it> wrote:
>>>
>>>
>>> Hi,
>>>
>>> thanks for your help.
>>>
>>> I am planing to use libvirtd with plain KVM.
>>>
>>> Ok i will use libgfapi.
>>>
>>> I'm confused about the use of sharding: is it useful in this
>>> configuration? Doesn't sharding help limit the bandwidth in the event of a
>>> rebalancing?
>>>
>>> So in the VM settings I need to use directsync to avoid corruption.
>>>
>>> Thanks again,
>>>
>>> On Thu, Jun 20, 2019 at 12:25 Strahil 
>>> wrote:
>>>
>>> Hi,
>>>
>>> Are you planing to use oVirt or plain KVM or openstack?
>>>
>>> I would recommend you to use gluster v6.1 as it is the latest stable
>>> version and will have longer support than the older versions.
>>>
>>> Fuse vs libgfapi - use the latter as it has better performance and less
>>> overhead on the host. oVirt supports both libgfapi and fuse.
>>>
>>> Also, use replica 3 because you will have better read performance
>>> compared to replica 2 arbiter 1.
>>>
>>> Sharding is a tradeoff between CPU (when there is no sharding, gluster
>>> shd must calculate the offset within the VM disk) and bandwidth (the whole shard
>>> is replicated even if only 512k needs to be synced).
>>>
>>> If you will do live migration -  you do not want to cache in order to
>>> avoid  corruption.
>>> Thus oVirt is using direct I/O.
>>> Still, you can check the gluster settings mentioned in Red Hat
>>> documentation for Virt/openStack .

Re: [Gluster-users] VMs blocked for more than 120 seconds

2019-05-21 Thread Krutika Dhananjay
Hi Martin,

Glad it worked! And yes, 3.7.6 is really old! :)

So the issue is occurring when the vm flushes outstanding data to disk. And
this is taking > 120s because there are a lot of buffered writes to flush,
possibly followed by an fsync too, which needs to sync them to disk (the
volume profile would have been helpful in confirming this). All these two
options do is truly honor the O_DIRECT flag (which is what we want anyway,
given the vms are opened with the 'cache=none' qemu option).
This will skip write-caching on the gluster client side and also bypass the
page-cache on the gluster bricks, so data gets flushed faster, thereby
eliminating these timeouts.
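(To confirm the two options are active on a volume, the current values can be checked with the following; <volname> is a placeholder:

# gluster volume get <volname> network.remote-dio
# gluster volume get <volname> performance.strict-o-direct
)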

-Krutika


On Mon, May 20, 2019 at 3:38 PM Martin  wrote:

> Hi Krutika,
>
> Also, gluster version please?
>
> I am running old 3.7.6. (Yes I know I should upgrade asap)
>
> I first applied "network.remote-dio off"; behaviour did not change,
> VMs got stuck after some time again.
> Then I set "performance.strict-o-direct on" and the problem completely
> disappeared. No more hangs at all (7 days without any problems at all).
> This SOLVED the issue.
>
> Can you explain what the remote-dio and strict-o-direct options changed in
> the behaviour of my Gluster? It would be great for the archive/later users to
> understand what solved my issue and why.
>
> Anyway, Thanks a LOT!!!
>
> BR,
> Martin
>
> On 13 May 2019, at 10:20, Krutika Dhananjay  wrote:
>
> OK. In that case, can you check if the following two changes help:
>
> # gluster volume set $VOL network.remote-dio off
> # gluster volume set $VOL performance.strict-o-direct on
>
> preferably one option changed at a time, its impact tested and then the
> next change applied and tested.
>
> Also, gluster version please?
>
> -Krutika
>
> On Mon, May 13, 2019 at 1:02 PM Martin Toth  wrote:
>
>> Cache in qemu is none. That should be correct. This is full command :
>>
>> /usr/bin/qemu-system-x86_64 -name one-312 -S -machine
>> pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp
>> 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1
>> -no-user-config -nodefaults -chardev
>> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait
>> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
>> -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device
>> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
>>
>> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4
>> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5
>> -drive file=/var/lib/one//datastores/116/312/*disk.0*
>> ,format=raw,if=none,id=drive-virtio-disk1,cache=none
>> -device
>> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1
>> -drive file=gluster://localhost:24007/imagestore/
>> *7b64d6757acc47a39503f68731f89b8e*
>> ,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none
>> -device
>> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0
>> -drive file=/var/lib/one//datastores/116/312/*disk.1*
>> ,format=raw,if=none,id=drive-ide0-0-0,readonly=on
>> -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0
>>
>> -netdev tap,fd=26,id=hostnet0
>> -device 
>> e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3
>> -chardev pty,id=charserial0 -device
>> isa-serial,chardev=charserial0,id=serial0
>> -chardev 
>> socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait
>> -device
>> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
>> -vnc 0.0.0.0:312,password -device
>> cirrus-vga,id=video0,bus=pci.0,addr=0x2 -device
>> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
>>
>> I’ve highlighted disks. First is VM context disk - Fuse used, second is
>> SDA (OS is installed here) - libgfapi used, third is SWAP - Fuse used.
>>
>> Krutika,
>> I will start profiling on the Gluster volumes and wait for the next VM to fail.
>> Then I will attach/send the profiling info after some VM has failed. I
>> suppose this is the correct profiling strategy.
>>
>
> About this, how many vms do you need to recreate it? A single vm? Or
> multiple vms doing IO in parallel?
>
>
>> Thanks,
>> BR!
>> Martin
>>
>> On 13 May 2019, at 09:21, Krutika Dhananjay  wrote:
>>
>> Also, what's the caching policy that qemu is using on the affected vms?
>> Is it cache=none? Or something else? You can get this information in th

Re: [Gluster-users] VMs blocked for more than 120 seconds

2019-05-13 Thread Krutika Dhananjay
OK. In that case, can you check if the following two changes help:

# gluster volume set $VOL network.remote-dio off
# gluster volume set $VOL performance.strict-o-direct on

preferably one option changed at a time, its impact tested and then the
next change applied and tested.

Also, gluster version please?

-Krutika

On Mon, May 13, 2019 at 1:02 PM Martin Toth  wrote:

> Cache in qemu is none. That should be correct. This is full command :
>
> /usr/bin/qemu-system-x86_64 -name one-312 -S -machine
> pc-i440fx-xenial,accel=kvm,usb=off -m 4096 -realtime mlock=off -smp
> 4,sockets=4,cores=1,threads=1 -uuid e95a774e-a594-4e98-b141-9f30a3f848c1
> -no-user-config -nodefaults -chardev
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-one-312/monitor.sock,server,nowait
> -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime
> -no-shutdown -boot order=c,menu=on,splash-time=3000,strict=on -device
> piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
>
> -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x4
> -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5
> -drive file=/var/lib/one//datastores/116/312/*disk.0*
> ,format=raw,if=none,id=drive-virtio-disk1,cache=none
> -device
> virtio-blk-pci,scsi=off,bus=pci.0,addr=0x6,drive=drive-virtio-disk1,id=virtio-disk1
> -drive file=gluster://localhost:24007/imagestore/
> *7b64d6757acc47a39503f68731f89b8e*
> ,format=qcow2,if=none,id=drive-scsi0-0-0-0,cache=none
> -device
> scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0
> -drive file=/var/lib/one//datastores/116/312/*disk.1*
> ,format=raw,if=none,id=drive-ide0-0-0,readonly=on
> -device ide-cd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0
>
> -netdev tap,fd=26,id=hostnet0
> -device e1000,netdev=hostnet0,id=net0,mac=02:00:5c:f0:e4:39,bus=pci.0,addr=0x3
> -chardev pty,id=charserial0 -device
> isa-serial,chardev=charserial0,id=serial0
> -chardev 
> socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-one-312/org.qemu.guest_agent.0,server,nowait
> -device
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
> -vnc 0.0.0.0:312,password -device cirrus-vga,id=video0,bus=pci.0,addr=0x2
> -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -msg timestamp=on
>
> I’ve highlighted disks. First is VM context disk - Fuse used, second is
> SDA (OS is installed here) - libgfapi used, third is SWAP - Fuse used.
>
> Krutika,
> I will start profiling on the Gluster volumes and wait for the next VM to fail.
> Then I will attach/send the profiling info after some VM has failed. I
> suppose this is the correct profiling strategy.
>

About this, how many vms do you need to recreate it? A single vm? Or
multiple vms doing IO in parallel?


> Thanks,
> BR!
> Martin
>
> On 13 May 2019, at 09:21, Krutika Dhananjay  wrote:
>
> Also, what's the caching policy that qemu is using on the affected vms?
> Is it cache=none? Or something else? You can get this information in the
> command line of qemu-kvm process corresponding to your vm in the ps output.
>
> -Krutika
>
> On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay 
> wrote:
>
>> What version of gluster are you using?
>> Also, can you capture and share volume-profile output for a run where you
>> manage to recreate this issue?
>>
>> https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command
>> Let me know if you have any questions.
>>
>> -Krutika
>>
>> On Mon, May 13, 2019 at 12:34 PM Martin Toth 
>> wrote:
>>
>>> Hi,
>>>
>>> there is no healing operation, not peer disconnects, no readonly
>>> filesystem. Yes, storage is slow and unavailable for 120 seconds, but why,
>>> its SSD with 10G, performance is good.
>>>
>>> > you'd have it's log on qemu's standard output,
>>>
>>> If you mean /var/log/libvirt/qemu/vm.log there is nothing. I am looking
>>> for problem for more than month, tried everything. Can’t find anything. Any
>>> more clues or leads?
>>>
>>> BR,
>>> Martin
>>>
>>> > On 13 May 2019, at 08:55, lemonni...@ulrar.net wrote:
>>> >
>>> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote:
>>> >> Hi all,
>>> >
>>> > Hi
>>> >
>>> >>
>>> >> I am running replica 3 on SSDs with 10G networking, everything works
>>> OK but VMs stored in Gluster volume occasionally freeze with “Task XY
>>> blocked for more than 120 seconds”.
>>> >> Only solution is to poweroff (hard) VM an

Re: [Gluster-users] VMs blocked for more than 120 seconds

2019-05-13 Thread Krutika Dhananjay
Also, what's the caching policy that qemu is using on the affected vms?
Is it cache=none? Or something else? You can get this information in the
command line of qemu-kvm process corresponding to your vm in the ps output.
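(A quick, if rough, way to check this is to pull the cache mode out of the qemu command line; the process name may differ slightly per distro:

# ps -ef | grep [q]emu | tr ',' '\n' | grep cache
)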

-Krutika

On Mon, May 13, 2019 at 12:49 PM Krutika Dhananjay 
wrote:

> What version of gluster are you using?
> Also, can you capture and share volume-profile output for a run where you
> manage to recreate this issue?
>
> https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command
> Let me know if you have any questions.
>
> -Krutika
>
> On Mon, May 13, 2019 at 12:34 PM Martin Toth  wrote:
>
>> Hi,
>>
>> there is no healing operation, not peer disconnects, no readonly
>> filesystem. Yes, storage is slow and unavailable for 120 seconds, but why,
>> its SSD with 10G, performance is good.
>>
>> > you'd have it's log on qemu's standard output,
>>
>> If you mean /var/log/libvirt/qemu/vm.log there is nothing. I am looking
>> for problem for more than month, tried everything. Can’t find anything. Any
>> more clues or leads?
>>
>> BR,
>> Martin
>>
>> > On 13 May 2019, at 08:55, lemonni...@ulrar.net wrote:
>> >
>> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote:
>> >> Hi all,
>> >
>> > Hi
>> >
>> >>
>> >> I am running replica 3 on SSDs with 10G networking, everything works
>> OK but VMs stored in Gluster volume occasionally freeze with “Task XY
>> blocked for more than 120 seconds”.
>> >> Only solution is to poweroff (hard) VM and than boot it up again. I am
>> unable to SSH and also login with console, its stuck probably on some disk
>> operation. No error/warning logs or messages are store in VMs logs.
>> >>
>> >
>> > As far as I know this should be unrelated, I get this during heals
>> > without any freezes, it just means the storage is slow I think.
>> >
>> >> KVM/Libvirt(qemu) using libgfapi and fuse mount to access VM disks on
>> replica volume. Can someone advice  how to debug this problem or what can
>> cause these issues?
>> >> It’s really annoying, I’ve tried to google everything but nothing came
>> up. I’ve tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but
>> its not related.
>> >>
>> >
>> > Any chance your gluster goes readonly ? Have you checked your gluster
>> > logs to see if maybe they lose each other some times ?
>> > /var/log/glusterfs
>> >
>> > For libgfapi accesses you'd have it's log on qemu's standard output,
>> > that might contain the actual error at the time of the freez.
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] VMs blocked for more than 120 seconds

2019-05-13 Thread Krutika Dhananjay
What version of gluster are you using?
Also, can you capture and share volume-profile output for a run where you
manage to recreate this issue?
https://docs.gluster.org/en/v3/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command
Let me know if you have any questions.
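(For reference, the capture sequence from that document is roughly the following, with <volname> as a placeholder:

# gluster volume profile <volname> start
# ... reproduce the freeze ...
# gluster volume profile <volname> info > profile.out
# gluster volume profile <volname> stop
)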

-Krutika

On Mon, May 13, 2019 at 12:34 PM Martin Toth  wrote:

> Hi,
>
> there is no healing operation, no peer disconnects, no readonly
> filesystem. Yes, storage is slow and unavailable for 120 seconds, but why?
> It's SSD with 10G; performance is good.
>
> > you'd have it's log on qemu's standard output,
>
> If you mean /var/log/libvirt/qemu/vm.log, there is nothing. I have been looking
> into the problem for more than a month and have tried everything. Can't find anything. Any
> more clues or leads?
>
> BR,
> Martin
>
> > On 13 May 2019, at 08:55, lemonni...@ulrar.net wrote:
> >
> > On Mon, May 13, 2019 at 08:47:45AM +0200, Martin Toth wrote:
> >> Hi all,
> >
> > Hi
> >
> >>
> >> I am running replica 3 on SSDs with 10G networking, everything works OK
> but VMs stored in Gluster volume occasionally freeze with “Task XY blocked
> for more than 120 seconds”.
> >> Only solution is to poweroff (hard) the VM and then boot it up again. I am
> unable to SSH and also login with the console; it's stuck, probably on some disk
> operation. No error/warning logs or messages are stored in the VM's logs.
> >>
> >
> > As far as I know this should be unrelated, I get this during heals
> > without any freezes, it just means the storage is slow I think.
> >
> >> KVM/Libvirt(qemu) is using libgfapi and a fuse mount to access VM disks on
> a replica volume. Can someone advise how to debug this problem or what can
> cause these issues?
> >> It’s really annoying, I’ve tried to google everything but nothing came
> up. I’ve tried changing virtio-scsi-pci to virtio-blk-pci disk drivers, but
> its not related.
> >>
> >
> > Any chance your gluster goes readonly ? Have you checked your gluster
> > logs to see if maybe they lose each other some times ?
> > /var/log/glusterfs
> >
> > For libgfapi accesses you'd have its log on qemu's standard output,
> > that might contain the actual error at the time of the freeze.
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Settings for VM hosting

2019-04-22 Thread Krutika Dhananjay
On Fri, Apr 19, 2019 at 12:48 PM  wrote:

> On Fri, Apr 19, 2019 at 06:47:49AM +0530, Krutika Dhananjay wrote:
> > Looks good mostly.
> > You can also turn on performance.stat-prefetch, and also set
>
> Ah the corruption bug has been fixed, I missed that. Great !
>
> > client.event-threads and server.event-threads to 4.
>
> I didn't realize that would also apply to libgfapi ?
> Good to know, thanks.
>
> > And if your bricks are on ssds, then you could also enable
> > performance.client-io-threads.
>
> I'm surprised by that, the doc says "This feature is not recommended for
> distributed, replicated or distributed-replicated volumes."
> Since this volume is just a replica 3, shouldn't this stay off ?
> The disks are all nvme, which I assume would count as ssd.
>

They're not recommended if you're using slower disks (HDDs, for instance),
as they can increase the number of fsyncs triggered by the replicate module,
and their slowness can degrade performance. With nvme/ssds this should not be
a problem, and the net result of enabling client-io-threads there should be
an improvement in perf.
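(If you do try it, the option is toggled per volume; <volname> is a placeholder:

# gluster volume set <volname> performance.client-io-threads on

and it can be reverted with the same command and "off" if it doesn't help.)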

-Krutika


> > And if your bricks and hypervisors are on same set of machines
> > (hyperconverged),
> > then you can turn off cluster.choose-local and see if it helps read
> > performance.
>
> Thanks, we'll give those a try !
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Settings for VM hosting

2019-04-18 Thread Krutika Dhananjay
Looks good mostly.
You can also turn on performance.stat-prefetch, and also set
client.event-threads and server.event-threads to 4.
And if your bricks are on ssds, then you could also enable
performance.client-io-threads.
And if your bricks and hypervisors are on same set of machines
(hyperconverged),
then you can turn off cluster.choose-local and see if it helps read
performance.
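(A sketch of the corresponding commands, assuming <volname> stands for the volume name, with the last two applied only under the conditions above (SSD bricks, hyperconverged setup):

# gluster volume set <volname> performance.stat-prefetch on
# gluster volume set <volname> client.event-threads 4
# gluster volume set <volname> server.event-threads 4
# gluster volume set <volname> performance.client-io-threads on
# gluster volume set <volname> cluster.choose-local off
)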

Do let us know what helped and what didn't.

-Krutika

On Thu, Apr 18, 2019 at 1:05 PM  wrote:

> Hi,
>
> We've been using the same settings, found in an old email here, since
> v3.7 of gluster for our VM hosting volumes. They've been working fine
> but since we've just installed a v6 for testing I figured there might
> be new settings I should be aware of.
>
> So for access through the libgfapi (qemu), for VM hard drives, is that
> still optimal and recommended ?
>
> Volume Name: glusterfs
> Type: Replicate
> Volume ID: b28347ff-2c27-44e0-bc7d-c1c017df7cd1
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: ips1adm.X:/mnt/glusterfs/brick
> Brick2: ips2adm.X:/mnt/glusterfs/brick
> Brick3: ips3adm.X:/mnt/glusterfs/brick
> Options Reconfigured:
> performance.readdir-ahead: on
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: off
> features.shard: on
> features.shard-block-size: 64MB
> cluster.data-self-heal-algorithm: full
> network.ping-timeout: 30
> diagnostics.count-fop-hits: on
> diagnostics.latency-measurement: on
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
>
> Thanks !
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [ovirt-users] Re: Announcing Gluster release 5.5

2019-03-31 Thread Krutika Dhananjay
Adding back gluster-users
Comments inline ...

On Fri, Mar 29, 2019 at 8:11 PM Olaf Buitelaar 
wrote:

> Dear Krutika,
>
>
>
> 1. I’ve made 2 profile runs of around 10 minutes (see files
> profile_data.txt and profile_data2.txt). Looking at them, most time seems to be
> spent in the fsync and readdirp fops.
>
> Unfortunately I don’t have the profile info for the 3.12.15 version, so it’s
> a bit hard to compare.
>
> One additional thing I do notice: on 1 machine (10.32.9.5) the iowait time
> increased a lot, from an average below 1% to around 12% after
> the upgrade.
>
> So the first suspicion would be that lightning strikes twice and I now also
> have a bad disk, but that doesn’t appear to be the case, since all SMART
> statuses report ok.
>
> Also dd shows performance I would more or less expect;
>
> dd if=/dev/zero of=/data/test_file  bs=100M count=1  oflag=dsync
>
> 1+0 records in
>
> 1+0 records out
>
> 104857600 bytes (105 MB) copied, 0.686088 s, 153 MB/s
>
> dd if=/dev/zero of=/data/test_file  bs=1G count=1  oflag=dsync
>
> 1+0 records in
>
> 1+0 records out
>
> 1073741824 bytes (1.1 GB) copied, 7.61138 s, 141 MB/s
>
> dd if=/dev/urandom of=/data/test_file  bs=1024 count=100
>
> 100+0 records in
>
> 100+0 records out
>
> 102400 bytes (1.0 GB) copied, 6.35051 s, 161 MB/s
>
> dd if=/dev/zero of=/data/test_file  bs=1024 count=100
>
> 100+0 records in
>
> 100+0 records out
>
> 102400 bytes (1.0 GB) copied, 1.6899 s, 606 MB/s
>
> When I disable this brick (service glusterd stop; pkill glusterfsd),
> performance in gluster is better, but not on par with what it was. Also, the
> cpu usage on the “neighbor” nodes which host the other bricks in the same
> subvolume increases quite a lot in this case, which I wouldn’t expect
> actually, since they shouldn't handle much more work, except flagging shards
> to heal. Iowait also goes to idle once gluster is stopped, so it’s
> definitely gluster which is waiting for io.
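(One way to narrow down where the waits accumulate is to watch extended iostat on each node while the workload runs, assuming the sysstat package is installed:

# iostat -x 5

and compare the await/%util columns of the brick device against the other nodes.)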
>
>
>

So I see that FSYNC %-latency is on the higher side. And I also noticed you
don't have direct-io options enabled on the volume.
Could you set the following options on the volume -
# gluster volume set  network.remote-dio off
# gluster volume set  performance.strict-o-direct on
and also disable choose-local
# gluster volume set  cluster.choose-local off

let me know if this helps.

2. I’ve attached the mnt log and volume info, but I couldn’t find anything
> relevant in those logs. I think this is because we run the VM’s with
> libgfapi;
>
> [root@ovirt-host-01 ~]# engine-config  -g LibgfApiSupported
>
> LibgfApiSupported: true version: 4.2
>
> LibgfApiSupported: true version: 4.1
>
> LibgfApiSupported: true version: 4.3
>
> And I can confirm the qemu process is invoked with the gluster:// address
> for the images.
>
> The message is logged in the /var/lib/libvirt/qemu/  file, which
> I’ve also included. For a sample case see around 2019-03-28 20:20:07
>
> Which has the error; E [MSGID: 133010]
> [shard.c:2294:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on
> shard 109886 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c
> [Stale file handle]
>

Could you also attach the brick logs for this volume?


>
> 3. yes I see multiple instances for the same brick directory, like;
>
> /usr/sbin/glusterfsd -s 10.32.9.6 --volfile-id
> ovirt-core.10.32.9.6.data-gfs-bricks-brick1-ovirt-core -p
> /var/run/gluster/vols/ovirt-core/10.32.9.6-data-gfs-bricks-brick1-ovirt-core.pid
> -S /var/run/gluster/452591c9165945d9.socket --brick-name
> /data/gfs/bricks/brick1/ovirt-core -l
> /var/log/glusterfs/bricks/data-gfs-bricks-brick1-ovirt-core.log
> --xlator-option *-posix.glusterd-uuid=fb513da6-f3bd-4571-b8a2-db5efaf60cc1
> --process-name brick --brick-port 49154 --xlator-option
> ovirt-core-server.listen-port=49154
>
>
>
> I’ve made an export of the output of ps from the time I observed these
> multiple processes.
>
> In addition to the brick_mux bug as noted by Atin, I might also have another
> possible cause: as ovirt moves nodes from non-operational state or
> maintenance state to active/activating, it also seems to restart gluster,
> however I don’t have direct proof for this theory.
>
>
>

+Atin Mukherjee  ^^
+Mohit Agrawal   ^^

-Krutika

Thanks Olaf
>
> On Fri, Mar 29, 2019 at 10:03, Sandro Bonazzola  > wrote:
>
>>
>>
>> Il giorno gio 28 mar 2019 alle ore 17:48  ha
>> scritto:
>>
>>> Dear All,
>>>
>>> I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While
>>> previous upgrades from 4.1 to 4.2 etc. went rather smooth, this one was a
>>> different experience. After first trying a test upgrade on a 3 node setup,
>>> which went fine. i headed to upgrade the 9 node production platform,
>>> unaware of the backward compatibility issues between gluster 3.12.15 ->
>>> 5.3. After upgrading 2 nodes, the HA engine stopped and wouldn't start.
>>> Vdsm wasn't able to mount the engine storage domain, since /dom_md/metadata
>>> was missing or couldn't be accessed. 

Re: [Gluster-users] [ovirt-users] Re: Announcing Gluster release 5.5

2019-03-29 Thread Krutika Dhananjay
Questions/comments inline ...

On Thu, Mar 28, 2019 at 10:18 PM  wrote:

> Dear All,
>
> I wanted to share my experience upgrading from 4.2.8 to 4.3.1. While
> previous upgrades from 4.1 to 4.2 etc. went rather smooth, this one was a
> different experience. After first trying a test upgrade on a 3 node setup,
> which went fine. i headed to upgrade the 9 node production platform,
> unaware of the backward compatibility issues between gluster 3.12.15 ->
> 5.3. After upgrading 2 nodes, the HA engine stopped and wouldn't start.
> Vdsm wasn't able to mount the engine storage domain, since /dom_md/metadata
> was missing or couldn't be accessed. Restoring this file by getting a good
> copy of the underlying bricks, removing the file from the underlying bricks
> where the file was 0 bytes and mark with the stickybit, and the
> corresponding gfid's. Removing the file from the mount point, and copying
> back the file on the mount point. Manually mounting the engine domain,  and
> manually creating the corresponding symbolic links in /rhev/data-center and
> /var/run/vdsm/storage and fixing the ownership back to vdsm.kvm (which was
> root.root), i was able to start the HA engine again. Since the engine was
> up again, and things seemed rather unstable i decided to continue the
> upgrade on the other nodes suspecting an incompatibility in gluster
> versions, i thought would be best to have them all on the same version
> rather soonish. However things went from bad to worse, the engine stopped
> again, and all vm’s stopped working as well. So on a machine outside the
> setup I restored a backup of the engine taken from version 4.2.8 just
> before the upgrade. With this engine I was at least able to start some vm’s
> again, and finalize the upgrade. Once upgraded, things didn’t stabilize,
> and we also lost 2 vm’s during the process due to image corruption. After
> figuring out gluster 5.3 had quite some issues, I was lucky to see
> gluster 5.5 was about to be released; the moment the RPM’s were
> available I installed those. This helped a lot in terms of stability,
> for which I’m very grateful! However the performance is unfortunately
> terrible; it’s about 15% of what the performance was running gluster
> 3.12.15. It’s strange since a simple dd shows ok performance, but our
> actual workload doesn’t. While I would expect the performance to be better,
> due to all improvements made since gluster version 3.12. Does anybody share
> the same experience?
> I really hope gluster 6 will soon be tested with ovirt and released, and
> things start to perform and stabilize again..like the good old days. Of
> course when I can do anything, I’m happy to help.
>
> I think the following short list of issues we have after the migration;
> Gluster 5.5;
> -   Poor performance for our workload (mostly write dependent)
>

For this, could you share the volume-profile output specifically for the
affected volume(s)? Here's what you need to do -

1. # gluster volume profile $VOLNAME stop
2. # gluster volume profile $VOLNAME start
3. Run the test inside the vm wherein you see bad performance
4. # gluster volume profile $VOLNAME info # save the output of this command
into a file
5. # gluster volume profile $VOLNAME stop
6. and attach the output file gotten in step 4

> -   VM’s randomly pause on unknown storage errors, which are “stale file”
> errors. Corresponding log: Lookup on shard 797 failed. Base file gfid =
> 8a27b91a-ff02-42dc-bd4c-caa019424de8 [Stale file handle]
>

Could you share the complete gluster client log file (it would be a
filename matching the pattern rhev-data-center-mnt-glusterSD-*)
Also the output of `gluster volume info $VOLNAME`



> -   Some files are listed twice in a directory (probably related to the
> stale file issue?)
> Example;
> ls -la
> /rhev/data-center/59cd53a9-0003-02d7-00eb-01e3/313f5d25-76af-4ecd-9a20-82a2fe815a3c/images/4add6751-3731-4bbd-ae94-aaeed12ea450/
> total 3081
> drwxr-x---.  2 vdsm kvm4096 Mar 18 11:34 .
> drwxr-xr-x. 13 vdsm kvm4096 Mar 19 09:42 ..
> -rw-rw.  1 vdsm kvm 1048576 Mar 28 12:55
> 1a7cf259-6b29-421d-9688-b25dfaafb13c
> -rw-rw.  1 vdsm kvm 1048576 Mar 28 12:55
> 1a7cf259-6b29-421d-9688-b25dfaafb13c
> -rw-rw.  1 vdsm kvm 1048576 Jan 27  2018
> 1a7cf259-6b29-421d-9688-b25dfaafb13c.lease
> -rw-r--r--.  1 vdsm kvm 290 Jan 27  2018
> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
> -rw-r--r--.  1 vdsm kvm 290 Jan 27  2018
> 1a7cf259-6b29-421d-9688-b25dfaafb13c.meta
>

Adding DHT and readdir-ahead maintainers regarding entries getting listed
twice.
@Nithya Balachandran  ^^
@Gowdappa, Raghavendra  ^^
@Poornima Gurusiddaiah  ^^


>
> - brick processes sometimes start multiple times. Sometimes I’ve 5 brick
> processes for a single volume. Killing all glusterfsd’s for the volume on
> the machine and running gluster v start <volname> force usually just starts one
> after the event; from then on things look all right.
>

Did you mean 5 brick processes for 

Re: [Gluster-users] [ovirt-users] Re: VM disk corruption with LSM on Gluster

2019-03-27 Thread Krutika Dhananjay
This is needed to prevent any inconsistencies stemming from buffered
writes/caching file data during live VM migration.
Besides, for Gluster to truly honor direct-io behavior in qemu's
'cache=none' mode (which is what oVirt uses),
one needs to turn on performance.strict-o-direct and disable remote-dio.

-Krutika

On Wed, Mar 27, 2019 at 12:24 PM Leo David  wrote:

> Hi,
> I can confirm that after setting these two options, I haven't encountered
> disk corruptions anymore.
> The downside is that, at least for me, it had a pretty big impact on
> performance.
> The iops really went down when running fio tests inside the vm.
>
> On Wed, Mar 27, 2019, 07:03 Krutika Dhananjay  wrote:
>
>> Could you enable strict-o-direct and disable remote-dio on the src volume
>> as well, restart the vms on "old" and retry migration?
>>
>> # gluster volume set  performance.strict-o-direct on
>> # gluster volume set  network.remote-dio off
>>
>> -Krutika
>>
>> On Tue, Mar 26, 2019 at 10:32 PM Sander Hoentjen 
>> wrote:
>>
>>> On 26-03-19 14:23, Sahina Bose wrote:
>>> > +Krutika Dhananjay and gluster ml
>>> >
>>> > On Tue, Mar 26, 2019 at 6:16 PM Sander Hoentjen 
>>> wrote:
>>> >> Hello,
>>> >>
>>> >> tl;dr We have disk corruption when doing live storage migration on
>>> oVirt
>>> >> 4.2 with gluster 3.12.15. Any idea why?
>>> >>
>>> >> We have a 3-node oVirt cluster that is both compute and
>>> gluster-storage.
>>> >> The manager runs on separate hardware. We are running out of space on
>>> >> this volume, so we added another Gluster volume that is bigger, put a
>>> >> storage domain on it and then we migrated VM's to it with LSM. After
>>> >> some time, we noticed that (some of) the migrated VM's had corrupted
>>> >> filesystems. After moving everything back with export-import to the
>>> old
>>> >> domain where possible, and recovering from backups where needed we set
>>> >> off to investigate this issue.
>>> >>
>>> >> We are now at the point where we can reproduce this issue within a
>>> day.
>>> >> What we have found so far:
>>> >> 1) The corruption occurs at the very end of the replication step, most
>>> >> probably between START and FINISH of diskReplicateFinish, before the
>>> >> START merge step
>>> >> 2) In the corrupted VM, at some place where data should be, this data
>>> is
>>> >> replaced by zero's. This can be file-contents or a directory-structure
>>> >> or whatever.
>>> >> 3) The source gluster volume has different settings then the
>>> destination
>>> >> (Mostly because the defaults were different at creation time):
>>> >>
>>> >> Setting old(src)  new(dst)
>>> >> cluster.op-version  30800 30800 (the same)
>>> >> cluster.max-op-version  31202 31202 (the same)
>>> >> cluster.metadata-self-heal  off   on
>>> >> cluster.data-self-heal  off   on
>>> >> cluster.entry-self-heal off   on
>>> >> performance.low-prio-threads    16    32
>>> >> performance.strict-o-direct off   on
>>> >> network.ping-timeout    42    30
>>> >> network.remote-dio  enableoff
>>> >> transport.address-family- inet
>>> >> performance.stat-prefetch   off   on
>>> >> features.shard-block-size   512MB 64MB
>>> >> cluster.shd-max-threads 1 8
>>> >> cluster.shd-wait-qlength    1024  1
>>> >> cluster.locking-scheme  full  granular
>>> >> cluster.granular-entry-heal noenable
>>> >>
>>> >> 4) To test, we migrate some VM's back and forth. The corruption does
>>> not
>>> >> occur every time. To this point it only occurs from old to new, but we
>>> >> don't have enough data-points to be sure about that.
>>> >>
>>> >> Anybody an idea what is causing the corruption? Is this the best list
>>> to
>>> >> ask, or should I ask on a Gluster list? I am not sure if this is oVirt
>

Re: [Gluster-users] [ovirt-users] Re: VM disk corruption with LSM on Gluster

2019-03-26 Thread Krutika Dhananjay
Could you enable strict-o-direct and disable remote-dio on the src volume
as well, restart the vms on "old" and retry migration?

# gluster volume set  performance.strict-o-direct on
# gluster volume set  network.remote-dio off

-Krutika

On Tue, Mar 26, 2019 at 10:32 PM Sander Hoentjen  wrote:

> On 26-03-19 14:23, Sahina Bose wrote:
> > +Krutika Dhananjay and gluster ml
> >
> > On Tue, Mar 26, 2019 at 6:16 PM Sander Hoentjen 
> wrote:
> >> Hello,
> >>
> >> tl;dr We have disk corruption when doing live storage migration on oVirt
> >> 4.2 with gluster 3.12.15. Any idea why?
> >>
> >> We have a 3-node oVirt cluster that is both compute and gluster-storage.
> >> The manager runs on separate hardware. We are running out of space on
> >> this volume, so we added another Gluster volume that is bigger, put a
> >> storage domain on it and then we migrated VM's to it with LSM. After
> >> some time, we noticed that (some of) the migrated VM's had corrupted
> >> filesystems. After moving everything back with export-import to the old
> >> domain where possible, and recovering from backups where needed we set
> >> off to investigate this issue.
> >>
> >> We are now at the point where we can reproduce this issue within a day.
> >> What we have found so far:
> >> 1) The corruption occurs at the very end of the replication step, most
> >> probably between START and FINISH of diskReplicateFinish, before the
> >> START merge step
> >> 2) In the corrupted VM, at some place where data should be, this data is
> >> replaced by zero's. This can be file-contents or a directory-structure
> >> or whatever.
> >> 3) The source gluster volume has different settings then the destination
> >> (Mostly because the defaults were different at creation time):
> >>
> >> Setting old(src)  new(dst)
> >> cluster.op-version  30800 30800 (the same)
> >> cluster.max-op-version  31202 31202 (the same)
> >> cluster.metadata-self-heal  off   on
> >> cluster.data-self-heal  off   on
> >> cluster.entry-self-heal off   on
> >> performance.low-prio-threads    16    32
> >> performance.strict-o-direct off   on
> >> network.ping-timeout    42    30
> >> network.remote-dio  enableoff
> >> transport.address-family- inet
> >> performance.stat-prefetch   off   on
> >> features.shard-block-size   512MB 64MB
> >> cluster.shd-max-threads 1 8
> >> cluster.shd-wait-qlength    1024  1
> >> cluster.locking-scheme  full  granular
> >> cluster.granular-entry-heal noenable
> >>
> >> 4) To test, we migrate some VM's back and forth. The corruption does not
> >> occur every time. To this point it only occurs from old to new, but we
> >> don't have enough data-points to be sure about that.
> >>
> >> Anybody an idea what is causing the corruption? Is this the best list to
> >> ask, or should I ask on a Gluster list? I am not sure if this is oVirt
> >> specific or Gluster specific though.
> > Do you have logs from old and new gluster volumes? Any errors in the
> > new volume's fuse mount logs?
>
> Around the time of corruption I see the message:
> The message "I [MSGID: 133017] [shard.c:4941:shard_seek]
> 0-ZoneA_Gluster1-shard: seek called on
> 7fabc273-3d8a-4a49-8906-b8ccbea4a49f. [Operation not supported]" repeated
> 231 times between [2019-03-26 13:14:22.297333] and [2019-03-26
> 13:15:42.912170]
>
> I also see this message at other times, when I don't see the corruption
> occur, though.
>
> --
> Sander
> ___
> Users mailing list -- us...@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/us...@ovirt.org/message/M3T2VGGGV6DE643ZKKJUAF274VSWTJFH/
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [ovirt-users] Tracking down high writes in GlusterFS volume

2019-02-25 Thread Krutika Dhananjay
On Fri, Feb 15, 2019 at 12:30 AM Jayme  wrote:

> Running an oVirt 4.3 HCI 3-way replica cluster with SSD backed storage.
> I've noticed that my SSD writes (smart Total_LBAs_Written) are quite high
> on one particular drive.  Specifically I've noticed one volume is much much
> higher total bytes written than others (despite using less overall space).
>

Writes are higher on one particular volume? Or did one brick witness more
writes than its two replicas within the same volume? Could you share the
volume info output of the affected volume plus the name of the affected
brick if at all the issue is with one single brick?

Also, did you check if the volume was undergoing any heals (`gluster volume
heal <volname> info`)?
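(Concretely, the requested outputs can be gathered with the following; <volname> is a placeholder:

# gluster volume info <volname>
# gluster volume heal <volname> info
# gluster volume profile <volname> info

The last one, if profiling is already enabled, prints per-brick fop counts and can help confirm whether one brick is seeing disproportionately more WRITE/FSYNC fops than its replicas.)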

-Krutika

My volume is writing over 1TB of data per day (by my manual calculation,
> and with glusterfs profiling) and wearing my SSDs quickly, how can I best
> determine which VM or process is at fault here?
>
> There are 5 low use VMs using the volume in question.  I'm attempting to
> track iostats on each of the vm's individually but so far I'm not seeing
> anything obvious that would account for 1TB of writes per day that the
> gluster volume is reporting.
> ___
> Users mailing list -- us...@ovirt.org
> To unsubscribe send an email to users-le...@ovirt.org
> Privacy Statement: https://www.ovirt.org/site/privacy-policy/
> oVirt Code of Conduct:
> https://www.ovirt.org/community/about/community-guidelines/
> List Archives:
> https://lists.ovirt.org/archives/list/us...@ovirt.org/message/OZHZXQS4GUPPJXOZSBTO6X5ZL6CATFXK/
>
___
Gluster-users mailing list
Gluster-users@gluster.org
https://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Stale file handle] in shard volume

2019-01-13 Thread Krutika Dhananjay
Hi,

So the main issue is that certain vms seem to be pausing? Did I understand
that right?
Could you share the gluster-mount logs around the time the pause was seen?
And the brick logs too please?

As for ESTALE errors, the real cause of pauses can be determined from
errors/warnings logged by fuse. Mere occurrence of ESTALE errors against
shard function in logs doesn't necessarily indicate that is the reason for
the pause. Also, in this instance, the ESTALE errors it seems are
propagated by the lower translators (DHT? protocol/client? Or even bricks?)
and shard is merely logging the same.

-Krutika


On Sun, Jan 13, 2019 at 10:11 PM Olaf Buitelaar 
wrote:

> @Krutika if you need any further information, please let me know.
>
> Thanks Olaf
>
> On Fri, Jan 4, 2019 at 07:51, Nithya Balachandran <
> nbala...@redhat.com> wrote:
>
>> Adding Krutika.
>>
>> On Wed, 2 Jan 2019 at 20:56, Olaf Buitelaar 
>> wrote:
>>
>>> Hi Nithya,
>>>
>>> Thank you for your reply.
>>>
>>> the VM's using the gluster volumes keeps on getting paused/stopped on
>>> errors like these;
>>> [2019-01-02 02:33:44.469132] E [MSGID: 133010]
>>> [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on
>>> shard 101487 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c
>>> [Stale file handle]
>>> [2019-01-02 02:33:44.563288] E [MSGID: 133010]
>>> [shard.c:1724:shard_common_lookup_shards_cbk] 0-ovirt-kube-shard: Lookup on
>>> shard 101488 failed. Base file gfid = a38d64bc-a28b-4ee1-a0bb-f919e7a1022c
>>> [Stale file handle]
>>>
>>> Krutika, Can you take a look at this?
>>
>>
>>>
>>> What I'm trying to find out is whether I can purge all gluster volumes of all
>>> possible stale file handles (and hopefully find a method to prevent this in
>>> the future), so the VM's can start running stably again.
>>> For this I need to know when the "shard_common_lookup_shards_cbk"
>>> function considers a file stale.
>>> The statement; "Stale file handle errors show up when a file with a
>>> specified gfid is not found." doesn't seem to cover it all, as i've shown
>>> in earlier mails the shard file and glusterfs/xx/xx/uuid file do both
>>> exist, and have the same inode.
>>> If the criteria i'm using aren't correct, could you please tell me which
>>> criteria i should use to determine if a file is stale or not?
>>> These criteria are just based on observations I made while moving the stale
>>> files manually. After removing them I was able to start the VM again... until
>>> some time later it hung on another stale shard file, unfortunately.
>>>
>>> Thanks Olaf
>>>
>>> On Wed, Jan 2, 2019 at 14:20, Nithya Balachandran <
>>> nbala...@redhat.com> wrote:
>>>


 On Mon, 31 Dec 2018 at 01:27, Olaf Buitelaar 
 wrote:

> Dear All,
>
> till now a selected group of VM's still seem to produce new stale
> file's and getting paused due to this.
> I've not updated gluster recently, however i did change the op version
> from 31200 to 31202 about a week before this issue arose.
> Looking at the .shard directory, i've 100.000+ files sharing the same
> characteristics as a stale file. which are found till now,
> they all have the sticky bit set, e.g. file permissions; -T.
> are 0kb in size, and have the trusted.glusterfs.dht.linkto attribute.
>

 These are internal files used by gluster and do not necessarily mean
 they are stale. They "point" to data files which may be on different bricks
 (same name, gfid etc but no linkto xattr and no T permissions).
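(A quick way to tell the two apart directly on a brick, with the brick root path as a placeholder:

# stat -c '%A %s %n' /path/to/brick/.shard/<base-gfid>.<shard-number>
# getfattr -n trusted.glusterfs.dht.linkto -e text /path/to/brick/.shard/<base-gfid>.<shard-number>

A link file shows mode ---------T with size 0 and carries the linkto xattr, while the corresponding data file on another brick has the real size and no linkto xattr.)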


> These files range from long ago (beginning of the year) till now,
> which makes me suspect this was lying dormant for some time now... and
> somehow recently surfaced.
> Checking other sub-volumes, they also contain 0kb files in the .shard
> directory, but those don't have the sticky bit and the linkto attribute.
>
> Does anybody else experience this issue? Could this be a bug or an
> environmental issue?
>
 These are most likely valid files- please do not delete them without
 double-checking.

 Stale file handle errors show up when a file with a specified gfid is
 not found. You will need to debug the files for which you see this error by
 checking the bricks to see if they actually exist.
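(One way to do that check: each brick keeps a gfid-indexed hard link under its .glusterfs directory, keyed by the first two and next two hex characters of the gfid. With the brick root as a placeholder, the base file gfid from the error above can be looked up like this:

# gfid=a38d64bc-a28b-4ee1-a0bb-f919e7a1022c
# ls -l /path/to/brick/.glusterfs/a3/8d/$gfid

For a regular file this entry is a hard link to the actual file, so a missing entry, or a link count of 1, on some of the bricks is a hint that the gfid resolution is failing there.)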

>
> Also i wonder if there is any tool or gluster command to clean all
> stale file handles?
> Otherwise i'm planning to make a simple bash script, which iterates
> over the .shard dir, checks each file for the above mentioned criteria, 
> and
> (re)moves the file and the corresponding .glusterfs file.
> If there are other criteria needed to identify a stale file handle, i
> would like to hear that.
> If this is a viable and safe operation to do of course.
>
> Thanks Olaf
>
>
>
> On Thu, Dec 20, 2018 at 13:43, Olaf Buitelaar <
> olaf.buitel...@gmail.com> wrote:
>
>> Dear All,
>>
>> I 

Re: [Gluster-users] posix_handle_hard [file exists]

2018-11-05 Thread Krutika Dhananjay
The rename log messages are informational and can be ignored.

-Krutika

On Mon, Nov 5, 2018 at 8:30 PM Jorick Astrego  wrote:

> I see a lot of DHT warnings in
> rhev-data-center-mnt-glusterSD-192.168.99.14:_hdd2.log:
>
> [2018-10-21 01:24:01.413126] I [glusterfsd-mgmt.c:1600:mgmt_getspec_cbk]
> 0-glusterfs: No change in volfile, continuing
> [2018-11-01 12:48:32.537621] I [MSGID: 109066]
> [dht-rename.c:1569:dht_rename] 0-hdd2-dht: renaming
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/b48f6dcc-8fbc-4eb2-bb8b-7a3e03e72899/73655cd8-adfc-404a-8ef7-7bbaee9d43d0.meta.new
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0) =>
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/b48f6dcc-8fbc-4eb2-bb8b-7a3e03e72899/73655cd8-adfc-404a-8ef7-7bbaee9d43d0.meta
> (hash=hdd2-replicate-0/cache=)
> [2018-11-01 13:31:17.726431] I [MSGID: 109066]
> [dht-rename.c:1569:dht_rename] 0-hdd2-dht: renaming
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/f43f4dbd-14ca-49fb-98da-cb32c26b05b7/16ac297a-14e1-43d3-a4d9-f2b8d183c1e1.meta.new
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0) =>
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/f43f4dbd-14ca-49fb-98da-cb32c26b05b7/16ac297a-14e1-43d3-a4d9-f2b8d183c1e1.meta
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0)
> [2018-11-01 13:31:18.316010] I [MSGID: 109066]
> [dht-rename.c:1569:dht_rename] 0-hdd2-dht: renaming
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/72bcb410-b20e-45f2-a269-a73e14c550cf/b0cbc7df-2761-4b74-8ca2-ee311fd57bd3.meta.new
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0) =>
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/72bcb410-b20e-45f2-a269-a73e14c550cf/b0cbc7df-2761-4b74-8ca2-ee311fd57bd3.meta
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0)
> [2018-11-01 13:31:18.208882] I [MSGID: 109066]
> [dht-rename.c:1569:dht_rename] 0-hdd2-dht: renaming
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/f43f4dbd-14ca-49fb-98da-cb32c26b05b7/16ac297a-14e1-43d3-a4d9-f2b8d183c1e1.meta.new
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0) =>
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/f43f4dbd-14ca-49fb-98da-cb32c26b05b7/16ac297a-14e1-43d3-a4d9-f2b8d183c1e1.meta
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0)
> [2018-11-01 13:31:19.461991] I [MSGID: 109066]
> [dht-rename.c:1569:dht_rename] 0-hdd2-dht: renaming
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/72bcb410-b20e-45f2-a269-a73e14c550cf/b0cbc7df-2761-4b74-8ca2-ee311fd57bd3.meta.new
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0) =>
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/72bcb410-b20e-45f2-a269-a73e14c550cf/b0cbc7df-2761-4b74-8ca2-ee311fd57bd3.meta
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0)
> [2018-11-02 13:31:46.567693] I [MSGID: 109066]
> [dht-rename.c:1569:dht_rename] 0-hdd2-dht: renaming
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/f43f4dbd-14ca-49fb-98da-cb32c26b05b7/16ac297a-14e1-43d3-a4d9-f2b8d183c1e1.meta.new
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0) =>
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/f43f4dbd-14ca-49fb-98da-cb32c26b05b7/16ac297a-14e1-43d3-a4d9-f2b8d183c1e1.meta
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0)
> [2018-11-02 13:31:47.591958] I [MSGID: 109066]
> [dht-rename.c:1569:dht_rename] 0-hdd2-dht: renaming
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/72bcb410-b20e-45f2-a269-a73e14c550cf/b0cbc7df-2761-4b74-8ca2-ee311fd57bd3.meta.new
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0) =>
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/72bcb410-b20e-45f2-a269-a73e14c550cf/b0cbc7df-2761-4b74-8ca2-ee311fd57bd3.meta
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0)
> [2018-11-02 13:31:47.365589] I [MSGID: 109066]
> [dht-rename.c:1569:dht_rename] 0-hdd2-dht: renaming
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/f43f4dbd-14ca-49fb-98da-cb32c26b05b7/16ac297a-14e1-43d3-a4d9-f2b8d183c1e1.meta.new
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0) =>
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/f43f4dbd-14ca-49fb-98da-cb32c26b05b7/16ac297a-14e1-43d3-a4d9-f2b8d183c1e1.meta
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0)
> [2018-11-02 13:31:48.095968] I [MSGID: 109066]
> [dht-rename.c:1569:dht_rename] 0-hdd2-dht: renaming
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/72bcb410-b20e-45f2-a269-a73e14c550cf/b0cbc7df-2761-4b74-8ca2-ee311fd57bd3.meta.new
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0) =>
> /5a75dd72-2ab6-457d-b357-6d14b3bc2c3e/images/72bcb410-b20e-45f2-a269-a73e14c550cf/b0cbc7df-2761-4b74-8ca2-ee311fd57bd3.meta
> (hash=hdd2-replicate-0/cache=hdd2-replicate-0)
>
>
> On 11/05/2018 03:53 PM, Jorick Astrego wrote:
>
> Hi Krutika,
>
> Thanks for the info.
>
> After a long time the preallocated disk has been created properly. It was
> a 1TB disk on an HDD pool so a bit of delay was expected.
>
> But it took a bit longer than expected. The disk had no 

Re: [Gluster-users] posix_handle_hard [file exists]

2018-11-05 Thread Krutika Dhananjay
I think this is because the way preallocation works is by sending a lot of
writes.
In newer versions of oVirt, this is changed to use fallocate for faster
allocation.
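
A rough illustration of the difference, with made-up paths and sizes (whether your
particular oVirt/qemu combination actually uses fallocate is version-dependent):

# Write-based preallocation: the full image is written out as zeroes,
# so every block turns into a write FOP through the Gluster client.
dd if=/dev/zero of=/mnt/glustervol/disk.raw bs=1M count=10240

# fallocate-based preallocation: a single call reserves the space up front.
fallocate -l 10G /mnt/glustervol/disk.raw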

Adding Sahina, Gobinda to help with the ovirt version number that has this
fix.

-Krutika

On Mon, Nov 5, 2018 at 8:23 PM Jorick Astrego  wrote:

> Hi Krutika,
>
> Thanks for the info.
>
> After a long time the preallocated disk has been created properly. It was
> a 1TB disk on an HDD pool so a bit of delay was expected.
>
> But it took a bit longer than expected. The disk had no other virtual
> disks on it. Is there something I can tweak or check for this?
>
> Regards, Jorick
>
> On 10/31/2018 01:10 PM, Krutika Dhananjay wrote:
>
> These log messages represent a transient state and are harmless and can be
> ignored. This happens when a lookup and mknod to create shards happen in
> parallel.
>
> Regarding the preallocated disk creation issue, could you check if there
> are any errors/warnings in the fuse mount logs (these are named as the
> hyphenated mountpoint name followed by a ".log" and are found under
> /var/log/glusterfs).
>
> -Krutika
>
>
> On Wed, Oct 31, 2018 at 4:58 PM Jorick Astrego  wrote:
>
>> Hi,
>>
>> I have similar issues with oVirt 4.2 on a glusterfs-3.8.15 cluster.
>> This was a new volume; I first created a thin-provisioned disk, then I
>> tried to create a preallocated disk, but it hangs after 4 MB. The only issues
>> I can find in the logs so far are the [File exists] errors from sharding.
>>
>>
>> The message "W [MSGID: 113096] [posix-handle.c:761:posix_handle_hard]
>> 0-hdd2-posix: link
>> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125365 ->
>> /data/hdd2/brick1/.glusterfs/16/a1/16a18a01-4f77-4c37-923d-9f0bc59f5cc7failed
>> [File exists]" repeated 2 times between [2018-10-31 10:46:33.810987] and
>> [2018-10-31 10:46:33.810988]
>> [2018-10-31 10:46:33.970949] W [MSGID: 113096]
>> [posix-handle.c:761:posix_handle_hard] 0-hdd2-posix: link
>> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125366 ->
>> /data/hdd2/brick1/.glusterfs/90/85/9085ea11-4089-4d10-8848-fa2d518fd86dfailed
>> [File exists]
>> [2018-10-31 10:46:33.970950] W [MSGID: 113096]
>> [posix-handle.c:761:posix_handle_hard] 0-hdd2-posix: link
>> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125366 ->
>> /data/hdd2/brick1/.glusterfs/90/85/9085ea11-4089-4d10-8848-fa2d518fd86dfailed
>> [File exists]
>> [2018-10-31 10:46:35.601064] W [MSGID: 113096]
>> [posix-handle.c:761:posix_handle_hard] 0-hdd2-posix: link
>> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125369 ->
>> /data/hdd2/brick1/.glusterfs/9b/eb/9bebaaac-f460-496f-b30d-aabe77bffbc8failed
>> [File exists]
>> [2018-10-31 10:46:35.601065] W [MSGID: 113096]
>> [posix-handle.c:761:posix_handle_hard] 0-hdd2-posix: link
>> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125369 ->
>> /data/hdd2/brick1/.glusterfs/9b/eb/9bebaaac-f460-496f-b30d-aabe77bffbc8failed
>> [File exists]
>> [2018-10-31 10:46:36.040564] W [MSGID: 113096]
>> [posix-handle.c:761:posix_handle_hard] 0-hdd2-posix: link
>> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125370 ->
>> /data/hdd2/brick1/.glusterfs/30/93/3093fdb6-e62c-48b8-90e7-d4d72036fb69failed
>> [File exists]
>> [2018-10-31 10:46:36.040565] W [MSGID: 113096]
>> [posix-handle.c:761:posix_handle_hard] 0-hdd2-posix: link
>> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125370 ->
>> /data/hdd2/brick1/.glusterfs/30/93/3093fdb6-e62c-48b8-90e7-d4d72036fb69failed
>> [File exists]
>> [2018-10-31 10:46:36.319247] W [MSGID: 113096]
>> [posix-handle.c:761:posix_handle_hard] 0-hdd2-posix: link
>> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125372 ->
>> /data/hdd2/brick1/.glusterfs/c3/c2/c3c272f5-50af-4e82-94bb-b76eaa7a9a39failed
>> [File exists]
>> [2018-10-31 10:46:36.319250] W [MSGID: 113096]
>> [posix-handle.c:761:posix_handle_hard] 0-hdd2-posix: link
>> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125372 ->
>> /data/hdd2/brick1/.glusterfs/c3/c2/c3c272f5-50af-4e82-94bb-b76eaa7a9a39failed
>> [File exists]
>> [2018-10-31 10:46:36.319309] E [MSGID: 113020] [posix.c:1407:posix_mknod]
>> 0-hdd2-posix: setting gfid on
>> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125372 failed
>>
>>
>> -rw-rw. 2 root root 4194304 Oct 31 11:46
>> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125366
>>
>> -rw-rw. 2 r

Re: [Gluster-users] posix_handle_hard [file exists]

2018-10-31 Thread Krutika Dhananjay
These log messages represent a transient state and are harmless and can be
ignored. This happens when a lookup and mknod to create shards happen in
parallel.

Regarding the preallocated disk creation issue, could you check if there
are any errors/warnings in the fuse mount logs (these are named as the
hyphenated mountpoint name followed by a ".log" and are found under
/var/log/glusterfs).

-Krutika


On Wed, Oct 31, 2018 at 4:58 PM Jorick Astrego  wrote:

> Hi,
>
> I have similar issues with oVirt 4.2 on a glusterfs-3.8.15 cluster.
> This was a new volume; I first created a thin-provisioned disk, then I
> tried to create a preallocated disk, but it hangs after 4 MB. The only issues
> I can find in the logs so far are the [File exists] errors from sharding.
>
>
> The message "W [MSGID: 113096] [posix-handle.c:761:posix_handle_hard]
> 0-hdd2-posix: link
> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125365 ->
> /data/hdd2/brick1/.glusterfs/16/a1/16a18a01-4f77-4c37-923d-9f0bc59f5cc7failed
> [File exists]" repeated 2 times between [2018-10-31 10:46:33.810987] and
> [2018-10-31 10:46:33.810988]
> [2018-10-31 10:46:33.970949] W [MSGID: 113096]
> [posix-handle.c:761:posix_handle_hard] 0-hdd2-posix: link
> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125366 ->
> /data/hdd2/brick1/.glusterfs/90/85/9085ea11-4089-4d10-8848-fa2d518fd86dfailed
> [File exists]
> [2018-10-31 10:46:33.970950] W [MSGID: 113096]
> [posix-handle.c:761:posix_handle_hard] 0-hdd2-posix: link
> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125366 ->
> /data/hdd2/brick1/.glusterfs/90/85/9085ea11-4089-4d10-8848-fa2d518fd86dfailed
> [File exists]
> [2018-10-31 10:46:35.601064] W [MSGID: 113096]
> [posix-handle.c:761:posix_handle_hard] 0-hdd2-posix: link
> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125369 ->
> /data/hdd2/brick1/.glusterfs/9b/eb/9bebaaac-f460-496f-b30d-aabe77bffbc8failed
> [File exists]
> [2018-10-31 10:46:35.601065] W [MSGID: 113096]
> [posix-handle.c:761:posix_handle_hard] 0-hdd2-posix: link
> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125369 ->
> /data/hdd2/brick1/.glusterfs/9b/eb/9bebaaac-f460-496f-b30d-aabe77bffbc8failed
> [File exists]
> [2018-10-31 10:46:36.040564] W [MSGID: 113096]
> [posix-handle.c:761:posix_handle_hard] 0-hdd2-posix: link
> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125370 ->
> /data/hdd2/brick1/.glusterfs/30/93/3093fdb6-e62c-48b8-90e7-d4d72036fb69failed
> [File exists]
> [2018-10-31 10:46:36.040565] W [MSGID: 113096]
> [posix-handle.c:761:posix_handle_hard] 0-hdd2-posix: link
> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125370 ->
> /data/hdd2/brick1/.glusterfs/30/93/3093fdb6-e62c-48b8-90e7-d4d72036fb69failed
> [File exists]
> [2018-10-31 10:46:36.319247] W [MSGID: 113096]
> [posix-handle.c:761:posix_handle_hard] 0-hdd2-posix: link
> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125372 ->
> /data/hdd2/brick1/.glusterfs/c3/c2/c3c272f5-50af-4e82-94bb-b76eaa7a9a39failed
> [File exists]
> [2018-10-31 10:46:36.319250] W [MSGID: 113096]
> [posix-handle.c:761:posix_handle_hard] 0-hdd2-posix: link
> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125372 ->
> /data/hdd2/brick1/.glusterfs/c3/c2/c3c272f5-50af-4e82-94bb-b76eaa7a9a39failed
> [File exists]
> [2018-10-31 10:46:36.319309] E [MSGID: 113020] [posix.c:1407:posix_mknod]
> 0-hdd2-posix: setting gfid on
> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125372 failed
>
>
> -rw-rw. 2 root root 4194304 Oct 31 11:46
> /data/hdd2/brick1/.shard/6573a019-dba5-4f97-bca9-0a00ce537318.125366
>
> -rw-rw. 2 root root 4194304 Oct 31 11:46
> /data/hdd2/brick1/.glusterfs/9b/eb/9bebaaac-f460-496f-b30d-aabe77bffbc8
>
> On 10/01/2018 12:36 PM, Jose V. Carrión wrote:
>
> Hi,
>
> I have a gluster 3.12.6-1 installation with 2 configured volumes.
>
> Several times a day, some bricks report the lines below:
>
> [2018-09-30 20:36:27.348015] W [MSGID: 113096]
> [posix-handle.c:770:posix_handle_hard] 0-volumedisk0-posix: link
> /mnt/glusterfs/vol0/brick1/6349/20180921/20180921.h5 ->
> /mnt/glusterfs/vol0/brick1/.glusterfs/3b/1c/3b1c5fe1-b141-4687-8eaf-2c28f9505277failed
> [File exists]
> [2018-09-30 20:36:27.383957] E [MSGID: 113020] [posix.c:3162:posix_create]
> 0-volumedisk0-posix: setting gfid on
> /mnt/glusterfs/vol0/brick1/6349/20180921/20180921.h5 failed
>
> I can access /mnt/glusterfs/vol0/brick1/6349/20180921/20180921.h5
> and
> /mnt/glusterfs/vol0/brick1/.glusterfs/3b/1c/3b1c5fe1-b141-4687-8eaf-2c28f9505277;
> both files are hard links.
>
> What is the meaning of the error lines?
>
> Thanks in advance.
>
> Cheers.
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> https://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
>
>
>
> Met vriendelijke groet, With kind regards,
>
> Jorick Astrego
>
> *Netbulae 

Re: [Gluster-users] sharding in glusterfs

2018-10-05 Thread Krutika Dhananjay
Hi,

Apologies for the late reply. My email filters are messed up, I missed
reading this.

Answers to questions around shard algorithm inline ...

On Sun, Sep 30, 2018 at 9:54 PM Ashayam Gupta 
wrote:

> Hi Pranith,
>
> Thanks for your reply; it would be helpful if you could please help us with
> the following issues with respect to sharding.
> The gluster version we are using is *glusterfs 4.1.4* on Ubuntu 18.04.1 LTS
>
>
> - *Shards-Creation Algo*: We were interested in understanding the way
> in which shards are distributed across bricks and nodes. Is it round-robin
> or some other algorithm, and can we change this mechanism using some config
> file? E.g. if we have 2 nodes, each node having 2 bricks, for a total of
> 4 (2*2) bricks, how will the shards be distributed? Will it always be an
> even distribution? (The volume type in this case is plain.)
>
> - *Sharding+Distributed-Volume*: Currently we are using a plain volume
> with sharding enabled and we do not see an even distribution of shards across
> bricks. Can we use sharding with a distributed volume to achieve a more even
> distribution of shards? It would be helpful if you could suggest the most
> efficient way of using sharding; our goal is to have an evenly distributed
> file system (we have large files, hence sharding) and we are not
> concerned with replication as of now.
>
> I think Raghavendra already answered the two questions above.

>
> - *Shard-Block-Size*: In case we change the
> *features.shard-block-size* value from X -> Y after lots of data has
> been populated, how does this affect the existing shards? Are they
> auto-corrected as per the new size, do we need to run some commands to get
> this done, or is this change even recommended?
>
> Existing files will retain their shard-block-size. shard-block-size is a
property of a file that is set at the time of creation of the file (in the
form of an extended attribute "trusted.glusterfs.shard.block-size") and
remains the same throughout the lifetime of the file.
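
For example, the attribute can be read on one of the bricks roughly like this
(the brick and file paths here are assumptions):

# Read the per-file shard block size recorded at creation time
getfattr -n trusted.glusterfs.shard.block-size -e hex \
    /data/brick1/myvol/images/vm1.img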

If you want the shard-block-size to be changed across these files, you'll
need to perform either of the two steps below:

1. move the existing files to a local fs from your glusterfs volume and
then move them back into the volume.
2. copy the existing files into temporary filenames on the same volume
and rename them back to their original names (a small sketch of this follows below).
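
A minimal sketch of option 2, run against a FUSE mount of the volume (the file
names are examples, and the VM using the image is assumed to be shut down while
this runs):

cd /mnt/glustervol/images
# The copy creates a brand-new file, which picks up the current shard-block-size ...
cp --sparse=always vm1.img vm1.img.resharded
# ... and the rename puts it back under the original name.
mv vm1.img.resharded vm1.img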
In our tests with the VM store workload, we've found a 64MB shard-block-size to be
a good fit for both IO and self-heal performance.


> - *Rebalance-Shard*: As per the docs, whenever we add a new server/node
> to the existing cluster we need to run the rebalance command; we would like to
> know if there are any known issues with rebalancing when sharding is enabled.
>
> We did find some shard-dht inter-op issues in rebalance in the past again
in the supported vm storage use-case. The good news is that the problems
known to us have been fixed, but their validation is still pending.


> We would highly appreciate it if you could point us to the latest sharding
> docs; we tried to search but could not find anything better than this:
> https://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/shard/
> .
>

The doc is still valid (except for minor changes in the To-Do list at the
bottom). But I agree, the answers to all of the questions you asked above
are well worth documenting. I'll fix this. Thanks for the feedback.
Let us know if you have any more questions or if you run into any problems.
Happy to help.
Also, since you're using a non-vm storage use case, I'd suggest that you
try shard on a test cluster first before even putting it into production. :)

-Krutika


> Thanks
> Ashayam
>
>
> On Thu, Sep 20, 2018 at 7:47 PM Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>>
>>
>> On Wed, Sep 19, 2018 at 11:37 AM Ashayam Gupta <
>> ashayam.gu...@alpha-grep.com> wrote:
>>
>>> Please find our workload details as requested by you :
>>>
>>> * Only 1 write-mount point as of now
>>> * Read-Mount : Since we auto-scale our machines this can be as big as
>>> 300-400 machines during peak times
>>> * >" multiple concurrent reads means that Reads will not happen until
>>> the file is completely written to"  Yes , in our current scenario we can
>>> ensure that indeed this is the case.
>>>
>>> But when you say it only supports single writer workload we would like
>>> to understand the following scenarios with respect to multiple writers and
>>> the current behaviour of glusterfs with sharding
>>>
>>>- Multiple Writer writes to different files
>>>
>>> When I say multiple writers, I mean multiple mounts. Since you were
>> saying earlier there is only one mount which does all writes, everything
>> should work as expected.
>>
>>>
>>>- Multiple Writer writes to same file
>>>   - they write to same file but different shards of same file
>>>   - they write to same file (no gurantee if they write to different
>>>   shards)
>>>
>>> As long as 

Re: [Gluster-users] Fwd: vm paused unknown storage error one node out of 3 only

2018-06-12 Thread Krutika Dhananjay
On Sat, Jun 9, 2018 at 9:38 AM, Dan Lavu  wrote:

> Krutika,
>
> Are the following messages also normal?
>

Yes, this should be fine. It only represents a transient state when
multiple threads/clients are trying to create the same shard at the same
time. These can be ignored.

-Krutika


> [2018-06-07 06:36:22.008492] E [MSGID: 113020] [posix.c:1395:posix_mknod]
> 0-rhev_vms-posix: setting gfid on /gluster/brick/rhev_vms/.
> shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16158 failed
> [2018-06-07 06:36:22.319735] E [MSGID: 113020] [posix.c:1395:posix_mknod]
> 0-rhev_vms-posix: setting gfid on /gluster/brick/rhev_vms/.
> shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16160 failed
> [2018-06-07 06:36:24.711800] E [MSGID: 113002] [posix.c:267:posix_lookup]
> 0-rhev_vms-posix: buf->ia_gfid is null for /gluster/brick/rhev_vms/.
> shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16177 [No data available]
> [2018-06-07 06:36:24.711839] E [MSGID: 115050]
> [server-rpc-fops.c:170:server_lookup_cbk] 0-rhev_vms-server: 32334131:
> LOOKUP /.shard/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16177
> (be318638-e8a0-4c6d-977d-7a937aa84806/0ab3a16c-1d07-4153-8d01-b9b0ffd9d19b.16177)
> ==> (No data available) [No data available]
>
> If so, what does it mean?
>
> Dan
>
> On Tue, Aug 16, 2016 at 1:21 AM, Krutika Dhananjay 
> wrote:
>
>> Thanks, I just sent http://review.gluster.org/#/c/15161/1 to reduce the
>> log-level to DEBUG. Let's see what the maintainers have to say. :)
>>
>> -Krutika
>>
>> On Tue, Aug 16, 2016 at 5:50 AM, David Gossage <
>> dgoss...@carouselchecks.com> wrote:
>>
>>> On Mon, Aug 15, 2016 at 6:24 PM, Krutika Dhananjay 
>>> wrote:
>>>
>>>> No. The EEXIST errors are normal and can be ignored. This can happen
>>>> when multiple threads try to create the same
>>>> shard in parallel. Nothing wrong with that.
>>>>
>>>>
>>> Other than that they pop up as E errors, making a user worry, hehe
>>>
>>> Is there a known bug filed against that, or should I maybe create one to
>>> see if we can get that sent to an informational level?
>>>
>>>
>>>
>>>> -Krutika
>>>>
>>>> On Tue, Aug 16, 2016 at 1:02 AM, David Gossage <
>>>> dgoss...@carouselchecks.com> wrote:
>>>>
>>>>> On Sat, Aug 13, 2016 at 6:37 AM, David Gossage <
>>>>> dgoss...@carouselchecks.com> wrote:
>>>>>
>>>>>> Here is the reply again just in case. I got a quarantine message so I'm not
>>>>>> sure if the first went through or will anytime soon. Brick logs weren't large
>>>>>> so I'll just include them as text files this time.
>>>>>>
>>>>>
>>>>> Did maintenance over the weekend updating oVirt from 3.6.6->3.6.7, and
>>>>> after restarting the complaining oVirt node I was able to migrate the 2 VMs
>>>>> with issues. So I'm not sure why the mount got stale, but I imagine that one
>>>>> node couldn't see the new image files after that had occurred?
>>>>>
>>>>> Still getting a few sporadic errors, but they seem much fewer than before,
>>>>> and I never get any corresponding notices in any other log files.
>>>>>
>>>>> [2016-08-15 13:40:31.510798] E [MSGID: 113022]
>>>>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
>>>>> /gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584
>>>>> failed [File exists]
>>>>> [2016-08-15 13:40:31.522067] E [MSGID: 113022]
>>>>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
>>>>> /gluster1/BRICK1/1/.shard/0e5ad95d-722d-4374-88fb-66fca0b14341.584
>>>>> failed [File exists]
>>>>> [2016-08-15 17:47:06.375708] E [MSGID: 113022]
>>>>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
>>>>> /gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722
>>>>> failed [File exists]
>>>>> [2016-08-15 17:47:26.435198] E [MSGID: 113022]
>>>>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
>>>>> /gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.723
>>>>> failed [File exists]
>>>>> [2016-08-15 17:47:06.405481] E [MSGID: 113022]
>>>>> [posix.c:1245:posix_mknod] 0-GLUSTER1-posix: mknod on
>>>>> /gluster1/BRICK1/1/.shard/d5a328be-03d0-42f7-a443-248290849e7d.722
>>>>> failed [File exists]
>>>>> [2016-08-15 17:47:26.464542] E [MS

Re: [Gluster-users] Current bug for VM hosting with 3.12 ?

2018-06-12 Thread Krutika Dhananjay
Could you share the gluster brick and mount logs?

-Krutika

On Mon, Jun 11, 2018 at 2:14 PM,  wrote:

> Hi,
>
> Given the numerous problems we've had with setting up gluster for VM
> hosting at the start, we've been staying with 3.7.15, which was the
> first version to work properly.
>
> However the repo for 3.7.15 is now down, so we've decided to give
> 3.12.9 a try. Unfortunately, a few days ago, one of our nodes rebooted
> and after a quick heal one of the VMs wasn't in a great state. I didn't
> think much of it, but right now I'm seeing other VMs throwing I/O errors...
> Just like with the versions of gluster < 3.7.15, which were causing
> corruption of disk images.
>
> Are there any known bugs with 3.12.9? Any new settings we should have
> enabled but might have missed?
>
> Options Reconfigured:
> performance.client-io-threads: off
> nfs.disable: on
> transport.address-family: inet
> diagnostics.count-fop-hits: on
> diagnostics.latency-measurement: on
> network.ping-timeout: 30
> cluster.data-self-heal-algorithm: full
> features.shard-block-size: 64MB
> features.shard: on
> performance.stat-prefetch: off
> performance.read-ahead: off
> performance.quick-read: off
> cluster.eager-lock: enable
> network.remote-dio: enable
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> performance.readdir-ahead: on
>
> I haven't had the courage to reboot the VM yet, guess I'll go do that
>
> --
> PGP Fingerprint : 0x624E42C734DAC346
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [ovirt-users] Re: Gluster problems, cluster performance issues

2018-05-29 Thread Krutika Dhananjay
Adding Ravi to look into the heal issue.

As for the fsync hang and subsequent IO errors, it seems a lot like
https://bugzilla.redhat.com/show_bug.cgi?id=1497156 and Paolo Bonzini from
qemu had pointed out that this would be fixed by the following commit:

  commit e72c9a2a67a6400c8ef3d01d4c461dbbbfa0e1f0
Author: Paolo Bonzini 
Date:   Wed Jun 21 16:35:46 2017 +0200

scsi: virtio_scsi: let host do exception handling

virtio_scsi tries to do exception handling after the default 30 seconds
timeout expires.  However, it's better to let the host control the
timeout, otherwise with a heavy I/O load it is likely that an abort will
also timeout.  This leads to fatal errors like filesystems going
offline.

Disable the 'sd' timeout and allow the host to do exception handling,
following the precedent of the storvsc driver.

Hannes has a proposal to introduce timeouts in virtio, but this provides
an immediate solution for stable kernels too.

[mkp: fixed typo]

Reported-by: Douglas Miller 
Cc: "James E.J. Bottomley" 
Cc: "Martin K. Petersen" 
Cc: Hannes Reinecke 
Cc: linux-s...@vger.kernel.org
Cc: sta...@vger.kernel.org
Signed-off-by: Paolo Bonzini 
Signed-off-by: Martin K. Petersen 


Adding Paolo/Kevin to comment.

As for the poor gluster performance, could you disable cluster.eager-lock
and see if that makes any difference:

# gluster volume set <volname> cluster.eager-lock off

Do also capture the volume profile again if you still see performance
issues after disabling eager-lock.
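
For reference, a typical profile capture looks something like this (the volume
name "myvol" is just a placeholder):

gluster volume profile myvol start
# ... reproduce the slow workload for a few minutes ...
gluster volume profile myvol info > /tmp/myvol-profile.txt
gluster volume profile myvol stop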

-Krutika


On Wed, May 30, 2018 at 6:55 AM, Jim Kusznir  wrote:

> I also finally found the following in my system log on one server:
>
> [10679.524491] INFO: task glusterclogro:14933 blocked for more than 120
> seconds.
> [10679.525826] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [10679.527144] glusterclogro   D 97209832bf40 0 14933  1
> 0x0080
> [10679.527150] Call Trace:
> [10679.527161]  [] schedule+0x29/0x70
> [10679.527218]  [] _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [10679.527225]  [] ? wake_up_state+0x20/0x20
> [10679.527254]  [] xfs_file_fsync+0x107/0x1e0 [xfs]
> [10679.527260]  [] do_fsync+0x67/0xb0
> [10679.527268]  [] ? system_call_after_swapgs+0xbc/0x160
> [10679.527271]  [] SyS_fsync+0x10/0x20
> [10679.527275]  [] system_call_fastpath+0x1c/0x21
> [10679.527279]  [] ? system_call_after_swapgs+0xc8/0x160
> [10679.527283] INFO: task glusterposixfsy:14941 blocked for more than 120
> seconds.
> [10679.528608] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [10679.529956] glusterposixfsy D 972495f84f10 0 14941  1
> 0x0080
> [10679.529961] Call Trace:
> [10679.529966]  [] schedule+0x29/0x70
> [10679.530003]  [] _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [10679.530008]  [] ? wake_up_state+0x20/0x20
> [10679.530038]  [] xfs_file_fsync+0x107/0x1e0 [xfs]
> [10679.530042]  [] do_fsync+0x67/0xb0
> [10679.530046]  [] ? system_call_after_swapgs+0xbc/0x160
> [10679.530050]  [] SyS_fdatasync+0x13/0x20
> [10679.530054]  [] system_call_fastpath+0x1c/0x21
> [10679.530058]  [] ? system_call_after_swapgs+0xc8/0x160
> [10679.530062] INFO: task glusteriotwr13:15486 blocked for more than 120
> seconds.
> [10679.531805] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [10679.533732] glusteriotwr13  D 9720a83f 0 15486  1
> 0x0080
> [10679.533738] Call Trace:
> [10679.533747]  [] schedule+0x29/0x70
> [10679.533799]  [] _xfs_log_force_lsn+0x2e8/0x340 [xfs]
> [10679.533806]  [] ? wake_up_state+0x20/0x20
> [10679.533846]  [] xfs_file_fsync+0x107/0x1e0 [xfs]
> [10679.533852]  [] do_fsync+0x67/0xb0
> [10679.533858]  [] ? system_call_after_swapgs+0xbc/0x160
> [10679.533863]  [] SyS_fdatasync+0x13/0x20
> [10679.533868]  [] system_call_fastpath+0x1c/0x21
> [10679.533873]  [] ? system_call_after_swapgs+0xc8/0x160
> [10919.512757] INFO: task glusterclogro:14933 blocked for more than 120
> seconds.
> [10919.514714] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [10919.516663] glusterclogro   D 97209832bf40 0 14933  1
> 0x0080
> [10919.516677] Call Trace:
> [10919.516690]  [] schedule+0x29/0x70
> [10919.516696]  [] schedule_timeout+0x239/0x2c0
> [10919.516703]  [] ? blk_finish_plug+0x14/0x40
> [10919.516768]  [] ? _xfs_buf_ioapply+0x334/0x460 [xfs]
> [10919.516774]  [] wait_for_completion+0xfd/0x140
> [10919.516782]  [] ? wake_up_state+0x20/0x20
> [10919.516821]  [] ? _xfs_buf_read+0x23/0x40 [xfs]
> [10919.516859]  [] xfs_buf_submit_wait+0xf9/0x1d0 [xfs]
> [10919.516902]  [] ? xfs_trans_read_buf_map+0x199/0x400
> [xfs]
> [10919.516940]  [] _xfs_buf_read+0x23/0x40 [xfs]
> [10919.516977]  [] xfs_buf_read_map+0xf9/0x160 [xfs]
> [10919.517022]  [] xfs_trans_read_buf_map+0x199/0x400
> [xfs]
> [10919.517057]  [] xfs_da_read_buf+0xd4/0x100 [xfs]
> [10919.517091]  [] xfs_da3_node_read+0x23/0xd0 [xfs]
> 

Re: [Gluster-users] Reconstructing files from shards

2018-04-26 Thread Krutika Dhananjay
The short answer is: no, there currently exists no script that can piece
the shards together into a single file.

Long answer:
IMO the safest way to convert from sharded to a single file _is_, at the moment,
by copying the data out into a new volume.
Picking up the files from the individual bricks directly and joining them,
although fast, is a strict no-no for many reasons - for example, when you
have a replicated volume, the good copy needs to be carefully selected and
must remain a good copy through the course of the copying process. There
could be other consistency issues with file attributes changing while they
are being copied. None of this can be guaranteed unless you're open to
taking the volume down.

Then the other option is to have the gluster client (perhaps in the shard
translator itself) do the conversion in the background within the gluster
translator stack, which is safer but would require that shard lock the file
until the copying is complete. And until then no IO can happen on this file.
(I haven't found the time to work on this, as there exists a workaround and
I've been busy with other tasks. If anyone wants to volunteer to get this
done, I'll be happy to help.)

But anyway, why is copying the data into a new unsharded volume disruptive for you?
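
For reference, a hole-preserving copy from the sharded volume into a new unsharded
one could be sketched as below; the server name, volume names and paths are
assumptions.

# Both volumes mounted via FUSE on the same client
mount -t glusterfs server1:/oldvol /mnt/oldvol
mount -t glusterfs server1:/newvol /mnt/newvol

# --sparse=always re-creates holes instead of writing zeroes;
# rsync -aS is an alternative that also preserves sparseness.
cp --sparse=always /mnt/oldvol/images/vm1.img /mnt/newvol/images/vm1.img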

-Krutika


On Sat, Apr 21, 2018 at 1:14 AM, Jamie Lawrence 
wrote:

> Hello,
>
> So I have a volume on a gluster install (3.12.5) on which sharding was
> enabled at some point recently. (Don't know how it happened, it may have
> been an accidental run of an old script.) So it has been happily sharding
> behind our backs and it shouldn't have.
>
> I'd like to turn sharding off and reverse the files back to normal.  Some
> of these are sparse files, so I need to account for holes. There are more
> than enough that I need to write a tool to do it.
>
> I saw notes ca. 3.7 saying the only way to do it was to read-off on the
> client-side, blow away the volume and start over. This would be extremely
> disruptive for us, and language I've seen reading tickets and old messages
> to this list make me think that isn't needed anymore, but confirmation of
> that would be good.
>

> The only discussion I can find are these videos[1]:
> http://opensource-storage.blogspot.com/2016/07/de-
> mystifying-gluster-shards.html , and some hints[2] that are old enough
> that I don't trust them without confirmation that nothing's changed. The
> video things don't acknowledge the existence of file holes. Also, the hint
> in [2] mentions using trusted.glusterfs.shard.file-size to get the size
> of a partly filled hole; that value looks like base64, but when I attempt
> to decode it, base64 complains about invalid input.
>
> In short, I can't find sufficient information to reconstruct these. Has
> anyone written a current, step-by-step guide on reconstructing sharded
> files? Or has someone has written a tool so I don't have to?
>
> Thanks,
>
> -j
>
>
> [1] Why one would choose to annoy the crap out of their fellow gluster
> users by using video to convey about 80 bytes of ASCII-encoded information,
> I have no idea.
> [2] http://lists.gluster.org/pipermail/gluster-devel/2017-
> March/052212.html
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Is the size of bricks limiting the size of files I can store?

2018-04-13 Thread Krutika Dhananjay
Sorry about the late reply, I missed seeing your mail.

To begin with, what is your use-case? Sharding is currently supported only
for virtual machine image storage use-case.
It *could* work in other single-writer use-cases but it's only tested
thoroughly for the vm use-case.
If yours is not a vm store use-case, you might want to do some tests first
to see if it works fine.
If you find any issues, you can raise a bug. I'll be more than happy to fix
them.


On Fri, Apr 13, 2018 at 1:19 AM, Andreas Davour  wrote:

> On Tue, 3 Apr 2018, Raghavendra Gowdappa wrote:
>
> On Mon, Apr 2, 2018 at 11:37 PM, Andreas Davour  wrote:
>>
>> On Mon, 2 Apr 2018, Nithya Balachandran wrote:
>>>
>>> On 2 April 2018 at 14:48, Andreas Davour  wrote:
>>>


 Hi
>
> I've found something that works so weird I'm certain I have missed how
> gluster is supposed to be used, but I can not figure out how. This is
> my
> scenario.
>
> I have a volume, created from 16 nodes, each with a brick of the same
> size. The total of that volume thus is in the Terabyte scale. It's a
> distributed volume with a replica count of 2.
>
> The filesystem when mounted on the clients is not even close to getting
> full, as displayed by 'df'.
>
> But, when one of my users tries to copy a file from another network
> storage
> to the gluster volume, he gets a 'filesystem full' error. What
> happened?
> I
> looked at the bricks and figured out that one big file had ended up on
> a
> brick that was half full or so, and the big file did not fit in the
> space
> that was left on that brick.
>
> Hi,

 This is working as expected. As files are not split up (unless you are
 using shards) the size of the file is restricted by the size of the
 individual bricks.


>>> Thanks a lot for that definitive answer. Is there a way to manage this?
>>> Can you shard just those files, making them replicated in the process?
>>>
>>
Is your question about whether you can shard just that big file that caused
space to run out and keep the rest of the files unsharded?
This is a bit tricky. From the time you enable sharding on your volume, all
newly created files will get sharded once their size
exceeds the features.shard-block-size value (which is configurable), because
it's a volume-wide option.
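
For completeness, the volume-wide knobs involved are just these two options
(the volume name "myvol" is a placeholder):

gluster volume set myvol features.shard on
gluster volume set myvol features.shard-block-size 64MB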

As for volumes which have pre-existing data even before shard is enabled,
for you to shard them, you'll need to perform either of the two steps below:

1. move the existing file to a local fs from your glusterfs volume and then
move it back into the volume.
2. copy the existing file into a temporary file on the same volume and
rename the file back to its original name.

-Krutika



>>>
>> +Krutika, xlator/shard maintainer for the answer.
>>
>>
>> I just can't have users see 15TB free and fail copying a 15GB file. They
>>> will show me the bill they paid for those "disks" and flay me.
>>>
>>
> Any input on that Krutika?
>
> /andreas
>
> --
> "economics is a pseudoscience; the astrology of our time"
> Kim Stanley Robinson
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Enable sharding on active volume

2018-04-04 Thread Krutika Dhananjay
On Thu, Apr 5, 2018 at 7:33 AM, Ian Halliday  wrote:

> Hello,
>
> I wanted to post this as a question to the group before we go launch it in
> a test environment. Will Gluster handle enabling sharding on an existing
> distributed-replicated environment, and is it safe to do?
>
Yes, it's safe, but it would mean that only the files created after shard was
enabled would be sharded, not the existing files.
If you want to shard the existing files, there are a couple of things you
can do:

1. move the existing file to a local fs from your glusterfs volume and then
move it back into the volume.
2. copy the existing file into a temporary file on the same volume and
rename the file back to its original name.

You could try both on two test vms and go with the faster of the two
approaches. And either way, you could do this one vm at a time.

-Krutika

The environment in question is a VM image storage cluster with some disk
> files starting to grow beyond the size of some of the smaller bricks.
>
> -- Ian
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Sharding problem - multiple shard copies with mismatching gfids

2018-03-26 Thread Krutika Dhananjay
The gfid mismatch here is between the shard and its "link-to" file, the
creation of which happens at a layer below that of shard translator on the
stack.

Adding DHT devs to take a look.

-Krutika

On Mon, Mar 26, 2018 at 1:09 AM, Ian Halliday  wrote:

> Hello all,
>
> We are having a rather interesting problem with one of our VM storage
> systems. The GlusterFS client is throwing errors relating to GFID
> mismatches. We traced this down to multiple shards being present on the
> gluster nodes, with different gfids.
>
> Hypervisor gluster mount log:
>
> [2018-03-25 18:54:19.261733] E [MSGID: 133010] 
> [shard.c:1724:shard_common_lookup_shards_cbk]
> 0-ovirt-zone1-shard: Lookup on shard 7 failed. Base file gfid =
> 87137cac-49eb-492a-8f33-8e33470d8cb7 [Stale file handle]
> The message "W [MSGID: 109009] [dht-common.c:2162:dht_lookup_linkfile_cbk]
> 0-ovirt-zone1-dht: /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid
> different on data file on ovirt-zone1-replicate-3, gfid local =
> ----, gfid node = 
> 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56
> " repeated 2 times between [2018-03-25 18:54:19.253748] and [2018-03-25
> 18:54:19.263576]
> [2018-03-25 18:54:19.264349] W [MSGID: 109009]
> [dht-common.c:1901:dht_lookup_everywhere_cbk] 0-ovirt-zone1-dht:
> /.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7: gfid differs on subvolume
> ovirt-zone1-replicate-3, gfid local = fdf0813b-718a-4616-a51b-6999ebba9ec3,
> gfid node = 57c6fcdf-52bb-4f7a-aea4-02f0dc81ff56
>
>
> On the storage nodes, we found this:
>
> [root@n1 gluster]# find -name 87137cac-49eb-492a-8f33-8e33470d8cb7.7
> ./brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> ./brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
>
> [root@n1 gluster]# ls -lh ./brick2/brick/.shard/87137cac-49eb-492a-8f33-
> 8e33470d8cb7.7
> -T. 2 root root 0 Mar 25 13:55 ./brick2/brick/.shard/
> 87137cac-49eb-492a-8f33-8e33470d8cb7.7
> [root@n1 gluster]# ls -lh ./brick4/brick/.shard/87137cac-49eb-492a-8f33-
> 8e33470d8cb7.7
> -rw-rw. 2 root root 3.8G Mar 25 13:55 ./brick4/brick/.shard/
> 87137cac-49eb-492a-8f33-8e33470d8cb7.7
>
> [root@n1 gluster]# getfattr -d -m . -e hex ./brick2/brick/.shard/
> 87137cac-49eb-492a-8f33-8e33470d8cb7.7
> # file: brick2/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> security.selinux=0x73797374656d5f753a6f626a6563
> 745f723a756e6c6162656c65645f743a733000
> trusted.gfid=0xfdf0813b718a4616a51b6999ebba9ec3
> trusted.glusterfs.dht.linkto=0x6f766972742d3335302d7a6f6e65
> 312d7265706c69636174652d3300
>
> [root@n1 gluster]# getfattr -d -m . -e hex ./brick4/brick/.shard/
> 87137cac-49eb-492a-8f33-8e33470d8cb7.7
> # file: brick4/brick/.shard/87137cac-49eb-492a-8f33-8e33470d8cb7.7
> security.selinux=0x73797374656d5f753a6f626a6563
> 745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x
> trusted.bit-rot.version=0x02005991419ce672
> trusted.gfid=0x57c6fcdf52bb4f7aaea402f0dc81ff56
>
>
> I'm wondering how they got created in the first place, and if anyone has
> any insight on how to fix it?
>
> Storage nodes:
> [root@n1 gluster]# gluster --version
> glusterfs 4.0.0
>
> [root@n1 gluster]# gluster volume info
>
> Volume Name: ovirt-350-zone1
> Type: Distributed-Replicate
> Volume ID: 106738ed-9951-4270-822e-63c9bcd0a20e
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 7 x (2 + 1) = 21
> Transport-type: tcp
> Bricks:
> Brick1: 10.0.6.100:/gluster/brick1/brick
> Brick2: 10.0.6.101:/gluster/brick1/brick
> Brick3: 10.0.6.102:/gluster/arbrick1/brick (arbiter)
> Brick4: 10.0.6.100:/gluster/brick2/brick
> Brick5: 10.0.6.101:/gluster/brick2/brick
> Brick6: 10.0.6.102:/gluster/arbrick2/brick (arbiter)
> Brick7: 10.0.6.100:/gluster/brick3/brick
> Brick8: 10.0.6.101:/gluster/brick3/brick
> Brick9: 10.0.6.102:/gluster/arbrick3/brick (arbiter)
> Brick10: 10.0.6.100:/gluster/brick4/brick
> Brick11: 10.0.6.101:/gluster/brick4/brick
> Brick12: 10.0.6.102:/gluster/arbrick4/brick (arbiter)
> Brick13: 10.0.6.100:/gluster/brick5/brick
> Brick14: 10.0.6.101:/gluster/brick5/brick
> Brick15: 10.0.6.102:/gluster/arbrick5/brick (arbiter)
> Brick16: 10.0.6.100:/gluster/brick6/brick
> Brick17: 10.0.6.101:/gluster/brick6/brick
> Brick18: 10.0.6.102:/gluster/arbrick6/brick (arbiter)
> Brick19: 10.0.6.100:/gluster/brick7/brick
> Brick20: 10.0.6.101:/gluster/brick7/brick
> Brick21: 10.0.6.102:/gluster/arbrick7/brick (arbiter)
> Options Reconfigured:
> cluster.min-free-disk: 50GB
> performance.strict-write-ordering: off
> performance.strict-o-direct: off
> nfs.disable: off
> performance.readdir-ahead: on
> transport.address-family: inet
> performance.cache-size: 1GB
> features.shard: on
> features.shard-block-size: 5GB
> server.event-threads: 8
> server.outstanding-rpc-limit: 128
> storage.owner-uid: 36
> storage.owner-gid: 36
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> performance.stat-prefetch: on
> 

Re: [Gluster-users] Stale locks on shards

2018-01-22 Thread Krutika Dhananjay
On Mon, Jan 22, 2018 at 12:33 AM, Samuli Heinonen 
wrote:

> Hi again,
>
> here is more information regarding issue described earlier
>
> It looks like self healing is stuck. According to "heal statistics" crawl
> began at Sat Jan 20 12:56:19 2018 and it's still going on (It's around Sun
> Jan 21 20:30 when writing this). However glustershd.log says that last heal
> was completed at "2018-01-20 11:00:13.090697" (which is 13:00 UTC+2). Also
> "heal info" has been running now for over 16 hours without any information.
> In statedump I can see that storage nodes have locks on files and some of
> those are blocked. Ie. Here again it says that ovirt8z2 is having active
> lock even ovirt8z2 crashed after the lock was granted.:
>
> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
> path=/.shard/3d55f8cc-cda9-489a-b0a3-fd0f43d67876.27
> mandatory=0
> inodelk-count=3
> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
> 18446744073709551610, owner=d0c6d857a87f, client=0x7f885845efa0,
> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:
> 649541-zone2-ssd1-vmstor1-client-0-0-0, granted at 2018-01-20 10:59:52
> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
> 3420, owner=d8b9372c397f, client=0x7f8858410be0,
> connection-id=ovirt8z2.xxx.com-5652-2017/12/27-09:49:02:
> 946825-zone2-ssd1-vmstor1-client-0-7-0, granted at 2018-01-20 08:57:23
> inodelk.inodelk[1](BLOCKED)=type=WRITE, whence=0, start=0, len=0, pid =
> 18446744073709551610, owner=d0c6d857a87f, client=0x7f885845efa0,
> connection-id=sto2z2.xxx-10975-2018/01/20-10:56:14:
> 649541-zone2-ssd1-vmstor1-client-0-0-0, blocked at 2018-01-20 10:59:52
>
> I'd also like to add that the volume had an arbiter brick before the crash
> happened. We decided to remove it because we thought that it was causing
> issues. However, now I think that this was unnecessary. After the crash the
> arbiter logs had lots of messages like this:
> [2018-01-20 10:19:36.515717] I [MSGID: 115072] 
> [server-rpc-fops.c:1640:server_setattr_cbk]
> 0-zone2-ssd1-vmstor1-server: 37374187: SETATTR
>  
> (a52055bd-e2e9-42dd-92a3-e96b693bcafe)
> ==> (Operation not permitted) [Operation not permitted]
>
> Is there any way to force self-heal to stop? Any help would be very much
> appreciated :)
>

The locks are contending in the afr self-heal and data path domains. It's
possible that the deadlock is not caused by the hypervisor, as if that were
the case the locks should have been released when it crashed/disconnected.

Adding AFR devs to check what's causing the deadlock in the first place.

-Krutika



>
> Best regards,
> Samuli Heinonen
>
>
>
>
>
> Samuli Heinonen 
> 20 January 2018 at 21.57
> Hi all!
>
> One hypervisor in our virtualization environment crashed and now some of
> the VM images cannot be accessed. After investigation we found out that
> there were lots of images that still had an active lock on the crashed
> hypervisor. We were able to remove locks from "regular files", but it doesn't
> seem possible to remove locks from shards.
>
> We are running GlusterFS 3.8.15 on all nodes.
>
> Here is part of statedump that shows shard having active lock on crashed
> node:
> [xlator.features.locks.zone2-ssd1-vmstor1-locks.inode]
> path=/.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
> mandatory=0
> inodelk-count=1
> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:metadata
> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0:self-heal
> lock-dump.domain.domain=zone2-ssd1-vmstor1-replicate-0
> inodelk.inodelk[0](ACTIVE)=type=WRITE, whence=0, start=0, len=0, pid =
> 3568, owner=14ce372c397f, client=0x7f3198388770, connection-id
> ovirt8z2.xxx-5652-2017/12/27-09:49:02:946825-zone2-ssd1-vmstor1-client-1-7-0,
> granted at 2018-01-20 08:57:24
>
> If we try to run clear-locks we get following error message:
> # gluster volume clear-locks zone2-ssd1-vmstor1 
> /.shard/75353c17-d6b8-485d-9baf-fd6c700e39a1.21
> kind all inode
> Volume clear-locks unsuccessful
> clear-locks getxattr command failed. Reason: Operation not permitted
>
> Gluster vol info if needed:
> Volume Name: zone2-ssd1-vmstor1
> Type: Replicate
> Volume ID: b6319968-690b-4060-8fff-b212d2295208
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 2 = 2
> Transport-type: rdma
> Bricks:
> Brick1: sto1z2.xxx:/ssd1/zone2-vmstor1/export
> Brick2: sto2z2.xxx:/ssd1/zone2-vmstor1/export
> Options Reconfigured:
> cluster.shd-wait-qlength: 1
> cluster.shd-max-threads: 8
> cluster.locking-scheme: granular
> performance.low-prio-threads: 32
> cluster.data-self-heal-algorithm: full
> performance.client-io-threads: off
> storage.linux-aio: off
> performance.readdir-ahead: on
> client.event-threads: 16
> server.event-threads: 16
> 

Re: [Gluster-users] [Possibile SPAM] Re: Problem with Gluster 3.12.4, VM and sharding

2018-01-18 Thread Krutika Dhananjay
Thanks for that input. Adding Niels since the issue is reproducible only
with libgfapi.

-Krutika

On Thu, Jan 18, 2018 at 1:39 PM, Ing. Luca Lazzeroni - Trend Servizi Srl <
l...@gvnet.it> wrote:

> Another update.
>
> I've set up a replica 3 volume without sharding and tried to install a VM
> on a qcow2 volume on that device; however the result is the same and the VM
> image has been corrupted, at exactly the same point.
>
> Here's the volume info of the created volume:
>
> Volume Name: gvtest
> Type: Replicate
> Volume ID: e2ddf694-ba46-4bc7-bc9c-e30803374e9d
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: gluster1:/bricks/brick1/gvtest
> Brick2: gluster2:/bricks/brick1/gvtest
> Brick3: gluster3:/bricks/brick1/gvtest
> Options Reconfigured:
> user.cifs: off
> features.shard: off
> cluster.shd-wait-qlength: 1
> cluster.shd-max-threads: 8
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.eager-lock: enable
> network.remote-dio: enable
> performance.low-prio-threads: 32
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> transport.address-family: inet
> nfs.disable: on
> performance.client-io-threads: off
>
>
> On 17/01/2018 14:51, Ing. Luca Lazzeroni - Trend Servizi Srl wrote:
>
> Hi,
>
> after our IRC chat I've rebuilt a virtual machine with a FUSE-based virtual
> disk. Everything worked flawlessly.
>
> Now I'm sending you the output of the requested getfattr command on the
> disk image:
>
> # file: TestFUSE-vda.qcow2
> trusted.afr.dirty=0x
> trusted.gfid=0x40ffafbbe987445692bb31295fa40105
> trusted.gfid2path.dc9dde61f0b77eab=0x31326533323631662d373839332d
> 346262302d383738632d3966623765306232336263652f54657374465553
> 452d7664612e71636f7732
> trusted.glusterfs.shard.block-size=0x0400
> trusted.glusterfs.shard.file-size=0xc153
> 0060be90
>
> Hope this helps.
>
>
>
> On 17/01/2018 11:37, Ing. Luca Lazzeroni - Trend Servizi Srl wrote:
>
> I actually use FUSE and it works. If i try to use "libgfapi" direct
> interface to gluster in qemu-kvm, the problem appears.
>
>
>
> On 17/01/2018 11:35, Krutika Dhananjay wrote:
>
> Really? Then which protocol exactly do you see this issue with? libgfapi?
> NFS?
>
> -Krutika
>
> On Wed, Jan 17, 2018 at 3:59 PM, Ing. Luca Lazzeroni - Trend Servizi Srl <
> l...@gvnet.it> wrote:
>
>> Of course. Here's the full log. Please note that in FUSE mode everything
>> apparently works without problems. I've installed 4 VMs and updated them
>> without problems.
>>
>>
>>
>> Il 17/01/2018 11:00, Krutika Dhananjay ha scritto:
>>
>>
>>
>> On Tue, Jan 16, 2018 at 10:47 PM, Ing. Luca Lazzeroni - Trend Servizi Srl
>> <l...@gvnet.it> wrote:
>>
>>> I've done the test with the raw image format (preallocated too) and the
>>> corruption problem is still there (but without errors in the bricks' log files).
>>>
>>> What does the "link" error in the bricks' log files mean?
>>>
>>> I've looked at the source code for the lines where it happens and it
>>> seems to be a warning (it doesn't imply a failure).
>>>
>>
>> Indeed, it only represents a transient state when the shards are created
>> for the first time and does not indicate a failure.
>> Could you also get the logs of the gluster fuse mount process? It should
>> be under /var/log/glusterfs of your client machine with the filename as a
>> hyphenated mount point path.
>>
>> For example, if your volume was mounted at /mnt/glusterfs, then your log
>> file would be named mnt-glusterfs.log.
>>
>> -Krutika
>>
>>
>>>
>>> On 16/01/2018 17:39, Ing. Luca Lazzeroni - Trend Servizi Srl wrote:
>>>
>>> An update:
>>>
>>> I've tried, for my tests, to create the vm volume as
>>>
>>> qemu-img create -f qcow2 -o preallocation=full
>>> gluster://gluster1/Test/Test-vda.img 20G
>>>
>>> et voila !
>>>
>>> No errors at all, neither in bricks' log file (the "link failed" message
>>> disappeared), neither in VM (no corruption and installed succesfully).
>>>
>>> I'll do another test with a fully preallocated raw image.
>>>
>>>
>>>
>>> Il 16/01/2018 16:31, Ing. Luca Lazzeroni - Trend Servizi Srl

Re: [Gluster-users] [Possibile SPAM] Re: Problem with Gluster 3.12.4, VM and sharding

2018-01-17 Thread Krutika Dhananjay
Really? Then which protocol exactly do you see this issue with? libgfapi?
NFS?

-Krutika

On Wed, Jan 17, 2018 at 3:59 PM, Ing. Luca Lazzeroni - Trend Servizi Srl <
l...@gvnet.it> wrote:

> Of course. Here's the full log. Please note that in FUSE mode everything
> apparently works without problems. I've installed 4 VMs and updated them
> without problems.
>
>
>
> On 17/01/2018 11:00, Krutika Dhananjay wrote:
>
>
>
> On Tue, Jan 16, 2018 at 10:47 PM, Ing. Luca Lazzeroni - Trend Servizi Srl
> <l...@gvnet.it> wrote:
>
>> I've done the test with the raw image format (preallocated too) and the
>> corruption problem is still there (but without errors in the bricks' log files).
>>
>> What does the "link" error in the bricks' log files mean?
>>
>> I've looked at the source code for the lines where it happens and it
>> seems to be a warning (it doesn't imply a failure).
>>
>
> Indeed, it only represents a transient state when the shards are created
> for the first time and does not indicate a failure.
> Could you also get the logs of the gluster fuse mount process? It should
> be under /var/log/glusterfs of your client machine with the filename as a
> hyphenated mount point path.
>
> For example, if your volume was mounted at /mnt/glusterfs, then your log
> file would be named mnt-glusterfs.log.
>
> -Krutika
>
>
>>
>> On 16/01/2018 17:39, Ing. Luca Lazzeroni - Trend Servizi Srl wrote:
>>
>> An update:
>>
>> I've tried, for my tests, to create the vm volume as
>>
>> qemu-img create -f qcow2 -o preallocation=full
>> gluster://gluster1/Test/Test-vda.img 20G
>>
>> et voila !
>>
>> No errors at all, neither in bricks' log file (the "link failed" message
>> disappeared), neither in VM (no corruption and installed succesfully).
>>
>> I'll do another test with a fully preallocated raw image.
>>
>>
>>
>> On 16/01/2018 16:31, Ing. Luca Lazzeroni - Trend Servizi Srl wrote:
>>
>> I've just done all the steps to reproduce the problem.
>>
>> Tha VM volume has been created via "qemu-img create -f qcow2
>> Test-vda2.qcow2 20G" on the gluster volume mounted via FUSE. I've tried
>> also to create the volume with preallocated metadata, which moves the
>> problem a bit far away (in time). The volume is a replice 3 arbiter 1
>> volume hosted on XFS bricks.
>>
>> Here are the informations:
>>
>> [root@ovh-ov1 bricks]# gluster volume info gv2a2
>>
>> Volume Name: gv2a2
>> Type: Replicate
>> Volume ID: 83c84774-2068-4bfc-b0b9-3e6b93705b9f
>> Status: Started
>> Snapshot Count: 0
>> Number of Bricks: 1 x (2 + 1) = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: gluster1:/bricks/brick2/gv2a2
>> Brick2: gluster3:/bricks/brick3/gv2a2
>> Brick3: gluster2:/bricks/arbiter_brick_gv2a2/gv2a2 (arbiter)
>> Options Reconfigured:
>> storage.owner-gid: 107
>> storage.owner-uid: 107
>> user.cifs: off
>> features.shard: on
>> cluster.shd-wait-qlength: 1
>> cluster.shd-max-threads: 8
>> cluster.locking-scheme: granular
>> cluster.data-self-heal-algorithm: full
>> cluster.server-quorum-type: server
>> cluster.quorum-type: auto
>> cluster.eager-lock: enable
>> network.remote-dio: enable
>> performance.low-prio-threads: 32
>> performance.io-cache: off
>> performance.read-ahead: off
>> performance.quick-read: off
>> transport.address-family: inet
>> nfs.disable: off
>> performance.client-io-threads: off
>>
>> /var/log/glusterfs/glusterd.log:
>>
>> [2018-01-15 14:17:50.196228] I [MSGID: 106488]
>> [glusterd-handler.c:1548:__glusterd_handle_cli_get_volume] 0-management:
>> Received get vol req
>> [2018-01-15 14:25:09.555214] I [MSGID: 106488]
>> [glusterd-handler.c:1548:__glusterd_handle_cli_get_volume] 0-management:
>> Received get vol req
>>
>> (empty because today it's 2018-01-16)
>>
>> /var/log/glusterfs/glustershd.log:
>>
>> [2018-01-14 02:23:02.731245] I [glusterfsd-mgmt.c:1821:mgmt_getspec_cbk]
>> 0-glusterfs: No change in volfile,continuing
>>
>> (empty too)
>>
>> /var/log/glusterfs/bricks/brick-brick2-gv2a2.log (the interested volume):
>>
>> [2018-01-16 15:14:37.809965] I [MSGID: 115029]
>> [server-handshake.c:793:server_setvolume] 0-gv2a2-server: accepted
>> client from ovh-ov1-10302-2018/01/16-15:14:37:790306-gv2a2-client-0-0-0
>> (version: 3.12.4)
>> [2018-01-16 15:16:41.471751] E [MSGID: 113020] [posix.c:1485:posix_mknod]
>> 0-gv2a2-posix: 

Re: [Gluster-users] [Possibile SPAM] Re: Problem with Gluster 3.12.4, VM and sharding

2018-01-17 Thread Krutika Dhananjay
5:17:04.129593] W [MSGID: 113096] 
> [posix-handle.c:770:posix_handle_hard]
> 0-gv2a2-posix: link 
> /bricks/brick2/gv2a2/.shard/62335cb9-c7b5-4735-a879-59cff93fe622.8
> -> 
> /bricks/brick2/gv2a2/.glusterfs/dc/92/dc92bd0a-0d46-4826-a4c9-d073a924dd8dfailed
> [File exists]
> The message "W [MSGID: 113096] [posix-handle.c:770:posix_handle_hard]
> 0-gv2a2-posix: link 
> /bricks/brick2/gv2a2/.shard/62335cb9-c7b5-4735-a879-59cff93fe622.8
> -> 
> /bricks/brick2/gv2a2/.glusterfs/dc/92/dc92bd0a-0d46-4826-a4c9-d073a924dd8dfailed
> [File exists]" repeated 5 times between [2018-01-16 15:17:04.129593] and
> [2018-01-16 15:17:04.129593]
> [2018-01-16 15:17:04.129661] E [MSGID: 113020] [posix.c:1485:posix_mknod]
> 0-gv2a2-posix: setting gfid on /bricks/brick2/gv2a2/.shard/
> 62335cb9-c7b5-4735-a879-59cff93fe622.8 failed
> [2018-01-16 15:17:08.279162] W [MSGID: 113096] 
> [posix-handle.c:770:posix_handle_hard]
> 0-gv2a2-posix: link 
> /bricks/brick2/gv2a2/.shard/62335cb9-c7b5-4735-a879-59cff93fe622.9
> -> 
> /bricks/brick2/gv2a2/.glusterfs/c9/b7/c9b71b00-a09f-4df1-b874-041820ca8241failed
> [File exists]
> [2018-01-16 15:17:08.279162] W [MSGID: 113096] 
> [posix-handle.c:770:posix_handle_hard]
> 0-gv2a2-posix: link 
> /bricks/brick2/gv2a2/.shard/62335cb9-c7b5-4735-a879-59cff93fe622.9
> -> 
> /bricks/brick2/gv2a2/.glusterfs/c9/b7/c9b71b00-a09f-4df1-b874-041820ca8241failed
> [File exists]
> The message "W [MSGID: 113096] [posix-handle.c:770:posix_handle_hard]
> 0-gv2a2-posix: link 
> /bricks/brick2/gv2a2/.shard/62335cb9-c7b5-4735-a879-59cff93fe622.9
> -> 
> /bricks/brick2/gv2a2/.glusterfs/c9/b7/c9b71b00-a09f-4df1-b874-041820ca8241failed
> [File exists]" repeated 2 times between [2018-01-16 15:17:08.279162] and
> [2018-01-16 15:17:08.279162]
>
> [2018-01-16 15:17:08.279177] E [MSGID: 113020] [posix.c:1485:posix_mknod]
> 0-gv2a2-posix: setting gfid on /bricks/brick2/gv2a2/.shard/
> 62335cb9-c7b5-4735-a879-59cff93fe622.9 failed
> The message "W [MSGID: 113096] [posix-handle.c:770:posix_handle_hard]
> 0-gv2a2-posix: link 
> /bricks/brick2/gv2a2/.shard/62335cb9-c7b5-4735-a879-59cff93fe622.4
> -> 
> /bricks/brick2/gv2a2/.glusterfs/a0/14/a0144df3-8d89-4aed-872e-5fef141e9e1efailed
> [File exists]" repeated 6 times between [2018-01-16 15:16:41.471745] and
> [2018-01-16 15:16:41.471807]
> The message "W [MSGID: 113096] [posix-handle.c:770:posix_handle_hard]
> 0-gv2a2-posix: link 
> /bricks/brick2/gv2a2/.shard/62335cb9-c7b5-4735-a879-59cff93fe622.5
> -> 
> /bricks/brick2/gv2a2/.glusterfs/eb/04/eb044e6e-3a23-40a4-9ce1-f13af148eb67failed
> [File exists]" repeated 2 times between [2018-01-16 15:16:42.593392] and
> [2018-01-16 15:16:42.593430]
> [2018-01-16 15:17:32.229689] W [MSGID: 113096] 
> [posix-handle.c:770:posix_handle_hard]
> 0-gv2a2-posix: link 
> /bricks/brick2/gv2a2/.shard/62335cb9-c7b5-4735-a879-59cff93fe622.14
> -> 
> /bricks/brick2/gv2a2/.glusterfs/53/04/530449fa-d698-4928-a262-9a0234232323failed
> [File exists]
> [2018-01-16 15:17:32.229720] E [MSGID: 113020] [posix.c:1485:posix_mknod]
> 0-gv2a2-posix: setting gfid on /bricks/brick2/gv2a2/.shard/
> 62335cb9-c7b5-4735-a879-59cff93fe622.14 failed
> [2018-01-16 15:18:07.154330] W [MSGID: 113096] 
> [posix-handle.c:770:posix_handle_hard]
> 0-gv2a2-posix: link 
> /bricks/brick2/gv2a2/.shard/62335cb9-c7b5-4735-a879-59cff93fe622.17
> -> 
> /bricks/brick2/gv2a2/.glusterfs/81/96/8196dd19-84bc-4c3d-909f-8792e9b4929dfailed
> [File exists]
> [2018-01-16 15:18:07.154375] E [MSGID: 113020] [posix.c:1485:posix_mknod]
> 0-gv2a2-posix: setting gfid on /bricks/brick2/gv2a2/.shard/
> 62335cb9-c7b5-4735-a879-59cff93fe622.17 failed
> The message "W [MSGID: 113096] [posix-handle.c:770:posix_handle_hard]
> 0-gv2a2-posix: link 
> /bricks/brick2/gv2a2/.shard/62335cb9-c7b5-4735-a879-59cff93fe622.14
> -> 
> /bricks/brick2/gv2a2/.glusterfs/53/04/530449fa-d698-4928-a262-9a0234232323failed
> [File exists]" repeated 7 times between [2018-01-16 15:17:32.229689] and
> [2018-01-16 15:17:32.229806]
> The message "W [MSGID: 113096] [posix-handle.c:770:posix_handle_hard]
> 0-gv2a2-posix: link 
> /bricks/brick2/gv2a2/.shard/62335cb9-c7b5-4735-a879-59cff93fe622.17
> -> 
> /bricks/brick2/gv2a2/.glusterfs/81/96/8196dd19-84bc-4c3d-909f-8792e9b4929dfailed
> [File exists]" repeated 3 times between [2018-01-16 15:18:07.154330] and
> [2018-01-16 15:18:07.154357]
> [2018-01-16 15:19:23.618794] W [MSGID: 113096] 
> [posix-handle.c:770:posix_handle_hard]
> 0-gv2a2-posix: link 
> /bricks/brick2/gv2a2/.shard/62335cb9-c7b5-4735-a879-59cff93fe622.21
> -> 
> /bricks/brick2/gv2a2/.glusterfs/6d/02/6d02bd98-83de-43e8-a7af-b1d5f5160403failed
> [File exists]

Re: [Gluster-users] Problem with Gluster 3.12.4, VM and sharding

2018-01-16 Thread Krutika Dhananjay
Also to help isolate the component, could you answer these:

1. on a different volume with shard not enabled, do you see this issue?
2. on a plain 3-way replicated volume (no arbiter), do you see this issue?
   (a sketch for creating such a test volume follows below)
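
For (2), a minimal throwaway plain replica-3 test volume (no shard, no arbiter)
could be created along these lines - hostnames and brick paths are placeholders,
adjust them to your setup:

# gluster volume create testrep3 replica 3 gluster1:/bricks/test/r3 \
  gluster2:/bricks/test/r3 gluster3:/bricks/test/r3
# gluster volume start testrep3
# mount -t glusterfs gluster1:/testrep3 /mnt/testrep3

and then repeat the same install test against an image on /mnt/testrep3.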



On Tue, Jan 16, 2018 at 4:03 PM, Krutika Dhananjay <kdhan...@redhat.com>
wrote:

> Please share the volume-info output and the logs under /var/log/glusterfs/
> from all your nodes for investigating the issue.
>
> -Krutika
>
> On Tue, Jan 16, 2018 at 1:30 PM, Ing. Luca Lazzeroni - Trend Servizi Srl <
> l...@gvnet.it> wrote:
>
>> Hi to everyone.
>>
>> I've got a strange problem with a gluster setup: 3 nodes with Centos 7.4,
>> Gluster 3.12.4 from Centos/Gluster repositories, QEMU-KVM version 2.9.0
>> (compiled from RHEL sources).
>>
>> I'm running volumes in replica 3 arbiter 1 mode (but I've got a volume in
>> "pure" replica 3 mode too). I've applied the "virt" group settings to my
>> volumes since they host VM images.
>>
>> If I try to install something (e.g. Ubuntu Server 16.04.3) on a VM (and so
>> I generate a bit of I/O inside it) and configure KVM to access the gluster
>> volume directly (via libvirt), the install fails after a while because the disk
>> content is corrupted. If I inspect the blocks inside the disk (by accessing
>> the image directly from outside) I can find many files filled with "^@".
>>
>
Also, what exactly do you mean by accessing the image directly from
outside? Was it from the brick directories directly? Was it from the mount
point of the volume? Could you elaborate? Which files exactly did you check?
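
(For instance, a read-only way to inspect the image without going through the
VM would be something along these lines - the paths here are only illustrative:

# qemu-img check /mnt/glusterfs/Test-vda2.qcow2        <- via the FUSE mount
# qemu-img info /bricks/brick2/gv2a2/Test-vda2.qcow2   <- directly on a brick

The distinction matters because, on a sharded volume, the base file on a brick
only holds the first shard's worth of data; the rest lives under .shard.)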

-Krutika


>> If, instead, I configure KVM to access VM images via a FUSE mount,
>> everything seems to work correctly.
>>
>> Note that the problem with the install occurs 100% of the time with QCOW2
>> images, while with RAW disk images it appears only later.
>>
>> Is there anyone who has experienced the same problem?
>>
>> Thank you,
>>
>>
>> --
>> Ing. Luca Lazzeroni
>> Responsabile Ricerca e Sviluppo
>> Trend Servizi Srl
>> Tel: 0376/631761
>> Web: https://www.trendservizi.it
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Problem with Gluster 3.12.4, VM and sharding

2018-01-16 Thread Krutika Dhananjay
Please share the volume-info output and the logs under /var/log/glusterfs/
from all your nodes for investigating the issue.
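
(Something along these lines, run on each node, should capture what is needed -
the volume name and archive path are just examples:

# gluster volume info <volname> > /tmp/volume-info-$(hostname).txt
# tar -czf /tmp/glusterfs-logs-$(hostname).tar.gz /var/log/glusterfs/

and then attach the resulting files.)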

-Krutika

On Tue, Jan 16, 2018 at 1:30 PM, Ing. Luca Lazzeroni - Trend Servizi Srl <
l...@gvnet.it> wrote:

> Hi to everyone.
>
> I've got a strange problem with a gluster setup: 3 nodes with Centos 7.4,
> Gluster 3.12.4 from Centos/Gluster repositories, QEMU-KVM version 2.9.0
> (compiled from RHEL sources).
>
> I'm running volumes in replica 3 arbiter 1 mode (but I've got a volume in
> "pure" replica 3 mode too). I've applied the "virt" group settings to my
> volumes since they host VM images.
>
> If I try to install something (e.g. Ubuntu Server 16.04.3) on a VM (and so
> I generate a bit of I/O inside it) and configure KVM to access the gluster
> volume directly (via libvirt), the install fails after a while because the disk
> content is corrupted. If I inspect the blocks inside the disk (by accessing
> the image directly from outside) I can find many files filled with "^@".
>
> If, instead, I configure KVM to access VM images via a FUSE mount,
> everything seems to work correctly.
>
> Note that the problem with the install occurs 100% of the time with QCOW2 images,
> while with RAW disk images it appears only later.
>
> Is there anyone who has experienced the same problem?
>
> Thank you,
>
>
> --
> Ing. Luca Lazzeroni
> Responsabile Ricerca e Sviluppo
> Trend Servizi Srl
> Tel: 0376/631761
> Web: https://www.trendservizi.it
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] What is it with trusted.io-stats-dump?

2017-11-13 Thread Krutika Dhananjay
trusted.io-stats-dump is a virtual (not physical) extended attribute.
The code is written in a way that a request to set trusted.io-stats-dump
gets intercepted at the io-stats translator layer on the stack and
there gets converted into the action of dumping the statistics into the
provided output file path. Since the attribute is never actually stored on
disk, a subsequent getfattr will not list it.
See io_stats_setxattr() implementation in io-stats.c for more details.

HTH,
Krutika

On Mon, Nov 13, 2017 at 12:14 PM, Jeevan Patnaik 
wrote:

> Hi,
>
> I am trying to understand how the extended attribute trusted.io-stats-dump
> works.
>
> setfattr -n trusted.io-stats-dump -v /tmp/gluster_perf_stats/io-stats-pre.txt
> /mnt/gluster/gv0_glusterfs
>
> I can see that the io-stats-pre.txt is created. But how and what happened
> in the background?
>
> And why can't I see the attribute with getfattr again?
>
> getfattr -dm- /mnt/gluster/gv0_glusterfs
> # file: mnt/gluster/gv0_glusterfs
> trusted.glusterfs.dht.commithash="3480667945"
>
> Regards,
> Jeevan.
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster 3.8.13 data corruption

2017-10-09 Thread Krutika Dhananjay
OK.

Is this problem unique to templates for a particular guest OS type? Or is
this something you see for all guest OS?

Also, can you get the output of `getfattr -d -m . -e hex <path>` for the
following two "paths" from all of the bricks:
the path to the file representing the VM created off this template w.r.t. the
brick. It will usually be $BRICKPATH/xx/images/$UUID where $UUID
represents the uuid of the VM created from the template. If I'm not wrong,
there would be two sets of a certain UUID2, UUID2.lease, UUID2.meta.
Please get me the output of the command above for both the uuid files. The
one that has more than 2 hard links is the template. When you attach this
output, please make that distinction clear, so it will be easier to debug.
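
(For illustration only - the brick path and UUID below are made up, substitute
your own:

# getfattr -d -m . -e hex /bricks/brick1/vmstore/xx/images/1111-2222-3333-4444
# stat -c '%h %n' /bricks/brick1/vmstore/xx/images/1111-2222-3333-4444

The stat output shows the hard-link count, which is how you can tell the
template file apart from the image of the VM created from it, as described
above.)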


-Krutika

On Fri, Oct 6, 2017 at 6:38 PM, Mahdi Adnan <mahdi.ad...@outlook.com> wrote:

>
> Hi,
>
> Thank you for your reply.
> Lindsay,
> Unfortunately I do not have a backup for this template.
>
> Krutika,
> The stat-prefetch is already disabled on the volume.
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> --
> *From:* Krutika Dhananjay <kdhan...@redhat.com>
> *Sent:* Friday, October 6, 2017 7:39 AM
> *To:* Lindsay Mathieson
> *Cc:* Mahdi Adnan; gluster-users@gluster.org
> *Subject:* Re: [Gluster-users] Gluster 3.8.13 data corruption
>
> Could you disable stat-prefetch on the volume and create another vm off
> that template and see if it works?
>
> -Krutika
>
> On Fri, Oct 6, 2017 at 8:28 AM, Lindsay Mathieson <
> lindsay.mathie...@gmail.com> wrote:
>
>> Any chance of a backup you could do bit compare with?
>>
>>
>>
>> Sent from my Windows 10 phone
>>
>>
>>
>> *From: *Mahdi Adnan <mahdi.ad...@outlook.com>
>> *Sent: *Friday, 6 October 2017 12:26 PM
>> *To: *gluster-users@gluster.org
>> *Subject: *[Gluster-users] Gluster 3.8.13 data corruption
>>
>>
>>
>> Hi,
>>
>>
>>
>> We're running Gluster 3.8.13 replica 2 (SSDs), it's used as storage
>> domain for oVirt.
>>
>> Today, we found an issue with one of the VM templates: after deploying a
>> VM from this template it will not boot; it gets stuck at mounting the root
>> partition.
>>
>> We've been using this template for months now and we have not had any
>> issues with it.
>>
>> Neither the oVirt nor the Gluster logs show any errors or warnings.
>>
>> I exported the template and tried running it on a standalone machine and it
>> did not work.
>>
>> Do you think this might be silent data corruption? Bitrot is not
>> enabled.
>>
>> Volume info can be found at:
>>
>> https://paste.fedoraproject.org/paste/f~1jNIObDa2zoG7ATdCdZg
>>
>>
>>
>> --
>>
>> Respectfully
>> * Mahdi A. Mahdi*
>>
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster 3.8.13 data corruption

2017-10-05 Thread Krutika Dhananjay
Could you disable stat-prefetch on the volume and create another vm off
that template and see if it works?
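
(For reference, that would be something like:

# gluster volume set <volname> performance.stat-prefetch off

with <volname> replaced by the name of the gluster volume backing the storage
domain.)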

-Krutika

On Fri, Oct 6, 2017 at 8:28 AM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:

> Any chance of a backup you could do bit compare with?
>
>
>
> Sent from my Windows 10 phone
>
>
>
> *From: *Mahdi Adnan 
> *Sent: *Friday, 6 October 2017 12:26 PM
> *To: *gluster-users@gluster.org
> *Subject: *[Gluster-users] Gluster 3.8.13 data corruption
>
>
>
> Hi,
>
>
>
> We're running Gluster 3.8.13 replica 2 (SSDs), it's used as storage domain
> for oVirt.
>
> Today, we found an issue with one of the VM templates: after deploying a
> VM from this template it will not boot; it gets stuck at mounting the root
> partition.
>
> We've been using this template for months now and we have not had any
> issues with it.
>
> Neither the oVirt nor the Gluster logs show any errors or warnings.
>
> I exported the template and tried running it on a standalone machine and it
> did not work.
>
> Do you think this might be silent data corruption? Bitrot is not
> enabled.
>
> Volume info can be found at:
>
> https://paste.fedoraproject.org/paste/f~1jNIObDa2zoG7ATdCdZg
>
>
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] data corruption - any update?

2017-10-04 Thread Krutika Dhananjay
On Wed, Oct 4, 2017 at 10:51 AM, Nithya Balachandran 
wrote:

>
>
> On 3 October 2017 at 13:27, Gandalf Corvotempesta <
> gandalf.corvotempe...@gmail.com> wrote:
>
>> Any update about multiple bugs regarding data corruptions with
>> sharding enabled ?
>>
>> Is 3.12.1 ready to be used in production?
>>
>
> Most issues have been fixed but there appears to be one more race for
> which the patch is being worked on.
>
> @Krutika, is that correct?
>
>
>
That is my understanding too, yes, in light of the discussion that happened
at https://bugzilla.redhat.com/show_bug.cgi?id=1465123

-Krutika


> Thanks,
> Nithya
>
>
>
>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Performance drop from 3.8 to 3.10

2017-09-22 Thread Krutika Dhananjay
Could you disable cluster.eager-lock and try again?

-Krutika

On Thu, Sep 21, 2017 at 6:31 PM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:

> Upgraded recently from 3.8.15 to 3.10.5 and have seen a fairly substantial
> drop in read/write performance
>
> env:
>
> - 3 node, replica 3 cluster
>
> - Private dedicated Network: 1Gx3, bond: balance-alb
>
> - was able to down the volume for the upgrade and reboot each node
>
> - Usage: VM Hosting (qemu)
>
> - Sharded Volume
>
> - sequential read performance in VMs has dropped from 700Mbps to 300Mbps
>
> - Seq Write has dropped from 115MB/s (approx) to 110
>
> - Write IOPS have dropped from 12MB/s to 8MB/s
>
> Apart from increasing the op version I made no changes to the volume
> settings.
>
> op.version is 31004
>
> gluster v info
>
> Volume Name: datastore4
> Type: Replicate
> Volume ID: 0ba131ef-311d-4bb1-be46-596e83b2f6ce
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: vnb.proxmox.softlog:/tank/vmdata/datastore4
> Brick2: vng.proxmox.softlog:/tank/vmdata/datastore4
> Brick3: vnh.proxmox.softlog:/tank/vmdata/datastore4
> Options Reconfigured:
> transport.address-family: inet
> cluster.locking-scheme: granular
> cluster.granular-entry-heal: yes
> features.shard-block-size: 64MB
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> performance.stat-prefetch: on
> performance.strict-write-ordering: off
> nfs.enable-ino32: off
> nfs.addr-namelookup: off
> nfs.disable: on
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> features.shard: on
> cluster.data-self-heal: on
> performance.readdir-ahead: on
> performance.low-prio-threads: 32
> user.cifs: off
> performance.flush-behind: on
> server.event-threads: 4
> client.event-threads: 4
> server.allow-insecure: on
>
>
> --
> Lindsay Mathieson
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Slow performance of gluster volume

2017-09-06 Thread Krutika Dhananjay
Do you see any improvement with 3.11.1 as that has a patch that improves
perf for this kind of a workload

Also, could you disable eager-lock and check if that helps? I see that max
time is being spent in acquiring locks.
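
(i.e. something along the lines of:

# gluster volume set vms cluster.eager-lock off

"vms" being the volume name from your earlier mail - and then re-run the dd
test with profiling enabled so the two profiles can be compared.)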

-Krutika

On Wed, Sep 6, 2017 at 1:38 PM, Abi Askushi <rightkickt...@gmail.com> wrote:

> Hi Krutika,
>
> Is there anything in the profile indicating what is causing this bottleneck?
> In case i can collect any other info let me know.
>
> Thanx
>
> On Sep 5, 2017 13:27, "Abi Askushi" <rightkickt...@gmail.com> wrote:
>
> Hi Krutika,
>
> Attached the profile stats. I enabled profiling then ran some dd tests.
> Also 3 Windows VMs are running on top of this volume, but I did not do any stress
> testing on the VMs. I have left the profiling enabled in case more time is
> needed for useful stats.
>
> Thanx
>
> On Tue, Sep 5, 2017 at 12:48 PM, Krutika Dhananjay <kdhan...@redhat.com>
> wrote:
>
>> OK my understanding is that with preallocated disks the performance with
>> and without shard will be the same.
>>
>> In any case, please attach the volume profile[1], so we can see what else
>> is slowing things down.
>>
>> -Krutika
>>
>> [1] - https://gluster.readthedocs.io/en/latest/Administrator%20Gui
>> de/Monitoring%20Workload/#running-glusterfs-volume-profile-command
>>
>> On Tue, Sep 5, 2017 at 2:32 PM, Abi Askushi <rightkickt...@gmail.com>
>> wrote:
>>
>>> Hi Krutika,
>>>
>>> I already have a preallocated disk on VM.
>>> Now I am checking performance with dd on the hypervisors which have the
>>> gluster volume configured.
>>>
>>> I also tried several values of shard-block-size and I keep getting the
>>> same low write performance.
>>> Enabling client-io-threads also did not have any effect.
>>>
>>> The version of gluster I am using is glusterfs 3.8.12 built on May 11
>>> 2017 18:46:20.
>>> The setup is a set of 3 Centos 7.3 servers and ovirt 4.1, using gluster
>>> as storage.
>>>
>>> Below are the current settings:
>>>
>>>
>>> Volume Name: vms
>>> Type: Replicate
>>> Volume ID: 4513340d-7919-498b-bfe0-d836b5cea40b
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 1 x (2 + 1) = 3
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gluster0:/gluster/vms/brick
>>> Brick2: gluster1:/gluster/vms/brick
>>> Brick3: gluster2:/gluster/vms/brick (arbiter)
>>> Options Reconfigured:
>>> server.event-threads: 4
>>> client.event-threads: 4
>>> performance.client-io-threads: on
>>> features.shard-block-size: 512MB
>>> cluster.granular-entry-heal: enable
>>> performance.strict-o-direct: on
>>> network.ping-timeout: 30
>>> storage.owner-gid: 36
>>> storage.owner-uid: 36
>>> user.cifs: off
>>> features.shard: on
>>> cluster.shd-wait-qlength: 1
>>> cluster.shd-max-threads: 8
>>> cluster.locking-scheme: granular
>>> cluster.data-self-heal-algorithm: full
>>> cluster.server-quorum-type: server
>>> cluster.quorum-type: auto
>>> cluster.eager-lock: enable
>>> network.remote-dio: off
>>> performance.low-prio-threads: 32
>>> performance.stat-prefetch: on
>>> performance.io-cache: off
>>> performance.read-ahead: off
>>> performance.quick-read: off
>>> transport.address-family: inet
>>> performance.readdir-ahead: on
>>> nfs.disable: on
>>> nfs.export-volumes: on
>>>
>>>
>>> I observed that when testing with dd if=/dev/zero of=testfile bs=1G
>>> count=1 I get 65MB/s on the vms gluster volume (and the network traffic
>>> between the servers reaches ~ 500Mbps), while when testing with dd
>>> if=/dev/zero of=testfile bs=1G count=1 *oflag=direct *I get a
>>> consistent 10MB/s and the network traffic hardly reaching 100Mbps.
>>>
>>> Any other things one can do?
>>>
>>> On Tue, Sep 5, 2017 at 5:57 AM, Krutika Dhananjay <kdhan...@redhat.com>
>>> wrote:
>>>
>>>> I'm assuming you are using this volume to store vm images, because I
>>>> see shard in the options list.
>>>>
>>>> Speaking from shard translator's POV, one thing you can do to improve
>>>> performance is to use preallocated images.
>>>> This will at least eliminate the need for shard to perform multiple
>>>> steps as part of the writes - such as creating 

Re: [Gluster-users] Slow performance of gluster volume

2017-09-05 Thread Krutika Dhananjay
OK my understanding is that with preallocated disks the performance with
and without shard will be the same.

In any case, please attach the volume profile[1], so we can see what else
is slowing things down.

-Krutika

[1] -
https://gluster.readthedocs.io/en/latest/Administrator%20Guide/Monitoring%20Workload/#running-glusterfs-volume-profile-command

On Tue, Sep 5, 2017 at 2:32 PM, Abi Askushi <rightkickt...@gmail.com> wrote:

> Hi Krutika,
>
> I already have a preallocated disk on VM.
> Now I am checking performance with dd on the hypervisors which have the
> gluster volume configured.
>
> I also tried several values of shard-block-size and I keep getting the
> same low write performance.
> Enabling client-io-threads also did not have any effect.
>
> The version of gluster I am using is glusterfs 3.8.12 built on May 11 2017
> 18:46:20.
> The setup is a set of 3 Centos 7.3 servers and ovirt 4.1, using gluster as
> storage.
>
> Below are the current settings:
>
>
> Volume Name: vms
> Type: Replicate
> Volume ID: 4513340d-7919-498b-bfe0-d836b5cea40b
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: gluster0:/gluster/vms/brick
> Brick2: gluster1:/gluster/vms/brick
> Brick3: gluster2:/gluster/vms/brick (arbiter)
> Options Reconfigured:
> server.event-threads: 4
> client.event-threads: 4
> performance.client-io-threads: on
> features.shard-block-size: 512MB
> cluster.granular-entry-heal: enable
> performance.strict-o-direct: on
> network.ping-timeout: 30
> storage.owner-gid: 36
> storage.owner-uid: 36
> user.cifs: off
> features.shard: on
> cluster.shd-wait-qlength: 1
> cluster.shd-max-threads: 8
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.eager-lock: enable
> network.remote-dio: off
> performance.low-prio-threads: 32
> performance.stat-prefetch: on
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on
> nfs.export-volumes: on
>
>
> I observed that when testing with dd if=/dev/zero of=testfile bs=1G
> count=1 I get 65MB/s on the vms gluster volume (and the network traffic
> between the servers reaches ~ 500Mbps), while when testing with dd
> if=/dev/zero of=testfile bs=1G count=1 *oflag=direct *I get a consistent
> 10MB/s and the network traffic hardly reaching 100Mbps.
>
> Any other things one can do?
>
> On Tue, Sep 5, 2017 at 5:57 AM, Krutika Dhananjay <kdhan...@redhat.com>
> wrote:
>
>> I'm assuming you are using this volume to store vm images, because I see
>> shard in the options list.
>>
>> Speaking from shard translator's POV, one thing you can do to improve
>> performance is to use preallocated images.
>> This will at least eliminate the need for shard to perform multiple steps
>> as part of the writes - such as creating the shard and then writing to it
>> and then updating the aggregated file size - all of which require one
>> network call each, which further get blown up once they reach AFR
>> (replicate) into many more network calls.
>>
>> Second, I'm assuming you're using the default shard block size of 4MB
>> (you can confirm this using `gluster volume get <volname> shard-block-size`).
>> In our tests, we've found that larger shard sizes perform better. So maybe
>> change the shard-block-size to 64MB (`gluster volume set <volname>
>> shard-block-size 64MB`).
>>
>> Third, keep stat-prefetch enabled. We've found that qemu sends quite a
>> lot of [f]stats which can be served from the (md)cache to improve
>> performance. So enable that.
>>
>> Also, could you also enable client-io-threads and see if that improves
>> performance?
>>
>> Which version of gluster are you using BTW?
>>
>> -Krutika
>>
>>
>> On Tue, Sep 5, 2017 at 4:32 AM, Abi Askushi <rightkickt...@gmail.com>
>> wrote:
>>
>>> Hi all,
>>>
>>> I have a gluster volume used to host several VMs (managed through
>>> oVirt).
>>> The volume is a replica 3 with arbiter and the 3 servers use 1 Gbit
>>> network for the storage.
>>>
>>> When testing with dd (dd if=/dev/zero of=testfile bs=1G count=1
>>> oflag=direct) out of the volume (e.g. writing at /root/) the performance of
>>> the dd is reported to be ~ 700MB/s, which is quite decent. When testing the
>>> dd on the gluster volume I get ~ 43 MB/s which way lower from the previous.
>>> When tes

Re: [Gluster-users] Poor performance with shard

2017-09-04 Thread Krutika Dhananjay
Hi,

Speaking from shard translator's POV, one thing you can do to improve
performance is to use preallocated images.
This will at least eliminate the need for shard to perform multiple steps
as part of the writes - such as creating the shard and then writing to it
and then updating the aggregated file size - all of which require one
network call each, which further get blown up once they reach AFR
(replicate) into many more network calls. What this also means is that the
performance with and without shard will be the same with this change.

Also, could you also enable client-io-threads and see if that improves
performance?

There's a patch that is part of 3.11.1 that has been found to improve
performance for vm workloads based on our testing -
https://review.gluster.org/#/c/17391/
You can give this version a try.
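
As an illustration of the preallocation point (the image name and size are just
examples; "data" and "server1" are from your description):

# qemu-img create -f qcow2 -o preallocation=full \
  gluster://server1/data/vm1.qcow2 100G
# gluster volume set data performance.client-io-threads on

With a fully preallocated image, writes from the guest land on shards that
already exist, so the create-shard and size-update round trips described above
are avoided.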

-Krutika

On Mon, Sep 4, 2017 at 7:48 PM, Roei G  wrote:

> Hey everyone!
> I have deployed gluster on 3 nodes with 4 SSDs each and 10Gb Ethernet
> connection.
>
> The storage is configured with 3 gluster volumes, every volume has 12
> bricks (4 bricks on every server, 1 per ssd in the server).
>
> With the 'features.shard' option off, my write speed (using the 'dd'
> command) is approximately 250 MB/s, and when the feature is on the write
> speed is around 130 MB/s.
>
> - gluster version 3.8.13 
>
> Volume name: data
> Number of bricks : 4 * 3 = 12
> Bricks:
> Brick1: server1:/brick/data1
> Brick2: server1:/brick/data2
> Brick3: server1:/brick/data3
> Brick4: server1:/brick/data4
> Brick5: server2:/brick/data1
> .
> .
> .
> Options reconfigure:
> Performance.strict-o-direct: off
> Cluster.nufa: off
> Features.shard-block-size: 512MB
> Features.shard: on
> Cluster.server-quorum-type: server
> Cluster.quorum-type: auto
> Cluster.eager-lock: enable
> Network.remote-dio: on
> Performance.readdir-ahead: on
>
> Any idea on how to improve my performance?
>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Slow performance of gluster volume

2017-09-04 Thread Krutika Dhananjay
I'm assuming you are using this volume to store vm images, because I see
shard in the options list.

Speaking from shard translator's POV, one thing you can do to improve
performance is to use preallocated images.
This will at least eliminate the need for shard to perform multiple steps
as part of the writes - such as creating the shard and then writing to it
and then updating the aggregated file size - all of which require one
network call each, which further get blown up once they reach AFR
(replicate) into many more network calls.

Second, I'm assuming you're using the default shard block size of 4MB (you
can confirm this using `gluster volume get <volname> shard-block-size`). In our
tests, we've found that larger shard sizes perform better. So maybe change
the shard-block-size to 64MB (`gluster volume set <volname> shard-block-size
64MB`).

Third, keep stat-prefetch enabled. We've found that qemu sends quite a lot
of [f]stats which can be served from the (md)cache to improve performance.
So enable that.

Also, could you also enable client-io-threads and see if that improves
performance?
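
Putting the above together for your "vms" volume, the changes would look
roughly like this (64MB is just the suggested starting point):

# gluster volume set vms features.shard-block-size 64MB
# gluster volume set vms performance.stat-prefetch on
# gluster volume set vms performance.client-io-threads on

Note that, if I'm not mistaken, a changed shard-block-size only applies to
images created after the change, so it is best evaluated with a freshly
created, preallocated image.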

Which version of gluster are you using BTW?

-Krutika


On Tue, Sep 5, 2017 at 4:32 AM, Abi Askushi  wrote:

> Hi all,
>
> I have a gluster volume used to host several VMs (managed through oVirt).
> The volume is a replica 3 with arbiter and the 3 servers use 1 Gbit
> network for the storage.
>
> When testing with dd (dd if=/dev/zero of=testfile bs=1G count=1
> oflag=direct) out of the volume (e.g. writing at /root/) the performance of
> the dd is reported to be ~ 700MB/s, which is quite decent. When testing the
> dd on the gluster volume I get ~ 43 MB/s which way lower from the previous.
> When testing with dd the gluster volume, the network traffic was not
> exceeding 450 Mbps on the network interface. I would expect to reach near
> 900 Mbps considering that there is 1 Gbit of bandwidth available. This
> results having VMs with very slow performance (especially on their write
> operations).
>
> The full details of the volume are below. Any advise on what can be
> tweaked will be highly appreciated.
>
> Volume Name: vms
> Type: Replicate
> Volume ID: 4513340d-7919-498b-bfe0-d836b5cea40b
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: gluster0:/gluster/vms/brick
> Brick2: gluster1:/gluster/vms/brick
> Brick3: gluster2:/gluster/vms/brick (arbiter)
> Options Reconfigured:
> cluster.granular-entry-heal: enable
> performance.strict-o-direct: on
> network.ping-timeout: 30
> storage.owner-gid: 36
> storage.owner-uid: 36
> user.cifs: off
> features.shard: on
> cluster.shd-wait-qlength: 1
> cluster.shd-max-threads: 8
> cluster.locking-scheme: granular
> cluster.data-self-heal-algorithm: full
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.eager-lock: enable
> network.remote-dio: off
> performance.low-prio-threads: 32
> performance.stat-prefetch: off
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> transport.address-family: inet
> performance.readdir-ahead: on
> nfs.disable: on
> nfs.export-volumes: on
>
>
> Thanx,
> Alex
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Bug 1473150 - features/shard:Lookup on shard 18 failed. Base file gfid = b00f5de2-d811-44fe-80e5-1f382908a55a [No data available], the [No data available]

2017-07-24 Thread Krutika Dhananjay
+gluster-users ML

Hi,

I've responded to your bug report here -
https://bugzilla.redhat.com/show_bug.cgi?id=1473150#c3
Kindly let us know if the patch fixes your bug.


-Krutika

On Thu, Jul 20, 2017 at 3:12 PM, zhangjianwei1...@163.com <
zhangjianwei1...@163.com> wrote:

> Hi  Krutika Dhananjay, Pranith Kumar Karampuri,
>  Thank you for your reply!
>
>  I am awaiting your good news!
>
>  Thank you for your hard work!
>
> --
> zhangjianwei1...@163.com
>
>
> *From:* Krutika Dhananjay <kdhan...@redhat.com>
> *Date:* 2017-07-20 17:34
> *To:* Pranith Kumar Karampuri <pkara...@redhat.com>
> *CC:* 张建伟 <zhangjianwei1...@163.com>
> *Subject:* Re: Bug 1473150 - features/shard:Lookup on shard 18 failed.
> Base file gfid = b00f5de2-d811-44fe-80e5-1f382908a55a [No data
> available], the [No data available]
> Hi 张建伟,
>
> Thanks for your email. I am currently looking into a customer issue. I
> will get back to you as soon as I'm done with it.
>
> -Krutika
>
> On Thu, Jul 20, 2017 at 2:17 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>> Krutika is working on a similar ENODATA bug with distribute xlator. This
>> bug looks similar to it. Krutika knows more details about this issue.
>>  This is lunch time in India. Expect some delay.
>>
>> On Thu, Jul 20, 2017 at 1:08 PM, 张建伟 <zhangjianwei1...@163.com> wrote:
>>
>>> Hi,
>>> Nice to meet you!
>>> Recently, I have been testing the features/shard module and found some
>>> problems in it.
>>> I have submitted the problem to https://bugzilla.redhat.com
>>> /show_bug.cgi?id=1473150
>>>
>>> The shard_glusterfs_log.tar.gz is my test results log.
>>> I hope you can help me!
>>> Thank you very much!
>>>
>>> Best regards!!!
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Pranith
>>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Very slow performance on Sharded GlusterFS

2017-07-12 Thread Krutika Dhananjay
Hi,

Sorry for the late response.
No, the eager-lock experiment was more to see if the implementation had any
new bugs.
It doesn't look like it does. I think having it on would be the right thing
to do. It will reduce the number of fops having to go over the network.

Coming to the performance drop, I compared the volume profile output for
stripe and 32MB shard again.
The only thing that is striking is the number of xattrops and inodelks,
which is only 2-4 for the striped volume
whereas the number is much bigger in the case of the sharded volume. This is
unfortunately likely with sharding because
the eager-locking and delayed post-op optimizations will now only be
applicable on a per-shard basis.
The larger the shard size, the better, to work around this issue.
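
(For example - assuming the volume is still the "testvol" from your earlier
mails - the shard size can be checked and raised like this:

# gluster volume get testvol features.shard-block-size
# gluster volume set testvol features.shard-block-size 128MB

Keep in mind that, as far as I know, the new size only takes effect for files
created after the change, so the test file would need to be recreated.)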

Meanwhile, let me think about how we can get this fixed in code.

-Krutika



On Mon, Jul 10, 2017 at 7:59 PM, <gen...@gencgiyen.com> wrote:

> Hi Krutika,
>
>
>
> May I kindly ping you and ask if you have any idea yet, or have figured
> out what the issue may be?
>
>
>
> I am awaiting your reply with four eyes :)
>
>
>
> Apologies for the ping :)
>
>
>
> -Gencer.
>
>
>
> *From:* gluster-users-boun...@gluster.org [mailto:gluster-users-bounces@
> gluster.org] *On Behalf Of *gen...@gencgiyen.com
> *Sent:* Thursday, July 6, 2017 11:06 AM
>
> *To:* 'Krutika Dhananjay' <kdhan...@redhat.com>
> *Cc:* 'gluster-user' <gluster-users@gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Hi Krutika,
>
>
>
> I also did one more test. I re-created another volume (a single volume; the old
> one was destroyed and deleted), then did 2 dd tests, one for 1GB and the other for 2GB. Both
> are with 32MB shard and eager-lock off.
>
>
>
> Samples:
>
>
>
> sr:~# gluster volume profile testvol start
>
> Starting volume profile on testvol has been successful
>
> sr:~# dd if=/dev/zero of=/testvol/dtestfil0xb bs=1G count=1
>
> 1+0 records in
>
> 1+0 records out
>
> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 12.2708 s, 87.5 MB/s
>
> sr:~# gluster volume profile testvol info > /32mb_shard_and_1gb_dd.log
>
> sr:~# gluster volume profile testvol stop
>
> Stopping volume profile on testvol has been successful
>
> sr:~# gluster volume profile testvol start
>
> Starting volume profile on testvol has been successful
>
> sr:~# dd if=/dev/zero of=/testvol/dtestfil0xb bs=1G count=2
>
> 2+0 records in
>
> 2+0 records out
>
> 2147483648 bytes (2.1 GB, 2.0 GiB) copied, 23.5457 s, 91.2 MB/s
>
> sr:~# gluster volume profile testvol info > /32mb_shard_and_2gb_dd.log
>
> sr:~# gluster volume profile testvol stop
>
> Stopping volume profile on testvol has been successful
>
>
>
> Also here is volume info:
>
>
>
> sr:~# gluster volume info testvol
>
>
>
> Volume Name: testvol
>
> Type: Distributed-Replicate
>
> Volume ID: 3cc06d95-06e9-41f8-8b26-e997886d7ba1
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 10 x 2 = 20
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: sr-09-loc-50-14-18:/bricks/brick1
>
> Brick2: sr-10-loc-50-14-18:/bricks/brick1
>
> Brick3: sr-09-loc-50-14-18:/bricks/brick2
>
> Brick4: sr-10-loc-50-14-18:/bricks/brick2
>
> Brick5: sr-09-loc-50-14-18:/bricks/brick3
>
> Brick6: sr-10-loc-50-14-18:/bricks/brick3
>
> Brick7: sr-09-loc-50-14-18:/bricks/brick4
>
> Brick8: sr-10-loc-50-14-18:/bricks/brick4
>
> Brick9: sr-09-loc-50-14-18:/bricks/brick5
>
> Brick10: sr-10-loc-50-14-18:/bricks/brick5
>
> Brick11: sr-09-loc-50-14-18:/bricks/brick6
>
> Brick12: sr-10-loc-50-14-18:/bricks/brick6
>
> Brick13: sr-09-loc-50-14-18:/bricks/brick7
>
> Brick14: sr-10-loc-50-14-18:/bricks/brick7
>
> Brick15: sr-09-loc-50-14-18:/bricks/brick8
>
> Brick16: sr-10-loc-50-14-18:/bricks/brick8
>
> Brick17: sr-09-loc-50-14-18:/bricks/brick9
>
> Brick18: sr-10-loc-50-14-18:/bricks/brick9
>
> Brick19: sr-09-loc-50-14-18:/bricks/brick10
>
> Brick20: sr-10-loc-50-14-18:/bricks/brick10
>
> Options Reconfigured:
>
> cluster.eager-lock: off
>
> features.shard-block-size: 32MB
>
> features.shard: on
>
> transport.address-family: inet
>
> nfs.disable: on
>
>
>
> See attached results and sorry for the multiple e-mails. I just want to
> make sure that I provided correct results for the tests.
>
>
>
> Thanks,
>
> Gencer.
>
>
>
> *From:* gluster-users-boun...@gluster.org [mailto:gluster-users-bounces@
> gluster.org <gluster-users-boun...@gluster.org>] *On Behalf Of *
> gen...@gencgiyen.com
> *Sent:* Thursday, July 6, 2017 10:34 AM
> *To:* 'Krutika Dhana

Re: [Gluster-users] Very slow performance on Sharded GlusterFS

2017-07-05 Thread Krutika Dhananjay
What if you disabled eager lock and run your test again on the sharded
configuration along with the profile output?

> # gluster volume set <volname> cluster.eager-lock off

-Krutika

On Tue, Jul 4, 2017 at 9:03 PM, Krutika Dhananjay <kdhan...@redhat.com>
wrote:

> Thanks. I think reusing the same volume was the cause of lack of IO
> distribution.
> The latest profile output looks much more realistic and in line with what I
> would expect.
>
> Let me analyse the numbers a bit and get back.
>
> -Krutika
>
> On Tue, Jul 4, 2017 at 12:55 PM, <gen...@gencgiyen.com> wrote:
>
>> Hi Krutika,
>>
>>
>>
>> Thank you so much for your reply. Let me answer all:
>>
>>
>>
>>1. I have no idea why it did not get distributed over all bricks.
>>2. Hm.. This is really weird.
>>
>>
>>
>> And others;
>>
>>
>>
>> No. I use only one volume. When I tested sharded and striped volumes, I
>> manually stopped the volume, deleted the volume, purged the data (the data inside the
>> bricks/disks) and re-created it by using this command:
>>
>>
>>
>> sudo gluster volume create testvol replica 2
>> sr-09-loc-50-14-18:/bricks/brick1 sr-10-loc-50-14-18:/bricks/brick1
>> sr-09-loc-50-14-18:/bricks/brick2 sr-10-loc-50-14-18:/bricks/brick2
>> sr-09-loc-50-14-18:/bricks/brick3 sr-10-loc-50-14-18:/bricks/brick3
>> sr-09-loc-50-14-18:/bricks/brick4 sr-10-loc-50-14-18:/bricks/brick4
>> sr-09-loc-50-14-18:/bricks/brick5 sr-10-loc-50-14-18:/bricks/brick5
>> sr-09-loc-50-14-18:/bricks/brick6 sr-10-loc-50-14-18:/bricks/brick6
>> sr-09-loc-50-14-18:/bricks/brick7 sr-10-loc-50-14-18:/bricks/brick7
>> sr-09-loc-50-14-18:/bricks/brick8 sr-10-loc-50-14-18:/bricks/brick8
>> sr-09-loc-50-14-18:/bricks/brick9 sr-10-loc-50-14-18:/bricks/brick9
>> sr-09-loc-50-14-18:/bricks/brick10 sr-10-loc-50-14-18:/bricks/brick10
>> force
>>
>>
>>
>> and of course after that, volume start is executed. If shard is to be enabled, I
>> enable that feature BEFORE I start the sharded volume and then mount.
>>
>>
>>
>> I tried converting from one to another, but then I saw the documentation says
>> a clean volume should be better. So I tried the clean method. Still the same
>> performance.
>>
>>
>>
>> The test file grows from 1GB to 5GB. And the tests are dd. See this example:
>>
>>
>>
>> dd if=/dev/zero of=/mnt/testfile bs=1G count=5
>>
>> 5+0 records in
>>
>> 5+0 records out
>>
>> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 66.7978 s, 80.4 MB/s
>>
>>
>>
>>
>>
>> >> dd if=/dev/zero of=/mnt/testfile bs=5G count=1
>>
>> This also gives same result. (bs and count reversed)
>>
>>
>>
>>
>>
>> And this example have generated a profile which I also attached to this
>> e-mail.
>>
>>
>>
>> Is there anything that I can try? I am open to all kinds of suggestions.
>>
>>
>>
>> Thanks,
>>
>> Gencer.
>>
>>
>>
>> *From:* Krutika Dhananjay [mailto:kdhan...@redhat.com]
>> *Sent:* Tuesday, July 4, 2017 9:39 AM
>>
>> *To:* gen...@gencgiyen.com
>> *Cc:* gluster-user <gluster-users@gluster.org>
>> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>>
>>
>>
>> Hi Gencer,
>>
>> I just checked the volume-profile attachments.
>>
>> Things that seem really odd to me as far as the sharded volume is
>> concerned:
>>
>> 1. Only the replica pair having bricks 5 and 6 on both nodes 09 and 10
>> seems to have witnessed all the IO. No other bricks witnessed any write
>> operations. This is unacceptable for a volume that has 8 other replica
>> sets. Why didn't the shards get distributed across all of these sets?
>>
>>
>>
>> 2. For replica set consisting of bricks 5 and 6 of node 09, I see that
>> the brick 5 is spending 99% of its time in FINODELK fop, when the fop that
>> should have dominated its profile should have been in fact WRITE.
>>
>> Could you throw some more light on your setup from gluster standpoint?
>> * For instance, are you using two different gluster volumes to gather
>> these numbers - one distributed-replicated-striped and another
>> distributed-replicated-sharded? Or are you merely converting a single
>> volume from one type to another?
>>
>>
>>
>> * And if there are indeed two volumes, could you share both their `volume
>> info` outputs to eliminate any confusion?
>>
>> * If there's just one volume, are you taking care to remove all da

Re: [Gluster-users] Very slow performance on Sharded GlusterFS

2017-07-04 Thread Krutika Dhananjay
Thanks. I think reusing the same volume was the cause of lack of IO
distribution.
The latest profile output looks much more realistic and in line with what I
would expect.

Let me analyse the numbers a bit and get back.

-Krutika

On Tue, Jul 4, 2017 at 12:55 PM, <gen...@gencgiyen.com> wrote:

> Hi Krutika,
>
>
>
> Thank you so much for your reply. Let me answer all:
>
>
>
>1. I have no idea why it did not get distributed over all bricks.
>2. Hm.. This is really weird.
>
>
>
> And others;
>
>
>
> No. I use only one volume. When I tested sharded and striped volumes, I
> manually stopped the volume, deleted the volume, purged the data (the data inside the
> bricks/disks) and re-created it by using this command:
>
>
>
> sudo gluster volume create testvol replica 2 sr-09-loc-50-14-18:/bricks/brick1
> sr-10-loc-50-14-18:/bricks/brick1 sr-09-loc-50-14-18:/bricks/brick2
> sr-10-loc-50-14-18:/bricks/brick2 sr-09-loc-50-14-18:/bricks/brick3
> sr-10-loc-50-14-18:/bricks/brick3 sr-09-loc-50-14-18:/bricks/brick4
> sr-10-loc-50-14-18:/bricks/brick4 sr-09-loc-50-14-18:/bricks/brick5
> sr-10-loc-50-14-18:/bricks/brick5 sr-09-loc-50-14-18:/bricks/brick6
> sr-10-loc-50-14-18:/bricks/brick6 sr-09-loc-50-14-18:/bricks/brick7
> sr-10-loc-50-14-18:/bricks/brick7 sr-09-loc-50-14-18:/bricks/brick8
> sr-10-loc-50-14-18:/bricks/brick8 sr-09-loc-50-14-18:/bricks/brick9
> sr-10-loc-50-14-18:/bricks/brick9 sr-09-loc-50-14-18:/bricks/brick10
> sr-10-loc-50-14-18:/bricks/brick10 force
>
>
>
> and of course after that, volume start is executed. If shard is to be enabled, I enable
> that feature BEFORE I start the sharded volume and then mount.
>
>
>
> I tried converting from one to another, but then I saw the documentation says
> a clean volume should be better. So I tried the clean method. Still the same
> performance.
>
>
>
> The test file grows from 1GB to 5GB. And the tests are dd. See this example:
>
>
>
> dd if=/dev/zero of=/mnt/testfile bs=1G count=5
>
> 5+0 records in
>
> 5+0 records out
>
> 5368709120 bytes (5.4 GB, 5.0 GiB) copied, 66.7978 s, 80.4 MB/s
>
>
>
>
>
> >> dd if=/dev/zero of=/mnt/testfile bs=5G count=1
>
> This also gives same result. (bs and count reversed)
>
>
>
>
>
> And this example have generated a profile which I also attached to this
> e-mail.
>
>
>
> Is there anything that I can try? I am open to all kinds of suggestions.
>
>
>
> Thanks,
>
> Gencer.
>
>
>
> *From:* Krutika Dhananjay [mailto:kdhan...@redhat.com]
> *Sent:* Tuesday, July 4, 2017 9:39 AM
>
> *To:* gen...@gencgiyen.com
> *Cc:* gluster-user <gluster-users@gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Hi Gencer,
>
> I just checked the volume-profile attachments.
>
> Things that seem really odd to me as far as the sharded volume is
> concerned:
>
> 1. Only the replica pair having bricks 5 and 6 on both nodes 09 and 10
> seems to have witnessed all the IO. No other bricks witnessed any write
> operations. This is unacceptable for a volume that has 8 other replica
> sets. Why didn't the shards get distributed across all of these sets?
>
>
>
> 2. For replica set consisting of bricks 5 and 6 of node 09, I see that the
> brick 5 is spending 99% of its time in FINODELK fop, when the fop that
> should have dominated its profile should have been in fact WRITE.
>
> Could you throw some more light on your setup from gluster standpoint?
> * For instance, are you using two different gluster volumes to gather
> these numbers - one distributed-replicated-striped and another
> distributed-replicated-sharded? Or are you merely converting a single
> volume from one type to another?
>
>
>
> * And if there are indeed two volumes, could you share both their `volume
> info` outputs to eliminate any confusion?
>
> * If there's just one volume, are you taking care to remove all data from
> the mount point of this volume before converting it?
>
> * What is the size the test file grew to?
>
> * These attached profiles are against dd runs? Or the file download test?
>
>
>
> -Krutika
>
>
>
>
>
> On Mon, Jul 3, 2017 at 8:42 PM, <gen...@gencgiyen.com> wrote:
>
> Hi Krutika,
>
>
>
> Have you be able to look out my profiles? Do you have any clue, idea or
> suggestion?
>
>
>
> Thanks,
>
> -Gencer
>
>
>
> *From:* Krutika Dhananjay [mailto:kdhan...@redhat.com]
> *Sent:* Friday, June 30, 2017 3:50 PM
>
>
> *To:* gen...@gencgiyen.com
> *Cc:* gluster-user <gluster-users@gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> J

Re: [Gluster-users] Very slow performance on Sharded GlusterFS

2017-07-04 Thread Krutika Dhananjay
Hi Gencer,

I just checked the volume-profile attachments.

Things that seem really odd to me as far as the sharded volume is concerned:

1. Only the replica pair having bricks 5 and 6 on both nodes 09 and 10
seems to have witnessed all the IO. No other bricks witnessed any write
operations. This is unacceptable for a volume that has 8 other replica
sets. Why didn't the shards get distributed across all of these sets?

2. For the replica set consisting of bricks 5 and 6 of node 09, I see that
brick 5 is spending 99% of its time in the FINODELK fop, whereas the fop that
should have dominated its profile is in fact WRITE.

Could you throw some more light on your setup from gluster standpoint?
* For instance, are you using two different gluster volumes to gather these
numbers - one distributed-replicated-striped and another
distributed-replicated-sharded? Or are you merely converting a single
volume from one type to another?

* And if there are indeed two volumes, could you share both their `volume
info` outputs to eliminate any confusion?

* If there's just one volume, are you taking care to remove all data from
the mount point of this volume before converting it?

* What is the size the test file grew to?

* These attached profiles are against dd runs? Or the file download test?

-Krutika



On Mon, Jul 3, 2017 at 8:42 PM, <gen...@gencgiyen.com> wrote:

> Hi Krutika,
>
>
>
> Have you be able to look out my profiles? Do you have any clue, idea or
> suggestion?
>
>
>
> Thanks,
>
> -Gencer
>
>
>
> *From:* Krutika Dhananjay [mailto:kdhan...@redhat.com]
> *Sent:* Friday, June 30, 2017 3:50 PM
>
> *To:* gen...@gencgiyen.com
> *Cc:* gluster-user <gluster-users@gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Just noticed that the way you have configured your brick order during
> volume-create makes both replicas of every set reside on the same machine.
>
> That apart, do you see any difference if you change shard-block-size to
> 512MB? Could you try that?
>
> If it doesn't help, could you share the volume-profile output for both the
> tests (separate)?
>
> Here's what you do:
>
> 1. Start profile before starting your test - it could be dd or it could be
> file download.
>
> # gluster volume profile <volname> start
>
> 2. Run your test - again either dd or file-download.
>
> 3. Once the test has completed, run `gluster volume profile <volname> info`
> and redirect its output to a tmp file.
>
> 4. Stop profile
>
> # gluster volume profile <volname> stop
>
> And attach the volume-profile output file that you saved at a temporary
> location in step 3.
>
> -Krutika
>
>
>
> On Fri, Jun 30, 2017 at 5:33 PM, <gen...@gencgiyen.com> wrote:
>
> Hi Krutika,
>
>
>
> Sure, here is volume info:
>
>
>
> root@sr-09-loc-50-14-18:/# gluster volume info testvol
>
>
>
> Volume Name: testvol
>
> Type: Distributed-Replicate
>
> Volume ID: 30426017-59d5-4091-b6bc-279a905b704a
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 10 x 2 = 20
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: sr-09-loc-50-14-18:/bricks/brick1
>
> Brick2: sr-09-loc-50-14-18:/bricks/brick2
>
> Brick3: sr-09-loc-50-14-18:/bricks/brick3
>
> Brick4: sr-09-loc-50-14-18:/bricks/brick4
>
> Brick5: sr-09-loc-50-14-18:/bricks/brick5
>
> Brick6: sr-09-loc-50-14-18:/bricks/brick6
>
> Brick7: sr-09-loc-50-14-18:/bricks/brick7
>
> Brick8: sr-09-loc-50-14-18:/bricks/brick8
>
> Brick9: sr-09-loc-50-14-18:/bricks/brick9
>
> Brick10: sr-09-loc-50-14-18:/bricks/brick10
>
> Brick11: sr-10-loc-50-14-18:/bricks/brick1
>
> Brick12: sr-10-loc-50-14-18:/bricks/brick2
>
> Brick13: sr-10-loc-50-14-18:/bricks/brick3
>
> Brick14: sr-10-loc-50-14-18:/bricks/brick4
>
> Brick15: sr-10-loc-50-14-18:/bricks/brick5
>
> Brick16: sr-10-loc-50-14-18:/bricks/brick6
>
> Brick17: sr-10-loc-50-14-18:/bricks/brick7
>
> Brick18: sr-10-loc-50-14-18:/bricks/brick8
>
> Brick19: sr-10-loc-50-14-18:/bricks/brick9
>
> Brick20: sr-10-loc-50-14-18:/bricks/brick10
>
> Options Reconfigured:
>
> features.shard-block-size: 32MB
>
> features.shard: on
>
> transport.address-family: inet
>
> nfs.disable: on
>
>
>
> -Gencer.
>
>
>
> *From:* Krutika Dhananjay [mailto:kdhan...@redhat.com]
> *Sent:* Friday, June 30, 2017 2:50 PM
> *To:* gen...@gencgiyen.com
> *Cc:* gluster-user <gluster-users@gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Could you please provide the volume-info output?
>
> -Krutika
>
>
>
> On Fri, Jun 30, 2017 at 4:23 PM, <gen.

Re: [Gluster-users] Very slow performance on Sharded GlusterFS

2017-06-30 Thread Krutika Dhananjay
Just noticed that the way you have configured your brick order during
volume-create makes both replicas of every set reside on the same machine.

That apart, do you see any difference if you change shard-block-size to
512MB? Could you try that?

If it doesn't help, could you share the volume-profile output for both the
tests (separate)?

Here's what you do:
1. Start profile before starting your test - it could be dd or it could be
file download.
# gluster volume profile <volname> start

2. Run your test - again either dd or file-download.

3. Once the test has completed, run `gluster volume profile <volname> info` and
redirect its output to a tmp file.

4. Stop profile
# gluster volume profile <volname> stop

And attach the volume-profile output file that you saved at a temporary
location in step 3.
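
(A complete example run, with "testvol" standing in for your volume name, would
look something like:

# gluster volume profile testvol start
# dd if=/dev/zero of=/mnt/testfile bs=1G count=5
# gluster volume profile testvol info > /tmp/testvol-profile.txt
# gluster volume profile testvol stop

and then attach /tmp/testvol-profile.txt.)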

-Krutika


On Fri, Jun 30, 2017 at 5:33 PM, <gen...@gencgiyen.com> wrote:

> Hi Krutika,
>
>
>
> Sure, here is volume info:
>
>
>
> root@sr-09-loc-50-14-18:/# gluster volume info testvol
>
>
>
> Volume Name: testvol
>
> Type: Distributed-Replicate
>
> Volume ID: 30426017-59d5-4091-b6bc-279a905b704a
>
> Status: Started
>
> Snapshot Count: 0
>
> Number of Bricks: 10 x 2 = 20
>
> Transport-type: tcp
>
> Bricks:
>
> Brick1: sr-09-loc-50-14-18:/bricks/brick1
>
> Brick2: sr-09-loc-50-14-18:/bricks/brick2
>
> Brick3: sr-09-loc-50-14-18:/bricks/brick3
>
> Brick4: sr-09-loc-50-14-18:/bricks/brick4
>
> Brick5: sr-09-loc-50-14-18:/bricks/brick5
>
> Brick6: sr-09-loc-50-14-18:/bricks/brick6
>
> Brick7: sr-09-loc-50-14-18:/bricks/brick7
>
> Brick8: sr-09-loc-50-14-18:/bricks/brick8
>
> Brick9: sr-09-loc-50-14-18:/bricks/brick9
>
> Brick10: sr-09-loc-50-14-18:/bricks/brick10
>
> Brick11: sr-10-loc-50-14-18:/bricks/brick1
>
> Brick12: sr-10-loc-50-14-18:/bricks/brick2
>
> Brick13: sr-10-loc-50-14-18:/bricks/brick3
>
> Brick14: sr-10-loc-50-14-18:/bricks/brick4
>
> Brick15: sr-10-loc-50-14-18:/bricks/brick5
>
> Brick16: sr-10-loc-50-14-18:/bricks/brick6
>
> Brick17: sr-10-loc-50-14-18:/bricks/brick7
>
> Brick18: sr-10-loc-50-14-18:/bricks/brick8
>
> Brick19: sr-10-loc-50-14-18:/bricks/brick9
>
> Brick20: sr-10-loc-50-14-18:/bricks/brick10
>
> Options Reconfigured:
>
> features.shard-block-size: 32MB
>
> features.shard: on
>
> transport.address-family: inet
>
> nfs.disable: on
>
>
>
> -Gencer.
>
>
>
> *From:* Krutika Dhananjay [mailto:kdhan...@redhat.com]
> *Sent:* Friday, June 30, 2017 2:50 PM
> *To:* gen...@gencgiyen.com
> *Cc:* gluster-user <gluster-users@gluster.org>
> *Subject:* Re: [Gluster-users] Very slow performance on Sharded GlusterFS
>
>
>
> Could you please provide the volume-info output?
>
> -Krutika
>
>
>
> On Fri, Jun 30, 2017 at 4:23 PM, <gen...@gencgiyen.com> wrote:
>
> Hi,
>
>
>
> I have an 2 nodes with 20 bricks in total (10+10).
>
>
>
> First test:
>
>
>
> 2 Nodes with Distributed – Striped – Replicated (2 x 2)
>
> 10GbE Speed between nodes
>
>
>
> “dd” performance: 400mb/s and higher
>
> Downloading a large file from internet and directly to the gluster:
> 250-300mb/s
>
>
>
> Now the same test without Stripe but with sharding. These results are the same whether
> I set the shard size to 4MB or 32MB. (Again 2x Replica here)
>
>
>
> Dd performance: 70mb/s
>
> Download directly to the gluster performance : 60mb/s
>
>
>
> Now, if we do this test twice at the same time (two dd runs or two downloads at
> the same time) it goes below 25MB/s each, or slower.
>
>
>
> I thought sharding would be at least equal or a little slower (maybe?), but these
> results are terribly slow.
>
>
>
> I tried tuning (cache, window-size etc..). Nothing helps.
>
>
>
> GlusterFS 3.11 and Debian 9 used. Kernel also tuned. Disks are “xfs” and
> 4TB each.
>
>
>
> Is there any tweak/tuning out there to make it fast?
>
>
>
> Or is this expected behavior? If it is, it is unacceptable. So slow. I
> cannot use this in production as it is terribly slow.
>
>
>
> The reason I use shard instead of stripe is that I would like to
> eliminate files that are bigger than the brick size.
>
>
>
> Thanks,
>
> Gencer.
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Very slow performance on Sharded GlusterFS

2017-06-30 Thread Krutika Dhananjay
Could you please provide the volume-info output?

-Krutika

On Fri, Jun 30, 2017 at 4:23 PM,  wrote:

> Hi,
>
>
>
> I have 2 nodes with 20 bricks in total (10+10).
>
>
>
> First test:
>
>
>
> 2 Nodes with Distributed – Striped – Replicated (2 x 2)
>
> 10GbE Speed between nodes
>
>
>
> “dd” performance: 400mb/s and higher
>
> Downloading a large file from internet and directly to the gluster:
> 250-300mb/s
>
>
>
> Now the same test without Stripe but with sharding. The results are the same
> whether I set the shard size to 4MB or 32MB. (Again 2x replica here)
>
>
>
> dd performance: 70mb/s
>
> Download directly to the gluster performance: 60mb/s
>
>
>
> Now, if we do this test twice at the same time (two dd runs or two downloads
> at the same time) it goes below 25mb/s each or slower.
>
>
>
> I thought sharding would be at least equal or a little slower (maybe?), but
> these results are terribly slow.
>
>
>
> I tried tuning (cache, window-size, etc.). Nothing helps.
>
>
>
> GlusterFS 3.11 and Debian 9 used. Kernel also tuned. Disks are “xfs” and
> 4TB each.
>
>
>
> Is there any tweak/tuning out there to make it fast?
>
>
>
> Or is this expected behavior? If it is, it is unacceptably slow. I
> cannot use this in production as it is terribly slow.
>
>
>
> The reason I use shard instead of stripe is that I would like to
> eliminate files that are bigger than the brick size.
>
>
>
> Thanks,
>
> Gencer.
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [ovirt-users] Very poor GlusterFS performance

2017-06-21 Thread Krutika Dhananjay
No, you don't need to do any of that. Just executing volume-set commands is
sufficient for the changes to take effect.
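
If you want to double-check that a setting has actually been applied, it should
show up under "Options Reconfigured" in the volume-info output, or you can query
it directly (volume name below is a placeholder), e.g.:

# gluster volume get <VOLNAME> performance.stat-prefetch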


-Krutika

On Wed, Jun 21, 2017 at 3:48 PM, Chris Boot <bo...@bootc.net> wrote:

> [replying to lists this time]
>
> On 20/06/17 11:23, Krutika Dhananjay wrote:
> > Couple of things:
> >
> > 1. Like Darrell suggested, you should enable stat-prefetch and increase
> > client and server event threads to 4.
> > # gluster volume set <VOLNAME> performance.stat-prefetch on
> > # gluster volume set <VOLNAME> client.event-threads 4
> > # gluster volume set <VOLNAME> server.event-threads 4
> >
> > 2. Also glusterfs-3.10.1 and above has a shard performance bug fix -
> > https://review.gluster.org/#/c/16966/
> >
> > With these two changes, we saw great improvement in performance in our
> > internal testing.
>
> Hi Krutika,
>
> Thanks for your input. I have yet to run any benchmarks, but I'll do
> that once I have a bit more time to work on this.
>
> I've tweaked the options as you suggest, but that doesn't seem to have
> made an appreciable difference. I admit that without benchmarks it's a
> bit like sticking your finger in the air, though. Do I need to restart
> my bricks and/or remount the volumes for these to take effect?
>
> I'm actually running GlusterFS 3.10.2-1. This is all coming from the
> CentOS Storage SIG's centos-release-gluster310 repository.
>
> Thanks again.
>
> Chris
>
> --
> Chris Boot
> bo...@bootc.net
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [ovirt-users] Very poor GlusterFS performance

2017-06-21 Thread Krutika Dhananjay
No. It's just that in the internal testing that was done here, increasing
the thread count beyond 4 did not improve the performance any further.
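
If you want to see what a volume is currently using before changing anything,
the same values can be queried directly (<VOLNAME> is a placeholder), e.g.:

# gluster volume get <VOLNAME> client.event-threads
# gluster volume get <VOLNAME> server.event-threads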

-Krutika

On Tue, Jun 20, 2017 at 11:30 PM, mabi  wrote:

> Dear Krutika,
>
> Sorry for asking so naively, but can you tell me what factor you base the
> recommendation on that the client and server event-threads parameters for a
> volume should be set to 4?
>
> Is this metric for example based on the number of cores a GlusterFS server
> has?
>
> I am asking because I saw my GlusterFS volumes are set to 2 and would like
> to set these parameters to something meaningful for performance tuning. My
> setup is a two node replica with GlusterFS 3.8.11.
>
> Best regards,
> M.
>
>
>
>  Original Message 
> Subject: Re: [Gluster-users] [ovirt-users] Very poor GlusterFS performance
> Local Time: June 20, 2017 12:23 PM
> UTC Time: June 20, 2017 10:23 AM
> From: kdhan...@redhat.com
> To: Lindsay Mathieson 
> gluster-users , oVirt users 
>
> Couple of things:
> 1. Like Darrell suggested, you should enable stat-prefetch and increase
> client and server event threads to 4.
> # gluster volume set <VOLNAME> performance.stat-prefetch on
> # gluster volume set <VOLNAME> client.event-threads 4
> # gluster volume set <VOLNAME> server.event-threads 4
>
> 2. Also glusterfs-3.10.1 and above has a shard performance bug fix -
> https://review.gluster.org/#/c/16966/
>
> With these two changes, we saw great improvement in performance in our
> internal testing.
>
> Do you mind trying these two options above?
> -Krutika
>
> On Tue, Jun 20, 2017 at 1:00 PM, Lindsay Mathieson <
> lindsay.mathie...@gmail.com> wrote:
>
>> Have you tried with:
>>
>> performance.strict-o-direct : off
>> performance.strict-write-ordering : off
>> They can be changed dynamically.
>>
>>
>> On 20 June 2017 at 17:21, Sahina Bose  wrote:
>>
>>> [Adding gluster-users]
>>>
>>> On Mon, Jun 19, 2017 at 8:16 PM, Chris Boot  wrote:
>>>
 Hi folks,

 I have 3x servers in a "hyper-converged" oVirt 4.1.2 + GlusterFS 3.10
 configuration. My VMs run off a replica 3 arbiter 1 volume comprised of
 6 bricks, which themselves live on two SSDs in each of the servers (one
 brick per SSD). The bricks are XFS on LVM thin volumes straight onto the
 SSDs. Connectivity is 10G Ethernet.

 Performance within the VMs is pretty terrible. I experience very low
 throughput and random IO is really bad: it feels like a latency issue.
 On my oVirt nodes the SSDs are not generally very busy. The 10G network
 seems to run without errors (iperf3 gives bandwidth measurements of >=
 9.20 Gbits/sec between the three servers).

 To put this into perspective: I was getting better behaviour from NFS4
 on a gigabit connection than I am with GlusterFS on 10G: that doesn't
 feel right at all.

 My volume configuration looks like this:

 Volume Name: vmssd
 Type: Distributed-Replicate
 Volume ID: d5a5ddd1-a140-4e0d-b514-701cfe464853
 Status: Started
 Snapshot Count: 0
 Number of Bricks: 2 x (2 + 1) = 6
 Transport-type: tcp
 Bricks:
 Brick1: ovirt3:/gluster/ssd0_vmssd/brick
 Brick2: ovirt1:/gluster/ssd0_vmssd/brick
 Brick3: ovirt2:/gluster/ssd0_vmssd/brick (arbiter)
 Brick4: ovirt3:/gluster/ssd1_vmssd/brick
 Brick5: ovirt1:/gluster/ssd1_vmssd/brick
 Brick6: ovirt2:/gluster/ssd1_vmssd/brick (arbiter)
 Options Reconfigured:
 nfs.disable: on
 transport.address-family: inet6
 performance.quick-read: off
 performance.read-ahead: off
 performance.io-cache: off
 performance.stat-prefetch: off
 performance.low-prio-threads: 32
 network.remote-dio: off
 cluster.eager-lock: enable
 cluster.quorum-type: auto
 cluster.server-quorum-type: server
 cluster.data-self-heal-algorithm: full
 cluster.locking-scheme: granular
 cluster.shd-max-threads: 8
 cluster.shd-wait-qlength: 1
 features.shard: on
 user.cifs: off
 storage.owner-uid: 36
 storage.owner-gid: 36
 features.shard-block-size: 128MB
 performance.strict-o-direct: on
 network.ping-timeout: 30
 cluster.granular-entry-heal: enable

 I would really appreciate some guidance on this to try to improve things
 because at this rate I will need to reconsider using GlusterFS
 altogether.

>>>
>>> Could you provide the gluster volume profile output while you're running
>>> your I/O tests.
>>> # gluster volume profile <VOLNAME> start
>>> to start profiling
>>> # gluster volume profile <VOLNAME> info
>>> for the profile output.
>>>
>>>

 Cheers,
 Chris

 --
 Chris Boot
 bo...@bootc.net
 ___
 Users mailing list
 us...@ovirt.org
 http://lists.ovirt.org/mailman/listinfo/users

>>>
>>>
>>> 

Re: [Gluster-users] [ovirt-users] Very poor GlusterFS performance

2017-06-20 Thread Krutika Dhananjay
Couple of things:

1. Like Darrell suggested, you should enable stat-prefetch and increase
client and server event threads to 4.
# gluster volume set <VOLNAME> performance.stat-prefetch on
# gluster volume set <VOLNAME> client.event-threads 4
# gluster volume set <VOLNAME> server.event-threads 4

2. Also glusterfs-3.10.1 and above has a shard performance bug fix -
https://review.gluster.org/#/c/16966/

With these two changes, we saw great improvement in performance in our
internal testing.

Do you mind trying these two options above?

-Krutika

On Tue, Jun 20, 2017 at 1:00 PM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:

> Have you tried with:
>
> performance.strict-o-direct : off
> performance.strict-write-ordering : off
>
> They can be changed dynamically.
>
>
> On 20 June 2017 at 17:21, Sahina Bose  wrote:
>
>> [Adding gluster-users]
>>
>> On Mon, Jun 19, 2017 at 8:16 PM, Chris Boot  wrote:
>>
>>> Hi folks,
>>>
>>> I have 3x servers in a "hyper-converged" oVirt 4.1.2 + GlusterFS 3.10
>>> configuration. My VMs run off a replica 3 arbiter 1 volume comprised of
>>> 6 bricks, which themselves live on two SSDs in each of the servers (one
>>> brick per SSD). The bricks are XFS on LVM thin volumes straight onto the
>>> SSDs. Connectivity is 10G Ethernet.
>>>
>>> Performance within the VMs is pretty terrible. I experience very low
>>> throughput and random IO is really bad: it feels like a latency issue.
>>> On my oVirt nodes the SSDs are not generally very busy. The 10G network
>>> seems to run without errors (iperf3 gives bandwidth measurements of >=
>>> 9.20 Gbits/sec between the three servers).
>>>
>>> To put this into perspective: I was getting better behaviour from NFS4
>>> on a gigabit connection than I am with GlusterFS on 10G: that doesn't
>>> feel right at all.
>>>
>>> My volume configuration looks like this:
>>>
>>> Volume Name: vmssd
>>> Type: Distributed-Replicate
>>> Volume ID: d5a5ddd1-a140-4e0d-b514-701cfe464853
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 2 x (2 + 1) = 6
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: ovirt3:/gluster/ssd0_vmssd/brick
>>> Brick2: ovirt1:/gluster/ssd0_vmssd/brick
>>> Brick3: ovirt2:/gluster/ssd0_vmssd/brick (arbiter)
>>> Brick4: ovirt3:/gluster/ssd1_vmssd/brick
>>> Brick5: ovirt1:/gluster/ssd1_vmssd/brick
>>> Brick6: ovirt2:/gluster/ssd1_vmssd/brick (arbiter)
>>> Options Reconfigured:
>>> nfs.disable: on
>>> transport.address-family: inet6
>>> performance.quick-read: off
>>> performance.read-ahead: off
>>> performance.io-cache: off
>>> performance.stat-prefetch: off
>>> performance.low-prio-threads: 32
>>> network.remote-dio: off
>>> cluster.eager-lock: enable
>>> cluster.quorum-type: auto
>>> cluster.server-quorum-type: server
>>> cluster.data-self-heal-algorithm: full
>>> cluster.locking-scheme: granular
>>> cluster.shd-max-threads: 8
>>> cluster.shd-wait-qlength: 1
>>> features.shard: on
>>> user.cifs: off
>>> storage.owner-uid: 36
>>> storage.owner-gid: 36
>>> features.shard-block-size: 128MB
>>> performance.strict-o-direct: on
>>> network.ping-timeout: 30
>>> cluster.granular-entry-heal: enable
>>>
>>> I would really appreciate some guidance on this to try to improve things
>>> because at this rate I will need to reconsider using GlusterFS
>>> altogether.
>>>
>>
>>
>> Could you provide the gluster volume profile output while you're running
>> your I/O tests.
>>
>> # gluster volume profile <VOLNAME> start
>> to start profiling
>>
>> # gluster volume profile <VOLNAME> info
>>
>> for the profile output.
>>
>>
>>>
>>> Cheers,
>>> Chris
>>>
>>> --
>>> Chris Boot
>>> bo...@bootc.net
>>> ___
>>> Users mailing list
>>> us...@ovirt.org
>>> http://lists.ovirt.org/mailman/listinfo/users
>>>
>>
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
>
> --
> Lindsay
>
> ___
> Users mailing list
> us...@ovirt.org
> http://lists.ovirt.org/mailman/listinfo/users
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Rebalance + VM corruption - current status and request for feedback

2017-06-06 Thread Krutika Dhananjay
Hi Mahdi,

Did you get a chance to verify this fix again?
If this fix works for you, is it OK if we move this bug to CLOSED state and
revert the rebalance-cli warning patch?

-Krutika

On Mon, May 29, 2017 at 6:51 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
wrote:

> Hello,
>
>
> Yes, I forgot to upgrade the client as well.
>
> I did the upgrade and created a new volume, same options as before, with
> one VM running and doing lots of IOs. I started the rebalance with force,
> and after the process completed I rebooted the VM, and it did start
> normally without issues.
>
> I repeated the process and did another rebalance while the VM was running,
> and everything went fine.
>
> But the logs in the client are throwing lots of warning messages:
>
>
> [2017-05-29 13:14:59.416382] W [MSGID: 114031] 
> [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 2-gfs_vol2-client-2: remote operation failed. Path:
> /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-
> f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
> [2017-05-29 13:14:59.416427] W [MSGID: 114031] 
> [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 2-gfs_vol2-client-3: remote operation failed. Path:
> /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-
> f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
> [2017-05-29 13:14:59.808251] W [MSGID: 114031] 
> [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 2-gfs_vol2-client-2: remote operation failed. Path:
> /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-
> f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
> [2017-05-29 13:14:59.808287] W [MSGID: 114031] 
> [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 2-gfs_vol2-client-3: remote operation failed. Path:
> /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-
> f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>
>
>
> Although the process went smooth, i will run another extensive test
> tomorrow just to be sure.
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> --
> *From:* Krutika Dhananjay <kdhan...@redhat.com>
> *Sent:* Monday, May 29, 2017 9:20:29 AM
>
> *To:* Mahdi Adnan
> *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin
> Lemonnier
> *Subject:* Re: Rebalance + VM corruption - current status and request for
> feedback
>
> Hi,
>
> I took a look at your logs.
> It very much seems like an issue that is caused by a mismatch in glusterfs
> client and server packages.
> So your client (mount) seems to be still running 3.7.20, as confirmed by
> the occurrence of the following log message:
>
> [2017-05-26 08:58:23.647458] I [MSGID: 100030] [glusterfsd.c:2338:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
> /rhev/data-center/mnt/glusterSD/s1:_testvol)
> [2017-05-26 08:58:40.901204] I [MSGID: 100030] [glusterfsd.c:2338:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
> /rhev/data-center/mnt/glusterSD/s1:_testvol)
> [2017-05-26 08:58:48.923452] I [MSGID: 100030] [glusterfsd.c:2338:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
> /rhev/data-center/mnt/glusterSD/s1:_testvol)
>
> whereas the servers have rightly been upgraded to 3.10.2, as seen in
> rebalance log:
>
> [2017-05-26 09:36:36.075940] I [MSGID: 100030] [glusterfsd.c:2475:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.10.2
> (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/testvol
> --xlator-option *dht.use-readdirp=yes --xlator-option
> *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes
> --xlator-option *replicate*.data-self-heal=off --xlator-option
> *replicate*.metadata-self-heal=off --xlator-option
> *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on
> --xlator-option *dht.rebalance-cmd=5 --xlator-option
> *dht.node-uuid=7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b --xlator-option
> *dht.commit-hash=3376396580 --socket-file /var/run/gluster/gluster-
>

Re: [Gluster-users] Rebalance + VM corruption - current status and request for feedback

2017-06-04 Thread Krutika Dhananjay
The fixes are already available in 3.10.2, 3.8.12 and 3.11.0

-Krutika

On Sun, Jun 4, 2017 at 5:30 PM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> Great news.
> Is this planned to be published in next release?
>
> Il 29 mag 2017 3:27 PM, "Krutika Dhananjay" <kdhan...@redhat.com> ha
> scritto:
>
>> Thanks for that update. Very happy to hear it ran fine without any
>> issues. :)
>>
>> Yeah so you can ignore those 'No such file or directory' errors. They
>> represent a transient state where DHT in the client process is yet to
>> figure out the new location of the file.
>>
>> -Krutika
>>
>>
>> On Mon, May 29, 2017 at 6:51 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
>> wrote:
>>
>>> Hello,
>>>
>>>
>>> Yes, I forgot to upgrade the client as well.
>>>
>>> I did the upgrade and created a new volume, same options as before, with
>>> one VM running and doing lots of IOs. I started the rebalance with force,
>>> and after the process completed I rebooted the VM, and it did start
>>> normally without issues.
>>>
>>> I repeated the process and did another rebalance while the VM was running,
>>> and everything went fine.
>>>
>>> But the logs in the client are throwing lots of warning messages:
>>>
>>>
>>> [2017-05-29 13:14:59.416382] W [MSGID: 114031]
>>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-2:
>>> remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c
>>> 69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>>> [2017-05-29 13:14:59.416427] W [MSGID: 114031]
>>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-3:
>>> remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c
>>> 69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>>> [2017-05-29 13:14:59.808251] W [MSGID: 114031]
>>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-2:
>>> remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c
>>> 69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>>> [2017-05-29 13:14:59.808287] W [MSGID: 114031]
>>> [client-rpc-fops.c:2928:client3_3_lookup_cbk] 2-gfs_vol2-client-3:
>>> remote operation failed. Path: /50294ed6-db7a-418d-965f-9b44c
>>> 69a83fd/images/d59487fe-f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
>>> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>>>
>>>
>>>
>>> Although the process went smooth, i will run another extensive test
>>> tomorrow just to be sure.
>>>
>>> --
>>>
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> --
>>> *From:* Krutika Dhananjay <kdhan...@redhat.com>
>>> *Sent:* Monday, May 29, 2017 9:20:29 AM
>>>
>>> *To:* Mahdi Adnan
>>> *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin
>>> Lemonnier
>>> *Subject:* Re: Rebalance + VM corruption - current status and request
>>> for feedback
>>>
>>> Hi,
>>>
>>> I took a look at your logs.
>>> It very much seems like an issue that is caused by a mismatch in
>>> glusterfs client and server packages.
>>> So your client (mount) seems to be still running 3.7.20, as confirmed by
>>> the occurrence of the following log message:
>>>
>>> [2017-05-26 08:58:23.647458] I [MSGID: 100030] [glusterfsd.c:2338:main]
>>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
>>> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
>>> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
>>> /rhev/data-center/mnt/glusterSD/s1:_testvol)
>>> [2017-05-26 08:58:40.901204] I [MSGID: 100030] [glusterfsd.c:2338:main]
>>> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
>>> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
>>> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
>>> /rhev/data-center/mnt/glusterSD/s1:_testvol)
>>> [2017-05-26 08:58:48.923452] I [MSGID: 100030] [glusterfsd.c:2338:main]
>

Re: [Gluster-users] Rebalance + VM corruption - current status and request for feedback

2017-05-29 Thread Krutika Dhananjay
Thanks for that update. Very happy to hear it ran fine without any issues.
:)

Yeah so you can ignore those 'No such file or directory' errors. They
represent a transient state where DHT in the client process is yet to
figure out the new location of the file.

-Krutika


On Mon, May 29, 2017 at 6:51 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
wrote:

> Hello,
>
>
> Yes, I forgot to upgrade the client as well.
>
> I did the upgrade and created a new volume, same options as before, with
> one VM running and doing lots of IOs. I started the rebalance with force,
> and after the process completed I rebooted the VM, and it did start
> normally without issues.
>
> I repeated the process and did another rebalance while the VM was running,
> and everything went fine.
>
> But the logs in the client are throwing lots of warning messages:
>
>
> [2017-05-29 13:14:59.416382] W [MSGID: 114031] 
> [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 2-gfs_vol2-client-2: remote operation failed. Path:
> /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-
> f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
> [2017-05-29 13:14:59.416427] W [MSGID: 114031] 
> [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 2-gfs_vol2-client-3: remote operation failed. Path:
> /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-
> f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
> [2017-05-29 13:14:59.808251] W [MSGID: 114031] 
> [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 2-gfs_vol2-client-2: remote operation failed. Path:
> /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-
> f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
> [2017-05-29 13:14:59.808287] W [MSGID: 114031] 
> [client-rpc-fops.c:2928:client3_3_lookup_cbk]
> 2-gfs_vol2-client-3: remote operation failed. Path:
> /50294ed6-db7a-418d-965f-9b44c69a83fd/images/d59487fe-
> f3a9-4bad-a607-3a181c871711/aa01c3a0-5aa0-432d-82ad-d1f515f1d87f
> (93c403f5-c769-44b9-a087-dc51fc21412e) [No such file or directory]
>
>
>
> Although the process went smooth, i will run another extensive test
> tomorrow just to be sure.
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> --
> *From:* Krutika Dhananjay <kdhan...@redhat.com>
> *Sent:* Monday, May 29, 2017 9:20:29 AM
>
> *To:* Mahdi Adnan
> *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin
> Lemonnier
> *Subject:* Re: Rebalance + VM corruption - current status and request for
> feedback
>
> Hi,
>
> I took a look at your logs.
> It very much seems like an issue that is caused by a mismatch in glusterfs
> client and server packages.
> So your client (mount) seems to be still running 3.7.20, as confirmed by
> the occurrence of the following log message:
>
> [2017-05-26 08:58:23.647458] I [MSGID: 100030] [glusterfsd.c:2338:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
> /rhev/data-center/mnt/glusterSD/s1:_testvol)
> [2017-05-26 08:58:40.901204] I [MSGID: 100030] [glusterfsd.c:2338:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
> /rhev/data-center/mnt/glusterSD/s1:_testvol)
> [2017-05-26 08:58:48.923452] I [MSGID: 100030] [glusterfsd.c:2338:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
> (args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
> --volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
> /rhev/data-center/mnt/glusterSD/s1:_testvol)
>
> whereas the servers have rightly been upgraded to 3.10.2, as seen in
> rebalance log:
>
> [2017-05-26 09:36:36.075940] I [MSGID: 100030] [glusterfsd.c:2475:main]
> 0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.10.2
> (args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/testvol
> --xlator-option *dht.use-readdirp=yes --xlator-option
> *dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes
> --xlator-option *replicate*.data-self-heal=off --xlator-option
> *replicate*.metadata-self-heal=off --xlator-option
> *replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on
> --xlator-option *dht.rebalance-cmd=5 --xlator-option
> *dht.node-uuid=7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b --x

Re: [Gluster-users] Rebalance + VM corruption - current status and request for feedback

2017-05-29 Thread Krutika Dhananjay
Hi,

I took a look at your logs.
It very much seems like an issue that is caused by a mismatch in glusterfs
client and server packages.
So your client (mount) seems to be still running 3.7.20, as confirmed by
the occurrence of the following log message:

[2017-05-26 08:58:23.647458] I [MSGID: 100030] [glusterfsd.c:2338:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
(args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
--volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
/rhev/data-center/mnt/glusterSD/s1:_testvol)
[2017-05-26 08:58:40.901204] I [MSGID: 100030] [glusterfsd.c:2338:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
(args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
--volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
/rhev/data-center/mnt/glusterSD/s1:_testvol)
[2017-05-26 08:58:48.923452] I [MSGID: 100030] [glusterfsd.c:2338:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.7.20
(args: /usr/sbin/glusterfs --volfile-server=s1 --volfile-server=s2
--volfile-server=s3 --volfile-server=s4 --volfile-id=/testvol
/rhev/data-center/mnt/glusterSD/s1:_testvol)

whereas the servers have rightly been upgraded to 3.10.2, as seen in
rebalance log:

[2017-05-26 09:36:36.075940] I [MSGID: 100030] [glusterfsd.c:2475:main]
0-/usr/sbin/glusterfs: Started running /usr/sbin/glusterfs version 3.10.2
(args: /usr/sbin/glusterfs -s localhost --volfile-id rebalance/testvol
--xlator-option *dht.use-readdirp=yes --xlator-option
*dht.lookup-unhashed=yes --xlator-option *dht.assert-no-child-down=yes
--xlator-option *replicate*.data-self-heal=off --xlator-option
*replicate*.metadata-self-heal=off --xlator-option
*replicate*.entry-self-heal=off --xlator-option *dht.readdir-optimize=on
--xlator-option *dht.rebalance-cmd=5 --xlator-option
*dht.node-uuid=7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b --xlator-option
*dht.commit-hash=3376396580 --socket-file
/var/run/gluster/gluster-rebalance-801faefa-a583-46b4-8eef-e0ec160da9ea.sock
--pid-file
/var/lib/glusterd/vols/testvol/rebalance/7c0bf49e-1ede-47b1-b9a5-bfde6e60f07b.pid
-l /var/log/glusterfs/testvol-rebalance.log)


Could you upgrade all packages to 3.10.2 and try again?
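
One quick way to confirm what each node and client is actually running, assuming
the usual RPM-based packaging on CentOS, is something like:

# glusterfs --version
# rpm -qa | grep -i gluster

run on every server and on the host doing the mount.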

-Krutika


On Fri, May 26, 2017 at 4:46 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
wrote:

> Hi,
>
>
> Attached are the logs for both the rebalance and the mount.
>
>
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> ------
> *From:* Krutika Dhananjay <kdhan...@redhat.com>
> *Sent:* Friday, May 26, 2017 1:12:28 PM
> *To:* Mahdi Adnan
> *Cc:* gluster-user; Gandalf Corvotempesta; Lindsay Mathieson; Kevin
> Lemonnier
> *Subject:* Re: Rebalance + VM corruption - current status and request for
> feedback
>
> Could you provide the rebalance and mount logs?
>
> -Krutika
>
> On Fri, May 26, 2017 at 3:17 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
> wrote:
>
>> Good morning,
>>
>>
>> So I have tested the new Gluster 3.10.2, and after starting the rebalance two
>> VMs were paused due to a storage error and a third one was not responding.
>>
>> After the rebalance completed I started the VMs and they did not boot,
>> throwing an XFS wrong-inode error onto the screen.
>>
>>
>> My setup:
>>
>> 4 nodes running CentOS7.3 with Gluster 3.10.2
>>
>> 4 bricks in distributed replica with group set to virt.
>>
>> I added the volume to oVirt and created three VMs, and I ran a loop to create
>> a 5GB file inside the VMs.
>>
>> Added new 4 bricks to the existing nodes.
>>
>> Started the rebalance "with force to bypass the warning message"
>>
>> VMs started to fail after rebalancing.
>>
>>
>>
>>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> --
>> *From:* Krutika Dhananjay <kdhan...@redhat.com>
>> *Sent:* Wednesday, May 17, 2017 6:59:20 AM
>> *To:* gluster-user
>> *Cc:* Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier; Mahdi
>> Adnan
>> *Subject:* Rebalance + VM corruption - current status and request for
>> feedback
>>
>> Hi,
>>
>> In the past couple of weeks, we've sent the following fixes concerning VM
>> corruption upon doing rebalance - https://review.gluster.org/#/q
>> /status:merged+project:glusterfs+branch:master+topic:bug-1440051
>>
>> These fixes are very much part of the latest 3.10.2 release.
>>
>> Satheesaran within Red Hat also verified that they work and he's not
>> seeing corruption issues anymore.
>>
>> I'd like to hear feedback from the users themselves on these fixes (on
>> your test environments to begin with) be

Re: [Gluster-users] Rebalance + VM corruption - current status and request for feedback

2017-05-26 Thread Krutika Dhananjay
Could you provide the rebalance and mount logs?

-Krutika

On Fri, May 26, 2017 at 3:17 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
wrote:

> Good morning,
>
>
> So I have tested the new Gluster 3.10.2, and after starting the rebalance two
> VMs were paused due to a storage error and a third one was not responding.
>
> After the rebalance completed I started the VMs and they did not boot,
> throwing an XFS wrong-inode error onto the screen.
>
>
> My setup:
>
> 4 nodes running CentOS7.3 with Gluster 3.10.2
>
> 4 bricks in distributed replica with group set to virt.
>
> I added the volume to oVirt and created three VMs, and I ran a loop to create
> a 5GB file inside the VMs.
>
> Added new 4 bricks to the existing nodes.
>
> Started the rebalance "with force to bypass the warning message"
>
> VMs started to fail after rebalancing.
>
>
>
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> --
> *From:* Krutika Dhananjay <kdhan...@redhat.com>
> *Sent:* Wednesday, May 17, 2017 6:59:20 AM
> *To:* gluster-user
> *Cc:* Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier; Mahdi
> Adnan
> *Subject:* Rebalance + VM corruption - current status and request for
> feedback
>
> Hi,
>
> In the past couple of weeks, we've sent the following fixes concerning VM
> corruption upon doing rebalance - https://review.gluster.org/#/
> q/status:merged+project:glusterfs+branch:master+topic:bug-1440051
>
> These fixes are very much part of the latest 3.10.2 release.
>
> Satheesaran within Red Hat also verified that they work and he's not
> seeing corruption issues anymore.
>
> I'd like to hear feedback from the users themselves on these fixes (on
> your test environments to begin with) before even changing the status of
> the bug to CLOSED.
>
> Although 3.10.2 has a patch that prevents rebalance sub-commands from
> being executed on sharded volumes, you can override the check by using the
> 'force' option.
>
> For example,
>
> # gluster volume rebalance myvol start force
>
> Very much looking forward to hearing from you all.
>
> Thanks,
> Krutika
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Rebalance + VM corruption - current status and request for feedback

2017-05-20 Thread Krutika Dhananjay
Raghavendra Talur might know. Adding him to the thread.

-Krutika

On Sat, May 20, 2017 at 2:47 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
wrote:

> Good morning,
>
>
> SIG repository does not have the latest glusterfs 3.10.2.
>
> Do you have any idea when it's going to be updated ?
>
> Is there any other recommended place to get the latest rpms ?
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> --
> *From:* Mahdi Adnan <mahdi.ad...@outlook.com>
> *Sent:* Friday, May 19, 2017 6:14:05 PM
> *To:* Krutika Dhananjay; gluster-user
> *Cc:* Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier
> *Subject:* Re: Rebalance + VM corruption - current status and request for
> feedback
>
>
> Thank you so much mate.
>
> I'll finish the test tomorrow and let you know the results.
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> --
> *From:* Krutika Dhananjay <kdhan...@redhat.com>
> *Sent:* Wednesday, May 17, 2017 6:59:20 AM
> *To:* gluster-user
> *Cc:* Gandalf Corvotempesta; Lindsay Mathieson; Kevin Lemonnier; Mahdi
> Adnan
> *Subject:* Rebalance + VM corruption - current status and request for
> feedback
>
> Hi,
>
> In the past couple of weeks, we've sent the following fixes concerning VM
> corruption upon doing rebalance - https://review.gluster.org/#/
> q/status:merged+project:glusterfs+branch:master+topic:bug-1440051
>
> These fixes are very much part of the latest 3.10.2 release.
>
> Satheesaran within Red Hat also verified that they work and he's not
> seeing corruption issues anymore.
>
> I'd like to hear feedback from the users themselves on these fixes (on
> your test environments to begin with) before even changing the status of
> the bug to CLOSED.
>
> Although 3.10.2 has a patch that prevents rebalance sub-commands from
> being executed on sharded volumes, you can override the check by using the
> 'force' option.
>
> For example,
>
> # gluster volume rebalance myvol start force
>
> Very much looking forward to hearing from you all.
>
> Thanks,
> Krutika
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Rebalance + VM corruption - current status and request for feedback

2017-05-16 Thread Krutika Dhananjay
Hi,

In the past couple of weeks, we've sent the following fixes concerning VM
corruption upon doing rebalance -
https://review.gluster.org/#/q/status:merged+project:glusterfs+branch:master+topic:bug-1440051

These fixes are very much part of the latest 3.10.2 release.

Satheesaran within Red Hat also verified that they work and he's not seeing
corruption issues anymore.

I'd like to hear feedback from the users themselves on these fixes (on your
test environments to begin with) before even changing the status of the bug
to CLOSED.

Although 3.10.2 has a patch that prevents rebalance sub-commands from being
executed on sharded volumes, you can override the check by using the
'force' option.

For example,

# gluster volume rebalance myvol start force
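
Progress can then be tracked with the usual rebalance status sub-command:

# gluster volume rebalance myvol status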

Very much looking forward to hearing from you all.

Thanks,
Krutika
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Reliability issues with Gluster 3.10 and shard

2017-05-15 Thread Krutika Dhananjay
Shard translator is currently supported only for VM image store workload.

-Krutika

On Sun, May 14, 2017 at 12:50 AM, Benjamin Kingston 
wrote:

> Here are some log entries from nfs-ganesha gfapi
>
> [2017-05-13 19:02:54.105936] E [MSGID: 133010] 
> [shard.c:1706:shard_common_lookup_shards_cbk]
> 0-storage2-shard: Lookup on shard 11 failed. Base file gfid =
> 1494c083-a618-4eba-80a0-147e656dd9d0 [Input/output error]
> [2017-05-13 19:02:54.106176] E [MSGID: 133010] 
> [shard.c:1706:shard_common_lookup_shards_cbk]
> 0-storage2-shard: Lookup on shard 2 failed. Base file gfid =
> 1494c083-a618-4eba-80a0-147e656dd9d0 [Input/output error]
> [2017-05-13 19:02:54.106288] E [MSGID: 133010] 
> [shard.c:1706:shard_common_lookup_shards_cbk]
> 0-storage2-shard: Lookup on shard 1 failed. Base file gfid =
> 1494c083-a618-4eba-80a0-147e656dd9d0 [Input/output error]
> [2017-05-13 19:02:54.384922] I [MSGID: 108026]
> [afr-self-heal-metadata.c:52:__afr_selfheal_metadata_do]
> 0-storage2-replicate-2: performing metadata selfheal on
> fe651475-226e-42a3-be2d-751d4f58e383
> [2017-05-13 19:02:54.385894] W [MSGID: 114031] 
> [client-rpc-fops.c:2258:client3_3_setattr_cbk]
> 0-storage2-client-8: remote operation failed [Operation not permitted]
> [2017-05-13 19:02:54.401187] I [MSGID: 108026]
> [afr-self-heal-common.c:1255:afr_log_selfheal] 0-storage2-replicate-2:
> Completed metadata selfheal on fe651475-226e-42a3-be2d-751d4f58e383.
> sources=[0] 1  sinks=
> [2017-05-13 19:02:57.830019] I [MSGID: 109066]
> [dht-rename.c:1608:dht_rename] 0-storage2-dht: renaming
> /content/Downloads/incomplete/usenet/Attack.on.Titan.S02E05.
> 720p.WEB.x264-ANiURL.#27/aniurl-aot.s02e05.720p.web.par2.tmp
> (hash=storage2-readdir-ahead-2/cache=storage2-readdir-ahead-2) =>
> /content/Downloads/incomplete/usenet/Attack.on.Titan.S02E05.
> 720p.WEB.x264-ANiURL.#27/aniurl-aot.s02e05.720p.web.par2
> (hash=storage2-readdir-ahead-0/cache=)
>
> [2017-05-13 19:08:22.014899] I [MSGID: 109066]
> [dht-rename.c:1608:dht_rename] 0-storage2-dht: renaming
> /content/Downloads/incomplete/usenet/Attack.on.Titan.S02E05.
> 720p.WEB.x264-ANiURL.#27/aniurl-aot.s02e05.720p.web.srr.tmp
> (hash=storage2-readdir-ahead-1/cache=storage2-readdir-ahead-1) =>
> /content/Downloads/incomplete/usenet/Attack.on.Titan.S02E05.
> 720p.WEB.x264-ANiURL.#27/aniurl-aot.s02e05.720p.web.srr
> (hash=storage2-readdir-ahead-1/cache=)
> [2017-05-13 19:08:22.463840] I [MSGID: 109066]
> [dht-rename.c:1608:dht_rename] 0-storage2-dht: renaming
> /content/Downloads/incomplete/usenet/Attack.on.Titan.S02E05.
> 720p.WEB.x264-ANiURL.#27/aniurl-aot.s02e05.720p.web.r04.tmp
> (hash=storage2-readdir-ahead-2/cache=storage2-readdir-ahead-2) =>
> /content/Downloads/incomplete/usenet/Attack.on.Titan.S02E05.
> 720p.WEB.x264-ANiURL.#27/aniurl-aot.s02e05.720p.web.r04
> (hash=storage2-readdir-ahead-0/cache=)
> [2017-05-13 19:08:22.769542] I [MSGID: 109066]
> [dht-rename.c:1608:dht_rename] 0-storage2-dht: renaming
> /content/Downloads/incomplete/usenet/Attack.on.Titan.S02E05.
> 720p.WEB.x264-ANiURL.#27/aniurl-aot.s02e05.720p.web.r01.tmp
> (hash=storage2-readdir-ahead-2/cache=storage2-readdir-ahead-2) =>
> /content/Downloads/incomplete/usenet/Attack.on.Titan.S02E05.
> 720p.WEB.x264-ANiURL.#27/aniurl-aot.s02e05.720p.web.r01
> (hash=storage2-readdir-ahead-0/cache=)
> [2017-05-13 19:08:23.141069] I [MSGID: 109066]
> [dht-rename.c:1608:dht_rename] 0-storage2-dht: renaming
> /content/Downloads/incomplete/usenet/Attack.on.Titan.S02E05.
> 720p.WEB.x264-ANiURL.#27/aniurl-aot.s02e05.720p.web.nfo.tmp
> (hash=storage2-readdir-ahead-1/cache=storage2-readdir-ahead-1) =>
> /content/Downloads/incomplete/usenet/Attack.on.Titan.S02E05.
> 720p.WEB.x264-ANiURL.#27/aniurl-aot.s02e05.720p.web.nfo
> (hash=storage2-readdir-ahead-0/cache=)
> [2017-05-13 19:08:23.468554] I [MSGID: 109066]
> [dht-rename.c:1608:dht_rename] 0-storage2-dht: renaming
> /content/Downloads/incomplete/usenet/Attack.on.Titan.S02E05.
> 720p.WEB.x264-ANiURL.#27/aniurl-aot.s02e05.720p.web.r00.tmp
> (hash=storage2-readdir-ahead-0/cache=storage2-readdir-ahead-0) =>
> /content/Downloads/incomplete/usenet/Attack.on.Titan.S02E05.
> 720p.WEB.x264-ANiURL.#27/aniurl-aot.s02e05.720p.web.r00
> (hash=storage2-readdir-ahead-2/cache=)
> [2017-05-13 19:08:23.671753] I [MSGID: 109066]
> [dht-rename.c:1608:dht_rename] 0-storage2-dht: renaming
> /content/Downloads/incomplete/usenet/Attack.on.Titan.S02E05.
> 720p.WEB.x264-ANiURL.#27/aniurl-aot.s02e05.720p.web.sfv.tmp
> (hash=storage2-readdir-ahead-2/cache=storage2-readdir-ahead-2) =>
> /content/Downloads/incomplete/usenet/Attack.on.Titan.S02E05.
> 720p.WEB.x264-ANiURL.#27/aniurl-aot.s02e05.720p.web.sfv
> (hash=storage2-readdir-ahead-2/cache=)
> [2017-05-13 19:08:23.812152] I [MSGID: 109066]
> [dht-rename.c:1608:dht_rename] 0-storage2-dht: renaming
> /content/Downloads/incomplete/usenet/Attack.on.Titan.S02E05.
> 720p.WEB.x264-ANiURL.#27/aniurl-aot.s02e05.720p.web.r11.tmp
> 

Re: [Gluster-users] VM going down

2017-05-11 Thread Krutika Dhananjay
Niels,

Alessandro's configuration does not have shard enabled. So it has
definitely not got anything to do with shard not supporting seek fop.

Copy-pasting volume-info output from the first mail:

Volume Name: datastore2
Type: Replicate
Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x (2 + 1) = 3
Transport-type: tcp
Bricks:
Brick1: srvpve2g:/data/brick2/brick
Brick2: srvpve3g:/data/brick2/brick
Brick3: srvpve1g:/data/brick2/brick (arbiter)
Options Reconfigured:
nfs.disable: on
performance.readdir-ahead: on
transport.address-family: inet


-Krutika


On Tue, May 9, 2017 at 7:40 PM, Niels de Vos  wrote:

> ...
> > > client from
> > > srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
> > > (version: 3.8.11)
> > > [2017-05-08 10:01:06.237433] E [MSGID: 113107]
> [posix.c:1079:posix_seek]
> > > 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
> > > device or address]
>
> The SEEK procedure translates to lseek() in the posix xlator. This can
> return with "No such device or address" (ENXIO) in only one case:
>
> ENXIO   whence is SEEK_DATA or SEEK_HOLE, and the file offset is
>         beyond the end of the file.
>
> This means that an lseek() was executed where the current offset of the
> filedescriptor was higher than the size of the file. I'm not sure how
> that could happen... Sharding prevents using SEEK at all atm.
>
> ...
> > > The strange part is that I cannot seem to find any other error.
> > > If I restart the VM everything works as expected (it stopped at ~9.51
> > > UTC and was started at ~10.01 UTC) .
> > >
> > > This is not the first time that this happened, and I do not see any
> > > problems with networking or the hosts.
> > >
> > > Gluster version is 3.8.11
> > > this is the incriminated volume (though it happened on a different one
> too)
> > >
> > > Volume Name: datastore2
> > > Type: Replicate
> > > Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
> > > Status: Started
> > > Snapshot Count: 0
> > > Number of Bricks: 1 x (2 + 1) = 3
> > > Transport-type: tcp
> > > Bricks:
> > > Brick1: srvpve2g:/data/brick2/brick
> > > Brick2: srvpve3g:/data/brick2/brick
> > > Brick3: srvpve1g:/data/brick2/brick (arbiter)
> > > Options Reconfigured:
> > > nfs.disable: on
> > > performance.readdir-ahead: on
> > > transport.address-family: inet
> > >
> > > Any hint on how to dig more deeply into the reason would be greatly
> > > appreciated.
>
> Probably the problem is with SEEK support in the arbiter functionality.
> Just like with a READ or a WRITE on the arbiter brick, SEEK can only
> succeed on bricks where the files with content are located. It does not
> look like arbiter handles SEEK, so the offset in lseek() will likely be
> higher than the size of the file on the brick (empty, 0 size file). I
> don't know how the replication xlator responds on an error return from
> SEEK on one of the bricks, but I doubt it likes it.
>
> We have https://bugzilla.redhat.com/show_bug.cgi?id=1301647 to support
> SEEK for sharding. I suggest you open a bug for getting SEEK in the
> arbiter xlator as well.
>
> HTH,
> Niels
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] VM going down

2017-05-08 Thread Krutika Dhananjay
The newly introduced "SEEK" fop seems to be failing at the bricks.

Adding Niels for his inputs/help.

-Krutika

On Mon, May 8, 2017 at 3:43 PM, Alessandro Briosi  wrote:

> Hi all,
> I have sporadic VMs going down whose files are on GlusterFS.
>
> If I look at the gluster logs the only events I find are:
> /var/log/glusterfs/bricks/data-brick2-brick.log
>
> [2017-05-08 09:51:17.661697] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-datastore2-server: disconnecting
> connection from
> srvpve2-9074-2017/05/04-14:12:53:301448-datastore2-client-0-0-0
> [2017-05-08 09:51:17.661697] I [MSGID: 115036]
> [server.c:548:server_rpc_notify] 0-datastore2-server: disconnecting
> connection from
> srvpve2-9074-2017/05/04-14:12:53:367950-datastore2-client-0-0-0
> [2017-05-08 09:51:17.661810] W [inodelk.c:399:pl_inodelk_log_cleanup]
> 0-datastore2-server: releasing lock on
> 66d9eefb-ee55-40ad-9f44-c55d1e809006 held by {client=0x7f4c7c004880,
> pid=0 lk-owner=5c7099efc97f}
> [2017-05-08 09:51:17.661810] W [inodelk.c:399:pl_inodelk_log_cleanup]
> 0-datastore2-server: releasing lock on
> a8d82b3d-1cf9-45cf-9858-d8546710b49c held by {client=0x7f4c840f31d0,
> pid=0 lk-owner=5c7019fac97f}
> [2017-05-08 09:51:17.661835] I [MSGID: 115013]
> [server-helpers.c:293:do_fd_cleanup] 0-datastore2-server: fd cleanup on
> /images/201/vm-201-disk-2.qcow2
> [2017-05-08 09:51:17.661838] I [MSGID: 115013]
> [server-helpers.c:293:do_fd_cleanup] 0-datastore2-server: fd cleanup on
> /images/201/vm-201-disk-1.qcow2
> [2017-05-08 09:51:17.661953] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-datastore2-server: Shutting down
> connection srvpve2-9074-2017/05/04-14:12:53:301448-datastore2-client-0-0-0
> [2017-05-08 09:51:17.661953] I [MSGID: 101055]
> [client_t.c:415:gf_client_unref] 0-datastore2-server: Shutting down
> connection srvpve2-9074-2017/05/04-14:12:53:367950-datastore2-client-0-0-0
> [2017-05-08 10:01:06.210392] I [MSGID: 115029]
> [server-handshake.c:692:server_setvolume] 0-datastore2-server: accepted
> client from
> srvpve2-162483-2017/05/08-10:01:06:189720-datastore2-client-0-0-0
> (version: 3.8.11)
> [2017-05-08 10:01:06.237433] E [MSGID: 113107] [posix.c:1079:posix_seek]
> 0-datastore2-posix: seek failed on fd 18 length 42957209600 [No such
> device or address]
> [2017-05-08 10:01:06.237463] E [MSGID: 115089]
> [server-rpc-fops.c:2007:server_seek_cbk] 0-datastore2-server: 18: SEEK-2
> (a8d82b3d-1cf9-45cf-9858-d8546710b49c) ==> (No such device or address)
> [No such device or address]
> [2017-05-08 10:01:07.019974] I [MSGID: 115029]
> [server-handshake.c:692:server_setvolume] 0-datastore2-server: accepted
> client from
> srvpve2-162483-2017/05/08-10:01:07:3687-datastore2-client-0-0-0
> (version: 3.8.11)
> [2017-05-08 10:01:07.041967] E [MSGID: 113107] [posix.c:1079:posix_seek]
> 0-datastore2-posix: seek failed on fd 19 length 859136720896 [No such
> device or address]
> [2017-05-08 10:01:07.041992] E [MSGID: 115089]
> [server-rpc-fops.c:2007:server_seek_cbk] 0-datastore2-server: 18: SEEK-2
> (66d9eefb-ee55-40ad-9f44-c55d1e809006) ==> (No such device or address)
> [No such device or address]
>
> The strange part is that I cannot seem to find any other error.
> If I restart the VM everything works as expected (it stopped at ~9.51
> UTC and was started at ~10.01 UTC) .
>
> This is not the first time that this happened, and I do not see any
> problems with networking or the hosts.
>
> Gluster version is 3.8.11
> this is the incriminated volume (though it happened on a different one too)
>
> Volume Name: datastore2
> Type: Replicate
> Volume ID: c95ebb5f-6e04-4f09-91b9-bbbe63d83aea
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: srvpve2g:/data/brick2/brick
> Brick2: srvpve3g:/data/brick2/brick
> Brick3: srvpve1g:/data/brick2/brick (arbiter)
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
>
> Any hint on how to dig more deeply into the reason would be greatly
> appreciated.
>
> Alessandro
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Elasticsearch facing CorruptIndexException exception with GlusterFs 3.10.1

2017-05-05 Thread Krutika Dhananjay
Yeah, there are a couple of cache consistency issues with performance
translators that are causing these exceptions.
Some of them were fixed by 3.10.1. Some still remain.

Alternatively you can give gluster-block + elasticsearch a try, which
doesn't require solving all these caching issues.
Here's a blog post on the same -
https://pkalever.wordpress.com/2017/03/14/elasticsearch-with-gluster-block/

Adding Prasanna and Pranith who worked on this, in case you need more info
on this.

-Krutika

On Fri, May 5, 2017 at 12:15 AM, Abhijit Paul 
wrote:

> Thanks for the reply, I will try it out, but I am also facing one more
> issue, i.e. "replicated volumes returning different timestamps",
> so is this because of Bug 1426548 - Openshift Logging ElasticSearch
> FSLocks when using GlusterFS storage backend
>  ?
>
> *FYI I am using the glusterfs 3.10.1 tar.gz*
>
> Regards,
> Abhijit
>
>
>
> On Thu, May 4, 2017 at 10:58 PM, Amar Tumballi 
> wrote:
>
>>
>>
>> On Thu, May 4, 2017 at 10:41 PM, Abhijit Paul 
>> wrote:
>>
>>> Since I am new to gluster, can you please explain how to turn off/disable the "perf
>>> xlator options"?
>>>
>>>
>> $ gluster volume set <VOLNAME> performance.stat-prefetch off
>> $ gluster volume set <VOLNAME> performance.read-ahead off
>> $ gluster volume set <VOLNAME> performance.write-behind off
>> $ gluster volume set <VOLNAME> performance.io-cache off
>> $ gluster volume set <VOLNAME> performance.quick-read off
>>
>>
>> Regards,
>> Amar
>>
>>>
 On Wed, May 3, 2017 at 8:51 PM, Atin Mukherjee 
 wrote:

> I think there is still some pending stuffs in some of the gluster perf
> xlators to make that work complete. Cced the relevant folks for more
> information. Can you please turn off all the perf xlator options as a work
> around to move forward?
>
> On Wed, May 3, 2017 at 8:04 PM, Abhijit Paul  > wrote:
>
>> Dear folks,
>>
>> I set up GlusterFS (3.10.1), NFS type, as a persistent volume for
>> Elasticsearch (5.1.2), but I am currently facing an issue with
>> *"CorruptIndexException"*
>> in the Elasticsearch logs, and due to that the index health turned RED in
>> Elasticsearch.
>>
>> Later found that there was an issue with gluster < 3.10 (
>> https://bugzilla.redhat.com/show_bug.cgi?id=1390050) but even after 
>> *upgrading
>> to 3.10.1 issue is still there.*
>>
>> *So I am curious to know what the root cause is, in order to fix this issue.*
>>
>> Regards,
>> Abhijit
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>

>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>>
>> --
>> Amar Tumballi (amarts)
>>
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [Gluster-devel] Enabling shard on EC

2017-05-05 Thread Krutika Dhananjay
Hi,

Work is in progress for this (and going a bit slow at the moment because of
other priorities).

At the moment we support sharding only for VM image store use-case - most
common large file + single writer use case we know of.

Just curious, what is the use case where you want to use shard+EC?

-Krutika

On Thu, May 4, 2017 at 4:05 PM, Pranith Kumar Karampuri  wrote:

> +Krutika
>
> Krutika started work on this. But it is very long term. Not a simple thing
> to do.
>
> On Thu, May 4, 2017 at 3:53 PM, Ankireddypalle Reddy  > wrote:
>
>> Pranith,
>>
>>  Thanks. Is there any work in progress to add this
>> support.
>>
>>
>>
>> Thanks and Regards,
>>
>> Ram
>>
>> *From:* Pranith Kumar Karampuri [mailto:pkara...@redhat.com]
>> *Sent:* Thursday, May 04, 2017 6:17 AM
>>
>> *To:* Ankireddypalle Reddy
>> *Cc:* Gluster Devel (gluster-de...@gluster.org);
>> gluster-users@gluster.org
>> *Subject:* Re: [Gluster-devel] Enabling shard on EC
>>
>>
>>
>>
>>
>>
>>
>> On Thu, May 4, 2017 at 3:43 PM, Ankireddypalle Reddy <
>> are...@commvault.com> wrote:
>>
>> Pranith,
>>
>>  Thanks. Does it mean that a given file can be written by
>> only one client at a time. If multiple clients try to access the file in
>> write mode, does it lead to any kind of data inconsistencies.
>>
>>
>>
>> We only tested it for single writer cases such as VM usecases. We need to
>> bring in transaction framework for sharding to work with multiple writers.
>>
>>
>>
>>
>>
>> Thanks and Regards,
>>
>> Ram
>>
>> *From:* Pranith Kumar Karampuri [mailto:pkara...@redhat.com]
>> *Sent:* Thursday, May 04, 2017 6:07 AM
>> *To:* Ankireddypalle Reddy
>> *Cc:* Gluster Devel (gluster-de...@gluster.org);
>> gluster-users@gluster.org
>> *Subject:* Re: [Gluster-devel] Enabling shard on EC
>>
>>
>>
>> It is never been tested. That said, I don't see any missing pieces that
>> we know of for it to work. Please note that sharding works only for single
>> writer cases at the moment. Do let us know if you find any problems and we
>> will fix them.
>>
>>
>>
>> On Wed, May 3, 2017 at 2:17 PM, Ankireddypalle Reddy <
>> are...@commvault.com> wrote:
>>
>> Hi,
>>
>>   Are there any known negatives of enabling shard on EC? Is this a
>> recommended configuration?
>>
>>
>>
>> Thanks and Regards,
>>
>> Ram
>>
>>
>>
>>
>>
>>
>>
>> ___
>> Gluster-devel mailing list
>> gluster-de...@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-devel
>>
>>
>>
>>
>> --
>>
>> Pranith
>>
>>
>>
>>
>>
>> --
>>
>> Pranith
>>
>
>
>
> --
> Pranith
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] GlusterFS Shard Feature: Max number of files in .shard-Folder

2017-04-24 Thread Krutika Dhananjay
Yes, that's about it. Pranith pretty much summed up whatever I would have
said.
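
For reference, granular entry self-heal is controlled by a volume option; on the
releases discussed in these threads it should be possible to enable it with
something like the following, where <VOLNAME> is a placeholder:

# gluster volume set <VOLNAME> cluster.granular-entry-heal enable

(The option appears as "cluster.granular-entry-heal: enable" in the volume
configurations quoted elsewhere in this archive.)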

-Krutika

On Sat, Apr 22, 2017 at 12:25 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

> +Krutika for any other inputs you may need.
>
> On Sat, Apr 22, 2017 at 12:21 PM, Pranith Kumar Karampuri <
> pkara...@redhat.com> wrote:
>
>> Sorry for the delay. The only internal process that we know would take
>> more time is self-heal and we implemented a feature called granular entry
>> self-heal which should be enabled with sharded volumes to get the benefits.
>> So when a brick goes down and, say, only 1 of those million entries is
>> created/deleted, self-heal would be done for only that file; it won't crawl
>> the entire directory.
>>
>>
>>
>> On Wed, Apr 12, 2017 at 8:11 PM, David Spisla 
>> wrote:
>>
>>> Dear Gluster-Community,
>>>
>>>
>>>
>>> If I use the shard feature it may happen that I will have a huge number
>>> of shard-chunks in the hidden folder .shard
>>>
>>> Does anybody has some experience what is the maximum number of files in
>>> one .shard-Folder?
>>>
>>>
>>>
>>> If I have 1 Million files in such a folder, some operations like
>>> self-healing or another internal operations would need
>>>
>>> a lot of time, I guess.
>>>
>>>
>>>
>>> Sincerely
>>>
>>>
>>>
>>>
>>>
>>> *David Spisla*
>>>
>>> Software Developer
>>>
>>> david.spi...@iternity.com
>>>
>>> www.iTernity.com 
>>>
>>> Tel:   +49 761-590 34 841
>>>
>>>
>>>
>>>
>>>
>>>
>>> iTernity GmbH
>>> Heinrich-von-Stephan-Str. 21
>>> 79100 Freiburg – Germany
>>> ---
>>> you can reach our technical support at +49 761-387 36 66
>>> ---
>>>
>>> Managing Director: Ralf Steinemann
>>> Registered with the Freiburg district court: HRB no. 701332
>>> VAT ID DE-24266431
>>>
>>>
>>>
>>>
>>>
>>> ___
>>> Gluster-users mailing list
>>> Gluster-users@gluster.org
>>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>>
>>
>>
>>
>> --
>> Pranith
>>
>
>
>
> --
> Pranith
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] adding arbiter

2017-04-04 Thread Krutika Dhananjay
You mean when converting a replica 2 volume to a replica 3 volume with
arbiter?
No, you don't need to do rebalance.

What you will need to do instead is monitor heal-info output to ensure the
directory tree is sync'd to the arbiter brick.

Please refer to http://review.gluster.org/#/c/14502/2 for more details.
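
A rough sketch of both steps, with the volume name, host and brick path as
placeholders (the add-brick syntax below should be double-checked against your
release):

    # convert a replica 2 volume to replica 3 with one arbiter brick per replica set
    gluster volume add-brick myvol replica 3 arbiter 1 host3:/bricks/arb/myvol

    # then watch heal-info until the pending entry count drops to zero
    gluster volume heal myvol info

Only names and metadata are synced to the arbiter brick, so no rebalance is needed.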

-Krutika

On Tue, Apr 4, 2017 at 8:09 PM, Alessandro Briosi <a...@metalit.com> wrote:

> Il 04/04/2017 16:16, Krutika Dhananjay ha scritto:
> > So the corruption bug is seen iff your vms are online while fix-layout
> > and/or rebalance is going on.
> > Does that answer your question?
> >
> > The same issue has now been root-caused and there will be a fix for it
> > soon by Raghavendra G.
>
> Yes it does about triggering the corruption bug, and glad you were able
> to find a solution.
>
> I sill am not sure if a rebalance is needed when adding an arbitrer to
> an existing volume.
>
> thank you
> Alessandro
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] adding arbiter

2017-04-04 Thread Krutika Dhananjay
So the corruption bug is seen iff your vms are online while fix-layout
and/or rebalance is going on.
Does that answer your question?

The same issue has now been root-caused and there will be a fix for it soon
by Raghavendra G.

-Krutika

On Mon, Apr 3, 2017 at 6:44 PM, Alessandro Briosi  wrote:

> Il 01/04/2017 04:22, Gambit15 ha scritto:
> > As I understand it, only new files will be sharded, but simply
> > renaming or moving them may be enough in that case.
> >
> > I'm interested in the arbiter/sharding bug you've mentioned. Could you
> > provide any more details or a link?
> >
>
> I think it is triggered only on rebalance.
>
> Though I have still no idea if adding an arbiter afterwards needs
> rebalance or not, and as this should only write file refernce (and no
> data) on the arbiter, this should not touch anything on the data side. I
> though wanted to be sure before doing this on a production environment.
>
> The bug has been discussed in the mailing list. There are a couple of
> patches that went into 3.8.10
>
> https://review.gluster.org/#/c/16749/
> https://review.gluster.org/#/c/16750/
>
> though I'm not sure this solved the problem or not.
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1387878
>
> If you look at the mailing list archive you can find more information on
> this.
>
> Currently I'm not using shardin, though as I'm using gluster to host VM
> in case of some problems to one of the hosts healing would require lots
> of CPU and time to recover the files.
> Sharding should solve this, but I'd rather wait the time for it to heal,
> then have to go through a restore from a backup cause there was data
> corruption.
>
> Any hint would really be appreciated.
>
> Alessandro
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-04-04 Thread Krutika Dhananjay
Nope. This is a different bug.

-Krutika

On Mon, Apr 3, 2017 at 5:03 PM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> This is a good news
> Is this related to the previously fixed bug?
>
> Il 3 apr 2017 10:22 AM, "Krutika Dhananjay" <kdhan...@redhat.com> ha
> scritto:
>
>> So Raghavendra has an RCA for this issue.
>>
>> Copy-pasting his comment here:
>>
>> 
>>
>> Following is a rough algorithm of shard_writev:
>>
>> 1. Based on the offset, calculate the shards touched by current write.
>> 2. Look for inodes corresponding to these shard files in itable.
>> 3. If one or more inodes are missing from itable, issue mknod for 
>> corresponding shard files and ignore EEXIST in cbk.
>> 4. resume writes on respective shards.
>>
>> Now, imagine a write which falls to an existing "shard_file". For the sake 
>> of discussion lets consider a distribute of three subvols - s1, s2, s3
>>
>> 1. "shard_file" hashes to subvolume s2 and is present on s2
>> 2. add a subvolume s4 and initiate a fix layout. The layout of ".shard" is 
>> fixed to include s4 and hash ranges are changed.
>> 3. write that touches "shard_file" is issued.
>> 4. The inode for "shard_file" is not present in itable after a graph switch 
>> and features/shard issues an mknod.
>> 5. With new layout of .shard, lets say "shard_file" hashes to s3 and mknod 
>> (shard_file) on s3 succeeds. But, the shard_file is already present on s2.
>>
>> So, we have two files on two different subvols of dht representing same 
>> shard and this will lead to corruption.
>>
>> 
>>
>> Raghavendra will be sending out a patch in DHT to fix this issue.
>>
>> -Krutika
>>
>>
>> On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar Karampuri <
>> pkara...@redhat.com> wrote:
>>
>>>
>>>
>>> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>>
>>>> Do you guys have any update regarding this issue ?
>>>>
>>> I do not actively work on this issue so I do not have an accurate
>>> update, but from what I heard from Krutika and Raghavendra(works on DHT)
>>> is: Krutika debugged initially and found that the issue seems more likely
>>> to be in DHT, Satheesaran who helped us recreate this issue in lab found
>>> that just fix-layout without rebalance also caused the corruption 1 out of
>>> 3 times. Raghavendra came up with a possible RCA for why this can happen.
>>> Raghavendra(CCed) would be the right person to provide accurate update.
>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> --
>>>> *From:* Krutika Dhananjay <kdhan...@redhat.com>
>>>> *Sent:* Tuesday, March 21, 2017 3:02:55 PM
>>>> *To:* Mahdi Adnan
>>>> *Cc:* Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai;
>>>> gluster-users@gluster.org List
>>>>
>>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>
>>>> Hi,
>>>>
>>>> So it looks like Satheesaran managed to recreate this issue. We will be
>>>> seeking his help in debugging this. It will be easier that way.
>>>>
>>>> -Krutika
>>>>
>>>> On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
>>>> wrote:
>>>>
>>>>> Hello and thank you for your email.
>>>>> Actually no, i didn't check the gfid of the vms.
>>>>> If this will help, i can setup a new test cluster and get all the data
>>>>> you need.
>>>>>
>>>>> Get Outlook for Android <https://aka.ms/ghei36>
>>>>>
>>>>> From: Nithya Balachandran
>>>>> Sent: Monday, March 20, 20:57
>>>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>> To: Krutika Dhananjay
>>>>> Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai,
>>>>> gluster-users@gluster.org List
>>>>>
>>>>> Hi,
>>>>>
>>>>> Do you know the GFIDs of the VM images which were corrupted?
>>>>>
>>>>> Regards,
>>>>>
>>>>> Nithya
>>>>

Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-04-03 Thread Krutika Dhananjay
So Raghavendra has an RCA for this issue.

Copy-pasting his comment here:



Following is a rough algorithm of shard_writev:

1. Based on the offset, calculate the shards touched by current write.
2. Look for inodes corresponding to these shard files in itable.
3. If one or more inodes are missing from itable, issue mknod for
corresponding shard files and ignore EEXIST in cbk.
4. resume writes on respective shards.

Now, imagine a write which falls to an existing "shard_file". For the
sake of discussion lets consider a distribute of three subvols - s1,
s2, s3

1. "shard_file" hashes to subvolume s2 and is present on s2
2. add a subvolume s4 and initiate a fix layout. The layout of
".shard" is fixed to include s4 and hash ranges are changed.
3. write that touches "shard_file" is issued.
4. The inode for "shard_file" is not present in itable after a graph
switch and features/shard issues an mknod.
5. With new layout of .shard, lets say "shard_file" hashes to s3 and
mknod (shard_file) on s3 succeeds. But, the shard_file is already
present on s2.

So, we have two files on two different subvols of dht representing
same shard and this will lead to corruption.



Raghavendra will be sending out a patch in DHT to fix this issue.
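
As a concrete illustration of step 1 above (made-up numbers, assuming a 4MB
shard-block-size), the shard indices touched by a write are simple integer
arithmetic on the offset and length:

    offset=$((130 * 1024 * 1024)); len=$((1024 * 1024)); shard=$((4 * 1024 * 1024))
    first=$(( offset / shard )); last=$(( (offset + len - 1) / shard ))
    echo "shards $first to $last"    # prints "shards 32 to 32"

A larger or unaligned write would span several shard files, each of which is then
looked up or created as described in steps 2 and 3.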

-Krutika


On Tue, Mar 28, 2017 at 11:49 PM, Pranith Kumar Karampuri <
pkara...@redhat.com> wrote:

>
>
> On Mon, Mar 27, 2017 at 11:29 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
> wrote:
>
>> Hi,
>>
>>
>> Do you guys have any update regarding this issue ?
>>
> I do not actively work on this issue so I do not have an accurate update,
> but from what I heard from Krutika and Raghavendra(works on DHT) is:
> Krutika debugged initially and found that the issue seems more likely to be
> in DHT, Satheesaran who helped us recreate this issue in lab found that
> just fix-layout without rebalance also caused the corruption 1 out of 3
> times. Raghavendra came up with a possible RCA for why this can happen.
> Raghavendra(CCed) would be the right person to provide accurate update.
>
>>
>>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> --
>> *From:* Krutika Dhananjay <kdhan...@redhat.com>
>> *Sent:* Tuesday, March 21, 2017 3:02:55 PM
>> *To:* Mahdi Adnan
>> *Cc:* Nithya Balachandran; Gowdappa, Raghavendra; Susant Palai;
>> gluster-users@gluster.org List
>>
>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>
>> Hi,
>>
>> So it looks like Satheesaran managed to recreate this issue. We will be
>> seeking his help in debugging this. It will be easier that way.
>>
>> -Krutika
>>
>> On Tue, Mar 21, 2017 at 1:35 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
>> wrote:
>>
>>> Hello and thank you for your email.
>>> Actually no, i didn't check the gfid of the vms.
>>> If this will help, i can setup a new test cluster and get all the data
>>> you need.
>>>
>>> Get Outlook for Android <https://aka.ms/ghei36>
>>>
>>> From: Nithya Balachandran
>>> Sent: Monday, March 20, 20:57
>>> Subject: Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>> To: Krutika Dhananjay
>>> Cc: Mahdi Adnan, Gowdappa, Raghavendra, Susant Palai,
>>> gluster-users@gluster.org List
>>>
>>> Hi,
>>>
>>> Do you know the GFIDs of the VM images which were corrupted?
>>>
>>> Regards,
>>>
>>> Nithya
>>>
>>> On 20 March 2017 at 20:37, Krutika Dhananjay <kdhan...@redhat.com>
>>> wrote:
>>>
>>> I looked at the logs.
>>>
>>> From the time the new graph (since the add-brick command you shared
>>> where bricks 41 through 44 are added) is switched to (line 3011 onwards in
>>> nfs-gfapi.log), I see the following kinds of errors:
>>>
>>> 1. Lookups to a bunch of files failed with ENOENT on both replicas which
>>> protocol/client converts to ESTALE. I am guessing these entries got
>>> migrated to
>>>
>>> other subvolumes leading to 'No such file or directory' errors.
>>>
>>> DHT and thereafter shard get the same error code and log the following:
>>>
>>>  0 [2017-03-17 14:04:26.353444] E [MSGID: 109040]
>>> [dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht:
>>> : failed to lookup the
>>> file on vmware2-dht [Stale file handle]
>>>
>>>
>>>   1 [2017-03-17 14:04:26.353528] E [MSGID: 133014]
>>> [shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed:
>>> a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]

Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-03-20 Thread Krutika Dhananjay
I looked at the logs.

From the time the new graph (since the add-brick command you shared where
bricks 41 through 44 are added) is switched to (line 3011 onwards in
nfs-gfapi.log), I see the following kinds of errors:

1. Lookups to a bunch of files failed with ENOENT on both replicas which
protocol/client converts to ESTALE. I am guessing these entries got
migrated to
other subvolumes leading to 'No such file or directory' errors.
DHT and thereafter shard get the same error code and log the following:

 0 [2017-03-17 14:04:26.353444] E [MSGID: 109040]
[dht-helper.c:1198:dht_migration_complete_check_task] 17-vmware2-dht:
: failed to lookup the file
on vmware2-dht [Stale file
handle]

  1 [2017-03-17 14:04:26.353528] E [MSGID: 133014]
[shard.c:1253:shard_common_stat_cbk] 17-vmware2-shard: stat failed:
a68ce411-e381-46a3-93cd-d2af6a7c3532 [Stale file handle]

which is fine.

2. The other kind are from AFR logging of possible split-brain which I
suppose are harmless too.
[2017-03-17 14:23:36.968883] W [MSGID: 108008]
[afr-read-txn.c:228:afr_read_txn] 17-vmware2-replicate-13: Unreadable
subvolume -1 found with event generation 2 for gfid
74d49288-8452-40d4-893e-ff4672557ff9. (Possible split-brain)

Since you are saying the bug is hit only on VMs that are undergoing IO
while rebalance is running (as opposed to those that remained powered off),
rebalance + IO could be causing some issues.

CC'ing DHT devs

Raghavendra/Nithya/Susant,

Could you take a look?

-Krutika



On Sun, Mar 19, 2017 at 4:55 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
wrote:

> Thank you for your email mate.
>
>
> Yes, im aware of this but, to save costs i chose replica 2, this cluster
> is all flash.
>
> In version 3.7.x i had issues with ping timeout, if one hosts went down
> for few seconds the whole cluster hangs and become unavailable, to avoid
> this i adjusted the ping timeout to 5 seconds.
>
> As for choosing Ganesha over gfapi, VMWare does not support Gluster (FUSE
> or gfapi) im stuck with NFS for this volume.
>
> The other volume is mounted using gfapi in oVirt cluster.
>
>
>
>
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> --
> *From:* Krutika Dhananjay <kdhan...@redhat.com>
> *Sent:* Sunday, March 19, 2017 2:01:49 PM
>
> *To:* Mahdi Adnan
> *Cc:* gluster-users@gluster.org
> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>
> While I'm still going through the logs, just wanted to point out a couple
> of things:
>
> 1. It is recommended that you use 3-way replication (replica count 3) for
> VM store use case
> 2. network.ping-timeout at 5 seconds is way too low. Please change it to
> 30.
>
> Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?
>
> Will get back with anything else I might find or more questions if I have
> any.
>
> -Krutika
>
> On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
> wrote:
>
>> Thanks mate,
>>
>> Kindly, check the attachment.
>>
>>
>>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> --
>> *From:* Krutika Dhananjay <kdhan...@redhat.com>
>> *Sent:* Sunday, March 19, 2017 10:00:22 AM
>>
>> *To:* Mahdi Adnan
>> *Cc:* gluster-users@gluster.org
>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>
>> In that case could you share the ganesha-gfapi logs?
>>
>> -Krutika
>>
>> On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
>> wrote:
>>
>>> I have two volumes, one is mounted using libgfapi for ovirt mount, the
>>> other one is exported via NFS-Ganesha for VMWare which is the one im
>>> testing now.
>>>
>>>
>>>
>>> --
>>>
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> --
>>> *From:* Krutika Dhananjay <kdhan...@redhat.com>
>>> *Sent:* Sunday, March 19, 2017 8:02:19 AM
>>>
>>> *To:* Mahdi Adnan
>>> *Cc:* gluster-users@gluster.org
>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>
>>>
>>>
>>> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
>>> wrote:
>>>
>>>> Kindly, check the attached new log file, i dont know if it's helpful or
>>>> not but, i couldn't find the log with the name you just described.
>>>>
>>> No. Are you using FUSE or libgfapi for accessing the volume? Or is it
>>> NFS?
>>>
>>> -Krutika
>>>
>>>>
>

Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-03-19 Thread Krutika Dhananjay
While I'm still going through the logs, just wanted to point out a couple
of things:

1. It is recommended that you use 3-way replication (replica count 3) for
VM store use case
2. network.ping-timeout at 5 seconds is way too low. Please change it to 30.
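
For example (volume name is a placeholder; the option can be changed online):

    gluster volume set myvol network.ping-timeout 30

Going from replica 2 to replica 3 additionally requires adding a third brick per
replica set with "gluster volume add-brick ... replica 3 ...".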

Is there any specific reason for using NFS-Ganesha over gfapi/FUSE?

Will get back with anything else I might find or more questions if I have
any.

-Krutika

On Sun, Mar 19, 2017 at 2:36 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
wrote:

> Thanks mate,
>
> Kindly, check the attachment.
>
>
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> ------
> *From:* Krutika Dhananjay <kdhan...@redhat.com>
> *Sent:* Sunday, March 19, 2017 10:00:22 AM
>
> *To:* Mahdi Adnan
> *Cc:* gluster-users@gluster.org
> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>
> In that case could you share the ganesha-gfapi logs?
>
> -Krutika
>
> On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
> wrote:
>
>> I have two volumes, one is mounted using libgfapi for ovirt mount, the
>> other one is exported via NFS-Ganesha for VMWare which is the one im
>> testing now.
>>
>>
>>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> --
>> *From:* Krutika Dhananjay <kdhan...@redhat.com>
>> *Sent:* Sunday, March 19, 2017 8:02:19 AM
>>
>> *To:* Mahdi Adnan
>> *Cc:* gluster-users@gluster.org
>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>
>>
>>
>> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
>> wrote:
>>
>>> Kindly, check the attached new log file, i dont know if it's helpful or
>>> not but, i couldn't find the log with the name you just described.
>>>
>> No. Are you using FUSE or libgfapi for accessing the volume? Or is it NFS?
>>
>> -Krutika
>>
>>>
>>>
>>> --
>>>
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> --
>>> *From:* Krutika Dhananjay <kdhan...@redhat.com>
>>> *Sent:* Saturday, March 18, 2017 6:10:40 PM
>>>
>>> *To:* Mahdi Adnan
>>> *Cc:* gluster-users@gluster.org
>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>
>>> mnt-disk11-vmware2.log seems like a brick log. Could you attach the fuse
>>> mount logs? It should be right under /var/log/glusterfs/ directory
>>> named after the mount point name, only hyphenated.
>>>
>>> -Krutika
>>>
>>> On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
>>> wrote:
>>>
>>>> Hello Krutika,
>>>>
>>>>
>>>> Kindly, check the attached logs.
>>>>
>>>>
>>>>
>>>> --
>>>>
>>>> Respectfully
>>>> *Mahdi A. Mahdi*
>>>>
>>>> --
>>>> *From:* Krutika Dhananjay <kdhan...@redhat.com>
>>>> *Sent:* Saturday, March 18, 2017 3:29:03 PM
>>>> *To:* Mahdi Adnan
>>>> *Cc:* gluster-users@gluster.org
>>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>>
>>>> Hi Mahdi,
>>>>
>>>> Could you attach mount, brick and rebalance logs?
>>>>
>>>> -Krutika
>>>>
>>>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan <mahdi.ad...@outlook.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I have upgraded to Gluster 3.8.10 today and ran the add-brick
>>>>> procedure in a volume contains few VMs.
>>>>> After the completion of rebalance, i have rebooted the VMs, some of
>>>>> ran just fine, and others just crashed.
>>>>> Windows boot to recovery mode and Linux throw xfs errors and does not
>>>>> boot.
>>>>> I ran the test again and it happened just as the first one, but i have
>>>>> noticed only VMs doing disk IOs are affected by this bug.
>>>>> The VMs in power off mode started fine and even md5 of the disk file
>>>>> did not change after the rebalance.
>>>>>
>>>>> anyone else can confirm this ?
>>>>>
>>>>>
>>>>> Volume info:
>>>>>
>>>>> Volume Name: vmware2
>>>>> Type: Distributed-Replicate

Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-03-19 Thread Krutika Dhananjay
In that case could you share the ganesha-gfapi logs?

-Krutika

On Sun, Mar 19, 2017 at 12:13 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
wrote:

> I have two volumes, one is mounted using libgfapi for ovirt mount, the
> other one is exported via NFS-Ganesha for VMWare which is the one im
> testing now.
>
>
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> ------
> *From:* Krutika Dhananjay <kdhan...@redhat.com>
> *Sent:* Sunday, March 19, 2017 8:02:19 AM
>
> *To:* Mahdi Adnan
> *Cc:* gluster-users@gluster.org
> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>
>
>
> On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
> wrote:
>
>> Kindly, check the attached new log file, i dont know if it's helpful or
>> not but, i couldn't find the log with the name you just described.
>>
> No. Are you using FUSE or libgfapi for accessing the volume? Or is it NFS?
>
> -Krutika
>
>>
>>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> --
>> *From:* Krutika Dhananjay <kdhan...@redhat.com>
>> *Sent:* Saturday, March 18, 2017 6:10:40 PM
>>
>> *To:* Mahdi Adnan
>> *Cc:* gluster-users@gluster.org
>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>
>> mnt-disk11-vmware2.log seems like a brick log. Could you attach the fuse
>> mount logs? It should be right under /var/log/glusterfs/ directory
>> named after the mount point name, only hyphenated.
>>
>> -Krutika
>>
>> On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
>> wrote:
>>
>>> Hello Krutika,
>>>
>>>
>>> Kindly, check the attached logs.
>>>
>>>
>>>
>>> --
>>>
>>> Respectfully
>>> *Mahdi A. Mahdi*
>>>
>>> --
>>> *From:* Krutika Dhananjay <kdhan...@redhat.com>
>>> *Sent:* Saturday, March 18, 2017 3:29:03 PM
>>> *To:* Mahdi Adnan
>>> *Cc:* gluster-users@gluster.org
>>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>>
>>> Hi Mahdi,
>>>
>>> Could you attach mount, brick and rebalance logs?
>>>
>>> -Krutika
>>>
>>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan <mahdi.ad...@outlook.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have upgraded to Gluster 3.8.10 today and ran the add-brick procedure
>>>> in a volume contains few VMs.
>>>> After the completion of rebalance, i have rebooted the VMs, some of ran
>>>> just fine, and others just crashed.
>>>> Windows boot to recovery mode and Linux throw xfs errors and does not
>>>> boot.
>>>> I ran the test again and it happened just as the first one, but i have
>>>> noticed only VMs doing disk IOs are affected by this bug.
>>>> The VMs in power off mode started fine and even md5 of the disk file
>>>> did not change after the rebalance.
>>>>
>>>> anyone else can confirm this ?
>>>>
>>>>
>>>> Volume info:
>>>>
>>>> Volume Name: vmware2
>>>> Type: Distributed-Replicate
>>>> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
>>>> Status: Started
>>>> Snapshot Count: 0
>>>> Number of Bricks: 22 x 2 = 44
>>>> Transport-type: tcp
>>>> Bricks:
>>>> Brick1: gluster01:/mnt/disk1/vmware2
>>>> Brick2: gluster03:/mnt/disk1/vmware2
>>>> Brick3: gluster02:/mnt/disk1/vmware2
>>>> Brick4: gluster04:/mnt/disk1/vmware2
>>>> Brick5: gluster01:/mnt/disk2/vmware2
>>>> Brick6: gluster03:/mnt/disk2/vmware2
>>>> Brick7: gluster02:/mnt/disk2/vmware2
>>>> Brick8: gluster04:/mnt/disk2/vmware2
>>>> Brick9: gluster01:/mnt/disk3/vmware2
>>>> Brick10: gluster03:/mnt/disk3/vmware2
>>>> Brick11: gluster02:/mnt/disk3/vmware2
>>>> Brick12: gluster04:/mnt/disk3/vmware2
>>>> Brick13: gluster01:/mnt/disk4/vmware2
>>>> Brick14: gluster03:/mnt/disk4/vmware2
>>>> Brick15: gluster02:/mnt/disk4/vmware2
>>>> Brick16: gluster04:/mnt/disk4/vmware2
>>>> Brick17: gluster01:/mnt/disk5/vmware2
>>>> Brick18: gluster03:/mnt/disk5/vmware2
>>>> Brick19: gluster02:/mnt/disk5/vmware2
>>>> Brick20: gluster04:/mnt/disk5/vmware2

Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-03-18 Thread Krutika Dhananjay
On Sat, Mar 18, 2017 at 10:36 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
wrote:

> Kindly, check the attached new log file, i dont know if it's helpful or
> not but, i couldn't find the log with the name you just described.
>
No. Are you using FUSE or libgfapi for accessing the volume? Or is it NFS?

-Krutika

>
>
> --
>
> Respectfully
> *Mahdi A. Mahdi*
>
> ----------
> *From:* Krutika Dhananjay <kdhan...@redhat.com>
> *Sent:* Saturday, March 18, 2017 6:10:40 PM
>
> *To:* Mahdi Adnan
> *Cc:* gluster-users@gluster.org
> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>
> mnt-disk11-vmware2.log seems like a brick log. Could you attach the fuse
> mount logs? It should be right under /var/log/glusterfs/ directory
> named after the mount point name, only hyphenated.
>
> -Krutika
>
> On Sat, Mar 18, 2017 at 7:27 PM, Mahdi Adnan <mahdi.ad...@outlook.com>
> wrote:
>
>> Hello Krutika,
>>
>>
>> Kindly, check the attached logs.
>>
>>
>>
>> --
>>
>> Respectfully
>> *Mahdi A. Mahdi*
>>
>> --
>> *From:* Krutika Dhananjay <kdhan...@redhat.com>
>> *Sent:* Saturday, March 18, 2017 3:29:03 PM
>> *To:* Mahdi Adnan
>> *Cc:* gluster-users@gluster.org
>> *Subject:* Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption
>>
>> Hi Mahdi,
>>
>> Could you attach mount, brick and rebalance logs?
>>
>> -Krutika
>>
>> On Sat, Mar 18, 2017 at 12:14 AM, Mahdi Adnan <mahdi.ad...@outlook.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I have upgraded to Gluster 3.8.10 today and ran the add-brick procedure
>>> in a volume contains few VMs.
>>> After the completion of rebalance, i have rebooted the VMs, some of ran
>>> just fine, and others just crashed.
>>> Windows boot to recovery mode and Linux throw xfs errors and does not
>>> boot.
>>> I ran the test again and it happened just as the first one, but i have
>>> noticed only VMs doing disk IOs are affected by this bug.
>>> The VMs in power off mode started fine and even md5 of the disk file did
>>> not change after the rebalance.
>>>
>>> anyone else can confirm this ?
>>>
>>>
>>> Volume info:
>>>
>>> Volume Name: vmware2
>>> Type: Distributed-Replicate
>>> Volume ID: 02328d46-a285-4533-aa3a-fb9bfeb688bf
>>> Status: Started
>>> Snapshot Count: 0
>>> Number of Bricks: 22 x 2 = 44
>>> Transport-type: tcp
>>> Bricks:
>>> Brick1: gluster01:/mnt/disk1/vmware2
>>> Brick2: gluster03:/mnt/disk1/vmware2
>>> Brick3: gluster02:/mnt/disk1/vmware2
>>> Brick4: gluster04:/mnt/disk1/vmware2
>>> Brick5: gluster01:/mnt/disk2/vmware2
>>> Brick6: gluster03:/mnt/disk2/vmware2
>>> Brick7: gluster02:/mnt/disk2/vmware2
>>> Brick8: gluster04:/mnt/disk2/vmware2
>>> Brick9: gluster01:/mnt/disk3/vmware2
>>> Brick10: gluster03:/mnt/disk3/vmware2
>>> Brick11: gluster02:/mnt/disk3/vmware2
>>> Brick12: gluster04:/mnt/disk3/vmware2
>>> Brick13: gluster01:/mnt/disk4/vmware2
>>> Brick14: gluster03:/mnt/disk4/vmware2
>>> Brick15: gluster02:/mnt/disk4/vmware2
>>> Brick16: gluster04:/mnt/disk4/vmware2
>>> Brick17: gluster01:/mnt/disk5/vmware2
>>> Brick18: gluster03:/mnt/disk5/vmware2
>>> Brick19: gluster02:/mnt/disk5/vmware2
>>> Brick20: gluster04:/mnt/disk5/vmware2
>>> Brick21: gluster01:/mnt/disk6/vmware2
>>> Brick22: gluster03:/mnt/disk6/vmware2
>>> Brick23: gluster02:/mnt/disk6/vmware2
>>> Brick24: gluster04:/mnt/disk6/vmware2
>>> Brick25: gluster01:/mnt/disk7/vmware2
>>> Brick26: gluster03:/mnt/disk7/vmware2
>>> Brick27: gluster02:/mnt/disk7/vmware2
>>> Brick28: gluster04:/mnt/disk7/vmware2
>>> Brick29: gluster01:/mnt/disk8/vmware2
>>> Brick30: gluster03:/mnt/disk8/vmware2
>>> Brick31: gluster02:/mnt/disk8/vmware2
>>> Brick32: gluster04:/mnt/disk8/vmware2
>>> Brick33: gluster01:/mnt/disk9/vmware2
>>> Brick34: gluster03:/mnt/disk9/vmware2
>>> Brick35: gluster02:/mnt/disk9/vmware2
>>> Brick36: gluster04:/mnt/disk9/vmware2
>>> Brick37: gluster01:/mnt/disk10/vmware2
>>> Brick38: gluster03:/mnt/disk10/vmware2
>>> Brick39: gluster02:/mnt/disk10/vmware2
>>> Brick40: gluster04:/mnt/disk10/vmware2
>>> Brick41: gluster01:/mnt/disk11/vmware2
>>> Brick42: gluster03:/mnt/disk11/vmware2
>>> 

Re: [Gluster-users] Gluster 3.8.10 rebalance VMs corruption

2017-03-18 Thread Krutika Dhananjay
On Sat, Mar 18, 2017 at 11:15 PM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> Krutika, it wasn't an attack directly to you.
> It wasn't an attack at all.
>

> Gluster is a "SCALE-OUT" software defined storage, the folllowing is
> wrote in the middle of the homepage:
> "GlusterFS is a scalable network filesystem"
>
> So, scaling a cluster is one of the primary goal of gluster.
>
> A critical bug that prevent gluster from being scaled without loosing
> data was discovered 1 year ago, and took 1 year to be fixed.
>

> If gluster isn't able to ensure data consistency when doing it's
> primary role, scaling up a storage, i'm sorry but it can't be
> considered "enterprise" ready or production ready.
>

That's not entirely true. VM use-case is just one of the many workloads
users
use Gluster for. I think I've clarified this before. The bug was in
dht-shard interaction.
And shard is *only* supported in VM use-case as of today. This means that
scaling out has been working fine on all but the VM use-case.
That doesn't mean that Gluster is not production-ready. At least users
who've deployed Gluster
in non-VM use-cases haven't complained of add-brick not working in the
recent past.


-Krutika


> Maybe SOHO for small offices or home users, but in enterprises, data
> consistency and reliability is the most important thing and gluster
> isn't able to guarantee this even
> doing a very basic routine procedure that should be considered as the
> basis of the whole gluster project (as wrote on gluster's homepage)
>
>
> 2017-03-18 14:21 GMT+01:00 Krutika Dhananjay <kdhan...@redhat.com>:
> >
> >
> > On Sat, Mar 18, 2017 at 3:18 PM, Gandalf Corvotempesta
> > <gandalf.corvotempe...@gmail.com> wrote:
> >>
> >> 2017-03-18 2:09 GMT+01:00 Lindsay Mathieson <
> lindsay.mathie...@gmail.com>:
> >> > Concerning, this was supposed to be fixed in 3.8.10
> >>
> >> Exactly. https://bugzilla.redhat.com/show_bug.cgi?id=1387878
> >> Now let's see how much time they require to fix another CRITICAL bug.
> >>
> >> I'm really curious.
> >
> >
> > Hey Gandalf!
> >
> > Let's see. There have been plenty of occasions where I've sat and worked
> on
> > users' issues on weekends.
> > And then again, I've got a life too outside of work (or at least I'm
> > supposed to), you know.
> > (And hey you know what! Today is Saturday and I'm sitting here and
> > responding to your mail and collecting information
> > on Mahdi's issue. Nobody asked me to look into it. I checked the mail
> and I
> > had a choice to ignore it and not look into it until Monday.)
> >
> > Is there a genuine problem Mahdi is facing? Without a doubt!
> >
> > Got a constructive feedback to give? Please do.
> > Do you want to give back to the community and help improve GlusterFS?
> There
> > are plenty of ways to do that.
> > One of them is testing out the releases and providing feedback. Sharding
> > wouldn't have worked today, if not for Lindsay's timely
> > and regular feedback in several 3.7.x releases.
> >
> > But this kind of criticism doesn't help.
> >
> > Also, spending time on users' issues is only one of the many
> > responsibilities we have as developers.
> > So what you see on mailing lists is just the tip of the iceberg.
> >
> > I have personally tried several times to recreate the add-brick bug on 3
> > machines I borrowed from Kaleb. I haven't had success in recreating it.
> > Reproducing VM-related bugs, in my experience, wasn't easy. I don't use
> > Proxmox. Lindsay and Kevin did. There are a myriad qemu options used when
> > launching vms. Different VM management projects (ovirt/Proxmox) use
> > different defaults for these options. There are too many variables to be
> > considered
> > when debugging or trying to simulate the users' test.
> >
> > It's why I asked for Mahdi's help before 3.8.10 was out for feedback on
> the
> > fix:
> > http://lists.gluster.org/pipermail/gluster-users/2017-
> February/030112.html
> >
> > Alright. That's all I had to say.
> >
> > Happy weekend to you!
> >
> > -Krutika
> >
> >> ___
> >> Gluster-users mailing list
> >> Gluster-users@gluster.org
> >> http://lists.gluster.org/mailman/listinfo/gluster-users
> >
> >
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Sharding?

2017-03-10 Thread Krutika Dhananjay
On Fri, Mar 10, 2017 at 4:09 PM, Cedric Lemarchand 
wrote:

>
> > On 10 Mar 2017, at 10:33, Alessandro Briosi  wrote:
> >
> > Il 10/03/2017 10:28, Kevin Lemonnier ha scritto:
> >>> I haven't done any test yet, but I was under the impression that
> >>> sharding feature isn't so stable/mature yet.
> >>> In the remote of my mind I remember reading something about a
> >>> bug/situation which caused data corruption.
> >>> Can someone confirm that sharding is stable enough to be used in
> >>> production and won't cause any data loss?
> >> There were a few bugs yeah. I can tell you that in 3.7.15 (and I assume
> >> later versions) it works well as long as you don't try to add new bricks
> >> to your volumes (we use it in production for HA virtual machine disks).
> >> Apparently that bug was fixed recently, so latest versions should be
> >> pretty stable yeah.
> >
> > I'm using 3.8.9, so I suppose all known bugs have been fixed there (also
> the one with adding briks)
> >
> > I'll then proceed with some tests before going to production.
>
> I am still asking myself how such bug could happen on a clustered storage
> software, where adding bricks is a base feature for scalable solution, like
> Gluster. Or maybe is it that STM releases are really under tested compared
> to LTM ones ? Could we states that STM release are really not made for
> production, or at least really risky ?
>

Not entirely true. The same bug existed in LTM release too.

I did try reproducing the bug on my setup as soon as Lindsay, Kevin and
others started reporting about it, but it was never reproducible on my
setup.
Absence of proper logging in libgfapi upon failures only made it harder to
debug, even when the users successfully recreated the issue and shared
their logs. It was only after Satheesaran recreated it successfully with
FUSE mount that the real debugging could begin, when fuse-bridge translator
logged the exact error code for failure.

-Krutika


> Sorry if the question could sounds a bit rude, but I think it still
> remains for newish peoples that had to make a choice on which release is
> better for production ;-)
>
> Cheers
>
> Cédric
>
> >
> > Thank you
> >
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://lists.gluster.org/mailman/listinfo/gluster-users
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Sharding?

2017-03-10 Thread Krutika Dhananjay
On Fri, Mar 10, 2017 at 3:03 PM, Alessandro Briosi  wrote:

> Il 10/03/2017 10:28, Kevin Lemonnier ha scritto:
>
> I haven't done any test yet, but I was under the impression that
> sharding feature isn't so stable/mature yet.
> In the remote of my mind I remember reading something about a
> bug/situation which caused data corruption.
> Can someone confirm that sharding is stable enough to be used in
> production and won't cause any data loss?
>
> There were a few bugs yeah. I can tell you that in 3.7.15 (and I assume
> later versions) it works well as long as you don't try to add new bricks
> to your volumes (we use it in production for HA virtual machine disks).
> Apparently that bug was fixed recently, so latest versions should be
> pretty stable yeah.
>
>
> I'm using 3.8.9, so I suppose all known bugs have been fixed there (also
> the one with adding briks)
>

No. That one is out for review and yet to be merged.

... which again reminds me ...

Niels,

Care to merge the two patches?

https://review.gluster.org/#/c/16749/
https://review.gluster.org/#/c/16750/

-Krutika


> I'll then proceed with some tests before going to production.
>
> Thank you
>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Deleting huge file from glusterfs hangs the cluster for a while

2017-03-09 Thread Krutika Dhananjay
Unfortunately you'll need to delete those shards manually from the bricks.
I am assuming you know how to identify shards that belong to a particular
image.
Since the VM is deleted, no IO will be happening on those remaining shards.

You would need to identify the shards, find all hard links associated with
every shard,
and delete the shards and their hard links from the backend.
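
A rough sketch of that cleanup, to be run on every brick; the brick path and GFID
are placeholders, and every matched file should be verified before deleting:

    BRICK=/bricks/brick1/myvol                      # example brick path
    GFID=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee       # GFID of the deleted image
    for shard in "$BRICK"/.shard/"$GFID".*; do
        [ -e "$shard" ] || continue                 # nothing matched
        # drop the corresponding hard link under .glusterfs, then the shard itself
        find "$BRICK"/.glusterfs -samefile "$shard" -delete
        rm -f -- "$shard"
    done

The shards of a file are named <GFID-of-the-base-file>.<index> under the .shard
directory of each brick.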

Do you mind raising a bug for this issue? I'll send a patch to move the
deletion of the shards
to the background.

https://bugzilla.redhat.com/enter_bug.cgi?product=GlusterFS

-Krutika

On Thu, Mar 9, 2017 at 12:29 AM, Georgi Mirchev <gmirc...@usa.net> wrote:

>
> On 03/08/2017 at 03:37 PM, Krutika Dhananjay wrote:
>
> Thanks for your feedback.
>
> May I know what was the shard-block-size?
>
> The shard size is 4 MB.
>
> One way to fix this would be to make shard translator delete only the base
> file (0th shard) in the IO path and move
> the deletion of the rest of the shards to background. I'll work on this.
>
> Is there a manual way?
>
>
> -Krutika
>
> On Fri, Mar 3, 2017 at 10:35 PM, GEORGI MIRCHEV <gmirc...@usa.net> wrote:
>
>> Hi,
>>
>> I have deleted two large files (around 1 TB each) via gluster client
>> (mounted
>> on /mnt folder). I used a simple rm command, e.g "rm /mnt/hugefile". This
>> resulted in hang of the cluster (no io can be done, the VM hanged). After
>> a
>> few minutes my ssh connection to the gluster node gets disconnected - I
>> had to
>> reconnect, which was very strange, probably some kind of timeout. Nothing
>> in
>> dmesg so it's probably the ssh that terminated the connection.
>>
>> After that the cluster works, everything seems fine, the file is gone in
>> the
>> client but the space is not reclaimed.
>>
>> The deleted file is also gone from bricks, but the shards are still there
>> and
>> use up all the space.
>>
>> I need to reclaim the space. How do I delete the shards / other metadata
>> for a
>> file that no longer exists?
>>
>>
>> Versions:
>> glusterfs-server-3.8.9-1.el7.x86_64
>> glusterfs-client-xlators-3.8.9-1.el7.x86_64
>> glusterfs-geo-replication-3.8.9-1.el7.x86_64
>> glusterfs-3.8.9-1.el7.x86_64
>> glusterfs-fuse-3.8.9-1.el7.x86_64
>> vdsm-gluster-4.19.4-1.el7.centos.noarch
>> glusterfs-cli-3.8.9-1.el7.x86_64
>> glusterfs-libs-3.8.9-1.el7.x86_64
>> glusterfs-api-3.8.9-1.el7.x86_64
>>
>> --
>> Georgi Mirchev
>>
>> ___
>> Gluster-users mailing list
>> Gluster-users@gluster.org
>> http://lists.gluster.org/mailman/listinfo/gluster-users
>>
>
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Deleting huge file from glusterfs hangs the cluster for a while

2017-03-08 Thread Krutika Dhananjay
Thanks for your feedback.

May I know what was the shard-block-size?

One way to fix this would be to make shard translator delete only the base
file (0th shard) in the IO path and move
the deletion of the rest of the shards to background. I'll work on this.

-Krutika

On Fri, Mar 3, 2017 at 10:35 PM, GEORGI MIRCHEV  wrote:

> Hi,
>
> I have deleted two large files (around 1 TB each) via gluster client
> (mounted
> on /mnt folder). I used a simple rm command, e.g "rm /mnt/hugefile". This
> resulted in hang of the cluster (no io can be done, the VM hanged). After a
> few minutes my ssh connection to the gluster node gets disconnected - I
> had to
> reconnect, which was very strange, probably some kind of timeout. Nothing
> in
> dmesg so it's probably the ssh that terminated the connection.
>
> After that the cluster works, everything seems fine, the file is gone in
> the
> client but the space is not reclaimed.
>
> The deleted file is also gone from bricks, but the shards are still there
> and
> use up all the space.
>
> I need to reclaim the space. How do I delete the shards / other metadata
> for a
> file that no longer exists?
>
>
> Versions:
> glusterfs-server-3.8.9-1.el7.x86_64
> glusterfs-client-xlators-3.8.9-1.el7.x86_64
> glusterfs-geo-replication-3.8.9-1.el7.x86_64
> glusterfs-3.8.9-1.el7.x86_64
> glusterfs-fuse-3.8.9-1.el7.x86_64
> vdsm-gluster-4.19.4-1.el7.centos.noarch
> glusterfs-cli-3.8.9-1.el7.x86_64
> glusterfs-libs-3.8.9-1.el7.x86_64
> glusterfs-api-3.8.9-1.el7.x86_64
>
> --
> Georgi Mirchev
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

[Gluster-users] Fixes to VM pause issues upon add-brick

2017-03-02 Thread Krutika Dhananjay
Hi Niels,

Care to merge the following two 3.8 backports:

https://review.gluster.org/16749 and
https://review.gluster.org/16750

and in that order. One of the users who'd reported this issue has confirmed
that the patch fixed the issue. So did Satheesaran.

-Krutika
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Optimal shard size & self-heal algorithm for VM hosting?

2017-02-16 Thread Krutika Dhananjay
On Wed, Feb 15, 2017 at 9:38 PM, Gambit15  wrote:

> Hey guys,
>  I keep seeing different recommendations for the best shard sizes for VM
> images, from 64MB to 512MB.
>
> What's the benefit of smaller v larger shards?
> I'm guessing smaller shards are quicker to heal, but larger shards will
> provide better sequential I/O for single clients? Anything else?
>

That's the main difference. And also smaller shards provide better brick
utilization and distribution of IO in distributed-replicated volumes as
opposed to larger shards.

>
> I also usually see "cluster.data-self-heal-algorithm: full" is generally
> recommended in these cases. Why not "diff"? Is it simply to reduce CPU load
> when there's plenty of excess network capacity?
>

That's correct. diff heal requires rolling checksum to be computed for
every 128KB chunk of the file on both source and sink bricks, which is CPU
intensive, potentially affecting IO traffic.
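
A minimal example of applying both recommendations; "myvol" and the 64MB size are
placeholders, and note that a new shard-block-size only applies to files created
after the change:

    gluster volume set myvol features.shard-block-size 64MB
    gluster volume set myvol cluster.data-self-heal-algorithm full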

-Krutika


>
> Thanks in advance,
> Doug
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Error while setting cluster.granular-entry-heal

2017-02-16 Thread Krutika Dhananjay
Could you please attach the "glfsheal-<VOLNAME>.log" logfile?
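
It is normally written on the node where the heal command was run, e.g.:

    ls -l /var/log/glusterfs/glfsheal-*.log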

-Krutika

On Thu, Feb 16, 2017 at 12:05 AM, Andrea Fogazzi  wrote:

> Hello,
>
> I have a gluster volume on 3.8.8 which has multiple volumes, each on
> distributed/replicated on 5 servers (2 replicas+1 quorum); each volume is
> both accessed as gluster client (RW, clients have 3.8.8 or 3.8.5) or
> Ganesha FS (RO).
>
>
> On one of the  volumes we have:
>
> - cluster.data-self-heal-algorithm: full
> - cluster.locking-scheme: granular
>
> When we try to set
>
> - cluster.granular-entry-heal on
>
> (command I use is "gluster volume set vol-cor-homes
> cluster.granular-entry-heal on")
>
> I receive
>
> *volume set: failed: 'gluster volume set <VOLNAME>
> cluster.granular-entry-heal {enable, disable}' is not supported. Use
> 'gluster volume heal <VOLNAME> granular-entry-heal {enable, disable}'
> instead.*
>
>
> Answer is not clear to me; I also tried command suggested in the command
> response, but it does not work (I get "Enable granular entry heal on
> volume vol-cor-homes has been unsuccessful on bricks that are down. Please
> check if all brick processes are running." while I am sure all bricks are
> online).
>
>
> Do you have any suggestion on what I am doing wrong, or how to debug the
> issue?
>
>
>
> Thanks in advance.
>
>
> Best regards
>
> andrea
>
>
>
>
>
> --
> Andrea Fogazzi
> fo...@fogazzi.com
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] When to use striped volumes?

2017-01-17 Thread Krutika Dhananjay
Could you describe what use-case you intend to use striping for?

-Krutika


On Tue, Jan 17, 2017 at 12:52 PM, Dave Fan  wrote:

> Hello everyone,
>
> We are trying to set up a Gluster-based storage for best performance. On
> the official Gluster website. It says:
>
> *Striped* – Striped volumes stripes data across bricks in the volume. For
> best results, you should use striped volumes only in high concurrency
> environments accessing very large files.
>
> Is there a rule-of-thumb on what size qualifies as "very large files" here?
>
> Many thanks,
> Dave
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://lists.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://lists.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-18 Thread Krutika Dhananjay
Assuming you're using FUSE, if your gluster volume is mounted at /some/dir,
for example,
then its corresponding logs will be at /var/log/glusterfs/some-dir.log
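
For example, for a volume FUSE-mounted at /mnt/gv0 (a made-up mount point):

    tail -n 100 /var/log/glusterfs/mnt-gv0.log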

-Krutika

On Fri, Nov 18, 2016 at 7:13 AM, Olivier Lambert <lambert.oliv...@gmail.com>
wrote:

> Attached, bricks log. Where could I find the fuse client log?
>
> On Fri, Nov 18, 2016 at 2:22 AM, Krutika Dhananjay <kdhan...@redhat.com>
> wrote:
> > Could you attach the fuse client and brick logs?
> >
> > -Krutika
> >
> > On Fri, Nov 18, 2016 at 6:12 AM, Olivier Lambert <
> lambert.oliv...@gmail.com>
> > wrote:
> >>
> >> Okay, used the exact same config you provided, and adding an arbiter
> >> node (node3)
> >>
> >> After halting node2, VM continues to work after a small "lag"/freeze.
> >> I restarted node2 and it was back online: OK
> >>
> >> Then, after waiting few minutes, halting node1. And **just** at this
> >> moment, the VM is corrupted (segmentation fault, /var/log folder empty
> >> etc.)
> >>
> >> dmesg of the VM:
> >>
> >> [ 1645.852905] EXT4-fs error (device xvda1):
> >> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
> >> entry in directory: rec_len is smaller than minimal - offset=0(0),
> >> inode=0, rec_len=0, name_len=0
> >> [ 1645.854509] Aborting journal on device xvda1-8.
> >> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
> >>
> >> And got a lot of " comm bash: bad entry in directory" messages then...
> >>
> >> Here is the current config with all Node back online:
> >>
> >> # gluster volume info
> >>
> >> Volume Name: gv0
> >> Type: Replicate
> >> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
> >> Status: Started
> >> Snapshot Count: 0
> >> Number of Bricks: 1 x (2 + 1) = 3
> >> Transport-type: tcp
> >> Bricks:
> >> Brick1: 10.0.0.1:/bricks/brick1/gv0
> >> Brick2: 10.0.0.2:/bricks/brick1/gv0
> >> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
> >> Options Reconfigured:
> >> nfs.disable: on
> >> performance.readdir-ahead: on
> >> transport.address-family: inet
> >> features.shard: on
> >> features.shard-block-size: 16MB
> >> network.remote-dio: enable
> >> cluster.eager-lock: enable
> >> performance.io-cache: off
> >> performance.read-ahead: off
> >> performance.quick-read: off
> >> performance.stat-prefetch: on
> >> performance.strict-write-ordering: off
> >> cluster.server-quorum-type: server
> >> cluster.quorum-type: auto
> >> cluster.data-self-heal: on
> >>
> >>
> >> # gluster volume status
> >> Status of volume: gv0
> >> Gluster process TCP Port  RDMA Port  Online
> >> Pid
> >>
> >> 
> --
> >> Brick 10.0.0.1:/bricks/brick1/gv0   49152 0  Y
> >> 1331
> >> Brick 10.0.0.2:/bricks/brick1/gv0   49152 0  Y
> >> 2274
> >> Brick 10.0.0.3:/bricks/brick1/gv0   49152 0  Y
> >> 2355
> >> Self-heal Daemon on localhost   N/A   N/AY
> >> 2300
> >> Self-heal Daemon on 10.0.0.3N/A   N/AY
> >> 10530
> >> Self-heal Daemon on 10.0.0.2N/A   N/AY
> >> 2425
> >>
> >> Task Status of Volume gv0
> >>
> >> 
> --
> >> There are no active volume tasks
> >>
> >>
> >>
> >> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
> >> <lambert.oliv...@gmail.com> wrote:
> >> > It's planned to have an arbiter soon :) It was just preliminary tests.
> >> >
> >> > Thanks for the settings, I'll test this soon and I'll come back to
> you!
> >> >
> >> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
> >> > <lindsay.mathie...@gmail.com> wrote:
> >> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
> >> >>>
> >> >>> gluster volume info gv0
> >> >>>
> >> >>> Volume Name: gv0
> >> >>> Type: Replicate
> >> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53

Re: [Gluster-users] corruption using gluster and iSCSI with LIO

2016-11-17 Thread Krutika Dhananjay
Could you attach the fuse client and brick logs?

-Krutika

On Fri, Nov 18, 2016 at 6:12 AM, Olivier Lambert 
wrote:

> Okay, used the exact same config you provided, and adding an arbiter
> node (node3)
>
> After halting node2, VM continues to work after a small "lag"/freeze.
> I restarted node2 and it was back online: OK
>
> Then, after waiting few minutes, halting node1. And **just** at this
> moment, the VM is corrupted (segmentation fault, /var/log folder empty
> etc.)
>
> dmesg of the VM:
>
> [ 1645.852905] EXT4-fs error (device xvda1):
> htree_dirblock_to_tree:988: inode #19: block 8286: comm bash: bad
> entry in directory: rec_len is smaller than minimal - offset=0(0),
> inode=0, rec_len=0, name_len=0
> [ 1645.854509] Aborting journal on device xvda1-8.
> [ 1645.855524] EXT4-fs (xvda1): Remounting filesystem read-only
>
> And got a lot of " comm bash: bad entry in directory" messages then...
>
> Here is the current config with all Node back online:
>
> # gluster volume info
>
> Volume Name: gv0
> Type: Replicate
> Volume ID: 5f15c919-57e3-4648-b20a-395d9fe3d7d6
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x (2 + 1) = 3
> Transport-type: tcp
> Bricks:
> Brick1: 10.0.0.1:/bricks/brick1/gv0
> Brick2: 10.0.0.2:/bricks/brick1/gv0
> Brick3: 10.0.0.3:/bricks/brick1/gv0 (arbiter)
> Options Reconfigured:
> nfs.disable: on
> performance.readdir-ahead: on
> transport.address-family: inet
> features.shard: on
> features.shard-block-size: 16MB
> network.remote-dio: enable
> cluster.eager-lock: enable
> performance.io-cache: off
> performance.read-ahead: off
> performance.quick-read: off
> performance.stat-prefetch: on
> performance.strict-write-ordering: off
> cluster.server-quorum-type: server
> cluster.quorum-type: auto
> cluster.data-self-heal: on
>
>
> # gluster volume status
> Status of volume: gv0
> Gluster process TCP Port  RDMA Port  Online
> Pid
> 
> --
> Brick 10.0.0.1:/bricks/brick1/gv0   49152 0  Y
>  1331
> Brick 10.0.0.2:/bricks/brick1/gv0   49152 0  Y
>  2274
> Brick 10.0.0.3:/bricks/brick1/gv0   49152 0  Y
>  2355
> Self-heal Daemon on localhost   N/A   N/AY
>  2300
> Self-heal Daemon on 10.0.0.3N/A   N/AY
>  10530
> Self-heal Daemon on 10.0.0.2N/A   N/AY
>  2425
>
> Task Status of Volume gv0
> 
> --
> There are no active volume tasks
>
>
>
> On Thu, Nov 17, 2016 at 11:35 PM, Olivier Lambert
>  wrote:
> > It's planned to have an arbiter soon :) It was just preliminary tests.
> >
> > Thanks for the settings, I'll test this soon and I'll come back to you!
> >
> > On Thu, Nov 17, 2016 at 11:29 PM, Lindsay Mathieson
> >  wrote:
> >> On 18/11/2016 8:17 AM, Olivier Lambert wrote:
> >>>
> >>> gluster volume info gv0
> >>>
> >>> Volume Name: gv0
> >>> Type: Replicate
> >>> Volume ID: 2f8658ed-0d9d-4a6f-a00b-96e9d3470b53
> >>> Status: Started
> >>> Snapshot Count: 0
> >>> Number of Bricks: 1 x 2 = 2
> >>> Transport-type: tcp
> >>> Bricks:
> >>> Brick1: 10.0.0.1:/bricks/brick1/gv0
> >>> Brick2: 10.0.0.2:/bricks/brick1/gv0
> >>> Options Reconfigured:
> >>> nfs.disable: on
> >>> performance.readdir-ahead: on
> >>> transport.address-family: inet
> >>> features.shard: on
> >>> features.shard-block-size: 16MB
> >>
> >>
> >>
> >> When hosting VM's its essential to set these options:
> >>
> >> network.remote-dio: enable
> >> cluster.eager-lock: enable
> >> performance.io-cache: off
> >> performance.read-ahead: off
> >> performance.quick-read: off
> >> performance.stat-prefetch: on
> >> performance.strict-write-ordering: off
> >> cluster.server-quorum-type: server
> >> cluster.quorum-type: auto
> >> cluster.data-self-heal: on
> >>
> >> Also with replica two and quorum on (required) your volume will become
> >> read-only when one node goes down to prevent the possibility of
> split-brain
> >> - you *really* want to avoid that :)
> >>
> >> I'd recommend a replica 3 volume, that way 1 node can go down, but the
> other
> >> two still form a quorum and will remain r/w.
> >>
> >> If the extra disks are not possible, then a Arbiter volume can be setup
> -
> >> basically dummy files on the third node.
> >>
> >>
> >>
> >> --
> >> Lindsay Mathieson
> >>
> >> ___
> >> Gluster-users mailing list
> >> Gluster-users@gluster.org
> >> http://www.gluster.org/mailman/listinfo/gluster-users
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org

Re: [Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks

2016-11-14 Thread Krutika Dhananjay
Yes. I apologise for the delay.

Disabling sharding would knock the translator itself off the client stack,
and
being that sharding is the actual (and the only) translator that has the
knowledge of how to interpret sharded files, and how to aggregate them,
removing the translator from the stack will make all shards start to appear
like
isolated files with no way to interpret the correlation between the
individual pieces.

The only way to fix it is to have sharding be part of the graph *even* if
disabled,
except that in this case, its job should be confined to aggregating the
already
sharded files during reads but NOT shard new files that are created, since
it is
supposed to "act" disabled. This is a slightly bigger change and this is
why I had
suggested the workaround at
https://bugzilla.redhat.com/show_bug.cgi?id=1355846#c1
back then.
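
The practical takeaway is not to simply set features.shard to off on a volume that
already holds sharded data. If in doubt, check the current state first (volume name
is a placeholder):

    gluster volume info myvol | grep features.shard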

FWIW, the documentation [1] does explain how to disable sharding the right
way and has been in existence ever since sharding was first released in
3.7.0.

[1] - http://staged-gluster-docs.readthedocs.io/en/release3.7.0beta1/Features/shard/

-Krutika



On Mon, Nov 14, 2016 at 9:08 PM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> 2016-11-14 15:54 GMT+01:00 Niels de Vos :
> > Obviously this is unacceptible for versions that have sharding as a
> > functional (not experimental) feature. All supported features are
> > expected to function without major problems (like corruption) for all
> > standard Gluster operations. Add-brick/replace-brick are surely such
> > Gluster operations.
>
> Is sharding an experimental feature even in 3.8 ?
> Because in 3.8 announcement, it's declared stable:
> http://blog.gluster.org/2016/06/glusterfs-3-8-released/
> "Sharding is now stable for VM image storage. "
>
> > FWIW sharding has several open bugs (like any other component), but it
> > is not immediately clear to me if the problem reported in this email is
> > in Bugzilla yet. These are the bugs that are expected to get fixed in
> > upcoming minor releases:
> >   https://bugzilla.redhat.com/buglist.cgi?component=
> sharding=bug_status=version=notequals=
> notequals=GlusterFS_format=advanced=CLOSED=mainline
>
> My issue with sharding was reported in bugzilla on 2016-07-12
> 4 months for a IMHO, critical bug.
>
> If you disable sharding on a sharded volume with existing sharded data,
> you corrupt every existing file.
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks

2016-11-14 Thread Krutika Dhananjay
On Mon, Nov 14, 2016 at 8:24 PM, Niels de Vos  wrote:

> On Mon, Nov 14, 2016 at 04:50:44PM +0530, Pranith Kumar Karampuri wrote:
> > On Mon, Nov 14, 2016 at 4:38 PM, Gandalf Corvotempesta <
> > gandalf.corvotempe...@gmail.com> wrote:
> >
> > > 2016-11-14 11:50 GMT+01:00 Pranith Kumar Karampuri <
> pkara...@redhat.com>:
> > > > To make gluster stable for VM images we had to add all these new
> features
> > > > and then fix all the bugs Lindsay/Kevin reported. We just fixed a
> > > corruption
> > > > issue that can happen with replace-brick which will be available in
> 3.9.0
> > > > and 3.8.6. The only 2 other known issues that can lead to
> corruptions are
> > > > add-brick and the bug you filed Gandalf. Krutika just 5 minutes back
> saw
> > > > something that could possibly lead to the corruption for the
> add-brick
> > > bug.
> > > > Is that really the Root cause? We are not sure yet, we need more
> time.
> > > > Without Lindsay/Kevin/David Gossage's support this workload would
> have
> > > been
> > > > in much worse condition. These bugs are not easy to re-create thus
> not
> > > easy
> > > > to fix. At least that has been Krutika's experience.
> > >
> > > Ok, but these changes should be placed in a "test" version and not
> > > marked as stable.
> > > I don't see any development release, only stable releases here.
> > > Do you want all features? Try the "beta/rc/unstable/alpha/dev" version.
> > > Do you want the stable version without known bugs but slow on VM
> > > workloads? Use the "-stable" version.
> > >
> > > If you release as stable, users tend to upgrade their cluster and use
> > > the newer features (that you are marking as stable).
> > > What if I upgrade a production cluster to a stable version and try an
> > > add-brick that leads to data corruption?
> > > I have to restore terabytes worth of data? Gluster is made for
> > > scale-out; what if my cluster was made with 500TB of VMs?
> > > Try restoring 500TB from a backup.
> > >
> > > This is unacceptable. add-brick/replace-brick should be common "daily"
> > > operations. You should heavily check these for regressions and bugs.
> > >
> >
> > This is a very good point. Adding other maintainers.
>

I think Pranith's intention here was to bring the point about development
releases vs. stable releases to the other maintainers' attention, although his
inline comment may have been a bit out of place (I was part of the discussion
that took place before this reply of his, in the office today, hence taking
the liberty to clarify).

-Krutika


> Obviously this is unacceptible for versions that have sharding as a
> functional (not experimental) feature. All supported features are
> expected to function without major problems (like corruption) for all
> standard Gluster operations. Add-brick/replace-brick are surely such
> Gluster operations.
>
> Of course it is possible that this does not always happen, and our tests
> did not catch the problem. In that case, we really need to have a bug
> report with all the details, and preferably a script that can be used to
> reproduce and detect the failure.
>
> FWIW sharding has several open bugs (like any other component), but it
> is not immediately clear to me if the problem reported in this email is
> in Bugzilla yet. These are the bugs that are expected to get fixed in
> upcoming minor releases:
>   https://bugzilla.redhat.com/buglist.cgi?component=
> sharding=bug_status=version=notequals=
> notequals=GlusterFS_format=advanced=CLOSED=mainline
>
> HTH,
> Niels
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks

2016-11-14 Thread Krutika Dhananjay
Which data corruption issue is this? Could you point me to the bug report
on bugzilla?

-Krutika

On Sat, Nov 12, 2016 at 4:28 PM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> On 12 Nov 2016 10:21, "Kevin Lemonnier"  wrote:
> > We've had a lot of problems in the past, but at least for us 3.7.12 (and
> 3.7.15)
> > seems to be working pretty well as long as you don't add bricks. We
> started doing
> > multiple little clusters and abandonned the idea of one big cluster, had
> no
> > issues since :)
> >
>
> Well, adding bricks could be useful...  :)
>
> Having to create multiple clusters is not a solution and is much more
> expensive.
> And if you corrupt data in a single cluster you still have issues.
>
> I think it would be better to add fewer features and focus more on stability.
> In software-defined storage, stability and consistency are the most
> important things.
>
> I'm also subscribed to the moosefs and lizardfs mailing lists and I don't
> recall a single data corruption/data loss event.
>
> In gluster, after some days of testing I've found a huge data corruption
> issue that is still unfixed on bugzilla.
> If you change the shard size on a populated cluster, you break all
> existing data.
> Try to do this on a cluster with working VMs and see what happens:
> a single CLI command breaks everything, and it is still unfixed.
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] 3.7.16 with sharding corrupts VMDK files when adding and removing bricks

2016-11-11 Thread Krutika Dhananjay
Hi,

Yes, this has been reported before by Lindsay Mathieson and Kevin Lemonnier
on this list.
We just found one issue with replace-brick that we recently fixed.

In your case, are you doing add-brick and changing the replica count (say
from 2 -> 3) or are you adding
"replica-count" number of bricks every time?

-Krutika

On Sat, Nov 12, 2016 at 6:40 AM, ML Wong  wrote:

> Has anyone encountered this behavior?
>
> Running 3.7.16 from centos-gluster37, on CentOS 7.2 with NFS-Ganesha
> 2.3.0. VMs are running fine without problems and with sharding on. However,
> when I do either an "add-brick" or a "remove-brick start force", VM files will
> then be corrupted, and the VM will not be able to boot anymore.
>
> So far, when I access files through regular NFS, all regular files and
> directories seem to be accessible fine. I am not sure if this somehow
> relates to bug 1318136, but any help will be appreciated. Or am I missing any
> settings? Below is the vol info of the gluster volume.
>
> Volume Name: nfsvol1
> Type: Distributed-Replicate
> Volume ID: 06786467-4c8a-48ad-8b1f-346aa8342283
> Status: Started
> Number of Bricks: 2 x 2 = 4
> Transport-type: tcp
> Bricks:
> Brick1: stor4:/data/brick1/nfsvol1
> Brick2: stor5:/data/brick1/nfsvol1
> Brick3: stor1:/data/brick1/nfsvol1
> Brick4: stor2:/data/brick1/nfsvol1
> Options Reconfigured:
> features.shard-block-size: 64MB
> features.shard: on
> ganesha.enable: on
> features.cache-invalidation: off
> nfs.disable: on
> performance.readdir-ahead: on
> nfs-ganesha: enable
> cluster.enable-shared-storage: enable
>
> thanks,
> Melvin
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] Improving IOPS

2016-11-03 Thread Krutika Dhananjay
There is a compound fops feature coming up which reduces the
number of calls over the network in AFR transactions, thereby
improving performance. It will be available in 3.9 (and in the latest
upstream master too) if you're interested in trying it out, but
DO NOT use it in production yet. It may have some stability
issues, as it hasn't been thoroughly tested.

You can enable it using the following command:

# gluster volume set  cluster.use-compound-fops on
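
Given the stability caveat above, note that the option can be reverted the
same way if problems show up; a sketch, with "myvol" as a placeholder volume
name:

# turn the feature off again, or reset the option to its default
gluster volume set myvol cluster.use-compound-fops off
gluster volume reset myvol cluster.use-compound-fops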

-Krutika

On Fri, Nov 4, 2016 at 9:36 AM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:

> On 4 November 2016 at 03:38, Gambit15  wrote:
> > There are lots of factors involved. Can you describe your setup & use
> case a
> > little more?
>
>
> Replica 3 Cluster. Individual Bricks are RAIDZ10 (zfs) that can manage
> 450 MB/s write, 1.2GB/s Read.
> - 2 * 1GB Bond, Balance-alb
> - 64 MB Shards
> - KVM VM Hosting via gfapi
>
> Looking at improving the IOPS for the VM's
>
> --
> Lindsay
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-22 Thread Krutika Dhananjay
Awesome. Thanks for the logs. Will take a look.

-Krutika

On Sun, Oct 23, 2016 at 5:47 AM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:

> On 20/10/2016 9:13 PM, Krutika Dhananjay wrote:
>
>> It would be awesome if you could tell us whether you
>> see the issue with FUSE as well, while we get around
>> to setting up the environment and running the test ourselves.
>>
>
> I just managed to replicate the exact same error using the fuse mount
>
> --
> Lindsay Mathieson
>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-20 Thread Krutika Dhananjay
Thanks a lot, Lindsay! Appreciate the help.

It would be awesome if you could tell us whether you
see the issue with FUSE as well, while we get around
to setting up the environment and running the test ourselves.

-Krutika

On Thu, Oct 20, 2016 at 2:57 AM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:

> On 20/10/2016 7:01 AM, Kevin Lemonnier wrote:
>
>> Yes, you need to add a full replica set at once.
>> I don't remember, but according to my history, looks like I've used this :
>>
>> gluster volume add-brick VMs host1:/brick host2:/brick host3:/brick force
>>
>> (I have the same without force just before that, so I assume force is
>> needed)
>>
>
> Ok, I did a:
>
> gluster volume add-brick datastore1 
> vna.proxmox.softlog:/tank/vmdata/datastore1-2
> vnb.proxmox.softlog:/tank/vmdata/datastore1-2
> vng.proxmox.softlog:/tank/vmdata/datastore1-2
>
> I had added a 2nd windows VM as well.
>
> Looked like it was going ok for a while, then blew up. The first Windows
> VM, which was running diskmark, died and won't boot. qemu-img check shows the
> image hopelessly corrupted. The 2nd VM has also crashed and is unbootable,
> though qemu-img shows the qcow2 file as ok.
>
>
> I have a sneaking suspicion it's related to active IO. VM1 was doing heavy
> IO compared to VM2; perhaps that's why its image was corrupted worse.
>
>
> rebalance status looks odd to me:
>
> root@vna:~# gluster volume rebalance datastore1 status
>                     Node  Rebalanced-files    size   scanned  failures  skipped       status  run time in h:m:s
>                ---------  ----------------  ------  --------  --------  -------  -----------  -----------------
>                localhost                 0  0Bytes         0         0        0    completed              0:0:1
>      vnb.proxmox.softlog                 0  0Bytes         0         0        0    completed              0:0:1
>      vng.proxmox.softlog               328  19.2GB      1440         0        0  in progress            0:11:55
>
>
> Don't know why vng is taking so much longer, the nodes are identical. But
> maybe this is normal?
>
>
> When I get time, I'll try again with:
>
> - all vm's shutdown (no IO)
>
> - All VM's running off the gluster fuse mount (no gfapi).
>
>
> cheers,
>
>
> --
> Lindsay Mathieson
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-19 Thread Krutika Dhananjay
Agreed.
I will run the same test on an actual vm setup one of these days and
see if I manage to recreate the issue (after I have completed some
of my long pending tasks). Meanwhile if any of you find a consistent simpler
test case to hit the issue, feel free to reply on this thread. At least I
had no success
in recreating the bug in a non-VM-store setup.

-Krutika

On Mon, Oct 17, 2016 at 12:50 PM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> On 14 Oct 2016 17:37, "David Gossage"  wrote:
> >
> > Sorry to resurrect an old email, but did any resolution occur for this, or
> > was a cause found? I just see this as a potential task I may need to also
> > run through some day, and if there are pitfalls to watch for it would be
> > good to know.
> >
>
> I think that the issue described in these emails must be addressed in some way.
> It's really bad that adding bricks to a cluster leads to data corruption, as
> adding bricks is a standard administration task.
>
> I hope that the issue will be detected and fixed asap.
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-10-16 Thread Krutika Dhananjay
Hi,

No. I did run add-brick on a volume with the same configuration as that of
Kevin, while IO was running, except that I wasn't running a VM workload. I
compared the file checksums against the original source files from which they
were copied and they matched.


@Kevin,

I see that network.ping-timeout on your setup is 15 seconds and  that's too
low. Could you reconfigure that to 30 seconds?
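
For reference, that change is a single volume-set; a sketch using the volume
name from the info output above:

# raise the ping timeout back to the recommended value
gluster volume set VMs network.ping-timeout 30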

-Krutika

On Fri, Oct 14, 2016 at 9:07 PM, David Gossage <dgoss...@carouselchecks.com>
wrote:

> Sorry to resurrect an old email, but did any resolution occur for this, or
> was a cause found? I just see this as a potential task I may need to also
> run through some day, and if there are pitfalls to watch for it would be
> good to know.
>
> *David Gossage*
> *Carousel Checks Inc. | System Administrator*
> *Office* 708.613.2284
>
> On Tue, Sep 6, 2016 at 5:38 AM, Kevin Lemonnier <lemonni...@ulrar.net>
> wrote:
>
>> Hi,
>>
>> Here is the info :
>>
>> Volume Name: VMs
>> Type: Replicate
>> Volume ID: c5272382-d0c8-4aa4-aced-dd25a064e45c
>> Status: Started
>> Number of Bricks: 1 x 3 = 3
>> Transport-type: tcp
>> Bricks:
>> Brick1: ips4adm.name:/mnt/storage/VMs
>> Brick2: ips5adm.name:/mnt/storage/VMs
>> Brick3: ips6adm.name:/mnt/storage/VMs
>> Options Reconfigured:
>> performance.readdir-ahead: on
>> cluster.quorum-type: auto
>> cluster.server-quorum-type: server
>> network.remote-dio: enable
>> cluster.eager-lock: enable
>> performance.quick-read: off
>> performance.read-ahead: off
>> performance.io-cache: off
>> performance.stat-prefetch: off
>> features.shard: on
>> features.shard-block-size: 64MB
>> cluster.data-self-heal-algorithm: full
>> network.ping-timeout: 15
>>
>>
>> For the logs I'm sending that over to you in private.
>>
>>
>> On Tue, Sep 06, 2016 at 09:48:07AM +0530, Krutika Dhananjay wrote:
>> >Could you please attach the glusterfs client and brick logs?
>> >Also provide output of `gluster volume info`.
>> >-Krutika
>> >On Tue, Sep 6, 2016 at 4:29 AM, Kevin Lemonnier <
>> lemonni...@ulrar.net>
>> >wrote:
>> >
>  >  - What was the original (and current) geometry? (status and info)
>  >
>  It was a 1x3 that I was trying to bump to 2x3.
>  >  - what parameters did you use when adding the bricks?
>  >
>
>  Just a simple add-brick node1:/path node2:/path node3:/path
>  Then a fix-layout when everything started going wrong.
>
>  I was able to salvage some VMs by stopping them then starting them again,
>  but most won't start for various reasons (disk corrupted, grub not found ...).
>  For those we are deleting the disks then importing them from backups, that's
>  a huge loss, but everything has been down for so long, no choice ..
>  >  On 6/09/2016 8:00 AM, Kevin Lemonnier wrote:
>  >
>  >  I tried a fix-layout, and since that didn't work I removed the brick
>  >  (start then commit when it showed completed). Not better, the volume is
>  >  now running on the 3 original bricks (replica 3) but the VMs are still
>  >  corrupted. I have 880 Mb of shards on the bricks I removed for some
>  >  reason, those shards do exist (and are bigger) on the "live" volume. I
>  >  don't understand why now that I have removed the new bricks everything
>  >  isn't working like before ..
>  >
>  >  On Mon, Sep 05, 2016 at 11:06:16PM +0200, Kevin Lemonnier wrote:
>  >
>  >  Hi,
>  >
>  >  I just added 3 bricks to a volume and all the VMs are doing I/O errors now.
>  >  I rebooted a VM to see and it can't start again, am I missing something?
>  >  Is the rebalance required to make everything run?
>  >
>  >  That's urgent, thanks.
>  >
>  >  --
>  >  Kevin Lemonnier
>  >  PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>  >
>  >
>  >
>  >
>  >  _

Re: [Gluster-users] Healing Delays

2016-10-01 Thread Krutika Dhananjay
Any errors/warnings in the glustershd logs?
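
The self-heal daemon log normally lives under /var/log/glusterfs/ on each
node; a quick way to pull out recent problems (a sketch, the path may differ
by distribution):

# show recent error (E) and warning (W) messages from the self-heal daemon
grep -E " [EW] \[" /var/log/glusterfs/glustershd.log | tail -n 50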

-Krutika

On Sat, Oct 1, 2016 at 8:18 PM, Lindsay Mathieson <
lindsay.mathie...@gmail.com> wrote:

> This was raised earlier but I don't believe it was ever resolved and it is
> becoming a serious issue for me.
>
>
> I'm doing rolling upgrades on our three node cluster (Replica 3, Sharded,
> VM Workload).
>
>
> I update one node, reboot it, wait for healing to complete, do the next
> one.
>
>
> Only the heal count does not change, it just does not seem to start. It
> can take hours before it shifts, but once it does, it's quite rapid. Node 1
> has restarted and the heal count has been static at 511 shards for 45
> minutes now. Nodes 1 & 2 have low CPU load, node 3 has glusterfsd pegged at
> 800% CPU.
>
>
> This was *not* the case in earlier versions of gluster (3.7.11 I think),
> healing would start almost right away. I think it started doing this when
> the afr locking improvements were made.
>
>
> I have experimented with full & diff heal modes, doesn't make any
> difference.
>
> Current:
>
> Gluster Version 3.8.4
>
> Volume Name: datastore4
> Type: Replicate
> Volume ID: 0ba131ef-311d-4bb1-be46-596e83b2f6ce
> Status: Started
> Snapshot Count: 0
> Number of Bricks: 1 x 3 = 3
> Transport-type: tcp
> Bricks:
> Brick1: vnb.proxmox.softlog:/tank/vmdata/datastore4
> Brick2: vng.proxmox.softlog:/tank/vmdata/datastore4
> Brick3: vna.proxmox.softlog:/tank/vmdata/datastore4
> Options Reconfigured:
> cluster.self-heal-window-size: 1024
> cluster.locking-scheme: granular
> cluster.granular-entry-heal: on
> performance.readdir-ahead: on
> cluster.data-self-heal: on
> features.shard: on
> cluster.quorum-type: auto
> cluster.server-quorum-type: server
> nfs.disable: on
> nfs.addr-namelookup: off
> nfs.enable-ino32: off
> performance.strict-write-ordering: off
> performance.stat-prefetch: on
> performance.quick-read: off
> performance.read-ahead: off
> performance.io-cache: off
> cluster.eager-lock: enable
> network.remote-dio: enable
> features.shard-block-size: 64MB
> cluster.background-self-heal-count: 16
>
>
> Thanks,
>
>
>
>
>
> --
> Lindsay Mathieson
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] File Size and Brick Size

2016-09-27 Thread Krutika Dhananjay
Worked fine for me actually.

# md5sum lastlog
ab7557d582484a068c3478e342069326  lastlog
# rsync -avH lastlog  /mnt/
sending incremental file list
lastlog

sent 364,001,522 bytes  received 35 bytes  48,533,540.93 bytes/sec
total size is 363,912,592  speedup is 1.00
# cd /mnt
# md5sum lastlog
ab7557d582484a068c3478e342069326  lastlog

-Krutika


On Wed, Sep 28, 2016 at 8:21 AM, Krutika Dhananjay <kdhan...@redhat.com>
wrote:

> Hi,
>
> What version of gluster are you using?
> Also, could you share your volume configuration (`gluster volume info`)?
>
> -Krutika
>
> On Wed, Sep 28, 2016 at 6:58 AM, Ravishankar N <ravishan...@redhat.com>
> wrote:
>
>> On 09/28/2016 12:16 AM, ML Wong wrote:
>>
>> Hello Ravishankar,
>> Thanks for introducing the sharding feature to me.
>> It does seem to resolve the problem I was encountering earlier. But I
>> have one question: do we expect the checksum of the file to be different if I
>> copy from directory A to a shard-enabled volume?
>>
>>
>> No the checksums must match. Perhaps Krutika who works on Sharding
>> (CC'ed) can help you figure out why that isn't the case here.
>> -Ravi
>>
>>
>> [x@ip-172-31-1-72 ~]$ sudo sha1sum /var/tmp/oVirt-Live-4.0.4.iso
>> ea8472f6408163fa9a315d878c651a519fc3f438  /var/tmp/oVirt-Live-4.0.4.iso
>> [x@ip-172-31-1-72 ~]$ sudo rsync -avH /var/tmp/oVirt-Live-4.0.4.iso
>> /mnt/
>> sending incremental file list
>> oVirt-Live-4.0.4.iso
>>
>> sent 1373802342 bytes  received 31 bytes  30871963.44 bytes/sec
>> total size is 1373634560  speedup is 1.00
>> [x@ip-172-31-1-72 ~]$ sudo sha1sum /mnt/oVirt-Live-4.0.4.iso
>> 14e9064857b40face90c91750d79c4d8665b9cab  /mnt/oVirt-Live-4.0.4.iso
>>
>> On Mon, Sep 26, 2016 at 6:42 PM, Ravishankar N <ravishan...@redhat.com>
>> wrote:
>>
>>> On 09/27/2016 05:15 AM, ML Wong wrote:
>>>
>>> Has anyone on the list tried copying a file which is bigger than
>>> the individual brick/replica size?
>>> Test Scenario:
>>> Distributed-Replicated volume, 2GB size, 2x2 = 4 bricks, 2 replicas
>>> Each replica has 1GB
>>>
>>> When I tried to copy a file to this volume, via both FUSE and NFS mounts,
>>> I get an I/O error.
>>> Filesystem  Size  Used Avail Use% Mounted on
>>> /dev/mapper/vg0-brick1 1017M   33M  985M   4% /data/brick1
>>> /dev/mapper/vg0-brick2 1017M  109M  909M  11% /data/brick2
>>> lbre-cloud-dev1:/sharevol1  2.0G  141M  1.9G   7% /sharevol1
>>>
>>> [xx@cloud-dev1 ~]$ du -sh /var/tmp/ovirt-live-el7-3.6.2.iso
>>> 1.3G /var/tmp/ovirt-live-el7-3.6.2.iso
>>>
>>> [melvinw@lbre-cloud-dev1 ~]$ sudo cp /var/tmp/ovirt-live-el7-3.6.2.iso
>>> /sharevol1/
>>> cp: error writing ‘/sharevol1/ovirt-live-el7-3.6.2.iso’: Input/output
>>> error
>>> cp: failed to extend ‘/sharevol1/ovirt-live-el7-3.6.2.iso’:
>>> Input/output error
>>> cp: failed to close ‘/sharevol1/ovirt-live-el7-3.6.2.iso’: Input/output
>>> error
>>>
>>>
>>> Does the mount log give you more information? If it was a disk full
>>> issue, the error you would get is ENOSPC and not EIO. This looks like
>>> something else.
>>>
>>>
>>> I know we have experts on this mailing list. And, I assume, this is a
>>> common situation that many Gluster users may have encountered. The worry
>>> I have is: what if you have a big VM file sitting on top of a Gluster volume ...?
>>>
>>> It is recommended to use sharding (http://blog.gluster.org/2015/
>>> 12/introducing-shard-translator/) for VM workloads to alleviate these
>>> kinds of issues.
>>> -Ravi
>>>
>>> Any insights will be much appreciated.
>>>
>>>
>>>
>>> ___
>>> Gluster-users mailing 
>>> listGluster-users@gluster.orghttp://www.gluster.org/mailman/listinfo/gluster-users
>>>
>>>
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] File Size and Brick Size

2016-09-27 Thread Krutika Dhananjay
Hi,

What version of gluster are you using?
Also, could you share your volume configuration (`gluster volume info`)?

-Krutika

On Wed, Sep 28, 2016 at 6:58 AM, Ravishankar N 
wrote:

> On 09/28/2016 12:16 AM, ML Wong wrote:
>
> Hello Ravishankar,
> Thanks for introducing the sharding feature to me.
> It does seem to resolve the problem I was encountering earlier. But I
> have one question: do we expect the checksum of the file to be different if I
> copy from directory A to a shard-enabled volume?
>
>
> No the checksums must match. Perhaps Krutika who works on Sharding (CC'ed)
> can help you figure out why that isn't the case here.
> -Ravi
>
>
> [x@ip-172-31-1-72 ~]$ sudo sha1sum /var/tmp/oVirt-Live-4.0.4.iso
> ea8472f6408163fa9a315d878c651a519fc3f438  /var/tmp/oVirt-Live-4.0.4.iso
> [x@ip-172-31-1-72 ~]$ sudo rsync -avH /var/tmp/oVirt-Live-4.0.4.iso
> /mnt/
> sending incremental file list
> oVirt-Live-4.0.4.iso
>
> sent 1373802342 bytes  received 31 bytes  30871963.44 bytes/sec
> total size is 1373634560  speedup is 1.00
> [x@ip-172-31-1-72 ~]$ sudo sha1sum /mnt/oVirt-Live-4.0.4.iso
> 14e9064857b40face90c91750d79c4d8665b9cab  /mnt/oVirt-Live-4.0.4.iso
>
> On Mon, Sep 26, 2016 at 6:42 PM, Ravishankar N 
> wrote:
>
>> On 09/27/2016 05:15 AM, ML Wong wrote:
>>
>> Has anyone on the list tried copying a file which is bigger than
>> the individual brick/replica size?
>> Test Scenario:
>> Distributed-Replicated volume, 2GB size, 2x2 = 4 bricks, 2 replicas
>> Each replica has 1GB
>>
>> When I tried to copy a file to this volume, via both FUSE and NFS mounts,
>> I get an I/O error.
>> Filesystem  Size  Used Avail Use% Mounted on
>> /dev/mapper/vg0-brick1 1017M   33M  985M   4% /data/brick1
>> /dev/mapper/vg0-brick2 1017M  109M  909M  11% /data/brick2
>> lbre-cloud-dev1:/sharevol1  2.0G  141M  1.9G   7% /sharevol1
>>
>> [xx@cloud-dev1 ~]$ du -sh /var/tmp/ovirt-live-el7-3.6.2.iso
>> 1.3G /var/tmp/ovirt-live-el7-3.6.2.iso
>>
>> [melvinw@lbre-cloud-dev1 ~]$ sudo cp /var/tmp/ovirt-live-el7-3.6.2.iso
>> /sharevol1/
>> cp: error writing ‘/sharevol1/ovirt-live-el7-3.6.2.iso’: Input/output
>> error
>> cp: failed to extend ‘/sharevol1/ovirt-live-el7-3.6.2.iso’: Input/output
>> error
>> cp: failed to close ‘/sharevol1/ovirt-live-el7-3.6.2.iso’: Input/output
>> error
>>
>>
>> Does the mount log give you more information? If it was a disk full
>> issue, the error you would get is ENOSPC and not EIO. This looks like
>> something else.
>>
>>
>> I know we have experts on this mailing list. And, I assume, this is a
>> common situation that many Gluster users may have encountered. The worry
>> I have is: what if you have a big VM file sitting on top of a Gluster volume ...?
>>
>> It is recommended to use sharding (http://blog.gluster.org/2015/
>> 12/introducing-shard-translator/) for VM workloads to alleviate these
>> kinds of issues.
>> -Ravi
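
For completeness, enabling sharding on a volume amounts to the following
(volume name is a placeholder; note the warnings elsewhere in this archive
about never disabling it again once sharded data exists):

# enable the shard translator and pick a shard size before writing VM images
gluster volume set myvol features.shard on
gluster volume set myvol features.shard-block-size 64MB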
>>
>> Any insights will be much appreciated.
>>
>>
>>
>> ___
>> Gluster-users mailing 
>> listGluster-users@gluster.orghttp://www.gluster.org/mailman/listinfo/gluster-users
>>
>>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow

2016-09-06 Thread Krutika Dhananjay
On Tue, Sep 6, 2016 at 7:27 PM, David Gossage <dgoss...@carouselchecks.com>
wrote:

> Going to top post with solution Krutika Dhananjay came up with.  His steps
> were much less volatile and could be done with volume still being actively
> used and also much less prone to accidental destruction.
>
> My use case and issue were a desire to wipe a brick and recreate it with the
> same directory structure, so as to change the underlying RAID setup of the
> disks making up the brick. The problem was that getting the shards to heal
> was failing 99% of the time.
>
>
Hi,

Thank you for posting this before I could get around to it. Also thanks to
Pranith for suggesting the additional precautionary 'trusted.afr.dirty'
step (step 4 below) and reviewing the steps once.

IIUC the newly-introduced reset-brick command serves as an alternative to
all this lengthy process listed below.

@Pranith,
Is the above statement correct? If so, do we know which releases will have
the reset-brick command/feature?
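
For readers landing here later: the reset-brick workflow referred to above, in
releases that ship it, looks roughly like the sketch below; treat the exact
syntax as an assumption to verify against your release's documentation:

# take the brick offline, do the disk maintenance, then bring it back and heal
gluster volume reset-brick myvol server1:/bricks/myvol start
# ... recreate the filesystem / empty brick directory on server1 ...
gluster volume reset-brick myvol server1:/bricks/myvol server1:/bricks/myvol commit force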



> These are steps he provided that has been working well.
>

Err.. she. :)

-Krutika


>
> 1) kill the brick pid on the server whose brick you want to replace
> kill -15 <brick pid>
>
> 2) do brick maintenance which in my case was:
> zpool destroy <pool>
> zpool create (options) yada yada disks
>
> 3) make sure original path to brick exists
> mkdir /path/to/brick
>
> 4) set extended attribute on new brick path (not over gluster mount)
> setfattr -n trusted.afr.dirty -v 0x0001 /path/to/brick
>
> 5) create a mount point to volume
> mkdir /mnt-brick-test
> glusterfs --volfile-id=VOLNAME --volfile-server=<any active gluster server> --client-pid=-6 /mnt-brick-test
>
> 6) set an extended attribute on the gluster network mount. VOLNAME is the
> gluster volume; KILLEDBRICK# is the index of the brick needing heal. Indices
> start from 0, and gluster v info should display them in order
> setfattr -n trusted.replace-brick -v VOLNAME-client-KILLEDBRICK#
> /mnt-brick-test
>
> 7) gluster heal should now show the / root of the gluster volume in its output
> gluster v heal VOLNAME info
>
> 8) force start volume to bring up killed brick
> gluster v start VOLNAME force
>
> 9) optionally watch heal progress and drink beer while you wait and hope
> nothing blows up
> watch -n 10 gluster v heal VOLNAME statistics heal-count
>
> 10) unmount gluster network mount from server
> umount /mnt-brick-test
>
> 11) Praise the developers for their efforts
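
A possible final check once the heal-count from step 9 reaches zero, using the
VOLNAME convention from step 6 above:

# confirm nothing is left pending and no split-brain was introduced
gluster volume heal VOLNAME info
gluster volume heal VOLNAME info split-brain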
>
> *David Gossage*
> *Carousel Checks Inc. | System Administrator*
> *Office* 708.613.2284
>
> On Thu, Sep 1, 2016 at 2:29 PM, David Gossage <dgoss...@carouselchecks.com
> > wrote:
>
>> On Thu, Sep 1, 2016 at 12:09 AM, Krutika Dhananjay <kdhan...@redhat.com>
>> wrote:
>>
>>>
>>>
>>> On Wed, Aug 31, 2016 at 8:13 PM, David Gossage <
>>> dgoss...@carouselchecks.com> wrote:
>>>
>>>> Just as a test I did not shut down the one VM on the cluster as finding
>>>> a window before weekend where I can shut down all VM's and fit in a full
>>>> heal is unlikely so wanted to see what occurs.
>>>>
>>>>
>>>> kill -15 brick pid
>>>> rm -Rf /gluster2/brick1/1
>>>> mkdir /gluster2/brick1/1
>>>> mkdir /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard
>>>> /fake3
>>>> setfattr -n "user.some-name" -v "some-value"
>>>> /rhev/data-center/mnt/glusterSD/192.168.71.10\:_glustershard
>>>>
>>>> getfattr -d -m . -e hex /gluster2/brick2/1
>>>> # file: gluster2/brick2/1
>>>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>>>> 23a756e6c6162656c65645f743a733000
>>>> trusted.afr.dirty=0x0001
>>>> trusted.afr.glustershard-client-0=0x0002
>>>>
>>>
>>> This is unusual. The last digit ought to have been 1 on account of
>>> "fake3" being created while hte first brick is offline.
>>>
>>> This discussion is becoming unnecessary lengthy. Mind if we discuss this
>>> and sort it out on IRC today, at least the communication will be continuous
>>> and in real-time. I'm kdhananjay on #gluster (Freenode). Ping me when
>>> you're online.
>>>
>>> -Krutika
>>>
>>
>> Thanks for assistance this morning.  Looks like I lost connection in IRC
>> and didn't realize it so sorry if you came back looking for me.  Let me
>> know when the steps you worked out have been reviewed and if it's found
>> safe for production use and I'll give a try.
>>
>>
>>
>>&

Re: [Gluster-users] Tiering and sharding for VM workload

2016-09-06 Thread Krutika Dhananjay
Theoretically, what you said is correct (at least from shard's perspective).
Adding Rafi, who has worked on tiering, to see if he thinks otherwise.

It must be mentioned that sharding + tiering hasn't been tested as such
until now, at least by us.

Did you try it? If so, what was your experience?

-Krutika

On Tue, Sep 6, 2016 at 5:59 PM, Gandalf Corvotempesta <
gandalf.corvotempe...@gmail.com> wrote:

> Anybody?
>
> Il 05 set 2016 22:19, "Gandalf Corvotempesta" <
> gandalf.corvotempe...@gmail.com> ha scritto:
>
>> Is tiering with sharding useful for a VM workload?
>> Let's assume a storage volume with tiering and sharding enabled, used for
>> hosting VM images.
>> Each shard is subject to tiering, thus the most frequently accessed parts of
>> the VM would be cached on the SSD, allowing better performance.
>>
>> Is this correct?
>>
>> To put it simple, very simple, let's assume a webserver VM, with the
>> following directory structure:
>>
>> /home/user1/public_html
>> /home/user2/public_html
>>
>> both are stored on 2 different shards (I'm simplifying).
>> /home/user1/public_html gets many more visits than user2's.
>>
>> Would that shard be cached on the hot tier, allowing faster access by the
>> webserver?
>>
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] [URGENT] Add-bricks to a volume corrupted the files

2016-09-05 Thread Krutika Dhananjay
Could you please attach the glusterfs client and brick logs?
Also provide output of `gluster volume info`.
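
On a typical installation those logs live under /var/log/glusterfs/; the exact
file names depend on the mount point and brick path, so treat these as
illustrative:

# FUSE client log: named after the mount point with '/' replaced by '-',
# e.g. a volume mounted at /mnt/vms logs to /var/log/glusterfs/mnt-vms.log
ls /var/log/glusterfs/*.log

# brick logs: one per brick, under the bricks/ subdirectory
ls /var/log/glusterfs/bricks/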

-Krutika

On Tue, Sep 6, 2016 at 4:29 AM, Kevin Lemonnier 
wrote:

> >- What was the original (and current) geometry? (status and info)
>
> It was a 1x3 that I was trying to bump to 2x3.
>
> >- what parameters did you use when adding the bricks?
> >
>
> Just a simple add-brick node1:/path node2:/path node3:/path
> Then a fix-layout when everything started going wrong.
>
>
> I was able to salvage some VMs by stopping them then starting them again,
> but most won't start for various reasons (disk corrupted, grub not found
> ...).
> For those we are deleting the disks then importing them from backups,
> that's
> a huge loss but everything has been down for so long, no choice ..
>
> >On 6/09/2016 8:00 AM, Kevin Lemonnier wrote:
> >
> >  I tried a fix-layout, and since that didn't work I removed the brick
> (start then commit when it showed
> >  completed). Not better, the volume is now running on the 3 original
> bricks (replica 3) but the VMs
> >  are still corrupted. I have 880 Mb of shards on the bricks I removed
> for some reason, those shards do exist
> >  (and are bigger) on the "live" volume. I don't understand why now that
> I have removed the new bricks
> >  everything isn't working like before ..
> >
> >  On Mon, Sep 05, 2016 at 11:06:16PM +0200, Kevin Lemonnier wrote:
> >
> >  Hi,
> >
> >  I just added 3 bricks to a volume and all the VMs are doing I/O errors
> now.
> >  I rebooted a VM to see and it can't start again, am I missing something
> ? Is the rebalance required
> >  to make everything run ?
> >
> >  That's urgent, thanks.
> >
> >  --
> >  Kevin Lemonnier
> >  PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
> >
> >
> >
> >
> >  ___
> >  Gluster-users mailing list
> >  Gluster-users@gluster.org
> >  http://www.gluster.org/mailman/listinfo/gluster-users
> >
> >
> >
> >  ___
> >  Gluster-users mailing list
> >  Gluster-users@gluster.org
> >  http://www.gluster.org/mailman/listinfo/gluster-users
> >
> >  --
> >  Lindsay Mathieson
>
> > ___
> > Gluster-users mailing list
> > Gluster-users@gluster.org
> > http://www.gluster.org/mailman/listinfo/gluster-users
>
>
> --
> Kevin Lemonnier
> PGP Fingerprint : 89A5 2283 04A0 E6E9 0111
>
> ___
> Gluster-users mailing list
> Gluster-users@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-users
>
___
Gluster-users mailing list
Gluster-users@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-users

Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow

2016-08-31 Thread Krutika Dhananjay
9332e50ba441e8fa5cce3ae6f3a15
> user.some-name=0x736f6d652d76616c7565
>
> getfattr -d -m . -e hex /gluster2/brick1/1/
> getfattr: Removing leading '/' from absolute path names
> # file: gluster2/brick1/1/
> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
> 23a756e6c6162656c65645f743a733000
> trusted.gfid=0x0001
> trusted.glusterfs.dht=0x0001
> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
> user.some-name=0x736f6d652d76616c7565
>
> heal count stayed same for awhile then ran
>
> gluster v heal glustershard full
>
> heals jump up to 700 as shards actually get read in as needing heals.
>  glustershd shows 3 sweeps started one per brick
>
> It heals the shards and things look ok; heal <> info shows 0 files, but statistics
> heal-info shows 1 left for bricks 2 and 3. Perhaps because I didn't stop the VM
> that was running?
>
> # file: gluster2/brick1/1/
> security.selinux=0x756e636f6e66696e65645f753a6f
> 626a6563745f723a756e6c6162656c65645f743a733000
> trusted.gfid=0x0001
> trusted.glusterfs.dht=0x0001
> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
> user.some-name=0x736f6d652d76616c7565
>
> # file: gluster2/brick2/1/
> security.selinux=0x756e636f6e66696e65645f753a6f
> 626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x
> trusted.afr.glustershard-client-0=0x0001
> trusted.afr.glustershard-client-2=0x
> trusted.gfid=0x0001
> trusted.glusterfs.dht=0x0001
> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
> user.some-name=0x736f6d652d76616c7565
>
> # file: gluster2/brick3/1/
> security.selinux=0x756e636f6e66696e65645f753a6f
> 626a6563745f723a756e6c6162656c65645f743a733000
> trusted.afr.dirty=0x
> trusted.afr.glustershard-client-0=0x0001
> trusted.gfid=0x0001
> trusted.glusterfs.dht=0x0001
> trusted.glusterfs.volume-id=0x5889332e50ba441e8fa5cce3ae6f3a15
> user.some-name=0x736f6d652d76616c7565
>
> meta-data split-brain?  heal <> info split-brain shows no files or
> entries.  If I had thought ahead I would have checked the values returned
> by getfattr before, although I do know heal-count was returning 0 at the
> time
>
>
> Assuming I need to shut down the VMs and put the volume in maintenance from
> oVirt to prevent any IO: does that need to last for the whole heal, or can I
> re-activate at some point to bring the VMs back up?
>
>
>
>
> *David Gossage*
> *Carousel Checks Inc. | System Administrator*
> *Office* 708.613.2284
>
> On Wed, Aug 31, 2016 at 3:50 AM, Krutika Dhananjay <kdhan...@redhat.com>
> wrote:
>
>> No, sorry, it's working fine. I may have missed some step because of
>> which i saw that problem. /.shard is also healing fine now.
>>
>> Let me know if it works for you.
>>
>> -Krutika
>>
>> On Wed, Aug 31, 2016 at 12:49 PM, Krutika Dhananjay <kdhan...@redhat.com>
>> wrote:
>>
>>> OK I just hit the other issue too, where .shard doesn't get healed. :)
>>>
>>> Investigating as to why that is the case. Give me some time.
>>>
>>> -Krutika
>>>
>>> On Wed, Aug 31, 2016 at 12:39 PM, Krutika Dhananjay <kdhan...@redhat.com
>>> > wrote:
>>>
>>>> Just figured the steps Anuradha has provided won't work if granular
>>>> entry heal is on.
>>>> So when you bring down a brick and create fake2 under / of the volume,
>>>> granular entry heal feature causes
>>>> sh to remember only the fact that 'fake2' needs to be recreated on the
>>>> offline brick (because changelogs are granular).
>>>>
>>>> In this case, we would be required to indicate to self-heal-daemon that
>>>> the entire directory tree from '/' needs to be repaired on the brick that
>>>> contains no data.
>>>>
>>>> To fix this, I did the following (for users who use granular entry
>>>> self-healing):
>>>>
>>>> 1. Kill the last brick process in the replica (/bricks/3)
>>>>
>>>> 2. [root@server-3 ~]# rm -rf /bricks/3
>>>>
>>>> 3. [root@server-3 ~]# mkdir /bricks/3
>>>>
>>>> 4. Create a new dir on the mount point:
>>>> [root@client-1 ~]# mkdir /mnt/fake
>>>>
>>>> 5. Set some fake xattr on the root of the volume, and not the 'fake'

Re: [Gluster-users] 3.8.3 Shards Healing Glacier Slow

2016-08-31 Thread Krutika Dhananjay
No, sorry, it's working fine. I may have missed some step because of which
i saw that problem. /.shard is also healing fine now.

Let me know if it works for you.

-Krutika

On Wed, Aug 31, 2016 at 12:49 PM, Krutika Dhananjay <kdhan...@redhat.com>
wrote:

> OK I just hit the other issue too, where .shard doesn't get healed. :)
>
> Investigating as to why that is the case. Give me some time.
>
> -Krutika
>
> On Wed, Aug 31, 2016 at 12:39 PM, Krutika Dhananjay <kdhan...@redhat.com>
> wrote:
>
>> Just figured the steps Anuradha has provided won't work if granular entry
>> heal is on.
>> So when you bring down a brick and create fake2 under / of the volume,
>> granular entry heal feature causes
>> sh to remember only the fact that 'fake2' needs to be recreated on the
>> offline brick (because changelogs are granular).
>>
>> In this case, we would be required to indicate to self-heal-daemon that
>> the entire directory tree from '/' needs to be repaired on the brick that
>> contains no data.
>>
>> To fix this, I did the following (for users who use granular entry
>> self-healing):
>>
>> 1. Kill the last brick process in the replica (/bricks/3)
>>
>> 2. [root@server-3 ~]# rm -rf /bricks/3
>>
>> 3. [root@server-3 ~]# mkdir /bricks/3
>>
>> 4. Create a new dir on the mount point:
>> [root@client-1 ~]# mkdir /mnt/fake
>>
>> 5. Set some fake xattr on the root of the volume, and not the 'fake'
>> directory itself.
>> [root@client-1 ~]# setfattr -n "user.some-name" -v "some-value" /mnt
>>
>> 6. Make sure there's no io happening on your volume.
>>
>> 7. Check the pending xattrs on the brick directories of the two good
>> copies (on bricks 1 and 2), you should be seeing same values as the one
>> marked in red in both bricks.
>> (note that the client- xattr key will have the same last digit as
>> the index of the brick that is down, when counting from 0. So if the first
>> brick is the one that is down, it would read trusted.afr.*-client-0; if the
>> second brick is the one that is empty and down, it would read
>> trusted.afr.*-client-1 and so on).
>>
>> [root@server-1 ~]# getfattr -d -m . -e hex /bricks/1
>> # file: 1
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>> 23a6574635f72756e74696d655f743a733000
>> trusted.afr.dirty=0x
>> *trusted.afr.rep-client-2=0x00010001*
>> trusted.gfid=0x0001
>> trusted.glusterfs.dht=0x0001
>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>
>> [root@server-2 ~]# getfattr -d -m . -e hex /bricks/2
>> # file: 2
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>> 23a6574635f72756e74696d655f743a733000
>> trusted.afr.dirty=0x
>> *trusted.afr.rep-client-2=0x000**10001*
>> trusted.gfid=0x0001
>> trusted.glusterfs.dht=0x0001
>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>
>> 8. Flip the 8th digit in the trusted.afr.-client-2 to a 1.
>>
>> [root@server-1 ~]# setfattr -n trusted.afr.rep-client-2 -v
>> *0x000100010001* /bricks/1
>> [root@server-2 ~]# setfattr -n trusted.afr.rep-client-2 -v
>> *0x000100010001* /bricks/2
>>
>> 9. Get the xattrs again and check the xattrs are set properly now
>>
>> [root@server-1 ~]# getfattr -d -m . -e hex /bricks/1
>> # file: 1
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>> 23a6574635f72756e74696d655f743a733000
>> trusted.afr.dirty=0x
>> *trusted.afr.rep-client-2=0x000**100010001*
>> trusted.gfid=0x0001
>> trusted.glusterfs.dht=0x0001
>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>
>> [root@server-2 ~]# getfattr -d -m . -e hex /bricks/2
>> # file: 2
>> security.selinux=0x756e636f6e66696e65645f753a6f626a6563745f7
>> 23a6574635f72756e74696d655f743a733000
>> trusted.afr.dirty=0x
>> *trusted.afr.rep-client-2=0x000**100010001*
>> trusted.gfid=0x0001
>> trusted.glusterfs.dht=0x0001
>> trusted.glusterfs.volume-id=0xa349517bb9d44bdf96da8ea324f89e7b
>>
>> 10. Force-start the volume.
>>
>> [root@server-1 ~]# gluster volume start rep force
>> volume start: rep: success
>
