Re: [Gluster-devel] Possible race condition bug with tiered volume

2016-10-20 Thread Dan Lambright
Dustin,

Your python code looks fine to me... I've been in Ceph C++ weeds lately, I 
kinda miss python ;)

If I run back-to-back smallfile "create" operations, then on the second 
run, I consistently see:

0.00% of requested files processed, minimum is  70.00
at least one thread encountered error, test may be incomplete

Is this what you get? We can follow up off the mailing list.

Dan

glusterfs 3.7.15 built on Oct 20 2016, with two clients running smallfile 
against a tiered volume (hot tier on ram disks, cold tier on JBOD disks; 
volume info copied below) on Fedora 23.

./smallfile_cli.py  --top /mnt/p66p67 --host-set gprfc066,gprfc067 --threads 8 
--files 5000 --file-size 64 --record-size 64 --fsync N --operation read

Volume info - 

Status: Started
Number of Bricks: 28
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 2 x 2 = 4
Brick1: gprfs020:/home/ram 
Brick2: gprfs019:/home/ram 
Brick3: gprfs018:/home/ram 
Brick4: gprfs017:/home/ram 
Cold Tier:
Cold Tier Type : Distributed-Disperse
Number of Bricks: 2 x (8 + 4) = 24
Brick5: gprfs017:/t0
Brick6: gprfs018:/t0
Brick7: gprfs019:/t0
Brick8: gprfs020:/t0
Brick9: gprfs017:/t1
Brick10: gprfs018:/t1
Brick11: gprfs019:/t1
Brick12: gprfs020:/t1
Brick13: gprfs017:/t2
Brick14: gprfs018:/t2
Brick15: gprfs019:/t2
Brick16: gprfs020:/t2
Brick17: gprfs017:/t3
Brick18: gprfs018:/t3
Brick19: gprfs019:/t3
Brick20: gprfs020:/t3
Brick21: gprfs017:/t4
Brick22: gprfs018:/t4
Brick23: gprfs019:/t4
Brick24: gprfs020:/t4
Brick25: gprfs017:/t5
Brick26: gprfs018:/t5
Brick27: gprfs019:/t5
Brick28: gprfs020:/t5
Options Reconfigured:
cluster.tier-mode: cache   
features.ctr-enabled: on   
performance.readdir-ahead: on


----- Original Message -----
> From: "Dustin Black" 
> To: "Dan Lambright" 
> Cc: "Milind Changire" , "Annette Clewett" 
> , gluster-devel@gluster.org
> Sent: Wednesday, October 19, 2016 3:23:04 PM
> Subject: Re: [Gluster-devel] Possible race condition bug with tiered volume
> 
> # gluster --version
> glusterfs 3.7.9 built on Jun 10 2016 06:32:42
> 
> 
> Try not to make fun of my python, but I was able to make a small
> modification to the sync_files.py script from smallfile and at least
> enable my team to move on with testing. It's terribly hacky and ugly, but
> works around the problem, which I am pretty convinced is a Gluster bug at
> this point.
> 
> 
> # diff bin/sync_files.py.orig bin/sync_files.py
> 6a7,8
> > import errno
> > import binascii
> 27c29,40
> <     shutil.rmtree(master_invoke.network_dir)
> ---
> >     try:
> >         shutil.rmtree(master_invoke.network_dir)
> >     except OSError as e:
> >         err = e.errno
> >         if err != errno.EEXIST:
> >             # workaround for possible bug in Gluster
> >             if err != errno.ENOTEMPTY:
> >                 raise e
> >             else:
> >                 print('saw ENOTEMPTY on stonewall, moving shared directory')
> >                 ext = str(binascii.b2a_hex(os.urandom(15)))
> >                 shutil.move(master_invoke.network_dir, master_invoke.network_dir + ext)
> 
> 
> Dustin Black, RHCA
> Senior Architect, Software-Defined Storage
> Red Hat, Inc.
> (o) +1.212.510.4138  (m) +1.215.821.7423
> dus...@redhat.com
> 
> 
> On Tue, Oct 18, 2016 at 7:09 PM, Dustin Black  wrote:
> 
> > Dang. I always think I get all the detail and inevitably leave out
> > something important. :-/
> >
> > I'm mobile and don't have the exact version in front of me, but this is
> > recent if not latest RHGS on RHEL 7.2.
> >
> >
> > On Oct 18, 2016 7:04 PM, "Dan Lambright"  wrote:
> >
> >> Dustin,
> >>
> >> What level code ? I often run smallfile on upstream code with tiered
> >> volumes and have not seen this.
> >>
> >> Sure, one of us will get back to you.
> >>
> >> Unfortunately, gluster has a lot of protocol overhead (LOOKUPs), and they
> >> overwhelm the boost in transfer speeds you get for small files. A
> >> presentation at the Berlin gluster summit evaluated this.  The expectation
> >> is md-cache will go a long way towards helping that, before too long.
> >>
> >> Dan
> >>
> >>
> >>
> >> ----- Original Message -----
> >> > From: "Dustin Black" 
> >> > To: gluster-devel@gluster.org
> >> > Cc: "Annette Clewett" 
> >> > Sent: Tuesday, October 18, 2016 4:30:04 PM
> >> > Subject: [Gluster-devel] Possible race condition bug with tiered volume

Re: [Gluster-devel] Possible race condition bug with tiered volume

2016-10-19 Thread Dustin Black
# gluster --version
glusterfs 3.7.9 built on Jun 10 2016 06:32:42


Try not to make fun of my python, but I was able to make a small
modification to the sync_files.py script from smallfile and at least
enable my team to move on with testing. It's terribly hacky and ugly, but
works around the problem, which I am pretty convinced is a Gluster bug at
this point.


# diff bin/sync_files.py.orig bin/sync_files.py
6a7,8
> import errno
> import binascii
27c29,40
<     shutil.rmtree(master_invoke.network_dir)
---
>     try:
>         shutil.rmtree(master_invoke.network_dir)
>     except OSError as e:
>         err = e.errno
>         if err != errno.EEXIST:
>             # workaround for possible bug in Gluster
>             if err != errno.ENOTEMPTY:
>                 raise e
>             else:
>                 print('saw ENOTEMPTY on stonewall, moving shared directory')
>                 ext = str(binascii.b2a_hex(os.urandom(15)))
>                 shutil.move(master_invoke.network_dir, master_invoke.network_dir + ext)
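Pulled out of the diff, the workaround amounts to: if rmtree dies with ENOTEMPTY even though the directory looks empty from the client, rename the directory aside under a random suffix and carry on. A standalone sketch of that pattern (the `remove_or_move_aside` name and the scratch paths are mine for illustration, not part of smallfile):

```python
import binascii
import errno
import os
import shutil
import tempfile


def remove_or_move_aside(path):
    """Remove a directory tree; if the filesystem spuriously reports
    ENOTEMPTY (as observed here on tiered Gluster volumes), rename the
    directory out of the way so a fresh one can be created."""
    try:
        shutil.rmtree(path)
    except OSError as e:
        if e.errno != errno.ENOTEMPTY:
            raise
        # workaround: park the stale directory under a random suffix
        ext = binascii.b2a_hex(os.urandom(15)).decode()
        shutil.move(path, path + ext)


# usage: clear and recreate a shared sync directory
top = tempfile.mkdtemp()
sync_dir = os.path.join(top, 'smf1')
os.makedirs(sync_dir)
open(os.path.join(sync_dir, 'starting_gate'), 'w').close()
remove_or_move_aside(sync_dir)     # plain rmtree succeeds here
os.makedirs(sync_dir)              # directory can be recreated cleanly
```

On a healthy filesystem the rmtree branch always wins; the move branch only fires when the mount reports ENOTEMPTY for a directory whose entries the client cannot see.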


Dustin Black, RHCA
Senior Architect, Software-Defined Storage
Red Hat, Inc.
(o) +1.212.510.4138  (m) +1.215.821.7423
dus...@redhat.com


On Tue, Oct 18, 2016 at 7:09 PM, Dustin Black  wrote:

> Dang. I always think I get all the detail and inevitably leave out
> something important. :-/
>
> I'm mobile and don't have the exact version in front of me, but this is
> recent if not latest RHGS on RHEL 7.2.
>
>
> On Oct 18, 2016 7:04 PM, "Dan Lambright"  wrote:
>
>> Dustin,
>>
>> What level code ? I often run smallfile on upstream code with tiered
>> volumes and have not seen this.
>>
>> Sure, one of us will get back to you.
>>
>> Unfortunately, gluster has a lot of protocol overhead (LOOKUPs), and they
>> overwhelm the boost in transfer speeds you get for small files. A
>> presentation at the Berlin gluster summit evaluated this.  The expectation
>> is md-cache will go a long way towards helping that, before too long.
>>
>> Dan
>>
>>
>>
>> ----- Original Message -----
>> > From: "Dustin Black" 
>> > To: gluster-devel@gluster.org
>> > Cc: "Annette Clewett" 
>> > Sent: Tuesday, October 18, 2016 4:30:04 PM
>> > Subject: [Gluster-devel] Possible race condition bug with tiered volume
>> >
>> > I have a 3x2 hot tier on NVMe drives with a 3x2 cold tier on RAID6
>> drives.
>> >
>> > # gluster vol info 1nvme-distrep3x2
>> > Volume Name: 1nvme-distrep3x2
>> > Type: Tier
>> > Volume ID: 21e3fc14-c35c-40c5-8e46-c258c1302607
>> > Status: Started
>> > Number of Bricks: 12
>> > Transport-type: tcp
>> > Hot Tier :
>> > Hot Tier Type : Distributed-Replicate
>> > Number of Bricks: 3 x 2 = 6
>> > Brick1: n5:/rhgs/hotbricks/1nvme-distrep3x2-hot
>> > Brick2: n4:/rhgs/hotbricks/1nvme-distrep3x2-hot
>> > Brick3: n3:/rhgs/hotbricks/1nvme-distrep3x2-hot
>> > Brick4: n2:/rhgs/hotbricks/1nvme-distrep3x2-hot
>> > Brick5: n1:/rhgs/hotbricks/1nvme-distrep3x2-hot
>> > Brick6: n0:/rhgs/hotbricks/1nvme-distrep3x2-hot
>> > Cold Tier:
>> > Cold Tier Type : Distributed-Replicate
>> > Number of Bricks: 3 x 2 = 6
>> > Brick7: n0:/rhgs/coldbricks/1nvme-distrep3x2
>> > Brick8: n1:/rhgs/coldbricks/1nvme-distrep3x2
>> > Brick9: n2:/rhgs/coldbricks/1nvme-distrep3x2
>> > Brick10: n3:/rhgs/coldbricks/1nvme-distrep3x2
>> > Brick11: n4:/rhgs/coldbricks/1nvme-distrep3x2
>> > Brick12: n5:/rhgs/coldbricks/1nvme-distrep3x2
>> > Options Reconfigured:
>> > cluster.tier-mode: cache
>> > features.ctr-enabled: on
>> > performance.readdir-ahead: on
>> >
>> >
>> > I am attempting to run the 'smallfile' benchmark tool on this volume.
>> The
>> > 'smallfile' tool creates a starting gate directory and files in a shared
>> > filesystem location. The first run (write) works as expected.
>> >
>> > # smallfile_cli.py --threads 12 --file-size 4096 --files 300 --top
>> > /rhgs/client/1nvme-distrep3x2 --host-set
>> > c0,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11 --prefix test1 --stonewall Y
>> > --network-sync-dir /rhgs/client/1nvme-distrep3x2/smf1 --operation
>> create
>> >
>> > For the second run (read), I believe that smallfile attempts first to
>> 'rm
>> > -rf' the "network-sync-dir" path, which fails with ENOTEMPTY, causing
>> the
>> > run to fail
>> >

Re: [Gluster-devel] Possible race condition bug with tiered volume

2016-10-18 Thread Dustin Black
Dang. I always think I get all the detail and inevitably leave out
something important. :-/

I'm mobile and don't have the exact version in front of me, but this is
recent if not latest RHGS on RHEL 7.2.


On Oct 18, 2016 7:04 PM, "Dan Lambright"  wrote:

> Dustin,
>
> What level code ? I often run smallfile on upstream code with tiered
> volumes and have not seen this.
>
> Sure, one of us will get back to you.
>
> Unfortunately, gluster has a lot of protocol overhead (LOOKUPs), and they
> overwhelm the boost in transfer speeds you get for small files. A
> presentation at the Berlin gluster summit evaluated this.  The expectation
> is md-cache will go a long way towards helping that, before too long.
>
> Dan
>
>
>
> ----- Original Message -----
> > From: "Dustin Black" 
> > To: gluster-devel@gluster.org
> > Cc: "Annette Clewett" 
> > Sent: Tuesday, October 18, 2016 4:30:04 PM
> > Subject: [Gluster-devel] Possible race condition bug with tiered volume
> >
> > I have a 3x2 hot tier on NVMe drives with a 3x2 cold tier on RAID6
> drives.
> >
> > # gluster vol info 1nvme-distrep3x2
> > Volume Name: 1nvme-distrep3x2
> > Type: Tier
> > Volume ID: 21e3fc14-c35c-40c5-8e46-c258c1302607
> > Status: Started
> > Number of Bricks: 12
> > Transport-type: tcp
> > Hot Tier :
> > Hot Tier Type : Distributed-Replicate
> > Number of Bricks: 3 x 2 = 6
> > Brick1: n5:/rhgs/hotbricks/1nvme-distrep3x2-hot
> > Brick2: n4:/rhgs/hotbricks/1nvme-distrep3x2-hot
> > Brick3: n3:/rhgs/hotbricks/1nvme-distrep3x2-hot
> > Brick4: n2:/rhgs/hotbricks/1nvme-distrep3x2-hot
> > Brick5: n1:/rhgs/hotbricks/1nvme-distrep3x2-hot
> > Brick6: n0:/rhgs/hotbricks/1nvme-distrep3x2-hot
> > Cold Tier:
> > Cold Tier Type : Distributed-Replicate
> > Number of Bricks: 3 x 2 = 6
> > Brick7: n0:/rhgs/coldbricks/1nvme-distrep3x2
> > Brick8: n1:/rhgs/coldbricks/1nvme-distrep3x2
> > Brick9: n2:/rhgs/coldbricks/1nvme-distrep3x2
> > Brick10: n3:/rhgs/coldbricks/1nvme-distrep3x2
> > Brick11: n4:/rhgs/coldbricks/1nvme-distrep3x2
> > Brick12: n5:/rhgs/coldbricks/1nvme-distrep3x2
> > Options Reconfigured:
> > cluster.tier-mode: cache
> > features.ctr-enabled: on
> > performance.readdir-ahead: on
> >
> >
> > I am attempting to run the 'smallfile' benchmark tool on this volume. The
> > 'smallfile' tool creates a starting gate directory and files in a shared
> > filesystem location. The first run (write) works as expected.
> >
> > # smallfile_cli.py --threads 12 --file-size 4096 --files 300 --top
> > /rhgs/client/1nvme-distrep3x2 --host-set
> > c0,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11 --prefix test1 --stonewall Y
> > --network-sync-dir /rhgs/client/1nvme-distrep3x2/smf1 --operation create
> >
> > For the second run (read), I believe that smallfile attempts first to 'rm
> > -rf' the "network-sync-dir" path, which fails with ENOTEMPTY, causing the
> > run to fail
> >
> > # smallfile_cli.py --threads 12 --file-size 4096 --files 300 --top
> > /rhgs/client/1nvme-distrep3x2 --host-set
> > c0,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11 --prefix test1 --stonewall Y
> > --network-sync-dir /rhgs/client/1nvme-distrep3x2/smf1 --operation create
> > ...
> > Traceback (most recent call last):
> > File "/root/bin/smallfile_cli.py", line 280, in <module>
> > run_workload()
> > File "/root/bin/smallfile_cli.py", line 270, in run_workload
> > return run_multi_host_workload(params)
> > File "/root/bin/smallfile_cli.py", line 62, in run_multi_host_workload
> > sync_files.create_top_dirs(master_invoke, True)
> > File "/root/bin/sync_files.py", line 27, in create_top_dirs
> > shutil.rmtree(master_invoke.network_dir)
> > File "/usr/lib64/python2.7/shutil.py", line 256, in rmtree
> > onerror(os.rmdir, path, sys.exc_info())
> > File "/usr/lib64/python2.7/shutil.py", line 254, in rmtree
> > os.rmdir(path)
> > OSError: [Errno 39] Directory not empty: '/rhgs/client/1nvme-
> distrep3x2/smf1'
> >
> >
> > From the client perspective, the directory is clearly empty.
> >
> > # ls -a /rhgs/client/1nvme-distrep3x2/smf1/
> > . ..
> >
> >
> > And a quick search on the bricks shows that the hot tier on the last
> replica
> > pair is the offender.
> >
> > # for i in {0..5}; do ssh n$i "hostname; ls
> > /rhgs/coldbricks/1nvme-distrep3x2/smf1 | wc -l; ls
> > /rhgs/hotbricks/1nvme-distrep3x2-hot/smf1 | wc -l"; don

Re: [Gluster-devel] Possible race condition bug with tiered volume

2016-10-18 Thread Dan Lambright
Dustin,

What level code ? I often run smallfile on upstream code with tiered volumes 
and have not seen this.  

Sure, one of us will get back to you.

Unfortunately, gluster has a lot of protocol overhead (LOOKUPs), and they 
overwhelm the boost in transfer speeds you get for small files. A presentation 
at the Berlin gluster summit evaluated this.  The expectation is md-cache will 
go a long way towards helping that, before too long.

Dan



----- Original Message -----
> From: "Dustin Black" 
> To: gluster-devel@gluster.org
> Cc: "Annette Clewett" 
> Sent: Tuesday, October 18, 2016 4:30:04 PM
> Subject: [Gluster-devel] Possible race condition bug with tiered volume
> 
> I have a 3x2 hot tier on NVMe drives with a 3x2 cold tier on RAID6 drives.
> 
> # gluster vol info 1nvme-distrep3x2
> Volume Name: 1nvme-distrep3x2
> Type: Tier
> Volume ID: 21e3fc14-c35c-40c5-8e46-c258c1302607
> Status: Started
> Number of Bricks: 12
> Transport-type: tcp
> Hot Tier :
> Hot Tier Type : Distributed-Replicate
> Number of Bricks: 3 x 2 = 6
> Brick1: n5:/rhgs/hotbricks/1nvme-distrep3x2-hot
> Brick2: n4:/rhgs/hotbricks/1nvme-distrep3x2-hot
> Brick3: n3:/rhgs/hotbricks/1nvme-distrep3x2-hot
> Brick4: n2:/rhgs/hotbricks/1nvme-distrep3x2-hot
> Brick5: n1:/rhgs/hotbricks/1nvme-distrep3x2-hot
> Brick6: n0:/rhgs/hotbricks/1nvme-distrep3x2-hot
> Cold Tier:
> Cold Tier Type : Distributed-Replicate
> Number of Bricks: 3 x 2 = 6
> Brick7: n0:/rhgs/coldbricks/1nvme-distrep3x2
> Brick8: n1:/rhgs/coldbricks/1nvme-distrep3x2
> Brick9: n2:/rhgs/coldbricks/1nvme-distrep3x2
> Brick10: n3:/rhgs/coldbricks/1nvme-distrep3x2
> Brick11: n4:/rhgs/coldbricks/1nvme-distrep3x2
> Brick12: n5:/rhgs/coldbricks/1nvme-distrep3x2
> Options Reconfigured:
> cluster.tier-mode: cache
> features.ctr-enabled: on
> performance.readdir-ahead: on
> 
> 
> I am attempting to run the 'smallfile' benchmark tool on this volume. The
> 'smallfile' tool creates a starting gate directory and files in a shared
> filesystem location. The first run (write) works as expected.
> 
> # smallfile_cli.py --threads 12 --file-size 4096 --files 300 --top
> /rhgs/client/1nvme-distrep3x2 --host-set
> c0,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11 --prefix test1 --stonewall Y
> --network-sync-dir /rhgs/client/1nvme-distrep3x2/smf1 --operation create
> 
> For the second run (read), I believe that smallfile attempts first to 'rm
> -rf' the "network-sync-dir" path, which fails with ENOTEMPTY, causing the
> run to fail
> 
> # smallfile_cli.py --threads 12 --file-size 4096 --files 300 --top
> /rhgs/client/1nvme-distrep3x2 --host-set
> c0,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11 --prefix test1 --stonewall Y
> --network-sync-dir /rhgs/client/1nvme-distrep3x2/smf1 --operation create
> ...
> Traceback (most recent call last):
> File "/root/bin/smallfile_cli.py", line 280, in <module>
> run_workload()
> File "/root/bin/smallfile_cli.py", line 270, in run_workload
> return run_multi_host_workload(params)
> File "/root/bin/smallfile_cli.py", line 62, in run_multi_host_workload
> sync_files.create_top_dirs(master_invoke, True)
> File "/root/bin/sync_files.py", line 27, in create_top_dirs
> shutil.rmtree(master_invoke.network_dir)
> File "/usr/lib64/python2.7/shutil.py", line 256, in rmtree
> onerror(os.rmdir, path, sys.exc_info())
> File "/usr/lib64/python2.7/shutil.py", line 254, in rmtree
> os.rmdir(path)
> OSError: [Errno 39] Directory not empty: '/rhgs/client/1nvme-distrep3x2/smf1'
> 
> 
> From the client perspective, the directory is clearly empty.
> 
> # ls -a /rhgs/client/1nvme-distrep3x2/smf1/
> . ..
> 
> 
> And a quick search on the bricks shows that the hot tier on the last replica
> pair is the offender.
> 
> # for i in {0..5}; do ssh n$i "hostname; ls
> /rhgs/coldbricks/1nvme-distrep3x2/smf1 | wc -l; ls
> /rhgs/hotbricks/1nvme-distrep3x2-hot/smf1 | wc -l"; done
> rhosd0
> 0
> 0
> rhosd1
> 0
> 0
> rhosd2
> 0
> 0
> rhosd3
> 0
> 0
> rhosd4
> 0
> 1
> rhosd5
> 0
> 1
> 
> 
> (For the record, multiple runs of this reproducer show that it is
> consistently the hot tier that is to blame, but it is not always the same
> replica pair.)
> 
> 
> Can someone try recreating this scenario to see if the problem is consistent?
> Please reach out if you need me to provide any further details.
> 
> 
> Dustin Black, RHCA
> Senior Architect, Software-Defined Storage
> Red Hat, Inc.
> (o) +1.212.510.4138 (m) +1.215.821.7423
> dus...@redhat.com
> 
> ___
> Gluster-devel mailing list
> Gluster-devel@gluster.org
> http://www.gluster.org/mailman/listinfo/gluster-devel
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel


[Gluster-devel] Possible race condition bug with tiered volume

2016-10-18 Thread Dustin Black
I have a 3x2 hot tier on NVMe drives with a 3x2 cold tier on RAID6 drives.

# gluster vol info 1nvme-distrep3x2

Volume Name: 1nvme-distrep3x2
Type: Tier
Volume ID: 21e3fc14-c35c-40c5-8e46-c258c1302607
Status: Started
Number of Bricks: 12
Transport-type: tcp
Hot Tier :
Hot Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6
Brick1: n5:/rhgs/hotbricks/1nvme-distrep3x2-hot
Brick2: n4:/rhgs/hotbricks/1nvme-distrep3x2-hot
Brick3: n3:/rhgs/hotbricks/1nvme-distrep3x2-hot
Brick4: n2:/rhgs/hotbricks/1nvme-distrep3x2-hot
Brick5: n1:/rhgs/hotbricks/1nvme-distrep3x2-hot
Brick6: n0:/rhgs/hotbricks/1nvme-distrep3x2-hot
Cold Tier:
Cold Tier Type : Distributed-Replicate
Number of Bricks: 3 x 2 = 6
Brick7: n0:/rhgs/coldbricks/1nvme-distrep3x2
Brick8: n1:/rhgs/coldbricks/1nvme-distrep3x2
Brick9: n2:/rhgs/coldbricks/1nvme-distrep3x2
Brick10: n3:/rhgs/coldbricks/1nvme-distrep3x2
Brick11: n4:/rhgs/coldbricks/1nvme-distrep3x2
Brick12: n5:/rhgs/coldbricks/1nvme-distrep3x2
Options Reconfigured:
cluster.tier-mode: cache
features.ctr-enabled: on
performance.readdir-ahead: on


I am attempting to run the 'smallfile' benchmark tool on this volume. The
'smallfile' tool creates a starting gate directory and files in a shared
filesystem location. The first run (write) works as expected.

# smallfile_cli.py --threads 12 --file-size 4096 --files 300 --top
/rhgs/client/1nvme-distrep3x2 --host-set
c0,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11 --prefix test1 --stonewall Y
--network-sync-dir /rhgs/client/1nvme-distrep3x2/smf1 --operation create

For the second run (read), I believe that smallfile attempts first to 'rm
-rf' the "network-sync-dir" path, which fails with ENOTEMPTY, causing the
run to fail

# smallfile_cli.py --threads 12 --file-size 4096 --files 300 --top
/rhgs/client/1nvme-distrep3x2 --host-set
c0,c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11 --prefix test1 --stonewall Y
--network-sync-dir /rhgs/client/1nvme-distrep3x2/smf1 --operation create
...
Traceback (most recent call last):
  File "/root/bin/smallfile_cli.py", line 280, in <module>
run_workload()
  File "/root/bin/smallfile_cli.py", line 270, in run_workload
return run_multi_host_workload(params)
  File "/root/bin/smallfile_cli.py", line 62, in run_multi_host_workload
sync_files.create_top_dirs(master_invoke, True)
  File "/root/bin/sync_files.py", line 27, in create_top_dirs
shutil.rmtree(master_invoke.network_dir)
  File "/usr/lib64/python2.7/shutil.py", line 256, in rmtree
onerror(os.rmdir, path, sys.exc_info())
  File "/usr/lib64/python2.7/shutil.py", line 254, in rmtree
os.rmdir(path)
OSError: [Errno 39] Directory not empty:
'/rhgs/client/1nvme-distrep3x2/smf1'


From the client perspective, the directory is clearly empty.

# ls -a /rhgs/client/1nvme-distrep3x2/smf1/
.  ..
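Errno 39 in the traceback is plain POSIX ENOTEMPTY, exactly what `os.rmdir` raises for any directory that still holds entries; the oddity here is that the client's `ls -a` sees none. A quick local illustration of the errno itself (scratch paths, nothing Gluster-specific):

```python
import errno
import os
import tempfile

d = tempfile.mkdtemp()
open(os.path.join(d, 'leftover'), 'w').close()

try:
    os.rmdir(d)          # directory still holds 'leftover'
    caught = None
except OSError as e:
    caught = e.errno     # ENOTEMPTY (value 39 on Linux)

print(caught == errno.ENOTEMPTY)
```

In the failing run the blocking entry lives only on one hot-tier brick (see the per-brick counts below), so from the mount the removal looks like it should succeed.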


And a quick search on the bricks shows that the hot tier on the last
replica pair is the offender.

# for i in {0..5}; do ssh n$i "hostname; ls
/rhgs/coldbricks/1nvme-distrep3x2/smf1 | wc -l; ls
/rhgs/hotbricks/1nvme-distrep3x2-hot/smf1 | wc -l"; done
rhosd0
0
0
rhosd1
0
0
rhosd2
0
0
rhosd3
0
0
rhosd4
0
1
rhosd5
0
1


(For the record, multiple runs of this reproducer show that it is
consistently the hot tier that is to blame, but it is not always the same
replica pair.)


Can someone try recreating this scenario to see if the problem is
consistent? Please reach out if you need me to provide any further details.


Dustin Black, RHCA
Senior Architect, Software-Defined Storage
Red Hat, Inc.
(o) +1.212.510.4138  (m) +1.215.821.7423
dus...@redhat.com
___
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel