Re: [Gluster-devel] missing files
I have not made any progress on the internal systems, post Pranith's investigations on the inode release causing this slowness on an aged volume, due to other priorities. Need to get back on track with this one, let me discuss this with Pranith and see how best to move ahead with the same. Shyam On 02/17/2015 04:50 PM, David F. Robinson wrote: Any updates on this issue? Thanks in advance... David -- Original Message -- From: Shyam srang...@redhat.com To: David F. Robinson david.robin...@corvidtec.com; Justin Clift jus...@gluster.org Cc: Gluster Devel gluster-devel@gluster.org Sent: 2/11/2015 10:02:09 PM Subject: Re: [Gluster-devel] missing files On 02/11/2015 08:28 AM, David F. Robinson wrote: My base filesystem has 40-TB and the tar takes 19 minutes. I copied over 10-TB and it took the tar extraction from 1-minute to 7-minutes. My suspicion is that it is related to number of files and not necessarily file size. Shyam is looking into reproducing this behavior on a redhat system. I am able to reproduce the issue on a similar setup internally (at least at the surface it seems to be similar to what David is facing). I will continue the investigation for the root cause. Shyam ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] missing files
Any updates on this issue? Thanks in advance... David -- Original Message -- From: Shyam srang...@redhat.com To: David F. Robinson david.robin...@corvidtec.com; Justin Clift jus...@gluster.org Cc: Gluster Devel gluster-devel@gluster.org Sent: 2/11/2015 10:02:09 PM Subject: Re: [Gluster-devel] missing files On 02/11/2015 08:28 AM, David F. Robinson wrote: My base filesystem has 40-TB and the tar takes 19 minutes. I copied over 10-TB and it took the tar extraction from 1-minute to 7-minutes. My suspicion is that it is related to number of files and not necessarily file size. Shyam is looking into reproducing this behavior on a redhat system. I am able to reproduce the issue on a similar setup internally (at least at the surface it seems to be similar to what David is facing). I will continue the investigation for the root cause. Shyam ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] missing files
On 02/12/2015 03:05 PM, Pranith Kumar Karampuri wrote: On 02/12/2015 09:14 AM, Justin Clift wrote: On 12 Feb 2015, at 03:02, Shyam srang...@redhat.com wrote: On 02/11/2015 08:28 AM, David F. Robinson wrote: My base filesystem has 40-TB and the tar takes 19 minutes. I copied over 10-TB and it took the tar extraction from 1-minute to 7-minutes. My suspicion is that it is related to number of files and not necessarily file size. Shyam is looking into reproducing this behavior on a redhat system. I am able to reproduce the issue on a similar setup internally (at least at the surface it seems to be similar to what David is facing). I will continue the investigation for the root cause. Here is the initial analysis of my investigation: (Thanks for providing me with the setup shyam, keep the setup we may need it for further analysis) On bad volume: %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop - --- --- --- 0.00 0.00 us 0.00 us 0.00 us 937104 FORGET 0.00 0.00 us 0.00 us 0.00 us 872478 RELEASE 0.00 0.00 us 0.00 us 0.00 us 23668 RELEASEDIR 0.00 41.86 us 23.00 us 86.00 us 92 STAT 0.01 39.40 us 24.00 us 104.00 us 218 STATFS 0.28 55.99 us 43.00 us1152.00 us 4065 SETXATTR 0.58 56.89 us 25.00 us4505.00 us 8236 OPENDIR 0.73 26.80 us 11.00 us 257.00 us 22238 FLUSH 0.77 152.83 us 92.00 us8819.00 us 4065 RMDIR 2.57 62.00 us 21.00 us 409.00 us 33643 WRITE 5.46 199.16 us 108.00 us 469938.00 us 22238 UNLINK 6.70 69.83 us 43.00 us.00 us 77809 LOOKUP 6.97 447.60 us 21.00 us 54875.00 us 12631 READDIRP 7.73 79.42 us 33.00 us1535.00 us 78909 SETATTR 14.112815.00 us 176.00 us 2106305.00 us 4065 MKDIR 54.091972.62 us 138.00 us 1520773.00 us 22238 CREATE On good volume: %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop - --- --- --- 0.00 0.00 us 0.00 us 0.00 us 58870 FORGET 0.00 0.00 us 0.00 us 0.00 us 66016 RELEASE 0.00 0.00 us 0.00 us 0.00 us 16480 RELEASEDIR 0.00 61.50 us 58.00 us 65.00 us 2OPEN 0.01 39.56 us 16.00 us 112.00 us 71 STAT 0.02 41.29 us 27.00 us 79.00 us 163 STATFS 0.03 36.06 us 17.00 us 98.00 us 301 FSTAT 0.79 62.38 us 39.00 us 269.00 us 4065 SETXATTR 1.14 242.99 us 25.00 us 28636.00 us 1497 READ 1.54 59.76 us 25.00 us6325.00 us 8236 OPENDIR 1.70 133.75 us 89.00 us 374.00 us 4065 RMDIR 2.25 32.65 us 15.00 us 265.00 us 22006 FLUSH 3.37 265.05 us 172.00 us2349.00 us 4065 MKDIR 7.14 68.34 us 21.00 us 21902.00 us 33357 WRITE 11.00 159.68 us 107.00 us2567.00 us 22003 UNLINK 13.82 200.54 us 133.00 us 21762.00 us 22003 CREATE 17.85 448.85 us 22.00 us 54046.00 us 12697 READDIRP 18.37 76.12 us 45.00 us 294.00 us 77044 LOOKUP 20.95 85.54 us 35.00 us1404.00 us 78204 SETATTR As we can see here, FORGET/RELEASE are way more in the brick from full volume compared to the brick from empty volume. It seems to suggest that the inode-table on the volume with lots of data is carrying too many passive inodes in the table which need to be displaced to create new ones. Need to check if they come in the fop-path. Need to continue my investigations further, will let you know. Just to increase confidence performed one more test. Stopped the volumes and re-started. 
Now on both the volumes, the numbers are almost same: [root@gqac031 gluster-mount]# time rm -rf boost_1_57_0 ; time tar xf boost_1_57_0.tar.gz real1m15.074s user0m0.550s sys 0m4.656s real2m46.866s user0m5.347s sys 0m16.047s [root@gqac031 gluster-mount]# cd /gluster-emptyvol/ [root@gqac031 gluster-emptyvol]# ls boost_1_57_0.tar.gz [root@gqac031 gluster-emptyvol]# time tar xf boost_1_57_0.tar.gz real2m31.467s user0m5.475s sys 0m15.471s gqas015.sbu.lab.eng.bos.redhat.com:testvol on /gluster-mount type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072) gqas015.sbu.lab.eng.bos.redhat.com:emotyvol on /gluster-emptyvol type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072) Pranith Pranith Thanks Shyam. :) + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter:
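(One way to check the passive-inode theory above directly on a brick is a statedump of the brick processes; the inode table section reports the active/lru sizes. The volume name below is the test volume from the runs above, and the dump location assumes the default statedump path, so treat both as illustrative.)

#... take a statedump of all brick processes for the volume (written under /var/run/gluster by default)
gluster volume statedump testvol

#... inode table sizes in the brick dumps; a large lru_size suggests many passive inodes are being kept around
grep -E 'active_size|lru_size|purge_size' /var/run/gluster/*.dump.*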
Re: [Gluster-devel] missing files
On 02/12/2015 09:14 AM, Justin Clift wrote: On 12 Feb 2015, at 03:02, Shyam srang...@redhat.com wrote: On 02/11/2015 08:28 AM, David F. Robinson wrote: My base filesystem has 40-TB and the tar takes 19 minutes. I copied over 10-TB and it took the tar extraction from 1-minute to 7-minutes. My suspicion is that it is related to number of files and not necessarily file size. Shyam is looking into reproducing this behavior on a redhat system. I am able to reproduce the issue on a similar setup internally (at least at the surface it seems to be similar to what David is facing). I will continue the investigation for the root cause. Here is the initial analysis of my investigation: (Thanks for providing me with the setup shyam, keep the setup we may need it for further analysis) On bad volume: %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop - --- --- --- 0.00 0.00 us 0.00 us 0.00 us 937104 FORGET 0.00 0.00 us 0.00 us 0.00 us 872478 RELEASE 0.00 0.00 us 0.00 us 0.00 us 23668 RELEASEDIR 0.00 41.86 us 23.00 us 86.00 us 92STAT 0.01 39.40 us 24.00 us 104.00 us 218 STATFS 0.28 55.99 us 43.00 us1152.00 us 4065SETXATTR 0.58 56.89 us 25.00 us4505.00 us 8236 OPENDIR 0.73 26.80 us 11.00 us 257.00 us 22238 FLUSH 0.77 152.83 us 92.00 us8819.00 us 4065 RMDIR 2.57 62.00 us 21.00 us 409.00 us 33643 WRITE 5.46 199.16 us 108.00 us 469938.00 us 22238 UNLINK 6.70 69.83 us 43.00 us.00 us 77809 LOOKUP 6.97 447.60 us 21.00 us 54875.00 us 12631READDIRP 7.73 79.42 us 33.00 us1535.00 us 78909 SETATTR 14.112815.00 us 176.00 us 2106305.00 us 4065 MKDIR 54.091972.62 us 138.00 us 1520773.00 us 22238 CREATE On good volume: %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop - --- --- --- 0.00 0.00 us 0.00 us 0.00 us 58870 FORGET 0.00 0.00 us 0.00 us 0.00 us 66016 RELEASE 0.00 0.00 us 0.00 us 0.00 us 16480 RELEASEDIR 0.00 61.50 us 58.00 us 65.00 us 2OPEN 0.01 39.56 us 16.00 us 112.00 us 71STAT 0.02 41.29 us 27.00 us 79.00 us 163 STATFS 0.03 36.06 us 17.00 us 98.00 us 301 FSTAT 0.79 62.38 us 39.00 us 269.00 us 4065SETXATTR 1.14 242.99 us 25.00 us 28636.00 us 1497READ 1.54 59.76 us 25.00 us6325.00 us 8236 OPENDIR 1.70 133.75 us 89.00 us 374.00 us 4065 RMDIR 2.25 32.65 us 15.00 us 265.00 us 22006 FLUSH 3.37 265.05 us 172.00 us2349.00 us 4065 MKDIR 7.14 68.34 us 21.00 us 21902.00 us 33357 WRITE 11.00 159.68 us 107.00 us2567.00 us 22003 UNLINK 13.82 200.54 us 133.00 us 21762.00 us 22003 CREATE 17.85 448.85 us 22.00 us 54046.00 us 12697READDIRP 18.37 76.12 us 45.00 us 294.00 us 77044 LOOKUP 20.95 85.54 us 35.00 us1404.00 us 78204 SETATTR As we can see here, FORGET/RELEASE are way more in the brick from full volume compared to the brick from empty volume. It seems to suggest that the inode-table on the volume with lots of data is carrying too many passive inodes in the table which need to be displaced to create new ones. Need to check if they come in the fop-path. Need to continue my investigations further, will let you know. Pranith Thanks Shyam. :) + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
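(For anyone wanting to repeat this comparison on their own setup: the per-fop latency tables above are io-stats profile output, gathered roughly as below. 'testvol' is a placeholder for the volume being measured.)

#... enable profiling, run the workload, then print and stop the counters
gluster volume profile testvol start
cd /gluster-mount && time tar xf boost_1_57_0.tar.gz
gluster volume profile testvol info
gluster volume profile testvol stop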
Re: [Gluster-devel] missing files
Shyam, You asked me to stop/start the slow volume to see if it fixed the timing issue. I stopped/started homegfs_backup (the production volume with 40+ TB) and it didn't make it faster. I didn't stop/start the fast volume to see if it made it slower. I just did that and sent out an email. I saw a similar result as Pranith. however, I tried this test below and saw no issues. So, i don't know why restart the older volume of test3brick slowed it down but the test below shows no slowdown. #... Create 2-new bricks gluster volume create test4brick gfsib01bkp.corvidtec.com:/data/brick01bkp/test4brick gfsib01bkp.corvidtec.com:/data/brick02bkp/test4brick gluster volume create test5brick gfsib01bkp.corvidtec.com:/data/brick01bkp/test5brick gfsib01bkp.corvidtec.com:/data/brick02bkp/test5brick gluster volume start test4brick gluster volume start test5brick mount /test4brick mount /test5brick cp /root/boost_1_57_0.tar /test4brick cp /root/boost_1_57_0.tar /test5brick #... Stop/start test4brick to see if this causes a timing issue umount /test4brick gluster volume stop test4brick gluster volume start test4brick mount /test4brick #... Run test on both new bricks cd /test4brick time tar -xPf boost_1_57_0.tar; time rm -rf boost_1_57_0 real1m29.712s user0m0.415s sys 0m2.772s real0m18.866s user0m0.087s sys 0m0.556s cd /test5brick time tar -xPf boost_1_57_0.tar; time rm -rf boost_1_57_0 real 1m28.243s user 0m0.366s sys 0m2.502s real 0m18.193s user 0m0.075s sys 0m0.543s #... Repeat again after stop/start of test4brick umount /test4brick gluster volume stop test4brick gluster volume start test4brick mount /test4brick cd /test4brick time tar -xPf boost_1_57_0.tar; time rm -rf boost_1_57_0 real1m25.277s user0m0.466s sys 0m3.107s real0m16.575s user0m0.084s sys 0m0.577s -- Original Message -- From: Shyam srang...@redhat.com To: Pranith Kumar Karampuri pkara...@redhat.com; Justin Clift jus...@gluster.org Cc: Gluster Devel gluster-devel@gluster.org; David F. Robinson david.robin...@corvidtec.com Sent: 2/12/2015 10:46:14 AM Subject: Re: [Gluster-devel] missing files On 02/12/2015 06:22 AM, Pranith Kumar Karampuri wrote: On 02/12/2015 03:05 PM, Pranith Kumar Karampuri wrote: On 02/12/2015 09:14 AM, Justin Clift wrote: On 12 Feb 2015, at 03:02, Shyam srang...@redhat.com wrote: On 02/11/2015 08:28 AM, David F. Robinson wrote: Just to increase confidence performed one more test. Stopped the volumes and re-started. Now on both the volumes, the numbers are almost same: [root@gqac031 gluster-mount]# time rm -rf boost_1_57_0 ; time tar xf boost_1_57_0.tar.gz real 1m15.074s user 0m0.550s sys 0m4.656s real 2m46.866s user 0m5.347s sys 0m16.047s [root@gqac031 gluster-mount]# cd /gluster-emptyvol/ [root@gqac031 gluster-emptyvol]# ls boost_1_57_0.tar.gz [root@gqac031 gluster-emptyvol]# time tar xf boost_1_57_0.tar.gz real 2m31.467s user 0m5.475s sys 0m15.471s gqas015.sbu.lab.eng.bos.redhat.com:testvol on /gluster-mount type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072) gqas015.sbu.lab.eng.bos.redhat.com:emotyvol on /gluster-emptyvol type fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072) If I remember right, we performed a similar test on David's setup, but I believe there was no significant performance gain there. David could you clarify? Just so we know where we are headed :) Shyam ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] missing files
On 02/12/2015 11:18 AM, David F. Robinson wrote: Shyam, You asked me to stop/start the slow volume to see if it fixed the timing issue. I stopped/started homegfs_backup (the production volume with 40+ TB) and it didn't make it faster. I didn't stop/start the fast volume to see if it made it slower. I just did that and sent out an email. I saw a similar result as Pranith. Just to be clear even after restart of the slow volume, we see ~19 minutes for the tar to complete, correct? Versus, on the fast volume it is anywhere between 00:55 - 3:00 minutes, irrespective of start, fresh create, etc. correct? Shyam ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] missing files
On 12 Feb 2015, at 11:22, Pranith Kumar Karampuri pkara...@redhat.com wrote: snip Just to increase confidence performed one more test. Stopped the volumes and re-started. Now on both the volumes, the numbers are almost same: Oh. So it's a problem that turns up after a certain amount of activity has happened on a volume? eg a lot of intensive activity would show up quickly, but a less intense amount of activity would take longer to show the effect Kaleb's long running cluster might be useful to catch this kind of thing in future, depending on the workload running on it, and the kind of pre/post tests we run. (eg to catch performance regressions) + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
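(A minimal sketch of the kind of pre/post timing check that could run periodically on such a long-running cluster; the mount point, tarball and log path are illustrative. It simply times the same tar/rm workload and appends the result, so a slowdown on an aging volume would show up as a trend in the log.)

#!/bin/bash
MOUNT=/gluster-mount                 # FUSE mount of the volume under test (illustrative)
TARBALL=/root/boost_1_57_0.tar       # fixed dataset so runs are comparable
LOG=/var/log/gluster-tar-timing.log
cd "$MOUNT" || exit 1
start=$(date +%s)
tar -xPf "$TARBALL" && rm -rf boost_1_57_0
end=$(date +%s)
echo "$(date -u +%FT%TZ) elapsed=$((end - start))s" >> "$LOG"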
Re: [Gluster-devel] missing files
That is very interesting. I tried this test and received a similar result. Start/stopping the volume causes a timing issue on the blank volume. It seems like there is some parameter getting set when you create a volume and gets reset when you start/stop a volume. Or, something gets set during the start/stop operation that causes the problem. Is there a way to list all parameters that are set for a volume? gluster volume info only shows the ones that the user has changed from defaults. [root@gfs01bkp ~]# gluster volume stop test3brick Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y volume stop: test3brick: success [root@gfs01bkp ~]# gluster volume start test3brick volume start: test3brick: success [root@gfs01bkp ~]# mount /test3brick [root@gfs01bkp ~]# cd /test3brick/ [root@gfs01bkp test3brick]# date; time tar -xPf boost_1_57_0.tar ; time rm -rf boost_1_57_0 Thu Feb 12 10:42:43 EST 2015 real3m46.002s user0m0.421s sys 0m2.812s real0m15.406s user0m0.092s sys 0m0.549s -- Original Message -- From: Pranith Kumar Karampuri pkara...@redhat.com To: Justin Clift jus...@gluster.org; Shyam srang...@redhat.com Cc: Gluster Devel gluster-devel@gluster.org; David F. Robinson david.robin...@corvidtec.com Sent: 2/12/2015 6:22:23 AM Subject: Re: [Gluster-devel] missing files On 02/12/2015 03:05 PM, Pranith Kumar Karampuri wrote: On 02/12/2015 09:14 AM, Justin Clift wrote: On 12 Feb 2015, at 03:02, Shyam srang...@redhat.com wrote: On 02/11/2015 08:28 AM, David F. Robinson wrote: My base filesystem has 40-TB and the tar takes 19 minutes. I copied over 10-TB and it took the tar extraction from 1-minute to 7-minutes. My suspicion is that it is related to number of files and not necessarily file size. Shyam is looking into reproducing this behavior on a redhat system. I am able to reproduce the issue on a similar setup internally (at least at the surface it seems to be similar to what David is facing). I will continue the investigation for the root cause. Here is the initial analysis of my investigation: (Thanks for providing me with the setup shyam, keep the setup we may need it for further analysis) On bad volume: %-latency Avg-latency Min-Latency Max-Latency No. of calls Fop - --- --- --- 0.00 0.00 us 0.00 us 0.00 us 937104 FORGET 0.00 0.00 us 0.00 us 0.00 us 872478 RELEASE 0.00 0.00 us 0.00 us 0.00 us 23668 RELEASEDIR 0.00 41.86 us 23.00 us 86.00 us 92 STAT 0.01 39.40 us 24.00 us 104.00 us 218 STATFS 0.28 55.99 us 43.00 us 1152.00 us 4065 SETXATTR 0.58 56.89 us 25.00 us 4505.00 us 8236 OPENDIR 0.73 26.80 us 11.00 us 257.00 us 22238 FLUSH 0.77 152.83 us 92.00 us 8819.00 us 4065 RMDIR 2.57 62.00 us 21.00 us 409.00 us 33643 WRITE 5.46 199.16 us 108.00 us 469938.00 us 22238 UNLINK 6.70 69.83 us 43.00 us .00 us 77809 LOOKUP 6.97 447.60 us 21.00 us 54875.00 us 12631 READDIRP 7.73 79.42 us 33.00 us 1535.00 us 78909 SETATTR 14.11 2815.00 us 176.00 us 2106305.00 us 4065 MKDIR 54.09 1972.62 us 138.00 us 1520773.00 us 22238 CREATE On good volume: %-latency Avg-latency Min-Latency Max-Latency No. 
of calls Fop - --- --- --- 0.00 0.00 us 0.00 us 0.00 us 58870 FORGET 0.00 0.00 us 0.00 us 0.00 us 66016 RELEASE 0.00 0.00 us 0.00 us 0.00 us 16480 RELEASEDIR 0.00 61.50 us 58.00 us 65.00 us 2 OPEN 0.01 39.56 us 16.00 us 112.00 us 71 STAT 0.02 41.29 us 27.00 us 79.00 us 163 STATFS 0.03 36.06 us 17.00 us 98.00 us 301 FSTAT 0.79 62.38 us 39.00 us 269.00 us 4065 SETXATTR 1.14 242.99 us 25.00 us 28636.00 us 1497 READ 1.54 59.76 us 25.00 us 6325.00 us 8236 OPENDIR 1.70 133.75 us 89.00 us 374.00 us 4065 RMDIR 2.25 32.65 us 15.00 us 265.00 us 22006 FLUSH 3.37 265.05 us 172.00 us 2349.00 us 4065 MKDIR 7.14 68.34 us 21.00 us 21902.00 us 33357 WRITE 11.00 159.68 us 107.00 us 2567.00 us 22003 UNLINK 13.82 200.54 us 133.00 us 21762.00 us 22003 CREATE 17.85 448.85 us 22.00 us 54046.00 us 12697 READDIRP 18.37 76.12 us 45.00 us 294.00 us 77044 LOOKUP 20.95 85.54 us 35.00 us 1404.00 us 78204 SETATTR As we can see here, FORGET/RELEASE are way more in the brick from full volume compared to the brick from empty volume. It seems to suggest that the inode-table on the volume with lots of data is carrying too many passive inodes in the table which need to be displaced to create new ones. Need to check if they come in the fop-path. Need to continue my investigations further, will let you know. Just to increase confidence performed one more test. Stopped the volumes and re-started. Now on both the volumes, the numbers are almost same: [root@gqac031 gluster-mount]# time rm -rf boost_1_57_0 ; time tar xf boost_1_57_0.tar.gz real 1m15.074s
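(On the question above of listing every parameter in effect for a volume: 'gluster volume info' only shows reconfigured options. Newer gluster releases also have a 'volume get' command that prints the full option list with current values; on releases without it, 'gluster volume set help' at least lists the available options and their defaults. The volume name below is just the test volume from above.)

#... full option list with current values (where the 'volume get' command is available)
gluster volume get test3brick all

#... option names, defaults and descriptions on older releases
gluster volume set help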
Re: [Gluster-devel] missing files
-- Original Message -- From: Shyam srang...@redhat.com To: David F. Robinson david.robin...@corvidtec.com; Pranith Kumar Karampuri pkara...@redhat.com; Justin Clift jus...@gluster.org Cc: Gluster Devel gluster-devel@gluster.org Sent: 2/12/2015 11:26:51 AM Subject: Re: [Gluster-devel] missing files On 02/12/2015 11:18 AM, David F. Robinson wrote: Shyam, You asked me to stop/start the slow volume to see if it fixed the timing issue. I stopped/started homegfs_backup (the production volume with 40+ TB) and it didn't make it faster. I didn't stop/start the fast volume to see if it made it slower. I just did that and sent out an email. I saw a similar result as Pranith. Just to be clear even after restart of the slow volume, we see ~19 minutes for the tar to complete, correct? Correct Versus, on the fast volume it is anywhere between 00:55 - 3:00 minutes, irrespective of start, fresh create, etc. correct? Correct Shyam ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] missing files
FWIW, starting/stopping a volume that is fast doesn't consistently make it slow. I just tried it again on an older volume... It doesn't make it slow. I also went back and re-ran the test on test3brick and it isn't slow any longer. Maybe there is a time lag after stopping/starting a volume before it becomes fast. Either way, stopping/starting a fast volume only makes it slow for some period of time and it doesn't consistently make it slow. I don't think this is the issue. red-herring. [root@gfs01bkp /]# gluster volume stop test2brick Stopping volume will make its data inaccessible. Do you want to continue? (y/n) y [root@gfs01bkp /]# gluster volume start test2brick volume start: test2brick: success [root@gfs01bkp /]# mount /test2brick [root@gfs01bkp /]# cd /test2brick [root@gfs01bkp test2brick]# time tar -xPf boost_1_57_0.tar; time rm -rf boost_1_57_0 real1m1.124s user0m0.432s sys 0m3.136s real0m16.630s user0m0.083s sys 0m0.570s #... Retest on test3brick after it has been up after a volume restart for 20-minutes... Compare this to running the test immediately after a restart which gave a time of 3.5-minutes. [root@gfs01bkp test3brick]# time tar -xPf boost_1_57_0.tar; time rm -rf boost_1_57_0 real1m17.786s user0m0.502s sys 0m3.278s real0m18.103s user0m0.101s sys 0m0.684s -- Original Message -- From: Shyam srang...@redhat.com To: David F. Robinson david.robin...@corvidtec.com; Pranith Kumar Karampuri pkara...@redhat.com; Justin Clift jus...@gluster.org Cc: Gluster Devel gluster-devel@gluster.org Sent: 2/12/2015 11:26:51 AM Subject: Re: [Gluster-devel] missing files On 02/12/2015 11:18 AM, David F. Robinson wrote: Shyam, You asked me to stop/start the slow volume to see if it fixed the timing issue. I stopped/started homegfs_backup (the production volume with 40+ TB) and it didn't make it faster. I didn't stop/start the fast volume to see if it made it slower. I just did that and sent out an email. I saw a similar result as Pranith. Just to be clear even after restart of the slow volume, we see ~19 minutes for the tar to complete, correct? Versus, on the fast volume it is anywhere between 00:55 - 3:00 minutes, irrespective of start, fresh create, etc. correct? Shyam ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] missing files
My base filesystem has 40-TB and the tar takes 19 minutes. I copied over 10-TB and it took the tar extraction from 1-minute to 7-minutes. My suspicion is that it is related to number of files and not necessarily file size. Shyam is looking into reproducing this behavior on a redhat system. David (Sent from mobile) === David F. Robinson, Ph.D. President - Corvid Technologies 704.799.6944 x101 [office] 704.252.1310 [cell] 704.799.7974 [fax] david.robin...@corvidtec.com http://www.corvidtechnologies.com On Feb 11, 2015, at 7:38 AM, Justin Clift jus...@gluster.org wrote: On 11 Feb 2015, at 12:31, David F. Robinson david.robin...@corvidtec.com wrote: Some time ago I had a similar performance problem (with 3.4 if I remember correctly): a just created volume started to work fine, but after some time using it performance was worse. Removing all files from the volume didn't improve the performance again. I guess my problem is a little better depending on how you look at it. If I delete the data from the volume, the performance goes back to that of an empty volume. I don't have to delete the .glusterfs entries to regain my performance. I only have to delete the data from the mount point. Interesting. Do you have somewhat accurate stats on how much data (eg # of entries, size of files) was in the data set that did this? Wondering if it's repeatable, so we can replicate the problem and solve. :) + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
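(If the slowdown really tracks the number of files rather than the amount of data, a repeatable dataset of many small files should age a test volume without copying terabytes; the sketch below is illustrative and the counts are arbitrary.)

#... create ~1 million tiny files on the FUSE mount to age the volume by entry count
cd /gluster-mount
for d in $(seq -w 0 999); do
    mkdir -p agefiles/$d
    for f in $(seq -w 0 999); do echo x > agefiles/$d/$f; done
done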
Re: [Gluster-devel] missing files
Don't think it is the underlying file system. /data/brickxx is the underlying xfs. Performance to this is fine. When I created a volume it just puts the data in /data/brick/test2. The underlying filesystem shouldn't know/care that it is in a new directory. Also, if I create a /data/brick/test2 volume and put data on it, it gets slow in gluster. But, writing to /data/brick is still fine. And, after test2 gets slow, I can create a /data/test3 volume that is empty and its speed is fine. My knowledge is admittedly very limited here, but I don't see how it could be the underlying filesystem if the slowdown only occurs on the gluster mount and not on the underlying xfs filesystem. David (Sent from mobile) === David F. Robinson, Ph.D. President - Corvid Technologies 704.799.6944 x101 [office] 704.252.1310 [cell] 704.799.7974 [fax] david.robin...@corvidtec.com http://www.corvidtechnologies.com On Feb 11, 2015, at 12:18 AM, Justin Clift jus...@gluster.org wrote: On 11 Feb 2015, at 03:06, Shyam srang...@redhat.com wrote: snip 2) We ran an strace of tar and also collected io-stats outputs from these volumes, both show that create and mkdir is slower on slow as compared to the fast volume. This seems to be the overall reason for slowness Any idea's on why the create and mkdir is slower? Wondering if it's a case of underlying filesystem parameters (for the bricks) + maybe physical storage structure having become badly optimised over time. eg if its on spinning rust, not ssd, and sector placement is now bad Any idea if there are tools that can analyse this kind of thing? eg meta data placement / fragmentation / on a drive for XFS/ext4 + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
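(One way to back the "it is not the underlying filesystem" argument with numbers is to time the identical workload on the raw XFS that backs a brick and on the FUSE mount of the aged volume. The scratch directory below sits on the brick filesystem but outside any brick directory, and the paths follow the naming used earlier in the thread, so treat them as illustrative.)

#... same tar directly on the XFS filesystem backing the bricks (outside gluster)
mkdir -p /data/brick01bkp/xfs_scratch && cd /data/brick01bkp/xfs_scratch
time tar -xPf /root/boost_1_57_0.tar ; time rm -rf boost_1_57_0

#... same tar through the FUSE mount of the slow volume
cd /test3brick
time tar -xPf /root/boost_1_57_0.tar ; time rm -rf boost_1_57_0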
Re: [Gluster-devel] missing files
Some time ago I had a similar performance problem (with 3.4 if I remember correctly): a just created volume started to work fine, but after some time using it performance was worse. Removing all files from the volume didn't improve the performance again. The only way I had to recover a performance similar to the initial one without recreating the volume was to remove all volume contents and also delete all 256 .glusterfs/xx/ directories from all bricks. The backend filesystem was XFS. Could you try if this is the same case ? Xavi On 02/11/2015 12:22 PM, David F. Robinson wrote: Don't think it is the underlying file system. /data/brickxx is the underlying xfs. Performance to this is fine. When I created a volume it just puts the data in /data/brick/test2. The underlying filesystem shouldn't know/care that it is in a new directory. Also, if I create a /data/brick/test2 volume and put data on it, it gets slow in gluster. But, writing to /data/brick is still fine. And, after test2 gets slow, I can create a /data/test3 volume that is empty and its speed is fine. My knowledge is admittedly very limited here, but I don't see how it could be the underlying filesystem if the slowdown only occurs on the gluster mount and not on the underlying xfs filesystem. David (Sent from mobile) === David F. Robinson, Ph.D. President - Corvid Technologies 704.799.6944 x101 [office] 704.252.1310 [cell] 704.799.7974 [fax] david.robin...@corvidtec.com http://www.corvidtechnologies.com On Feb 11, 2015, at 12:18 AM, Justin Clift jus...@gluster.org wrote: On 11 Feb 2015, at 03:06, Shyam srang...@redhat.com wrote: snip 2) We ran an strace of tar and also collected io-stats outputs from these volumes, both show that create and mkdir is slower on slow as compared to the fast volume. This seems to be the overall reason for slowness Any idea's on why the create and mkdir is slower? Wondering if it's a case of underlying filesystem parameters (for the bricks) + maybe physical storage structure having become badly optimised over time. eg if its on spinning rust, not ssd, and sector placement is now bad Any idea if there are tools that can analyse this kind of thing? eg meta data placement / fragmentation / on a drive for XFS/ext4 + Justin -- GlusterFS - http://www.gluster.org An open source, distributed file system scaling to several petabytes, and handling thousands of clients. My personal twitter: twitter.com/realjustinclift ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
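(What Xavi describes -- emptying the volume and also removing the 256 hashed .glusterfs/xx directories on every brick -- would look roughly like the following. It is destructive, so it only makes sense on a disposable test volume with the volume stopped; the brick path is illustrative.)

#... how many gfid entries the .glusterfs tree is carrying on a brick
find /data/brick01bkp/test4brick/.glusterfs -type f | wc -l

#... on every brick of a STOPPED throw-away volume only: wipe the data and the 00..ff gfid directories
rm -rf /data/brick01bkp/test4brick/*
rm -rf /data/brick01bkp/test4brick/.glusterfs/[0-9a-f][0-9a-f]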
Re: [Gluster-devel] missing files
:15 FEASABILITY STUDY.docx -rwxrw 2 streadway sbir 3826704 Jan 21 14:57 FEASABILITY STUDY.one /data/brick02b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References: total 0 drwxrws--- 2 root root 10 Feb 4 18:12 . drwxrws--x 6 root root 95 Feb 4 18:12 .. [root@gfs02a ~]# ls -alR /data/brick0*/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References /data/brick01a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References: total 0 drwxrws--- 3 root root 41 Feb 4 18:12 . drwxrws--x 7 root root 118 Feb 4 18:12 .. drwxrws--- 2 streadway sbir 80 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR /data/brick01a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR: total 72 drwxrws--- 2 streadway sbir 80 Jan 23 14:46 . drwxrws--- 3 root root 41 Feb 4 18:12 .. -rwxrw 2 streadway sbir 17248 Jun 19 2014 COMPARISON OF SOLUTIONS.one -rwxrw 2 streadway sbir 49736 Jan 21 13:18 GIVEN TRADE SPACE.one /data/brick02a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References: total 0 drwxrws--- 3 root root 41 Feb 4 18:12 . drwxrws--x 7 root root 118 Feb 4 18:12 .. drwxrws--- 2 streadway sbir 79 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR /data/brick02a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR: total 84 drwxrws--- 2 streadway sbir 79 Jan 23 14:46 . drwxrws--- 3 root root 41 Feb 4 18:12 .. -rwxrw 2 streadway sbir 42440 Jun 19 2014 ARMOR PACKAGES.one -rwxrw 2 streadway sbir 38184 Jun 19 2014 CURRENT STANDARD ARMORING.one [root@gfs02b ~]# ls -alR /data/brick0*/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References /data/brick01b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References: total 0 drwxrws--- 3 root root 41 Feb 4 18:12 . drwxrws--x 7 root root 118 Feb 4 18:12 .. drwxrws--- 2 streadway sbir 80 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR /data/brick01b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR: total 72 drwxrws--- 2 streadway sbir 80 Jan 23 14:46 . drwxrws--- 3 root root 41 Feb 4 18:12 .. -rwxrw 2 streadway sbir 17248 Jun 19 2014 COMPARISON OF SOLUTIONS.one -rwxrw 2 streadway sbir 49736 Jan 21 13:18 GIVEN TRADE SPACE.one /data/brick02b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References: total 0 drwxrws--- 3 root root 41 Feb 4 18:12 . drwxrws--x 7 root root 118 Feb 4 18:12 .. drwxrws--- 2 streadway sbir 79 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR /data/brick02b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR: total 84 drwxrws--- 2 streadway sbir 79 Jan 23 14:46 . drwxrws--- 3 root root 41 Feb 4 18:12 .. -rwxrw 2 streadway sbir 42440 Jun 19 2014 ARMOR PACKAGES.one -rwxrw 2 streadway sbir 38184 Jun 19 2014 CURRENT STANDARD ARMORING.one -- Original Message -- From: Xavier Hernandez xhernan...@datalab.es To: David F. Robinson david.robin...@corvidtec.com; Benjamin Turner bennytu...@gmail.com; Pranith Kumar Karampuri pkara...@redhat.com Cc: gluster-us...@gluster.org gluster-us...@gluster.org; Gluster Devel gluster-devel@gluster.org Sent: 2/5/2015 5:14:22 AM Subject: Re: [Gluster-devel] missing files Is the failure repeatable ? with the same directories ? 
It's very weird that the directories appear on the volume when you do an 'ls' on the bricks. Could it be that you only made a single 'ls' on fuse mount which not showed the directory ? Is it possible that this 'ls' triggered a self-heal that repaired the problem, whatever it was, and when you did another 'ls' on the fuse mount after the 'ls' on the bricks, the directories were there ? The first 'ls' could have healed the files, causing that the following 'ls' on the bricks showed the files as if nothing were damaged. If that's the case, it's possible that there were some disconnections during the copy. Added Pranith because he knows better replication and self-heal details. Xavi On 02/04/2015 07:23 PM, David F. Robinson wrote: Distributed/replicated Volume Name: homegfs Type: Distributed-Replicate Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071 Status: Started Number of Bricks: 4 x 2 = 8 Transport-type: tcp Bricks: Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs Options Reconfigured: performance.io
Re: [Gluster-devel] missing files
/Phase_1_SOCOM14-003_adv_armor/References: total 0 drwxrws--- 2 root root 10 Feb 4 18:12 . drwxrws--x 6 root root 95 Feb 4 18:12 .. [root@gfs02a ~]# ls -alR /data/brick0*/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References /data/brick01a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References: total 0 drwxrws--- 3 root root 41 Feb 4 18:12 . drwxrws--x 7 root root 118 Feb 4 18:12 .. drwxrws--- 2 streadway sbir 80 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR /data/brick01a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR: total 72 drwxrws--- 2 streadway sbir 80 Jan 23 14:46 . drwxrws--- 3 root root 41 Feb 4 18:12 .. -rwxrw 2 streadway sbir 17248 Jun 19 2014 COMPARISON OF SOLUTIONS.one -rwxrw 2 streadway sbir 49736 Jan 21 13:18 GIVEN TRADE SPACE.one /data/brick02a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References: total 0 drwxrws--- 3 root root 41 Feb 4 18:12 . drwxrws--x 7 root root 118 Feb 4 18:12 .. drwxrws--- 2 streadway sbir 79 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR /data/brick02a/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR: total 84 drwxrws--- 2 streadway sbir 79 Jan 23 14:46 . drwxrws--- 3 root root 41 Feb 4 18:12 .. -rwxrw 2 streadway sbir 42440 Jun 19 2014 ARMOR PACKAGES.one -rwxrw 2 streadway sbir 38184 Jun 19 2014 CURRENT STANDARD ARMORING.one [root@gfs02b ~]# ls -alR /data/brick0*/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References /data/brick01b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References: total 0 drwxrws--- 3 root root 41 Feb 4 18:12 . drwxrws--x 7 root root 118 Feb 4 18:12 .. drwxrws--- 2 streadway sbir 80 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR /data/brick01b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR: total 72 drwxrws--- 2 streadway sbir 80 Jan 23 14:46 . drwxrws--- 3 root root 41 Feb 4 18:12 .. -rwxrw 2 streadway sbir 17248 Jun 19 2014 COMPARISON OF SOLUTIONS.one -rwxrw 2 streadway sbir 49736 Jan 21 13:18 GIVEN TRADE SPACE.one /data/brick02b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References: total 0 drwxrws--- 3 root root 41 Feb 4 18:12 . drwxrws--x 7 root root 118 Feb 4 18:12 .. drwxrws--- 2 streadway sbir 79 Jan 23 14:46 USSOCOM_OPAQUE_ARMOR /data/brick02b/homegfs/documentation/programs/OLD_PROGRAMS/SBIR_TOM/Phase_1_SOCOM14-003_adv_armor/References/USSOCOM_OPAQUE_ARMOR: total 84 drwxrws--- 2 streadway sbir 79 Jan 23 14:46 . drwxrws--- 3 root root 41 Feb 4 18:12 .. -rwxrw 2 streadway sbir 42440 Jun 19 2014 ARMOR PACKAGES.one -rwxrw 2 streadway sbir 38184 Jun 19 2014 CURRENT STANDARD ARMORING.one -- Original Message -- From: Xavier Hernandez xhernan...@datalab.es To: David F. Robinson david.robin...@corvidtec.com; Benjamin Turner bennytu...@gmail.com; Pranith Kumar Karampuri pkara...@redhat.com Cc: gluster-us...@gluster.org gluster-us...@gluster.org; Gluster Devel gluster-devel@gluster.org Sent: 2/5/2015 5:14:22 AM Subject: Re: [Gluster-devel] missing files Is the failure repeatable ? with the same directories ? It's very weird that the directories appear on the volume when you do an 'ls' on the bricks. Could it be that you only made a single 'ls' on fuse mount which not showed the directory ? 
Is it possible that this 'ls' triggered a self-heal that repaired the problem, whatever it was, and when you did another 'ls' on the fuse mount after the 'ls' on the bricks, the directories were there ? The first 'ls' could have healed the files, causing that the following 'ls' on the bricks showed the files as if nothing were damaged. If that's the case, it's possible that there were some disconnections during the copy. Added Pranith because he knows better replication and self-heal details. Xavi On 02/04/2015 07:23 PM, David F. Robinson wrote: Distributed/replicated Volume Name: homegfs Type: Distributed-Replicate Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071 Status: Started Number of Bricks: 4 x 2 = 8 Transport-type: tcp Bricks: Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs Options Reconfigured: performance.io-thread-count: 32 performance.cache-size: 128MB performance.write-behind-window-size: 128MB server.allow-insecure: on network.ping-timeout: 10 storage.owner-gid: 100 geo-replication.indexing: off geo
Re: [Gluster-devel] missing files
Is the failure repeatable ? with the same directories ? It's very weird that the directories appear on the volume when you do an 'ls' on the bricks. Could it be that you only made a single 'ls' on fuse mount which not showed the directory ? Is it possible that this 'ls' triggered a self-heal that repaired the problem, whatever it was, and when you did another 'ls' on the fuse mount after the 'ls' on the bricks, the directories were there ? The first 'ls' could have healed the files, causing that the following 'ls' on the bricks showed the files as if nothing were damaged. If that's the case, it's possible that there were some disconnections during the copy. Added Pranith because he knows better replication and self-heal details. Xavi On 02/04/2015 07:23 PM, David F. Robinson wrote: Distributed/replicated Volume Name: homegfs Type: Distributed-Replicate Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071 Status: Started Number of Bricks: 4 x 2 = 8 Transport-type: tcp Bricks: Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs Options Reconfigured: performance.io-thread-count: 32 performance.cache-size: 128MB performance.write-behind-window-size: 128MB server.allow-insecure: on network.ping-timeout: 10 storage.owner-gid: 100 geo-replication.indexing: off geo-replication.ignore-pid-check: on changelog.changelog: on changelog.fsync-interval: 3 changelog.rollover-time: 15 server.manage-gids: on -- Original Message -- From: Xavier Hernandez xhernan...@datalab.es To: David F. Robinson david.robin...@corvidtec.com; Benjamin Turner bennytu...@gmail.com Cc: gluster-us...@gluster.org gluster-us...@gluster.org; Gluster Devel gluster-devel@gluster.org Sent: 2/4/2015 6:03:45 AM Subject: Re: [Gluster-devel] missing files On 02/04/2015 01:30 AM, David F. Robinson wrote: Sorry. Thought about this a little more. I should have been clearer. The files were on both bricks of the replica, not just one side. So, both bricks had to have been up... The files/directories just don't show up on the mount. I was reading and saw a related bug (https://bugzilla.redhat.com/show_bug.cgi?id=1159484). I saw it suggested to run: find mount -d -exec getfattr -h -n trusted.ec.heal {} \; This command is specific for a dispersed volume. It won't do anything (aside from the error you are seeing) on a replicated volume. I think you are using a replicated volume, right ? In this case I'm not sure what can be happening. Is your volume a pure replicated one or a distributed-replicated ? on a pure replicated it doesn't make sense that some entries do not show in an 'ls' when the file is in both replicas (at least without any error message in the logs). On a distributed-replicated it could be caused by some problem while combining contents of each replica set. What's the configuration of your volume ? Xavi I get a bunch of errors for operation not supported: [root@gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n trusted.ec.heal {} \; find: warning: the -d option is deprecated; please use -depth instead, because the latter is a POSIX-compliant feature. 
wks_backup/homer_backup/backup: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs: trusted.ec.heal: Operation not supported wks_backup/homer_backup: trusted.ec.heal: Operation not supported -- Original Message -- From: Benjamin Turner bennytu...@gmail.com mailto:bennytu...@gmail.com To: David F. Robinson david.robin...@corvidtec.com mailto:david.robin...@corvidtec.com Cc: Gluster Devel gluster-devel@gluster.org mailto:gluster-devel@gluster.org; gluster-us...@gluster.org gluster-us...@gluster.org mailto:gluster-us...@gluster.org Sent: 2/3/2015 7:12:34 PM Subject: Re: [Gluster-devel] missing files It sounds to me like the files were only copied to one replica, werent there for the initial for the initial ls which triggered a self heal, and were there for the last ls because they were healed. Is there any chance that one of the replicas was down during the rsync? It could be that you lost a brick during copy or something like that. To confirm I would look for disconnects in the brick logs as well as checking glusterfshd.log
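(Since homegfs is distributed-replicate rather than dispersed, the trusted.ec.heal attribute does not apply; the replicate-side way to see whether anything is still pending self-heal is the heal info command, using the volume name from the configuration above.)

#... entries still pending self-heal on each brick of the replicated volume
gluster volume heal homegfs info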
Re: [Gluster-devel] missing files
Distributed/replicated Volume Name: homegfs Type: Distributed-Replicate Volume ID: 1e32672a-f1b7-4b58-ba94-58c085e59071 Status: Started Number of Bricks: 4 x 2 = 8 Transport-type: tcp Bricks: Brick1: gfsib01a.corvidtec.com:/data/brick01a/homegfs Brick2: gfsib01b.corvidtec.com:/data/brick01b/homegfs Brick3: gfsib01a.corvidtec.com:/data/brick02a/homegfs Brick4: gfsib01b.corvidtec.com:/data/brick02b/homegfs Brick5: gfsib02a.corvidtec.com:/data/brick01a/homegfs Brick6: gfsib02b.corvidtec.com:/data/brick01b/homegfs Brick7: gfsib02a.corvidtec.com:/data/brick02a/homegfs Brick8: gfsib02b.corvidtec.com:/data/brick02b/homegfs Options Reconfigured: performance.io-thread-count: 32 performance.cache-size: 128MB performance.write-behind-window-size: 128MB server.allow-insecure: on network.ping-timeout: 10 storage.owner-gid: 100 geo-replication.indexing: off geo-replication.ignore-pid-check: on changelog.changelog: on changelog.fsync-interval: 3 changelog.rollover-time: 15 server.manage-gids: on -- Original Message -- From: Xavier Hernandez xhernan...@datalab.es To: David F. Robinson david.robin...@corvidtec.com; Benjamin Turner bennytu...@gmail.com Cc: gluster-us...@gluster.org gluster-us...@gluster.org; Gluster Devel gluster-devel@gluster.org Sent: 2/4/2015 6:03:45 AM Subject: Re: [Gluster-devel] missing files On 02/04/2015 01:30 AM, David F. Robinson wrote: Sorry. Thought about this a little more. I should have been clearer. The files were on both bricks of the replica, not just one side. So, both bricks had to have been up... The files/directories just don't show up on the mount. I was reading and saw a related bug (https://bugzilla.redhat.com/show_bug.cgi?id=1159484). I saw it suggested to run: find mount -d -exec getfattr -h -n trusted.ec.heal {} \; This command is specific for a dispersed volume. It won't do anything (aside from the error you are seeing) on a replicated volume. I think you are using a replicated volume, right ? In this case I'm not sure what can be happening. Is your volume a pure replicated one or a distributed-replicated ? on a pure replicated it doesn't make sense that some entries do not show in an 'ls' when the file is in both replicas (at least without any error message in the logs). On a distributed-replicated it could be caused by some problem while combining contents of each replica set. What's the configuration of your volume ? Xavi I get a bunch of errors for operation not supported: [root@gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n trusted.ec.heal {} \; find: warning: the -d option is deprecated; please use -depth instead, because the latter is a POSIX-compliant feature. wks_backup/homer_backup/backup: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs: trusted.ec.heal: Operation not supported wks_backup/homer_backup: trusted.ec.heal: Operation not supported -- Original Message -- From: Benjamin Turner bennytu...@gmail.com mailto:bennytu...@gmail.com To: David F. 
Robinson david.robin...@corvidtec.com mailto:david.robin...@corvidtec.com Cc: Gluster Devel gluster-devel@gluster.org mailto:gluster-devel@gluster.org; gluster-us...@gluster.org gluster-us...@gluster.org mailto:gluster-us...@gluster.org Sent: 2/3/2015 7:12:34 PM Subject: Re: [Gluster-devel] missing files It sounds to me like the files were only copied to one replica, werent there for the initial for the initial ls which triggered a self heal, and were there for the last ls because they were healed. Is there any chance that one of the replicas was down during the rsync? It could be that you lost a brick during copy or something like that. To confirm I would look for disconnects in the brick logs as well as checking glusterfshd.log to verify the missing files were actually healed. -b On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson david.robin...@corvidtec.com mailto:david.robin...@corvidtec.com wrote: I rsync'd 20-TB over to my gluster system and noticed that I had some directories missing even though the rsync completed normally. The rsync logs showed that the missing files were transferred. I went to the bricks and did an 'ls -al /data/brick*/homegfs/dir/*' the files were on the bricks. After I did this 'ls', the files then showed up on the FUSE mounts. 1) Why are the files hidden on the fuse mount? 2) Why does the ls make them show up on the FUSE mount? 3) How can I prevent this from happening again? Note, I also mounted the gluster volume using NFS and saw
Re: [Gluster-devel] missing files
On 02/04/2015 01:30 AM, David F. Robinson wrote: Sorry. Thought about this a little more. I should have been clearer. The files were on both bricks of the replica, not just one side. So, both bricks had to have been up... The files/directories just don't show up on the mount. I was reading and saw a related bug (https://bugzilla.redhat.com/show_bug.cgi?id=1159484). I saw it suggested to run: find mount -d -exec getfattr -h -n trusted.ec.heal {} \; This command is specific for a dispersed volume. It won't do anything (aside from the error you are seeing) on a replicated volume. I think you are using a replicated volume, right ? In this case I'm not sure what can be happening. Is your volume a pure replicated one or a distributed-replicated ? on a pure replicated it doesn't make sense that some entries do not show in an 'ls' when the file is in both replicas (at least without any error message in the logs). On a distributed-replicated it could be caused by some problem while combining contents of each replica set. What's the configuration of your volume ? Xavi I get a bunch of errors for operation not supported: [root@gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n trusted.ec.heal {} \; find: warning: the -d option is deprecated; please use -depth instead, because the latter is a POSIX-compliant feature. wks_backup/homer_backup/backup: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs: trusted.ec.heal: Operation not supported wks_backup/homer_backup: trusted.ec.heal: Operation not supported -- Original Message -- From: Benjamin Turner bennytu...@gmail.com mailto:bennytu...@gmail.com To: David F. Robinson david.robin...@corvidtec.com mailto:david.robin...@corvidtec.com Cc: Gluster Devel gluster-devel@gluster.org mailto:gluster-devel@gluster.org; gluster-us...@gluster.org gluster-us...@gluster.org mailto:gluster-us...@gluster.org Sent: 2/3/2015 7:12:34 PM Subject: Re: [Gluster-devel] missing files It sounds to me like the files were only copied to one replica, werent there for the initial for the initial ls which triggered a self heal, and were there for the last ls because they were healed. Is there any chance that one of the replicas was down during the rsync? It could be that you lost a brick during copy or something like that. To confirm I would look for disconnects in the brick logs as well as checking glusterfshd.log to verify the missing files were actually healed. -b On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson david.robin...@corvidtec.com mailto:david.robin...@corvidtec.com wrote: I rsync'd 20-TB over to my gluster system and noticed that I had some directories missing even though the rsync completed normally. The rsync logs showed that the missing files were transferred. I went to the bricks and did an 'ls -al /data/brick*/homegfs/dir/*' the files were on the bricks. After I did this 'ls', the files then showed up on the FUSE mounts. 1) Why are the files hidden on the fuse mount? 2) Why does the ls make them show up on the FUSE mount? 3) How can I prevent this from happening again? 
Note, I also mounted the gluster volume using NFS and saw the same behavior. The files/directories were not shown until I did the ls on the bricks. David === David F. Robinson, Ph.D. President - Corvid Technologies 704.799.6944 x101 tel:704.799.6944%20x101 [office] 704.252.1310 tel:704.252.1310 [cell] 704.799.7974 tel:704.799.7974 [fax] david.robin...@corvidtec.com mailto:david.robin...@corvidtec.com http://www.corvidtechnologies.com http://www.corvidtechnologies.com/ ___ Gluster-devel mailing list Gluster-devel@gluster.org mailto:Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
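(Rather than running 'ls' on the bricks, the same effect can be had from the client side: walking the FUSE mount makes the client issue a lookup on every entry, and it is the lookup that triggers self-heal on replicate volumes. The mount point below is illustrative.)

#... stat every entry through the FUSE mount to force lookups (and hence self-heal) on the whole tree
find /mnt/homegfs -noleaf -print0 | xargs -0 stat > /dev/null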
[Gluster-devel] missing files
I rsync'd 20-TB over to my gluster system and noticed that I had some directories missing even though the rsync completed normally. The rsync logs showed that the missing files were transferred. I went to the bricks and did an 'ls -al /data/brick*/homegfs/dir/*' the files were on the bricks. After I did this 'ls', the files then showed up on the FUSE mounts. 1) Why are the files hidden on the fuse mount? 2) Why does the ls make them show up on the FUSE mount? 3) How can I prevent this from happening again? Note, I also mounted the gluster volume using NFS and saw the same behavior. The files/directories were not shown until I did the ls on the bricks. David === David F. Robinson, Ph.D. President - Corvid Technologies 704.799.6944 x101 [office] 704.252.1310 [cell] 704.799.7974 [fax] david.robin...@corvidtec.com http://www.corvidtechnologies.com ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] missing files
Sorry. Thought about this a little more. I should have been clearer. The files were on both bricks of the replica, not just one side. So, both bricks had to have been up... The files/directories just don't show up on the mount. I was reading and saw a related bug (https://bugzilla.redhat.com/show_bug.cgi?id=1159484). I saw it suggested to run: find mount -d -exec getfattr -h -n trusted.ec.heal {} \; I get a bunch of errors for operation not supported: [root@gfs02a homegfs]# find wks_backup -d -exec getfattr -h -n trusted.ec.heal {} \; find: warning: the -d option is deprecated; please use -depth instead, because the latter is a POSIX-compliant feature. wks_backup/homer_backup/backup: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_20.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_21.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_18.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_19.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs/2014_05_22.log: trusted.ec.heal: Operation not supported wks_backup/homer_backup/logs: trusted.ec.heal: Operation not supported wks_backup/homer_backup: trusted.ec.heal: Operation not supported -- Original Message -- From: Benjamin Turner bennytu...@gmail.com To: David F. Robinson david.robin...@corvidtec.com Cc: Gluster Devel gluster-devel@gluster.org; gluster-us...@gluster.org gluster-us...@gluster.org Sent: 2/3/2015 7:12:34 PM Subject: Re: [Gluster-devel] missing files It sounds to me like the files were only copied to one replica, werent there for the initial for the initial ls which triggered a self heal, and were there for the last ls because they were healed. Is there any chance that one of the replicas was down during the rsync? It could be that you lost a brick during copy or something like that. To confirm I would look for disconnects in the brick logs as well as checking glusterfshd.log to verify the missing files were actually healed. -b On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson david.robin...@corvidtec.com wrote: I rsync'd 20-TB over to my gluster system and noticed that I had some directories missing even though the rsync completed normally. The rsync logs showed that the missing files were transferred. I went to the bricks and did an 'ls -al /data/brick*/homegfs/dir/*' the files were on the bricks. After I did this 'ls', the files then showed up on the FUSE mounts. 1) Why are the files hidden on the fuse mount? 2) Why does the ls make them show up on the FUSE mount? 3) How can I prevent this from happening again? Note, I also mounted the gluster volume using NFS and saw the same behavior. The files/directories were not shown until I did the ls on the bricks. David === David F. Robinson, Ph.D. President - Corvid Technologies 704.799.6944 x101 [office] 704.252.1310 [cell] 704.799.7974 [fax] david.robin...@corvidtec.com http://www.corvidtechnologies.com ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel ___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
Re: [Gluster-devel] missing files
It sounds to me like the files were only copied to one replica, weren't there for the initial ls which triggered a self-heal, and were there for the last ls because they were healed. Is there any chance that one of the replicas was down during the rsync? It could be that you lost a brick during the copy or something like that. To confirm, I would look for disconnects in the brick logs as well as checking glustershd.log to verify the missing files were actually healed.

-b

___ Gluster-devel mailing list Gluster-devel@gluster.org http://www.gluster.org/mailman/listinfo/gluster-devel
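A minimal sketch of those two checks, assuming the default log locations on the servers and that the self-heal daemon log is the usual glustershd.log (the grep patterns are only illustrative):

    # Look for client disconnects in the brick logs around the time of the rsync:
    grep "disconnecting connection" /var/log/glusterfs/bricks/*.log

    # See whether the self-heal daemon logged heals of the missing entries:
    grep -i "selfheal" /var/log/glusterfs/glustershd.log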
Re: [Gluster-devel] missing files
Like these?

data-brick02a-homegfs.log:[2015-02-03 19:09:34.568842] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs02a.corvidtec.com-18563-2015/02/03-19:07:58:519134-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 19:09:41.286551] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-12804-2015/02/03-19:09:38:497808-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 19:16:35.906412] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs02b.corvidtec.com-27190-2015/02/03-19:15:53:458467-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 19:51:22.761293] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-25926-2015/02/03-19:51:02:89070-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 20:54:02.772180] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01b.corvidtec.com-4175-2015/02/02-16:44:31:179119-homegfs-client-2-0-1
data-brick02a-homegfs.log:[2015-02-03 22:44:47.458905] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-29467-2015/02/03-22:44:05:838129-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 22:47:42.830866] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-30069-2015/02/03-22:47:37:209436-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 22:48:26.785931] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-30256-2015/02/03-22:47:55:203659-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 22:53:25.530836] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-30658-2015/02/03-22:53:21:627538-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 22:56:14.033823] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-30893-2015/02/03-22:56:01:450507-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 22:56:55.622800] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-31080-2015/02/03-22:56:32:665370-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 22:59:11.445742] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-31383-2015/02/03-22:58:45:190874-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 23:06:26.482709] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-31720-2015/02/03-23:06:11:340012-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 23:10:54.807725] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-32083-2015/02/03-23:10:22:131678-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 23:13:35.545513] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-32284-2015/02/03-23:13:21:26552-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-03 23:14:19.065271] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-32471-2015/02/03-23:13:48:221126-homegfs-client-2-0-0
data-brick02a-homegfs.log:[2015-02-04 00:18:20.261428] I [server.c:518:server_rpc_notify] 0-homegfs-server: disconnecting connection from gfs01a.corvidtec.com-1369-2015/02/04-00:16:53:613570-homegfs-client-2-0-0

-- Original Message --
From: Benjamin Turner bennytu...@gmail.com
To: David F. Robinson david.robin...@corvidtec.com
Cc: Gluster Devel gluster-devel@gluster.org; gluster-us...@gluster.org gluster-us...@gluster.org
Sent: 2/3/2015 7:12:34 PM
Subject: Re: [Gluster-devel] missing files

It sounds to me like the files were only copied to one replica, weren't there for the initial ls which triggered a self-heal, and were there for the last ls because they were healed. Is there any chance that one of the replicas was down during the rsync? It could be that you lost a brick during the copy or something like that. To confirm, I would look for disconnects in the brick logs as well as checking glustershd.log to verify the missing files were actually healed.

-b

On Tue, Feb 3, 2015 at 5:37 PM, David F. Robinson david.robin...@corvidtec.com wrote: I rsync'd 20-TB over to my gluster system and noticed that I had some directories missing even though the rsync completed normally. The rsync logs showed that the missing files were transferred. I went to the bricks and did an 'ls -al /data/brick*/homegfs/dir/*' the files were on the bricks. After I did this 'ls', the files then showed up on the FUSE mounts. 1) Why are the files hidden on the fuse mount? 2) Why does the ls