Romain,

Thanks a lot for that pointer; it was surprisingly difficult to find anything about this via the usual web searches. After a little scripting and a bit of coordinated effort, we're looking much healthier.
We noticed that the worst-offending directories appeared to have multiple "nested" rsync temp files left behind, i.e., the replicator on a peer host was trying to copy existing rsync partial files (e.g. ./objects/27913/91c/1b4263d0e93452b461572bbf57f9591c/.1467136978.92866.data.1RdQku.bCyhfi.6JlGol), thus quickly compounding the problem!

Cheers,

On 20 July 2016 at 20:25, Romain LE DISEZ <[email protected]> wrote:
> Hi Blair,
>
> They are temporary files from rsync, left behind when the replicator tried
> to replicate a partition and failed for some reason.
>
> You can safely delete them as long as their mtime is a bit old (do not delete
> a file currently being replicated). Since 2.7, Swift takes care of that:
> https://github.com/openstack/swift/blob/master/CHANGELOG#L226
>
> On Wednesday 20 July 2016 10:17 CEST, Blair Bethwaite
> <[email protected]> wrote:
>
>> Hi all,
>>
>> As per the subject, wondering where these files come from, e.g.:
>>
>> root@stor010:/srv/node/sdc1/objects# ls -la ./109794/359/6b389b24749b7046344ffd2a42aab359
>> total 1195784
>> drwxr-xr-x 2 swift swift      4096 Jun  8 04:11 .
>> drwxr-xr-x 3 swift swift        53 May 22 05:05 ..
>> -rw------- 1 swift swift 204800000 Jun  8 04:11 1463857426.65100.data
>> -rw------- 1 swift swift   6225920 Jun  3 00:42 .1463857426.65100.data.aCtGLk
>> -rw------- 1 swift swift 197754880 Jun  2 11:49 .1463857426.65100.data.AMQhPo
>> -rw------- 1 swift swift  33980416 Jun  3 00:41 .1463857426.65100.data.CkpDSv
>> -rw------- 1 swift swift   7634944 Jun  3 04:02 .1463857426.65100.data.CkpDSv.CtrQws
>> -rw------- 1 swift swift 189399040 Jun  1 18:42 .1463857426.65100.data.CRFb2k
>> -rw------- 1 swift swift  47644672 Jun  2 11:51 .1463857426.65100.data.dKsUZI
>> -rw------- 1 swift swift 157122560 Jun  3 13:57 .1463857426.65100.data.GpmbOK
>> -rw------- 1 swift swift 174489600 Jun  2 11:50 .1463857426.65100.data.MAoI3y
>> -rw------- 1 swift swift 174358528 Jun  3 00:42 .1463857426.65100.data.Pbsk7S
>> -rw------- 1 swift swift  31064064 Jun  1 18:42 .1463857426.65100.data.xlmmie
>>
>> We have a geo-replicated cluster that is currently suffering from
>> major outliers in disk usage, i.e.:
>>
>> [2016-07-20 18:08:33] Checking disk usage now
>> Distribution Graph:
>>   0%    2 *
>>  32%    1
>>  34%   19 **********
>>  35%   49 **************************
>>  36%  127 *********************************************************************
>>  37%  111 ************************************************************
>>  38%   40 *********************
>>  39%   12 ******
>>  40%    6 ***
>>  41%    6 ***
>>  42%    2 *
>>  43%    4 **
>>  44%    3 *
>>  45%    1
>>  46%    3 *
>>  47%    4 **
>>  48%    3 *
>>  50%    1
>>  51%    2 *
>>  52%    1
>>  53%    1
>>  54%    1
>>  56%    3 *
>>  58%    1
>>  62%    2 *
>>  63%    1
>>  71%    1
>>  73%    1
>>  75%    1
>>  76%    1
>>  78%    2 *
>>  88%    1
>>  92%    2 *
>>  95%    1
>>  96%    1
>> 100%    3 *
>> Disk usage: space used: 395001580875776 of 995614295568384
>> Disk usage: space free: 600612714692608 of 995614295568384
>> Disk usage: lowest: 0.0%, highest: 100.0%, avg: 39.6741572147%
>>
>> It looks like this is attributable to a handful of object directories
>> with lots of data.XXXXXX files in them, whereas >99% of object dirs
>> just have a single .data file, e.g. this from one of the disks at
>> ~60%:
>>
>> root@stor010:/srv/node/sdc1/objects# find . -mindepth 4 -type f -printf "%h\n" | sort | uniq -c | sort -rnk 1 | head -20
>>
>>     733 ./151107/3b5/9390f9c2ceee07f059a0d1f651e423b5
>>      11 ./109794/359/6b389b24749b7046344ffd2a42aab359
>>       9 ./248385/60c/f2907cb0b290def6f614bf46a715a60c
>>       5 ./222791/888/d991c8db1e2f1e724c1a4f52914f7888
>>       4 ./257772/140/fbbb1c017a841e6e821ed707025fe140
>>       4 ./231068/ca6/e1a706f50dd99f97fafeba6bd1f47ca6
>>       4 ./215734/80c/d2adbf087b09ca24cc546497d265180c
>>       3 ./248166/8b9/f259b3b0113b522ec5ef5753588438b9
>>       3 ./221060/383/d7e101fd86eeef65f8c89f3f99ce4383
>>       2 ./38609/203/25b46d47af8ee700e87ceab33748b203
>>       2 ./27961/a78/1b4e5665c16e1da8a70cc093a2bbba78
>>       2 ./158466/d43/9ac0b8bd4e592fe6f3360a731bac7d43
>>       2 ./141275/588/89f6efb554e964aeebffa9dad9e17588
>>       1 ./99980/fad/61a3214454426f7b30fc62773eb3bfad
>>       1 ./99980/fa4/61a311d12248a2e4ca8d2f61a3adafa4
>>       1 ./99980/ed9/61a32b429174f6099b0b35b0103e4ed9
>>       1 ./99980/e76/61a3129de66d7541ce127f07a7737e76
>>       1 ./99980/e3d/61a329332713fdacedb10834901f6e3d
>>       1 ./99980/e1f/61a306b87307636cc569bd0bc047ee1f
>>       1 ./99980/db8/61a30a00a9b7edd3318be25bbed47db8
>>
>> root@stor010:/srv/node/sdc1/objects# du -sh ./151107/3b5/9390f9c2ceee07f059a0d1f651e423b5
>> 957G    ./151107/3b5/9390f9c2ceee07f059a0d1f651e423b5
>>
>> We're on version 2.5.0.7.
>>
>> --
>> Cheers,
>> ~Blairo
>>
>> _______________________________________________
>> OpenStack-operators mailing list
>> [email protected]
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
>
> --
> Romain LE DISEZ
>

--
Cheers,
~Blairo
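For clusters still on a pre-2.7 release (like the 2.5.0.7 above), Romain's advice ("delete them as long as their mtime is a bit old") could be scripted roughly as follows. This is a minimal sketch, not the script used in this thread: the helper names `stale_rsync_tmp` and `clean_rsync_tmp` are hypothetical, the file-name pattern is inferred only from the temp names shown above, and the one-day default for "a bit old" is an assumption you should tune to your replication timings. It relies on `find` supporting `-mindepth`/`-maxdepth`/`-mmin`/`-delete` (GNU findutils, as implied by the `-printf` usage above).

```shell
#!/bin/sh
# Sketch only: stale_rsync_tmp and clean_rsync_tmp are hypothetical helper
# names, not part of Swift or rsync.
#
# Assumptions (not confirmed in the thread):
#   - $1 is a Swift objects directory, e.g. /srv/node/sdc1/objects
#   - object files sit at depth 4: <partition>/<suffix>/<hash>/<file>
#   - rsync temp files start with '.' and end with '.' plus 6 characters,
#     e.g. .1463857426.65100.data.aCtGLk (nested names match as well)
#   - "a bit old" means not modified for more than $2 minutes
#     (default 1440, i.e. one day)

# Print stale rsync temp files, one path per line (dry run).
stale_rsync_tmp() {
    find "$1" -mindepth 4 -maxdepth 4 -type f \
        -name '.*.??????' -mmin +"${2:-1440}" -print
}

# Same selection as stale_rsync_tmp, but print and then delete each match.
clean_rsync_tmp() {
    find "$1" -mindepth 4 -maxdepth 4 -type f \
        -name '.*.??????' -mmin +"${2:-1440}" -print -delete
}
```

For example, `stale_rsync_tmp /srv/node/sdc1/objects | sed 's|/[^/]*$||' | sort | uniq -c | sort -rn | head -20` gives per-directory counts comparable to the `find ... | uniq -c` output above before anything is removed. Given the nested-copy effect described at the top of this thread, it is probably worth running the cleanup on every peer holding those partitions, not just the node where the disk fills up.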
