Re: copying millions of small files and millions of dirs
On Thu, 10 Oct 2013, aurfalien wrote: On Aug 15, 2013, at 11:46 PM, Nicolas KOWALSKI wrote: On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote: Is there a faster way to copy files over NFS? I would use find+cpio. This handles hard links, permissions, and in case of later runs, will not copy files if they already exist on the destination. # cd /source/dir # find . | cpio -pvdm /destination/dir Old thread I know but cpio has proven twice as fast as rsync. Trusty ol cpio. Gonna try cpdup next. Try sysutils/clone, too. ___ freebsd-questions@freebsd.org mailing list http://lists.freebsd.org/mailman/listinfo/freebsd-questions To unsubscribe, send any mail to "freebsd-questions-unsubscr...@freebsd.org"
Re: copying millions of small files and millions of dirs
On Aug 15, 2013, at 11:46 PM, Nicolas KOWALSKI wrote: > On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote: >> Is there a faster way to copy files over NFS? > > I would use find+cpio. This handles hard links, permissions, and in case > of later runs, will not copy files if they already exist on the > destination. > > # cd /source/dir > # find . | cpio -pvdm /destination/dir Old thread I know but cpio has proven twice as fast as rsync. Trusty ol cpio. Gonna try cpdup next. - aurf
Re: copying millions of small files and millions of dirs
On Mon, 19 Aug 2013, Mark Felder wrote: On Fri, Aug 16, 2013, at 1:46, Nicolas KOWALSKI wrote: On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote: Is there a faster way to copy files over NFS? I would use find+cpio. This handles hard links, permissions, and in case of later runs, will not copy files if they already exist on the destination. # cd /source/dir # find . | cpio -pvdm /destination/dir I always found sysutils/cpdup to be faster than rsync. sysutils/clone may do better as well.
Re: copying millions of small files and millions of dirs
On 20/08/2013 08:32, krad wrote: When I migrated a large mailspool in maildir format from the old nfs server to the new one in a previous job, I first generated a list of the top-level maildirs. I then generated the rsync commands plus a few other bits and pieces for each maildir to make a single transaction-like function. I then pumped all these auto-generated scripts into xjobs and ran them in parallel. This vastly sped up the process, as sequentially running the tree was far too slow. This was for about 15 million maildirs in a hashed structure btw, so a fair amount of files. eg

find /maildir -type d -maxdepth 4 | while read d
do
  r=$(($RANDOM*$RANDOM))
  echo rsync -a $d/ /newpath/$d/ > /tmp/scripts/$r
  echo some other stuff >> /tmp/scripts/$r
done

ls /tmp/scripts/ | while read f
do
  echo /tmp/scripts/$f
done | xjobs -j 20

This isn't what I'd have expected, as running operations in parallel on mechanical drives would normally result in superfluous head movements and thus exacerbate the I/O bottleneck. The system must be optimising the requests from 20 parallel jobs better than I thought it would to climb out from that hole far enough to get a net benefit. Do you remember how any other approaches performed?
Re: copying millions of small files and millions of dirs
Whoops, that should have been:

ls /tmp/scripts/ | while read f
do
  echo sh /tmp/scripts/$f
done | xjobs -j 20

On 20 August 2013 08:32, krad wrote: > When i migrated a large mailspool in maildir format from the old nfs > server to the new one in a previous job, I 1st generated a list of the top > level maildirs. I then generated the rsync commands + plus a few other bits > and pieces for each maildir to make a single transaction like function. I > then pumped all this auto generated scripts into xjobs and ran them in > parallel. This vastly speeded up the process as sequentially running the > tree was far to slow. THis was for about 15 million maildirs in a hashed > structure btw so a fair amount of files. > > > eg > > find /maildir -type d -maxdepth 4 | while read d > do > r=$(($RANDOM*$RANDOM)) > echo rsync -a $d/ /newpath/$d/ > /tmp/scripts/$r > echo some other stuff >> /tmp/scripts/$r > done > > ls /tmp/scripts/| while read f > echo /tmp/scripts/$f > done | xjobs -j 20 > > On 19 August 2013 18:52, aurfalien wrote: > >> >> On Aug 19, 2013, at 10:41 AM, Mark Felder wrote: >> >> > On Fri, Aug 16, 2013, at 1:46, Nicolas KOWALSKI wrote: >> >> On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote: >> >>> Is there a faster way to copy files over NFS? >> >> >> >> I would use find+cpio. This handles hard links, permissions, and in >> case >> >> of later runs, will not copy files if they already exist on the >> >> destination. >> >> >> >> # cd /source/dir >> >> # find . | cpio -pvdm /destination/dir >> >> >> > >> > I always found sysutils/cpdup to be faster than rsync. >> >> Ah, bookmarking this one. >> >> Many thanks.
>> >> - aurf
Re: copying millions of small files and millions of dirs
When I migrated a large mailspool in maildir format from the old nfs server to the new one in a previous job, I first generated a list of the top-level maildirs. I then generated the rsync commands plus a few other bits and pieces for each maildir to make a single transaction-like function. I then pumped all these auto-generated scripts into xjobs and ran them in parallel. This vastly sped up the process, as sequentially running the tree was far too slow. This was for about 15 million maildirs in a hashed structure btw, so a fair amount of files. eg

find /maildir -type d -maxdepth 4 | while read d
do
  r=$(($RANDOM*$RANDOM))
  echo rsync -a $d/ /newpath/$d/ > /tmp/scripts/$r
  echo some other stuff >> /tmp/scripts/$r
done

ls /tmp/scripts/ | while read f
do
  echo /tmp/scripts/$f
done | xjobs -j 20

On 19 August 2013 18:52, aurfalien wrote: > > On Aug 19, 2013, at 10:41 AM, Mark Felder wrote: > > > On Fri, Aug 16, 2013, at 1:46, Nicolas KOWALSKI wrote: > >> On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote: > >>> Is there a faster way to copy files over NFS? > >> > >> I would use find+cpio. This handles hard links, permissions, and in case > >> of later runs, will not copy files if they already exist on the > >> destination. > >> > >> # cd /source/dir > >> # find . | cpio -pvdm /destination/dir > >> > > > > I always found sysutils/cpdup to be faster than rsync. > > Ah, bookmarking this one. > > Many thanks. > > - aurf
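[Editor's note] The fan-out idea above can be sketched with xargs -P from the base system instead of xjobs. This is an illustrative sketch only: the root path, job count, and the per-directory rsync command in the usage comment are placeholders, not krad's exact setup.

```shell
#!/bin/sh
# parallel_dirs ROOT JOBS CMD...: run CMD once for every directory under
# ROOT (depth <= 4, matching the find in the post), JOBS at a time, with
# the directory appended as the command's last argument.
parallel_dirs() {
    root=$1; jobs=$2; shift 2
    find "$root" -maxdepth 4 -type d -print0 |
        xargs -0 -n 1 -P "$jobs" "$@"
}

# Hypothetical usage, mirroring each maildir with 20 concurrent rsyncs:
#   parallel_dirs /maildir 20 sh -c 'rsync -a "$1/" "/newpath$1/"' rsync-one
```

Using -print0/-0 keeps directory names with spaces intact, which the `while read d` loop in the original script would mangle.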
Re: copying millions of small files and millions of dirs
On Aug 19, 2013, at 10:41 AM, Mark Felder wrote: > On Fri, Aug 16, 2013, at 1:46, Nicolas KOWALSKI wrote: >> On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote: >>> Is there a faster way to copy files over NFS? >> >> I would use find+cpio. This handles hard links, permissions, and in case >> of later runs, will not copy files if they already exist on the >> destination. >> >> # cd /source/dir >> # find . | cpio -pvdm /destination/dir >> > > I always found sysutils/cpdup to be faster than rsync. Ah, bookmarking this one. Many thanks. - aurf
Re: copying millions of small files and millions of dirs
On Fri, Aug 16, 2013, at 1:46, Nicolas KOWALSKI wrote: > On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote: > > Is there a faster way to copy files over NFS? > > I would use find+cpio. This handles hard links, permissions, and in case > of later runs, will not copy files if they already exist on the > destination. > > # cd /source/dir > # find . | cpio -pvdm /destination/dir > I always found sysutils/cpdup to be faster than rsync.
Re: copying millions of small files and millions of dirs
On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote: > Is there a faster way to copy files over NFS? I would use find+cpio. This handles hard links, permissions, and in case of later runs, will not copy files if they already exist on the destination. # cd /source/dir # find . | cpio -pvdm /destination/dir -- Nicolas
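[Editor's note] A minimal sketch of the find+cpio pass-through copy above, wrapped in a function. The NUL-delimited variant (find -print0 plus cpio's -0/--null flag, available in GNU cpio; check your cpio(1) before relying on it) keeps filenames containing spaces or newlines intact.

```shell
#!/bin/sh
# copy_tree SRC DST: replicate SRC under DST.  -d creates directories,
# -m preserves modification times; without -u, cpio leaves destination
# files that are not older than the source, so reruns skip files already
# copied, as Nicolas notes.
copy_tree() {
    src=$1; dst=$2
    mkdir -p "$dst"
    ( cd "$src" && find . -print0 | cpio -p0dm "$dst" )
}

# Hypothetical usage:
#   copy_tree /source/dir /destination/dir
```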
Re: copying millions of small files and millions of dirs
I would use ndmp. That is how we archive our nas crap isilon stuff, but we have the backend accelerators. Not sure if there is ndmp for FreeBSD. Like another poster said, you are most likely i/o bound anyway. On Thu, Aug 15, 2013 at 2:14 PM, aurfalien wrote: > > On Aug 15, 2013, at 11:52 AM, Charles Swiger wrote: > > > On Aug 15, 2013, at 11:37 AM, aurfalien wrote: > >> On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote: > >>> On Aug 15, 2013, at 11:13 AM, aurfalien wrote: > Is there a faster way to copy files over NFS? > >>> > >>> Probably. > >> > >> Ok, thanks for the specifics. > > > > You're most welcome. > > Currently breaking up a simple rsync over 7 or so scripts which > copies 22 dirs having ~500,000 dirs or files each. > >>> > >>> There's a maximum useful concurrency which depends on how many disk > spindles and what flavor of RAID is in use; exceeding it will result in > thrashing the disks and heavily reducing throughput due to competing I/O > requests. Try measuring aggregate performance when running fewer rsyncs at > once and see whether it improves. > >> > >> Its 35 disks broken into 7 striped RaidZ groups with an SLC based ZIL > and no atime, the server it self has 128GB ECC RAM. I didn't have time to > tune or really learn ZFS but at this point its only backing up the data for > emergency purposes. > > > > OK. If you've got 7 independent groups and can use separate network > pipes for each parallel copy, then using 7 simultaneous scripts is likely > reasonable. > > > >>> Of course, putting half a million files into a single directory level > is also a bad idea, even with dirhash support. You'd do better to break > them up into subdirs containing fewer than ~10K files apiece. > >> > >> I can't, thats our job structure obviously developed by scrip kiddies > and not systems ppl, but I digress. > > > > Identifying something which is "broken as designed" is still helpful, > since it indicates what needs to change.
> > > Obviously reading all the meta data is a PITA. > >>> > >>> Yes. > >>> > Doin 10Gb/jumbos but in this case it don't make much of a hoot of a > diff. > >>> > >>> Yeah, probably not-- you're almost certainly I/O bound, not network > bound. > >> > >> Actually it was network bound via 1 rsync process which is why I broke > up 154 dirs into 7 batches of 22 each. > > > > Oh. Um, unless you can make more network bandwidth available, you've > saturated the bottleneck. > Doing a single copy task is likely to complete faster than splitting up > the job into subtasks in such a case. > > Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going > were before it was in the 10Ms with 1. > > Also, physically looking at my ZFS server, it now shows the drives lights > are blinking faster, like every second. Were as before it was sort of > seldom, like every 3 seconds or so. > > I was thinking to perhaps zip dirs up and then xfer the file over but it > would prolly take as long to zip/unzip. > > This bloody project structure we have is nuts. > > - aurf
Re: copying millions of small files and millions of dirs
On Aug 15, 2013, at 1:35 PM, Roland Smith wrote: > On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote: >> Hi all, >> >> Is there a faster way to copy files over NFS? > > Can you log into your NAS with ssh or telnet? I can, but that's a back-channel 100Mb link. - aurf
Re: copying millions of small files and millions of dirs
On Aug 15, 2013, at 1:22 PM, Charles Swiger wrote: > [ ...combining replies for brevity... ] > > On Aug 15, 2013, at 1:02 PM, Frank Leonhardt wrote: >> I'm reading all this with interest. The first thing I'd have tried would be >> tar (and probably netcat) but I'm a probably bit of a dinosaur. (If someone >> wants to buy me some really big drives I promise I'll update). If it's >> really NFS or nothing I guess you couldn't open a socket anyway. > > Either tar via netcat or SSH, or dump / restore via similar pipeline are > quite traditional. tar is more flexible for partial filesystem copies, > whereas the dump / restore is more oriented towards complete filesystem > copies. If the destination starts off empty, they're probably faster than > rsync, but rsync does delta updates which is a huge win if you're going to be > copying changes onto a slightly older version. Yep, so looks like it is what it is, as the data set is changing while I do the base sync. So I'll have to do several more to pick up newcomers etc... > Anyway, you're entirely right that the capabilities of the source matter a > great deal. > If it could do zfs send / receive, or similar snapshot mirroring, that would > likely do better than userland tools. > >> I'd be interested to know whether tar is still worth using in this world of >> volume managers and SMP. > > Yes. > > On Aug 15, 2013, at 12:14 PM, aurfalien wrote: > [ ... ] >> Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff. > > Yeah, probably not-- you're almost certainly I/O bound, not network bound. Actually it was network bound via 1 rsync process which is why I broke up 154 dirs into 7 batches of 22 each. >>> >>> Oh. Um, unless you can make more network bandwidth available, you've >>> saturated the bottleneck. >>> Doing a single copy task is likely to complete faster than splitting up the >>> job into subtasks in such a case.
>> >> Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going >> were before it was in the 10Ms with 1. > > 1 gigabyte of data per second is pretty decent for a 10Gb link; 10 MB/s > obviously wasn't close saturating a 10Gb link. Cool. Looks like I am doing my best, which is what I wanted to know. I chose to do 7 rsync scripts as it evenly divides into 154 parent dirs :) You should see how our backup system deals with this; Atempo Time Navigator, or Tina as it's called. It takes an hour just to lay down the dirs on tape before even starting to back up, craziness. And that's just for 1 parent dir having an avg of 500,000 dirs. Actually I'm prolly wrong, as the initial creation is 125,000 dirs, of which a few are sym links. Then it grows from there. Looking at the Tina stats, we see a million objects or more. - aurf
Re: copying millions of small files and millions of dirs
On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote: > Hi all, > > Is there a faster way to copy files over NFS? Can you log into your NAS with ssh or telnet? If so, I would suggest using tar(1) and nc(1). It has been a while since I measured it, but IIRC the combination of tar (without compression) and netcat could saturate a 100 Mbit ethernet connection. Roland -- R.F.Smith http://rsmith.home.xs4all.nl/ [plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated] pgp: 1A2B 477F 9970 BA3C 2914 B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)
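[Editor's note] The tar+nc suggestion boils down to a tar-to-tar pipe; a rough sketch follows, with the two-host form shown in comments. The host name and port are placeholders, and nc flag syntax varies between netcat implementations, so treat this as an outline rather than exact commands.

```shell
#!/bin/sh
# Network form (the pipe split across two machines; port/host hypothetical):
#   receiver$ nc -l 9000 | tar -xpf - -C /destination/dir
#   sender$   tar -cf - -C /source/dir . | nc desthost 9000
#
# tar_copy SRC DST: the same pipeline on one machine -- stream SRC into DST,
# preserving permissions (-p), with no per-file protocol round trips.
tar_copy() {
    mkdir -p "$2"
    tar -cf - -C "$1" . | tar -xpf - -C "$2"
}
```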
Re: copying millions of small files and millions of dirs
[ ...combining replies for brevity... ] On Aug 15, 2013, at 1:02 PM, Frank Leonhardt wrote: > I'm reading all this with interest. The first thing I'd have tried would be > tar (and probably netcat) but I'm a probably bit of a dinosaur. (If someone > wants to buy me some really big drives I promise I'll update). If it's really > NFS or nothing I guess you couldn't open a socket anyway. Either tar via netcat or SSH, or dump / restore via similar pipeline are quite traditional. tar is more flexible for partial filesystem copies, whereas the dump / restore is more oriented towards complete filesystem copies. If the destination starts off empty, they're probably faster than rsync, but rsync does delta updates which is a huge win if you're going to be copying changes onto a slightly older version. Anyway, you're entirely right that the capabilities of the source matter a great deal. If it could do zfs send / receive, or similar snapshot mirroring, that would likely do better than userland tools. > I'd be interested to know whether tar is still worth using in this world of > volume managers and SMP. Yes. On Aug 15, 2013, at 12:14 PM, aurfalien wrote: [ ... ] > Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff. Yeah, probably not-- you're almost certainly I/O bound, not network bound. >>> >>> Actually it was network bound via 1 rsync process which is why I broke up >>> 154 dirs into 7 batches of 22 each. >> >> Oh. Um, unless you can make more network bandwidth available, you've >> saturated the bottleneck. >> Doing a single copy task is likely to complete faster than splitting up the >> job into subtasks in such a case. > > Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going > were before it was in the 10Ms with 1. 1 gigabyte of data per second is pretty decent for a 10Gb link; 10 MB/s obviously wasn't close to saturating a 10Gb link.
Regards, -- -Chuck
Re: copying millions of small files and millions of dirs
On 15/08/2013 19:13, aurfalien wrote: Hi all, Is there a faster way to copy files over NFS? Currently breaking up a simple rsync over 7 or so scripts which copies 22 dirs having ~500,000 dirs or files each. I'm reading all this with interest. The first thing I'd have tried would be tar (and probably netcat) but I'm probably a bit of a dinosaur. (If someone wants to buy me some really big drives I promise I'll update). If it's really NFS or nothing I guess you couldn't open a socket anyway. I'd be interested to know whether tar is still worth using in this world of volume managers and SMP.
Re: copying millions of small files and millions of dirs
On Aug 15, 2013, at 11:37 AM, aurfalien wrote: > On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote: >> On Aug 15, 2013, at 11:13 AM, aurfalien wrote: >>> Is there a faster way to copy files over NFS? >> >> Probably. > > Ok, thanks for the specifics. You're most welcome. >>> Currently breaking up a simple rsync over 7 or so scripts which copies 22 >>> dirs having ~500,000 dirs or files each. >> >> There's a maximum useful concurrency which depends on how many disk spindles >> and what flavor of RAID is in use; exceeding it will result in thrashing the >> disks and heavily reducing throughput due to competing I/O requests. Try >> measuring aggregate performance when running fewer rsyncs at once and see >> whether it improves. > > Its 35 disks broken into 7 striped RaidZ groups with an SLC based ZIL and no > atime, the server it self has 128GB ECC RAM. I didn't have time to tune or > really learn ZFS but at this point its only backing up the data for emergency > purposes. OK. If you've got 7 independent groups and can use separate network pipes for each parallel copy, then using 7 simultaneous scripts is likely reasonable. >> Of course, putting half a million files into a single directory level is >> also a bad idea, even with dirhash support. You'd do better to break them >> up into subdirs containing fewer than ~10K files apiece. > > I can't, thats our job structure obviously developed by scrip kiddies and not > systems ppl, but I digress. Identifying something which is "broken as designed" is still helpful, since it indicates what needs to change. >>> Obviously reading all the meta data is a PITA. >> >> Yes. >> >>> Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff. >> >> Yeah, probably not-- you're almost certainly I/O bound, not network bound. > > Actually it was network bound via 1 rsync process which is why I broke up 154 > dirs into 7 batches of 22 each. Oh. 
Um, unless you can make more network bandwidth available, you've saturated the bottleneck. Doing a single copy task is likely to complete faster than splitting up the job into subtasks in such a case. Regards, -- -Chuck
Re: copying millions of small files and millions of dirs
On Aug 15, 2013, at 12:36 PM, Adam Vande More wrote: > On Thu, Aug 15, 2013 at 1:13 PM, aurfalien wrote: > Hi all, > > Is there a faster way to copy files over NFS? > > Remove NFS from the setup. Yea, your mouth to God's ears. My BlueArc is an NFS-only NAS box. So no way to get to the data other than NFS. - aurf
Re: copying millions of small files and millions of dirs
On Thu, Aug 15, 2013 at 1:13 PM, aurfalien wrote: > Hi all, > > Is there a faster way to copy files over NFS? > Remove NFS from the setup. -- Adam Vande More
Re: copying millions of small files and millions of dirs
On Aug 15, 2013, at 11:13 AM, aurfalien wrote: > Is there a faster way to copy files over NFS? Probably. > Currently breaking up a simple rsync over 7 or so scripts which copies 22 > dirs having ~500,000 dirs or files each. There's a maximum useful concurrency which depends on how many disk spindles and what flavor of RAID is in use; exceeding it will result in thrashing the disks and heavily reducing throughput due to competing I/O requests. Try measuring aggregate performance when running fewer rsyncs at once and see whether it improves. Of course, putting half a million files into a single directory level is also a bad idea, even with dirhash support. You'd do better to break them up into subdirs containing fewer than ~10K files apiece. > Obviously reading all the meta data is a PITA. Yes. > Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff. Yeah, probably not-- you're almost certainly I/O bound, not network bound. Regards, -- -Chuck
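[Editor's note] The "fewer than ~10K files per directory" advice is commonly implemented as a hashed bucket layout: spread entries across 256 subdirectories keyed on the first two hex digits of a name hash, so no single directory grows into the hundreds of thousands of entries. A small sketch, assuming GNU md5sum (on FreeBSD, `md5 -q` produces the same digest); the storage scheme itself is illustrative, not something from this thread's actual job structure.

```shell
#!/bin/sh
# bucket_path NAME: print the two-level path NAME should live under,
# e.g. "ac/somefile" -- 256 possible buckets from the md5's first byte.
bucket_path() {
    h=$(printf '%s' "$1" | md5sum | cut -c1-2)
    printf '%s/%s\n' "$h" "$1"
}

# Hypothetical usage, filing a file into its bucket:
#   p=$(bucket_path "somefile")
#   mkdir -p "${p%/*}" && mv somefile "$p"
```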
Re: copying millions of small files and millions of dirs
On Aug 15, 2013, at 11:52 AM, Charles Swiger wrote: > On Aug 15, 2013, at 11:37 AM, aurfalien wrote: >> On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote: >>> On Aug 15, 2013, at 11:13 AM, aurfalien wrote: Is there a faster way to copy files over NFS? >>> >>> Probably. >> >> Ok, thanks for the specifics. > > You're most welcome. > Currently breaking up a simple rsync over 7 or so scripts which copies 22 dirs having ~500,000 dirs or files each. >>> >>> There's a maximum useful concurrency which depends on how many disk >>> spindles and what flavor of RAID is in use; exceeding it will result in >>> thrashing the disks and heavily reducing throughput due to competing I/O >>> requests. Try measuring aggregate performance when running fewer rsyncs at >>> once and see whether it improves. >> >> Its 35 disks broken into 7 striped RaidZ groups with an SLC based ZIL and no >> atime, the server it self has 128GB ECC RAM. I didn't have time to tune or >> really learn ZFS but at this point its only backing up the data for >> emergency purposes. > > OK. If you've got 7 independent groups and can use separate network pipes > for each parallel copy, then using 7 simultaneous scripts is likely > reasonable. > >>> Of course, putting half a million files into a single directory level is >>> also a bad idea, even with dirhash support. You'd do better to break them >>> up into subdirs containing fewer than ~10K files apiece. >> >> I can't, thats our job structure obviously developed by scrip kiddies and >> not systems ppl, but I digress. > > Identifying something which is "broken as designed" is still helpful, since > it indicates what needs to change. > Obviously reading all the meta data is a PITA. >>> >>> Yes. >>> Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff. >>> >>> Yeah, probably not-- you're almost certainly I/O bound, not network bound. 
>> >> Actually it was network bound via 1 rsync process which is why I broke up >> 154 dirs into 7 batches of 22 each. > > Oh. Um, unless you can make more network bandwidth available, you've > saturated the bottleneck. > Doing a single copy task is likely to complete faster than splitting up the > job into subtasks in such a case. Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going, where before it was in the 10Ms with 1. Also, physically looking at my ZFS server, it now shows the drive lights are blinking faster, like every second. Whereas before it was sort of seldom, like every 3 seconds or so. I was thinking to perhaps zip dirs up and then xfer the file over but it would prolly take as long to zip/unzip. This bloody project structure we have is nuts. - aurf
Re: copying millions of small files and millions of dirs
On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote: > On Aug 15, 2013, at 11:13 AM, aurfalien wrote: >> Is there a faster way to copy files over NFS? > > Probably. Ok, thanks for the specifics. >> Currently breaking up a simple rsync over 7 or so scripts which copies 22 >> dirs having ~500,000 dirs or files each. > > There's a maximum useful concurrency which depends on how many disk spindles > and what flavor of RAID is in use; exceeding it will result in thrashing the > disks and heavily reducing throughput due to competing I/O requests. Try > measuring aggregate performance when running fewer rsyncs at once and see > whether it improves. It's 35 disks broken into 7 striped RaidZ groups with an SLC-based ZIL and no atime; the server itself has 128GB ECC RAM. I didn't have time to tune or really learn ZFS but at this point it's only backing up the data for emergency purposes. > Of course, putting half a million files into a single directory level is also > a bad idea, even with dirhash support. You'd do better to break them up into > subdirs containing fewer than ~10K files apiece. I can't, that's our job structure, obviously developed by script kiddies and not systems ppl, but I digress. >> Obviously reading all the meta data is a PITA. > > Yes. > >> Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff. > > Yeah, probably not-- you're almost certainly I/O bound, not network bound. Actually it was network bound via 1 rsync process, which is why I broke up 154 dirs into 7 batches of 22 each. I'll have to acquaint myself with ZFS-centric tools to help me determine what's going on. But
copying millions of small files and millions of dirs
Hi all, Is there a faster way to copy files over NFS? Currently breaking up a simple rsync over 7 or so scripts which copies 22 dirs having ~500,000 dirs or files each. Obviously reading all the meta data is a PITA. Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff. Going from a 38TB used, 50TB total BlueArc Titan 3200 to a new shiny 80TB total FreeBSD 9.2RC1 ZFS bad boy. Thanks in advance, - aurf