Re: copying millions of small files and millions of dirs

2013-10-10 Thread Warren Block

On Thu, 10 Oct 2013, aurfalien wrote:



On Aug 15, 2013, at 11:46 PM, Nicolas KOWALSKI wrote:


On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:

Is there a faster way to copy files over NFS?


I would use find+cpio. This handles hard links, permissions, and in case
of later runs, will not copy files if they already exist on the
destination.

# cd /source/dir
# find . | cpio -pvdm /destination/dir



Old thread, I know, but cpio has proven twice as fast as rsync.

Trusty ol' cpio.

Gonna try cpdup next.


Try sysutils/clone, too.
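
For anyone curious, a minimal sketch of trying it (the two-argument form is
the basic usage; the paths are placeholders, and the exact flags should be
checked against clone(1) before a real run):

cd /usr/ports/sysutils/clone && make install clean
clone /source/dir /destination/dir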


Re: copying millions of small files and millions of dirs

2013-10-10 Thread aurfalien

On Aug 15, 2013, at 11:46 PM, Nicolas KOWALSKI wrote:

> On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:
>> Is there a faster way to copy files over NFS?
> 
> I would use find+cpio. This handles hard links, permissions, and in case 
> of later runs, will not copy files if they already exist on the 
> destination.
> 
> # cd /source/dir
> # find . | cpio -pvdm /destination/dir


Old thread, I know, but cpio has proven twice as fast as rsync.

Trusty ol' cpio.

Gonna try cpdup next.

- aurf


Re: copying millions of small files and millions of dirs

2013-08-20 Thread Warren Block

On Mon, 19 Aug 2013, Mark Felder wrote:


On Fri, Aug 16, 2013, at 1:46, Nicolas KOWALSKI wrote:

On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:

Is there a faster way to copy files over NFS?


I would use find+cpio. This handles hard links, permissions, and in case
of later runs, will not copy files if they already exist on the
destination.

# cd /source/dir
# find . | cpio -pvdm /destination/dir



I always found sysutils/cpdup to be faster than rsync.


sysutils/clone may do better as well.


Re: copying millions of small files and millions of dirs

2013-08-20 Thread Frank Leonhardt

On 20/08/2013 08:32, krad wrote:

When I migrated a large mailspool in maildir format from the old NFS server
to the new one in a previous job, I first generated a list of the top-level
maildirs. I then generated the rsync commands plus a few other bits and
pieces for each maildir to make a single transaction-like function. I then
pumped all these auto-generated scripts into xjobs and ran them in parallel.
This vastly sped up the process, as sequentially running the tree was far
too slow. This was for about 15 million maildirs in a hashed structure, btw,
so a fair number of files.


eg

find /maildir -type d -maxdepth 4 | while read d
do
r=$(($RANDOM*$RANDOM))
echo rsync -a $d/ /newpath/$d/ > /tmp/scripts/$r
echo some other stuff >> /tmp/scripts/$r
done

ls /tmp/scripts/ | while read f
do
echo /tmp/scripts/$f
done | xjobs -j 20



This isn't what I'd have expected, as running operations in parallel on 
mechanical drives would normally result in superfluous head movements 
and thus exacerbate the I/O bottleneck. The system must be optimising 
the requests from 20 parallel jobs better than I thought it would to 
climb out from that hole far enough to get a net benefit. Do you 
recall how any other approaches performed?




Re: copying millions of small files and millions of dirs

2013-08-20 Thread krad
Whoops, that should have been:

ls /tmp/scripts/ | while read f
do
echo sh /tmp/scripts/$f
done | xjobs -j 20
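
For reference, roughly the same fan-out can be had without the temp-script
step. This is an untested sketch using xargs -P, reusing the paths and job
count from the script below; like the original it assumes the parent
directories under /newpath already exist:

find /maildir -maxdepth 4 -type d -print0 |
    xargs -0 -P 20 -I{} rsync -a {}/ /newpath/{}/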


On 20 August 2013 08:32, krad  wrote:

> When i migrated a large mailspool in maildir format from the old nfs
> server to the new one in a previous job, I 1st generated a list of the top
> level maildirs. I then generated the rsync commands + plus a few other bits
> and pieces for each maildir to make a single transaction like function. I
> then pumped all this auto generated scripts into xjobs and ran them in
> parallel. This vastly speeded up the process as sequentially running the
> tree was far to slow. THis was for about 15 million maildirs in a hashed
> structure btw so a fair amount of files.
>
>
> eg
>
> find /maildir -type d -maxdepth 4 | while read d
> do
> r=$(($RANDOM*$RANDOM))
> echo rsync -a $d/ /newpath/$d/ > /tmp/scripts/$r
> echo some other stuff >> /tmp/scripts/$r
> done
>
> ls /tmp/scripts/| while read f
> echo /tmp/scripts/$f
> done | xjobs -j 20
>
> On 19 August 2013 18:52, aurfalien  wrote:
>
>>
>> On Aug 19, 2013, at 10:41 AM, Mark Felder wrote:
>>
>> > On Fri, Aug 16, 2013, at 1:46, Nicolas KOWALSKI wrote:
>> >> On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:
>> >>> Is there a faster way to copy files over NFS?
>> >>
>> >> I would use find+cpio. This handles hard links, permissions, and in
>> case
>> >> of later runs, will not copy files if they already exist on the
>> >> destination.
>> >>
>> >> # cd /source/dir
>> >> # find . | cpio -pvdm /destination/dir
>> >>
>> >
>> > I always found sysutils/cpdup to be faster than rsync.
>>
>> Ah, bookmarking this one.
>>
>> Many thanks.
>>
>> - aurf


Re: copying millions of small files and millions of dirs

2013-08-20 Thread krad
When I migrated a large mailspool in maildir format from the old NFS server
to the new one in a previous job, I first generated a list of the top-level
maildirs. I then generated the rsync commands plus a few other bits and
pieces for each maildir to make a single transaction-like function. I then
pumped all these auto-generated scripts into xjobs and ran them in parallel.
This vastly sped up the process, as sequentially running the tree was far
too slow. This was for about 15 million maildirs in a hashed structure, btw,
so a fair number of files.


eg

find /maildir -type d -maxdepth 4 | while read d
do
r=$(($RANDOM*$RANDOM))
echo rsync -a $d/ /newpath/$d/ > /tmp/scripts/$r
echo some other stuff >> /tmp/scripts/$r
done

ls /tmp/scripts/ | while read f
do
echo /tmp/scripts/$f
done | xjobs -j 20

On 19 August 2013 18:52, aurfalien  wrote:

>
> On Aug 19, 2013, at 10:41 AM, Mark Felder wrote:
>
> > On Fri, Aug 16, 2013, at 1:46, Nicolas KOWALSKI wrote:
> >> On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:
> >>> Is there a faster way to copy files over NFS?
> >>
> >> I would use find+cpio. This handles hard links, permissions, and in case
> >> of later runs, will not copy files if they already exist on the
> >> destination.
> >>
> >> # cd /source/dir
> >> # find . | cpio -pvdm /destination/dir
> >>
> >
> > I always found sysutils/cpdup to be faster than rsync.
>
> Ah, bookmarking this one.
>
> Many thanks.
>
> - aurf


Re: copying millions of small files and millions of dirs

2013-08-19 Thread aurfalien

On Aug 19, 2013, at 10:41 AM, Mark Felder wrote:

> On Fri, Aug 16, 2013, at 1:46, Nicolas KOWALSKI wrote:
>> On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:
>>> Is there a faster way to copy files over NFS?
>> 
>> I would use find+cpio. This handles hard links, permissions, and in case 
>> of later runs, will not copy files if they already exist on the 
>> destination.
>> 
>> # cd /source/dir
>> # find . | cpio -pvdm /destination/dir
>> 
> 
> I always found sysutils/cpdup to be faster than rsync.

Ah, bookmarking this one.

Many thanks.

- aurf


Re: copying millions of small files and millions of dirs

2013-08-19 Thread Mark Felder
On Fri, Aug 16, 2013, at 1:46, Nicolas KOWALSKI wrote:
> On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:
> > Is there a faster way to copy files over NFS?
> 
> I would use find+cpio. This handles hard links, permissions, and in case 
> of later runs, will not copy files if they already exist on the 
> destination.
> 
> # cd /source/dir
> # find . | cpio -pvdm /destination/dir
> 

I always found sysutils/cpdup to be faster than rsync.
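
For anyone trying it, a minimal sketch (paths are placeholders; note that
cpdup by default mirrors, i.e. it may remove files on the destination that
are not in the source, so check cpdup(1) before pointing it at a non-empty
target):

cpdup -v /source/dir /destination/dir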


Re: copying millions of small files and millions of dirs

2013-08-15 Thread Nicolas KOWALSKI
On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:
> Is there a faster way to copy files over NFS?

I would use find+cpio. This handles hard links, permissions, and in case 
of later runs, will not copy files if they already exist on the 
destination.

# cd /source/dir
# find . | cpio -pvdm /destination/dir
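
If any of the names contain spaces or newlines, a NUL-separated variant of
the same idea should be safer (untested sketch; both bsdcpio and GNU cpio
accept -0, and the paths are placeholders):

# cd /source/dir
# find . -print0 | cpio -pvdm0 /destination/dir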

-- 
Nicolas


Re: copying millions of small files and millions of dirs

2013-08-15 Thread iamatt
I would use NDMP. That is how we archive our NAS crap (Isilon stuff), but
we have the backend accelerators. Not sure if there is NDMP for FreeBSD.
Like another poster said, you are most likely I/O bound anyway.


On Thu, Aug 15, 2013 at 2:14 PM, aurfalien  wrote:

>
> On Aug 15, 2013, at 11:52 AM, Charles Swiger wrote:
>
> > On Aug 15, 2013, at 11:37 AM, aurfalien  wrote:
> >> On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote:
> >>> On Aug 15, 2013, at 11:13 AM, aurfalien  wrote:
>  Is there a faster way to copy files over NFS?
> >>>
> >>> Probably.
> >>
> >> Ok, thanks for the specifics.
> >
> > You're most welcome.
> >
>  Currently breaking up a simple rsync over 7 or so scripts which
> copies 22 dirs having ~500,000 dirs or files each.
> >>>
> >>> There's a maximum useful concurrency which depends on how many disk
> spindles and what flavor of RAID is in use; exceeding it will result in
> thrashing the disks and heavily reducing throughput due to competing I/O
> requests.  Try measuring aggregate performance when running fewer rsyncs at
> once and see whether it improves.
> >>
> >> Its 35 disks broken into 7 striped RaidZ groups with an SLC based ZIL
> and no atime, the server it self has 128GB ECC RAM.  I didn't have time to
> tune or really learn ZFS but at this point its only backing up the data for
> emergency purposes.
> >
> > OK.  If you've got 7 independent groups and can use separate network
> pipes for each parallel copy, then using 7 simultaneous scripts is likely
> reasonable.
> >
> >>> Of course, putting half a million files into a single directory level
> is also a bad idea, even with dirhash support.  You'd do better to break
> them up into subdirs containing fewer than ~10K files apiece.
> >>
> >> I can't, thats our job structure obviously developed by scrip kiddies
> and not systems ppl, but I digress.
> >
> > Identifying something which is "broken as designed" is still helpful,
> since it indicates what needs to change.
> >
>  Obviously reading all the meta data is a PITA.
> >>>
> >>> Yes.
> >>>
>  Doin 10Gb/jumbos but in this case it don't make much of a hoot of a
> diff.
> >>>
> >>> Yeah, probably not-- you're almost certainly I/O bound, not network
> bound.
> >>
> >> Actually it was network bound via 1 rsync process which is why I broke
> up 154 dirs into 7 batches of 22 each.
> >
> > Oh.  Um, unless you can make more network bandwidth available, you've
> saturated the bottleneck.
> > Doing a single copy task is likely to complete faster than splitting up
> the job into subtasks in such a case.
>
> Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going
> were before it was in the 10Ms with 1.
>
> Also, physically looking at my ZFS server, it now shows the drives lights
> are blinking faster, like every second.  Were as before it was sort of
> seldom, like every 3 seconds or so.
>
> I was thinking to perhaps zip dirs up and then xfer the file over but it
> would prolly take as long to zip/unzip.
>
> This bloody project structure we have is nuts.
>
> - aurf


Re: copying millions of small files and millions of dirs

2013-08-15 Thread aurfalien

On Aug 15, 2013, at 1:35 PM, Roland Smith wrote:

> On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:
>> Hi all,
>> 
>> Is there a faster way to copy files over NFS?
> 
> Can you log into your NAS with ssh or telnet?

I can, but that's a back-channel 100Mb link.

- aurf


Re: copying millions of small files and millions of dirs

2013-08-15 Thread aurfalien

On Aug 15, 2013, at 1:22 PM, Charles Swiger wrote:

> [ ...combining replies for brevity... ]
> 
> On Aug 15, 2013, at 1:02 PM, Frank Leonhardt  wrote:
>> I'm reading all this with interest. The first thing I'd have tried would be 
>> tar (and probably netcat) but I'm a probably bit of a dinosaur. (If someone 
>> wants to buy me some really big drives I promise I'll update). If it's 
>> really NFS or nothing I guess you couldn't open a socket anyway.
> 
> Either tar via netcat or SSH, or dump / restore via similar pipeline are 
> quite traditional.  tar is more flexible for partial filesystem copies, 
> whereas the dump / restore is more oriented towards complete filesystem 
> copies.  If the destination starts off empty, they're probably faster than 
> rsync, but rsync does delta updates which is a huge win if you're going to be 
> copying changes onto a slightly older version.

Yep, so it looks like it is what it is, as the data set is changing while I do the 
base sync.  So I'll have to do several more runs to pick up newcomers, etc.

> Anyway, you're entirely right that the capabilities of the source matter a 
> great deal.
> If it could do zfs send / receive, or similar snapshot mirroring, that would 
> likely do better than userland tools.
> 
>> I'd be interested to know whether tar is still worth using in this world of 
>> volume managers and SMP.
> 
> Yes.
> 
> On Aug 15, 2013, at 12:14 PM, aurfalien  wrote:
> [ ... ]
>> Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.
> 
> Yeah, probably not-- you're almost certainly I/O bound, not network bound.
 
 Actually it was network bound via 1 rsync process which is why I broke up 
 154 dirs into 7 batches of 22 each.
>>> 
>>> Oh.  Um, unless you can make more network bandwidth available, you've 
>>> saturated the bottleneck.
>>> Doing a single copy task is likely to complete faster than splitting up the 
>>> job into subtasks in such a case.
>> 
>> Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going 
>> were before it was in the 10Ms with 1.
> 
> 1 gigabyte of data per second is pretty decent for a 10Gb link; 10 MB/s 
> obviously wasn't close saturating a 10Gb link.

Cool.  Looks like I am doing my best, which is what I wanted to know.  I chose 
to do 7 rsync scripts as that divides evenly into 154 parent dirs :)

You should see how our backup system deals with this; Atempo Time Navigator, or 
Tina as it's called.

It takes an hour just to lay down the dirs on tape before even starting to 
back up, craziness.  And that's just for 1 parent dir having an avg of 500,000 
dirs.  Actually I'm prolly wrong, as the initial creation is 125,000 dirs, of 
which a few are symlinks.

Then it grows from there.  Looking at the Tina stats, we see a million objects 
or more.

- aurf


Re: copying millions of small files and millions of dirs

2013-08-15 Thread Roland Smith
On Thu, Aug 15, 2013 at 11:13:25AM -0700, aurfalien wrote:
> Hi all,
> 
> Is there a faster way to copy files over NFS?

Can you log into your NAS with ssh or telnet?

If so, I would suggest using tar(1) and nc(1). It has been a while since I
measured it, but IIRC the combination of tar (without compression) and netcat
could saturate a 100 Mbit ethernet connection.
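
For concreteness, that pipeline looks roughly like this (untested sketch; the
host name, port, and paths are placeholders, and the receiver is started
first):

On the destination box:
nc -l 12345 | tar -xpf - -C /destination/dir

On the source box:
tar -cf - -C /source/dir . | nc desthost 12345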

Roland
-- 
R.F.Smith   http://rsmith.home.xs4all.nl/
[plain text _non-HTML_ PGP/GnuPG encrypted/signed email much appreciated]
pgp: 1A2B 477F 9970 BA3C 2914  B7CE 1277 EFB0 C321 A725 (KeyID: C321A725)




Re: copying millions of small files and millions of dirs

2013-08-15 Thread Charles Swiger
[ ...combining replies for brevity... ]

On Aug 15, 2013, at 1:02 PM, Frank Leonhardt  wrote:
> I'm reading all this with interest. The first thing I'd have tried would be 
> tar (and probably netcat) but I'm a probably bit of a dinosaur. (If someone 
> wants to buy me some really big drives I promise I'll update). If it's really 
> NFS or nothing I guess you couldn't open a socket anyway.

Either tar via netcat or SSH, or dump / restore via a similar pipeline, is quite 
traditional.  tar is more flexible for partial filesystem copies, whereas 
dump / restore is more oriented towards complete filesystem copies.  If the 
destination starts off empty, they're probably faster than rsync, but rsync 
does delta updates, which is a huge win if you're going to be copying changes 
onto a slightly older version.
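
For the dump / restore flavor, a rough UFS-only sketch (device, host, and
paths are placeholders; -L makes dump work from a snapshot of the live
filesystem):

dump -0Laf - /dev/ada0p2 | ssh newhost 'cd /destination && restore -rf -'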

Anyway, you're entirely right that the capabilities of the source matter a 
great deal.
If it could do zfs send / receive, or similar snapshot mirroring, that would 
likely do better than userland tools.
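
If both ends were ZFS, the snapshot-mirroring version would look something
like this (pool/dataset names and the host are placeholders):

zfs snapshot tank/projects@base
zfs send tank/projects@base | ssh newhost zfs receive -u backup/projects
# later catch-up runs only send what changed since @base:
zfs snapshot tank/projects@catchup1
zfs send -i @base tank/projects@catchup1 | ssh newhost zfs receive backup/projects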

> I'd be interested to know whether tar is still worth using in this world of 
> volume managers and SMP.

Yes.

On Aug 15, 2013, at 12:14 PM, aurfalien  wrote:
[ ... ]
> Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.
 
 Yeah, probably not-- you're almost certainly I/O bound, not network bound.
>>> 
>>> Actually it was network bound via 1 rsync process which is why I broke up 
>>> 154 dirs into 7 batches of 22 each.
>> 
>> Oh.  Um, unless you can make more network bandwidth available, you've 
>> saturated the bottleneck.
>> Doing a single copy task is likely to complete faster than splitting up the 
>> job into subtasks in such a case.
> 
> Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going 
> were before it was in the 10Ms with 1.

1 gigabyte of data per second is pretty decent for a 10Gb link; 10 MB/s 
obviously wasn't close to saturating a 10Gb link.

Regards,
-- 
-Chuck



Re: copying millions of small files and millions of dirs

2013-08-15 Thread Frank Leonhardt

On 15/08/2013 19:13, aurfalien wrote:

Hi all,

Is there a faster way to copy files over NFS?

Currently breaking up a simple rsync over 7 or so scripts which copies 22 dirs 
having ~500,000 dirs or files each.



I'm reading all this with interest. The first thing I'd have tried would 
be tar (and probably netcat), but I'm probably a bit of a dinosaur. (If 
someone wants to buy me some really big drives I promise I'll update.) 
If it's really NFS or nothing I guess you couldn't open a socket anyway.


I'd be interested to know whether tar is still worth using in this world 
of volume managers and SMP.




Re: copying millions of small files and millions of dirs

2013-08-15 Thread Charles Swiger
On Aug 15, 2013, at 11:37 AM, aurfalien  wrote:
> On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote:
>> On Aug 15, 2013, at 11:13 AM, aurfalien  wrote:
>>> Is there a faster way to copy files over NFS?
>> 
>> Probably.
> 
> Ok, thanks for the specifics.

You're most welcome.

>>> Currently breaking up a simple rsync over 7 or so scripts which copies 22 
>>> dirs having ~500,000 dirs or files each.
>> 
>> There's a maximum useful concurrency which depends on how many disk spindles 
>> and what flavor of RAID is in use; exceeding it will result in thrashing the 
>> disks and heavily reducing throughput due to competing I/O requests.  Try 
>> measuring aggregate performance when running fewer rsyncs at once and see 
>> whether it improves.
> 
> Its 35 disks broken into 7 striped RaidZ groups with an SLC based ZIL and no 
> atime, the server it self has 128GB ECC RAM.  I didn't have time to tune or 
> really learn ZFS but at this point its only backing up the data for emergency 
> purposes.

OK.  If you've got 7 independent groups and can use separate network pipes for 
each parallel copy, then using 7 simultaneous scripts is likely reasonable.

>> Of course, putting half a million files into a single directory level is 
>> also a bad idea, even with dirhash support.  You'd do better to break them 
>> up into subdirs containing fewer than ~10K files apiece.
> 
> I can't, thats our job structure obviously developed by scrip kiddies and not 
> systems ppl, but I digress.

Identifying something which is "broken as designed" is still helpful, since it 
indicates what needs to change.

>>> Obviously reading all the meta data is a PITA.
>> 
>> Yes.
>> 
>>> Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.
>> 
>> Yeah, probably not-- you're almost certainly I/O bound, not network bound.
> 
> Actually it was network bound via 1 rsync process which is why I broke up 154 
> dirs into 7 batches of 22 each.

Oh.  Um, unless you can make more network bandwidth available, you've saturated 
the bottleneck.
Doing a single copy task is likely to complete faster than splitting up the job 
into subtasks in such a case.

Regards,
-- 
-Chuck



Re: copying millions of small files and millions of dirs

2013-08-15 Thread aurfalien

On Aug 15, 2013, at 12:36 PM, Adam Vande More wrote:

> On Thu, Aug 15, 2013 at 1:13 PM, aurfalien  wrote:
> Hi all,
> 
> Is there a faster way to copy files over NFS?
> 
> Remove NFS from the setup.  

Yea, your mouth to god's ears.

My BlueArc is an NFS NAS only box.

So no way to get to the data other than NFS.

- aurf


Re: copying millions of small files and millions of dirs

2013-08-15 Thread Adam Vande More
On Thu, Aug 15, 2013 at 1:13 PM, aurfalien  wrote:

> Hi all,
>
> Is there a faster way to copy files over NFS?
>

Remove NFS from the setup.



-- 
Adam Vande More


Re: copying millions of small files and millions of dirs

2013-08-15 Thread Charles Swiger
On Aug 15, 2013, at 11:13 AM, aurfalien  wrote:
> Is there a faster way to copy files over NFS?

Probably.

> Currently breaking up a simple rsync over 7 or so scripts which copies 22 
> dirs having ~500,000 dirs or files each.

There's a maximum useful concurrency which depends on how many disk spindles 
and what flavor of RAID is in use; exceeding it will result in thrashing the 
disks and heavily reducing throughput due to competing I/O requests.  Try 
measuring aggregate performance when running fewer rsyncs at once and see 
whether it improves.

Of course, putting half a million files into a single directory level is also a 
bad idea, even with dirhash support.  You'd do better to break them up into 
subdirs containing fewer than ~10K files apiece.
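
To make that concrete, here is a hypothetical sketch of the bucketing idea:
spread a flat directory across 256 subdirs keyed on the first two characters
of an MD5 of each name (the bucket layout is invented purely for
illustration):

cd /source/dir
for f in *; do
    [ -f "$f" ] || continue                    # only plain files
    h=$(printf '%s' "$f" | md5 | cut -c1-2)    # md5(1) on FreeBSD; two hex chars = 256 buckets
    mkdir -p "bucket.$h"
    mv -- "$f" "bucket.$h/"
done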

> Obviously reading all the meta data is a PITA.

Yes.

> Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.

Yeah, probably not-- you're almost certainly I/O bound, not network bound.

Regards,
-- 
-Chuck



Re: copying millions of small files and millions of dirs

2013-08-15 Thread aurfalien

On Aug 15, 2013, at 11:52 AM, Charles Swiger wrote:

> On Aug 15, 2013, at 11:37 AM, aurfalien  wrote:
>> On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote:
>>> On Aug 15, 2013, at 11:13 AM, aurfalien  wrote:
 Is there a faster way to copy files over NFS?
>>> 
>>> Probably.
>> 
>> Ok, thanks for the specifics.
> 
> You're most welcome.
> 
 Currently breaking up a simple rsync over 7 or so scripts which copies 22 
 dirs having ~500,000 dirs or files each.
>>> 
>>> There's a maximum useful concurrency which depends on how many disk 
>>> spindles and what flavor of RAID is in use; exceeding it will result in 
>>> thrashing the disks and heavily reducing throughput due to competing I/O 
>>> requests.  Try measuring aggregate performance when running fewer rsyncs at 
>>> once and see whether it improves.
>> 
>> Its 35 disks broken into 7 striped RaidZ groups with an SLC based ZIL and no 
>> atime, the server it self has 128GB ECC RAM.  I didn't have time to tune or 
>> really learn ZFS but at this point its only backing up the data for 
>> emergency purposes.
> 
> OK.  If you've got 7 independent groups and can use separate network pipes 
> for each parallel copy, then using 7 simultaneous scripts is likely 
> reasonable.
> 
>>> Of course, putting half a million files into a single directory level is 
>>> also a bad idea, even with dirhash support.  You'd do better to break them 
>>> up into subdirs containing fewer than ~10K files apiece.
>> 
>> I can't, thats our job structure obviously developed by scrip kiddies and 
>> not systems ppl, but I digress.
> 
> Identifying something which is "broken as designed" is still helpful, since 
> it indicates what needs to change.
> 
 Obviously reading all the meta data is a PITA.
>>> 
>>> Yes.
>>> 
 Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.
>>> 
>>> Yeah, probably not-- you're almost certainly I/O bound, not network bound.
>> 
>> Actually it was network bound via 1 rsync process which is why I broke up 
>> 154 dirs into 7 batches of 22 each.
> 
> Oh.  Um, unless you can make more network bandwidth available, you've 
> saturated the bottleneck.
> Doing a single copy task is likely to complete faster than splitting up the 
> job into subtasks in such a case.

Well, using iftop, I am now at least able to get ~1Gb with 7 scripts going, where 
before it was in the 10Ms with 1.

Also, physically looking at my ZFS server, the drive lights are now 
blinking faster, like every second.  Whereas before it was sort of seldom, like 
every 3 seconds or so.

I was thinking of perhaps zipping dirs up and then xferring the files over, but it 
would prolly take as long to zip/unzip.

This bloody project structure we have is nuts.

- aurf


Re: copying millions of small files and millions of dirs

2013-08-15 Thread aurfalien

On Aug 15, 2013, at 11:26 AM, Charles Swiger wrote:

> On Aug 15, 2013, at 11:13 AM, aurfalien  wrote:
>> Is there a faster way to copy files over NFS?
> 
> Probably.

Ok, thanks for the specifics.

>> Currently breaking up a simple rsync over 7 or so scripts which copies 22 
>> dirs having ~500,000 dirs or files each.
> 
> There's a maximum useful concurrency which depends on how many disk spindles 
> and what flavor of RAID is in use; exceeding it will result in thrashing the 
> disks and heavily reducing throughput due to competing I/O requests.  Try 
> measuring aggregate performance when running fewer rsyncs at once and see 
> whether it improves.

It's 35 disks broken into 7 striped RAIDZ groups with an SLC-based ZIL and no 
atime; the server itself has 128GB of ECC RAM.  I didn't have time to tune or 
really learn ZFS, but at this point it's only backing up the data for emergency 
purposes.

> Of course, putting half a million files into a single directory level is also 
> a bad idea, even with dirhash support.  You'd do better to break them up into 
> subdirs containing fewer than ~10K files apiece.

I can't; that's our job structure, obviously developed by script kiddies and not 
systems ppl, but I digress.

>> Obviously reading all the meta data is a PITA.
> 
> Yes.
> 
>> Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.
> 
> Yeah, probably not-- you're almost certainly I/O bound, not network bound.

Actually it was network bound via 1 rsync process which is why I broke up 154 
dirs into 7 batches of 22 each.

I'll have to acquaint myself with ZFS-centric tools to help me determine what's 
going on.

But 
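
On the ZFS-centric tools point, two stock commands give a quick read on
whether the pool is the bottleneck (the 5-second interval is arbitrary):

zpool iostat -v 5    # per-vdev ops and bandwidth, refreshed every 5 seconds
gstat -p             # per-disk busy percentage and queue length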




copying millions of small files and millions of dirs

2013-08-15 Thread aurfalien
Hi all,

Is there a faster way to copy files over NFS?

Currently breaking up a simple rsync into 7 or so scripts, each of which copies 22 
dirs having ~500,000 dirs or files apiece.

Obviously reading all the meta data is a PITA.

Doin 10Gb/jumbos but in this case it don't make much of a hoot of a diff.

Going from a 38TB used, 50TB total BlueArc Titan 3200 to a new shiny 80TB total 
FreeBSD 9.2RC1 ZFS bad boy.

Thanks in advance,

- aurf


