Re: [zfs-discuss] slow zfs send/recv speed

2011-11-17 Thread Richard Elling
On Nov 16, 2011, at 7:35 AM, David Dyer-Bennet wrote:

 
 On Tue, November 15, 2011 17:05, Anatoly wrote:
 Good day,
 
 The speed of send/recv is around 30-60 MBytes/s for the initial send and
 17-25 MBytes/s for incrementals. I have seen lots of setups, from 1 disk
 to 100+ disks in a pool, but the speed doesn't vary to any significant
 degree. As I understand it, 'zfs send' is the limiting factor. I did tests
 by sending to /dev/null. It worked out too slow and absolutely not scalable.
 None of the CPU/memory/disk activity was at peak load, so there is room
 for improvement.
 
 What you're probably seeing with incremental sends is that the disks being
 read are hitting their IOPS limits.  Zfs send does random reads all over
 the place -- every block that's changed since the last incremental send is
 read, in TXG order.  So that's essentially random reads all over the disk.

Not necessarily. I've seen sustained zfs sends in the 600+ MB/sec range
for modest servers. It does depend on how the data is used more than the 
hardware it is stored upon.
 -- richard

-- 

ZFS and performance consulting
http://www.RichardElling.com
LISA '11, Boston, MA, December 4-9


Re: [zfs-discuss] slow zfs send/recv speed

2011-11-16 Thread David Dyer-Bennet

On Tue, November 15, 2011 20:08, Edward Ned Harvey wrote:
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Anatoly

 The speed of send/recv is around 30-60 MBytes/s for initial send and
 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk

 I suggest watching zpool iostat before, during, and after the send to
 /dev/null.  Actually, I take that back - zpool iostat seems to measure
 virtual IOPS, as I just did this on my laptop a minute ago, I saw 1.2k
 ops,
 which is at least 5-6x higher than my hard drive can handle, which can
 only
 mean it's reading a lot of previously aggregated small blocks from disk,
 which are now sequentially organized on disk.  How do you measure physical
 iops?  Is it just regular iostat?  I have seriously put zero effort into
 answering this question (sorry.)

 I have certainly noticed a delay in the beginning, while the system thinks
 about stuff for a little while to kick off an incremental... And it's
 acknowledged and normal that incrementals are likely fragmented all over
 the
 place so you could be IOPS limited (hence watching the iostat).

 Also, whenever I sit and watch it for long times, I see that it varies
 enormously.  For 5 minutes it will be (some speed), and for 5 minutes it
 will be 5x higher...

 Whatever it is, it's something we likely are all seeing, but probably just
 ignoring.  If you can find it in your heart to just ignore it too, then
 great, no problem.  ;-)  Otherwise, it's a matter of digging in and
 characterizing to learn more about it.

I see rather variable io stats while sending incremental backups.  The
receiver is a USB disk, so fairly slow, but I get 30MB/s in a good
stretch.  I'm compressing the ZFS filesystem on the receiving end, but
much of my content is already-compressed photo files, so it doesn't make a
huge difference.   Helps some, though, and at 30MB/s there's no shortage
of CPU horsepower to handle the compression.
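
(A minimal sketch of that receive-side setup, with hypothetical pool/dataset
names - compression just needs to be set, or inherited, before the stream lands:

  # zfs set compression=on backuppool              # children inherit it
  # zfs send sourcepool/photos@snap | zfs recv backuppool/photos

Already-compressed raw/JPEG data won't shrink much, but sidecar XMP and
catalog files compress nicely.)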

The raw files are around 12MB each, probably not fragmented much (they're
just copied over from memory cards).  For a small number of the files,
there's a photoshop file that's much bigger (sometimes more than 1GB, if
it's a stitched panorama with layers of changes).  And then there are
sidecar XMP files, mostly two per image, and for most of them
web-resolution images of around 100kB.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] slow zfs send/recv speed

2011-11-16 Thread David Dyer-Bennet

On Tue, November 15, 2011 17:05, Anatoly wrote:
 Good day,

 The speed of send/recv is around 30-60 MBytes/s for the initial send and
 17-25 MBytes/s for incrementals. I have seen lots of setups, from 1 disk
 to 100+ disks in a pool, but the speed doesn't vary to any significant
 degree. As I understand it, 'zfs send' is the limiting factor. I did tests
 by sending to /dev/null. It worked out too slow and absolutely not scalable.
 None of the CPU/memory/disk activity was at peak load, so there is room
 for improvement.

What you're probably seeing with incremental sends is that the disks being
read are hitting their IOPS limits.  Zfs send does random reads all over
the place -- every block that's changed since the last incremental send is
read, in TXG order.  So that's essentially random reads all over the disk.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] slow zfs send/recv speed

2011-11-16 Thread Anatoly

Good day,

I've just made a clean test of sequential data read. The system has 45
mirror vdevs.

1. Create a 160GB random file.
2. Read it to /dev/null.
3. Take a snapshot and send it to /dev/null.
4. Compare the results.

1. Write speed is slow due to 'urandom':
# dd if=/dev/urandom bs=128k | pv > big_file
161118683136 bytes (161 GB) copied, 3962.15 seconds, 40.7 MB/s

2. Read file normally:
# time dd if=./big_file bs=128k of=/dev/null
161118683136 bytes (161 GB) copied, 103.455 seconds, 1.6 GB/s
real 1m43.459s
user 0m0.899s
sys 1m25.078s

3. Snapshot & send:
# zfs snapshot volume/test@A
# time zfs send volume/test@A > /dev/null
real 7m20.635s
user 0m0.004s
sys 0m52.760s

4. As you can see, there is a 4x difference on a pure sequential read,
under greenhouse conditions.
I repeated the tests a couple of times to check ARC influence - not much
difference.
Real send speed on this system is around 60 MBytes/s, with peaks around 100.
Plain file reads scale well with a large number of disks, but
'zfs send' is lame.

Under normal conditions, moving large amounts of data may take days to
weeks. It can't fill a 10G Ethernet connection, sometimes not even 1G.

Best regards,
Anatoly Legkodymov.

On 16.11.2011 06:08, Edward Ned Harvey wrote:

  
From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Anatoly

The speed of send/recv is around 30-60 MBytes/s for initial send and
17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk

  
  
I suggest watching zpool iostat before, during, and after the send to
/dev/null.  Actually, I take that back - zpool iostat seems to measure
virtual IOPS, as I just did this on my laptop a minute ago, I saw 1.2k ops,
which is at least 5-6x higher than my hard drive can handle, which can only
mean it's reading a lot of previously aggregated small blocks from disk,
which are now sequentially organized on disk.  How do you measure physical
iops?  Is it just regular iostat?  I have seriously put zero effort into
answering this question (sorry.)

I have certainly noticed a delay in the beginning, while the system thinks
about stuff for a little while to kick off an incremental... And it's
acknowledged and normal that incrementals are likely fragmented all over the
place so you could be IOPS limited (hence watching the iostat).

Also, whenever I sit and watch it for long times, I see that it varies
enormously.  For 5 minutes it will be (some speed), and for 5 minutes it
will be 5x higher...

Whatever it is, it's something we likely are all seeing, but probably just
ignoring.  If you can find it in your heart to just ignore it too, then
great, no problem.  ;-)  Otherwise, it's a matter of digging in and
characterizing to learn more about it.


Re: [zfs-discuss] slow zfs send/recv speed

2011-11-16 Thread Paul Kraus
On Wed, Nov 16, 2011 at 11:07 AM, Anatoly legko...@fastmail.fm wrote:

 I've just made a clean test of sequential data read. The system has 45 mirror
 vdevs.

 1. Create a 160GB random file.
 2. Read it to /dev/null.
 3. Take a snapshot and send it to /dev/null.
 4. Compare the results.

What OS?

The following is under Solaris 10U9 with CPU_2010-10 + an IDR for a
SAS/SATA drive bug.

I just had to replicate over 20TB of small files, `zfs send -R
zfs@snap | zfs recv -e zfs`, and I got an AVERAGE throughput of
over 77MB/sec. (over 6TB /day). The entire replication took just over
3 days.

The source zpool is on J4400 750GB SATA drives, 110 of them in a
RAIDz2 configuration (22 vdevs of 5 disks each), the target was a pair
of old h/w raid boxes (one without any NVRAM cache) and a zpool
configuration of 6 striped vdevs (a total of 72 drives behind the h/w
raid controller doing raid5, this is temporary and only for moving
data physically around, so the lack of ZFS redundancy is not an
issue).

There are over 2300 snapshots on the source side and we were
replicating close to 2000 of them.

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
- Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
- Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
- Technical Advisor, RPI Players


Re: [zfs-discuss] slow zfs send/recv speed

2011-11-16 Thread Eric D. Mudama

On Wed, Nov 16 at  9:35, David Dyer-Bennet wrote:


On Tue, November 15, 2011 17:05, Anatoly wrote:

Good day,

The speed of send/recv is around 30-60 MBytes/s for the initial send and
17-25 MBytes/s for incrementals. I have seen lots of setups, from 1 disk
to 100+ disks in a pool, but the speed doesn't vary to any significant
degree. As I understand it, 'zfs send' is the limiting factor. I did tests
by sending to /dev/null. It worked out too slow and absolutely not scalable.
None of the CPU/memory/disk activity was at peak load, so there is room
for improvement.


What you're probably seeing with incremental sends is that the disks being
read are hitting their IOPS limits.  Zfs send does random reads all over
the place -- every block that's changed since the last incremental send is
read, in TXG order.  So that's essentially random reads all over the disk.


Anatoly didn't state whether his 160GB file test was done on a virgin
pool, or whether it was allocated out of an existing pool.  If the
latter, your comment is the likely explanation.  If the former, your
comment wouldn't explain the slow performance.

--eric

--
Eric D. Mudama
edmud...@bounceswoosh.org



Re: [zfs-discuss] slow zfs send/recv speed

2011-11-16 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Anatoly
 
 I've just made a clean test of sequential data read. The system has 45 mirror
 vdevs.

90 disks in the system...  I bet you have a lot of ram?


 2. Read file normally:
 # time dd if=./big_file bs=128k of=/dev/null
 161118683136 bytes (161 GB) copied, 103.455 seconds, 1.6 GB/s

I wonder how much of that is being read back from cache.  Would it be
impossible to reboot, or otherwise invalidate the cache, before reading the
file back?
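
(Two ways to take the ARC out of the picture short of a reboot, sketched
against the dataset names used in the test above:

  # zpool export volume ; zpool import volume       # exporting evicts the pool's cached data
or
  # zfs set primarycache=metadata volume/test       # stop caching file data for this dataset

then re-run the dd read and compare.)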

With 90 disks, in theory, you should be able to read something like 90Gbit =
11GB / sec.  But of course various bus speed bottlenecks come into play, so
I don't think the 1.6GB/s is unrealistically high in any way.


 3. Snapshot  send:
 # zfs snapshot volume/test@A
 # time zfs send volume/test@A > /dev/null
 real    7m20.635s
 user    0m0.004s
 sys 0m52.760s

This doesn't surprise me; based on gut feel, I don't think zfs send performs
optimally in general.

I think your results are probably correct, and even if you revisit all this,
doing the reboots (or cache invalidation) and/or using a newly created pool,
as anyone here might suggest...  I think you'll still see the same results.
Somewhat unpredictably.

Even so, I always find zfs send performance still beats the pants off any
alternative... rsync and whatnot.



[zfs-discuss] slow zfs send/recv speed

2011-11-15 Thread Anatoly

Good day,

The speed of send/recv is around 30-60 MBytes/s for the initial send and
17-25 MBytes/s for incrementals. I have seen lots of setups, from 1 disk
to 100+ disks in a pool, but the speed doesn't vary to any significant
degree. As I understand it, 'zfs send' is the limiting factor. I did tests
by sending to /dev/null. It worked out too slow and absolutely not scalable.
None of the CPU/memory/disk activity was at peak load, so there is room
for improvement.
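
(One way to watch the raw send rate on its own, as a sketch with placeholder
pool/filesystem/snapshot names:

# zfs send pool/fs@snap1 | pv > /dev/null
# zfs send -i pool/fs@snap1 pool/fs@snap2 | pv > /dev/null

This keeps the receive side and the network out of the path entirely.)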


Is there any bug report or article that addresses this problem? Any 
workaround or solution?


I found these guys have the same result - around 7 MBytes/s for 'send'
and 70 MBytes/s for 'recv'.

http://wikitech-static.wikimedia.org/articles/z/f/s/Zfs_replication.html

Thank you in advance,
Anatoly Legkodymov.


Re: [zfs-discuss] slow zfs send/recv speed

2011-11-15 Thread Andrew Gabriel

 On 11/15/11 23:05, Anatoly wrote:

Good day,

The speed of send/recv is around 30-60 MBytes/s for the initial send and
17-25 MBytes/s for incrementals. I have seen lots of setups, from 1 disk
to 100+ disks in a pool, but the speed doesn't vary to any significant
degree. As I understand it, 'zfs send' is the limiting factor. I did tests
by sending to /dev/null. It worked out too slow and absolutely not scalable.
None of the CPU/memory/disk activity was at peak load, so there is room
for improvement.


Is there any bug report or article that addresses this problem? Any 
workaround or solution?


 I found these guys have the same result - around 7 MBytes/s for 'send'
 and 70 MBytes/s for 'recv'.

http://wikitech-static.wikimedia.org/articles/z/f/s/Zfs_replication.html


Well, if I do a zfs send/recv over 1Gbit ethernet from a 2 disk mirror, 
the send runs at almost 100Mbytes/sec, so it's pretty much limited by 
the ethernet.


Since you have provided none of the diagnostic data you collected, it's 
difficult to guess what the limiting factor is for you.
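
(As a sketch of the kind of data that would help, collected while the send
is running - the pool name here is a placeholder:

# zpool iostat -v pool 5     virtual-layer ops and bandwidth per vdev
# iostat -xn 5               physical per-device IOPS, service times, %b
# prstat -mL 5               per-thread CPU usage

A single thread pinned near 100% in prstat would point at zfs send itself.)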


--
Andrew Gabriel


Re: [zfs-discuss] slow zfs send/recv speed

2011-11-15 Thread Tim Cook
On Tue, Nov 15, 2011 at 5:17 PM, Andrew Gabriel
andrew.gabr...@oracle.com wrote:

  On 11/15/11 23:05, Anatoly wrote:

 Good day,

 The speed of send/recv is around 30-60 MBytes/s for the initial send and
 17-25 MBytes/s for incrementals. I have seen lots of setups, from 1 disk
 to 100+ disks in a pool, but the speed doesn't vary to any significant
 degree. As I understand it, 'zfs send' is the limiting factor. I did tests
 by sending to /dev/null. It worked out too slow and absolutely not scalable.
 None of the CPU/memory/disk activity was at peak load, so there is room
 for improvement.

 Is there any bug report or article that addresses this problem? Any
 workaround or solution?

 I found these guys have the same result - around 7 MBytes/s for 'send'
 and 70 MBytes/s for 'recv'.
 http://wikitech-static.wikimedia.org/articles/z/f/s/Zfs_replication.html


 Well, if I do a zfs send/recv over 1Gbit ethernet from a 2 disk mirror,
 the send runs at almost 100Mbytes/sec, so it's pretty much limited by the
 ethernet.

 Since you have provided none of the diagnostic data you collected, it's
 difficult to guess what the limiting factor is for you.

 --
 Andrew Gabriel



So all the bugs have been fixed?  I seem to recall people on this mailing
list using mbuffer to speed it up because it was so bursty and slow at one
point.  I.e.:
http://blogs.everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/


--Tim


Re: [zfs-discuss] slow zfs send/recv speed

2011-11-15 Thread Eric D. Mudama

On Wed, Nov 16 at  3:05, Anatoly wrote:

Good day,

The speed of send/recv is around 30-60 MBytes/s for the initial send and
17-25 MBytes/s for incrementals. I have seen lots of setups, from 1
disk to 100+ disks in a pool, but the speed doesn't vary to any
significant degree. As I understand it, 'zfs send' is the limiting
factor. I did tests by sending to /dev/null. It worked out too slow
and absolutely not scalable.
None of the CPU/memory/disk activity was at peak load, so there is
room for improvement.


My belief is that the difference between initial and incremental speed comes
from how efficiently the data is laid out in the pool in each case, not from
something inherent in the send/recv process itself.

There are various send/recv improvements (e.g. don't use SSH as a
tunnel) but even that shouldn't be capping you at 17MBytes/sec.
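
(The usual sketch for taking SSH out of the path is a raw TCP pipe, e.g.
with netcat - host, port and dataset names below are placeholders, exact
nc flags vary between netcat flavors, and the stream is unencrypted, so
trusted networks only:

receiver# nc -l 9090 | zfs recv -F tank/backup
sender#   zfs send -i tank/fs@a tank/fs@b | nc receiver 9090

That removes the ssh cipher overhead but, as noted, it shouldn't be what
caps things at 17MBytes/sec.)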

My incrementals get me ~35MB/s consistently.  Each incremental is
10-50GB worth of transfer.

cheap gig switch, no jumbo frames
Source = 2 mirrored vdevs + l2arc ssd, CPU = xeon E5520, 6GB RAM
Destination = 4-drive raidz1, CPU = c2d E4500 @2.2GHz, 2GB RAM
tunnel is un-tuned SSH

I found these guys have the same result - around 7 MBytes/s for
'send' and 70 MBytes/s for 'recv'.

http://wikitech-static.wikimedia.org/articles/z/f/s/Zfs_replication.html


Their data doesn't match mine.

--
Eric D. Mudama
edmud...@bounceswoosh.org



Re: [zfs-discuss] slow zfs send/recv speed

2011-11-15 Thread Andrew Gabriel

 On 11/15/11 23:40, Tim Cook wrote:
On Tue, Nov 15, 2011 at 5:17 PM, Andrew Gabriel
andrew.gabr...@oracle.com wrote:


 On 11/15/11 23:05, Anatoly wrote:

Good day,

The speed of send/recv is around 30-60 MBytes/s for the initial
send and 17-25 MBytes/s for incrementals. I have seen lots of
setups, from 1 disk to 100+ disks in a pool, but the speed
doesn't vary to any significant degree. As I understand it,
'zfs send' is the limiting factor. I did tests by sending to
/dev/null. It worked out too slow and absolutely not scalable.
None of the CPU/memory/disk activity was at peak load, so there
is room for improvement.

Is there any bug report or article that addresses this
problem? Any workaround or solution?

I found these guys have the same result - around 7 MBytes/s
for 'send' and 70 MBytes/s for 'recv'.
http://wikitech-static.wikimedia.org/articles/z/f/s/Zfs_replication.html


Well, if I do a zfs send/recv over 1Gbit ethernet from a 2 disk
mirror, the send runs at almost 100Mbytes/sec, so it's pretty much
limited by the ethernet.

Since you have provided none of the diagnostic data you collected,
it's difficult to guess what the limiting factor is for you.

-- 
Andrew Gabriel




So all the bugs have been fixed?


Probably not, but the OP's implication that zfs send has a specific rate 
limit in the range suggested is demonstrably untrue. So I don't know 
what's limiting the OP's send rate. (I could guess a few possibilities, 
but that's pointless without the data.)


I seem to recall people on this mailing list using mbuffer to speed it
up because it was so bursty and slow at one point.  I.e.:

http://blogs.everycity.co.uk/alasdair/2010/07/using-mbuffer-to-speed-up-slow-zfs-send-zfs-receive/



Yes, this idea originally came from me, having analyzed the send/receive 
traffic behavior in combination with network connection behavior. 
However, it's the receive side that's bursty around the TXG commits, not 
the send side, so that doesn't match the issue the OP is seeing. (The 
buffer sizes in that blog are not optimal, although any buffer at the 
receive side will make a significant improvement if the network 
bandwidth is the same order of magnitude as what the send/recv are capable of.)
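
(For reference, the shape of that setup, with placeholder names and with
buffer sizes that would need tuning as discussed:

receiver# mbuffer -s 128k -m 1G -I 9090 | zfs recv -F tank/backup
sender#   zfs send tank/fs@snap | mbuffer -s 128k -m 1G -O receiver:9090

The receive-side buffer is the one doing the real work, absorbing the
stalls around TXG commits.)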


--
Andrew Gabriel


Re: [zfs-discuss] slow zfs send/recv speed

2011-11-15 Thread Ian Collins

On 11/16/11 01:01 PM, Eric D. Mudama wrote:

On Wed, Nov 16 at  3:05, Anatoly wrote:

Good day,

The speed of send/recv is around 30-60 MBytes/s for the initial send
and 17-25 MBytes/s for incrementals. I have seen lots of setups, from
1 disk to 100+ disks in a pool, but the speed doesn't vary to any
significant degree. As I understand it, 'zfs send' is the limiting
factor. I did tests by sending to /dev/null. It worked out too slow
and absolutely not scalable.
None of the CPU/memory/disk activity was at peak load, so there is
room for improvement.

My belief is that the difference between initial and incremental speed comes
from how efficiently the data is laid out in the pool in each case, not from
something inherent in the send/recv process itself.

There are various send/recv improvements (e.g. don't use SSH as a
tunnel) but even that shouldn't be capping you at 17MBytes/sec.

My incrementals get me ~35MB/s consistently.  Each incremental is
10-50GB worth of transfer.


While my incremental sizes are much smaller, the rates I see for dense
incrementals (large blocks of changes, such as media files) are about the
same.  I do see much lower rates for more scattered changes (such as
filesystems with documents).


--
Ian.



Re: [zfs-discuss] slow zfs send/recv speed

2011-11-15 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Anatoly
 
 The speed of send/recv is around 30-60 MBytes/s for initial send and
 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk

I suggest watching zpool iostat before, during, and after the send to
/dev/null.  Actually, I take that back - zpool iostat seems to measure
virtual IOPS, as I just did this on my laptop a minute ago, I saw 1.2k ops,
which is at least 5-6x higher than my hard drive can handle, which can only
mean it's reading a lot of previously aggregated small blocks from disk,
which are now sequentially organized on disk.  How do you measure physical
iops?  Is it just regular iostat?  I have seriously put zero effort into
answering this question (sorry.)
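
(As a sketch of an answer: on Solaris-style systems plain iostat does report
per-device physical operations, e.g.

# iostat -xn 5

where the r/s and w/s columns are physical reads/writes per second for each
device, and can be compared against the ops columns of zpool iostat to get
a feel for the virtual-to-physical ratio.)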

I have certainly noticed a delay in the beginning, while the system thinks
about stuff for a little while to kick off an incremental... And it's
acknowledged and normal that incrementals are likely fragmented all over the
place so you could be IOPS limited (hence watching the iostat).

Also, whenever I sit and watch it for long times, I see that it varies
enormously.  For 5 minutes it will be (some speed), and for 5 minutes it
will be 5x higher...

Whatever it is, it's something we likely are all seeing, but probably just
ignoring.  If you can find it in your heart to just ignore it too, then
great, no problem.  ;-)  Otherwise, it's a matter of digging in and
characterizing to learn more about it.
