Re: [zfs-discuss] slow zfs send/recv speed

2011-11-16 Thread Edward Ned Harvey
> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
> boun...@opensolaris.org] On Behalf Of Anatoly
> 
> I've just made a clean test of sequential data reads. The system has 45 mirror
> vdevs.

90 disks in the system...  I bet you have a lot of ram?


> 2. Read file normally:
> # time dd if=./big_file bs=128k of=/dev/null
> 161118683136 bytes (161 GB) copied, 103.455 seconds, 1.6 GB/s

I wonder how much of that is being read back from cache.  Would it be
possible to reboot, or otherwise invalidate the cache, before reading the
file back?
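
(A minimal sketch of one way to do that without a full reboot, assuming the
pool is named 'volume' as in your commands; untested, and you'd need to cd
off the pool first or the export will refuse with a busy error:)

# zpool export volume
# zpool import volume
# time dd if=./big_file bs=128k of=/dev/null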

With 90 disks, each good for roughly 1 Gbit/s of sequential throughput, in
theory you should be able to read something like 90 Gbit/s, which is about
11 GB/sec.  But of course various bus speed bottlenecks come into play, so
I don't think the 1.6 GB/s is unrealistically high in any way.


> 3. Snapshot & send:
> # zfs snapshot volume/test@A
> # time zfs send volume/test@A > /dev/null
> real    7m20.635s
> user    0m0.004s
> sys 0m52.760s

This doesn't surprise me.  Based on gut feel, I don't think zfs send performs
optimally in general.

I think your results are probably correct.  Even if you revisit all this,
doing the reboots (or cache invalidation) and/or using a newly created pool,
as anyone here might suggest, I think you'll still see the same results,
somewhat unpredictably.

Even so, I always find zfs send performance still beats the pants off any
alternative... rsync and whatnot.
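
For a rough comparison on real data, something like the following (hypothetical
snapshot and host names, just a sketch):

# time zfs send -i volume/test@A volume/test@B | ssh backuphost zfs recv -F tank/test
# time rsync -a /volume/test/ backuphost:/tank/test/

With lots of small changed files the rsync run usually loses badly, because it
has to walk and stat the whole tree, while the incremental send only touches
blocks that changed between the two snapshots.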

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] how to set up solaris os and cache within one SSD

2011-11-16 Thread Gregg Wonderly

On 11/10/2011 7:42 AM, Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of darkblue

1 * XEON 5606
1 * supermirco X8DT3-LN4F
6 * 4G RECC RAM
22 * WD RE3 1T harddisk
4 * intel 320 (160G) SSD
1 * supermicro 846E1-900B chassis

I just want to say, this isn't supported hardware, and although many people 
will say they do this without problem, I've heard just as many people 
(including myself) saying it's unstable that way.

I recommend buying either the oracle hardware, or running nexenta on whatever 
hardware they recommend.

Definitely DO NOT run the free version of solaris without updates and expect it 
to be reliable.  But that's a separate issue.  I'm also emphasizing that even 
if you pay for solaris support on non-oracle hardware, don't expect it to be 
great.  But maybe it will be.
I think the key issue here is whether this hardware will corrupt a pool or 
not.  Ultimately, the promise of ZFS, for me anyways, is that I can take disks 
to new hardware if/when needed.  I am not dependent on a controller or 
motherboard which provides some feature key to access the data on the disks.


Companies which sell key software that you depend on working generally have 
proven that software to work reliably on the hardware which they sell to make 
use of said software.


Apple's business model and success, for example, are based on this fact, because 
they have a much smaller bug pool to consider.  Oracle hardware works out the 
same way.


I think supporting the development of ZFS is key to the next generation of 
storage solutions...  But, I don't need the class of hardware that Oracle wants 
me to pay for.  I need disks with 24/7 reliability.  I can wait till tomorrow to 
store something onto my server from my laptop/desktop.  Consumer/non-enterprise 
needs are quite different, and I don't think Oracle understands how to deal in 
the 1,000,000,000 potential customer marketplace.   They've had a hard enough 
time just working in the 100,000 customer marketplace.


Gregg
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow zfs send/recv speed

2011-11-16 Thread Eric D. Mudama

On Wed, Nov 16 at  9:35, David Dyer-Bennet wrote:


On Tue, November 15, 2011 17:05, Anatoly wrote:

Good day,

The speed of send/recv is around 30-60 MBytes/s for initial send and
17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk
to 100+ disks in the pool, but the speed doesn't vary to any great degree.
As I understand it, 'zfs send' is the limiting factor. I did tests by
sending to /dev/null. It worked out too slow and absolutely not scalable.
None of the cpu/memory/disk activity was at peak load, so there is room
for improvement.


What you're probably seeing with incremental sends is that the disks being
read are hitting their IOPS limits.  zfs send does random reads all over
the place -- every block that's changed since the last incremental send is
read, in TXG order.  So that's essentially random reads all over the disk.


Anatoly didn't state whether his 160GB file test was done on a virgin
pool, or whether it was allocated out of an existing pool.  If the
latter, your comment is the likely explanation.  If the former, your
comment wouldn't explain the slow performance.

--eric

--
Eric D. Mudama
edmud...@bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow zfs send/recv speed

2011-11-16 Thread Paul Kraus
On Wed, Nov 16, 2011 at 11:07 AM, Anatoly  wrote:

> I've just made a clean test of sequential data reads. The system has 45 mirror
> vdevs.
>
> 1. Create 160GB random file.
> 2. Read it to /dev/null.
> 3. Take a snapshot and send it to /dev/null.
> 4. Compare results.

What OS?

The following is under Solaris 10U9 with CPU_2010-10 + an IDR for a
SAS/SATA drive bug.

I just had to replicate over 20TB of small files, `zfs send -R
 | zfs recv -e `, and I got an AVERAGE throughput of
over 77 MB/sec (over 6 TB/day). The entire replication took just over
3 days.

The source zpool is on J4400 750GB SATA drives, 110 of them in a
RAIDz2 configuration (22 vdevs of 5 disks each). The target was a pair
of old h/w raid boxes (one without any NVRAM cache) with a zpool
configuration of 6 striped vdevs (a total of 72 drives behind the h/w
raid controllers doing raid5; this is temporary and only for moving
data physically around, so the lack of ZFS redundancy is not an
issue).

There are over 2300 snapshots on the source side and we were
replicating close to 2000 of them.
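
For anyone wanting to reproduce that kind of replication, the shape of the
pipeline was roughly the following (hypothetical pool and snapshot names,
since the real ones are trimmed above):

# zfs snapshot -r srcpool@migrate
# zfs send -R srcpool@migrate | zfs recv -e dstpool

The -R flag sends the whole dataset tree below that snapshot, including the
existing snapshots and properties, which is what carries the ~2000 snapshots
across in one pass.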

-- 
{1-2-3-4-5-6-7-}
Paul Kraus
-> Senior Systems Architect, Garnet River ( http://www.garnetriver.com/ )
-> Sound Coordinator, Schenectady Light Opera Company (
http://www.sloctheater.org/ )
-> Technical Advisor, RPI Players
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow zfs send/recv speed

2011-11-16 Thread Anatoly

Good day,

I've just made a clean test of sequential data reads. The system has 45
mirror vdevs.

1. Create 160GB random file.
2. Read it to /dev/null.
3. Take a snapshot and send it to /dev/null.
4. Compare results.

1. Write speed is slow due to 'urandom' (a workaround sketch follows the
results below):
# dd if=/dev/urandom bs=128k | pv > big_file
161118683136 bytes (161 GB) copied, 3962.15 seconds, 40.7 MB/s

2. Read file normally:
# time dd if=./big_file bs=128k of=/dev/null
161118683136 bytes (161 GB) copied, 103.455 seconds, 1.6 GB/s
real    1m43.459s
user    0m0.899s
sys 1m25.078s

3. Snapshot & send:
# zfs snapshot volume/test@A
# time zfs send volume/test@A > /dev/null
real    7m20.635s
user    0m0.004s
sys 0m52.760s

4. As you can see, there is a 4x difference even for a pure sequential
read, under greenhouse conditions.
I repeated the tests a couple of times to check ARC influence - not much
difference.
Real send speed on this system is around 60 MBytes/s, with peaks around 100.
File reads scale well with a large number of disks, but 'zfs send' is lame.
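
(Workaround sketch for the step-1 bottleneck, untested: generate one random
chunk and concatenate copies of it. Each 128k block is still incompressible
on its own, but don't do this on a pool with dedup enabled.)

# dd if=/dev/urandom of=/tmp/chunk bs=128k count=8192
# i=0; while [ $i -lt 150 ]; do cat /tmp/chunk; i=`expr $i + 1`; done > big_file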

Under normal conditions, moving large amounts of data may take days to
weeks. It can't fill a 10G Ethernet connection, sometimes not even 1G.

Best regards,
Anatoly Legkodymov.

On 16.11.2011 06:08, Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Anatoly

The speed of send/recv is around 30-60 MBytes/s for initial send and
17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk

I suggest watching zpool iostat before, during, and after the send to
/dev/null.  Actually, I take that back - zpool iostat seems to measure
virtual IOPS.  I just did this on my laptop a minute ago and saw 1.2k ops,
which is at least 5-6x higher than my hard drive can handle, which can only
mean it's reading a lot of previously aggregated small blocks that are now
sequentially organized on disk.  How do you measure physical iops?  Is it
just regular iostat?  I have seriously put zero effort into answering this
question (sorry.)

I have certainly noticed a delay in the beginning, while the system thinks
about stuff for a little while to kick off an incremental... And it's
acknowledged and normal that incrementals are likely fragmented all over the
place so you could be IOPS limited (hence watching the iostat).

Also, whenever I sit and watch it for long times, I see that it varies
enormously.  For 5 minutes it will be (some speed), and for 5 minutes it
will be 5x higher...

Whatever it is, it's something we likely are all seeing, but probably just
ignoring.  If you can find it in your heart to just ignore it too, then
great, no problem.  ;-)  Otherwise, it's a matter of digging in and
characterizing to learn more about it.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow zfs send/recv speed

2011-11-16 Thread David Dyer-Bennet

On Tue, November 15, 2011 17:05, Anatoly wrote:
> Good day,
>
> The speed of send/recv is around 30-60 MBytes/s for initial send and
> 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk
> to 100+ disks in the pool, but the speed doesn't vary to any great degree.
> As I understand it, 'zfs send' is the limiting factor. I did tests by
> sending to /dev/null. It worked out too slow and absolutely not scalable.
> None of the cpu/memory/disk activity was at peak load, so there is room
> for improvement.

What you're probably seeing with incremental sends is that the disks being
read are hitting their IOPS limits.  zfs send does random reads all over
the place -- every block that's changed since the last incremental send is
read, in TXG order.  So that's essentially random reads all over the disk.
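
A quick way to confirm that on the sending box is to watch the physical disks
(not the pool) while the incremental send runs, e.g. something like:

# iostat -xn 5

If %b sits near 100 and asvc_t climbs while kr/s stays modest, the disks are
out of IOPS rather than out of bandwidth.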

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Remove corrupt files from snapshot

2011-11-16 Thread David Dyer-Bennet

On Tue, November 15, 2011 10:07, sbre...@hotmail.com wrote:


> Would it make sense to do "zfs scrub" regularly and have a report sent,
> i.e. once a day, so discrepancy would be noticed beforehand? Is there
> anything readily available in the Freebsd ZFS package for this?

If you're not scrubbing regularly, you're losing out on one of the key
benefits of ZFS.  In nearly all fileserver situations, a good amount of
the content is essentially archival, infrequently accessed but important
now and then.  (In my case it's my collection of digital and digitized
photos.)

A weekly scrub combined with a decent backup plan will detect bit-rot
before the backups with the correct data cycle into the trash (and, with
redundant storage like mirroring or RAID, the scrub will probably be able
to fix the error without resorting to restoring files from backup).
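
As for automating it on FreeBSD, a minimal sketch of a root crontab (assuming
a pool called 'tank' and that cron mails command output to root; the
periodic(8) framework also has ZFS-related knobs in /etc/defaults/periodic.conf
if you prefer that route):

# scrub early Sunday, mail a health check every morning
0 3 * * 0   /sbin/zpool scrub tank
0 8 * * *   /sbin/zpool status -x

'zpool status -x' prints a single "all pools are healthy" line when nothing
is wrong, so the daily mail doubles as the report.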
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] slow zfs send/recv speed

2011-11-16 Thread David Dyer-Bennet

On Tue, November 15, 2011 20:08, Edward Ned Harvey wrote:
>> From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
>> boun...@opensolaris.org] On Behalf Of Anatoly
>>
>> The speed of send/recv is around 30-60 MBytes/s for initial send and
>> 17-25 MBytes/s for incremental. I have seen lots of setups with 1 disk
>
> I suggest watching zpool iostat before, during, and after the send to
> /dev/null.  Actually, I take that back - zpool iostat seems to measure
> virtual IOPS.  I just did this on my laptop a minute ago and saw 1.2k ops,
> which is at least 5-6x higher than my hard drive can handle, which can only
> mean it's reading a lot of previously aggregated small blocks that are now
> sequentially organized on disk.  How do you measure physical iops?  Is it
> just regular iostat?  I have seriously put zero effort into answering this
> question (sorry.)
>
> I have certainly noticed a delay in the beginning, while the system thinks
> about stuff for a little while to kick off an incremental... And it's
> acknowledged and normal that incrementals are likely fragmented all over
> the
> place so you could be IOPS limited (hence watching the iostat).
>
> Also, whenever I sit and watch it for long times, I see that it varies
> enormously.  For 5 minutes it will be (some speed), and for 5 minutes it
> will be 5x higher...
>
> Whatever it is, it's something we likely are all seeing, but probably just
> ignoring.  If you can find it in your heart to just ignore it too, then
> great, no problem.  ;-)  Otherwise, it's a matter of digging in and
> characterizing to learn more about it.

I see rather variable io stats while sending incremental backups.  The
receiver is a USB disk, so fairly slow, but I get 30MB/s in a good
stretch.  I'm compressing the ZFS filesystem on the receiving end, but
much of my content is already-compressed photo files, so it doesn't make a
huge difference.   Helps some, though, and at 30MB/s there's no shortage
of CPU horsepower to handle the compression.
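
For anyone copying that setup, the shape of it is roughly the following
(hypothetical pool/dataset/snapshot names; compression is set once on the
receiving dataset and new filesystems received under it inherit it):

# zfs create -o compression=on usbpool/backup
# zfs send tank/photos@snap1 | zfs recv usbpool/backup/photos
# zfs send -i @snap1 tank/photos@snap2 | zfs recv -F usbpool/backup/photos

Already-compressed JPEGs won't shrink much, as noted, but the XMP sidecars
and layered Photoshop files usually do.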

The raw files are around 12MB each, probably not fragmented much (they're
just copied over from memory cards).  For a small number of the files,
there's a Photoshop file that's much bigger (sometimes more than 1GB, if
it's a stitched panorama with layers of changes).  And then there are
sidecar XMP files, mostly two per image, and for most of them a
web-resolution image of around 100kB.

-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss