Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-09 Thread Keith Bierman

On Oct 8, 2008, at 4:27 PM, Jim Dunham wrote:
 ... a single Solaris node cannot be both
 the primary and secondary node.

 If one wants this type of mirror functionality on a single node, use
 host based or controller based mirroring software.


If one is running multiple zones, couldn't you fool AVS into thinking  
that one zone was the primary and the other the secondary?
-- 
Keith H. Bierman   [EMAIL PROTECTED]  | AIM kbiermank
5430 Nassau Circle East  |
Cherry Hills Village, CO 80113   | 303-997-2749
speaking for myself* Copyright 2008




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-08 Thread Moore, Joe
Brian Hechinger wrote:
 On Mon, Oct 06, 2008 at 10:47:04AM -0400, Moore, Joe wrote:
 
  I wonder if an AVS-replicated storage device on the
 backends would be appropriate?
 
  write - ZFS-mirrored slog - ramdisk -AVS- physical disk
                           \
                            +-iscsi- ramdisk -AVS- physical disk
 
  You'd get the continuous replication of the ramdisk to
 physical drive (and perhaps automagic recovery on reboot) but
 not pay the synchronous write to remote physical disk penalty.

 It looks like the answer is no.

 [EMAIL PROTECTED] sudo sndradm -e localhost \
     /dev/rramdisk/avstest1 /dev/zvol/rdsk/SYS0/bitmap1 \
     wintermute /dev/zvol/dsk/SYS0/avstest2 \
     /dev/zvol/rdsk/SYS0/bitmap2 ip async
 Enable Remote Mirror? (Y/N) [N]: y
 sndradm: Error: both localhost and wintermute are local

I've not worked with AVS other than looking at the basic concepts, but to me 
this looks like a don't-shoot-yourself-in-the-foot critical warning rather than 
an actual functionality restriction.  Is there a -force option to override this 
normally quite reasonable sanity check?

--Joe


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-08 Thread Wilkinson, Alex

On Sat, Oct 04, 2008 at 10:37:26PM -0700, Chris Greer wrote: 

The big thing here is I ended up getting a MASSIVE boost in
performance even with the overhead of the 1Gb link and iSCSI.
The iorate test I was using went from 3073 IOPS on 90% sequential
writes to 23953 IOPS with the RAM slog added.  The service time 
was also significantly better than the physical disk.

Curious, what tool did you use to benchmark your IOPS?

 -aW





Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-08 Thread Brian Hechinger
On Wed, Oct 08, 2008 at 08:50:57AM -0400, Moore, Joe wrote:
 
 I've not worked with AVS other than looking at the basic concepts, but to me 
 this looks like a dont-shoot-yourself-in-the-foot critical warning rather 
 than an actual functionality restriction.  Is there a -force option to 
 override this normally quite reasonable sanity check?

There is no force option that I can see, but I've also never worked with AVS.

-brian
-- 
Coding in C is like sending a 3 year old to do groceries. You gotta
tell them exactly what you want or you'll end up with a cupboard full of
pop tarts and pancake mix. -- IRC User (http://www.bash.org/?841435)


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-08 Thread Chris Greer
I was using EMC's iorate for the comparison.

ftp://ftp.emc.com/pub/symm3000/iorate/

I had 4 processes running on the pool in parallel doing 4K sequential writes.

I've also been playing around with a few other benchmark tools (I just had 
results from another storage test with this same iorate test).
--
This message posted from opensolaris.org


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-08 Thread Jim Dunham
Joe,

 Brian Hechinger wrote:
 On Mon, Oct 06, 2008 at 10:47:04AM -0400, Moore, Joe wrote:

 I wonder if an AVS-replicated storage device on the
 backends would be appropriate?

  write - ZFS-mirrored slog - ramdisk -AVS- physical disk
                           \
                            +-iscsi- ramdisk -AVS- physical disk

  You'd get the continuous replication of the ramdisk to
  physical drive (and perhaps automagic recovery on reboot) but
  not pay the synchronous write to remote physical disk penalty.

 It looks like the answer is no.

  [EMAIL PROTECTED] sudo sndradm -e localhost \
      /dev/rramdisk/avstest1 /dev/zvol/rdsk/SYS0/bitmap1 \
      wintermute /dev/zvol/dsk/SYS0/avstest2 \
      /dev/zvol/rdsk/SYS0/bitmap2 ip async
 Enable Remote Mirror? (Y/N) [N]: y
 sndradm: Error: both localhost and wintermute are local

 I've not worked with AVS other than looking at the basic concepts,  
 but to me this looks like a dont-shoot-yourself-in-the-foot critical  
 warning rather than an actual functionality restriction.  Is there a  
 -force option to override this normally quite reasonable sanity check?

This is a hard restriction, with no override.  AVS, or more  
specifically the remote replication component called SNDR, needs to know  
which end of the replica is the SNDR primary node and which end is the  
SNDR secondary node. Since SNDR requires this information to know  
which direction to replicate data, a single Solaris node cannot be both  
the primary and secondary node.

If one wants this type of mirror functionality on a single node, use  
host based or controller based mirroring software.



 --Joe

Jim Dunham

Storage Platform Software Group
Sun Microsystems, Inc.


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-08 Thread Brian Hechinger
On Wed, Oct 08, 2008 at 06:27:51PM -0400, Jim Dunham wrote:
 
 If one wants this type of mirror functionality on a single node, use  
 host based or controller based mirroring software.

Is there mirroring software that can do async copies to a mirror?

-brian


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-07 Thread Ross
 Or would they?  A box dedicated to being a RAM based
 slog is going to be
 faster than any SSD would be.  Especially if you make
 the expensive jump
 to 8Gb FC.

Not necessarily.  While this has some advantages in terms of price  
performance, at ~$2400 the 80GB ioDrive would give it a run for its money:  
600MB/s and enough capacity to (hopefully) use it as an L2ARC as well.

When you consider that you need at least two machines, UPSes, and the supporting 
infrastructure for this idea, the ioDrive really isn't far off in cost.


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-07 Thread Robert Milkowski
Hello Nicolas,

Monday, October 6, 2008, 10:51:58 PM, you wrote:

NW I'm pretty sure that local RAM beats remote-anything, no matter what the
NW anything (as long as it isn't RAM) and what the protocol to get to it
NW (as long as it isn't a normal backplane).  (You could claim with NUMA
NW memory can be remote, so let's say that for a reasonable value of
NW remote.)

IIRC the total throughput to remote memory over Sun Fire Link could be
faster than to local memory... just a funny thing I remembered.

Not that it is relevant here.



-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
                                 http://milek.blogspot.com



Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-06 Thread Ross
Very interesting idea, thanks for sharing it.

Infiniband would definitely be worth looking at for performance, although I 
think you'd need iSER to get the benefits and that might still be a little new: 
 http://www.opensolaris.org/os/project/iser/Release-notes/.  

It's also worth bearing in mind that you can have multiple mirrors.  I don't 
know what effect that will have on the performance, but it's an easy way to 
boost the reliability even further.  I think this idea configured on a set of 
2-3 servers, with separate UPSes for each, and a script that can export the pool 
and save the ramdrive when the power fails, is potentially a very neat little 
system.


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-06 Thread Moore, Joe
Nicolas Williams wrote
 There have been threads about adding a feature to support slow mirror
 devices that don't stay synced synchronously.  At least IIRC.  That
 would help.  But then, if the pool is busy writing then your slow ZIL
 mirrors would generally be out of sync, thus being of no help in the
 event of a power failure given fast slog devices that don't survive
 power failure.

I wonder if an AVS-replicated storage device on the backends would be 
appropriate?

 write - ZFS-mirrored slog - ramdisk -AVS- physical disk
                          \
                           +-iscsi- ramdisk -AVS- physical disk

You'd get the continuous replication of the ramdisk to physical drive (and 
perhaps automagic recovery on reboot) but not pay the synchronous write to 
remote physical disk penalty.


 Also, using remote devices for a ZIL may defeat the purpose of fast
 ZILs, even if the actual devices are fast, because what really matters
 here is latency, and the farther the device, the higher the latency.

A 0.5 ms RTT on an Ethernet link to the iSCSI disk may be faster than a 9 ms 
latency on physical media.

There was a time when it was better to place workstations' swap files on the 
far side of a 100Mbps ethernet link rather than using the local spinning rust.  
Ah, the good old days...

--Joe


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-06 Thread Brian Hechinger
On Sun, Oct 05, 2008 at 11:30:54PM -0500, Nicolas Williams wrote:
 
 There have been threads about adding a feature to support slow mirror
 devices that don't stay synced synchronously.  At least IIRC.  That
 would help.  But then, if the pool is busy writing then your slow ZIL

That would definitely be a great help.

  mirrors would generally be out of sync, thus being of no help in the
  event of a power failure given fast slog devices that don't survive power
  failure.

Maybe not, but it would at least save *something* as opposed to not saving
anything at all.  Still, with enough UPS power, there should be at least
enough run time left to get the rest of the ZIL to the disk mirror.

 Also, using remote devices for a ZIL may defeat the purpose of fast
 ZILs, even if the actual devices are fast, because what really matters
 here is latency, and the farther the device, the higher the latency.

4Gb FC is slow and high latency?  Tell that to all my local fast disks that
are attached via FC. :)

 Yes, it's pretty smart.  Add UPS and it's sortof like battery-backed
 RAM.  You can probably get a good enough reliability rate out of this
 for your purposes, though actual slog devices would be better if you can
 afford them.

Or would they?  A box dedicated to being a RAM based slog is going to be
faster than any SSD would be.  Especially if you make the expensive jump
to 8Gb FC.

-brian


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-06 Thread Brian Hechinger
On Mon, Oct 06, 2008 at 10:47:04AM -0400, Moore, Joe wrote:
 
 I wonder if an AVS-replicated storage device on the backends would be 
 appropriate?
 
  write - ZFS-mirrored slog - ramdisk -AVS- physical disk
                           \
                            +-iscsi- ramdisk -AVS- physical disk
 
 You'd get the continuous replication of the ramdisk to physical drive (and 
 perhaps automagic recovery on reboot) but not pay the synchronous write to 
 remote physical disk penalty.

Hmmm, AVS *might* just be the ticket here.  Will have to look at that.

 A .5-ms RTT on an ethernet link to the iSCSI disk may be faster than a 9-ms 
 latency on physical media.

Or, if you're looking into what I'm thinking with 4Gb/8Gb FC, it gets even 
better.

 There was a time when it was better to place workstations' swap files on the 
 far side of a 100Mbps ethernet link rather than using the local spinning 
 rust.  Ah, the good old days...

I remember those days.  My SPARCstation LX ran that way.  Not due to speed,
however, due to lack of disk space in the LX. ;)

-brian


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-06 Thread Nicolas Williams
On Mon, Oct 06, 2008 at 05:38:33PM -0400, Brian Hechinger wrote:
 On Sun, Oct 05, 2008 at 11:30:54PM -0500, Nicolas Williams wrote:
  There have been threads about adding a feature to support slow mirror
  devices that don't stay synced synchronously.  At least IIRC.  That
  would help.  But then, if the pool is busy writing then your slow ZIL
 
 That would definitely be a great help.
 
   mirrors would generally be out of sync, thus being of no help in the
   event of a power failure given fast slog devices that don't survive power
   failure.
 
 Maybe not, but it would at least save *something* as opposed to not saving
 anything at all.  Still, with enough UPS power, there should be at least
 enough run time left to get the rest of the ZIL to the disk mirror.

Yes.  But again, you get somewhat more protection from writing to a
write-biased SSD in that once the ZIL bits are committed then you get
protection from panics in the OS too, not just power failure.

  Also, using remote devices for a ZIL may defeat the purpose of fast
  ZILs, even if the actual devices are fast, because what really matters
  here is latency, and the farther the device, the higher the latency.
 
 4Gb FC is slow and high latency?  Tell that to all my local fast disks that
 are attached via FC. :)

The comparison was to RAM, not local fast disks.

I'm pretty sure that local RAM beats remote-anything, no matter what the
anything (as long as it isn't RAM) and what the protocol to get to it
(as long as it isn't a normal backplane).  (You could claim that with NUMA,
memory can be remote, so let's say that holds for a reasonable value of
remote.)

  Yes, it's pretty smart.  Add UPS and it's sortof like battery-backed
  RAM.  You can probably get a good enough reliability rate out of this
  for your purposes, though actual slog devices would be better if you can
  afford them.
 
 Or would they?  A box dedicated to being a RAM based slog is going to be
 faster than any SSD would be.  Especially if you make the expensive jump
 to 8Gb FC.

Unless the SSD had a battery-backed RAM cache, or were based entirely on
battery-backed RAM (but then you have to worry about battery upkeep).

To me this is a performance/reliability trade-off.  RAM slogs mirrored
in a cluster + UPS: very fast, works as well as the UPS.  Write-biased
flash slogs: fast, no UPS to worry about.

Nico
-- 


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-06 Thread Brian Hechinger
On Mon, Oct 06, 2008 at 10:47:04AM -0400, Moore, Joe wrote:
 
 I wonder if an AVS-replicated storage device on the backends would be 
 appropriate?
 
  write - ZFS-mirrored slog - ramdisk -AVS- physical disk
                           \
                            +-iscsi- ramdisk -AVS- physical disk
 
 You'd get the continuous replication of the ramdisk to physical drive (and 
 perhaps automagic recovery on reboot) but not pay the synchronous write to 
 remote physical disk penalty.

It looks like the answer is no.

[EMAIL PROTECTED] sudo sndradm -e localhost /dev/rramdisk/avstest1 \
    /dev/zvol/rdsk/SYS0/bitmap1 wintermute /dev/zvol/dsk/SYS0/avstest2 \
    /dev/zvol/rdsk/SYS0/bitmap2 ip async
Enable Remote Mirror? (Y/N) [N]: y
sndradm: Error: both localhost and wintermute are local

In order to use AVS, it looks like you'd have to replicate between two (or more)
ZIL boxes.  Not the worst thing in the world to have to do, but it certainly
complicates things.  Also, you don't get that super fast ramdisk sync anymore,
as you now have to traverse an IP network to get there.  Still, it might be an
acceptable way to achieve the goals we are looking at here.
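With two separate boxes, the failing sndradm invocation from above would be split across distinct hosts. A rough, untested sketch (the hostnames slogbox1/slogbox2 and all device paths are placeholders, following the syntax shown in the error transcript earlier in the thread):

```shell
# Run on BOTH hosts. AVS/SNDR requires distinct primary and secondary
# hosts, so the two ramdisk ZIL boxes replicate to each other.
# slogbox1 = SNDR primary, slogbox2 = SNDR secondary (names are placeholders).
sndradm -e slogbox1 /dev/rramdisk/slog /dev/zvol/rdsk/SYS0/bitmap1 \
        slogbox2 /dev/rramdisk/slog /dev/zvol/rdsk/SYS0/bitmap2 \
        ip async
```

The async mode is what avoids paying the full round-trip penalty on every ZIL write, at the cost of the secondary lagging behind.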

I guess at this point falling back to 'zfs send' run in a continuous loop might
be an alternative.
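The 'zfs send' loop might look roughly like this (pool/filesystem names and the remote host are placeholders, and it assumes an initial full send of the @prev snapshot has already been done):

```shell
#!/bin/sh
# Crude continuous replication: snapshot, incremental send, rotate.
# Assumes tank/slogfs@prev already exists on both sides (placeholder names).
while true; do
    zfs snapshot tank/slogfs@next
    zfs send -i tank/slogfs@prev tank/slogfs@next | \
        ssh wintermute zfs receive -F tank/slogfs
    zfs destroy tank/slogfs@prev
    zfs rename tank/slogfs@next tank/slogfs@prev
    sleep 10
done
```

Unlike AVS, this only replicates at snapshot granularity, so anything written between iterations is lost on failure.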

-brian


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-06 Thread Brian Hechinger
On Mon, Oct 06, 2008 at 01:13:40AM -0700, Ross wrote:
 
 It's also worth bearing in mind that you can have multiple mirrors.  I don't 
 know what effect that will have on the performance, but it's an easy way to 
 boost the reliability even further.  I think this idea configured on a set of 
 2-3 servers, with separate UPS' for each, and a script that can export the 
 pool and save the ramdrive when the power fails, is potentially a very neat 
 little system.

The more slog devices, the better. :)

If the host using the slogs could trigger the shutdown, that would be even
better I think.  Once we know the zpool is exported, the slogs have just
entered a nicely consistent state at which point the copies could be made.

It would also be nice if the host using these slogs would be able to
wait until enough of them are online to attempt to mount its pool.  That
shouldn't be too hard, nothing more than some startup script modifications.

-brian


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-05 Thread Brian Hechinger
On Sat, Oct 04, 2008 at 10:37:26PM -0700, Chris Greer wrote:
 
 So I tried this experiment this week...
 On each host (OpenSolaris 2008.05), I created an 8GB ramdisk with ramdiskadm. 
 I shared this ramdisk on each host via the iSCSI target and initiator over a 
 1Gb crossconnect cable (jumbo frames enabled).  I added these as mirrored 
 slog devices in a zpool.
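A sketch of how that setup might be built on OpenSolaris 2008.05; the ramdisk name, target name, IP address, pool name, and the discovered LUN device are all placeholders, and this is untested:

```shell
# On each host: create an 8GB ramdisk (name is a placeholder)
ramdiskadm -a slogdisk 8g

# Export it over iSCSI via the iscsitgt CLI (target name is a placeholder)
iscsitadm create target -b /dev/ramdisk/slogdisk slogtarget

# On the peer: discover the remote ramdisk over the crossconnect link
iscsiadm add discovery-address 192.168.1.2
iscsiadm modify discovery --sendtargets enable

# Add local + remote ramdisks as a mirrored slog; the c*t*d* name of the
# discovered iSCSI LUN will differ (check with `format`)
zpool add tank log mirror /dev/ramdisk/slogdisk c2t1d0
```

Each host does the same, so every pool's slog mirror has one local leg and one leg on the other node's ramdisk.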

Very interesting.  This also gives me an idea.  Using COMSTAR you could
build any number of RAM based slog devices.  They wouldn't need to be
anything amazing, just a bunch of RAM and a supported FC card (or two).

 I'm not sure I could survive a crash of both nodes, going to try and test 
 some more.

Ok, so taking my idea above, maybe a pair of 15K SAS disks in those
boxes so that you could create a backing store.  I wonder what the best
way to setup realtime sync would be (without making the backing store
responsible for slowing down the ramdisk, so no ZFS mirroring between
ramdisk and SAS disk, in other words).

 So is this idea completely crazy?

I don't think so, no. ;)

-brian


Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-05 Thread Adam Leventhal
 So what are the downsides to this?  If both nodes were to crash and  
 I used the same technique to recreate the ramdisk I would lose any  
 transactions in the slog at the time of the crash, but the physical  
 disk image is still in a consistent state right (just not from my  
 apps point of view)?

You would lose transactions, but the pool would still reflect a consistent
state.

 So is this idea completely crazy?


On the contrary; it's very clever.

Adam

--
Adam Leventhal, Fishworks                    http://blogs.sun.com/ahl



Re: [zfs-discuss] An slog experiment (my NAS can beat up your NAS)

2008-10-05 Thread Nicolas Williams
On Sun, Oct 05, 2008 at 09:07:31PM -0400, Brian Hechinger wrote:
 On Sat, Oct 04, 2008 at 10:37:26PM -0700, Chris Greer wrote:
  I'm not sure I could survive a crash of both nodes, going to try and
  test some more.
 
 Ok, so taking my idea above, maybe a pair of 15K SAS disks in those
 boxes so that you could create a backing store.  I wonder what the best
 way to setup realtime sync would be (without making the backing store
 responsible for slowing down the ramdisk, so no zfs mirroring between
 ramdisk and SAS disk, in other words).

There have been threads about adding a feature to support slow mirror
devices that don't stay synced synchronously.  At least IIRC.  That
would help.  But then, if the pool is busy writing then your slow ZIL
mirrors would generally be out of sync, thus being of no help in the
event of a power failure given fast slog devices that don't survive power
failure.

Also, using remote devices for a ZIL may defeat the purpose of fast
ZILs, even if the actual devices are fast, because what really matters
here is latency, and the farther the device, the higher the latency.

  So is this idea completely crazy?
 
 I don't think so, no. ;)

Yes, it's pretty smart.  Add UPS and it's sortof like battery-backed
RAM.  You can probably get a good enough reliability rate out of this
for your purposes, though actual slog devices would be better if you can
afford them.

Nico
-- 