Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-02-24 Thread Lutz Schumann
I fully agree. This needs fixing. I can think of so many situations where 
device names change in OpenSolaris (especially with movable pools). This 
problem can lead to serious data corruption. 

Besides persistent L2ARC (which I would say is much more difficult), making the 
L2ARC rely on labels instead of device paths is essential.
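
For anyone wanting to see the underlying issue: the label of a cache device can
be dumped with zdb (the device path below is just the one used elsewhere in
this thread), and as the output quoted later in the thread shows, it carries
only version, state and the device's own guid (no pool name or pool GUID), so a
device-path match is all ZFS currently has to go on:

# zdb -l /dev/rdsk/c0t2d0s0
    version=14
    state=4
    guid=15970804704220025940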

Can someone open a CR for this ??


Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-02-08 Thread Daniel Carosone
On Mon, Feb 01, 2010 at 12:22:55PM -0800, Lutz Schumann wrote:
   Created a pool on head1 containing just the cache device (c0t0d0).
  
  This is not possible, unless there is a bug. You cannot create a pool
  with only a cache device.  I have verified this on b131:
  # zpool create norealpool cache /dev/ramdisk/rc1
  invalid vdev specification: at least one toplevel vdev must be specified
  
  This is also consistent with the notion that cache devices are auxiliary
  devices and do not have pool configuration information in the label.
 
 Sorry for the confusion ... a little misunderstanding. I created a pool 
 whose only data disk is the disk formerly used as the cache device in the pool 
 that switched. Then I exported this pool, made from just a single disk (data 
 disk), and switched back. The exported pool was picked up as a cache device ... 
 this seems really problematic. 

This is exactly the scenario I was concerned about earlier in the
thread.  Thanks for confirming that it occurs.  Please verify that the
pool had autoreplace=off (just to avoid that distraction), and file a
bug.  
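
(For reference, the property can be checked and cleared like this; "tank" is a
placeholder pool name:

# zpool get autoreplace tank
NAME  PROPERTY     VALUE  SOURCE
tank  autoreplace  off    default
# zpool set autoreplace=off tank
)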

Cache devices should not automatically destroy disk contents based
solely on device path, especially where that device path came along
with a pool import.  Cache devices need labels to confirm their
identity. This is irrespective of whether the cache contents after the
label are persistent or volatile, i.e. it should be fixed without waiting
for the CR about persistent L2ARC.

--
Dan.



Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-02-01 Thread Lutz Schumann
I tested some more and found that pool data disks are picked up as well. 

Head1: Cachedevice1 (c0t0d0)
Head2: Cachedevice2 (c0t0d0)
Pool: Shared, c1tXdY

I created a pool on shared storage. 
Added the cache device on Head1. 
Switched the pool to Head2 (export + import). 
Created a pool on head1 containing just the cache device (c0t0d0). 
Exported the pool on Head1. 
Switched back the pool from head2 to head1 (export + import)
The disk c0t0d0 is picked up as cache device ... 
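
At the command level, the sequence above is roughly the following (a sketch;
the mirror members and the name "localpool" are placeholders, the rest matches
the setup described above):

head1# zpool create shared mirror c1t1d0 c1t2d0   # pool on the shared storage
head1# zpool add shared cache c0t0d0              # head1's local disk as cache
head1# zpool export shared
head2# zpool import shared                        # switch to head2
head1# zpool create localpool c0t0d0              # reuse old cache disk as data
head1# zpool export localpool
head2# zpool export shared
head1# zpool import shared                        # switch back
head1# zpool status shared                        # c0t0d0 reappears under "cache"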

This practically means my exported pool was destroyed. 

In production this would have been hell.

Am I missing something here?


Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-02-01 Thread Richard Elling
On Feb 1, 2010, at 5:53 AM, Lutz Schumann wrote:

 I tested some more and found that pool data disks are picked up as well. 
 
 Head1: Cachedevice1 (c0t0d0)
 Head2: Cachedevice2 (c0t0d0)
 Pool: Shared, c1tXdY
 
 I created a pool on shared storage. 
 Added the cache device on Head1. 
 Switched the pool to Head2 (export + import). 
 Created a pool on head1 containing just the cache device (c0t0d0). 

This is not possible, unless there is a bug. You cannot create a pool
with only a cache device.  I have verified this on b131:
# zpool create norealpool cache /dev/ramdisk/rc1
invalid vdev specification: at least one toplevel vdev must be specified

This is also consistent with the notion that cache devices are auxiliary
devices and do not have pool configuration information in the label.
 -- richard

 Exported the pool on Head1. 
 Switched back the pool from head2 to head1 (export + import)
 The disk c0t0d0 is picked up as cache device ... 
 
 This practically means my exported pool was destroyed. 
 
 In production this would have been hell.
 
 Am I missing something here?



Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-02-01 Thread Lutz Schumann
  Created a pool on head1 containing just the cache device (c0t0d0).
 
 This is not possible, unless there is a bug. You cannot create a pool
 with only a cache device.  I have verified this on b131:
 # zpool create norealpool cache /dev/ramdisk/rc1
 invalid vdev specification: at least one toplevel vdev must be specified
 
 This is also consistent with the notion that cache devices are auxiliary
 devices and do not have pool configuration information in the label.

Sorry for the confusion ... a little misunderstanding. I created a pool whose 
only data disk is the disk formerly used as the cache device in the pool that 
switched. Then I exported this pool, made from just a single disk (data disk), 
and switched back. The exported pool was picked up as a cache device ... this 
seems really problematic. 

Robert


Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-28 Thread Lutz Schumann
Actually I tested this. 

If I add an L2ARC device to the syspool, it is not used when issuing I/O to the 
data pool (note: on the root pool it must not be a whole disk, only a slice of 
it, otherwise ZFS complains that root disks may not contain an EFI label). 

So this does not work - unfortunately :(
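
For completeness, the command shape I mean, with c0t2d0 standing in for the
local SSD and slice 0 assumed to have been created with format(1M) beforehand:

# zpool add syspool cache c0t2d0     # rejected: a whole disk gets an EFI label,
                                     # which the root pool does not accept
# zpool add syspool cache c0t2d0s0   # a slice is accepted
# zpool status syspool               # the slice shows up under "cache"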

Just for Info. 
Robert


Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-28 Thread Richard Elling
On Jan 28, 2010, at 10:54 AM, Lutz Schumann wrote:

 Actually I tested this. 
 
 If I add an L2ARC device to the syspool, it is not used when issuing I/O to 
 the data pool (note: on the root pool it must not be a whole disk, only a 
 slice of it, otherwise ZFS complains that root disks may not contain an EFI 
 label). 

In my tests it does work. Can you share your test plan?
 -- richard



Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-24 Thread Lutz Schumann
Thanks for the feedback Richard. 

Does that mean that the L2ARC can be part of ANY pool and that there is only 
ONE L2ARC for all pools active on the machine ? 

Thesis:

  - There is one L2ARC on the machine for all pools
  - all active pools share the same L2ARC
  - the L2ARC can be part of any pool, including the root pool (syspool) 

If this is true, the solution would be like this: 

a) Add L2ARC to the syspool 

or

b) Add another two (standby) L2ARC devices in the head that are used in case of 
a failover. (Thus a configuration that accepts degraded performance after a 
failover has to live with this corrupt-data effect; see the sketch below.)  

True ?
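
Option (b) would look roughly like this on the surviving head after a failover
(c0t24d0 is a placeholder for a standby cache device in that head):

head1# zpool add pool2 cache c0t24d0    # after pool2 fails over to head1
head1# zpool remove pool2 c0t24d0       # before failing back; cache devices
                                        # can be removed, unlike data vdevs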


Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-23 Thread Lutz Schumann
Hi, 

I found some time and was able to test again.

 - verify with a unique GUID on the device 
 - verify with autoreplace = off

Indeed, autoreplace was set to on for the pools, so I disabled it. 

VOL     PROPERTY     VALUE   SOURCE
nxvol2  autoreplace  off     default

Erased the labels on the cache disk and added it again to the pool. Now both 
cache disks have different GUIDs: 

# cache device in node1
r...@nex1:/volumes# zdb -l -e /dev/rdsk/c0t2d0s0

LABEL 0

version=14
state=4
guid=15970804704220025940

# cache device in node2
r...@nex2:/volumes# zdb -l -e /dev/rdsk/c0t2d0s0

LABEL 0

version=14
state=4
guid=2866316542752696853

The GUIDs are different. 
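
"Erased the labels" above was done by hand; a rough sketch of one way to do it
(ZFS keeps two labels at the start and two at the end of the device, so both
regions need zeroing; N is the device size in MB minus 4, and newer ZFS builds
also have a zpool labelclear subcommand for this):

# dd if=/dev/zero of=/dev/rdsk/c0t2d0s0 bs=1024k count=4
# dd if=/dev/zero of=/dev/rdsk/c0t2d0s0 bs=1024k count=4 oseek=N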

However, after switching the pool nxvol2 to node1 (where nxvol1 was active), the 
disks were picked up as cache devices: 

# nxvol2 switched to this node ... 
volume: nxvol2
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
nxvol2   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t10d0  ONLINE   0 0 0
c4t13d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t9d0   ONLINE   0 0 0
c4t12d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t8d0   ONLINE   0 0 0
c4t11d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t18d0  ONLINE   0 0 0
c4t22d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t17d0  ONLINE   0 0 0
c4t21d0  ONLINE   0 0 0
cache
  c0t2d0 FAULTED  0 0 0  corrupted data

# nxvol1 was active here before ...
n...@nex1:/$ show volume nxvol1 status
volume: nxvol1
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
nxvol1   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t15d0  ONLINE   0 0 0
c4t18d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t14d0  ONLINE   0 0 0
c4t17d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t13d0  ONLINE   0 0 0
c4t16d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t12d0  ONLINE   0 0 0
c4t15d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t11d0  ONLINE   0 0 0
c4t14d0  ONLINE   0 0 0
cache
  c0t2d0 ONLINE  0 0 0  

So this is true with and without autoreplace, and with different GUIDs on the 
devices.


Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-23 Thread Richard Elling
AIUI, this works as designed. 

I think the best practice will be to add the L2ARC to syspool (nee rpool).
However, for current NexentaStor releases, you cannot add cache devices
to syspool.

Earlier I mentioned that this made me nervous.  I no longer hold any 
reservation against it. It should work just fine as-is.
 -- richard


On Jan 23, 2010, at 9:53 AM, Lutz Schumann wrote:

 Hi, 
 
 I found some time and was able to test again.
 
 - verify with unique uid of the device 
 - verify with autoreplace = off
 
 Indeed autoreplace was set to yes for the pools. So I disabled the 
 autoreplace. 
 
 VOL     PROPERTY     VALUE   SOURCE
 nxvol2  autoreplace  off     default
 
 Erased the labels on the cache disk and added it again to the pool. Now both 
 cache disks have different GUIDs: 
 
 # cache device in node1
 r...@nex1:/volumes# zdb -l -e /dev/rdsk/c0t2d0s0
 
 LABEL 0
 
version=14
state=4
guid=15970804704220025940
 
 # cache device in node2
 r...@nex2:/volumes# zdb -l -e /dev/rdsk/c0t2d0s0
 
 LABEL 0
 
version=14
state=4
guid=2866316542752696853
 
 The GUIDs are different. 
 
 However, after switching the pool nxvol2 to node1 (where nxvol1 was active), 
 the disks were picked up as cache devices: 
 
 # nxvol2 switched to this node ... 
 volume: nxvol2
 state: ONLINE
 scrub: none requested
 config:
 
NAME STATE READ WRITE CKSUM
nxvol2   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t10d0  ONLINE   0 0 0
c4t13d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t9d0   ONLINE   0 0 0
c4t12d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t8d0   ONLINE   0 0 0
c4t11d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t18d0  ONLINE   0 0 0
c4t22d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t17d0  ONLINE   0 0 0
c4t21d0  ONLINE   0 0 0
cache
  c0t2d0 FAULTED  0 0 0  corrupted data
 
 # nxvol1 was active here before ...
 n...@nex1:/$ show volume nxvol1 status
 volume: nxvol1
 state: ONLINE
 scrub: none requested
 config:
 
NAME STATE READ WRITE CKSUM
nxvol1   ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t15d0  ONLINE   0 0 0
c4t18d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t14d0  ONLINE   0 0 0
c4t17d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t13d0  ONLINE   0 0 0
c4t16d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t12d0  ONLINE   0 0 0
c4t15d0  ONLINE   0 0 0
  mirror ONLINE   0 0 0
c3t11d0  ONLINE   0 0 0
c4t14d0  ONLINE   0 0 0
cache
  c0t2d0 ONLINE  0 0 0  
 
 So this is true with and without autoreplace, and with different GUIDs on the 
 devices.



Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-21 Thread Richard Elling
On Jan 20, 2010, at 4:17 PM, Daniel Carosone wrote:

 On Wed, Jan 20, 2010 at 03:20:20PM -0800, Richard Elling wrote:
 Though the ARC case, PSARC/2007/618 is unpublished, I gather from
 googling and the source that L2ARC devices are considered auxiliary,
 in the same category as spares. If so, then it is perfectly reasonable to
 expect that it gets picked up regardless of the GUID. This also implies
 that it is shareable between pools until assigned. Brief testing confirms
 this behaviour.  I learn something new every day :-)
 
 So, I suspect Lutz sees a race when both pools are imported onto one
 node.  This still makes me nervous though...
 
 Yes. What if device reconfiguration renumbers my controllers, will
 l2arc suddenly start trashing a data disk?  The same problem used to
 be a risk for swap,  but less so now that we swap to named zvol. 

This will not happen unless the labels are rewritten on your data disk, 
and if that occurs, all bets are off.

 There's work afoot to make l2arc persistent across reboot, which
 implies some organised storage structure on the device.  Fixing this
 shouldn't wait for that.

Upon further review, the ruling on the field is confirmed ;-)  The L2ARC
is shared amongst pools just like the ARC. What is important is that at
least one pool has a cache vdev. I suppose one could make the case
that a new command is needed in addition to zpool and zfs (!) to manage
such devices. But perhaps we can live with the oddity for a while?

As such, for Lutz's configuration, I am now less nervous. If I understand
correctly, you could add the cache vdev to rpool and forget about how
it works with the shared pools.
 -- richard



Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-21 Thread Daniel Carosone
On Thu, Jan 21, 2010 at 09:36:06AM -0800, Richard Elling wrote:
 On Jan 20, 2010, at 4:17 PM, Daniel Carosone wrote:
 
  On Wed, Jan 20, 2010 at 03:20:20PM -0800, Richard Elling wrote:
  Though the ARC case, PSARC/2007/618 is unpublished, I gather from
  googling and the source that L2ARC devices are considered auxiliary,
  in the same category as spares. If so, then it is perfectly reasonable to
  expect that it gets picked up regardless of the GUID. This also implies
  that it is shareable between pools until assigned. Brief testing confirms
  this behaviour.  I learn something new every day :-)
  
  So, I suspect Lutz sees a race when both pools are imported onto one
  node.  This still makes me nervous though...
  
  Yes. What if device reconfiguration renumbers my controllers, will
  l2arc suddenly start trashing a data disk?  The same problem used to
  be a risk for swap,  but less so now that we swap to named zvol. 
 
 This will not happen unless the labels are rewritten on your data disk, 
 and if that occurs, all bets are off.

It occurred to me later yesterday, while offline, that the pool in
question might have autoreplace=on set.  If that were true, it would
explain why a disk in the same controller slot was overwritten and
used.

Lutz, is the pool autoreplace property on?  If so, "god help us all"
is no longer quite so necessary.

  There's work afoot to make l2arc persistent across reboot, which
  implies some organised storage structure on the device.  Fixing this
  shouldn't wait for that.
 
 Upon further review, the ruling on the field is confirmed ;-)  The L2ARC
 is shared amongst pools just like the ARC. What is important is that at
 least one pool has a cache vdev. 

Wait, huh?  That's a totally separate issue from what I understood
from the discussion.  What I was worried about was that disk Y, that
happened to have the same cLtMdN address as disk X on another node,
was overwritten and trashed on import to become l2arc.  

Maybe I missed some other detail in the thread and reached the wrong
conclusion? 

 As such, for Lutz's configuration, I am now less nervous. If I understand
 correctly, you could add the cache vdev to rpool and forget about how
 it works with the shared pools.

The fact that l2arc devices could be caching data from any pool in the
system is .. a whole different set of (mostly performance) wrinkles.

For example, if I have a pool of very slow disks (usb or remote
iscsi), and a pool of faster disks, and l2arc for the slow pool on the
same faster disks, it's pointless having the faster pool using l2arc
on the same disks or even the same type of disks.  I'd need to set the
secondarycache properties of one pool according to the configuration
of another. 

 I suppose one could make the case
 that a new command is needed in addition to zpool and zfs (!) to manage
 such devices. But perhaps we can live with the oddity for a while?

This part, I expect, will be resolved or clarified as part of the
l2arc persistence work, since then their attachment to specific pools
will need to be clear and explicit.

Perhaps the answer is that the cache devices become their own pool
(since they're going to need filesystem-like structured storage
anyway). The actual cache could be a zvol (or new object type) within
that pool, and then (if necessary) an association is made between
normal pools and the cache (especially if I have multiple of them).
No new top-level commands needed. 

--
Dan.




Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-21 Thread Richard Elling
[Richard makes a hobby of confusing Dan :-)]
more below..

On Jan 21, 2010, at 1:13 PM, Daniel Carosone wrote:

 On Thu, Jan 21, 2010 at 09:36:06AM -0800, Richard Elling wrote:
 On Jan 20, 2010, at 4:17 PM, Daniel Carosone wrote:
 
 On Wed, Jan 20, 2010 at 03:20:20PM -0800, Richard Elling wrote:
 Though the ARC case, PSARC/2007/618 is unpublished, I gather from
 googling and the source that L2ARC devices are considered auxiliary,
 in the same category as spares. If so, then it is perfectly reasonable to
 expect that it gets picked up regardless of the GUID. This also implies
 that it is shareable between pools until assigned. Brief testing confirms
 this behaviour.  I learn something new every day :-)
 
 So, I suspect Lutz sees a race when both pools are imported onto one
 node.  This still makes me nervous though...
 
 Yes. What if device reconfiguration renumbers my controllers, will
 l2arc suddenly start trashing a data disk?  The same problem used to
 be a risk for swap,  but less so now that we swap to named zvol. 
 
 This will not happen unless the labels are rewritten on your data disk, 
 and if that occurs, all bets are off.
 
 It occurred to me later yesterday, while offline, that the pool in
 question might have autoreplace=on set.  If that were true, it would
 explain why a disk in the same controller slot was overwritten and
 used.
 
 Lutz, is the pool autoreplace property on?  If so, god help us all
 is no longer quite so necessary.

I think this is a different issue. But since the label in a cache device does
not associate it with a pool, it is possible that any pool which expects a
cache will find it.  This seems to be as designed.

 There's work afoot to make l2arc persistent across reboot, which
 implies some organised storage structure on the device.  Fixing this
 shouldn't wait for that.
 
 Upon further review, the ruling on the field is confirmed ;-)  The L2ARC
 is shared amongst pools just like the ARC. What is important is that at
 least one pool has a cache vdev. 
 
 Wait, huh?  That's a totally separate issue from what I understood
 from the discussion.  What I was worried about was that disk Y, that
 happened to have the same cLtMdN address as disk X on another node,
 was overwritten and trashed on import to become l2arc.  
 
 Maybe I missed some other detail in the thread and reached the wrong
 conclusion? 
 
 As such, for Lutz's configuration, I am now less nervous. If I understand
 correctly, you could add the cache vdev to rpool and forget about how
 it works with the shared pools.
 
 The fact that l2arc devices could be caching data from any pool in the
 system is .. a whole different set of (mostly performance) wrinkles.
 
 For example, if I have a pool of very slow disks (usb or remote
 iscsi), and a pool of faster disks, and l2arc for the slow pool on the
 same faster disks, it's pointless having the faster pool using l2arc
 on the same disks or even the same type of disks.  I'd need to set the
 secondarycache properties of one pool according to the configuration
 of another. 

Don't use slow devices for L2ARC.

Secondarycache is a dataset property, not a pool property.  You can
definitely manage the primary and secondary cache policies for each
dataset.
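
A sketch of what that looks like per dataset ("tank/db" is a placeholder name):

# zfs set primarycache=all tank/db
# zfs set secondarycache=metadata tank/db
# zfs get primarycache,secondarycache tank/db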

 I suppose one could make the case
 that a new command is needed in addition to zpool and zfs (!) to manage
 such devices. But perhaps we can live with the oddity for a while?
 
 This part, I expect, will be resolved or clarified as part of the
 l2arc persistence work, since then their attachment to specific pools
 will need to be clear and explicit.

Since the ARC is shared amongst all pools, it makes sense to share
L2ARC amongst all pools.

 Perhaps the answer is that the cache devices become their own pool
 (since they're going to need filesystem-like structured storage
 anyway). The actual cache could be a zvol (or new object type) within
 that pool, and then (if necessary) an association is made between
 normal pools and the cache (especially if I have multiple of them).
 No new top-level commands needed. 

I propose a best practice of adding the cache device to rpool and be 
happy.
 -- richard



Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-21 Thread Daniel Carosone
On Thu, Jan 21, 2010 at 03:33:28PM -0800, Richard Elling wrote:
 [Richard makes a hobby of confusing Dan :-)]

Heh.

  Lutz, is the pool autoreplace property on?  If so, "god help us all"
  is no longer quite so necessary.
 
 I think this is a different issue.

I agree. For me, it was the main issue, and I still want clarity on
it.  However, at this point I'll go back to the start of the thread
and look at what was actually reported again in more detail.  

 But since the label in a cache device does
 not associate it with a pool, it is possible that any pool which expects a
 cache will find it.  This seems to be as designed.

Hm. My recollection was that node b's disk in that controller slot was
totally unlabelled, but perhaps I'm misremembering.. as above.

  For example, if I have a pool of very slow disks (usb or remote
  iscsi), and a pool of faster disks, and l2arc for the slow pool on the
  same faster disks, it's pointless having the faster pool using l2arc
  on the same disks or even the same type of disks.  I'd need to set the
  secondarycache properties of one pool according to the configuration
  of another. 
 
 Don't use slow devices for L2ARC.

Slow is entirely relative, as we discussed here just recently.  They
just need to be faster than the pool devices I want to cache.  The
wrinkle here is that it's now clear they should be faster than the
devices in all other pools as well (or I need to take special
measures).

Faster is better regardless, and suitable l2arc ssd's are cheap
enough now.  It's mostly academic that, previously, faster/local hard
disks were fast enough, since now you can have both.

 Secondarycache is a dataset property, not a pool property.  You can
 definitely manage the primary and secondary cache policies for each
 dataset.

Yeah, properties of the root fs and of the pool are easily conflated.

  such devices. But perhaps we can live with the oddity for a while?
  
  This part, I expect, will be resolved or clarified as part of the
  l2arc persistence work, since then their attachment to specific pools
  will need to be clear and explicit.
 
 Since the ARC is shared amongst all pools, it makes sense to share
 L2ARC amongst all pools.

Of course it does - apart from the wrinkles we now know we need to
watch out for.

  Perhaps the answer is that the cache devices become their own pool
  (since they're going to need filesystem-like structured storage
  anyway). The actual cache could be a zvol (or new object type) within
  that pool, and then (if necessary) an association is made between
  normal pools and the cache (especially if I have multiple of them).
  No new top-level commands needed. 
 
 I propose a best practice of adding the cache device to rpool and be 
 happy.

It is *still* not that simple.  Forget my slow disks caching an even
slower pool (which is still fast enough for my needs, thanks to the
cache and zil).

Consider a server config thus:
 - two MLC SSDs (x25-M, OCZ Vertex, whatever)
 - SSDs partitioned in two, mirrored rpool + 2x l2arc
 - a bunch of disks for a data pool

This is a likely/common configuration, commodity systems being limited
mostly by number of sata ports.  I'd even go so far as to propose it
as another best practice, for those circumstances.

Now, why would I waste l2arc space, bandwidth, and wear cycles to
cache rpool to the same ssd's that would be read on a miss anyway?  

So, there's at least one more step required for happiness:
 # zfs set secondarycache=none rpool

(plus relying on property inheritance through the rest of rpool)
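
(And to confirm the inheritance takes effect, something like

# zfs get -r secondarycache rpool

should show "none" for every dataset under rpool, with SOURCE "inherited from
rpool" for the children.)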

--
Dan.





Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-21 Thread Richard Elling
On Jan 21, 2010, at 4:32 PM, Daniel Carosone wrote:
 I propose a best practice of adding the cache device to rpool and be 
 happy.
 
 It is *still* not that simple.  Forget my slow disks caching an even
 slower pool (which is still fast enough for my needs, thanks to the
 cache and zil).
 
 Consider a server config thus:
 - two MLC SSDs (x25-M, OCZ Vertex, whatever)
  - SSDs partitioned in two, mirrored rpool + 2x l2arc
 - a bunch of disks for a data pool
 
 This is a likely/common configuration, commodity systems being limited
 mostly by number of sata ports.  I'd even go so far as to propose it
 as another best practice, for those circumstances.

 Now, why would I waste l2arc space, bandwidth, and wear cycles to
 cache rpool to the same ssd's that would be read on a miss anyway?  
 
 So, there's at least one more step required for happiness:
 # zfs set secondarycache=none rpool
 
 (plus relying on property inheritance through the rest of rpool)

I agree with this, except for the fact that the most common installers
(LiveCD, Nexenta, etc.) use the whole disk for rpool[1].  So the likely
and common configuration today is moving towards one whole
root disk.  That could change in the future.

[1] Solaris 10?  Well... since installation is hard anyway, might as well do this.
 -- richard



Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-21 Thread Daniel Carosone
On Thu, Jan 21, 2010 at 05:52:57PM -0800, Richard Elling wrote:
 I agree with this, except for the fact that the most common installers
 (LiveCD, Nexenta, etc.) use the whole disk for rpool[1]. 

Er, no. You certainly get the option of using the whole disk or making
partitions, at least with the OpenSolaris LiveCD.

--
Dan.






[zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-20 Thread Lutz Schumann
Hello, 

we tested clustering with ZFS and the setup looks like this: 

- 2 head nodes (nodea, nodeb)
- head nodes contain l2arc devices (nodea_l2arc, nodeb_l2arc)
- two external jbods
- two mirror zpools (pool1,pool2)
   - each mirror is a mirror of one disk from each jbod
- no ZIL (anyone know a well-priced SAS SSD?)

We want active/active and added the l2arc to the pools. 

- pool1 has nodea_l2arc as cache
- pool2 has nodeb_l2arc as cache

Everything is great so far. 

One thing to note is that nodea_l2arc and nodeb_l2arc are named identically 
(c0t2d0 on both nodes).

What we found is that during tests, the pool just picked up the device 
nodeb_l2arc automatically, although it was never explicitly added to the pool 
pool1.

We had a setup stage when pool1 was configured on nodea with nodea_l2arc and 
pool2 was configured on nodeb without an l2arc. Then we did a failover. Then 
pool1 picked up the (until then) unconfigured nodeb_l2arc. 

Is this intended? Why is an L2ARC device automatically picked up if the device 
name is the same? 

In a later stage we had both pools configured with the corresponding l2arc 
device (po...@nodea with nodea_l2arc and po...@nodeb with nodeb_l2arc). Then 
we also did a failover. The l2arc device of the pool failing over was marked as 
"too many corruptions" instead of "missing". 

So from these tests it looks like ZFS just picks up the device with the same 
name and replaces the l2arc without checking the device signatures to make sure 
it only considers devices that are part of the pool.

We have not tested with a data disk as c0t2d0 but if the same behaviour 
occurs - god save us all.

Can someone clarify the logic behind this ? 

Can someone also give a hint on how to rename SAS disk devices in OpenSolaris? 
(As a workaround I would like to rename c0t2d0 on nodea (nodea_l2arc) to c0t24d0 
and c0t2d0 on nodeb (nodeb_l2arc) to c0t48d0.) 

P.s. Release is build 104 (NexentaCore 2). 

Thanks!


Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-20 Thread Richard Elling
Hi Lutz,

On Jan 20, 2010, at 3:17 AM, Lutz Schumann wrote:

 Hello, 
 
 we tested clustering with ZFS and the setup looks like this: 
 
 - 2 head nodes (nodea, nodeb)
 - head nodes contain l2arc devices (nodea_l2arc, nodeb_l2arc)

This makes me nervous. I suspect this is not in the typical QA 
test plan.

 - two external jbods
 - two mirror zpools (pool1,pool2)
   - each mirror is a mirror of one disk from each jbod
 - no ZIL (anyone knows a well priced SAS SSD ?)
 
 We want active/active and added the l2arc to the pools. 
 
 - pool1 has nodea_l2arc as cache
 - pool2 has nodeb_l2arc as cache
 
 Everything is great so far. 
 
 One thing to note is that nodea_l2arc and nodeb_l2arc are named identically 
 (c0t2d0 on both nodes).
 
 What we found is that during tests, the pool just picked up the device 
 nodeb_l2arc automatically, although it was never explicitly added to the pool 
 pool1.

This is strange. Each vdev is supposed to be uniquely identified by its GUID.
This is how ZFS can identify the proper configuration when two pools have 
the same name. Can you check the GUIDs (using zdb) to see if there is a
collision?
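
For example, something along these lines on each head, with the device path
taken from your setup:

# zdb -l /dev/rdsk/c0t2d0s0 | grep guid

and then compare the value reported for nodea_l2arc with the one for
nodeb_l2arc.
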
 -- richard

 We had a setup stage when pool1 was configured on nodea with nodea_l2arc and 
 pool2 was configured on nodeb without an l2arc. Then we did a failover. Then 
 pool1 picked up the (until then) unconfigured nodeb_l2arc. 
 
 Is this intended ? Why is a L2ARC device automatically picked up if the 
 device name is the same ? 
 
 In a later stage we had both pools configured with the corresponding l2arc 
 device. (po...@nodea with nodea_l2arc and po...@nodeb with nodeb_l2arc). Then 
 we also did a failover. The l2arc device of the pool failing over was marked 
 as too many corruptions instead of missing. 
 
 So from these tests it looks like ZFS just picks up the device with the same 
 name and replaces the l2arc without checking the device signatures to make 
 sure it only considers devices that are part of the pool.
 
 We have not tested with a data disk as c0t2d0 but if the same behaviour 
 occurs - god save us all.
 
 Can someone clarify the logic behind this ? 
 
 Can also someone give a hint how to rename SAS disk devices in opensolaris ? 
 (to workaround I would like to rename c0t2d0 on nodea (nodea_l2arc) to 
 c0t24d0 and c0t2d0 on nodeb (nodea_l2arc) to c0t48d0). 
 
 P.s. Release is build 104 (NexentaCore 2). 
 
 Thanks!



Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-20 Thread Tomas Ögren
On 20 January, 2010 - Richard Elling sent me these 2,7K bytes:

 Hi Lutz,
 
 On Jan 20, 2010, at 3:17 AM, Lutz Schumann wrote:
 
  Hello, 
  
  we tested clustering with ZFS and the setup looks like this: 
  
  - 2 head nodes (nodea, nodeb)
  - head nodes contain l2arc devices (nodea_l2arc, nodeb_l2arc)
 
 This makes me nervous. I suspect this is not in the typical QA 
 test plan.
 
  - two external jbods
  - two mirror zpools (pool1,pool2)
- each mirror is a mirror of one disk from each jbod
  - no ZIL (anyone knows a well priced SAS SSD ?)
  
  We want active/active and added the l2arc to the pools. 
  
  - pool1 has nodea_l2arc as cache
  - pool2 has nodeb_l2arc as cache
  
  Everything is great so far. 
  
  One thing to note is that nodea_l2arc and nodeb_l2arc are named identically 
  (c0t2d0 on both nodes).
  
  What we found is that during tests, the pool just picked up the device 
  nodeb_l2arc automatically, although it was never explicitly added to the 
  pool pool1.
 
 This is strange. Each vdev is supposed to be uniquely identified by its GUID.
 This is how ZFS can identify the proper configuration when two pools have 
 the same name. Can you check the GUIDs (using zdb) to see if there is a
 collision?

Reproducible:

itchy:/tmp/blah# mkfile 64m 64m disk1
itchy:/tmp/blah# zfs create -V 64m rpool/blahcache
itchy:/tmp/blah# zpool create blah /tmp/blah/disk1 
itchy:/tmp/blah# zpool add blah cache /dev/zvol/dsk/rpool/blahcache 
itchy:/tmp/blah# zpool status blah
  pool: blah
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
blah ONLINE   0 0 0
  /tmp/blah/disk1ONLINE   0 0 0
cache
  /dev/zvol/dsk/rpool/blahcache  ONLINE   0 0 0

errors: No known data errors
itchy:/tmp/blah# zpool export blah
itchy:/tmp/blah# zdb -l /dev/zvol/dsk/rpool/blahcache 

LABEL 0

version=15
state=4
guid=6931317478877305718

itchy:/tmp/blah# zfs destroy rpool/blahcache
itchy:/tmp/blah# zfs create -V 64m rpool/blahcache
itchy:/tmp/blah# dd if=/dev/zero of=/dev/zvol/dsk/rpool/blahcache bs=1024k count=64
64+0 records in
64+0 records out
67108864 bytes (67 MB) copied, 0.559299 seconds, 120 MB/s
itchy:/tmp/blah# zpool import -d /tmp/blah
  pool: blah
id: 16691059548146709374
 state: ONLINE
action: The pool can be imported using its name or numeric identifier.
config:

blah ONLINE
  /tmp/blah/disk1ONLINE
cache
  /dev/zvol/dsk/rpool/blahcache
itchy:/tmp/blah# zdb -l /dev/zvol/dsk/rpool/blahcache

LABEL 0


LABEL 1


LABEL 2


LABEL 3

itchy:/tmp/blah# zpool import -d /tmp/blah blah
itchy:/tmp/blah# zpool status
  pool: blah
 state: ONLINE
 scrub: none requested
config:

NAME STATE READ WRITE CKSUM
blah ONLINE   0 0 0
  /tmp/blah/disk1ONLINE   0 0 0
cache
  /dev/zvol/dsk/rpool/blahcache  ONLINE   0 0 0

errors: No known data errors
itchy:/tmp/blah# zdb -l /dev/zvol/dsk/rpool/blahcache

LABEL 0

version=15
state=4
guid=6931317478877305718
...


It did indeed overwrite my formerly clean blahcache.

Smells like a serious bug.

/Tomas
-- 
Tomas Ögren, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-20 Thread Richard Elling
Though the ARC case, PSARC/2007/618 is unpublished, I gather from
googling and the source that L2ARC devices are considered auxiliary,
in the same category as spares. If so, then it is perfectly reasonable to
expect that it gets picked up regardless of the GUID. This also implies
that it is shareable between pools until assigned. Brief testing confirms
this behaviour.  I learn something new every day :-)

So, I suspect Lutz sees a race when both pools are imported onto one
node.  This still makes me nervous though...
 -- richard



Re: [zfs-discuss] L2ARC in Cluster is picked up although not part of the pool

2010-01-20 Thread Daniel Carosone
On Wed, Jan 20, 2010 at 03:20:20PM -0800, Richard Elling wrote:
 Though the ARC case, PSARC/2007/618 is unpublished, I gather from
 googling and the source that L2ARC devices are considered auxiliary,
 in the same category as spares. If so, then it is perfectly reasonable to
 expect that it gets picked up regardless of the GUID. This also implies
 that it is shareable between pools until assigned. Brief testing confirms
 this behaviour.  I learn something new every day :-)
 
 So, I suspect Lutz sees a race when both pools are imported onto one
 node.  This still makes me nervous though...

Yes. What if device reconfiguration renumbers my controllers, will
l2arc suddenly start trashing a data disk?  The same problem used to
be a risk for swap,  but less so now that we swap to named zvol. 

There's work afoot to make l2arc persistent across reboot, which
implies some organised storage structure on the device.  Fixing this
shouldn't wait for that.

--
Dan.
