Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Ian Collins

Andrew Werchowiecki wrote:


Thanks for the info about slices, I may give that a go later on. I’m not
keen on that because I have clear evidence (zpools set up this way, right
now, working without issue) that GPT partitions of the style shown above
work, and I want to see why it doesn’t work in my setup rather than simply
ignoring it and moving on.




Didn't you read Richard's post? You can have only one Solaris partition 
at a time.


Your original example failed when you tried to add a second.

--
Ian.



Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Cindy Swearingen

Hi Andrew,

Your original syntax was incorrect.

A p* device is a larger container for the d* device or s* devices.
In the case of a cache device, you need to specify a d* or s* device.
That you can add p* devices to a pool is a bug.

Adding different slices from c25t10d1 as both log and cache devices
would need the s* identifier, but you've already added the entire
c25t10d1 as the log device. A better configuration would be using
c25t10d1 for log and using c25t9d1 for cache or provide some spares
for this large pool.

After you remove the log devices, re-add like this:

# zpool add aggr0 log c25t10d1
# zpool add aggr0 cache c25t9d1
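
If you do want to keep the split layout on a single SSD instead, the
slice-based form would look roughly like this (a sketch only; it assumes
s0 and s1 have already been created on c25t10d1 with format):

# zpool add aggr0 log c25t10d1s0
# zpool add aggr0 cache c25t10d1s1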

You might review the ZFS recommended practices section here:

http://docs.oracle.com/cd/E26502_01/html/E29007/zfspools-4.html#storage-2

See example 3-4 for adding a cache device, here:

http://docs.oracle.com/cd/E26502_01/html/E29007/gayrd.html#gazgw

Always have good backups.

Thanks, Cindy



On 03/18/13 23:23, Andrew Werchowiecki wrote:

I did something like the following:

format -e /dev/rdsk/c5t0d0p0
fdisk
1 (create)
F (EFI)
6 (exit)
partition
label
1
y
0
usr
wm
64
4194367e
1
usr
wm
4194368
117214990
label
1
y

 Total disk size is 9345 cylinders
 Cylinder size is 12544 (512 byte) blocks

                                      Cylinders
      Partition   Status    Type          Start   End   Length    %
      =========   ======    ============  =====   ===   ======   ===
          1                 EFI               0  9345    9346    100

partition print
Current partition table (original):
Total disk sectors available: 117214957 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector        Size        Last Sector
  0        usr    wm                64       2.00GB          4194367
  1        usr    wm           4194368      53.89GB        117214990
  2 unassigned    wm                 0          0                  0
  3 unassigned    wm                 0          0                  0
  4 unassigned    wm                 0          0                  0
  5 unassigned    wm                 0          0                  0
  6 unassigned    wm                 0          0                  0
  8   reserved    wm         117214991       8.00MB        117231374

This isn’t the output from when I did it, but these are exactly the same
steps that I followed.

Thanks for the info about slices, I may give that a go later on. I’m not
keen on that because I have clear evidence (zpools set up this way, right
now, working without issue) that GPT partitions of the style shown above
work, and I want to see why it doesn’t work in my setup rather than simply
ignoring it and moving on.

*From:* Fajar A. Nugraha [mailto:w...@fajar.net]
*Sent:* Sunday, 17 March 2013 3:04 PM
*To:* Andrew Werchowiecki
*Cc:* zfs-discuss@opensolaris.org
*Subject:* Re: [zfs-discuss] partioned cache devices

On Sun, Mar 17, 2013 at 1:01 PM, Andrew Werchowiecki
andrew.werchowie...@xpanse.com.au wrote:

I understand that p0 refers to the whole disk... in the logs I
pasted in I'm not attempting to mount p0. I'm trying to work out why
I'm getting an error attempting to mount p2, after p1 has
successfully mounted. Further, this has been done before on other
systems in the same hardware configuration in the exact same
fashion, and I've gone over the steps trying to make sure I haven't
missed something but can't see a fault.

How did you create the partition? Are those marked as a Solaris partition,
or something else (e.g. fdisk on Linux uses type 83 by default)?

I'm not keen on using Solaris slices because I don't have an
understanding of what that does to the pool's OS interoperability.

Linux can read Solaris slices and import Solaris-made pools just fine, as
long as you're using a compatible zpool version (e.g. zpool version 28).

--

Fajar






Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Andrew Gabriel

Andrew Werchowiecki wrote:


 Total disk size is 9345 cylinders
 Cylinder size is 12544 (512 byte) blocks

                                      Cylinders
      Partition   Status    Type          Start   End   Length    %
      =========   ======    ============  =====   ===   ======   ===
          1                 EFI               0  9345    9346    100


You only have a p1 (and for a GPT/EFI labeled disk, you can only
have p1 - no other FDISK partitions are allowed).


partition print
Current partition table (original):
Total disk sectors available: 117214957 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector        Size        Last Sector
  0        usr    wm                64       2.00GB          4194367
  1        usr    wm           4194368      53.89GB        117214990
  2 unassigned    wm                 0          0                  0
  3 unassigned    wm                 0          0                  0
  4 unassigned    wm                 0          0                  0
  5 unassigned    wm                 0          0                  0
  6 unassigned    wm                 0          0                  0
  8   reserved    wm         117214991       8.00MB        117231374


You have an s0 and s1.

This isn’t the output from when I did it, but these are exactly the same
steps that I followed.

Thanks for the info about slices, I may give that a go later on. I’m not
keen on that because I have clear evidence (zpools set up this way, right
now, working without issue) that GPT partitions of the style shown above
work, and I want to see why it doesn’t work in my setup rather than simply
ignoring it and moving on.


You would have to blow away the partitioning you have and create an
FDISK-partitioned disk (not EFI), and then create a p1 and p2 partition.
(Don't use the 'partition' subcommand, which confusingly creates Solaris
slices.) Give the FDISK partitions a partition type which nothing will
recognise, such as 'other', so that nothing will try to interpret them as
OS partitions. Then you can use them as raw devices, and they should be
portable between OSes which can handle FDISK-partitioned devices.
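
Roughly, the sequence would be something like this (a sketch only; the fdisk
menu steps are paraphrased and the sizes are just examples):

# format -e /dev/rdsk/c25t10d1p0
fdisk
  (delete the existing EFI partition, then create two partitions of
   type "Other", e.g. one of ~8 GB and one covering the remainder)
6 (exit)

# zpool add aggr0 log c25t10d1p1
# zpool add aggr0 cache c25t10d1p2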

--
Andrew


Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Jim Klimov

On 2013-03-19 20:38, Cindy Swearingen wrote:

Hi Andrew,

Your original syntax was incorrect.

A p* device is a larger container for the d* device or s* devices.
In the case of a cache device, you need to specify a d* or s* device.
That you can add p* devices to a pool is a bug.


I disagree; at least, I've always thought differently:
the d device is the whole disk denomination, with a
unique number for a particular controller link (c+t).

The disk has some partitioning table, MBR or GPT/EFI.
In these tables, partition p0 stands for the table
itself (i.e. to manage partitioning), and the rest kind
of depends. In case of MBR tables, one partition may
be named as having a Solaris (or Solaris2) type, and
there it holds a SMI table of Solaris slices, and these
slices can hold legacy filesystems or components of ZFS
pools. In case of GPT, the GPT-partitions can be used
directly by ZFS. However, they are also denominated as
slices in ZFS and format utility.

I believe, Solaris-based OSes accessing a p-named
partition and an s-named slice of the same number
on a GPT disk should lead to the same range of bytes
on disk, but I am not really certain about this.

Also, if a whole disk is given to ZFS (and for OSes
other than the latest Solaris 11 this means non-rpool
disks), then ZFS labels the disk as GPT and defines a
partition for itself plus a small trailing partition
(likely to level out discrepancies with replacement
disks that might happen to be a few sectors too small).
In this case ZFS reports that it uses cXtYdZ as a
pool component, since it considers itself in charge
of the partitioning table and its inner contents, and
doesn't intend to share the disk with other usages
(dual-booting and other OSes' partitions, or SLOG and
L2ARC parts, etc). This also allows ZFS to influence
hardware-related choices, like caching and throttling,
and likely auto-expansion with the changed LUN sizes
by fixing up the partition table along the way, since
it assumes being 100% in charge of the disk.

I don't think there is a crime in trying to use the
partitions (of either kind) as ZFS leaf vdevs, even the
zpool(1M) manpage states that:

... The  following  virtual  devices  are supported:
  disk
A block device, typically located under  /dev/dsk.
ZFS  can  use  individual  slices  or  partitions,
though the recommended mode of operation is to use
whole  disks.  ...

This is orthogonal to the fact that there can only be
one Solaris slice table, inside one partition, on MBR.
AFAIK this is irrelevant on GPT/EFI - no SMI slices there.

On my old home NAS with OpenSolaris I certainly did have
MBR partitions on the rpool intended initially for some
dual-booted OSes, but repurposed as L2ARC and ZIL devices
for the storage pool on other disks, when I played with
that technology. Didn't gain much with a single spindle ;)

HTH,
//Jim Klimov



Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Andrew Gabriel

On 03/19/13 20:27, Jim Klimov wrote:

I disagree; at least, I've always thought differently:
the d device is the whole disk denomination, with a
unique number for a particular controller link (c+t).

The disk has some partitioning table, MBR or GPT/EFI.
In these tables, partition p0 stands for the table
itself (i.e. to manage partitioning),


p0 is the whole disk regardless of any partitioning.
(Hence you can use p0 to access any type of partition table.)


and the rest kind
of depends. In case of MBR tables, one partition may
be named as having a Solaris (or Solaris2) type, and
there it holds a SMI table of Solaris slices, and these
slices can hold legacy filesystems or components of ZFS
pools. In case of GPT, the GPT-partitions can be used
directly by ZFS. However, they are also denominated as
slices in ZFS and format utility.


The GPT partitioning spec requires the disk to be FDISK
partitioned with just one single FDISK partition of type EFI,
so that tools which predate GPT partitioning will still see
such a GPT disk as fully assigned to FDISK partitions, and
therefore less likely to be accidentally blown away.


I believe, Solaris-based OSes accessing a p-named
partition and an s-named slice of the same number
on a GPT disk should lead to the same range of bytes
on disk, but I am not really certain about this.


No, you'll see just p0 (whole disk), and p1 (whole disk
less space for the backwards compatible FDISK partitioning).


Also, if a whole disk is given to ZFS (and for OSes
other than the latest Solaris 11 this means non-rpool
disks), then ZFS labels the disk as GPT and defines a
partition for itself plus a small trailing partition
(likely to level out discrepancies with replacement
disks that might happen to be a few sectors too small).
In this case ZFS reports that it uses cXtYdZ as a
pool component,


For an EFI disk, the device name without a final p* or s*
component is the whole EFI partition. (It's actually the
s7 slice minor device node, but the s7 is dropped from
the device name to avoid the confusion we had with s2
on SMI labeled disks being the whole SMI partition.)


since it considers itself in charge
of the partitioning table and its inner contents, and
doesn't intend to share the disk with other usages
(dual-booting and other OSes' partitions, or SLOG and
L2ARC parts, etc). This also allows ZFS to influence
hardware-related choices, like caching and throttling,
and likely auto-expansion with the changed LUN sizes
by fixing up the partition table along the way, since
it assumes being 100% in charge of the disk.

I don't think there is a crime in trying to use the
partitions (of either kind) as ZFS leaf vdevs, even the
zpool(1M) manpage states that:

... The  following  virtual  devices  are supported:
  disk
A block device, typically located under  /dev/dsk.
ZFS  can  use  individual  slices  or  partitions,
though the recommended mode of operation is to use
whole  disks.  ...


Right.


This is orthogonal to the fact that there can only be
one Solaris slice table, inside one partition, on MBR.
AFAIK this is irrelevant on GPT/EFI - no SMI slices there.


There's a simpler way to think of it on x86.
You always have FDISK partitioning (p1, p2, p3, p4).
You can then have SMI or GPT/EFI slices (both called s0, s1, ...)
in an FDISK partition of the appropriate type.
With SMI labeling, s2 is by convention the whole Solaris FDISK
partition (although this is not enforced).
With EFI labeling, s7 is enforced as the whole EFI FDISK partition,
and so the trailing s7 is dropped off the device name for
clarity.

This simplicity is brought about because the GPT spec requires
that backwards compatible FDISK partitioning is included, but
with just 1 partition assigned.

--
Andrew


Re: [zfs-discuss] partioned cache devices

2013-03-19 Thread Jim Klimov

On 2013-03-19 22:07, Andrew Gabriel wrote:

The GPT partitioning spec requires the disk to be FDISK
partitioned with just one single FDISK partition of type EFI,
so that tools which predate GPT partitioning will still see
such a GPT disk as fully assigned to FDISK partitions, and
therefore less likely to be accidentally blown away.


Okay, I guess I got entangled in terminology now ;)
Anyhow, your words are not all news to me, though my write-up
was likely misleading to unprepared readers... sigh... Thanks
for the clarifications and deeper details that I did not know!

So, we can concur that GPT does indeed include the protective ("fake") MBR
with one EFI partition entry, which addresses the smaller of 2TB (the MBR
limit) or the disk size, minus a few sectors for the GPT housekeeping.
Inside that EFI partition the GPT, um, partitions are defined (represented
as slices in Solaris). It is, after all, a GUID *Partition* Table, and
that's how parted refers to them too ;)

Notably, there are also unportable tricks (so-called hybrid MBRs) to fool
legacy OSes and bootloaders into addressing the same byte ranges via both
MBR entries (forged manually, abusing the GPT/EFI spec) and proper GPT
entries, as partitions in the sense of each table.

//Jim



Re: [zfs-discuss] partioned cache devices

2013-03-18 Thread Andrew Werchowiecki
I did something like the following:

format -e /dev/rdsk/c5t0d0p0
fdisk
1 (create)
F (EFI)
6 (exit)
partition
label
1
y
0
usr
wm
64
4194367e
1
usr
wm
4194368
117214990
label
1
y



 Total disk size is 9345 cylinders
 Cylinder size is 12544 (512 byte) blocks

                                      Cylinders
      Partition   Status    Type          Start   End   Length    %
      =========   ======    ============  =====   ===   ======   ===
          1                 EFI               0  9345    9346    100

partition print
Current partition table (original):
Total disk sectors available: 117214957 + 16384 (reserved sectors)

Part      Tag    Flag     First Sector        Size        Last Sector
  0        usr    wm                64       2.00GB          4194367
  1        usr    wm           4194368      53.89GB        117214990
  2 unassigned    wm                 0          0                  0
  3 unassigned    wm                 0          0                  0
  4 unassigned    wm                 0          0                  0
  5 unassigned    wm                 0          0                  0
  6 unassigned    wm                 0          0                  0
  8   reserved    wm         117214991       8.00MB        117231374

This isn't the output from when I did it, but these are exactly the same
steps that I followed.

Thanks for the info about slices, I may give that a go later on. I'm not
keen on that because I have clear evidence (zpools set up this way, right
now, working without issue) that GPT partitions of the style shown above
work, and I want to see why it doesn't work in my setup rather than simply
ignoring it and moving on.

From: Fajar A. Nugraha [mailto:w...@fajar.net]
Sent: Sunday, 17 March 2013 3:04 PM
To: Andrew Werchowiecki
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] partioned cache devices

On Sun, Mar 17, 2013 at 1:01 PM, Andrew Werchowiecki 
andrew.werchowie...@xpanse.com.au wrote:
I understand that p0 refers to the whole disk... in the logs I pasted in I'm 
not attempting to mount p0. I'm trying to work out why I'm getting an error 
attempting to mount p2, after p1 has successfully mounted. Further, this has 
been done before on other systems in the same hardware configuration in the 
exact same fashion, and I've gone over the steps trying to make sure I haven't 
missed something but can't see a fault.

How did you create the partition? Are those marked as a Solaris partition,
or something else (e.g. fdisk on Linux uses type 83 by default)?

I'm not keen on using Solaris slices because I don't have an understanding of 
what that does to the pool's OS interoperability.


Linux can read Solaris slices and import Solaris-made pools just fine, as
long as you're using a compatible zpool version (e.g. zpool version 28).

--
Fajar


Re: [zfs-discuss] partioned cache devices

2013-03-17 Thread Fajar A. Nugraha
On Sun, Mar 17, 2013 at 1:01 PM, Andrew Werchowiecki 
andrew.werchowie...@xpanse.com.au wrote:

 I understand that p0 refers to the whole disk... in the logs I pasted in
 I'm not attempting to mount p0. I'm trying to work out why I'm getting an
 error attempting to mount p2, after p1 has successfully mounted. Further,
 this has been done before on other systems in the same hardware
 configuration in the exact same fashion, and I've gone over the steps
 trying to make sure I haven't missed something but can't see a fault.


How did you create the partition? Are those marked as a Solaris partition,
or something else (e.g. fdisk on Linux uses type 83 by default)?

I'm not keen on using Solaris slices because I don't have an understanding
 of what that does to the pool's OS interoperability.



Linux can read Solaris slices and import Solaris-made pools just fine, as
long as you're using a compatible zpool version (e.g. zpool version 28).
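
A minimal sketch of what that looks like on the Linux side (assuming ZFS on
Linux is installed, and using the pool name from this thread):

# zpool import              (scan attached devices for importable pools)
# zpool import aggr0        (import the pool found on the Solaris-made disks)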

-- 
Fajar


Re: [zfs-discuss] partioned cache devices

2013-03-16 Thread Andrew Werchowiecki
It's a home setup, the performance penalty from splitting the cache devices is
non-existent, and that workaround sounds like a pretty crazy amount of overhead
when I could instead just have a mirrored slog.

I'm less concerned about wasted space, more concerned about the number of SAS
ports I have available.

I understand that p0 refers to the whole disk... in the logs I pasted in I'm 
not attempting to mount p0. I'm trying to work out why I'm getting an error 
attempting to mount p2, after p1 has successfully mounted. Further, this has 
been done before on other systems in the same hardware configuration in the 
exact same fashion, and I've gone over the steps trying to make sure I haven't 
missed something but can't see a fault. 

I'm not keen on using Solaris slices because I don't have an understanding of 
what that does to the pool's OS interoperability. 

From: Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) 
[opensolarisisdeadlongliveopensola...@nedharvey.com]
Sent: Friday, 15 March 2013 8:44 PM
To: Andrew Werchowiecki; zfs-discuss@opensolaris.org
Subject: RE: partioned cache devices

 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Andrew Werchowiecki

 muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2
 Password:
 cannot open '/dev/dsk/c25t10d1p2': I/O error
 muslimwookie@Pyzee:~$

 I have two SSDs in the system, I've created an 8gb partition on each drive for
 use as a mirrored write cache. I also have the remainder of the drive
 partitioned for use as the read only cache. However, when attempting to add
 it I get the error above.

Sounds like you're probably running into confusion about how to partition the 
drive.  If you create fdisk partitions, they will be accessible as p0, p1, p2, 
but I think p0 unconditionally refers to the whole drive, so the first 
partition is p1, and the second is p2.

If you create one big Solaris fdisk partition and then slice it via 'partition',
where s2 is typically the encompassing slice, and people usually use s1 and s2
and s6 for actual slices, then they will be accessible via s1, s2, s6.

Generally speaking, it's unadvisable to split the slog/cache devices anyway.  
Because:

If you're splitting it, evidently you're focusing on the wasted space.  Buying 
an expensive 128G device where you couldn't possibly ever use more than 4G or 
8G in the slog.  But that's not what you should be focusing on.  You should be 
focusing on the speed (that's why you bought it in the first place.)  The slog 
is write-only, and the cache is a mixture of read/write, where it should be 
hopefully doing more reads than writes.  But regardless of your actual success 
with the cache device, your cache device will be busy most of the time, and 
competing against the slog.

You have a mirror, you say. You should probably drop both the cache & log.
Use one whole device for the cache, use one whole device for the log.  The only 
risk you'll run is:

Since a slog is write-only (except during mount, typically at boot) it's 
possible to have a failure mode where you think you're writing to the log, but 
the first time you go back and read, you discover an error, and discover the 
device has gone bad.  In other words, without ever doing any reads, you might 
not notice when/if the device goes bad.  Fortunately, there's an easy 
workaround.  You could periodically (say, once a month) script the removal of 
your log device, create a junk pool, write a bunch of data to it, scrub it 
(thus verifying it was written correctly) and in the absence of any scrub 
errors, destroy the junk pool and re-add the device as a slog to the main pool.

I've never heard of anyone actually being that paranoid, and I've never heard 
of anyone actually experiencing the aforementioned possible undetected device 
failure mode.  So this is all mostly theoretical.

Mirroring the slog device really isn't necessary in the modern age.



Re: [zfs-discuss] partioned cache devices

2013-03-16 Thread Richard Elling

On Mar 16, 2013, at 7:01 PM, Andrew Werchowiecki 
andrew.werchowie...@xpanse.com.au wrote:

 It's a home setup, the performance penalty from splitting the cache devices
 is non-existent, and that workaround sounds like a pretty crazy amount of
 overhead when I could instead just have a mirrored slog.
 
 I'm less concerned about wasted space, more concerned about the number of
 SAS ports I have available.
 
 I understand that p0 refers to the whole disk... in the logs I pasted in I'm 
 not attempting to mount p0. I'm trying to work out why I'm getting an error 
 attempting to mount p2, after p1 has successfully mounted. Further, this has 
 been done before on other systems in the same hardware configuration in the 
 exact same fashion, and I've gone over the steps trying to make sure I 
 haven't missed something but can't see a fault. 

You can have only one Solaris partition at a time. Ian already shared the
answer: create one 100% Solaris partition and then use format to create two
slices.
 -- richard

 
 I'm not keen on using Solaris slices because I don't have an understanding of 
 what that does to the pool's OS interoperability. 
 
 From: Edward Ned Harvey (opensolarisisdeadlongliveopensolaris) 
 [opensolarisisdeadlongliveopensola...@nedharvey.com]
 Sent: Friday, 15 March 2013 8:44 PM
 To: Andrew Werchowiecki; zfs-discuss@opensolaris.org
 Subject: RE: partioned cache devices
 
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Andrew Werchowiecki
 
 muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2
 Password:
 cannot open '/dev/dsk/c25t10d1p2': I/O error
 muslimwookie@Pyzee:~$
 
 I have two SSDs in the system, I've created an 8gb partition on each drive 
 for
 use as a mirrored write cache. I also have the remainder of the drive
 partitioned for use as the read only cache. However, when attempting to add
 it I get the error above.
 
 Sounds like you're probably running into confusion about how to partition the 
 drive.  If you create fdisk partitions, they will be accessible as p0, p1, 
 p2, but I think p0 unconditionally refers to the whole drive, so the first 
 partition is p1, and the second is p2.
 
 If you create one big Solaris fdisk partition and then slice it via
 'partition', where s2 is typically the encompassing slice, and people usually
 use s1 and s2 and s6 for actual slices, then they will be accessible via s1,
 s2, s6.
 
 Generally speaking, it's unadvisable to split the slog/cache devices anyway.  
 Because:
 
 If you're splitting it, evidently you're focusing on the wasted space.  
 Buying an expensive 128G device where you couldn't possibly ever use more 
 than 4G or 8G in the slog.  But that's not what you should be focusing on.  
 You should be focusing on the speed (that's why you bought it in the first 
 place.)  The slog is write-only, and the cache is a mixture of read/write, 
 where it should be hopefully doing more reads than writes.  But regardless of 
 your actual success with the cache device, your cache device will be busy 
 most of the time, and competing against the slog.
 
 You have a mirror, you say. You should probably drop both the cache & log.
 Use one whole device for the cache, use one whole device for the log.  The 
 only risk you'll run is:
 
 Since a slog is write-only (except during mount, typically at boot) it's 
 possible to have a failure mode where you think you're writing to the log, 
 but the first time you go back and read, you discover an error, and discover 
 the device has gone bad.  In other words, without ever doing any reads, you 
 might not notice when/if the device goes bad.  Fortunately, there's an easy 
 workaround.  You could periodically (say, once a month) script the removal of 
 your log device, create a junk pool, write a bunch of data to it, scrub it 
 (thus verifying it was written correctly) and in the absence of any scrub 
 errors, destroy the junk pool and re-add the device as a slog to the main 
 pool.
 
 I've never heard of anyone actually being that paranoid, and I've never heard 
 of anyone actually experiencing the aforementioned possible undetected device 
 failure mode.  So this is all mostly theoretical.
 
 Mirroring the slog device really isn't necessary in the modern age.
 

-- 

ZFS and performance consulting
http://www.RichardElling.com


Re: [zfs-discuss] partioned cache devices

2013-03-15 Thread Ian Collins

Andrew Werchowiecki wrote:


Hi all,

I'm having some trouble with adding cache drives to a zpool, anyone 
got any ideas?


muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2

Password:

cannot open '/dev/dsk/c25t10d1p2': I/O error

muslimwookie@Pyzee:~$

I have two SSDs in the system, I've created an 8gb partition on each 
drive for use as a mirrored write cache. I also have the remainder of 
the drive partitioned for use as the read only cache. However, when 
attempting to add it I get the error above.




Create one 100% Solaris partition and then use format to create two slices.
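
Roughly (a sketch only; the format/fdisk menu steps are paraphrased, the sizes
are examples, and it assumes the second SSD is partitioned the same way):

# format -e /dev/rdsk/c25t10d1p0
fdisk
  (create one partition of type SOLARIS2 covering 100% of the disk, then exit)
partition
  (create s0 of ~8 GB and s1 with the remaining space, then label and quit)

# zpool add aggr0 log mirror c25t10d1s0 c25t9d1s0
# zpool add aggr0 cache c25t10d1s1 c25t9d1s1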

--
Ian.



Re: [zfs-discuss] partioned cache devices

2013-03-15 Thread Edward Ned Harvey (opensolarisisdeadlongliveopensolaris)
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Andrew Werchowiecki
 
 muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2
 Password:
 cannot open '/dev/dsk/c25t10d1p2': I/O error
 muslimwookie@Pyzee:~$
 
 I have two SSDs in the system, I've created an 8gb partition on each drive for
 use as a mirrored write cache. I also have the remainder of the drive
 partitioned for use as the read only cache. However, when attempting to add
 it I get the error above.

Sounds like you're probably running into confusion about how to partition the 
drive.  If you create fdisk partitions, they will be accessible as p0, p1, p2, 
but I think p0 unconditionally refers to the whole drive, so the first 
partition is p1, and the second is p2.

If you create one big Solaris fdisk partition and then slice it via 'partition',
where s2 is typically the encompassing slice, and people usually use s1 and s2
and s6 for actual slices, then they will be accessible via s1, s2, s6.

Generally speaking, it's unadvisable to split the slog/cache devices anyway.  
Because:  

If you're splitting it, evidently you're focusing on the wasted space.  Buying 
an expensive 128G device where you couldn't possibly ever use more than 4G or 
8G in the slog.  But that's not what you should be focusing on.  You should be 
focusing on the speed (that's why you bought it in the first place.)  The slog 
is write-only, and the cache is a mixture of read/write, where it should be 
hopefully doing more reads than writes.  But regardless of your actual success 
with the cache device, your cache device will be busy most of the time, and 
competing against the slog.

You have a mirror, you say. You should probably drop both the cache & log.
Use one whole device for the cache, use one whole device for the log.  The only 
risk you'll run is:

Since a slog is write-only (except during mount, typically at boot) it's 
possible to have a failure mode where you think you're writing to the log, but 
the first time you go back and read, you discover an error, and discover the 
device has gone bad.  In other words, without ever doing any reads, you might 
not notice when/if the device goes bad.  Fortunately, there's an easy 
workaround.  You could periodically (say, once a month) script the removal of 
your log device, create a junk pool, write a bunch of data to it, scrub it 
(thus verifying it was written correctly) and in the absence of any scrub 
errors, destroy the junk pool and re-add the device as a slog to the main pool.
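
A minimal sketch of that periodic check (a sketch only; the pool and device
names follow this thread, it assumes a single whole-device slog, and error
handling is omitted):

#!/bin/sh
# Pull the slog out of the main pool.
zpool remove aggr0 c25t10d1

# Build a throwaway pool on it and push some data through.
zpool create junkpool c25t10d1
dd if=/dev/urandom of=/junkpool/testfile bs=1024k count=1024

# Scrub runs in the background; wait for it to finish, then check for errors.
zpool scrub junkpool
while zpool status junkpool | grep "scrub in progress" > /dev/null; do sleep 10; done
zpool status -x junkpool

# If the scrub came back clean, tear down and put the device back as the slog.
zpool destroy junkpool
zpool add aggr0 log c25t10d1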

I've never heard of anyone actually being that paranoid, and I've never heard 
of anyone actually experiencing the aforementioned possible undetected device 
failure mode.  So this is all mostly theoretical.

Mirroring the slog device really isn't necessary in the modern age.



[zfs-discuss] partioned cache devices

2013-03-14 Thread Andrew Werchowiecki
Hi all,

I'm having some trouble with adding cache drives to a zpool, anyone got any 
ideas?

muslimwookie@Pyzee:~$ sudo zpool add aggr0 cache c25t10d1p2
Password:
cannot open '/dev/dsk/c25t10d1p2': I/O error
muslimwookie@Pyzee:~$

I have two SSDs in the system, I've created an 8gb partition on each drive for 
use as a mirrored write cache. I also have the remainder of the drive 
partitioned for use as the read only cache. However, when attempting to add it 
I get the error above.
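
For reference, the partition layout the OS actually sees on that SSD can be
double-checked with commands like these (illustrative only; output omitted):

# fdisk -W - /dev/rdsk/c25t10d1p0     (dump the FDISK partition table)
# prtvtoc /dev/rdsk/c25t10d1          (print the slice table from the label)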

Here's a zpool status:

  pool: aggr0
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Thu Feb 21 21:13:45 2013
        1.13T scanned out of 20.0T at 106M/s, 51h52m to go
        74.2G resilvered, 5.65% done
config:

        NAME                         STATE     READ WRITE CKSUM
        aggr0                        DEGRADED     0     0     0
          raidz2-0                   DEGRADED     0     0     0
            c7t5000C50035CA68EDd0    ONLINE       0     0     0
            c7t5000C5003679D3E2d0    ONLINE       0     0     0
            c7t50014EE2B16BC08Bd0    ONLINE       0     0     0
            c7t50014EE2B174216Dd0    ONLINE       0     0     0
            c7t50014EE2B174366Bd0    ONLINE       0     0     0
            c7t50014EE25C1E7646d0    ONLINE       0     0     0
            c7t50014EE25C17A62Cd0    ONLINE       0     0     0
            c7t50014EE25C17720Ed0    ONLINE       0     0     0
            c7t50014EE206C2AFD1d0    ONLINE       0     0     0
            c7t50014EE206C8E09Fd0    ONLINE       0     0     0
            c7t50014EE602DFAACAd0    ONLINE       0     0     0
            c7t50014EE602DFE701d0    ONLINE       0     0     0
            c7t50014EE20677C1C1d0    ONLINE       0     0     0
            replacing-13             UNAVAIL      0     0     0
              c7t50014EE6031198C1d0  UNAVAIL      0     0     0  cannot open
              c7t50014EE0AE2AB006d0  ONLINE       0     0     0  (resilvering)
            c7t50014EE65835480Dd0    ONLINE       0     0     0
        logs
          mirror-1                   ONLINE       0     0     0
            c25t10d1p1               ONLINE       0     0     0
            c25t9d1p1                ONLINE       0     0     0

errors: No known data errors

As you can see, I've successfully added the 8gb partitions as a mirrored write
cache. Interestingly, when I do a zpool iostat -v it shows the total as 111gb:

                            capacity     operations    bandwidth
pool                      alloc   free   read  write   read  write
------------------------  -----  -----  -----  -----  -----  -----
aggr0                     20.0T  7.27T  1.33K    139  81.7M  4.19M
  raidz2                  20.0T  7.27T  1.33K    115  81.7M  2.70M
    c7t5000C50035CA68EDd0     -      -    566      9  6.91M   241K
    c7t5000C5003679D3E2d0     -      -    493      8  6.97M   242K
    c7t50014EE2B16BC08Bd0     -      -    544      9  7.02M   239K
    c7t50014EE2B174216Dd0     -      -    525      9  6.94M   241K
    c7t50014EE2B174366Bd0     -      -    540      9  6.95M   241K
    c7t50014EE25C1E7646d0     -      -    549      9  7.02M   239K
    c7t50014EE25C17A62Cd0     -      -    534      9  6.93M   241K
    c7t50014EE25C17720Ed0     -      -    542      9  6.95M   241K
    c7t50014EE206C2AFD1d0     -      -    549      9  7.02M   239K
    c7t50014EE206C8E09Fd0     -      -    526     10  6.94M   241K
    c7t50014EE602DFAACAd0     -      -    576     10  6.91M   241K
    c7t50014EE602DFE701d0     -      -    591     10  7.00M   239K
    c7t50014EE20677C1C1d0     -      -    530     10  6.95M   241K
    replacing                 -      -      0    922      0  7.11M
      c7t50014EE6031198C1d0   -      -      0      0      0      0
      c7t50014EE0AE2AB006d0   -      -      0    622      2  7.10M
    c7t50014EE65835480Dd0     -      -    595     10  6.98M   239K
logs                          -      -      -      -      -      -
  mirror                   740K   111G      0     43      0  2.75M
    c25t10d1p1                -      -      0     43      3  2.75M
    c25t9d1p1                 -      -      0     43      3  2.75M
------------------------  -----  -----  -----  -----  -----  -----
rpool                     7.32G  12.6G      2      4  41.9K  43.2K
  c4t0d0s0                7.32G  12.6G      2      4  41.9K  43.2K
------------------------  -----  -----  -----  -----  -----  -----

Something funky is going on here...

Wooks