Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?

2010-04-07 Thread Markus Kovero

 Seems like this issue only occurs when MSI-X interrupts are enabled
 for the BCM5709 chips, or am I reading it wrong?
 
 If I type 'echo ::interrupts | mdb -k', and isolate for
 network-related bits, I get the following output:


   IRQ  Vect IPL Bus   Trg Type   CPU Share APIC/INT# ISR(s)
   36   0x60 6   PCI   Lvl Fixed  3   1 0x1/0x4   bnx_intr_1lvl
   48   0x61 6   PCI   Lvl Fixed  2   1 0x1/0x10  bnx_intr_1lvl


 Does this imply that my system is not in a vulnerable configuration?
 Supposedly I'm losing some performance without MSI-X, but I'm not sure
 in which environments or workloads we would notice since the load on
 this server is relatively low, and the L2ARC serves data at greater
 than 100MB/s (wire speed) without stressing much of anything.

 The BIOS settings in our T610 are exactly as they arrived from Dell
 when we bought it over a year ago.

 Thoughts?
 --eric

Unfortunately, I see interrupt type Fixed on a system that suffers from network issues 
with bnx. But yes, according to the Red Hat material this has something to do with 
Nehalem C-states (power saving etc.) and/or MSI.
If your system has been running for a year or so, I wouldn't expect this issue to 
come up; we have noted it mostly with R410/R710 systems manufactured 
in Q4/2009-Q1/2010 (different hardware revisions?).

Yours
Markus Kovero
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SSD sale on newegg

2010-04-07 Thread Eugen Leitl
On Tue, Apr 06, 2010 at 05:22:25PM -0700, Carson Gaspar wrote:

 I just found an 8 GB SATA Zeus (Z4S28I) for £83.35 (~US$127) shipped to 
 California. That should be more than large enough for my ZIL @home, 
 based on zilstat.

Transcend sells an 8 GByte SLC SSD for about 70 EUR. The specs
are not awe-inspiring though (I used it in an embedded firewall).
 
 The web site says EOL, limited to current stock.
 
 http://www.dpieshop.com/stec-zeus-z4s28i-8gb-25-sata-ssd-solid-state-drive-industrial-temp-p-410.html
 
 Of course this seems _way_ too good to be true, but I decided to take 
 the risk.

-- 
Eugen* Leitl <a href="http://leitl.org">leitl</a> http://leitl.org
__
ICBM: 48.07100, 11.36820 http://www.ativel.com http://postbiota.org
8B29F6BE: 099D 78BA 2FD3 B014 B08A  7779 75B0 2443 8B29 F6BE
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Jeroen Roodhart
Hi list,

 If you're running solaris proper, you better mirror
 your
  ZIL log device.  
...
 I plan to get to test this as well, won't be until
 late next week though.

Running OSOL nv130. Powered off the machine, removed the F20 and powered back on. 
The machine boots OK and comes up normally with the following message in 'zpool 
status':
...
pool: mypool
 state: FAULTED
status: An intent log record could not be read.
Waiting for administrator intervention to fix the faulted pool.
action: Either restore the affected device(s) and run 'zpool online',
or ignore the intent log records by running 'zpool clear'.
   see: http://www.sun.com/msg/ZFS-8000-K4
 scrub: none requested
config:

NAME        STATE     READ WRITE CKSUM
mypool      FAULTED      0     0     0  bad intent log
...

Nice! Running a later version of ZFS seems to lessen the need for 
ZIL-mirroring...

With kind regards,

Jeroen
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Jeroen Roodhart
 
  If you're running solaris proper, you better mirror
  your
   ZIL log device.
 ...
  I plan to get to test this as well, won't be until
  late next week though.
 
 Running OSOL nv130. Power off the machine, removed the F20 and power
 back on. Machines boots OK and comes up normally [...]
 
 Nice! Running a later version of ZFS seems to lessen the need for ZIL-
 mirroring...

Yes, since zpool 19, which is not available in any version of solaris yet,
and is not available in osol 2009.06 unless you update to developer
builds.  Since zpool 19, you have the ability to 'zpool remove' log
devices.  And if a log device fails during operation, the system is supposed
to fall back and just start using ZIL blocks from the main pool instead.
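
For example, with a hypothetical pool 'tank' and log device c5t0d0 (names made 
up here, not taken from this thread), removal on zpool 19+ would look something 
like:

   zpool remove tank c5t0d0     (detach the unmirrored log device)
   zpool status tank            (the separate 'logs' section should be gone)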

So the recommendation for zpool <19 would be *strongly* recommended.  Mirror
your log device if you care about using your pool.
And the recommendation for zpool >=19 would be ... don't mirror your log
device.  If you have more than one, just add them both unmirrored.

I edited the ZFS Best Practices yesterday to reflect these changes.

I always have a shade of doubt about things that are supposed to do
something.  Later this week, I am building an OSOL machine, updating it,
adding an unmirrored log device, starting a sync-write benchmark (to ensure
the log device is heavily in use) and then I'm going to yank out the log
device, and see what happens.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Ragnar Sundblad

On 7 apr 2010, at 14.28, Edward Ned Harvey wrote:

 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Jeroen Roodhart
 
 If you're running solaris proper, you better mirror
 your
 ZIL log device.
 ...
 I plan to get to test this as well, won't be until
 late next week though.
 
 Running OSOL nv130. Power off the machine, removed the F20 and power
 back on. Machines boots OK and comes up normally [...]
 
 Nice! Running a later version of ZFS seems to lessen the need for ZIL-
 mirroring...
 
 Yes, since zpool 19, which is not available in any version of solaris yet,
 and is not available in osol 2009.06 unless you update to developer
 builds.  Since zpool 19, you have the ability to 'zpool remove' log
 devices.  And if a log device fails during operation, the system is supposed
 to fall back and just start using ZIL blocks from the main pool instead.
 
 So the recommendation for zpool <19 would be *strongly* recommended.  Mirror
 your log device if you care about using your pool.
 And the recommendation for zpool >=19 would be ... don't mirror your log
 device.  If you have more than one, just add them both unmirrored.

Rather: ... >=19 would be ... if you don't mind losing data written in
the ~30 seconds before the crash, you don't have to mirror your log
device.

For a file server, mail server, etc etc, where things are stored
and supposed to be available later, you almost certainly want
redundancy on your slog too. (There may be file servers where
this doesn't apply, but they are special cases that should not
be mentioned in the general documentation.)
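
For instance, attaching a mirrored slog rather than a single device could look 
something like this (device names invented for illustration; see zpool(1M)):

   zpool add tank log mirror c5t0d0 c5t1d0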

 I edited the ZFS Best Practices yesterday to reflect these changes.

I'd say that "In zpool version 19 or greater, it is recommended not to
mirror log devices." is not very good advice and should be changed.

/ragge

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Robert Milkowski

On 07/04/2010 13:58, Ragnar Sundblad wrote:


Rather: ... >=19 would be ... if you don't mind losing data written in
the ~30 seconds before the crash, you don't have to mirror your log
device.

For a file server, mail server, etc etc, where things are stored
and supposed to be available later, you almost certainly want
redundancy on your slog too. (There may be file servers where
this doesn't apply, but they are special cases that should not
be mentioned in the general documentation.)

   


While I agree with you, I want to mention that it is all about 
understanding a risk.
In this case not only does your server have to crash in such a way that data 
has not been synced (sudden power loss, for example), but there also has to be 
some data committed to the slog device(s) which was not yet written to the 
main pool, and when your server restarts, the slog device has to have 
completely died as well.


Other than that you are fine even with an unmirrored slog device.

--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Bob Friesenhahn

On Wed, 7 Apr 2010, Ragnar Sundblad wrote:


So the recommendation for zpool <19 would be *strongly* recommended.  Mirror
your log device if you care about using your pool.
And the recommendation for zpool >=19 would be ... don't mirror your log
device.  If you have more than one, just add them both unmirrored.


Rather: ... >=19 would be ... if you don't mind losing data written in
the ~30 seconds before the crash, you don't have to mirror your log
device.


It is also worth pointing out that in normal operation the slog is 
essentially a write-only device which is only read at boot time.  The 
writes are assumed to work if the device claims success.  If the log 
device fails to read (oops!), then a mirror would be quite useful.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Robert Milkowski

On 07/04/2010 15:35, Bob Friesenhahn wrote:

On Wed, 7 Apr 2010, Ragnar Sundblad wrote:


So the recommendation for zpool <19 would be *strongly* recommended.  Mirror
your log device if you care about using your pool.
And the recommendation for zpool >=19 would be ... don't mirror your log
device.  If you have more than one, just add them both unmirrored.


Rather: ... >=19 would be ... if you don't mind losing data written in
the ~30 seconds before the crash, you don't have to mirror your log
device.


It is also worth pointing out that in normal operation the slog is 
essentially a write-only device which is only read at boot time.  The 
writes are assumed to work if the device claims success.  If the log 
device fails to read (oops!), then a mirror would be quite useful.


It is only read at boot if there is uncommitted data on it - during 
normal reboots zfs won't read data from the slog.


--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Bob Friesenhahn

On Wed, 7 Apr 2010, Robert Milkowski wrote:


It is only read at boot if there is uncommitted data on it - during normal 
reboots zfs won't read data from the slog.


How does zfs know if there is uncommitted data on the slog device 
without reading it?  The minimal read would be quite small, but it 
seems that a read is still required.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Reclaiming Windows Partitions

2010-04-07 Thread Ron Marshall
I finally decided to get rid of my Windows XP partition as I rarely used it 
except to fire it up to install OS updates and virus signatures.  I had some 
trouble locating information on how to do this so I thought I'd document it 
here.

My system is a Toshiba Tecra M9.  It had four partitions on it. 

Partition 1 - NTFS Windows XP OS (Drive C:)
Partition 2 - NTFS Windows data partition (D:)
Partition 3 - FAT32
Partition 4 - Solaris2

Partitions 1 and 2 were laid down by my company's standard OS install.  I had 
shrunk these using QTparted to make room to install OpenSolaris.

Partition 3 was set up as a common file system mountable by both OpenSolaris and 
Windows.  There may be ways to do this with NTFS now, but this was a legacy 
from older Solaris installs.

Partition 4 is my OpenSolaris ZFS install.

Step 1) Backed up all my data from Partition 3, and any files I needed from 
Partitions 1 and 2.  I also had a current snapshot of my OpenSolaris partition 
(Partition 4).

Step 2) Deleted Partitions 1, 2, and 3.  I did this using the fdisk option in 
format under OpenSolaris.

   format -> Select Disk 0 (make note of the short drive name alias; mine was 
c4t0d0)

You will receive a warning something like this;
[disk formatted]
/dev/dsk/c4t0d0s0 is part of active ZFS pool rpool. Please see zpool(1M)

Then select fdisk from the FORMAT MENU

You will see something like this;

             Total disk size is 14593 cylinders
             Cylinder size is 16065 (512 byte) blocks

                                                Cylinders
      Partition   Status    Type          Start   End    Length    %
      =========   ======    ============  =====   =====  ======   ===
          1                 FAT32LBA          x      xx
          2                 FAT32LBA                 xx
          3                 Win95 FAT32    5481    8157    2677    18
          4       Active    Solaris2       8158   14579    6422    44



SELECT ONE OF THE FOLLOWING:
   1. Create a partition
   2. Specify the active partition
   3. Delete a partition
   4. Change between Solaris and Solaris2 Partition IDs
   5. Edit/View extended partitions
   6. Exit (update disk configuration and exit)
   7. Cancel (exit without updating disk configuration)
Enter Selection: 

Delete the partitions 1, 2 and 3 (Don't forget to back them up before you do 
this)

Using the fdisk menu create a new Solaris2 partition for use by ZFS.  When you 
are done you should see something like this;

             Cylinder size is 16065 (512 byte) blocks

                                                Cylinders
      Partition   Status    Type          Start   End    Length    %
      =========   ======    ============  =====   =====  ======   ===
          1                 Solaris2          1    8157    8157    56
          4       Active    Solaris2       8158   14579    6422    44

Exit and update the disk configuration.


Step 3) Create the ZFS pool

First you can test if zpool will be successful in creating the pool by using 
the -n option;

 zpool create -n datapool c4t0d0p1  (I will make some notes about this disk 
name at the end)

Should report something like;

would create 'datapool' with the following layout:

  datapool
  c4t0d0p1

By default the zpool command will make a mount point in your root (/) with the 
same name as your pool.  If you don't want this, you can change it in the 
create command (see the man page for details).
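
For example, something like this should put the pool under /export instead (the 
path here is just an illustration):

   zpool create -m /export/data datapool c4t0d0p1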


Now issue the command without the -n option;

   zpool create  datapool c4t0d0p1

Now check to see if it is there;

  zpool list

It should report something like this;

NAME       SIZE  ALLOC   FREE   CAP  DEDUP  HEALTH  ALTROOT
datapool    62G  30.7G  31.3G   49%  1.06x  ONLINE  -
rpool       49G  43.4G  5.65G   88%  1.00x  ONLINE  -

Step 4) Remember to take any of the mount parameters out of your /etc/vfstab 
file.

You should be good to go at this point. 
==
Notes about disk/partition naming;

In my case the disk is called c4t0d0.  So how did I come up with c4t0d0p1?

The whole disk name is c4t0d0p0.  Each partition has the following naming 
convention;

Partition 1 = c4t0d0p1
Partition 2 = c4t0d0p2
Partition 3 = c4t0d0p3
Partition 4 = c4t0d0p4

The fdisk command does not renumber the partitions when you delete partitions. 
So in the end I had Partitions 1 and 4.  

Thanks to Srdjan Matovina for helping me sort this out, and as a second pair of 
eyes to 

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Neil Perrin

On 04/07/10 09:19, Bob Friesenhahn wrote:

On Wed, 7 Apr 2010, Robert Milkowski wrote:


it is only read at boot if there are uncomitted data on it - during 
normal reboots zfs won't read data from slog.


How does zfs know if there is uncomitted data on the slog device 
without reading it?  The minimal read would be quite small, but it 
seems that a read is still required.


Bob


If there's ever been synchronous activity then there is an empty tail block 
(stubby) that will be read even after a clean shutdown.

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Edward Ned Harvey
 From: Ragnar Sundblad [mailto:ra...@csc.kth.se]
 
 Rather: ... >=19 would be ... if you don't mind losing data written in
 the ~30 seconds before the crash, you don't have to mirror your log
 device.

If you have a system crash *and* a failed log device at the same time, this
is an important consideration.  But if you have either a system crash or a
failed log device, and they don't happen at the same time, then your sync writes
are safe, right up to the nanosecond, using an unmirrored nonvolatile log
device on zpool >= 19.


 I'd say that "In zpool version 19 or greater, it is recommended not to
 mirror log devices." is not very good advice and should be changed.

See above.  Still disagree?

If desired, I could clarify the statement, by basically pasting what's
written above.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Bob Friesenhahn
 
 It is also worth pointing out that in normal operation the slog is
 essentially a write-only device which is only read at boot time.  The
 writes are assumed to work if the device claims success.  If the log
 device fails to read (oops!), then a mirror would be quite useful.

An excellent point.

BTW, does the system *ever* read from the log device during normal
operation?  Such as perhaps during a scrub?  It really would be nice to
detect, in advance, failure of log devices that are claiming to write
correctly but which are really unreadable.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Reclaiming Windows Partitions

2010-04-07 Thread Thomas Maier-Komor
On 07.04.2010 18:05, Ron Marshall wrote:
 I finally decided to get rid of my Windows XP partition as I rarely used it 
 except to fire it up to install OS updates and virus signatures.  I had some 
 trouble locating information on how to do this so I thought I'd document it 
 here.
 [...]

Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Neil Perrin

On 04/07/10 10:18, Edward Ned Harvey wrote:

From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of Bob Friesenhahn

It is also worth pointing out that in normal operation the slog is
essentially a write-only device which is only read at boot time.  The
writes are assumed to work if the device claims success.  If the log
device fails to read (oops!), then a mirror would be quite useful.



An excellent point.

BTW, does the system *ever* read from the log device during normal
operation?  Such as perhaps during a scrub?  It really would be nice to
detect failure of log devices in advance, that are claiming to write
correctly, but which are really unreadable.


A scrub will read the log blocks, but only for unplayed logs.
Because of the transient nature of the log, and because it operates
outside of the transaction group model, it's hard to read the in-flight
log blocks to validate them.

There have previously been suggestions to read slogs periodically.
I don't know if  there's a CR raised for this though.

Neil.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Mark J Musante

On Wed, 7 Apr 2010, Neil Perrin wrote:

There have previously been suggestions to read slogs periodically. I 
don't know if there's a CR raised for this though.


Roch wrote up CR 6938883 "Need to exercise read from slog dynamically".


Regards,
markm
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Bob Friesenhahn

On Wed, 7 Apr 2010, Edward Ned Harvey wrote:


From: Ragnar Sundblad [mailto:ra...@csc.kth.se]

Rather: ... >=19 would be ... if you don't mind losing data written in
the ~30 seconds before the crash, you don't have to mirror your log
device.


If you have a system crash *and* a failed log device at the same time, this
is an important consideration.  But if you have either a system crash or a
failed log device, and they don't happen at the same time, then your sync writes
are safe, right up to the nanosecond, using an unmirrored nonvolatile log
device on zpool >= 19.


The point is that the slog is a write-only device, and a device which 
fails such that it acks each write but fails to read the data that 
it wrote could silently fail at any time during the normal 
operation of the system.  It is not necessary for the slog device to 
fail at the exact same time that the system spontaneously reboots.  I 
don't know if Solaris implements a background scrub of the slog as a 
normal course of operation which would cause a device with this sort 
of failure to be exposed quickly.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Bob Friesenhahn

On Wed, 7 Apr 2010, Edward Ned Harvey wrote:


BTW, does the system *ever* read from the log device during normal
operation?  Such as perhaps during a scrub?  It really would be nice to
detect failure of log devices in advance, that are claiming to write
correctly, but which are really unreadable.


To make matters worse, an SSD with a large cache might satisfy such 
reads from its cache so a scrub of the (possibly) tiny bit of 
pending synchronous writes may not validate anything.  A lightly 
loaded slog should usually be empty.  We already know that some 
(many?) SSDs are not very good about persisting writes to FLASH, even 
after acking a cache flush request.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Jason S
I have been searching this forum and just about every ZFS document I can find 
trying to find the answer to my questions. But I believe the answer I am 
looking for is not going to be documented and is probably best learned from 
experience.


This is my first time playing around with OpenSolaris and ZFS. I am in the 
midst of replacing my home-based file server. This server hosts all of my 
media files, from MP3s to Blu-ray ISOs. I stream media from this file server 
to several media players throughout my house. 

The server consists of a Supermicro X6DHE-XG2 motherboard, 2 x 2.8 GHz Xeon 
processors, 4 GB of RAM and 2 Supermicro SAT2MV8 controllers. I have 14 1TB 
Hitachi hard drives connected to the controllers.

My initial thought was to just create a single 14-drive RaidZ2 pool, but I have 
read over and over again that I should be limiting each array to a max of 9 
drives. So then I would end up with 2 x 7-drive RaidZ arrays. 

To keep the pool size at 12TB I would have to give up my extra parity drive 
going to this 2-array setup, and that is concerning as I have no room for hot 
spares in this system. So in my mind I am left with only one other choice, and 
that is going to 2 x RaidZ2 pools and losing an additional 2TB, so I am left 
with a 10TB ZFS pool.

So my big question is: given that I am working with 4 MB - 50 GB files, is 
going with 14 spindles going to incur a huge performance hit? I was hoping to 
be able to saturate a single GigE link with this setup, but I am concerned the 
single large array won't let me achieve this.


Ahh, decisions, decisions

Any advice would be greatly appreciated.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Richard Elling
On Apr 7, 2010, at 10:19 AM, Bob Friesenhahn wrote:
 On Wed, 7 Apr 2010, Edward Ned Harvey wrote:
 From: Ragnar Sundblad [mailto:ra...@csc.kth.se]
 
 Rather: ... =19 would be ... if you don't mind loosing data written
 the ~30 seconds before the crash, you don't have to mirror your log
 device.
 
 If you have a system crash, *and* a failed log device at the same time, this
 is an important consideration.  But if you have either a system crash, or a
 failed log device, that don't happen at the same time, then your sync writes
 are safe, right up to the nanosecond.  Using unmirrored nonvolatile log
 device on zpool = 19.
 
 The point is that the slog is a write-only device and a device which fails 
 such that its acks each write, but fails to read the data that it wrote, 
 could silently fail at any time during the normal operation of the system.  
 It is not necessary for the slog device to fail at the exact same time that 
 the system spontaneously reboots.  I don't know if Solaris implements a 
 background scrub of the slog as a normal course of operation which would 
 cause a device with this sort of failure to be exposed quickly.

You are playing against marginal returns. An ephemeral storage requirement
is very different than permanent storage requirement.  For permanent storage
services, scrubs work well -- you can have good assurance that if you read
the data once then you will likely be able to read the same data again with 
some probability based on the expected decay of the data. For ephemeral data,
you do not read the same data more than once, so there is no correlation
between reading once and reading again later.  In other words, testing the
readability of an ephemeral storage service is like a cat chasing its tail.  IMHO,
this is particularly problematic for contemporary SSDs that implement wear 
leveling.

<sidebar>
For clusters the same sort of problem exists for path monitoring. If you think
about paths (networks, SANs, cups-n-strings) then there is no assurance 
that a failed transfer means all subsequent transfers will also fail. Some other
permanence test is required to predict future transfer failures.
s/fail/pass/g
</sidebar>

Bottom line: if you are more paranoid, mirror the separate log devices and
sleep through the night.  Pleasant dreams! :-)
 -- richard


ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression property not received

2010-04-07 Thread Cindy Swearingen

Daniel,

Which Solaris release is this?

I can't reproduce this on my lab system that runs the Solaris 10 10/09 
release.


See the output below.

Thanks,

Cindy

# zfs destroy -r tank/test
# zfs create -o compression=gzip tank/test
# zfs snapshot tank/t...@now
# zfs send -R tank/t...@now | zfs receive -vd rpool
receiving full stream of tank/t...@now into rpool/t...@now
received 249KB stream in 2 seconds (125KB/sec)
# zfs list -r rpool
NAMEUSED  AVAIL  REFER  MOUNTPOINT
rpool  39.4G  27.5G  47.1M  /rpool
rpool/ROOT 4.89G  27.5G21K  legacy
rpool/ROOT/s10s_u8wos_08a  4.89G  27.5G  4.89G  /
rpool/dump 1.50G  27.5G  1.50G  -
rpool/export 44K  27.5G23K  /export
rpool/export/home21K  27.5G21K  /export/home
rpool/snaps31.0G  27.5G  31.0G  /rpool/snaps
rpool/swap2G  29.5G16K  -
rpool/test   21K  27.5G21K  /rpool/test
rpool/t...@now 0  -21K  -
# zfs get compression rpool/test
NAME        PROPERTY     VALUE  SOURCE
rpool/test  compression  gzip   local

On 04/07/10 11:47, Daniel Bakken wrote:
When I send a filesystem with compression=gzip to another server with 
compression=on, compression=gzip is not set on the received filesystem. 
I am using:


zfs send -R promise1/arch...@daily.1 | zfs receive -vd sas

The zfs manpage says regarding the -R flag: "When received, all 
properties, snapshots, descendent file systems, and clones are 
preserved." Snapshots are preserved, but the compression property is 
not. Any ideas why this doesn't work as advertised?


Thanks,

Daniel Bakken




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Bob Friesenhahn

On Wed, 7 Apr 2010, Jason S wrote:

To keep the pool size at 12TB i would have to give up my extra 
parity drive going to this 2 array setup and it is concerning as i 
have no room for hot spares in this system. So in my mind i am left 
with only one other choice and this is going to 2XRaidZ2 pools and 
loosing an additional 2 TB so i am left with a 10TB ZFS pool.


I would go with a single pool with two raidz2 vdevs, even if you don't 
get the maximum possible space.  Raidz is best avoided when using 1TB 
SATA disk drives because of the relatively high probability of data 
loss during a resilver and the long resilver times.  I would trade the 
hot spare for the improved security of raidz2.  The hot spare is more 
helpful for mirrored setups or raidz1, where the data reliability is 
more sensitive to how long it takes to recover a lost drive.  Just buy 
a spare drive so that you can replace a failed drive expediently.
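
For illustration, a pool along those lines might be created with something like 
the following (the c#t#d# names below are placeholders, not your actual devices):

   zpool create mediapool \
       raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 \
       raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0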


So my big question is given that i am working with 4mb - 50gb files 
is going with 14 spindles going incur a huge performance hit? I was 
hoping to be able to saturate a single GigE link with this setup, 
but i am concerned the single large array wont let me achieve this.


It is not difficult to saturate a gigabit link.  It can be easily 
accomplished with just a couple of drives.  The main factor is if 
zfs's prefetch is aggressive enough.  Each raidz2 vdev will offer the 
useful IOPS of a single disk drive so from an IOPS standpoint, the 
pool would behave like two drives.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Erik Trimble
On Wed, 2010-04-07 at 10:40 -0700, Jason S wrote:
 I have been searching this forum and just about every ZFS document i can find 
 trying to find the answer to my questions. But i believe the answer i am 
 looking for is not going to be documented and is probably best learned from 
 experience.
 
 
 This is my first time playing around with open solaris and ZFS. I am in
  the midst of replacing my home based filed server. This server hosts
  all of my media files from MP3's to Blue Ray ISO's. I stream media
  from this file server to several media players throughout my house. 
 
 The server consists of a Supermicro X6DHE-XG2 motherboard, 2 X 2.8ghz
  xeon processors, 4 gigs of ram and 2 Supermicro SAT2MV8 controllers. I
  have 14 1TB hitachi hard drives connected to the controllers.
 
If you can at all afford it, upgrade your RAM to 8GB. More than anything
else, I've found that additional RAM makes up for any other deficiencies
with a ZFS setup.  4GB is OK, but 8GB is a pretty sweet spot for
price/performance for a small NAS server.


 My initial thought was to just create a single 14 drive RaidZ2 pool,
  but i have read over and over again that i should be limiting each
  array to a max of 9 drives. So then i would end up with 2 X 7 drive
  RaidZ arrays. 
 
That's correct. You can certainly do a 14-drive Raidz2, but given how
the access/storage pattern for data is in such a setup, you'll likely
suffer noticeable slowness vs. a 2x7-drive setup.


 To keep the pool size at 12TB i would have to give up my extra parity
  drive going to this 2 array setup and it is concerning as i have no
  room for hot spares in this system. So in my mind i am left with only
  one other choice and this is going to 2XRaidZ2 pools and loosing an
  additional 2 TB so i am left with a 10TB ZFS pool.
 
You've pretty much hit it right there.  There is *one* other option:
create a zpool of two raidz1 vdevs: one with 6 drives, and one with 7
drives. Then add a hot spare for the pool.  That will give you most of
the performance of a 2x7 setup, with the capacity of 11 disks.  The
tradeoff is that it's a bit less reliable, as you have to trust the
ability of the hot spare to resilver before any additional drives fail
in the degraded array.  For a home NAS, it's likely a reasonable bet,
though.
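
As a rough sketch (again with made-up device names), that 6+7+spare layout might 
look like:

   zpool create mediapool \
       raidz1 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 \
       raidz1 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
       spare c2t7d0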


 So my big question is given that i am working with 4mb - 50gb files is
  going with 14 spindles going incur a huge performance hit? I was
  hoping to be able to saturate a single GigE link with this setup, but
  i am concerned the single large array wont let me achieve this.
 
Frankly, testing is the only way to be sure. :-)

Writing files that large (and reading them back more frequently, I
assume...) will tend to reduce the differences in performance between a
1x14 and 2x7 setup. One way to keep your 1Gb Ethernet saturated is to
increase the RAM (as noted above). With 8GB of RAM, you should have
enough buffer space in play to mask the differences in large file I/O
between the 1x14 and 2x7 setups. 12GB or 16GB would most certainly erase
pretty much any noticeable difference.

For small random I/O, even with larger amounts of RAM, you'll notice
some difference between the two setups - exactly how noticeable I can't
say, and you'd have to try it to see, as it depends heavily on your
access pattern.

 
 hh, decisions, decisions
 
 Any advice would be greatly appreciated.


One thing Richard or Bob might be able to answer better is the tradeoff
between getting a cheap/small SSD for L2ARC and buying more RAM. That
is, I don't have a good feel for whether (for your normal usage case),
it would be better to get 8GB of more RAM, or buy something like a cheap
40-60GB SSD for use as an L2ARC (or some combinations of the two).  SSDs
in that size range are $150-200, which is what 8GB of DDR1 ECC RAM will
likely cost.


-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Miles Nordin
>>>>> "jr" == Jeroen Roodhart <j.r.roodh...@uva.nl> writes:

    jr> Running OSOL nv130. Power off the machine, removed the F20 and
    jr> power back on. Machines boots OK and comes up normally with
    jr> the following message in 'zpool status':

yeah, but try it again and this time put rpool on the F20 as well and
try to import the pool from a LiveCD: if you lose zpool.cache at this
stage, your pool is toast. </end repeat mode>


pgpt1GZtrVxS6.pgp
Description: PGP signature
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Bob Friesenhahn

On Wed, 7 Apr 2010, Erik Trimble wrote:


One thing Richard or Bob might be able to answer better is the tradeoff
between getting a cheap/small SSD for L2ARC and buying more RAM. That
is, I don't have a good feel for whether (for your normal usage case),
it would be better to get 8GB of more RAM, or buy something like a cheap
40-60GB SSD for use as an L2ARC (or some combinations of the two).  SSDs
in that size range are $150-200, which is what 8GB of DDR1 ECC RAM will
likely cost.


If the storage is primarily used for single-user streamed video 
playback, data caching will have little value (data is accessed only 
once) and there may even be value to disabling data caching entirely 
(but cache metadata).  The only useful data caching would be to 
support file prefetch.  If data caching is disabled then the total RAM 
requirement may be reduced.  If the storage will serve other purposes 
as well, then retaining the caching and buying more RAM is a wise 
idea.
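
If you want to experiment with that, the cache policy can be set per filesystem 
with the primarycache property on reasonably recent builds (the filesystem name 
below is hypothetical):

   zfs set primarycache=metadata mediapool/video
   zfs get primarycache mediapool/video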


Zfs has a design weakness in that any substantial writes during 
streaming playback may temporarily interrupt (hickup) the streaming 
playback.  This weakness seems to be inherent to zfs although there 
are tuning options to reduce its effect.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Jason S
Thank you for the replies guys! 

I was actually already planning to get another 4 GB of RAM for the box right 
away anyway, but thank you for mentioning it! As there appear to be a couple of 
ways to skin the cat here, I think I am going to try both a 14-spindle RaidZ2 
and a 2 x 7 RaidZ2 configuration and see what the performance is like. I have a 
few days of grace before I need to have this server ready for duty. 

Something I forgot to note in my original post is that the performance numbers I 
am concerned with are primarily for reads. There could be at any one point 4 
media players attempting to stream media from this server. The media players 
all have 100 Mb interfaces, so as long as I can reliably stream 400 Mb/s it 
should be OK (this is assuming all the media players were playing high-bitrate 
Blu-ray streams at one time). Any writing to this array would happen pretty 
infrequently, and I normally schedule any file transfers for the wee hours of 
the morning anyway.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20 numbers

2010-04-07 Thread Ragnar Sundblad

On 7 apr 2010, at 18.13, Edward Ned Harvey wrote:

 From: Ragnar Sundblad [mailto:ra...@csc.kth.se]
 
 Rather: ... >=19 would be ... if you don't mind losing data written in
 the ~30 seconds before the crash, you don't have to mirror your log
 device.
 
 If you have a system crash *and* a failed log device at the same time, this
 is an important consideration.  But if you have either a system crash or a
 failed log device, and they don't happen at the same time, then your sync writes
 are safe, right up to the nanosecond, using an unmirrored nonvolatile log
 device on zpool >= 19.

Right, but if you have a power or a hardware problem, chances are
that more things really break at the same time, including the slog
device(s).

 I'd say that "In zpool version 19 or greater, it is recommended not to
 mirror log devices." is not very good advice and should be changed.
 
 See above.  Still disagree?
 
 If desired, I could clarify the statement, by basically pasting what's
 written above.

I believe that for a mail server, NFS server (to be spec compliant),
general purpose file server and the like, where the last written data
is as important as older data (maybe even more), it would be wise to
have at least as good redundancy on the slog as on the data disks.

If one can stand the (pretty small) risk of losing the last
transaction group before a crash, at the moment typically up to the
last 30 seconds of changes, you may have less redundancy on the slog.

(And if you don't care at all, like on a web cache perhaps, you
could of course disable the zil all together - that is kind of
the other end of the scale, which puts this in perspective.)

As Robert M so wisely and simply put it: "It is all about understanding
a risk." I think the documentation should help people make educated
decisions, though I am not right now sure how to put the words to
describe this in an easily understandable way.

/ragge

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Chris Dunbar
Hello,

More for my own edification than to help Jason (sorry Jason!) I would like to 
clarify something. If read performance is paramount, am I correct in thinking 
RAIDZ is not the best way to go? Would not the ZFS equivalent of RAID 10 
(striped mirror sets) offer better read performance? In this case, I realize 
that Jason also needs to maximize the space he has in order to store all of 
those legitimately copied Blu-Ray movies. ;-)

Regards,
Chris

On Apr 7, 2010, at 3:09 PM, Jason S wrote:

 Thank you for the replies guys! 
 
 I was actually already planning to get another 4 gigs of ram for the box 
 right away anyway, but thank you for mentioning it! As there appears to be a 
 couple ways to skin the cat here i think i am going to try both a 14 
 spindle RaidZ2 and 2 X 7 RaidZ2 configuration and see what the performance is 
 like. I have a fews days of grace before i need to have this server ready for 
 duty. 
 
 Something i forgot to note in my original post is the performance numbers i 
 am concerned with are going to be during reads primarily. There could be at 
 any one point 4 media players attempting to stream media from this server. 
 The media players all have 100mb interfaces so as long i can can reliable 
 stream 400mb/s it should be ok (this is assuming all the media players were 
 playing high bitrate Blueray streams at one time). Any writing to this array 
 would happen pretty infrequently and i normally schedule any file transfers 
 for the wee hours of the morning anyway.
 -- 
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?

2010-04-07 Thread Jeremy Archer
GreenBytes (USA) sells OpenSolaris-based storage appliances.
Web site: www.getgreenbytes.com
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Frank Middleton

On 04/ 7/10 03:09 PM, Jason S wrote:
 

I was actually already planning to get another 4 gigs of ram for the
box right away anyway, but thank you for mentioning it! As there
appears to be a couple ways to skin the cat here i think i am going
to try both a 14 spindle RaidZ2 and 2 X 7 RaidZ2 configuration and
see what the performance is like. I have a fews days of grace before
i need to have this server ready for duty.


Just curious, what are you planning to boot from? AFAIK you can't
boot ZFS from anything much more complicated than a mirror.

Cheers -- Frank

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Bob Friesenhahn

On Wed, 7 Apr 2010, Chris Dunbar wrote:

More for my own edification than to help Jason (sorry Jason!) I 
would like to clarify something. If read performance is paramount, 
am I correct in thinking RAIDZ is not the best way to go? Would not 
the ZFS equivalent of RAID 10 (striped mirror sets) offer better 
read performance? In this case, I realize that Jason also needs to


Striped mirror vdevs are assured to offer peak performance.  One would 
(naively) think that the striping in a raidz2 would allow it to offer 
more sequential performance, but zfs's sequential file prefetch allows 
mirrors to offer about the same level of sequential performance. 
With the mirror setup, 128K blocks are pulled from each disk whereas 
with the raidz setup, the 128K block is split across the drives 
constituting a vdev.  Zfs is very good at ramping up prefetch for 
large sequential files.  Due to this, raidz2 should be seen as a way 
to improve storage efficiency and data reliability, and not so much as 
a way to improve sequential performance.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Freddie Cash
On Wed, Apr 7, 2010 at 12:09 PM, Jason S j.sin...@shaw.ca wrote:

 I was actually already planning to get another 4 gigs of ram for the box
 right away anyway, but thank you for mentioning it! As there appears to be a
 couple ways to skin the cat here i think i am going to try both a 14
 spindle RaidZ2 and 2 X 7 RaidZ2 configuration and see what the performance
 is like. I have a fews days of grace before i need to have this server ready
 for duty.

 Don't bother with the 14-drive raidz2.  I can attest to just how horrible
the performance is for a single, large raidz2 vdev:  atrocious.
Especially when it comes time to scrub or resilver.  You'll end up
thrashing all the disks, taking close to a week to resilver a dead drive (if
you can actually get it to complete), and pulling your hair out in
frustration.

Our original configuration in our storage servers used a single 24-drive
raidz2 vdev using 7200 RPM SATA drives.  Worked, not well, but it worked ...
until the first drive died.  After 3 weeks, the resilver still hadn't
finished, the backups processes weren't completing overnight due to the
resilver process, and things just went downhill.  We redid the pool using 3x
raidz2 vdevs using 8 drives each, and things are much better.  (If I had to
re-do it again today, I'd use 4x raidz2 vdevs using 6 drives each.)

The more vdevs you can add to a pool, the better the raw I/O performance of
the pool will be.  Go with lots of smaller vdevs.  With 14 drives, play
around with the following:
  2x raidz2 vdevs using 7 drives each
  3x raidz2 vdevs using 5 drives each (with two hot-spares, or a mirror vdev
for root?)
  4x raidz2 vdevs using 4 drives each (with one hot-spare, perhaps?)
  4x raidz1 vdevs using 4 drives each (maybe not enough redundancy?)
  5x mirror vdevs using 3 drives each (maybe too much lost space for
redundancy?)
  7x mirror vdevs using 2 drives each

You really need to decide which is more important:  raw storage space or raw
I/O throughput.  They're almost (not quite, but almost) mutually exclusive.


 Something i forgot to note in my original post is the performance numbers i
 am concerned with are going to be during reads primarily. There could be at
 any one point 4 media players attempting to stream media from this server.
 The media players all have 100mb interfaces so as long i can can reliable
 stream 400mb/s it should be ok (this is assuming all the media players were
 playing high bitrate Blueray streams at one time). Any writing to this array
 would happen pretty infrequently and i normally schedule any file transfers
 for the wee hours of the morning anyway.


-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Freddie Cash
On Wed, Apr 7, 2010 at 12:29 PM, Frank Middleton
f.middle...@apogeect.com wrote:

 On 04/ 7/10 03:09 PM, Jason S wrote:


 I was actually already planning to get another 4 gigs of ram for the
 box right away anyway, but thank you for mentioning it! As there
 appears to be a couple ways to skin the cat here i think i am going
 to try both a 14 spindle RaidZ2 and 2 X 7 RaidZ2 configuration and
 see what the performance is like. I have a fews days of grace before
 i need to have this server ready for duty.


 Just curious, what are you planning to boot from? AFAIK you can't
 boot ZFS from anything much more complicated than a mirror.

 The OP mentioned OpenSolaris, so I can't comment on what can/can't be booted
from on that OS.

However, FreeBSD 8 can boot from a mirror pool, a raidz1 pool, and a raidz2
pool.

So it's not a limitation in ZFS itself.  :)

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Jason S
Ahh,

Thank you for the reply Bob, that is the info I was after. It looks like I will 
be going with the 2 x 7 RaidZ2 option. 

And just to clarify, as far as expanding this pool in the future, my only option 
is to add another 7-spindle RaidZ2 array, correct?

Thanks for all the help guys !
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Are there (non-Sun/Oracle) vendors selling OpenSolaris/ZFS based NAS Hardware?

2010-04-07 Thread Tim Cook
On Wed, Apr 7, 2010 at 2:20 PM, Jeremy Archer j4rc...@gmail.com wrote:

 GreenBytes (USA) sells OpenSolaris based storage appliances
 Web site: www.getgreenbytes.com
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Unless something has changed recently, they were using their own modified,
and non-open-source version of ZFS.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Jason S
I am booting from a single 74 GB WD Raptor attached to the motherboard's onboard 
SATA port.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Eric D. Mudama

On Wed, Apr  7 at 12:41, Jason S wrote:

And just to clarify as far as expanding this pool in the future my
only option is to add another 7 spindle RaidZ2 array correct?


That is correct, unless you want to use the -f option to force-allow
an asymmetric expansion of your pool.

--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Erik Trimble
On Wed, 2010-04-07 at 12:41 -0700, Jason S wrote:
 Ahh,
 
 Thank you for the reply Bob, that is the info i was after. It looks like i 
 will be going with the 2 X 7 RaidZ2 option. 
 
 And just to clarify as far as expanding this pool in the future my only 
 option is to add another 7 spindle RaidZ2 array correct?
 
 Thanks for all the help guys !

You can add arbitrary-sized vdevs to a pool, but you can't add any
drives to an existing raidz[123] vdev. You can even add things like a
mirrored vdev to a pool consisting of several raidz[123] vdevs. :-)

Thus, it would certainly be possible to add, say, a 4-drive raidz1 to
your 2x7 pool.  It wouldn't perform quite the same as a 3x7 pool, but it
still would perform better than the 2x7 pool.
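
As a sketch, with made-up device names, that would be something like:

   zpool add mediapool raidz1 c3t0d0 c3t1d0 c3t2d0 c3t3d0

zpool will warn about the mismatched replication level, so you may need the -f 
flag Eric mentioned to force it.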



-- 
Erik Trimble
Java System Support
Mailstop:  usca22-123
Phone:  x17195
Santa Clara, CA
Timezone: US/Pacific (GMT-0800)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression property not received

2010-04-07 Thread Daniel Bakken
I worked around the problem by first creating a filesystem of the same name
with compression=gzip on the target server. Like this:

zfs create sas/archive
zfs set compression=gzip sas/archive

Then I used zfs receive with the -F option:

zfs send -vR promise1/arch...@daily.1 | zfs receive -vFd sas

And now I have gzip compression enabled locally:

zfs get compression sas/archive
NAME         PROPERTY     VALUE  SOURCE
sas/archive  compression  gzip   local

Not pretty, but it works.

Daniel Bakken


On Wed, Apr 7, 2010 at 12:51 PM, Cindy Swearingen 
cindy.swearin...@oracle.com wrote:

 Hi Daniel,

 I tried to reproduce this by sending from a b130 system to a s10u9 system,
 which vary in pool versions, but this shouldn't matter. I've
 been sending/receiving streams between latest build systems and
 older s10 systems for a long time. The zfs send -R option to send a
 recursive snapshot and all properties integrated into b77 so that
 isn't your problem either.

 The above works as expected. See below.

 I also couldn't find any recent bugs related to this, but bug searching is
 not an exact science.

 Mystified as well...

 Cindy

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Jason S
Freddie,

Now you have brought up another question :) I had always assumed that I would
just use OpenSolaris for this file server build, as I had not actually done
any research into other operating systems that support ZFS. Does anyone
have any advice as to whether I should be considering FreeBSD instead of
OpenSolaris? Both operating systems are somewhat foreign to me, as I come
from the Windows domain with a little bit of Linux experience as well.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression property not received

2010-04-07 Thread Daniel Bakken
The receive side is running build 111b (2009.06), so I'm not sure if your
advice actually applies to my situation.

Daniel Bakken


On Tue, Apr 6, 2010 at 10:57 PM, Tom Erickson thomas.erick...@oracle.com wrote:


 After build 128, locally set properties override received properties, and
 this would be the expected behavior. In that case, the value was received
 and you can see it like this:

 % zfs get -o all compression tank
 NAME  PROPERTY VALUE RECEIVED  SOURCE
 tank  compression  on    gzip  local
 %

 You could make the received value the effective value (clearing the local
 value) like this:

 % zfs inherit -S compression tank
 % zfs get -o all compression tank
 NAME  PROPERTY VALUE RECEIVED  SOURCE
 tank  compression  gzip  gzip  received
 %

 If the receive side is below the version that supports received properties,
 then I would expect the receive to set compression=gzip.

 After build 128 'zfs receive' prints an error message for every property it
 fails to set. Before that version, 'zfs receive' is silent when it fails to
 set a property so long as everything else is successful. I might check
 whether I have permission to set compression with 'zfs allow'. You could
 pipe the send stream to zstreamdump to verify that compression=gzip is in
 the send stream, but I think before build 125 you will not have zstreamdump.

 Tom


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Locking snapshots when using zfs send

2010-04-07 Thread Will Murnane
I just bought a new set of disks, and want to move my primary data
store over to the new disks.  I created a new pool fine, and now I'm
trying to use zfs send -R | zfs receive to transfer the data.  Here's
the error I got:
$ pfexec zfs send -Rpv h...@next | pfexec zfs receive -duvF temp
sending from @ to h...@sync
receiving full stream of h...@sync into t...@sync
sending from @sync to h...@zfs-auto-snap:frequent-2010-04-05-22:00
warning: cannot send 'h...@zfs-auto-snap:frequent-2010-04-05-22:00':
no such pool or dataset
sending from
@zfs-auto-snap:frequent-2010-04-05-22:00 to
h...@zfs-auto-snap:frequent-2010-04-06-00:00
warning: cannot send 'h...@zfs-auto-snap:frequent-2010-04-06-00:00':
no such pool or dataset
sending from @zfs-auto-snap:frequent-2010-04-06-00:00 to
h...@zfs-auto-snap:frequent-2010-04-06-11:45
warning: cannot send 'h...@zfs-auto-snap:frequent-2010-04-06-11:45':
no such pool or dataset
sending from @zfs-auto-snap:frequent-2010-04-06-11:45 to h...@next
warning: cannot send 'h...@next': incremental source
(@zfs-auto-snap:frequent-2010-04-06-11:45) does not exist
cannot receive new filesystem stream: invalid backup stream

This process took about 12 hours to do, so it's frustrating that
(apparently) snapshots disappearing causes the replication to fail.
Perhaps some sort of locking should be implemented to prevent
snapshots that will be needed from being destroyed.

In the meantime, I disabled all the zfs/auto-snapshot* services.
Should this be enough to prevent the send process from failing again,
or are there other steps I should take?

Thanks!
Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Freddie Cash
On Wed, Apr 7, 2010 at 1:22 PM, Jason S j.sin...@shaw.ca wrote:

 now you have brought up another question :) I had always assumed that i
 would just used open solaris for this file server build, as i had not
 actually done any research in regards to other operatin systems that support
 ZFS. Does anyone have any advice as to wether i should be considering
 FreeBSD instead of Open Solaris? Both operating systems are somewhat foriegn
 to me as i come from the windows domain with a little bit of linux
 experience as well.

If you want access to the latest and greatest ZFS features as soon as they
are available, you'll need to use OpenSolaris (currently at ZFS v22 or
newer).

If you don't mind waiting up to a year for new ZFS features, you can use
FreeBSD (currently at ZFS v13 in 7.3 and 8.0).

Hardware support for enterprise server gear may be better in OSol.
Hardware support for general server gear should be about the same.
Hardware support for desktop gear may be better in FreeBSD.

Each has fancy software features that the other doesn't (GEOM, HAST,
IPFW/PF, Jails, etc in FreeBSD; Zones, Crossbow, whatever that fancy admin
framework is called, integrated iSCSI, integrated CIFS, etc in OSol).

I'm biased toward FreeBSD, but that's because I've never used OSol.
 Anything is better than Linux.  ;)

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Bob Friesenhahn

On Wed, 7 Apr 2010, Jason S wrote:

systems that support ZFS. Does anyone have any advice as to wether i 
should be considering FreeBSD instead of Open Solaris? Both 
operating systems are somewhat foriegn to me as i come from the


FreeBSD zfs does clearly work, although it is an older version of zfs 
(version 13) than comes with the latest Solaris 10 (version 15), or 
development OpenSolaris.  Zfs is better integrated into Solaris than 
it is in FreeBSD since it was designed for Solaris.  While I have not 
used FreeBSD zfs, my experience with Solaris 10 and FreeBSD is that 
Solaris 10 (and later) is an extremely feature-rich system which can 
take considerable time to figure out if you really want to use all of 
those features (but you don't have to).  FreeBSD is simpler because it 
does not do as much.  FreeBSD boots extremely fast.
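
To see exactly which pool and filesystem versions a given install supports,
the upgrade subcommands can be run in report-only mode (the reported numbers
will vary by build; 'tank' below is just a placeholder pool name):

# list the zpool versions this system supports
zpool upgrade -v

# list the zfs filesystem versions this system supports
zfs upgrade -v

# show the version a particular pool is currently at
zpool get version tank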


If your only interest is with zfs, my impression is that in a year or 
two it will not really matter if you are using Solaris or FreeBSD 
because FreeBSD will have an updated zfs (with deduplication) and will 
be more mature than it is now.  Today zfs is more mature and stable in 
Solaris.


Solaris NFS is clearly more mature and performant than in FreeBSD. 
OpenSolaris native CIFS is apparently quite a good performer.  I find 
that Solaris 10 with Samba works well for me.


Solaris 10's Live Upgrade (and the OpenSolaris equivalent) is quite 
valuable in that it allows you to upgrade the OS without more than a 
few minutes of down-time and with a quick fall-back if things don't 
work as expected.
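
On OpenSolaris the rough equivalent is boot environments; a minimal sketch
(BE names are whatever beadm reports on your system):

# image-update creates and populates a new boot environment automatically
pfexec pkg image-update

# list boot environments; the new one is marked to become active on reboot
beadm list

# to fall back, activate the previous BE and reboot
pfexec beadm activate <previous-be-name>
pfexec init 6

On Solaris 10 proper, the analogous Live Upgrade commands are lucreate,
luupgrade, and luactivate.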


It is more straightforward to update a FreeBSD install from source 
code because that is the way it is normally delivered.  Sometimes this 
is useful in order to incorporate a fix as soon as possible without 
needing to wait for someone to produce binaries.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Jason S
Since I already have OpenSolaris installed on the box, I probably won't jump
over to FreeBSD. However, someone has suggested that I look into
www.nexenta.org, and I must say it is quite interesting. Someone correct me
if I am wrong, but it looks like it is OpenSolaris based and has basically
everything I am looking for (NFS and CIFS sharing). I am downloading it right
now and am going to install it on another machine to see if the GUI is easy
enough to use.

Does anyone have any experience or pointers with this NAS software?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Locking snapshots when using zfs send

2010-04-07 Thread Brandon High
On Wed, Apr 7, 2010 at 1:32 PM, Will Murnane will.murn...@gmail.com wrote:
 This process took about 12 hours to do, so it's frustrating that
 (apparently) snapshots disappearing causes the replication to fail.
 Perhaps some sort of locking should be implemented to prevent
 snapshots that will be needed from being destroyed.

What release of opensolaris are you using? Recent versions have the
ability to place holds on snapshots, and doing a send will
automatically place holds on the snapshots.

zfs hold tank/foo/b...@now
zfs release tank/foo/b...@now

-B

-- 
Brandon High : bh...@freaks.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Locking snapshots when using zfs send

2010-04-07 Thread Will Murnane
On Wed, Apr 7, 2010 at 17:51, Brandon High bh...@freaks.com wrote:
 On Wed, Apr 7, 2010 at 1:32 PM, Will Murnane will.murn...@gmail.com wrote:
 This process took about 12 hours to do, so it's frustrating that
 (apparently) snapshots disappearing causes the replication to fail.
 Perhaps some sort of locking should be implemented to prevent
 snapshots that will be needed from being destroyed.

 What release of opensolaris are you using? Recent versions have the
 ability to place holds on snapshots, and doing a send will
 automatically place holds on the snapshots.
This is on b134:
$ pfexec pkg image-update
No updates available for this image.

There is a zfs hold command available, but checking for holds on the
snapshot I'm trying to send (I started it again, to see if disabling
automatic snapshots helped) doesn't show anything:
$ zfs holds -r h...@next
$ echo $?
0
and applying a recursive hold to that snapshot doesn't seem to hold
all its children:
$ pfexec zfs hold -r keep h...@next
$ zfs holds -r h...@next
NAME TAG   TIMESTAMP
huge/homes/d...@next  keep  Wed Apr  7 18:02:09 2010
h...@nextkeep  Wed Apr  7 18:02:09 2010
$ zfs list -r -t all huge | grep next
h...@next                 204K      -  2.80T  -
huge/back...@next            0      -  42.0K  -
huge/ho...@next              0      -  42.9M  -
huge/homes/cnl...@next   59.9K      -   165G  -
huge/homes/d...@next         0      -  42.0K  -
huge/homes/svnb...@next      0      -  46.4M  -
huge/homes/w...@next     23.9M      -  95.7G  -

Suggestions?  Comments?

Will
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Tim Cook
On Wednesday, April 7, 2010, Jason S j.sin...@shaw.ca wrote:
 Since i already have Open Solaris installed on the box, i probably wont jump 
 over to FreeBSD. However someone has suggested to me to look into 
 www.nexenta.org and i must say it is quite interesting. Someone correct me if 
 i am wrong but it looks like it is Open Solaris based and has basically 
 everything i am looking for (NFS and CIFS sharing). I am downloading it right 
 now and am going to install it on another machine to see if this GUI is easy 
 enough to use.

 Does anyone have any experience or pointers with this NAS software?
 --
 This message posted from opensolaris.org
 _


I wouldn't waste your time. On my last go-round, LACP was completely
broken for no apparent reason, and the community is basically
non-existent.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread David Magda

On Apr 7, 2010, at 16:47, Bob Friesenhahn wrote:

Solaris 10's Live Upgrade (and the OpenSolaris equivalent) is quite  
valuable in that it allows you to upgrade the OS without more than a  
few minutes of down-time and with a quick fall-back if things don't  
work as expected.


It is more straightforward to update a FreeBSD install from source  
code because that is the way it is normally delivered.  Sometimes  
this is useful in order to incorporate a fix as soon as possible  
without needing to wait for someone to produce binaries.


If you're going to go with (Open)Solaris, the OP may also want to look  
into the multi-platform pkgsrc for third-party open source software:


http://www.pkgsrc.org/
http://en.wikipedia.org/wiki/Pkgsrc

It's not as comprehensive as FreeBSD Ports (21,500 and counting), but  
it has the major stuff and is quite good. I'd also look into the  
FreeBSD Handbook:


http://freebsd.org/handbook
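
For anyone who wants to try pkgsrc on (Open)Solaris, the usual bootstrap is
roughly the following (prefix and paths are just the common defaults, and a
working compiler toolchain is assumed):

# unpack a pkgsrc snapshot, e.g. under /usr
tar -xzf pkgsrc.tar.gz -C /usr

# build and install the bootstrap kit (bmake, pkg_install, ...) under /usr/pkg
cd /usr/pkgsrc/bootstrap
./bootstrap --prefix /usr/pkg

# then build individual packages with the pkgsrc bmake
cd /usr/pkgsrc/<category>/<package>
/usr/pkg/bin/bmake install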

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression property not received

2010-04-07 Thread Daniel Bakken
Here is the info from zstreamdump -v on the sending side:

BEGIN record
hdrtype = 2
features = 0
magic = 2f5bacbac
creation_time = 0
type = 0
flags = 0x0
toguid = 0
fromguid = 0
toname = promise1/arch...@daily.1

nvlist version: 0
tosnap = daily.1
fss = (embedded nvlist)
nvlist version: 0
0xcfde021e56c8fc = (embedded nvlist)
nvlist version: 0
name = promise1/archive
parentfromsnap = 0x0
props = (embedded nvlist)
nvlist version: 0
mountpoint = /promise1/archive
compression = 0xa
dedup = 0x2
(end props)

I assume that compression = 0xa means gzip. I wonder if the dedup property
is causing the receiver (build 111b)  to disregard all other properties,
since the receiver doesn't support dedup. Dedup was enabled in the past on
the sending filesystem, but is now disabled for reasons of sanity.

I'd like to try the dtrace debugging, but it would destroy the progress I've
made so far transferring the filesystem.

Thanks,

Daniel

On Wed, Apr 7, 2010 at 12:52 AM, Tom Erickson thomas.erick...@oracle.com wrote:


 The advice regarding received vs local properties definitely does not
 apply. You could still confirm the presence of the compression property in
 the send stream with zstreamdump, since the send side is running build 129.
 To debug the receive side I might dtrace the zap_update() function with the
 fbt provider, something like

 zfs send -R promise1/arch...@daily.1 | dtrace -c 'zfs receive -vd sas' \
 -n 'fbt::zap_update:entry / stringof(args[2]) == "compression" ||  \
 stringof(args[2]) == "compression$recvd" / { self->trace = 1; }'  \
 -n 'fbt::zap_update:return / self->trace / { trace(args[1]); \
 self->trace = 0; }'

 and look for non-zero return values.

 I'd also redirect 'zdb -vvv poolname' to a file and search it for
 compression to check the value in the ZAP.

 I assume you have permission to set the compression property on the receive
 side, but I'd check anyway.

 Tom

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Richard Elling
On Apr 7, 2010, at 3:24 PM, Tim Cook wrote:
 On Wednesday, April 7, 2010, Jason S j.sin...@shaw.ca wrote:
 Since i already have Open Solaris installed on the box, i probably wont jump 
 over to FreeBSD. However someone has suggested to me to look into 
 www.nexenta.org and i must say it is quite interesting. Someone correct me 
 if i am wrong but it looks like it is Open Solaris based and has basically 
 everything i am looking for (NFS and CIFS sharing). I am downloading it 
 right now and am going to install it on another machine to see if this GUI 
 is easy enough to use.
 
 Does anyone have any experience or pointers with this NAS software?
 --
 This message posted from opensolaris.org
 _
 
 
 I wouldn't waste your time. My last go round lacp was completely
 broken for no apparent reason. The community is basically
 non-existent.

[richard pinches himself... yep, still there :-)]

NexentaStor version 3.0 is based on b134 so it has the same basic foundation 
as the yet-unreleased OpenSolaris 2010.next.  For an easy-to-use NAS box
for the masses, it is much more friendly and usable than a basic  OpenSolaris
or Solaris 10 release.
 -- richard

ZFS storage and performance consulting at http://www.RichardElling.com
ZFS training on deduplication, NexentaStor, and NAS performance
Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com 





___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Bob Friesenhahn

On Wed, 7 Apr 2010, David Magda wrote:


It is more straightforward to update a FreeBSD install from source code 
because that is the way it is normally delivered.  Sometimes this is useful 
in order to incorporate a fix as soon as possible without needing to wait 
for someone to produce binaries.


If you're going to go with (Open)Solaris, the OP may also want to look into 
the multi-platform pkgsrc for third-party open source software:


http://www.pkgsrc.org/
http://en.wikipedia.org/wiki/Pkgsrc


But this does not update the OS kernel.  It is for application 
packages.  I did have to apply a source patch to the FreeBSD kernel 
the last time around.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Freddie Cash
On Wed, Apr 7, 2010 at 4:27 PM, Bob Friesenhahn 
bfrie...@simple.dallas.tx.us wrote:

 On Wed, 7 Apr 2010, David Magda wrote:


 It is more straightforward to update a FreeBSD install from source code
 because that is the way it is normally delivered.  Sometimes this is useful
 in order to incorporate a fix as soon as possible without needing to wait
 for someone to produce binaries.


 If you're going to go with (Open)Solaris, the OP may also want to look
 into the multi-platform pkgsrc for third-party open source software:

http://www.pkgsrc.org/
http://en.wikipedia.org/wiki/Pkgsrc


 But this does not update the OS kernel.  It is for application packages.  I
 did have to apply a source patch to the FreeBSD kernel the last time around.

This is getting a bit off-topic regarding ZFS, but you only need to patch
the FreeBSD kernel if you don't want to wait for an errata/security notice
to be issued.  If you can wait, then you can just use the freebsd-update tool
to do a binary update of just the affected file(s), or even to upgrade to the
next (major or minor) release.

Not sure what the equivalent process would be on (Open)Solaris (or if you
even can do a patch/source update).
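
For the record, a rough sketch of both workflows (the release number below is
only illustrative):

# FreeBSD: fetch and apply binary updates for the installed release
freebsd-update fetch
freebsd-update install

# FreeBSD: binary upgrade to a newer release
freebsd-update -r 8.1-RELEASE upgrade
freebsd-update install

# OpenSolaris: the closest equivalent, updating into a new boot environment
pfexec pkg image-update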

However, I believe the mention of pkgsrc was for use on OSol.  There's very
little reason to use pkgsrc on FreeBSD.

-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of Chris Dunbar

 like to clarify something. If read performance is paramount, am I
 correct in thinking RAIDZ is not the best way to go? Would not the ZFS
 equivalent of RAID 10 (striped mirror sets) offer better read
 performance? In this case, I realize that Jason also needs to maximize
 the space he has in order to store all of those legitimately copied
 Blu-Ray movies. ;-)

During my testing, for sequential reads using 6 disks, I got these numbers
(normalized as multiples of a single disk's performance):
Stripe of 3 mirrors: 10.89
Raidz, 6 disks:       9.84
Raidz2, 6 disks:      7.17
Any number greater than 2 would max out a GigE link.

The main performance advantage of the stripe of mirrors is in the random
reads, which aren't very significant for Jason's case.

http://nedharvey.com/iozone_weezer/bobs%20method/iozone%20results%20summary.pdf
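
For reference, the three 6-disk layouts compared above would be created
roughly like this (pool and device names are placeholders):

# stripe of 3 mirrors (RAID 10 equivalent)
zpool create tank mirror c1t0d0 c1t1d0 mirror c1t2d0 c1t3d0 mirror c1t4d0 c1t5d0

# single 6-disk raidz (single parity)
zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0

# single 6-disk raidz2 (double parity)
zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0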

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Edward Ned Harvey
 From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
 boun...@opensolaris.org] On Behalf Of David Magda
 
 If you're going to go with (Open)Solaris, the OP may also want to look
 into the multi-platform pkgsrc for third-party open source software:
 
   http://www.pkgsrc.org/
   http://en.wikipedia.org/wiki/Pkgsrc

Am I mistaken?  I thought pkgsrc was for NetBSD.
For Solaris/OpenSolaris, I would normally say OpenCSW or Blastwave.  (And in
some circumstances, Sunfreeware.)

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression property not received

2010-04-07 Thread Daniel Bakken
We have found the problem. The mountpoint property on the sender was at one
time changed from the default, then later changed back to the default value
using zfs set instead of zfs inherit. Therefore, zfs send included this
locally set property in the stream, even though its value matches the
default. This caused the receiver to stop processing subsequent properties
in the stream, because the mountpoint isn't valid on the receiver.

I tested this theory with a spare zpool. First I used zfs inherit
mountpoint promise1/archive to remove the local setting (which was
exactly the same value as the default). This time the compression=gzip
property was correctly received.

It seems like a bug to me that one failed property in a stream prevents the
rest from being applied. I should have used zfs inherit, but it would be
best if zfs receive handled failures more gracefully, and attempted to set
as many properties as possible.
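
For anyone who runs into the same thing, a quick way to spot properties that
will ride along in a 'zfs send -R' stream, and to clear one, is:

# list only locally set properties; these are what -R includes in the stream
zfs get -s local all promise1/archive

# clear a locally set mountpoint so the default is inherited again
zfs inherit mountpoint promise1/archive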

Thanks to Cindy and Tom for their help.

Daniel

On Wed, Apr 7, 2010 at 2:31 AM, Tom Erickson thomas.erick...@oracle.com wrote:


 Now I remember that 'zfs receive' used to give up after the first property
 it failed to set. If I'm remembering correctly, then, in this case, if the
 mountpoint was invalid on the receive side, 'zfs receive' would not even try
 to set the remaining properties.

 I'd try the following in the source dataset:

 zfs inherit mountpoint promise1/archive

 to clear the explicit mountpoint and prevent it from being included in the
 send stream. Later set it back the way it was. (Soon there will be an option
 to take care of that; see CR 6883722 want 'zfs recv -o prop=value' to set
 initial property values of received dataset.) Then see if you receive the
 compression property successfully.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression property not received

2010-04-07 Thread Tom Erickson

Daniel Bakken wrote:
When I send a filesystem with compression=gzip to another server with 
compression=on, compression=gzip is not set on the received filesystem. 
I am using:


zfs send -R promise1/arch...@daily.1 | zfs receive -vd sas

The zfs manpage says regarding the -R flag: When received, all 
properties, snapshots, descendent file systems, and clones are 
preserved. Snapshots are preserved, but the compression property is 
not. Any ideas why this doesn't work as advertised?




After build 128, locally set properties override received properties, 
and this would be the expected behavior. In that case, the value was 
received and you can see it like this:


% zfs get -o all compression tank
NAME  PROPERTY VALUE RECEIVED  SOURCE
tank  compression  on    gzip  local
%

You could make the received value the effective value (clearing the 
local value) like this:


% zfs inherit -S compression tank
% zfs get -o all compression tank
NAME  PROPERTY VALUE RECEIVED  SOURCE
tank  compression  gzip  gzip  received
%

If the receive side is below the version that supports received 
properties, then I would expect the receive to set compression=gzip.


After build 128 'zfs receive' prints an error message for every property 
it fails to set. Before that version, 'zfs receive' is silent when it 
fails to set a property so long as everything else is successful. I 
might check whether I have permission to set compression with 'zfs 
allow'. You could pipe the send stream to zstreamdump to verify that 
compression=gzip is in the send stream, but I think before build 125 you 
will not have zstreamdump.


Tom

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression property not received

2010-04-07 Thread Tom Erickson

Daniel Bakken wrote:
The receive side is running build 111b (2009.06), so I'm not sure if 
your advice actually applies to my situation.




The advice regarding received vs local properties definitely does not 
apply. You could still confirm the presence of the compression property 
in the send stream with zstreamdump, since the send side is running 
build 129. To debug the receive side I might dtrace the zap_update() 
function with the fbt provider, something like


zfs send -R promise1/arch...@daily.1 | dtrace -c 'zfs receive -vd sas' \
-n 'fbt::zap_update:entry / stringof(args[2]) == "compression" ||  \
stringof(args[2]) == "compression$recvd" / { self->trace = 1; }'  \
-n 'fbt::zap_update:return / self->trace / { trace(args[1]); \
self->trace = 0; }'

and look for non-zero return values.

I'd also redirect 'zdb -vvv poolname' to a file and search it for 
compression to check the value in the ZAP.


I assume you have permission to set the compression property on the 
receive side, but I'd check anyway.


Tom




On Tue, Apr 6, 2010 at 10:57 PM, Tom Erickson thomas.erick...@oracle.com wrote:
 


After build 128, locally set properties override received
properties, and this would be the expected behavior. In that case,
the value was received and you can see it like this:

% zfs get -o all compression tank
NAME  PROPERTY VALUE RECEIVED  SOURCE
tank  compression  on    gzip  local
%

You could make the received value the effective value (clearing the
local value) like this:

% zfs inherit -S compression tank
% zfs get -o all compression tank
NAME  PROPERTY VALUE RECEIVED  SOURCE
tank  compression  gzip  gzip  received
%

If the receive side is below the version that supports received
properties, then I would expect the receive to set compression=gzip.

After build 128 'zfs receive' prints an error message for every
property it fails to set. Before that version, 'zfs receive' is
silent when it fails to set a property so long as everything else is
successful. I might check whether I have permission to set
compression with 'zfs allow'. You could pipe the send stream to
zstreamdump to verify that compression=gzip is in the send stream,
but I think before build 125 you will not have zstreamdump.

Tom







___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Locking snapshots when using zfs send

2010-04-07 Thread Chris Kirby
On Apr 7, 2010, at 5:06 PM, Will Murnane wrote:

 This is on b134:
 $ pfexec pkg image-update
 No updates available for this image.
 
 There is a zfs hold command available, but checking for holds on the
 snapshot I'm trying to send (I started it again, to see if disabling
 automatic snapshots helped) doesn't show anything:
 $ zfs holds -r h...@next
 $ echo $?
 0
 and applying a recursive hold to that snapshot doesn't seem to hold
 all its children:
 $ pfexec zfs hold -r keep h...@next

Hmm, I made a number of fixes in build 132 related to destroying
snapshots while sending replication streams.  I'm unable to reproduce
the 'zfs holds -r' issue on build 133.  I'll try build 134, but I'm
not aware of any changes in that area.

-Chris

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression property not received

2010-04-07 Thread Tom Erickson

Daniel Bakken wrote:

Here is the info from zstreamdump -v on the sending side:

BEGIN record
hdrtype = 2
features = 0
magic = 2f5bacbac
creation_time = 0
type = 0
flags = 0x0
toguid = 0
fromguid = 0
toname = promise1/arch...@daily.1

nvlist version: 0
tosnap = daily.1
fss = (embedded nvlist)
nvlist version: 0
0xcfde021e56c8fc = (embedded nvlist)
nvlist version: 0
name = promise1/archive
parentfromsnap = 0x0
props = (embedded nvlist)
nvlist version: 0
mountpoint = /promise1/archive
compression = 0xa
dedup = 0x2
(end props)

I assume that compression = 0xa means gzip.


Yep, that's ZIO_COMPRESS_GZIP_6, the default gzip.

I wonder if the dedup 
property is causing the receiver (build 111b)  to disregard all other 
properties, since the receiver doesn't support dedup. Dedup was enabled 
in the past on the sending filesystem, but is now disabled for reasons 
of sanity.




Now I remember that 'zfs receive' used to give up after the first 
property it failed to set. If I'm remembering correctly, then, in this 
case, if the mountpoint was invalid on the receive side, 'zfs receive' 
would not even try to set the remaining properties.


You could try 'zfs get mountpoint' (or 'zdb -vvv poolname > file' and 
search the file for 'mountpoint') to see if that was set.


I'd like to try the dtrace debugging, but it would destroy the progress 
I've made so far transferring the filesystem.




Maybe you could try receiving into a new pool that you can later throw away.

zpool create bogustestpool c0t0d0
zfs send -R promise1/arch...@daily.1 | zfs receive -vd bogustestpool

I'd try the following in the source dataset:

zfs inherit mountpoint promise1/archive

to clear the explicit mountpoint and prevent it from being included in 
the send stream. Later set it back the way it was. (Soon there will be 
an option to take care of that; see CR 6883722 want 'zfs recv -o 
prop=value' to set initial property values of received dataset.) Then 
see if you receive the compression property successfully.


Tom
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] compression property not received

2010-04-07 Thread Tom Erickson

Daniel Bakken wrote:
We have found the problem. The mountpoint property on the sender was at 
one time changed from the default, then later changed back to defaults 
using zfs set instead of zfs inherit. Therefore, zfs send included these 
local non-default properties in the stream, even though the local 
properties are effectively set at defaults. This caused the receiver to 
stop processing subsequent properties in the stream because the 
mountpoint isn't valid on the receiver.


I tested this theory with a spare zpool. First I used zfs inherit 
mountpoint promise1/archive to remove the local setting (which was 
exactly the same value as the default). This time the compression=gzip 
property was correctly received.


It seems like a bug to me that one failed property in a stream prevents 
the rest from being applied. I should have used zfs inherit, but it 
would be best if zfs receive handled failures more gracefully, and 
attempted to set as many properties as possible.


Yes, that was fixed in build 128.


Thanks to Cindy and Tom for their help.


Glad to hear we identified the problem. Sorry for the trouble.

Tom
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread David Magda

On Apr 7, 2010, at 19:58, Edward Ned Harvey wrote:


From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
boun...@opensolaris.org] On Behalf Of David Magda

If you're going to go with (Open)Solaris, the OP may also want to  
look

into the multi-platform pkgsrc for third-party open source software:

http://www.pkgsrc.org/
http://en.wikipedia.org/wiki/Pkgsrc


Am I mistaken?  I thought pkgsrc was for netbsd.
For solaris/opensolaris, I would normally say opencsw or blastwave.   
(And in

some circumstances, sunfreeware.)


It was originally created by the NetBSD project (forked from FreeBSD's
ports), but like everything else they seem to do, it's multi-platform: BSDs,
Linux, Darwin/OS X, IRIX, AIX, Interix, QNX, HP-UX, and Solaris. AFAIK
you can do cross-compiling as well (i.e., use pkgsrc on Linux/AMD to
compile a package for IRIX/MIPS).


Pkgsrc currently has 9500 packages; Blastwave 4500; OpenCSW about 2300  
AFAICT; FreeBSD Ports, 21500. YMMV.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Tim Cook
On Wed, Apr 7, 2010 at 5:59 PM, Richard Elling richard.ell...@gmail.com wrote:

 On Apr 7, 2010, at 3:24 PM, Tim Cook wrote:
  On Wednesday, April 7, 2010, Jason S j.sin...@shaw.ca wrote:
  Since i already have Open Solaris installed on the box, i probably wont
 jump over to FreeBSD. However someone has suggested to me to look into
 www.nexenta.org and i must say it is quite interesting. Someone correct me
 if i am wrong but it looks like it is Open Solaris based and has basically
 everything i am looking for (NFS and CIFS sharing). I am downloading it
 right now and am going to install it on another machine to see if this GUI
 is easy enough to use.
 
  Does anyone have any experience or pointers with this NAS software?
  --
  This message posted from opensolaris.org
  _
 
 
  I wouldn't waste your time. My last go round lacp was completely
  broken for no apparent reason. The community is basically
  non-existent.

 [richard pinches himself... yep, still there :-)]

 NexentaStor version 3.0 is based on b134 so it has the same basic
 foundation
 as the yet-unreleased OpenSolaris 2010.next.  For an easy-to-use NAS box
 for the masses, it is much more friendly and usable than a basic
  OpenSolaris
 or Solaris 10 release.
  -- richard

 ZFS storage and performance consulting at http://www.RichardElling.com
 ZFS training on deduplication, NexentaStor, and NAS performance
 Las Vegas, April 29-30, 2010 http://nexenta-vegas.eventbrite.com



**Unless of course you were looking for any community support or basic LACP
functionality.

;)

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Freddie Cash
On Wed, Apr 7, 2010 at 4:58 PM, Edward Ned Harvey solar...@nedharvey.com wrote:

  From: zfs-discuss-boun...@opensolaris.org [mailto:zfs-discuss-
  boun...@opensolaris.org] On Behalf Of David Magda
 
  If you're going to go with (Open)Solaris, the OP may also want to look
  into the multi-platform pkgsrc for third-party open source software:
 
http://www.pkgsrc.org/
http://en.wikipedia.org/wiki/Pkgsrc

 Am I mistaken?  I thought pkgsrc was for netbsd.
 For solaris/opensolaris, I would normally say opencsw or blastwave.  (And
 in
 some circumstances, sunfreeware.)


pkgsrc is available for several Unix-like systems.  NetBSD is just the
origin of it, and the main development environment.  It's even available for
MacOSX, DragonFlyBSD, Linux distros, and more.
-- 
Freddie Cash
fjwc...@gmail.com
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS RaidZ recommendation

2010-04-07 Thread Daniel Carosone
Go with the 2x7 raidz2.  When you start to really run out of space,
replace the drives with bigger ones.  You will run out of space
eventually regardless; this way you can replace 7 at a time, not 14 at
a time.   With luck, each replacement will last you long enough that
the next replacement will come when the next generation of drive sizes
is at the price sweet-spot.
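
A rough sketch of the in-place grow (device names are placeholders; the
autoexpand pool property exists in recent builds):

# let the pool grow automatically once every disk in a vdev is larger
zpool set autoexpand=on tank

# swap one disk at a time for a larger one; wait for each resilver to finish
zpool replace tank c1t0d0 c4t0d0
zpool status tank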

--
Dan.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss