Re: [zfs-discuss] ATA UDMA data parity error

2008-01-22 Thread Kent Watsen

For the archive, I swapped the mobo and all is good now...  (I copied 
100GB into the pool without a crash)

One problem I had was that Solaris would hang on every boot - even with 
all the AOC-SAT2-MV8 cards pulled out. It turns out that switching the 
BIOS field "USB 2.0 Controller Mode" from HiSpeed to FullSpeed makes the 
difference - any ideas why?

Thanks,
Kent

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] problem with nfs share of zfs storage

2008-01-22 Thread Robert Milkowski
Hello Francois,

Monday, January 21, 2008, 9:51:22 PM, you wrote:

FD I have a need to stream video over nfs. video is stored on zfs. every 10
FD minutes or so, the video will freeze, and then 1 minute later it
FD resumes. This doesn't happen from an nfs mount on ufs. zfs server is a
FD 32 bit P4 box with 512MB, running nexenta in plain text mode, and
FD nothing else, really. Tried playback from different OSes and the same is
FD happening. Network has more than 10x the capacity that is required, no
FD compression on zfs

FD Any idea what is going on? cpu is not pegged on server or playback
FD client. Not sure what to look for.



Try running 'iostat -xnz 1' while you are streaming and catch the moment
you experience the problem.

Also try 'vmstat -p 1' at the same time and catch the same moment.
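
Something like this should do (just a sketch; the -T d option only adds a
timestamp to each interval so you can line the two outputs up with the stall):

  # run in two terminals while the video is streaming
  iostat -xnz -T d 1 > /var/tmp/iostat.out
  vmstat -p  -T d 1  > /var/tmp/vmstat.out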



-- 
Best regards,
 Robert                          mailto:[EMAIL PROTECTED]
   http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Ditto blocks in S10U4 ?

2008-01-22 Thread przemolicc
bash-3.00# cat /etc/release
Solaris 10 8/07 s10x_u4wos_12b X86
   Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
Use is subject to license terms.
Assembled 16 August 2007

(with all the latest patches)

bash-3.00# zpool list
NAME     SIZE    USED   AVAIL    CAP  HEALTH  ALTROOT
zpool1  20.8T   5.44G   20.8T     0%  ONLINE  -

bash-3.00# zpool upgrade -v
This system is currently running ZFS version 4.

The following versions are supported:

VER  DESCRIPTION
---  --------------------------------------------------------
 1   Initial ZFS version
 2   Ditto blocks (replicated metadata)
 3   Hot spares and double parity RAID-Z
 4   zpool history

For more information on a particular version, including supported
releases, see:

http://www.opensolaris.org/os/community/zfs/version/N

Where 'N' is the version number.

bash-3.00# zfs set copies=2 zpool1
cannot set property for 'zpool1': invalid property 'copies'

From http://www.opensolaris.org/os/community/zfs/version/2/
... This version includes support for Ditto Blocks, or replicated
metadata.

Can anybody shed any light on this?

Regards
przemol

-- 
http://przemol.blogspot.com/


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Ditto blocks in S10U4 ?

2008-01-22 Thread Tomas Ögren
On 22 January, 2008 - [EMAIL PROTECTED] sent me these 1,6K bytes:

 bash-3.00# cat /etc/release
 Solaris 10 8/07 s10x_u4wos_12b X86
Copyright 2007 Sun Microsystems, Inc.  All Rights Reserved.
 Use is subject to license terms.
 Assembled 16 August 2007
 
 (with all the latest patches)
 
 bash-3.00# zpool list
 NAME     SIZE    USED   AVAIL    CAP  HEALTH  ALTROOT
 zpool1  20.8T   5.44G   20.8T     0%  ONLINE  -
 
 bash-3.00# zpool upgrade -v
 This system is currently running ZFS version 4.
 
 The following versions are supported:
 
 VER  DESCRIPTION
 ---  
  1   Initial ZFS version
  2   Ditto blocks (replicated metadata)
  3   Hot spares and double parity RAID-Z
  4   zpool history
 
 For more information on a particular version, including supported
 releases, see:
 
 http://www.opensolaris.org/os/community/zfs/version/N
 
 Where 'N' is the version number.
 
 bash-3.00# zfs set copies=2 zpool1
 cannot set property for 'zpool1': invalid property 'copies'
 
 From http://www.opensolaris.org/os/community/zfs/version/2/
 ... This version includes support for Ditto Blocks, or replicated
 metadata.
 
 Can anybody shed any light on it ?

The 'copies' property in 'zfs set' is ditto blocks for data; the ditto
blocks in version 2 are for metadata only.
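
For data you'd use the 'copies' property on a release whose zfs(1M) knows
about it (apparently not the s10u4 bits you're running); note it only
affects blocks written after you set it, e.g.:

  zfs set copies=2 zpool1/somefs     # 'somefs' is just an example dataset name
  zfs get copies zpool1/somefs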

/Tomas
-- 
Tomas Ögren, [EMAIL PROTECTED], http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS vdev_cache

2008-01-22 Thread Manoj Nayak
Hi All,

Is there any dtrace script available to figure out the vdev_cache (or
software track buffer) reads in kilobytes?

The documentation says the default size of the read is 128k; however, the
vdev_cache source code says the default size is 64k.
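
Something like this one-liner is what I had in mind (a sketch only - it
assumes fbt probes exist for vdev_cache_read()/vdev_cache_fill() in
vdev_cache.c and that args[0] is the zio):

  dtrace -qn '
    fbt::vdev_cache_read:entry { @rq["requested read size (bytes)"] = quantize(args[0]->io_size); }
    fbt::vdev_cache_fill:entry { @fl["cache fill read size (bytes)"] = quantize(args[0]->io_size); }
    tick-30s { exit(0); }'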

Thanks
Manoj Nayak



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Sparc zfs root/boot status ?

2008-01-22 Thread Mauro Mozzarelli
Back in October/November 2007 when I asked about Sparc zfs boot and root 
capabilities, I got a reply indicating late December 2007 for a possible 
release.

I was wondering what the status is right now - will this feature make it into 
build 79?
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] ZFS vq_max_pending value ?

2008-01-22 Thread Manoj Nayak
Hi All.

The ZFS documentation says ZFS schedules its I/O in such a way that it
manages to saturate a single disk's bandwidth using enough concurrent 128K
I/Os. The number of concurrent I/Os is decided by vq_max_pending. The
default value for vq_max_pending is 35.

We have created a 4-disk raid-z group inside a ZFS pool on a Thumper. The
ZFS record size is set to 128k. When we read/write a 128K record, it issues
a 128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.

We need to saturate all three data disks' bandwidth in the raid-z group. Is
it required to set the vq_max_pending value to 35*3=105?
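
(If we do need to raise it, this is what I was planning to try - a sketch
only, assuming the zfs_vdev_max_pending tunable in this release's
vdev_queue.c is the knob behind vq_max_pending:)

  # live change, not persistent across a reboot
  echo 'zfs_vdev_max_pending/W0t105' | mdb -kw
  # persistent change, takes effect after a reboot
  echo 'set zfs:zfs_vdev_max_pending = 105' >> /etc/system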

Thanks
Manoj Nayak
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Updated ZFS Automatic Snapshot Service - version 0.10.

2008-01-22 Thread Tim Foster
Hi all,

I've got a slightly updated version of the ZFS Automatic Snapshot SMF
Service on my blog.

This version contains a few bugfixes (many thanks to Reid Spencer and
Breandan Dezendorf!) as well as a small new feature - by default we now
avoid taking snapshots for any datasets that are on a pool that's
currently being scrubbed or resilvered to avoid running into 6343667.

More at:
http://blogs.sun.com/timf/entry/zfs_automatic_snapshots_0_10


Is this service something that we'd like to put into OpenSolaris or are
there plans for something similar that achieves the same goal (and
perhaps integrates more neatly with the rest of ZFS) ?

Otherwise, should I start filling in an ARC one-pager template or is
this sort of utility something that's better left to sysadmins to
implement themselves, rather than baking it into the OS ?

cheers,
tim
-- 
Tim Foster, Sun Microsystems Inc, Solaris Engineering Ops
http://blogs.sun.com/timf

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Swap on ZVOL safe to use?

2008-01-22 Thread Darren J Moffat
Lori Alt wrote:
 The bug is being actively worked at this time (it just got a boost
 in urgency as a result of the issues it was causing for the
 zfs boot project).   It is likely that there will be a fix soon
 (sooner than zfs boot will be available).  In the
 meantime, I know of no workaround.  Maybe someone
 else does.

Is the fix to make it safe to swap on a ZVOL, or is it the introduction 
of the raw (non-COW) volumes mentioned previously?

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sparc zfs root/boot status ?

2008-01-22 Thread Darren J Moffat
Mauro Mozzarelli wrote:
 Back in October/November 2007 when I asked about Sparc zfs boot and root 
 capabilities, I got a reply indicating late December 2007 for a possible 
 release.
 
 I was wondering what is the status right now, will this feature make it into 
 build 79?

No, build 79 has long since closed, and SPARC ZFS boot isn't in it.

-- 
Darren J Moffat
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vdev_cache

2008-01-22 Thread Roch - PAE

Manoj Nayak writes:
  Hi All,
  
  If any dtrace script is available to figure out  the vdev_cache (or 
  software track buffer) reads  in kiloBytes ?
  
  The document says the default size of the read is 128k , However 
  vdev_cache source code implementation says the default size is 64k
  
  Thanks
  Manoj Nayak
  

Which document? It's 64K when it applies.
Nevada won't use the vdev_cache for data blocks anymore.

-r

  
  
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sparc zfs root/boot status ?

2008-01-22 Thread Lori Alt
zfs boot on sparc will not be putback on its own.
It will be putback with the rest of zfs boot support,
sometime around build 86.

Lori

Mauro Mozzarelli wrote:
 Back in October/November 2007 when I asked about Sparc zfs boot and root 
 capabilities, I got a reply indicating late December 2007 for a possible 
 release.

 I was wondering what is the status right now, will this feature make it into 
 build 79?
  
  
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
   

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] LowEnd Batt. backed raid controllers that will deal with ZFS commit semantics correctly?

2008-01-22 Thread Kyle McDonald
Are there - or does it make any sense to try to find - RAID cards with 
battery backup that will ignore the ZFS cache-flush (commit) commands while 
the battery is able to guarantee stable storage?

I don't know if they do this, but I've recently had good non-ZFS 
performance with the IBM ServeRAID 8k RAID controller that was in an 
xSeries server I was using. The 8k has 256MB of battery-backed cache.

The server it was in only had 6 drive bays, and I'm not looking to have 
it do RAID5 for ZFS, but I just had the idea:

 Hey, I wonder if I could set up the card with 5 (single drive) RAID 0 LUNs,
  and gain the advantage of the 256MB battery-backed cache, when I tell
  ZFS to do RAIDZ across them?

I know battery-backed cache and the proper commit semantics are 
generally found only on higher-end RAID controllers and arrays (right?), 
but I'm wondering now if I couldn't get an 8-port SATA controller that 
would let me map each single drive as a RAID 0 LUN and use its cache to 
boost performance.

My primary use case is NFS-based storage for a farm of software build 
servers and developer desktops.

Anyone searched for this already? Anyone found any reasons why it 
wouldn't work?

  -Kyle


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vq_max_pending value ?

2008-01-22 Thread Richard Elling
Manoj Nayak wrote:
 Hi All.

 ZFS document says ZFS schedules it's I/O in such way that it manages to 
 saturate a single disk bandwidth  using enough concurrent 128K I/O.
 The no of concurrent I/O is decided by vq_max_pending.The default value 
 for  vq_max_pending is 35.

 We have created 4-disk raid-z group inside ZFS pool on Thumper.ZFS 
 record size is set to 128k.When we read/write a 128K record ,it issue a
 128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.
   

Yes, this is how it works for a read without errors.  For a write, you
should see 4 writes, each 128KBytes/3.  Writes may also be
coalesced, so you may see larger physical writes.

 We need to saturate all three data disk bandwidth in the Raidz group. Is 
 it required to set vq_max_pending value to 35*3=105 ?
   

No.  vq_max_pending applies to each vdev.  Use iostat to see what
the device load is.  For the commonly used Hitachi 500 GByte disks
in a thumper, the read media bandwidth is 31-64.8 MBytes/s.  Writes
will be about 80% of reads, or 24.8-51.8 MBytes/s.  In a thumper,
the disk bandwidth will be the limiting factor for the hardware.
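
For example (actv is the average number of I/Os outstanding at the device -
the queue that vq_max_pending feeds - and %b is how busy the device is):

  iostat -xnz 5      # watch actv and %b per device while the workload runs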
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] LowEnd Batt. backed raid controllers that will deal with ZFS commit semantics correctly?

2008-01-22 Thread Albert Chin
On Tue, Jan 22, 2008 at 12:47:37PM -0500, Kyle McDonald wrote:
 
 My primary use case, is NFS base storage to a farm of software build 
 servers, and developer desktops.

For the above environment, you'll probably see a noticeable improvement
with a battery-backed NVRAM-based ZIL. Unfortunately, no inexpensive
cards exist for the common consumer (with ECC memory anyway). If you
convince http://www.micromemory.com/ to sell you one, let us know :)

Put "set zfs:zil_disable = 1" in /etc/system to gauge the type of
improvement you can expect. Don't use this in production, though.
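
That is (test systems only - with the ZIL disabled, an NFS client can lose
acknowledged writes if the server crashes):

  echo 'set zfs:zil_disable = 1' >> /etc/system
  # reboot, measure, then remove the line and reboot again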

-- 
albert chin ([EMAIL PROTECTED])
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] LowEnd Batt. backed raid controllers that will deal with ZFS commit semantics correctly?

2008-01-22 Thread Kyle McDonald
Albert Chin wrote:
 On Tue, Jan 22, 2008 at 12:47:37PM -0500, Kyle McDonald wrote:
   
 My primary use case, is NFS base storage to a farm of software build 
 servers, and developer desktops.
 

 For the above environment, you'll probably see a noticable improvement
 with a battery-backed NVRAM-based ZIL. Unfortunately, no inexpensive
 cards exist for the common consumer (with ECC memory anyways). If you
 convince http://www.micromemory.com/ to sell you one, let us know :)

   
I know, but for that card you need a driver to make it appear as a 
device, plus it would take a PCI slot.
I was hoping to make use of the battery-backed RAM on a RAID card that I 
already have (but can't use as intended, since I want to let ZFS do the 
redundancy). If I had a card with battery-backed RAM, how would I go about 
testing the commit semantics to see whether it only obeys ZFS cache flushes 
when the battery is bad?

Does anyone know if the IBM ServeRAID 7k or 8k do this correctly? If not, 
any chance of getting IBM to 'fix' the firmware? The Solaris Redbooks 
I've read seem to think highly of ZFS.

Back on the subject of NVRAM for ZIL devices, what are people using 
for ZIL devices on the budget-limited side of things?

I've found some SATA flash drives, and a bunch that are IDE. 
Unfortunately the HW I'd like to stick this in is a little older... it's 
got a U320 SCSI controller in it. Has anyone found a good U320 flash 
disk that's not overkill size-wise, and not outrageously expensive? 
Google found what appear to be a few OEM vendors, but no resellers for 
the quantity I'd be interested in.

Anyone using a USB Flash drive? Is USB fast enough to gain any benefits?

   -Kyle

 Set set zfs:zil_disable = 1 in /etc/system to gauge the type of
 improvement you can expect. Don't use this in production though.

   


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sparc zfs root/boot status ?

2008-01-22 Thread andrewk9
 zfs boot on sparc will not be putback on its own.
 It will be putback with the rest of zfs boot support,
 sometime around build 86.

Since we already have ZFS boot on x86, what else will be added in addition to 
ZFS boot for SPARC?

Thanks

Andrew.
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool attach problem

2008-01-22 Thread Rob
On a V240 running s10u4 (no additional patches), I had a pool which looked like 
this:

 # zpool status
   pool: pool01
  state: ONLINE
  scrub: none requested
 config:

 NAME                               STATE     READ WRITE CKSUM
 pool01                             ONLINE       0     0     0
   mirror                           ONLINE       0     0     0
     c8t600C0FF0082668310F838000d0  ONLINE       0     0     0
     c8t600C0FF007E4BE4C38F4ED00d0  ONLINE       0     0     0
   mirror                           ONLINE       0     0     0
     c8t600C0FF008266812A0877700d0  ONLINE       0     0     0
     c8t600C0FF007E4BE2BEDBC9600d0  ONLINE       0     0     0

 errors: No known data errors

Since this system is not in production yet, I wanted to do a little disk 
juggling as follows:

 # zpool detach pool01 c8t600C0FF007E4BE4C38F4ED00d0
 # zpool detach pool01 c8t600C0FF007E4BE2BEDBC9600d0

New pool status:

 # zpool status
   pool: pool01
  state: ONLINE
  scrub: none requested
 config:

 NAME                               STATE     READ WRITE CKSUM
 pool01                             ONLINE       0     0     0
   c8t600C0FF0082668310F838000d0    ONLINE       0     0     0
   c8t600C0FF008266812A0877700d0    ONLINE       0     0     0

 errors: No known data errors

Finally, I wanted to re-establish mirrors, but am seeing the following errors:

 # zpool attach pool01 c8t600C0FF008266812A0877700d0 c8t600C0FF007E4BE4C38F4ED00d0
 cannot attach c8t600C0FF007E4BE4C38F4ED00d0 to c8t600C0FF008266812A0877700d0: device is too small
 # zpool attach pool01 c8t600C0FF0082668310F838000d0 c8t600C0FF007E4BE2BEDBC9600d0
 cannot attach c8t600C0FF007E4BE2BEDBC9600d0 to c8t600C0FF0082668310F838000d0: device is too small

Is this expected behavior? The 'zpool' man page says:

If device is not currently part of a mirrored configuration,  device
automatically  transforms  into a two-way  mirror of device and new_device.

But, this isn't what I'm seeing . . . did I do something wrong?

Here's the format output for the disks:

    4. c8t600C0FF007E4BE2BEDBC9600d0 <SUN-StorEdge 3510-421F-545.91GB>
       /scsi_vhci/[EMAIL PROTECTED]
    5. c8t600C0FF007E4BE4C38F4ED00d0 <SUN-StorEdge 3510-421F-545.91GB>
       /scsi_vhci/[EMAIL PROTECTED]
    6. c8t600C0FF008266812A0877700d0 <SUN-StorEdge 3510-421F-545.91GB>
       /scsi_vhci/[EMAIL PROTECTED]
    7. c8t600C0FF0082668310F838000d0 <SUN-StorEdge 3510-421F-545.91GB>
       /scsi_vhci/[EMAIL PROTECTED]
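
For what it's worth, format rounds the sizes shown above, so I compared the
exact capacities of the failing pair like this (just checking the Size line
that iostat -En prints per device):

 # iostat -En c8t600C0FF008266812A0877700d0 c8t600C0FF007E4BE4C38F4ED00d0 | grep Size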

Rob
 
 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vq_max_pending value ?

2008-01-22 Thread manoj nayak

 Manoj Nayak wrote:
 Hi All.

 ZFS document says ZFS schedules it's I/O in such way that it manages to 
 saturate a single disk bandwidth  using enough concurrent 128K I/O.
 The no of concurrent I/O is decided by vq_max_pending.The default value 
 for  vq_max_pending is 35.

 We have created 4-disk raid-z group inside ZFS pool on Thumper.ZFS record 
 size is set to 128k.When we read/write a 128K record ,it issue a
 128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.


 Yes, this is how it works for a read without errors.  For a write, you
 should see 4 writes, each 128KBytes/3.  Writes may also be
 coalesced, so you may see larger physical writes.

 We need to saturate all three data disk bandwidth in the Raidz group. Is 
 it required to set vq_max_pending value to 35*3=105 ?


 No.  vq_max_pending applies to each vdev.

A 4-disk raidz group issues 128K/3 = 42.6K I/Os to each individual data disk. 
If 35 concurrent 128K I/Os are enough to saturate a disk (vdev), then 
35*3 = 105 concurrent 42K I/Os will be required to saturate the same disk.

Thanks
Manoj Nayak

Use iostat to see what
 the device load is.  For the commonly used Hitachi 500 GByte disks
 in a thumper, the read media bandwidth is 31-64.8 MBytes/s.  Writes
 will be about 80% of reads, or 24.8-51.8 MBytes/s.  In a thumper,
 the disk bandwidth will be the limiting factor for the hardware.
 -- richard

 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vdev_cache

2008-01-22 Thread manoj nayak

 Manoj Nayak writes:
  Hi All,
 
  If any dtrace script is available to figure out  the vdev_cache (or
  software track buffer) reads  in kiloBytes ?
 
  The document says the default size of the read is 128k , However
  vdev_cache source code implementation says the default size is 64k
 
  Thanks
  Manoj Nayak
 

 Which document ? It's 64K when it applies.
 Nevada won't use the vdev_cache for data block anymore.

How is readahead (the software track buffer) going to be used in Nevada 
without the vdev_cache? Any pointers to documents regarding that?

Thanks
Manoj Nayak


 -r

 
 
  ___
  zfs-discuss mailing list
  zfs-discuss@opensolaris.org
  http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

 

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vq_max_pending value ?

2008-01-22 Thread Richard Elling
manoj nayak wrote:

 Manoj Nayak wrote:
 Hi All.

 ZFS document says ZFS schedules it's I/O in such way that it manages 
 to saturate a single disk bandwidth  using enough concurrent 128K I/O.
 The no of concurrent I/O is decided by vq_max_pending.The default 
 value for  vq_max_pending is 35.

 We have created 4-disk raid-z group inside ZFS pool on Thumper.ZFS 
 record size is set to 128k.When we read/write a 128K record ,it issue a
 128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.


 Yes, this is how it works for a read without errors.  For a write, you
 should see 4 writes, each 128KBytes/3.  Writes may also be
 coalesced, so you may see larger physical writes.

 We need to saturate all three data disk bandwidth in the Raidz 
 group. Is it required to set vq_max_pending value to 35*3=105 ?


 No.  vq_max_pending applies to each vdev.

 4 disk raidz group issues 128k/3=42.6k io to each individual data 
 disk.If 35 concurrent 128k IO is enough to saturate a disk( vdev ) ,
 then 35*3=105 concurrent 42k io will be required to saturates the same 
 disk.

ZFS doesn't know anything about disk saturation.  It will send
up to vq_max_pending  I/O requests per vdev (usually a vdev is a
disk). It will try to keep vq_max_pending I/O requests queued to
the vdev.

For writes, you should see them become coalesced, so rather than
sending 3 42.6kByte write requests to a vdev, you might see one
128kByte write request.

In other words, ZFS has an I/O scheduler which is responsible
for sending I/O requests to vdevs.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sparc zfs root/boot status ?

2008-01-22 Thread David Magda
On Jan 22, 2008, at 18:24, Lori Alt wrote:

 ZFS boot supported by the installation software, plus
 support for having swap and dump be zvols within
 the root pool (i.e., no longer requiring a separate
 swap/dump slice), plus various other features, such
 as support for failsafe-archive booting.

Will there be any support for tying into patching / Live Upgrade with  
the ZFS boot putback, or is that a separate project?

Thanks for any info.
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] LowEnd Batt. backed raid controllers that will deal with ZFS commit semantics correctly?

2008-01-22 Thread Kyle McDonald
Carson Gaspar wrote:
 Kyle McDonald wrote:
 ...
   
 I know, but for a that card you need a driver to make it appear as a 
 device. Plus it would take a PCI slot.
 I was hoping to make use of the battery backed ram on a RAID card that I 
 already have (but can't use since I want to let ZFS do the redundancy.)  
 If I had a card with battery backed ram, how would I go about testing 
 the commit semantics to see if it is only obeying ZFS commits when the 
 battery is bad?
 

 Any _sane_ controller that supports battery backed cache will disable 
 its write cache if its battery goes bad. It should also log this. I'd 
 check the docs or contact your vendor's tech support to verify the card 
 you have is sane, and if it reports the error to its monitoring tools so 
 you find out about it quickly.
   
You're right, I forgot that. Not only would the commits need to happen 
right away, but the cache should be disabled completely.

Now that you mention it, I know from experience that for the ServeRAID 7k/8k 
controllers the cache is disabled if/when the battery fails. Good point.

Now I just need to determine whether a) the cache is used by the card even 
when using the disks on it as JBOD, or b) the card will allow me to 
make 5 or 6 RAID 0 LUNs with only 1 disk in each, to simulate (a) and 
activate the write cache.

Anyone know the answer to this? I'll be ordering 2 of the 7k's for my 
x346's this week. If neither (a) nor (b) will work, I'm not sure there's any 
advantage to using the 7k card, considering I want ZFS to do the mirroring.

If this all does work, it should speed up all writes to the disks, 
including the ZIL writes. Is there still an advantage to investigating a 
solid state disk or flash drive device to relocate the ZIL to?
 Now you'll probably _still_ need to disable the ZFS cache flushes, which 
 is a global option, so you'd need to make sure that _all_ your ZFS 
 devices had battery backed write caches or no write caches at all.

   
I guess this is a better solution than chasing down firmware authors to 
get them to ignore flush requests.
It's just too bad it's not settable on a pool-by-pool basis rather than 
server by server. It won't affect me, though - this will be the only pool on 
this machine.

   -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vq_max_pending value ?

2008-01-22 Thread Richard Elling
manoj nayak wrote:

 - Original Message - From: Richard Elling 
 [EMAIL PROTECTED]
 To: manoj nayak [EMAIL PROTECTED]
 Cc: zfs-discuss@opensolaris.org
 Sent: Wednesday, January 23, 2008 7:20 AM
 Subject: Re: [zfs-discuss] ZFS vq_max_pending value ?


 manoj nayak wrote:

 Manoj Nayak wrote:
 Hi All.

 ZFS document says ZFS schedules it's I/O in such way that it 
 manages to saturate a single disk bandwidth  using enough 
 concurrent 128K I/O.
 The no of concurrent I/O is decided by vq_max_pending.The default 
 value for  vq_max_pending is 35.

 We have created 4-disk raid-z group inside ZFS pool on Thumper.ZFS 
 record size is set to 128k.When we read/write a 128K record ,it 
 issue a
 128K/3 I/O to each of the 3 data disks in the 4-disk raid-z group.


 Yes, this is how it works for a read without errors.  For a write, you
 should see 4 writes, each 128KBytes/3.  Writes may also be
 coalesced, so you may see larger physical writes.

 We need to saturate all three data disk bandwidth in the Raidz 
 group. Is it required to set vq_max_pending value to 35*3=105 ?


 No.  vq_max_pending applies to each vdev.

 4 disk raidz group issues 128k/3=42.6k io to each individual data 
 disk.If 35 concurrent 128k IO is enough to saturate a disk( vdev ) ,
 then 35*3=105 concurrent 42k io will be required to saturates the 
 same disk.

 ZFS doesn't know anything about disk saturation.  It will send
 up to vq_max_pending  I/O requests per vdev (usually a vdev is a
 disk). It will try to keep vq_max_pending I/O requests queued to
 the vdev.

 I can see the avg pending I/Os hitting my vq_max_pending limit, so 
 raising the limit would be a good thing. I think it's due to the 
 many 42k read I/Os to each individual disk in the 4-disk raidz group.

You're dealing with a queue here.  iostat's average pending I/Os represents
the queue depth.   Some devices can't handle a large queue.  In any
case, queuing theory applies.

Note that for reads, the disk will likely have a track cache, so it is
not a good assumption that a read I/O will require a media access.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] LowEnd Batt. backed raid controllers that will deal with ZFS commit semantics correctly?

2008-01-22 Thread Kyle McDonald
Albert Chin wrote:
 On Tue, Jan 22, 2008 at 09:20:30PM -0500, Kyle McDonald wrote:
   
 Anyone know the answer to this? I'll be ordering 2 of the 7K's for
 my x346's this week. If niether A nor B will work I'm not sure
 there's any advantage to using the 7k card considering I want ZFS to
 do the mirroring.
 

 Why even bother with a H/W RAID array when you won't use the H/W RAID?
 Better to find a decent SAS/FC JBOD with cache. Would definitely be
 cheaper.

   
I've never heard of such a thing. Do you have any links (cheap or not)?

Do they exist for less than $350? That's what the 7k will run me.
Do they include an enclosure for at least 6 disks? The 7k will use the 6 
U320 hot-swap bays already in my IBM x346 chassis.

I'm not being sarcastic; if something better exists, even for a little 
more, I'm interested. I'd especially love to switch to SATA, as I'm about 
to pay about $550 each for 300GB U320 drives, and with SATA I could go 
bigger, save money, or both. :)

   -Kyle

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS vq_max_pending value ?

2008-01-22 Thread Manoj Nayak

 4 disk raidz group issues 128k/3=42.6k io to each individual data 
 disk.If 35 concurrent 128k IO is enough to saturate a disk( vdev ) ,
 then 35*3=105 concurrent 42k io will be required to saturates the 
 same disk.

 ZFS doesn't know anything about disk saturation.  It will send
 up to vq_max_pending  I/O requests per vdev (usually a vdev is a
 disk). It will try to keep vq_max_pending I/O requests queued to
 the vdev.

 I can see the avg pending I/Os hitting my  vq_max_pending limit, 
 then raising the limit would be a good thing. I think , it's due to
 many 42k Read IO to individual disk in the 4 disk raidz group.

 You're dealing with a queue here.  iostat's average pending I/Os 
 represents
 the queue depth.   Some devices can't handle a large queue.  In any
 case, queuing theory applies.

 Note that for reads, the disk will likely have a track cache, so it is
 not a good assumption that a read I/O will require a media access.
My workload issues around 5000 MB of read I/O; iopattern says around 55% 
of the I/Os are random in nature.
I don't know how much prefetching through the track cache is going to help 
here. Probably I can try disabling the vdev_cache by setting 
'zfs_vdev_cache_max' to 1.
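
i.e. something like this (a sketch, assuming the zfs_vdev_cache_max tunable
from vdev_cache.c - reads larger than that value bypass the vdev_cache, so
1 effectively disables it):

  # live change, not persistent across a reboot
  echo 'zfs_vdev_cache_max/W0t1' | mdb -kw
  # persistent change, takes effect after a reboot
  echo 'set zfs:zfs_vdev_cache_max = 1' >> /etc/system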

Thanks
Manoj Nayak
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss