Re: [zfs-discuss] zpool with very different sized vdevs?

2009-10-24 Thread Orvar Korvar
You could add these new drives to your zpool: create a new vdev from them as a 
raidz1 or raidz2 vdev, and then add that vdev to your zpool. I suggest raidz2, 
because it gives you greater reliability.
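
For reference, a minimal sketch of those commands (the pool name tank and the 
c2tXd0 device names below are made up, so adjust them to your own system, and 
remember that adding a vdev is permanent):

# zpool add tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0
# zpool status tank

zpool status should then show the new raidz2 group striped alongside your 
existing vdev.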

However, you cannot remove a vdev. Say that in the future you have swapped your 
original drives up to 4TB; those small drives will then only be a nuisance 
(unless you swap them for larger drives too), and since a vdev cannot be 
removed, they will just sit there consuming power and making noise. Therefore I 
myself would not add them to your zpool. Instead I would create a second zpool 
out of the small drives; that second pool you could destroy later if you wish.

I am building a zpool with 8 x 1TB drives. In the future, when I need more 
capacity, I will just swap them for 2TB or 4TB drives instead of adding lots 
and lots of small drives, which would only give me a headache later when 4TB 
drives are common.
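
For reference, the swap-in-place upgrade looks roughly like this (the pool name 
and device names are made up, and each replace must finish resilvering before 
you start the next one):

# zpool replace tank c1t0d0 c2t0d0
# zpool status tank

Once every drive in the vdev has been replaced with a larger one, the extra 
capacity becomes available; depending on your build you may first need to set 
the pool's autoexpand property, or export and re-import the pool.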
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] bewailing of the n00b

2009-10-24 Thread Orvar Korvar
You don't need a hardware RAID card with ZFS; ZFS prefers to manage the disks 
itself. The best solution is to ditch the hardware RAID card.

I strongly advise you to use raidz2 (similar to RAID-6). If you use raidz1 
(similar to RAID-5) and a drive fails, you have to swap that disk and resilver 
the array, which puts a lot of stress on your other drives. Resilvering a 1TB 
drive can take 6 hours; resilvering a 4TB drive could take a week. During that 
time, if another drive fails, your data is lost. Rebuilds take a long time 
because transfer speeds are slow compared to storage capacity.

If you have lots of disks, it is advisable to create several groups (vdevs) and 
make one large zpool out of the groups. Each group should be a raidz1 or 
raidz2; I prefer raidz2. You can add a new group later if you wish, but you 
cannot remove a group.

But it is just as easy to swap the drives for larger ones. Say you have 1TB 
drives in your zpool; you can then replace one drive at a time with 2TB drives, 
and your zpool grows in capacity. That is why I have a zpool of 8 x 1TB drives: 
if I need more capacity, I just swap them for 2TB or 4TB drives, instead of 
adding more and more small drives that consume power and make noise. In the 
future, one drive will hold 10TB, and then I would regret having lots of small 
drives. Therefore, I would make a zpool of 8 disks configured as one raidz2, 
add no more drives, and just keep swapping the drives for larger and larger 
ones. If I had 16 drives, I would make two raidz2 groups; a raidz2 group should 
consist of 6-8 drives. It is better to have several smaller groups in your 
zpool than one large group, because each additional group increases 
performance. (I am not sure about this, but roughly: one group gives you the 
random-I/O performance of ONE drive. If you have 24 disks in one group, they 
will be as slow as one drive in terms of IOPS, i.e. head movement and latency. 
If you have those disks in 3 groups of 8, they will be as fast as three drives 
in terms of IOPS.)

If you mix disk sizes within a single group, every disk only counts for the 
size of the smallest one. Say you have 500GB and 1TB drives: if you put them 
all in one group, each 1TB drive will effectively contribute only 500GB. The 
proper way is to make two separate groups (each one raidz1/raidz2), one per 
disk size, and add both to your zpool.
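
A rough sketch of that layout, assuming four 500GB disks and four 1TB disks 
(all device names below are made up, and with only four disks per group you 
might prefer raidz1; the syntax is the same):

# zpool create tank raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 \
                    raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0

Each group is then sized by its own smallest member, and ZFS stripes writes 
across both groups.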

Lastly, you must read ALL of these articles:
http://breden.org.uk/2008/03/02/a-home-fileserver-using-zfs/
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] bewailing of the n00b

2009-10-24 Thread Orvar Korvar
Also, ZFS likes 64-bit CPUs. I had a 32-bit P4 and 1GB RAM; it worked fine, but 
I only got 20-30MB/sec. A 64-bit CPU and 2-3GB of RAM gives you over 100MB/sec.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Markus Kovero
How do you estimate the needed queue depth if you have, say, 64 to 128 disks 
sitting behind an LSI controller?
Is it a bad idea to run with a queue depth of 1?

Yours
Markus Kovero


From: zfs-discuss-boun...@opensolaris.org 
[zfs-discuss-boun...@opensolaris.org] on behalf of Richard Elling 
[richard.ell...@gmail.com]
Sent: 24 October 2009 7:36
To: Adam Cheal
Cc: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] SNV_125 MPT warning in logfile

ok, see below...

On Oct 23, 2009, at 8:14 PM, Adam Cheal wrote:

 Here is example of the pool config we use:

 # zpool status
  pool: pool002
 state: ONLINE
 scrub: scrub stopped after 0h1m with 0 errors on Fri Oct 23 23:07:52
 2009
 config:

NAME STATE READ WRITE CKSUM
pool002  ONLINE   0 0 0
  raidz2 ONLINE   0 0 0
c9t18d0  ONLINE   0 0 0
c9t17d0  ONLINE   0 0 0
c9t55d0  ONLINE   0 0 0
c9t13d0  ONLINE   0 0 0
c9t15d0  ONLINE   0 0 0
c9t16d0  ONLINE   0 0 0
c9t11d0  ONLINE   0 0 0
c9t12d0  ONLINE   0 0 0
c9t14d0  ONLINE   0 0 0
c9t9d0   ONLINE   0 0 0
c9t8d0   ONLINE   0 0 0
c9t10d0  ONLINE   0 0 0
c9t29d0  ONLINE   0 0 0
c9t28d0  ONLINE   0 0 0
c9t27d0  ONLINE   0 0 0
c9t23d0  ONLINE   0 0 0
c9t25d0  ONLINE   0 0 0
c9t26d0  ONLINE   0 0 0
c9t21d0  ONLINE   0 0 0
c9t22d0  ONLINE   0 0 0
c9t24d0  ONLINE   0 0 0
c9t19d0  ONLINE   0 0 0
  raidz2 ONLINE   0 0 0
c9t30d0  ONLINE   0 0 0
c9t31d0  ONLINE   0 0 0
c9t32d0  ONLINE   0 0 0
c9t33d0  ONLINE   0 0 0
c9t34d0  ONLINE   0 0 0
c9t35d0  ONLINE   0 0 0
c9t36d0  ONLINE   0 0 0
c9t37d0  ONLINE   0 0 0
c9t38d0  ONLINE   0 0 0
c9t39d0  ONLINE   0 0 0
c9t40d0  ONLINE   0 0 0
c9t41d0  ONLINE   0 0 0
c9t42d0  ONLINE   0 0 0
c9t44d0  ONLINE   0 0 0
c9t45d0  ONLINE   0 0 0
c9t46d0  ONLINE   0 0 0
c9t47d0  ONLINE   0 0 0
c9t48d0  ONLINE   0 0 0
c9t49d0  ONLINE   0 0 0
c9t50d0  ONLINE   0 0 0
c9t51d0  ONLINE   0 0 0
c9t52d0  ONLINE   0 0 0
cache
  c8t2d0 ONLINE   0 0 0
  c8t3d0 ONLINE   0 0 0
spares
  c9t20d0AVAIL
  c9t43d0AVAIL

 errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
 config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c8t0d0s0  ONLINE   0 0 0
c8t1d0s0  ONLINE   0 0 0

 errors: No known data errors

 ...and here is a snapshot of the system using iostat -indexC 5
 during a scrub of pool002 (c8 is onboard AHCI controller, c9 is
 LSI SAS 3801E):

                    extended device statistics              ---- errors ---
    r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
    0.00.00.00.0  0.0  0.00.00.0   0   0   0   0   0   0 c8
    0.00.00.00.0  0.0  0.00.00.0   0   0   0   0   0   0 c8t0d0
    0.00.00.00.0  0.0  0.00.00.0   0   0   0   0   0   0 c8t1d0
    0.00.00.00.0  0.0  0.00.00.0   0   0   0   0   0   0 c8t2d0
    0.00.00.00.0  0.0  0.00.00.0   0   0   0   0   0   0 c8t3d0
 8738.70.0 555346.10.0  0.1 345.00.0   39.5   0 3875   0   1   1   2 c9

You see 345 entries in the active queue. If the controller rolls over at
511 active entries, then it would explain why it would soon begin to
have difficulty.

Meanwhile, it is providing 8,738 IOPS and 555 MB/sec, which is quite
respectable.

  194.80.0 11936.90.0  0.0  7.90.0   40.3   0  87   0   0   0   0 c9t8d0

These disks are doing almost 200 read IOPS, but are not 100% busy.
Average I/O size is 66 KB, which is not bad, lots of little I/Os could be
worse, but at only 11.9 MB/s, you are not near the media bandwidth.
Average service time is 40.3 milliseconds, which is not 

Re: [zfs-discuss] new google group for ZFS on OSX

2009-10-24 Thread Craig Morgan
Gruber (http://daringfireball.net/linked/2009/10/23/zfs) is normally  
well-informed and has some feedback... seems possible that legal canned it.


--Craig

On 23 Oct 2009, at 20:42, Tim Cook wrote:




On Fri, Oct 23, 2009 at 2:38 PM, Richard Elling richard.ell...@gmail.com 
 wrote:

FYI,
The ZFS project on MacOS forge (zfs.macosforge.org) has provided the
following announcement:

   ZFS Project Shutdown (2009-10-23)
   The ZFS project has been discontinued. The mailing list and repository
   will also be removed shortly.

The community is migrating to a new google group:
  http://groups.google.com/group/zfs-macos

-- richard


Any official word from Apple on the abandonment?

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


--
Craig Morgan
Cinnabar Solutions Ltd

t: +44 (0)791 338 3190
f: +44 (0)870 705 1726
e: cr...@cinnabar-solutions.com
w: www.cinnabar-solutions.com









___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Adam Cheal
The iostat I posted previously was from a system we had already tuned the 
zfs:zfs_vdev_max_pending depth down to 10 (as visible by the max of about 10 in 
actv per disk).
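
(For reference, that tuning is just a line in /etc/system, e.g.

set zfs:zfs_vdev_max_pending = 10

followed by a reboot for it to take effect.)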

I reset this value in /etc/system to 7, rebooted, and started a scrub. iostat 
output showed busier disks (%b is higher, which seemed odd) but a cap of about 
7 queue items per disk, proving the tuning was effective. iostat at a 
high-water mark during the test looked like this:

extended device statistics  
r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.00.00.00.0  0.0  0.00.00.0   0   0 c8
0.00.00.00.0  0.0  0.00.00.0   0   0 c8t0d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c8t1d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c8t2d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c8t3d0
 8344.50.0 359640.40.0  0.1 300.50.0   36.0   0 4362 c9
  190.00.0 6800.40.0  0.0  6.60.0   34.8   0  99 c9t8d0
  185.00.0 6917.10.0  0.0  6.10.0   32.9   0  94 c9t9d0
  187.00.0 6640.90.0  0.0  6.50.0   34.6   0  98 c9t10d0
  186.50.0 6543.40.0  0.0  7.00.0   37.5   0 100 c9t11d0
  180.50.0 7203.10.0  0.0  6.70.0   37.2   0 100 c9t12d0
  195.50.0 7352.40.0  0.0  7.00.0   35.8   0 100 c9t13d0
  188.00.0 6884.90.0  0.0  6.60.0   35.2   0  99 c9t14d0
  204.00.0 6990.10.0  0.0  7.00.0   34.3   0 100 c9t15d0
  199.00.0 7336.70.0  0.0  7.00.0   35.2   0 100 c9t16d0
  180.50.0 6837.90.0  0.0  7.00.0   38.8   0 100 c9t17d0
  198.00.0 7668.90.0  0.0  7.00.0   35.3   0 100 c9t18d0
  203.00.0 7983.20.0  0.0  7.00.0   34.5   0 100 c9t19d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c9t20d0
  195.50.0 7096.40.0  0.0  6.70.0   34.1   0  98 c9t21d0
  189.50.0 7757.20.0  0.0  6.40.0   33.9   0  97 c9t22d0
  195.50.0 7645.90.0  0.0  6.60.0   33.8   0  99 c9t23d0
  194.50.0 7925.90.0  0.0  7.00.0   36.0   0 100 c9t24d0
  188.50.0 6725.60.0  0.0  6.20.0   32.8   0  94 c9t25d0
  188.50.0 7199.60.0  0.0  6.50.0   34.6   0  98 c9t26d0
  196.00.0 .90.0  0.0  6.30.0   32.1   0  95 c9t27d0
  193.50.0 7455.40.0  0.0  6.20.0   32.0   0  95 c9t28d0
  189.00.0 7400.90.0  0.0  6.30.0   33.2   0  96 c9t29d0
  182.50.0 9397.00.0  0.0  7.00.0   38.3   0 100 c9t30d0
  192.50.0 9179.50.0  0.0  7.00.0   36.3   0 100 c9t31d0
  189.50.0 9431.80.0  0.0  7.00.0   36.9   0 100 c9t32d0
  187.50.0 9082.00.0  0.0  7.00.0   37.3   0 100 c9t33d0
  188.50.0 9368.80.0  0.0  7.00.0   37.1   0 100 c9t34d0
  180.50.0 9332.80.0  0.0  7.00.0   38.8   0 100 c9t35d0
  183.00.0 9690.30.0  0.0  7.00.0   38.2   0 100 c9t36d0
  186.00.0 9193.80.0  0.0  7.00.0   37.6   0 100 c9t37d0
  180.50.0 8233.40.0  0.0  7.00.0   38.8   0 100 c9t38d0
  175.50.0 9085.20.0  0.0  7.00.0   39.9   0 100 c9t39d0
  177.00.0 9340.00.0  0.0  7.00.0   39.5   0 100 c9t40d0
  175.50.0 8831.00.0  0.0  7.00.0   39.9   0 100 c9t41d0
  190.50.0 9177.80.0  0.0  7.00.0   36.7   0 100 c9t42d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c9t43d0
  196.00.0 9180.50.0  0.0  7.00.0   35.7   0 100 c9t44d0
  193.50.0 9496.80.0  0.0  7.00.0   36.2   0 100 c9t45d0
  187.00.0 8699.50.0  0.0  7.00.0   37.4   0 100 c9t46d0
  198.50.0 9277.00.0  0.0  7.00.0   35.2   0 100 c9t47d0
  185.50.0 9778.30.0  0.0  7.00.0   37.7   0 100 c9t48d0
  192.00.0 8384.20.0  0.0  7.00.0   36.4   0 100 c9t49d0
  198.50.0 8864.70.0  0.0  7.00.0   35.2   0 100 c9t50d0
  192.00.0 9369.80.0  0.0  7.00.0   36.4   0 100 c9t51d0
  182.50.0 8825.70.0  0.0  7.00.0   38.3   0 100 c9t52d0
  202.00.0 7387.90.0  0.0  7.00.0   34.6   0 100 c9t55d0

...and sure enough about 20 minutes into it I get this (bus reset?):

scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@34,0 (sd49):
   incomplete read- retrying
scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@21,0 (sd30):
   incomplete read- retrying
scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@1e,0 (sd27):
   incomplete read- retrying
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   Rev. 8 LSI, Inc. 1068E found.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   mpt0 supports power management.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   mpt0: IOC Operational.

During the bus reset, iostat output 

[zfs-discuss] Zpool issue question

2009-10-24 Thread Karim Akhssassi

Hi,

I'm Karim from Solaris software support, and I need your help with the 
following issue:


Why is the ZFS filesystem full while its zpool has 3.11 GB available?

zfs list -t filesystem | egrep "db-smp|NAME"
NAME USED AVAIL REFER MOUNTPOINT
db-smp.zpool 196G 0 1K legacy
db-smp.zpool/db-smp.zfs 180G 0 70.8G 
/opt/quetzal.zone/data/quetzal.zone/root/opt/db-smp
db-smp.zpool/oraarch.zfs 15.8G 0 15.8G 
/opt/quetzal.zone/data/quetzal.zone/root/opt/db-smp/oracle/SMP/oraarch/

zpool list | egrep "db-smp|NAME"
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
db-smp.zpool 199G 196G 3.11G 98% ONLINE 
/opt/quetzal.zone/data/quetzal.zone/root


I found http://www.opensolaris.org/os/community/zfs/faq/#zfsspace

---
1) Why doesn't the space that is reported by the zpool list command and 
the zfs list command match?
The available space that is reported by the zpool list command is the 
amount of physical disk space. The zfs list command lists the usable 
space that is available to file systems, which is disk space minus ZFS 
redundancy metadata overhead, if any.

**

If this doc can explain the above issue, how can I know (calculate) the 
ZFS redundancy metadata overhead? How can I tell whether it exists?

Otherwise, how can we fix the above problem?
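
(For reference, a few commands that should help narrow this down, using the 
pool and dataset names above:

# zpool status db-smp.zpool
# zfs list -r -t filesystem,snapshot db-smp.zpool
# zfs get -r reservation,quota db-smp.zpool

The first shows whether the pool has mirror or raidz redundancy, which is the 
main reason zpool list and zfs list disagree; the second includes snapshots, 
which consume space but are easy to overlook; the third shows reservations and 
quotas, which can also pin space. In addition, ZFS keeps a small slice of the 
pool in reserve for its own use, so a pool will normally show a few GB 
available in zpool list even when its filesystems report 0 AVAIL.)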

I appreciate your help

regards

A.Karim


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Apple cans ZFS project

2009-10-24 Thread Alex Blewitt
Apple has finally canned [1] the ZFS port [2]. To try and keep momentum up and 
continue to use the best filing system available, a group of fans have set up a 
continuation project and mailing list [3,4].

If anyone's interested in joining in to help, please join the mailing list.

[1] http://alblue.blogspot.com/2009/10/apple-finally-kill-off-zfs.html
[2] http://zfs.macosforge.org
[3] http://code.google.com/p/maczfs/
[4] http://groups.google.com/group/zfs-macos
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Markus Kovero
We actually hit similar issues with LSI, but under normal workload rather than 
during a scrub; the result is the same, but it seems to choke on writes rather 
than reads, with suboptimal performance.
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6891413

Anyway, we haven't experienced this _at all_ with the RE3 version of Western 
Digital disks. Issues seem to pop up with 750GB Seagate and 1TB WD Black-series 
drives; so far the 2TB green WDs seem unaffected too, so might it be related to 
the disks' firmware and how they talk to the LSI?

Also, we noticed more severe timeouts (even with RE3 and 2TB WD green) if the 
disks are not forced into SATA1 mode. I believe this is a known issue with 
newer 2TB disks and some other disk controllers, and may be caused by bad 
cabling or connectivity.

We have also never witnessed this behaviour with SAS disks (Fujitsu, IBM, ...). 
All of this happens with snv 118, 122, 123 and 125.

Yours
Markus Kovero


From: zfs-discuss-boun...@opensolaris.org 
[zfs-discuss-boun...@opensolaris.org] on behalf of Adam Cheal 
[ach...@pnimedia.com]
Sent: 24 October 2009 12:49
To: zfs-discuss@opensolaris.org
Subject: Re: [zfs-discuss] SNV_125 MPT warning in logfile

The iostat I posted previously was from a system we had already tuned the 
zfs:zfs_vdev_max_pending depth down to 10 (as visible by the max of about 10 in 
actv per disk).

I reset this value in /etc/system to 7, rebooted, and started a scrub. iostat 
output showed busier disks (%b is higher, which seemed odd) but a cap of about 
7 queue items per disk, proving the tuning was effective. iostat at a 
high-water mark during the test looked like this:

extended device statistics
r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
0.00.00.00.0  0.0  0.00.00.0   0   0 c8
0.00.00.00.0  0.0  0.00.00.0   0   0 c8t0d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c8t1d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c8t2d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c8t3d0
 8344.50.0 359640.40.0  0.1 300.50.0   36.0   0 4362 c9
  190.00.0 6800.40.0  0.0  6.60.0   34.8   0  99 c9t8d0
  185.00.0 6917.10.0  0.0  6.10.0   32.9   0  94 c9t9d0
  187.00.0 6640.90.0  0.0  6.50.0   34.6   0  98 c9t10d0
  186.50.0 6543.40.0  0.0  7.00.0   37.5   0 100 c9t11d0
  180.50.0 7203.10.0  0.0  6.70.0   37.2   0 100 c9t12d0
  195.50.0 7352.40.0  0.0  7.00.0   35.8   0 100 c9t13d0
  188.00.0 6884.90.0  0.0  6.60.0   35.2   0  99 c9t14d0
  204.00.0 6990.10.0  0.0  7.00.0   34.3   0 100 c9t15d0
  199.00.0 7336.70.0  0.0  7.00.0   35.2   0 100 c9t16d0
  180.50.0 6837.90.0  0.0  7.00.0   38.8   0 100 c9t17d0
  198.00.0 7668.90.0  0.0  7.00.0   35.3   0 100 c9t18d0
  203.00.0 7983.20.0  0.0  7.00.0   34.5   0 100 c9t19d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c9t20d0
  195.50.0 7096.40.0  0.0  6.70.0   34.1   0  98 c9t21d0
  189.50.0 7757.20.0  0.0  6.40.0   33.9   0  97 c9t22d0
  195.50.0 7645.90.0  0.0  6.60.0   33.8   0  99 c9t23d0
  194.50.0 7925.90.0  0.0  7.00.0   36.0   0 100 c9t24d0
  188.50.0 6725.60.0  0.0  6.20.0   32.8   0  94 c9t25d0
  188.50.0 7199.60.0  0.0  6.50.0   34.6   0  98 c9t26d0
  196.00.0 .90.0  0.0  6.30.0   32.1   0  95 c9t27d0
  193.50.0 7455.40.0  0.0  6.20.0   32.0   0  95 c9t28d0
  189.00.0 7400.90.0  0.0  6.30.0   33.2   0  96 c9t29d0
  182.50.0 9397.00.0  0.0  7.00.0   38.3   0 100 c9t30d0
  192.50.0 9179.50.0  0.0  7.00.0   36.3   0 100 c9t31d0
  189.50.0 9431.80.0  0.0  7.00.0   36.9   0 100 c9t32d0
  187.50.0 9082.00.0  0.0  7.00.0   37.3   0 100 c9t33d0
  188.50.0 9368.80.0  0.0  7.00.0   37.1   0 100 c9t34d0
  180.50.0 9332.80.0  0.0  7.00.0   38.8   0 100 c9t35d0
  183.00.0 9690.30.0  0.0  7.00.0   38.2   0 100 c9t36d0
  186.00.0 9193.80.0  0.0  7.00.0   37.6   0 100 c9t37d0
  180.50.0 8233.40.0  0.0  7.00.0   38.8   0 100 c9t38d0
  175.50.0 9085.20.0  0.0  7.00.0   39.9   0 100 c9t39d0
  177.00.0 9340.00.0  0.0  7.00.0   39.5   0 100 c9t40d0
  175.50.0 8831.00.0  0.0  7.00.0   39.9   0 100 c9t41d0
  190.50.0 9177.80.0  0.0  7.00.0   36.7   0 100 c9t42d0
0.00.00.00.0  0.0  0.00.00.0   0   0 c9t43d0
  196.00.0 9180.50.0  0.0  7.00.0   35.7   0 100 c9t44d0
  193.50.0 9496.80.0  0.0  7.00.0   36.2   0 100 c9t45d0
  187.00.0 8699.50.0  0.0  7.00.0   37.4   0 100 c9t46d0
  198.50.0 9277.00.0  0.0  7.00.0   35.2   0 100 c9t47d0
  185.50.0 9778.30.0  0.0  7.00.0   

Re: [zfs-discuss] zpool with very different sized vdevs?

2009-10-24 Thread David Turnbull

On 23/10/2009, at 9:39 AM, Travis Tabbal wrote:

I have a new array of 4x1.5TB drives running fine. I also have the  
old array of 4x400GB drives in the box on a separate pool for  
testing. I was planning to have the old drives just be a backup file  
store, so I could keep snapshots and such over there for important  
files.


I was wondering if it makes any sense to add the older drives to the  
new pool. Reliability might be lower as they are older drives, so if  
I were to lose 2 of them, things could get ugly. I'm just curious  
if it would make any sense to do something like this.


Makes sense to me. My current upgrade strategy is to add groups of 5  
disks whenever space is needed, up until physical space is exhausted,  
each time getting the current best $/GB disks.
This will result, at times, in having significant amounts of data on  
relatively few disks though, impacting performance.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Dumb idea?

2009-10-24 Thread Orvar Korvar
Would this be possible to implement on top of ZFS? Maybe it is a dumb idea, I 
don't know. What do you think, and how could it be improved?

Assume all files are put in the zpool, helter-skelter. Then you can create 
arbitrary filters that show you only the files you want to see.

As it is now, you have files in one directory structure, which makes the 
organization of the files hardcoded: you have /Movies/Action and that is it. 
But if you had all the movies in one large zpool, and you could 
programmatically define different structures that act as filters, you could 
have several directory structures over the same data.

Programmatically defined directory structure 1, acting on the zpool:
/Movies/Action

Programmatically defined directory structure 2:
/Movies/Actors/AlPacino

etc.

Maybe this is what MS WinFS was about? Maybe tag the files? Maybe a relational 
database on top of ZFS? Maybe no directories at all? I don't know, I'm just 
brainstorming. Is this a dumb idea? Or an old idea?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Apple shuts down open source ZFS project

2009-10-24 Thread Gary Gendel
Apple is known to strong-arm in licensing negotiations.  I'd really like to 
hear the straight talk about what transpired.

That's OK; it just means that I won't be using a Mac as a server.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Apple cans ZFS project

2009-10-24 Thread Joerg Schilling
Alex Blewitt alex.blew...@gmail.com wrote:

 Apple has finally canned [1] the ZFS port [2]. To try and keep momentum up 
 and continue to use the best filing system available, a group of fans have 
 set up a continuation project and mailing list [3,4].

The article that was mentioned a few hours ago did mention
licensing problems without giving any kind of evidence for 
this claim. If there is evidence, I would be interested in
knowing the background, otherwise it looks to me like FUD.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Change physical path to a zpool.

2009-10-24 Thread Jürgen Keil
 I have a functional OpenSolaris x64 system on which I need to physically
 move the boot disk, meaning its physical device path will change and
 probably its cXdX name.
 
 When I do this the system fails to boot
...
 How do I inform ZFS of the new path?
...
 Do I need to boot from the LiveCD and then import the
 pool from its new path?

Exactly.

Boot from the livecd with the disk connected on the
new physical path, and run pfexec zpool import -f rpool,
followed by a reboot.

That'll update the zpool's label with the new physical
device path information.
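
In other words, something like this from the live CD (rpool being the usual 
root pool name):

# pfexec zpool import -f rpool
# zpool status rpool
# pfexec reboot

The zpool status step is just to confirm that the disk now shows up under its 
new cXtXdX name before rebooting.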
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Apple cans ZFS project

2009-10-24 Thread David Magda

On Oct 24, 2009, at 08:53, Joerg Schilling wrote:


The article that was mentioned a few hours ago did mention
licensing problems without giving any kind of evidence for
this claim. If there is evidence, I would be interested in
knowing the background, otherwise it looks to me like FUD.



I'm guessing that you'll never see direct evidence, given the sensitivity  
that these negotiations can take on. All you'll get is rumours and leaks of  
varying levels of reliability.


Apple can currently just take the ZFS CDDL code and incorporate it  
(like they did with DTrace), but it may be that they wanted a private  
license from Sun (with appropriate technical support and  
indemnification), and the two entities couldn't come to mutually  
agreeable terms.


Oh well. I'm sure Apple can come up with something good in the FS team, but  
it's a shame that the wheel has to be re-invented when there's a  
production-ready option available.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] PSARC 2009/571: ZFS deduplication properties

2009-10-24 Thread David Magda

On Oct 23, 2009, at 19:27, BJ Quinn wrote:

Anyone know if this means that this will actually show up in SNV  
soon, or whether it will make 2010.02?  (on disk dedup specifically)


It will go in when it goes in. If you have a support contract call up  
Sun and ask for details; if you're using a free version quit pestering  
the nice developers. :)


In the ZFS keynote at KCA:

http://blogs.sun.com/video/entry/kernel_conference_australia_2009_jeff

it was stated that they're hoping to get de-dupe and crypto in by the  
end of the year.


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] PSARC 2009/571: ZFS deduplication properties

2009-10-24 Thread Richard Elling

On Oct 24, 2009, at 7:18 AM, David Magda wrote:

On Oct 23, 2009, at 19:27, BJ Quinn wrote:

Anyone know if this means that this will actually show up in SNV  
soon, or whether it will make 2010.02?  (on disk dedup specifically)


It will go in when it goes in. If you have a support contract call  
up Sun and ask for details; if you're using a free version quit  
pestering the nice developers. :)


In the ZFS keynote at KCA:

http://blogs.sun.com/video/entry/kernel_conference_australia_2009_jeff

it was stated that they're hoping to get de-dupe and crypto in by  
the end of the year.


At LISA09 in Baltimore next week, Darren is scheduled to give an update
on the ZFS crypto project.  We should grab him, take him to our secret
rendition site at Inner Harbor, force him into a comfy chair, and
beer-board him until he confesses.

...sss, don't tell Darren ;-)
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-24 Thread Bob Friesenhahn

On Fri, 23 Oct 2009, Eric D. Mudama wrote:


I don't believe the above statement is correct.

According to anandtech who asked Intel:

http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403p=10

the DRAM doesn't hold user data.  The article claims that data goes
through an internal 256KB buffer.


These folks may well be in Intel's back pocket, but it seems that the 
data given is not clear or accurate.  It does not matter if DRAM or 
SRAM is used for the disk's cache.  What matters is if all user data 
gets flushed to non-volatile storage for each cache flush request. 
Since FLASH drives need to erase a larger block than might be written, 
existing data needs to be read, updated, and then written.  This data 
needs to be worked on in a volatile buffer.  Without extreme care, it 
is possible for the FLASH drive to corrupt other existing unrelated 
data if there is power loss.  The FLASH drive could use a COW scheme 
(like ZFS) but it still needs to take care to persist the block 
mappings for each cache sync request or transactions would be lost.


Folks at another site found that the drive was losing the last few 
synchronous writes with the cache enabled.  This could be a problem 
with the drive, or the OS if it is not issuing the cache flush 
request.



Is solaris incapable of issuing a SATA command FLUSH CACHE EXT?


It issues one for each update to the intent log.

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Tim Cook
On Sat, Oct 24, 2009 at 4:49 AM, Adam Cheal ach...@pnimedia.com wrote:

 The iostat I posted previously was from a system we had already tuned the
 zfs:zfs_vdev_max_pending depth down to 10 (as visible by the max of about 10
 in actv per disk).

 I reset this value in /etc/system to 7, rebooted, and started a scrub.
 iostat output showed busier disks (%b is higher, which seemed odd) but a cap
 of about 7 queue items per disk, proving the tuning was effective. iostat at
 a high-water mark during the test looked like this:



 ...and sure enough about 20 minutes into it I get this (bus reset?):

 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4
 /pci1000,3...@0/s...@34,0 (sd49):
   incomplete read- retrying
 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4
 /pci1000,3...@0/s...@21,0 (sd30):
   incomplete read- retrying
 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4
 /pci1000,3...@0/s...@1e,0 (sd27):
   incomplete read- retrying
 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   Rev. 8 LSI, Inc. 1068E found.
 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   mpt0 supports power management.
 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   mpt0: IOC Operational.

 During the bus reset, iostat output looked like this:


 During our previous testing, we had tried even setting this max_pending
 value down to 1, but we still hit the problem (albeit it took a little
 longer to hit it) and I couldn't find anything else I could set to throttle
 IO to the disk, hence the frustration.

 If you hadn't seen this output, would you say that 7 was a reasonable
 value for that max_pending queue for our architecture and should give the
 LSI controller in this situation enough breathing room to operate? If so, I
 *should* be able to scrub the disks successfully (ZFS isn't to blame) and
 therefore have to point the finger at the
 mpt-driver/LSI-firmware/disk-firmware instead.
 --


A little bit of searching google says:
http://downloadmirror.intel.com/17968/eng/ESRT2_IR_readme.txt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Tim Cook
On Sat, Oct 24, 2009 at 11:20 AM, Tim Cook t...@cook.ms wrote:



 On Sat, Oct 24, 2009 at 4:49 AM, Adam Cheal ach...@pnimedia.com wrote:

 The iostat I posted previously was from a system we had already tuned the
 zfs:zfs_vdev_max_pending depth down to 10 (as visible by the max of about 10
 in actv per disk).

 I reset this value in /etc/system to 7, rebooted, and started a scrub.
 iostat output showed busier disks (%b is higher, which seemed odd) but a cap
 of about 7 queue items per disk, proving the tuning was effective. iostat at
 a high-water mark during the test looked like this:



 ...and sure enough about 20 minutes into it I get this (bus reset?):


 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4
 /pci1000,3...@0/s...@34,0 (sd49):
   incomplete read- retrying
 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4
 /pci1000,3...@0/s...@21,0 (sd30):
   incomplete read- retrying
 scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4
 /pci1000,3...@0/s...@1e,0 (sd27):
   incomplete read- retrying
 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0(mpt0):
   Rev. 8 LSI, Inc. 1068E found.
 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0(mpt0):
   mpt0 supports power management.
 scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0(mpt0):
   mpt0: IOC Operational.

 During the bus reset, iostat output looked like this:


 During our previous testing, we had tried even setting this max_pending
 value down to 1, but we still hit the problem (albeit it took a little
 longer to hit it) and I couldn't find anything else I could set to throttle
 IO to the disk, hence the frustration.

 If you hadn't seen this output, would you say that 7 was a reasonable
 value for that max_pending queue for our architecture and should give the
 LSI controller in this situation enough breathing room to operate? If so, I
 *should* be able to scrub the disks successfully (ZFS isn't to blame) and
 therefore have to point the finger at the
 mpt-driver/LSI-firmware/disk-firmware instead.
 --


 A little bit of searching google says:
 http://downloadmirror.intel.com/17968/eng/ESRT2_IR_readme.txt


Huh, good old keyboard shortcuts firing off emails before I'm done with
them.  Anyway, in that link, I found the following:
 3. Updated - to provide NCQ queue depth of 32 (was 8) on 1064e and 1068e
and 1078 internal-only controllers in IR and ESRT2 modes.

Then there's also this link from someone using a similar controller under
freebsd:
http://www.nabble.com/mpt-errors-QUEUE-FULL-EVENT,-freebsd-7.0-on-dell-1950-td20019090.html

It would make total sense if you're having issues and the default queue
depth for that controller is 8 per port.  Even setting it to 1 isn't going
to fix your issue if you've got 46 drives on one channel/port.

Honestly I'm just taking shots in the dark though.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-24 Thread Bob Friesenhahn

On Sat, 24 Oct 2009, Bob Friesenhahn wrote:



Is solaris incapable of issuing a SATA command FLUSH CACHE EXT?


It issues one for each update to the intent log.


I should mention that FLASH SSDs without a capacitor/battery-backed 
cache flush (like the X25-E) are likely to get burned out pretty 
quickly if they respect each cache flush request.  The reason is that 
each write needs to update a full FLASH metablock.  This means that a 
small 4K syncronous update forces a write of a full FLASH metablock in 
the X25-E.  I don't know the size of the FLASH metablock in the X25-E 
(seems to be a closely-held secret), but perhaps it is 128K, 256K, or 
512K.


The rumor that disabling the cache on the X25-E disables the wear 
leveling is probably incorrect.  It is much more likely that disabling 
the cache causes each write to erase and write a full FLASH 
metablock (known as write amplification), therefore causing the 
device to wear out much more quickly than if it deferred writes.


  http://www.tomshardware.com/reviews/Intel-x25-m-SSD,2012-5.html

Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Richard Elling

more below...

On Oct 24, 2009, at 2:49 AM, Adam Cheal wrote:

The iostat I posted previously was from a system we had already  
tuned the zfs:zfs_vdev_max_pending depth down to 10 (as visible by  
the max of about 10 in actv per disk).


I reset this value in /etc/system to 7, rebooted, and started a  
scrub. iostat output showed busier disks (%b is higher, which seemed  
odd) but a cap of about 7 queue items per disk, proving the tuning  
was effective. iostat at a high-water mark during the test looked  
like this:


   extended device statistics
   r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device
   0.00.00.00.0  0.0  0.00.00.0   0   0 c8
   0.00.00.00.0  0.0  0.00.00.0   0   0 c8t0d0
   0.00.00.00.0  0.0  0.00.00.0   0   0 c8t1d0
   0.00.00.00.0  0.0  0.00.00.0   0   0 c8t2d0
   0.00.00.00.0  0.0  0.00.00.0   0   0 c8t3d0
8344.50.0 359640.40.0  0.1 300.50.0   36.0   0 4362 c9
 190.00.0 6800.40.0  0.0  6.60.0   34.8   0  99 c9t8d0
 185.00.0 6917.10.0  0.0  6.10.0   32.9   0  94 c9t9d0
 187.00.0 6640.90.0  0.0  6.50.0   34.6   0  98 c9t10d0
 186.50.0 6543.40.0  0.0  7.00.0   37.5   0 100 c9t11d0
 180.50.0 7203.10.0  0.0  6.70.0   37.2   0 100 c9t12d0
 195.50.0 7352.40.0  0.0  7.00.0   35.8   0 100 c9t13d0
 188.00.0 6884.90.0  0.0  6.60.0   35.2   0  99 c9t14d0
 204.00.0 6990.10.0  0.0  7.00.0   34.3   0 100 c9t15d0
 199.00.0 7336.70.0  0.0  7.00.0   35.2   0 100 c9t16d0
 180.50.0 6837.90.0  0.0  7.00.0   38.8   0 100 c9t17d0
 198.00.0 7668.90.0  0.0  7.00.0   35.3   0 100 c9t18d0
 203.00.0 7983.20.0  0.0  7.00.0   34.5   0 100 c9t19d0
   0.00.00.00.0  0.0  0.00.00.0   0   0 c9t20d0
 195.50.0 7096.40.0  0.0  6.70.0   34.1   0  98 c9t21d0
 189.50.0 7757.20.0  0.0  6.40.0   33.9   0  97 c9t22d0
 195.50.0 7645.90.0  0.0  6.60.0   33.8   0  99 c9t23d0
 194.50.0 7925.90.0  0.0  7.00.0   36.0   0 100 c9t24d0
 188.50.0 6725.60.0  0.0  6.20.0   32.8   0  94 c9t25d0
 188.50.0 7199.60.0  0.0  6.50.0   34.6   0  98 c9t26d0
 196.00.0 .90.0  0.0  6.30.0   32.1   0  95 c9t27d0
 193.50.0 7455.40.0  0.0  6.20.0   32.0   0  95 c9t28d0
 189.00.0 7400.90.0  0.0  6.30.0   33.2   0  96 c9t29d0
 182.50.0 9397.00.0  0.0  7.00.0   38.3   0 100 c9t30d0
 192.50.0 9179.50.0  0.0  7.00.0   36.3   0 100 c9t31d0
 189.50.0 9431.80.0  0.0  7.00.0   36.9   0 100 c9t32d0
 187.50.0 9082.00.0  0.0  7.00.0   37.3   0 100 c9t33d0
 188.50.0 9368.80.0  0.0  7.00.0   37.1   0 100 c9t34d0
 180.50.0 9332.80.0  0.0  7.00.0   38.8   0 100 c9t35d0
 183.00.0 9690.30.0  0.0  7.00.0   38.2   0 100 c9t36d0
 186.00.0 9193.80.0  0.0  7.00.0   37.6   0 100 c9t37d0
 180.50.0 8233.40.0  0.0  7.00.0   38.8   0 100 c9t38d0
 175.50.0 9085.20.0  0.0  7.00.0   39.9   0 100 c9t39d0
 177.00.0 9340.00.0  0.0  7.00.0   39.5   0 100 c9t40d0
 175.50.0 8831.00.0  0.0  7.00.0   39.9   0 100 c9t41d0
 190.50.0 9177.80.0  0.0  7.00.0   36.7   0 100 c9t42d0
   0.00.00.00.0  0.0  0.00.00.0   0   0 c9t43d0
 196.00.0 9180.50.0  0.0  7.00.0   35.7   0 100 c9t44d0
 193.50.0 9496.80.0  0.0  7.00.0   36.2   0 100 c9t45d0
 187.00.0 8699.50.0  0.0  7.00.0   37.4   0 100 c9t46d0
 198.50.0 9277.00.0  0.0  7.00.0   35.2   0 100 c9t47d0
 185.50.0 9778.30.0  0.0  7.00.0   37.7   0 100 c9t48d0
 192.00.0 8384.20.0  0.0  7.00.0   36.4   0 100 c9t49d0
 198.50.0 8864.70.0  0.0  7.00.0   35.2   0 100 c9t50d0
 192.00.0 9369.80.0  0.0  7.00.0   36.4   0 100 c9t51d0
 182.50.0 8825.70.0  0.0  7.00.0   38.3   0 100 c9t52d0
 202.00.0 7387.90.0  0.0  7.00.0   34.6   0 100 c9t55d0

...and sure enough about 20 minutes into it I get this (bus reset?):

scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/ 
pci1000,3...@0/s...@34,0 (sd49):

  incomplete read- retrying
scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/ 
pci1000,3...@0/s...@21,0 (sd30):

  incomplete read- retrying
scsi: [ID 107833 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/ 
pci1000,3...@0/s...@1e,0 (sd27):

  incomplete read- retrying
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0  
(mpt0):

  Rev. 8 LSI, Inc. 1068E found.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0  
(mpt0):

  mpt0 supports power management.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0  
(mpt0):

  mpt0: IOC Operational.

During the bus reset, 

Re: [zfs-discuss] PSARC 2009/571: ZFS deduplication properties

2009-10-24 Thread Carson Gaspar

On 10/24/09 8:37 AM, Richard Elling wrote:


At LISA09 in Baltimore next week, Darren is scheduled to give an update
on the ZFS crypto project. We should grab him, take him to our secret
rendition site at Inner Harbor, force him into a comfy chair, and
beer-board him until he confesses.


I can supply the horribly fluffy pillows and the {whiskey, vodka, gin, rum, ...} 
if the comfy chair isn't enough to make him talk.



...sss, don't tell Darren ;-)


In all seriousness, if any of the ZFS folks see me at the bar at LISA, hit me up 
for a drink or 3 - I owe ya'


--
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Carson Gaspar

On 10/24/09 9:43 AM, Richard Elling wrote:


OK, here we see 4 I/Os pending outside of the host. The host has
sent them on and is waiting for them to return. This means they are
getting dropped either at the disk or somewhere between the disk
and the controller.

When this happens, the sd driver will time them out, try to clear
the fault by reset, and retry. In other words, the resets you see
are when the system tries to recover.

Since there are many disks with 4 stuck I/Os, I would lean towards
a common cause. What do these disks have in common? Firmware?
Do they share a SAS expander?


I saw this with my WD 500GB SATA disks (HDS725050KLA360) and LSI firmware 
1.28.02.00 in IT mode, but I (almost?) always had exactly 1 stuck I/O. Note 
that my disks were one per channel, no expanders. I have _not_ seen it since 
replacing those disks. So my money is on a bug in the LSI firmware, the drive 
firmware, the drive controller hardware, or some combination thereof.


Note that LSI has released firmware 1.29.00.00. Sadly I cannot find any 
documentation on what has changed. Downloadable from LSI at 
http://lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas3081e-r/index.html?remote=1locale=EN


--
Carson




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Tim Cook
On Sat, Oct 24, 2009 at 12:30 PM, Carson Gaspar car...@taltos.org wrote:


 I saw this with my WD 500GB SATA disks (HDS725050KLA360) and LSI firmware
 1.28.02.00 in IT mode, but I (almost?) always had exactly 1 stuck I/O.
 Note that my disks were one per channel, no expanders. I have _not_ seen it
 since replacing those disks. So my money is on a bug in the LSI firmware,
 the drive firmware, the drive controller hardware, or some combination
 thereof.

 Note that LSI has released firmware 1.29.00.00. Sadly I cannot find any
 documentation on what has changed. Downloadable from LSI at
 http://lsi.com/storage_home/products_home/host_bus_adapters/sas_hbas/internal/sas3081e-r/index.html?remote=1locale=EN

 --
 Carson


Here's the closest I could find from some Intel release notes.  It came
from: ESRT2_IR_readme.txt and does mention the 1068e chipset, as well as
that firmware rev.



Package Information

FW and OpROM Package for Native SAS mode, IT/IR mode and Intel(R) Embedded
Server RAID Technology II

Package version: 2009.10.06
FW Version = 01.29.00 (includes fixed firmware settings)
BIOS (non-RAID) Version = 06.28.00
BIOS (SW RAID) Version = 08.09041155

Supported RAID modes: 0, 1, 1E, 10, 10E and 5 (activation key AXXRAKSW5
required for RAID 5 support)

Supported Intel(R) Server Boards and Systems:
 - S5000PSLSASR, S5000XVNSASR, S5000VSASASR, S5000VCLSASR, S5000VSFSASR
 - SR1500ALSASR, SR1550ALSASR, SR2500ALLXR, S5000PALR (with SAS I/O Module)
 - S5000PSLROMBR (SROMBSAS18E) without HW RAID activation key AXXRAK18E
installed (native SAS or SW RAID modes only) - for HW RAID mode separate
package is available
 - NSC2U, TIGW1U

Supported Intel(R) RAID controller (adapters):
- SASMF8I, SASWT4I, SASUC8I

Intel(R) SAS Entry RAID Module AXX4SASMOD, when inserted in below Intel(R)
Server Boards and Systems:
 - S5520HC / S5520HCV, S5520SC,S5520UR,S5500WB


Known Restrictions

1. The sasflash versions within this package don't support ESRTII
controllers.
2. The sasflash utility for Windows and Linux version within this package
only support Intel(R) IT/IR RAID controllers.  The sasflash utility for
Windows and Linux version within this package don't support sasflash -o -e 6
command.
3. The sasflash utility for DOS version doesn't support the Intel(R) Server
Boards and Systems due to BIOS limitation.  The DOS version sasflash might
still be supported on 3rd party server boards which don't have the BIOS
limitation.
4. No PCI 3.0 support
5. No Foreign Configuration Resolution Support
6. No RAID migration Support
7. No mixed RAID mode support ever
8. No Stop On Error support


Known Bugs

(1)
For Intel(R) chipset S5000P/S5000V/S5000X based server systems, please use
the 32 bit, non-EBC version of sasflash which is
SASFLASH_Ph17-1.22.00.00\sasflash_efi_bios32_rel\sasflash.efi, instead of
the ebc version of sasflash which is in the top package directory and also
in
SASFLASH_Ph17-1.22.00.00\sasflash_efi_ebc_rel\sasflash.efi.  The latter one
may return a wrong sas address with a sasflash -list command in the listed
systems.

(2)
LED behavior does not match between SES and SGPIO for some conditions
(documentation in process).

(3)
When in EFI Optimized Boot mode, the task bar is not displayed in EFI_BSD
after two volumes are created.

(4)
If a system is rebooted while a volume rebuild is in progress, the rebuild
will start over from the beginning.


Fixes/Updates

Version 2009.10.06
 1. Fixed - MP2 HDD fault LED stays on after rebuild completes
 2. Fixed - System hangs if drive hot-unplugged during stress

Version 2009.07.30
 1. Fixed - SES over i2c for 106x products
 2. Fixed - FW settings updated to support SES over i2c drive lights on
FALSASMP2.

Version 2009.06.15
 1. Fixed - SES over I2C issue for 1078IR.
 2. Updated - 1068e fw to fix SES over I2C on MP2 bug.
 3. Updated - to provide NCQ queue depth of 32 (was 8) on 1064e and 1068e
and 1078 internal-only controllers in IR and ESRT2 modes.
 4. Updated - Firmware to enable SES over I2C on AXX4SASMOD.
 5. Updated - Settings to provide better LED indicators for SGPIO.

Version 2008.12.11
 1. Fixed - Media can't boot from SATA DVD in some systems in Software RAID
(ESRT2) mode.
 2. Fixed - Incorrect RAID 5 ECC error handling in Ctrl+M

Version 2008.11.07
 1. Added support for - Enable ICH10 support
 2. Added support for - Software RAID5 to support ICH10R
 3. Added support for - Single Drive RAID 0 (IS) Volume
 4. Fixed - Resolved issue where user could not create a second volume
immediately following the deletion of a second volume.
 5. Fixed - Second hot spare status not shown when first hot spare is
inactive/missing

Version 2008.09.22
 1. Fixed - SWR:During hot PD removal and then quick reboot, not updating
the DDF correctly.

Version 2008.06.16
 1. Fixed - the issue with "The LED functions are not working inside the
OSes for SWR5"
 2. Fixed - 

Re: [zfs-discuss] Performance problems with Thumper and 7TB ZFS pool using RAIDZ2

2009-10-24 Thread Jim Mauro

Posting to zfs-discuss. There's no reason this needs to be
kept confidential.

5-disk RAIDZ2 - doesn't that equate to only 3 data disks?
Seems pointless - they'd be much better off using mirrors,
which is a better choice for random IO...

Looking at this now...

/jim


Jeff Savit wrote:

Hi all,

I'm looking for suggestions for the following situation: I'm helping 
another SE with a customer using Thumper with a large ZFS pool mostly 
used as an NFS server, and disappointments in performance. The storage 
is an intermediate holding place for data to be fed into a relational 
database, and the statement is that the NFS side can't keep up with 
data feeds written to it as flat files.


The ZFS pool has 8 5-volume RAIDZ2 groups, for 7.3TB of storage, with 
1.74TB available.  Plenty of idle CPU as shown by vmstat and mpstat.  
iostat shows queued I/O and I'm not happy about the total latencies - 
wsvc_t in excess of 75ms at times.  Average of ~60KB per read and only 
~2.5KB per write. Evil Tuning guide tells me that RAIDZ2 is happiest 
for long reads and writes, and this is not the use case here.


I was surprised to see commands like tar, rm, and chown running 
locally on the NFS server, so it looks like they're locally doing file 
maintenance and pruning at the same time it's being accessed remotely. 
That makes sense to me for the short write lengths and for the high 
ZFS ACL activity shown by DTrace. I wonder if there is a lot of sync 
I/O that would benefit from separately defined ZILs (whether SSD or 
not), so I've asked them to look for fsync activity.


Data collected thus far is listed below. I've asked for verification 
of the Solaris 10 level (I believe it's S10u6) and ZFS recordsize.  
Any suggestions will be appreciated.


regards, Jeff

 stuff starts here 


zpool iostat -v gives figures like:

bash-3.00# zpool iostat -v
  capacity operations  bandwidth
pool   used avail read write  read write
-- - - -- - -
mdpool 7.32T 1.74T 290  455 1.57M 3.21M
raidz2  937G  223G  36   56   201K 411K
c0t0d0 -  - 18   40  1.13M 141K
c1t0d0 -  - 18   40  1.12M 141K
c4t0d0 -  - 18   40  1.13M 141K
c6t0d0 -  - 18   40  1.13M 141K
c7t0d0 -  - 18   40  1.13M 141K

---the other 7 raidz2 groups have almost identical numbers on their 
devices---


iostat -iDnxz looks like:

extended device statistics 
r/sw/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b device

0.00.00.00.0  0.0  0.00.00.1   0   0 c5t0d0
   15.8   95.9  996.9  233.1  4.3  1.3   38.2   12.0  20  37 c6t0d0
   16.1   95.6 1018.5  232.4  2.5  2.6   22.2   23.2  16  36 c7t0d0
   16.1   96.0 1012.5  232.8  2.8  2.9   24.5   26.1  19  38 c4t0d0
   16.0   93.1 1012.9  242.2  3.6  1.5   33.2   14.2  18  36 c5t1d0
   15.9   82.2 1000.5  235.0  1.9  1.6   19.2   16.0  12  31 c5t2d0
   16.6   95.6 1046.7  232.7  2.5  2.7   22.2   23.7  18  37 c0t0d0
   16.6   96.1 1042.4  232.8  4.7  0.6   42.05.2  19  38 c1t0d0
...snip...
   16.5   95.4 1027.2  263.0  5.9  0.4   53.03.6  26  40 c0t4d0
   16.6   95.4 1041.1  263.6  3.9  1.0   34.59.3  18  36 c1t4d0
   16.8   99.1 1060.6  248.6  7.2  0.7   62.06.0  32  45 c0t5d0
   16.5   99.6 1034.7  248.9  8.2  1.1   70.59.1  38  48 c1t5d0
   17.0   82.5 1072.9  219.8  4.8  0.5   48.44.7  21  38 c0t6d0


prstat  looks like:

bash-3.00# prstat
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
815 daemon 3192K 2560K sleep 60 -20 83:10:07 0.6% nfsd/24
27918 root 1092K 920K cpu2 37 4 0:01:37 0.2% rm/1
19142 root 248M 247M sleep 60 0 1:24:24 0.1% chown/1
28794 root 2552K 1304K sleep 59 0 0:00:00 0.1% tar/1
29957 root 1192K 908K sleep 59 0 0:57:30 0.1% find/1
14737 root 7620K 1964K sleep 59 0 0:03:56 0.0% sshd/1
...


prstat -Lm looks like:

bash-3.00# prstat -Lm
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/LWPID
27918 root 0.0 0.9 0.0 0.0 0.0 0.0 99 0.0 194 7 2K 0 rm/1
28794 root 0.1 0.6 0.0 0.0 0.0 0.0 99 0.0 209 10 909 0 tar/1
19142 root 0.0 0.6 0.0 0.0 0.0 0.0 99 0.0 224 3 1K 0 chown/1
29957 root 0.0 0.4 0.0 0.0 0.0 0.0 100 0.0 213 6 420 0 find/1
815 daemon 0.0 0.3 0.0 0.0 0.0 0.0 100 0.0 197 0 0 0 nfsd/28230
815 daemon 0.0 0.3 0.0 0.0 0.0 0.0 100 0.0 191 0 0 0 nfsd/28222
815 daemon 0.0 0.3 0.0 0.0 0.0 0.0 100 0.0 185 0 0 0 nfsd/28211
---many more nfsd lines of similar appearance---


A small DTrace script for ZFS gives me:

# dtrace -n 'fbt::zfs*:entry{@[pid,execname,probefunc] = count()} END 
{trunc(@,20); printa(@)}'

^C
...some lines trimmed...
28835 tar zfs_dirlook 67761
28835 tar zfs_lookup 67761
28835 tar zfs_zaccess 69166
28835 tar zfs_dirent_lock 71083
28835 tar zfs_dirent_unlock 71084
28835 tar zfs_zaccess_common
28835 tar zfs_acl_node_read 77251

28835 tar zfs_acl_node_read_internal 77251
28835 tar zfs_acl_alloc 78656
28835 tar zfs_acl_free 78656
27918 rm zfs_acl_alloc 85888

Re: [zfs-discuss] Performance problems with Thumper and 7TB ZFS pool using RAIDZ2

2009-10-24 Thread Albert Chin
On Sat, Oct 24, 2009 at 03:31:25PM -0400, Jim Mauro wrote:
 Posting to zfs-discuss. There's no reason this needs to be
 kept confidential.

 5-disk RAIDZ2 - doesn't that equate to only 3 data disks?
 Seems pointless - they'd be much better off using mirrors,
 which is a better choice for random IO...

Is it really pointless? Maybe they want the insurance RAIDZ2 provides.
Given the choice between insurance and performance, I'll take insurance,
though it depends on your use case. We're using 5-disk RAIDZ2 vdevs.
While I want the performance a mirrored vdev would give, it scares me
that you're just one drive away from a failed pool. Of course, you could
have two mirrors in each vdev but I don't want to sacrifice that much
space. However, over the last two years, we haven't had any
demonstrable failures that would give us cause for concern. But it's
still unsettling.

Would love to hear other opinions on this.

 Looking at this now...

 /jim


 Jeff Savit wrote:
 Hi all,

 I'm looking for suggestions for the following situation: I'm helping  
 another SE with a customer using Thumper with a large ZFS pool mostly  
 used as an NFS server, and disappointments in performance. The storage  
 is an intermediate holding place for data to be fed into a relational  
 database, and the statement is that the NFS side can't keep up with  
 data feeds written to it as flat files.

 The ZFS pool has 8 5-volume RAIDZ2 groups, for 7.3TB of storage, with  
 1.74TB available.  Plenty of idle CPU as shown by vmstat and mpstat.   
 iostat shows queued I/O and I'm not happy about the total latencies -  
 wsvc_t in excess of 75ms at times.  Average of ~60KB per read and only  
 ~2.5KB per write. Evil Tuning guide tells me that RAIDZ2 is happiest  
 for long reads and writes, and this is not the use case here.

 I was surprised to see commands like tar, rm, and chown running  
 locally on the NFS server, so it looks like they're locally doing file  
 maintenance and pruning at the same time it's being accessed remotely.  
 That makes sense to me for the short write lengths and for the high  
 ZFS ACL activity shown by DTrace. I wonder if there is a lot of sync  
 I/O that would benefit from separately defined ZILs (whether SSD or  
 not), so I've asked them to look for fsync activity.

 Data collected thus far is listed below. I've asked for verification  
 of the Solaris 10 level (I believe it's S10u6) and ZFS recordsize.   
 Any suggestions will be appreciated.

 regards, Jeff

-- 
albert chin (ch...@thewrittenword.com)
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Apple cans ZFS project

2009-10-24 Thread Jeff Bonwick
 Apple can currently just take the ZFS CDDL code and incorporate it  
 (like they did with DTrace), but it may be that they wanted a private  
 license from Sun (with appropriate technical support and  
 indemnification), and the two entities couldn't come to mutually  
 agreeable terms.

I cannot disclose details, but that is the essence of it.

Jeff
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-24 Thread Adam Cheal
The controller connects to two disk shelves (expanders), one per port on the 
card. If you look back in the thread, you'll see our zpool config has one vdev 
per shelf. All of the disks are Western Digital (model WD1002FBYS-18A6B0) 1TB 
7.2K, firmware rev. 03.00C06. Without actually matching up the disks with 
stuck IOs, I am assuming they are all on the same vdev/shelf/controller port.

I communicated with LSI support directly regarding the v1.29 firmware update, 
and here's what they wrote back:

I have checked with our development team on this one. There are no release 
notes available as the functionality of the coding itself has not changed. This 
was a minor cleanup and the firmware was assigned a new phase number for these. 
There were no defects or added functionality in going from the P16 firmware to 
the P17 firmware.

Also, regarding the NCQ depth on the drives I used the LSIUTIL in expert mode 
and used options 13/14 to dump the following settings (which are all default):

Multi-pathing:  [0=Disabled, 1=Enabled, default is 0] 
SATA Native Command Queuing:  [0=Disabled, 1=Enabled, default is 1] 
SATA Write Caching:  [0=Disabled, 1=Enabled, default is 1] 
SATA Maximum Queue Depth:  [0 to 255, default is 32] 
Device Missing Report Delay:  [0 to 2047, default is 0] 
Device Missing I/O Delay:  [0 to 255, default is 0] 
Persistence:  [0=Disabled, 1=Enabled, default is 1] 
Physical mapping:  [0=None, 1=DirectAttach, 2=EnclosureSlot, default is 0]
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Performance problems with Thumper and 7TB ZFS pool using RAIDZ2

2009-10-24 Thread Bob Friesenhahn

On Sat, 24 Oct 2009, Albert Chin wrote:


5-disk RAIDZ2 - doesn't that equate to only 3 data disks?
Seems pointless - they'd be much better off using mirrors,
which is a better choice for random IO...


Is it really pointless? Maybe they want the insurance RAIDZ2 
provides. Given the choice between insurance and performance, I'll 
take insurance, though it depends on your use case. We're using 
5-disk RAIDZ2 vdevs. While I want the performance a mirrored vdev 
would give, it scares me that you're just one drive away from a 
failed pool. Of course, you could have two mirrors in each vdev but 
I don't want to sacrifice that much space. However, over the last 
two years, we haven't had any demonstratable failures that would 
give us cause for concern. But, it's still unsettling.


I am using duplex mirrors here even though if a drive fails, the pool 
is just one drive away from failure.  I do feel that it is safer than 
raidz1 because resilvering is much less complex so there is less to go 
wrong and the resilver time should be the best possible.


For heavy multi-user use (like this Sun customer has) it is impossible 
to beat the mirrored configuration for performance.  If the I/O load 
is heavy and the storage is an intermediate holding place for data 
then it makes sense to use mirrors.  If it was for long term archival 
storage, then raidz2 would make more sense.



iostat shows queued I/O and I'm not happy about the total latencies -
wsvc_t in excess of 75ms at times.  Average of ~60KB per read and only
~2.5KB per write. Evil Tuning guide tells me that RAIDZ2 is happiest
for long reads and writes, and this is not the use case here.


~2.5KB per write is definitely problematic.  NFS writes are usually 
synchronous so this is using up the available IOPS, and consuming them 
at a 5X elevated rate with a 5 disk raidz2.  It seems that a SSD for 
the intent log would help quite a lot for this situation so that zfs 
can aggregate the writes.  If the typical writes are small, it would 
also help to reduce the filesystem blocksize to 8K.
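
If they go that route, the changes themselves are small. A sketch (the dataset 
name and the SSD device name below are placeholders, and recordsize only 
affects newly written blocks):

# zfs set recordsize=8k mdpool/somefs
# zpool add mdpool log cXtYd0

Separate log devices need a reasonably recent pool version, and recordsize is 
a per-dataset property, so it should be set on the filesystem(s) that actually 
receive the small synchronous writes.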


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss