[zfs-discuss] compressed fs taking up more space than uncompressed equivalent

2009-10-23 Thread Stathis Kamperis
Salute.

I have a filesystem where I store various source repositories (cvs +
git). I have compression enabled and zfs get compressratio reports
1.46x. When I copy all the stuff to another filesystem without
compression, the data take up _less_ space (3.5GB compressed vs 2.5GB
uncompressed). How is that possible?


Best regards,
Stathis


Re: [zfs-discuss] compressed fs taking up more space than uncompressed equivalent

2009-10-23 Thread michael schuster

Stathis Kamperis wrote:

Salute.

I have a filesystem where I store various source repositories (cvs +
git). I have compression enabled and zfs get compressratio reports
1.46x. When I copy all the stuff to another filesystem without
compression, the data take up _less_ space (3.5GB compressed vs 2.5GB
uncompressed). How is that possible?


just a few thoughts:
- how do you measure how much space your data consumes?
- how do you copy?
- is the other FS also ZFS?

Michael
--
Michael Schuster http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'


Re: [zfs-discuss] compressed fs taking up more space than uncompressed equivalent

2009-10-23 Thread Stathis Kamperis
2009/10/23 michael schuster michael.schus...@sun.com:
 Stathis Kamperis wrote:

 Salute.

 I have a filesystem where I store various source repositories (cvs +
 git). I have compression enabled and zfs get compressratio reports
 1.46x. When I copy all the stuff to another filesystem without
 compression, the data take up _less_ space (3.5GB compressed vs 2.5GB
 uncompressed). How is that possible?

 just a few thoughts:
 - how do you measure how much space your data consumes?
With zfs list, under the 'USED' column. du(1) gives the same results
as well (the individual fs sizes aren't entirely identical with those
that zfs list reports, but the difference still exists).

tank/sources   3.73G   620G  3.73G  /export/sources
  --- compressed
tank/test  2.32G   620G  2.32G  /tank/test
  --- uncompressed

 - how do you copy?
With cp(1). Should I be using zfs send | zfs receive ?

 - is the other FS also ZFS?
Yes. And they both live under the same pool.

If it matters, I don't have any snapshots on either of the filesystems.

Thank you for your time.

Best regards,
Stathis Kamperis


Re: [zfs-discuss] compressed fs taking up more space than uncompressed equivalent

2009-10-23 Thread michael schuster

Stathis Kamperis wrote:

2009/10/23 michael schuster michael.schus...@sun.com:

Stathis Kamperis wrote:

Salute.

I have a filesystem where I store various source repositories (cvs +
git). I have compression enabled and zfs get compressratio reports
1.46x. When I copy all the stuff to another filesystem without
compression, the data take up _less_ space (3.5GB compressed vs 2.5GB
uncompressed). How is that possible?

just a few thoughts:
- how do you measure how much space your data consumes?

With zfs list, under the 'USED' column. du(1) gives the same results
as well (the individual fs sizes aren't entirely identical with those
that zfs list reports, but the difference still exists).

tank/sources   3.73G   620G  3.73G  /export/sources
  --- compressed
tank/test  2.32G   620G  2.32G  /tank/test
  --- uncompressed


obvious, but still: you did make sure that the compressed one doesn't have 
any other data lying around, right?





- how do you copy?

With cp(1). Should I be using zfs send | zfs receive ?


I don't know :-) I was just (still am) thinking out loud.


- is the other FS also ZFS?

Yes. And they both live under the same pool.

If it matters, I don't have any snapshots on either of the filesystems.


zfs list -t all might still be revealing ...
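
For example, a minimal sketch (assuming the two datasets from your output
live under a pool named tank):

  # list filesystems, volumes and snapshots recursively, so any hidden
  # snapshots or clones under either dataset would show up here
  zfs list -t all -r tank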

Michael
--
Michael Schuster http://blogs.sun.com/recursion
Recursion, n.: see 'Recursion'


Re: [zfs-discuss] compressed fs taking up more space than uncompressed equivalent

2009-10-23 Thread Gaëtan Lehmann


On 23 Oct 2009, at 08:46, Stathis Kamperis wrote:


2009/10/23 michael schuster michael.schus...@sun.com:

Stathis Kamperis wrote:


Salute.

I have a filesystem where I store various source repositories (cvs +
git). I have compression enabled and zfs get compressratio reports
1.46x. When I copy all the stuff to another filesystem without
compression, the data take up _less_ space (3.5GB compressed vs 2.5GB
uncompressed). How is that possible?


just a few thoughts:
- how do you measure how much space your data consumes?

With zfs list, under the 'USED' column. du(1) gives the same results
as well (the individual fs sizes aren't entirely identical with those
that zfs list reports, but the difference still exists).

tank/sources   3.73G   620G  3.73G  /export/sources
 --- compressed
tank/test  2.32G   620G  2.32G  /tank/test
 --- uncompressed



USED includes the size of the children and the size of the snapshots. I
see below that you don't have snapshots on that pool, but in general, I
find it more useful to use


  zfs list -o space,compress,ratio

to look at how the space is used.
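
For example, a minimal sketch (assuming the pool from your output is named tank):

  # break USED down into snapshot/dataset/children/refreservation usage
  zfs list -o space -r tank

  # and show the achieved compression ratio for every dataset in the pool
  zfs get -r compressratio tank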


- how do you copy?

With cp(1). Should I be using zfs send | zfs receive ?


zfs send/receive or rsync -aH may do a better job by preserving hard  
links.
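
A rough sketch of both options (hypothetical snapshot name; for zfs receive
the destination dataset must not already exist):

  # replicate through a snapshot
  zfs snapshot tank/sources@xfer
  zfs send tank/sources@xfer | zfs receive tank/test

  # or copy at the file level while preserving hard links (-H)
  rsync -aH /export/sources/ /tank/test/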


Regards,

Gaëtan

--
Gaëtan Lehmann
Biologie du Développement et de la Reproduction
INRA de Jouy-en-Josas (France)
tel: +33 1 34 65 29 66    fax: 01 34 65 29 09
http://voxel.jouy.inra.fr  http://www.itk.org
http://www.mandriva.org  http://www.bepo.fr





Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Bruno Sousa

Hi Cindy,

I have a couple of questions about this issue :

  1. I have exactly the same LSI controller in another server running
     OpenSolaris snv_101b, and so far no errors like these have been
     seen on that system.
  2. Up to snv_118 I hadn't seen any problems; they only appeared with snv_125.
  3. Isn't the Sun StorageTek SAS HBA an LSI OEM? If so, is it possible
     to know what firmware version that HBA is using?


Thank you,
Bruno

Cindy Swearingen wrote:

Hi Bruno,

I see some bugs associated with these messages (6694909) that point to
an LSI firmware upgrade that causes these harmless errors to be displayed.

According to the 6694909 comments, this issue is documented in the
release notes.

As they are harmless, I wouldn't worry about them.

Maybe someone from the driver group can comment further.

Cindy


On 10/22/09 05:40, Bruno Sousa wrote:

Hi all,

Recently I upgraded from snv_118 to snv_125, and suddenly I started 
to see these messages in /var/adm/messages:


Oct 22 12:54:37 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:54:37 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:47 SAN02  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:47 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:50 SAN02  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:50 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a



Is this a symptom of a disk error, or was some change made in the 
driver so that I now get more information that didn't appear in the 
past?


Thanks,
Bruno

I'm using an LSI Logic SAS1068E B3, and within lsiutil I see this 
behaviour:



1 MPT Port found

Port Name Chip Vendor/Type/RevMPT Rev  Firmware Rev  IOC
1.  mpt0  LSI Logic SAS1068E B3 105  011a 0

Select a device:  [1-1 or 0 to quit] 1

1.  Identify firmware, BIOS, and/or FCode
2.  Download firmware (update the FLASH)
4.  Download/erase BIOS and/or FCode (update the FLASH)
8.  Scan for devices
10.  Change IOC settings (interrupt coalescing)
13.  Change SAS IO Unit settings
16.  Display attached devices
20.  Diagnostics
21.  RAID actions
22.  Reset bus
23.  Reset target
42.  Display operating system names for devices
45.  Concatenate SAS firmware and NVDATA files
59.  Dump PCI config space
60.  Show non-default settings
61.  Restore default settings
66.  Show SAS discovery errors
69.  Show board manufacturing information
97.  Reset SAS link, HARD RESET
98.  Reset SAS link
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 20

1.  Inquiry Test
2.  WriteBuffer/ReadBuffer/Compare Test
3.  Read Test
4.  Write/Read/Compare Test
8.  Read Capacity / Read Block Limits Test
12.  Display phy counters
13.  Clear phy counters
14.  SATA SMART Read Test
15.  SEP (SCSI Enclosure Processor) Test
18.  Report LUNs Test
19.  Drive firmware download
20.  Expander firmware download
21.  Read Logical Blocks
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Diagnostics menu, select an option:  [1-99 or e/p/w or 0 to quit] 12

Adapter Phy 0:  Link Down, No Errors

Adapter Phy 1:  Link Down, No Errors

Adapter Phy 2:  Link Down, No Errors

Adapter Phy 3:  Link Down, No Errors

Adapter Phy 4:  Link Up, No Errors

Adapter Phy 5:  Link Up, No Errors

Adapter Phy 6:  Link Up, No Errors

Adapter Phy 7:  Link Up, No Errors

Expander (Handle 0009) Phy 0:  Link Up
 Invalid DWord Count  79,967,229
 Running Disparity Error Count63,036,893
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 1:  Link Up
 Invalid DWord Count  79,967,207
 Running Disparity Error Count78,339,626
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 2:  Link Up
 Invalid DWord Count  76,717,646
 Running Disparity Error Count73,334,563
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 3:  

Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Bruno Sousa

Hi Adam,

How many disks and zpools/filesystems do you have behind that LSI?
I have a system with 22 disks and 4 zpools with around 30 filesystems, and so 
far it works like a charm, even during heavy load. The OpenSolaris 
release is snv_101b.


Bruno
Adam Cheal wrote:

Cindy: How can I view the bug report you referenced? Standard methods show me 
the bug number is valid (6694909) but no content or notes. We are having 
similar messages appear with snv_118 with a busy LSI controller, especially 
during scrubbing, and I'd be interested to see what they mentioned in that 
report. Also, the LSI firmware updates for the LSISAS3081E (the controller we 
use) don't usually come with release notes indicating what has changed in each 
firmware revision, so I'm not sure where they got that idea from.
  






Re: [zfs-discuss] compressed fs taking up more space than uncompressed equivalent

2009-10-23 Thread Stathis Kamperis
2009/10/23 Gaëtan Lehmann gaetan.lehm...@jouy.inra.fr:

 On 23 Oct 2009, at 08:46, Stathis Kamperis wrote:

 2009/10/23 michael schuster michael.schus...@sun.com:

 Stathis Kamperis wrote:

 Salute.

 I have a filesystem where I store various source repositories (cvs +
 git). I have compression enabled and zfs get compressratio reports
 1.46x. When I copy all the stuff to another filesystem without
 compression, the data take up _less_ space (3.5GB compressed vs 2.5GB
 uncompressed). How is that possible?

 just a few thoughts:
 - how do you measure how much space your data consumes?

 With zfs list, under the 'USED' column. du(1) gives the same results
 as well (the individual fs sizes aren't entirely identical with those
 that zfs list reports, but the difference still exists).

 tank/sources               3.73G   620G  3.73G  /export/sources
  --- compressed
 tank/test                  2.32G   620G  2.32G  /tank/test
     --- uncompressed


 USED includes the size of the children and the size of the snapshots. I see
 below that you don't have snapshots on that pool, but in general, I find it
 more useful to use

  zfs list -o space,compress,ratio

 to look at how the space is used.

 - how do you copy?

 With cp(1). Should I be using zfs send | zfs receive ?

 zfs send/receive or rsync -aH may do a better job by preserving hard links.

I destroyed the test fs, recreated it and did an rsync. The size of
the uncompressed filesystem is now larger than the compressed one. I
guess cp(1) missed a great deal of stuff, which is weird because I
didn't get any error/warning on the console output. All good now.

Thanks Gaëtan and Michael for your time and sorry to the rest of the
list readers for the noise.

Best regards,
Stathis Kamperis


[zfs-discuss] ZFS port to Linux

2009-10-23 Thread Anand Mitra
Hi All,

At KQ Infotech, we have always looked at challenging ourselves by
trying to scope out new technologies. Currently we are porting ZFS to
Linux and would like to share our progress and the challenges faced,
we would also like to know your thoughts/inputs regarding our efforts.

Though we are at early stages of porting ZFS to Linux, we have gained
some insight into how we can move forward. So far we have been
successful in achieving the following milestones.

We have ZFS building as a module, and the following primitive
operations are possible.

* Creating a pool over a file (devices not supported yet)
* Zpool list, remove
* Creating filesystems and mounting them

But we are still not at a stage, where we can create files and read
and write to them. Once we are able to successfully achieve that we
will make the same available for download.

One of the biggest questions around this effort would be “licensing”.
As far as our understanding goes, CDDL doesn’t restrict us from
modifying ZFS code and releasing it. However, GPL and CDDL code cannot
be mixed, which implies that ZFS cannot be compiled into the Linux
kernel, which is GPL. But we believe the way around this issue is to
build ZFS as a module under the CDDL license; it can still be loaded
into the Linux kernel. It would then be restricted to using non-GPL
symbols, but as long as that rule is adhered to, there should be no
legal problem.

For any queries please contact us at z...@kqinfotech.com

Stay tuned for latest updates on http://twitter.com/KQInfotech and our
blog http://kqinfotech.wordpress.com.


regards
-- 
Anand Mitra
CTO, Founder
KQ Infotech
www.kqinfotech.com


Re: [zfs-discuss] ZFS port to Linux

2009-10-23 Thread Darren J Moffat

Anand Mitra wrote:

Hi All,

At KQ Infotech, we have always looked at challenging ourselves by
trying to scope out new technologies. Currently we are porting ZFS to
Linux and would like to share our progress and the challenges faced,
we would also like to know your thoughts/inputs regarding our efforts.

Though we are at early stages of porting ZFS to Linux, we have gained
some insight into how we can move forward. So far we have been
successful in achieving the following milestones.

We have a ZFS building as a module and the following primitive
operations are possible.

* Creating a pool over a file (devices not supported yet)
* Zpool list, remove
* Creating filesystems and mounting them

But we are still not at a stage, where we can create files and read
and write to them. Once we are able to successfully achieve that we
will make the same available for download.


That is great progress, thanks for sharing. Why not share the source now 
so that others can help you?



One of the biggest questions around this effort would be “licensing”.
As far as our understanding goes, CDDL doesn’t restrict us from
modifying ZFS code and releasing it. However, GPL and CDDL code cannot
be mixed, which implies that ZFS cannot be compiled into the Linux
kernel, which is GPL. But we believe the way around this issue is to
build ZFS as a module under the CDDL license; it can still be loaded
into the Linux kernel. It would then be restricted to using non-GPL
symbols, but as long as that rule is adhered to, there should be no
legal problem.


That is my personal understanding as well, however this is not legal 
advice and I am not qualified to (or even wish to) give it in any case.


Good luck with the port.

--
Darren J Moffat


Re: [zfs-discuss] ZFS port to Linux

2009-10-23 Thread Joerg Schilling
Darren J Moffat darr...@opensolaris.org wrote:

  One of the biggest questions around this effort would be “licensing”.
  As far as our understanding goes, CDDL doesn’t restrict us from
  modifying ZFS code and releasing it. However, GPL and CDDL code cannot
  be mixed, which implies that ZFS cannot be compiled into the Linux
  kernel, which is GPL. But we believe the way around this issue is to
  build ZFS as a module under the CDDL license; it can still be loaded
  into the Linux kernel. It would then be restricted to using non-GPL
  symbols, but as long as that rule is adhered to, there should be no
  legal problem.

 That is my personal understanding as well, however this is not legal 
 advice and I am not qualified to (or even wish to) give it in any case.

From what I have been told by various lawyers, as long as ZFS is a separate 
work (which means that you cannot tell other people that you mixed the work 
Linux-kernel with ZFS) there is no legal problem. If you would like to ask a 
lawyer, be careful to ask an independent lawyer and not the SFLC (which 
unfortunately gives biased advice).

A good choice for an independent lawyer seems to be Lawrence Rosen:

Check e.g. http://www.rosenlaw.com/Rosen_Ch06.pdf
and http://www.rosenlaw.com/

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily


[zfs-discuss] zfs recv complains about destroyed filesystem

2009-10-23 Thread Robert Milkowski
Hi,

snv_123, x64

zfs recv -F complains that it can't open a snapshot it has just destroyed 
itself (because the snapshot was destroyed on the sending side). Other than 
complaining about it, it finishes successfully.

Below is an example where I created a filesystem fs1 with three snapshots of it 
called snap1, snap2, snap3. Next I replicated snap1 to a different location. 
Then I did an incremental replication between snap1 and snap2. Then I destroyed 
snap1 on the source. Then I did an incremental replication between snap2 and 
snap3 with zfs recv -F, which should also replicate the destruction of snap1, 
which it did. But at the same time it complains that it can't open snap1 just 
after it destroyed it itself. It seems entirely harmless, but it still 
shouldn't happen.


 zfs create archive-2/test
 zfs create archive-2/test/fs1
 zfs snapshot archive-2/test/f...@snap1
 zfs snapshot archive-2/test/f...@snap2
 zfs snapshot archive-2/test/f...@snap3

 zfs create archive-2/test/repl

 zfs send -R archive-2/test/f...@snap1 | zfs recv -vdF archive-2/test/repl
receiving full stream of archive-2/test/f...@snap1 into 
archive-2/test/repl/test/f...@snap1
received 249KB stream in 14 seconds (17.8KB/sec)
 
 zfs send -R -I archive-2/test/f...@snap1 archive-2/test/f...@snap2 | zfs recv 
 -vdF archive-2/test/repl
receiving incremental stream of archive-2/test/f...@snap2 into 
archive-2/test/repl/test/f...@snap2
received 312B stream in 1 seconds (312B/sec)
 

 zfs destroy archive-2/test/f...@snap1

 zfs send -R -I archive-2/test/f...@snap2 archive-2/test/f...@snap3 | zfs recv 
 -vdF archive-2/test/repl
attempting destroy archive-2/test/repl/test/f...@snap1
success
cannot open 'archive-2/test/repl/test/f...@snap1': dataset does not exist
receiving incremental stream of archive-2/test/f...@snap3 into 
archive-2/test/repl/test/f...@snap3
received 312B stream in 1 seconds (312B/sec)
 

 zfs list -t all -r archive-2/test
NAME USED  AVAIL  REFER  MOUNTPOINT
archive-2/test   266K  7.35T  56.1K  /archive-2/test
archive-2/test/fs1  51.2K  7.35T  51.2K  /archive-2/test/fs1
archive-2/test/f...@snap20  -  51.2K  -
archive-2/test/f...@snap30  -  51.2K  -
archive-2/test/repl  158K  7.35T  53.6K  /archive-2/test/repl
archive-2/test/repl/test 105K  7.35T  53.6K  
/archive-2/test/repl/test
archive-2/test/repl/test/fs151.2K  7.35T  51.2K  
/archive-2/test/repl/test/fs1
archive-2/test/repl/test/f...@snap2  0  -  51.2K  -
archive-2/test/repl/test/f...@snap3  0  -  51.2K  -
 

So everything was replicated as expected. However, zfs recv -F should not 
complain that it can't open snap1.



-- 
Robert Milkowski
http://milek.blogspot.com


[zfs-discuss] problems with netatalk and zfs after upgrade to snv_125

2009-10-23 Thread dirk schelfhout
(Sorry for the cross-post; I posted this in opensolaris-discuss, but I think it 
belongs here.)

I can no longer mount one of my two volumes.
They are both on ZFS. I can still mount my home, which is on rpool,
but cannot mount my data, which is on a raidz pool.
The settings are the same.
This is from AppleVolumes.default:
~ cnidscheme:cdb options:usedots,invisibledots,upriv perm:0770
/safe ZFSshare allow:@staff,myUser cnidscheme:cdb 
options:usedots,invisibledots,upriv perm:0770

I tried recompiling and reinstalling. I enabled logging and don't like this 
entry in the log:
Oct 22 20:41:24 afpd[26902][cnid.c:86]: I:CNID: Setting uid/gid to 0/80

It happened after I upgraded the raidz pool to the latest version (zpool upgrade).

update :

I added a share inside this top level: /safe/data.
I can access this without problems.
I still don't like these lines:
Oct 22 21:56:38 afpd[27305][auth.c:230]: I:AFPDaemon: login myUser (uid 101, 
gid 10) AFP3.1
Oct 22 21:56:40 afpd[27305][cnid.c:86]: I:CNID: Setting uid/gid to 0/0

update :

I can read with that solution but not write.
It's also easy to lock up Snow Leopard now :-)


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Adam Cheal
Our config is:
OpenSolaris snv_118 x64
1 x LSISAS3801E controller
2 x 23-disk JBOD (fully populated, 1TB 7.2k SATA drives)
Each of the two external ports on the LSI connects to a 23-disk JBOD. ZFS-wise 
we use 1 zpool with 2 x 22-disk raidz2 vdevs (1 vdev per JBOD). Each zpool has 
one ZFS filesystem containing millions of files/directories. This data is 
served up via CIFS (kernel), which is why we went with snv_118 (first release 
post-2009.06 that had stable CIFS server). Like I mentioned to James, we know 
that the server won't be a star performance-wise especially because of the wide 
vdevs but it shouldn't hiccup under load either. A guaranteed way for us to 
cause these IO errors is to load up the zpool with about 30 TB of data (90% 
full) then scrub it. Within 30 minutes we start to see the errors, which 
usually evolves into failing disks (because of excessive retry errors) which 
just makes things worse.


[zfs-discuss] bewailing of the n00b

2009-10-23 Thread Robert
I am in the beginning stage of converting multiple two-drive NAS devices to a 
more proper single-device storage solution for my home network. 

Because I have a pretty good understanding of hardware-based storage solutions, 
originally I was going to go with a traditional server-class motherboard that 
allows for ECC memory, a depressingly expensive RAID card that supports 8-12 
drives and allows for RAID6, OCE (online capacity expansion), and spin-down of 
unused drives, all driven by FreeNAS.

A few months ago I happened upon ZFS and have been excitedly trying to learn 
all I can about it. There is much to admire about ZFS, so I would like to 
integrate it into my solution. The simple statement of requirements is: support 
for a total of 8-12 SATA2 hard drives, protection against data loss/corruption, 
and protection against drive failure. It must have a mostly GUI interface 
(regrettably, my command-line confidence is poor). I prefer to have a single 
large storage pool. The ability to start with few drives and add storage as 
needed is greatly desired.* 
[ *If it is more future-proof, I could dig up some lower-capacity drives and 
start out with (for example) two 1.5TB drives and six 250GB drives, with the 
plan to replace a 250GB drive with a 1.5TB drive as needed. ]

When I feel I have a secure foothold of understanding, I have little (if any) 
trouble executing a plan. But I seem to be hung up on multiple simple 
questions of the 'duh' variety, for which I have been unable to google 
my way to a solution. I normally take pride in finding my own solutions, so it 
is with some discomfort that I admit I find myself in need of an avuncular 
mentor to help me fit prior hardware RAID knowledge onto ZFS. If I had to 
guess, I'd say that a ten-minute verbal conversation combined with maybe a 
dozen afterthought-induced questions that could be answered by yes/no would get 
me to that secure-foothold place. 

A few examples of 'duh' questions:
   - How can I effect OCE with ZFS? The traditional 'back up all the data 
somewhere, add a drive, re-establish the file system/pools/whatever, then copy 
the data back' is not going to work, because there will be nowhere to 
temporarily 'put' the data.
   - Concordantly, is ZFS affected by a RAID card that supports OCE? Or is 
this of no advantage?
   - RAID5/6 with ZFS: As I understand it, ZFS with raidz will provide the 
data/drive redundancy I seek [home network, with maybe two simultaneous users 
on at least a p...@1ghz/1Gb RAM storage server], so obtaining a RAID controller 
card is unnecessary/unhelpful. Yes?


[zfs-discuss] Resilver speed

2009-10-23 Thread Arne Jansen
Hi,

I have a pool of 22 1T SATA disks in a RAIDZ3 configuration. It is filled with 
files of an average size of 2MB. I filled it randomly to resemble the expected 
workload in production use.
Problems arise when I try to scrub/resilver this pool. This operation takes the 
better part of a week (!). During this time the disk being resilvered is at 
100% utilisation with 300 writes/s, but only 3MB/s, which is only about 3% of 
its best case performance.
Having a window of one week with degraded redundancy is intolerable. It is 
quite likely that one loses more disks during this period, eventually leading 
to a total loss of the pool, not to mention the degraded performance during 
this period. In fact, in previous tests I lost a pool in a 6x11 RAIDZ2 
configuration.

I skimmed through the resilver code and found out that it just enumerates 
all objects in the pool and checks them one by one, with maxinflight 
I/O requests in parallel. Because this does not take the on-disk order of the 
data into account, it leads to this pathological performance. I also found bug 
6678033, stating that a prefetch might fix this.

Now my questions:
1) Are there tunings that could speed up resilver, possibly with a negative 
effect on normal performance? I thought of raising recordsize to the expected 
filesize of 2MB. Could this help?
2) What is the state of the fix? When will it be ready?
3) Do you have any configuration hints for setting up a pool layout which might 
help resilver performance? (aside from using hardware RAID instead of RAIDZ)

Thanks for any hints.
sensille


Re: [zfs-discuss] ZFS port to Linux

2009-10-23 Thread Bob Friesenhahn

On Fri, 23 Oct 2009, Anand Mitra wrote:


One of the biggest questions around this effort would be “licensing”.
As far as our understanding goes, CDDL doesn’t restrict us from
modifying ZFS code and releasing it. However, GPL and CDDL code cannot
be mixed, which implies that ZFS cannot be compiled into the Linux
kernel, which is GPL. But we believe the way around this issue is to
build ZFS as a module under the CDDL license; it can still be loaded
into the Linux kernel. It would then be restricted to using non-GPL
symbols, but as long as that rule is adhered to, there should be no
legal problem.


The legal issue surrounding GPLv2 is what constitutes the "Program" 
and a "work based on the Program".  In the case of Linux, the "Program" 
is usually the Linux kernel, and things like device drivers become a 
"work based on the Program".


Conjoining of source code is not really the issue.  The issue is what 
constitutes the "Program".


About 10 years ago I had a long discussion with RMS and the 
(presumably) injured party related to dynamically loading a module 
linked to GPLv2 code into our application.  RMS felt that loading that 
module caused the entire work to become a work based on the Program 
while I felt that the module was the work based on the Program but 
that the rest of our application was not since that module could be 
deleted without impact to the application.


Regardless, it has always seemed to me that (with sufficient care), a 
loadable module can be developed which has no linkages to other code, 
yet can still be successfully loaded and used.  In this case it seems 
that the module could be loaded into the Linux kernel without itself 
being distributed under GPL terms.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] Resilver speed

2009-10-23 Thread Bob Friesenhahn

On Fri, 23 Oct 2009, Arne Jansen wrote:
3) Do you have any configuration hints for setting up a pool layout 
which might help resilver performance? (aside from using hardware 
RAID instead of RAIDZ)


Using fewer drives per vdev should certainly speed up resilver 
performance.  It sounds like your pool has one vdev with 22 drives.  If you 
can afford to lose a bit of disk space, you will obtain more 
performance by splitting it into more vdevs.


Since your test files are relatively large, they are likely maximally 
striped across the disks, which results in more seeks/reads in order 
to reconstruct the data block, or to reconstruct a disk.
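
A purely illustrative layout (hypothetical device names): three 7-disk raidz2 
vdevs give up more space to parity than one 22-disk raidz3, but a resilver 
only has to read from the one vdev that contains the replaced disk:

  zpool create tank \
      raidz2 c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 c1t5d0 c1t6d0 \
      raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0 c2t4d0 c2t5d0 c2t6d0 \
      raidz2 c3t0d0 c3t1d0 c3t2d0 c3t3d0 c3t4d0 c3t5d0 c3t6d0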


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] bewailing of the n00b

2009-10-23 Thread Matthias Appel
I also consider myself a noob when it comes to ZFS, but I have already built
myself a ZFS filer, and maybe I can enlighten you by sharing my
"advanced noob who has read a lot about ZFS" thoughts about ZFS.


 A few examples of 'duh' questions:
- How can I effect OCE with ZFS? The traditional 'back up all the
 data somewhere, add a drive, re-establish the file
 system/pools/whatever, then copy the data back' is not going to work
 because there will be nowhere to temporarily 'put' the data.

RAIDZ expansion is a future target of ZFS development, but so far I have
not been able to determine when it will be available.
Because of this, I decided to assemble a pool with multiple mirrors.
I did not want to end up with a 100% full RAIDZ, no ability to extend the
pool with another drive, and having to wait until RAIDZ expansion is
production ready.

OK, you lose half the capacity of the vdevs, but IO performance is way
better than RAIDZ.

It is possible to extend an existing RAIDZ-based zpool with another RAIDZ vdev
(or a mirror if you want, but I think this is not ideal because the
reliability/performance of a zpool is only as good as the weakest item in
the chain), but you might have to buy at least 3 drives to extend the pool.

Adding another mirror to the pool extends performance as well as capacity,
which is not always true for RAIDZ (from a performance point of view).
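
For example (a sketch with hypothetical device names):

  # grow the pool by adding one more two-way mirror vdev;
  # ZFS then stripes new writes across the old and new mirrors
  zpool add tank mirror c4t0d0 c4t1d0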

For bulk storage, IO performance is not that important... it depends on your
workload (YMMV, as always ;-)



   - Concordantly, Is ZFS affected by a RAID card that supports OCE?
 Or is this to no advantage?

While RAID cards are supported, it is wiser to obtain a card which is able to
present the drives as a JBOD and let ZFS handle the redundancy, to benefit from
ZFS checksumming.

If ZFS can only see one device (which is redundant from the controller's
point of view), you might end up losing some data when ZFS encounters a
checksum error.

I have a two-way mirror setup consisting of 4 Seagate enterprise-type SATA
disks and do a zpool scrub (aka a checksum check) once in a while.

Even with those enterprise-grade hard disks I encounter some inconsistencies
on the drives.
I am aware that this should not happen (ideally there are no errors reported
on a zpool scrub), but I had no way to track down the source of the error
(all disks have had errors so far, so I suppose the error is caused by the
mainboard/cabling/controller). But all errors so far could be repaired,
so this is no big deal for me.

Errors can only be corrected if ZFS handles the redundancy, so it can fetch
the failed blocks from another vdev.



- RAID5/6 with ZFS: As I understand it, ZFS with raidz will provide
 the data/drive redundancy I seek [home network, with maybe two
 simultaneous users on at least a p...@1ghz/1Gb RAM storage server] so
 obtaining a RAID controller card is unnecessary/unhelpful. Yes?

Yes! ;-)

If only two users are accessing the ZFS pool simultaneously, RAIDZ would be
sufficient for your type of use, but I would go with dual redundancy
(RAIDZ2), because if one hard disk dies and you do a resilver (rebuild the
RAID) and another hard disk encounters checksum errors, you will lose data (ZFS
will tell you which files are in question, but I suppose you do not want to
lose any data at all).

Read http://www.enterprisestorageforum.com/article.php/3839636 for more info
about that.

I would advise you to choose a mainboard with at least 2 GB of ECC RAM,
because (IIRC) 1 GB of RAM is the lowest supported amount of RAM.




Re: [zfs-discuss] moving files from one fs to another, splittin/merging

2009-10-23 Thread Kyle McDonald

Mike Bo wrote:

Once data resides within a pool, there should be an efficient method of moving 
it from one ZFS file system to another. Think Link/Unlink vs. Copy/Remove.

Here's my scenario... When I originally created a 3TB pool, I didn't know the 
best way to carve up the space, so I used a single, flat ZFS file system. Now 
that I'm more familiar with ZFS, managing the sub-directories as separate file 
systems would have made a lot more sense (separate policies, snapshots, etc.). 
The problem is that some of these directories contain tens of thousands of 
files and many hundreds of gigabytes. Copying this much data between file 
systems within the same disk pool just seems wrong.

I hope such a feature is possible and not too difficult to implement, because 
I'd like to see this capability in ZFS.

  
Alternatively (and I don't know if this is feasible), it might be 
easier and/or better to be able to set those properties on, and 
independently snapshot, regular old subdirectories.


Just an idea

 -Kyle


Regards,
mikebo
  




Re: [zfs-discuss] ZFS port to Linux

2009-10-23 Thread Kyle McDonald

Bob Friesenhahn wrote:

On Fri, 23 Oct 2009, Anand Mitra wrote:


One of the biggest questions around this effort would be “licensing”.
As far as our understanding goes, CDDL doesn’t restrict us from
modifying ZFS code and releasing it. However, GPL and CDDL code cannot
be mixed, which implies that ZFS cannot be compiled into the Linux
kernel, which is GPL. But we believe the way around this issue is to
build ZFS as a module under the CDDL license; it can still be loaded
into the Linux kernel. It would then be restricted to using non-GPL
symbols, but as long as that rule is adhered to, there should be no
legal problem.


The legal issue surrounding GPLv2 is what constitutes the "Program" 
and a "work based on the Program".  In the case of Linux, the "Program" 
is usually the Linux kernel, and things like device drivers become a 
"work based on the Program".


Conjoining of source code is not really the issue.  The issue is what 
constitutes the "Program".


About 10 years ago I had a long discussion with RMS and the 
(presumably) injured party related to dynamically loading a module 
linked to GPLv2 code into our application.  RMS felt that loading that 
module caused the entire work to become a work based on the Program 
while I felt that the module was the work based on the Program but 
that the rest of our application was not since that module could be 
deleted without impact to the application.


Regardless, it has always seemed to me that (with sufficient care), a 
loadable module can be developed which has no linkages to other code, 
yet can still be successfully loaded and used.  In this case it seems 
that the module could be loaded into the Linux kernel without itself 
being distributed under GPL terms.


Disclaimer: I am not a lawyer, nor do I play one on TV. I could be very 
wrong about this.


Along these lines, it's always struck me that most of the restrictions 
of the GPL fall on the entity who distributes the 'work' in question.


I would think that distributing the source to a separate original work 
for a module leaves that responsibility up to whoever compiles it and 
loads it. This means the end users, as long as they never distribute 
what they create, are (mostly?) unaffected by the kernel's GPL, and if 
they do distribute it, the burden is on them.


Arguably that line might even be shifted from the act of compiling it, 
to the act of actually loading (linking) it into the Kernel, so that 
distributing a compiled module might even work the same way. I'm not so 
sure about this though. Presumably compiling it before distribution 
would require the use of include files from the kernel, and that seems a 
grey area to me. Maybe clean room include files could be created?


 -Kyle



Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, 
http://www.simplesystems.org/users/bfriesen/

GraphicsMagick Maintainer,http://www.GraphicsMagick.org/




Re: [zfs-discuss] bewailing of the n00b

2009-10-23 Thread David Dyer-Bennet

On Fri, October 23, 2009 09:57, Robert wrote:

 A few months ago I happened upon ZFS and have been excitedly trying to
 learn all I can about it. There is much to admire about ZFS, so I would
 like to integrate it into my solution. The simple statement of
 requirements is: support for total of 8-12 SATA2 hard drives, protection
 against data loss/corruption, and protection against drive failure.  Must
 have a mostly GUI interface (regrettably, my command-line confidence is
 poor). Prefer to have a single large storage pool. The ability to start
 with few drives and add storage as needed is greatly desired.*
 [ *if it is more future-proof, I could dig up some lower capacity drives
 and start out with (for example) two 1.5tb drives and six 250gb drives
 with the plan to replace a 250gb drive with a 1.5tb drive as needed]

Solaris is really pretty command-line oriented, and the people who use it
are, so even if a GUI tool exists (sometimes it does), people tend to tell
you how to do things from the command line.  This generally works better
for remote servers, and for people supporting dozens of servers.

 When I feel I have a secure foothold of understanding I have little (if
 any) trouble executing a plan. But I seem to be hung up on a multiple
 simple questions that are of the duh variety for which I have been
 unable to google myself to a solution. I normally take pride in finding my
 own solutions so it is with some discomfort that I admit that I find
 myself in need of a avuncular mentor to help me fit prior hardware raid
 knowledge with ZFS. If I had to guess, I'd say that a ten minute verbal
 conversation combined with maybe a dozen afterthought-induced questions
 that could be answered by yes/no would get me to that secure foothold
 place.

You don't happen to be around Minneapolis MN, do you?  But really,
chatting with me we'd both just end up with new and more interesting
questions, I'm afraid.

I've been running my home NAS on ZFS for something over a year now with
good success (no data loss, though I did have to restore from backup at
one point in an upgrade brangle).  Sounds smaller than what you need,
currently 800GB usable (2 vdevs each a 2-way mirror of 400GB disks).

 A few examples of 'duh' questions:
- How can I effect OCE with ZFS? The traditional 'back up all the data
 somewhere, add a drive, re-establish the file system/pools/whatever,
 then copy the data back' is not going to work because there will be
 nowhere to temporarily 'put' the data.

With ZFS currently, you can add a new VDEV into a pool, and the 
additional space is immediately available to everything (filesystems,
volumes, etc.) allocated out of that pool.

Also, mirrors and RAIDZ groups can expand to the size of the smallest
device in the group -- so if you have a mirror of 2 400GB drives, you can
replace one with something bigger, wait for resilver to complete (and be
vulnerable to single failure during that period), then replace the other,
resilver again (vulnerable to single failure during that period), and then
the VDEV will have the new bigger size and the pool it's in will have
access to that space.  Same thing for RAIDZ though with more drives hence
more steps.  For the mirror, by temporarily attaching a third drive you
can avoid the window of vulnerability to single-disk failure (attach third
large drive, wait for resilver, detach one small drive, attach another
large drive, wait for resilver, detach remaining small drive, for
example).
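
A rough sketch of that last sequence (hypothetical device names: 400GB disks
c1t0d0/c1t1d0 being replaced by larger disks c2t0d0/c2t1d0):

  zpool attach tank c1t0d0 c2t0d0   # add a third, larger disk to the mirror
  # wait until 'zpool status tank' reports the resilver as complete
  zpool detach tank c1t0d0          # drop one small disk
  zpool attach tank c1t1d0 c2t1d0   # add the second large disk
  # wait for the resilver again
  zpool detach tank c1t1d0          # drop the remaining small disk;
                                    # the mirror can now grow to the new size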

What you CANNOT do yet is add an additional disk to a RAIDZ.

   - Concordantly, Is ZFS affected by a RAID card that supports OCE? Or
 is this to no advantage?

For the kind of thing we're doing, my impression is that fancy controller
cards are a big-time losing proposition.  They make management much more
complex, prevent ZFS from doing its job, often lie about data security
issues.  And, as you say, cost money.

Which doesn't answer you question.  I don't know the answer to your
question; I've avoided working with such cards with ZFS.

- RAID5/6 with ZFS: As I understand it, ZFS with raidz will provide the
 data/drive redundancy I seek [home network, with maybe two simultaneous
 users on at least a p...@1ghz/1Gb RAM storage server] so obtaining a RAID
 controller card is unnecessary/unhelpful. Yes?

Yes.  ZFS does a good job with redundancy.  Mirroring gives you 100% or
higher (you can define 3-way, 4-way, 5-way, etc. mirrors to whatever
absurd degree you want!) redundancy; RAIDZ gives you one redundant disk in
the VDEV.  RAIDZ2 gives you two redundant disks in the VDEV.  You can pick
the degree of redundancy that meets your needs.

Performance will be a total non-issue with such a small user population. 
If they're already used to the performance of getting files over the
network, ZFS will be fine, and may even improve perceived performance at the
user workstation.

Furthermore, the ZFS scrub operation and the fact that ZFS stores block
checksums for data blocks (as well as 

Re: [zfs-discuss] ZFS port to Linux

2009-10-23 Thread Bob Friesenhahn

On Fri, 23 Oct 2009, Kyle McDonald wrote:


Along these lines, it's always struck me that most of the restrictions of the 
GPL fall on the entity who distributes the 'work' in question.


A careful reading of GPLv2 shows that restrictions only apply when 
distributing binaries.


I would think that distributing the source to a separate original work for a 
module leaves that responsibility up to whoever compiles it and loads it. 
This means the end users, as long as they never distribute what they create, 
are (mostly?) unaffected by the kernel's GPL, and if they do distribute it, 
the burden is on them.


If the end user builds from source then there are no GPL license 
issues whatsoever.  This is the nature of free software.


Arguably that line might even be shifted from the act of compiling it, to the 
act of actually loading (linking) it into the Kernel, so that distributing a 
compiled module might even work the same way. I'm not so sure about this 
though. Presumably compiling it before distribution would require the use of 
include files from the kernel, and that seems a grey area to me. Maybe clean 
room include files could be created?


This is where the lawyers get involved. :-)

There are a few vendors who have managed to distribute proprietary 
drivers as binaries for Linux.  Nvidia is one such vendor.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/


Re: [zfs-discuss] ZFS port to Linux

2009-10-23 Thread David Dyer-Bennet

On Fri, October 23, 2009 11:57, Kyle McDonald wrote:


 Along these lines, it's always struck me that most of the restrictions
 of the GPL fall on the entity who distributes the 'work' in question.

 I would think that distributing the source to a separate original work
 for a module leaves that responsibility up to whoever compiles it and
 loads it. This means the end users, as long as they never distribute
 what they create, are (mostly?) unaffected by the kernel's GPL, and if
 they do distribute it, the burden is on them.

The problem with this, I think, is that to be used by any significant
number of users, the module has to be included in a distribution, not just
distributed by itself.  (And the different distributions have their own
policies on what they will and won't consider including in terms of
licenses.)

I am also not a lawyer.  And I suspect that one important answer to many
of these questions is that the issues aren't totally clear and there isn't
precedent in case law to guide our understanding much yet.  Most of these
things haven't been litigated even once yet.
-- 
David Dyer-Bennet, d...@dd-b.net; http://dd-b.net/
Snapshots: http://dd-b.net/dd-b/SnapshotAlbum/data/
Photos: http://dd-b.net/photography/gallery/
Dragaera: http://dragaera.info



Re: [zfs-discuss] ZFS port to Linux

2009-10-23 Thread Kyle McDonald

Bob Friesenhahn wrote:

On Fri, 23 Oct 2009, Kyle McDonald wrote:


Along these lines, it's always struck me that most of the 
restrictions of the GPL fall on the entity who distributes the 'work' 
in question.


A careful reading of GPLv2 shows that restrictions only apply when 
distributing binaries.


I would think that distributing the source to a separate original work 
for a module leaves that responsibility up to whoever compiles it 
and loads it. This means the end users, as long as they never 
distribute what they create, are (mostly?) unaffected by the kernel's 
GPL, and if they do distribute it, the burden is on them.


If the end user builds from source then there are no GPL license 
issues whatsoever.  This is the nature of free software.


Arguably that line might even be shifted from the act of compiling 
it, to the act of actually loading (linking) it into the Kernel, so 
that distributing a compiled module might even work the same way. I'm 
not so sure about this though. Presumably compiling it before 
distribution would require the use of include files from the kernel, 
and that seems a grey area to me. Maybe clean room include files 
could be created?


This is where the lawyers get involved. :-)

There are a few vendors who have managed to distribute proprietary 
drivers as binaries for Linux.  Nvidia is one such vendor.

Exactly. I can think of several companies that do the same.

Packaging it in a single kernel RPM might be too close for comfort, but 
packaging it in its own optionally installed RPM and including that on 
the distribution DVD should also be safe - there is a clause that allows 
non-GPL works to be distributed on the same media.


-Kyle



Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, 
http://www.simplesystems.org/users/bfriesen/

GraphicsMagick Maintainer,http://www.GraphicsMagick.org/




[zfs-discuss] cryptic vdev name from fmdump

2009-10-23 Thread sean walmsley
This morning we got a fault management message from one of our production 
servers stating that a fault in one of our pools had been detected and fixed. 
Looking into the error using fmdump gives:

fmdump -v -u 90ea244e-1ea9-4bd6-d2be-e4e7a021f006
TIME UUID SUNW-MSG-ID
Oct 22 09:29:05.3448 90ea244e-1ea9-4bd6-d2be-e4e7a021f006 FMD-8000-4M Repaired
  100%  fault.fs.zfs.device

Problem in: zfs://pool=vol02/vdev=179e471c0732582
   Affects:   zfs://pool=vol02/vdev=179e471c0732582
   FRU: -
  Location: -

My question is: how do I relate the vdev name above (179e471c0732582) to an 
actual drive? I've checked these IDs against the device IDs (cXtYdZ - 
obviously no match) and against all of the disk serial numbers. I've also tried 
all of the zpool list and zpool status options with no luck.

I'm sure I'm missing something obvious here, but if anyone can point me in the 
right direction I'd appreciate it!
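
One possible avenue, a sketch only: if that vdev string is the vdev guid in
hexadecimal (an assumption), it could be compared against the 'guid' field
that zdb prints from each disk's label. zdb prints guids in decimal, so the
hex value from fmdump would need converting first. Hypothetical device path:

  zdb -l /dev/rdsk/c0t0d0s0 | grep guid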


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Jeremy f
What bug# is this under? I'm having what I believe is the same problem. Is
it possible to just take the mpt driver from a prior build in the
meantime?
The output below is from the load the zpool scrub creates. This is on a Dell
T7400 workstation with a 1068E OEMed LSI. I updated the firmware to the newest
available from Dell. The errors follow whichever of the 4 drives has the
highest load.

Streaming doesn't seem to trigger it, as I can push 60 MiB a second to a
mirrored rpool all day; it's only when there are a lot of metadata
operations.


Oct 23 06:25:44 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:25:44 systurbo5   Disconnected command timeout for Target 1
Oct 23 06:27:15 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:27:15 systurbo5   Disconnected command timeout for Target 1
Oct 23 06:28:26 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:28:26 systurbo5   Disconnected command timeout for Target 1
Oct 23 06:29:47 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:29:47 systurbo5   Disconnected command timeout for Target 1
Oct 23 06:30:58 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:30:58 systurbo5   Disconnected command timeout for Target 1
Oct 23 06:31:28 systurbo5 scsi: [ID 243001 kern.warning] WARNING: /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:28 systurbo5   mpt_handle_event_sync: IOCStatus=0x8000,
IOCLogInfo=0x31123000
Oct 23 06:31:28 systurbo5 scsi: [ID 243001 kern.warning] WARNING: /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:28 systurbo5   mpt_handle_event: IOCStatus=0x8000,
IOCLogInfo=0x31123000
Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:29 systurbo5   Log info 0x31123000 received for target 1.
Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:29 systurbo5   Log info 0x31123000 received for target 1.
Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:29 systurbo5   Log info 0x31123000 received for target 1.
Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc
Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0
,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
Oct 23 06:31:29 systurbo5   Log info 0x31123000 received for target 1.
Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b,
scsi_state=0xc


On Fri, Oct 23, 2009 at 7:13 AM, Adam Cheal ach...@pnimedia.com wrote:

 Our config is:
 OpenSolaris snv_118 x64
 1 x LSISAS3801E controller
 2 x 23-disk JBOD (fully populated, 1TB 7.2k SATA drives)
 Each of the two external ports on the LSI connects to a 23-disk JBOD.
 ZFS-wise we use 1 zpool with 2 x 22-disk raidz2 vdevs (1 vdev per JBOD).
 Each zpool has one ZFS filesystem containing millions of files/directories.
 This data is served up via CIFS (kernel), which is why we went with snv_118
 (first release post-2009.06 that had stable CIFS server). Like I mentioned
 to James, we know that the server won't be a star performance-wise
 especially because of the wide vdevs but it shouldn't hiccup under load
 either. A guaranteed way for us to cause these IO errors is to load up the
 zpool with about 30 TB of data (90% full) then scrub it. Within 30 minutes
 we start to see the errors, which usually evolves into failing disks
 (because of excessive retry errors) which just makes things worse.


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Jeremy f
Sorry, running snv_123, indiana

On Fri, Oct 23, 2009 at 11:16 AM, Jeremy f rysh...@gmail.com wrote:

 What bug# is this under? I'm having what I believe is the same problem. Is
 it possible to just take the mpt driver from a prior build in the
 meantime?
 The output below is from the load the zpool scrub creates. This is on a Dell
 T7400 workstation with a 1068E OEMed LSI. I updated the firmware to the newest
 available from Dell. The errors follow whichever of the 4 drives has the
 highest load.

 Streaming doesn't seem to trigger it, as I can push 60 MiB a second to a
 mirrored rpool all day; it's only when there are a lot of metadata
 operations.


 Oct 23 06:25:44 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:25:44 systurbo5   Disconnected command timeout for Target 1
 Oct 23 06:27:15 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:27:15 systurbo5   Disconnected command timeout for Target 1
 Oct 23 06:28:26 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:28:26 systurbo5   Disconnected command timeout for Target 1
 Oct 23 06:29:47 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:29:47 systurbo5   Disconnected command timeout for Target 1
 Oct 23 06:30:58 systurbo5 scsi: [ID 107833 kern.warning] WARNING: /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:30:58 systurbo5   Disconnected command timeout for Target 1
 Oct 23 06:31:28 systurbo5 scsi: [ID 243001 kern.warning] WARNING: /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:31:28 systurbo5   mpt_handle_event_sync: IOCStatus=0x8000,
 IOCLogInfo=0x31123000
 Oct 23 06:31:28 systurbo5 scsi: [ID 243001 kern.warning] WARNING: /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:31:28 systurbo5   mpt_handle_event: IOCStatus=0x8000,
 IOCLogInfo=0x31123000
 Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:31:29 systurbo5   Log info 0x31123000 received for target 1.
 Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b,
 scsi_state=0xc
 Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:31:29 systurbo5   Log info 0x31123000 received for target 1.
 Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b,
 scsi_state=0xc
 Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:31:29 systurbo5   Log info 0x31123000 received for target 1.
 Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b,
 scsi_state=0xc
 Oct 23 06:31:29 systurbo5 scsi: [ID 365881 kern.info] /p...@0
 ,0/pci8086,4...@9/pci8086,3...@0/pci8086,3...@0/pci1028,2...@0 (mpt0):
 Oct 23 06:31:29 systurbo5   Log info 0x31123000 received for target 1.
 Oct 23 06:31:29 systurbo5   scsi_status=0x0, ioc_status=0x804b,
 scsi_state=0xc


 On Fri, Oct 23, 2009 at 7:13 AM, Adam Cheal ach...@pnimedia.com wrote:

 Our config is:
 OpenSolaris snv_118 x64
 1 x LSISAS3801E controller
 2 x 23-disk JBOD (fully populated, 1TB 7.2k SATA drives)
 Each of the two external ports on the LSI connects to a 23-disk JBOD.
 ZFS-wise we use 1 zpool with 2 x 22-disk raidz2 vdevs (1 vdev per JBOD).
 Each zpool has one ZFS filesystem containing millions of files/directories.
 This data is served up via CIFS (kernel), which is why we went with snv_118
 (first release post-2009.06 that had stable CIFS server). Like I mentioned
 to James, we know that the server won't be a star performance-wise
 especially because of the wide vdevs but it shouldn't hiccup under load
 either. A guaranteed way for us to cause these IO errors is to load up the
 zpool with about 30 TB of data (90% full) then scrub it. Within 30 minutes
 we start to see the errors, which usually evolves into failing disks
 (because of excessive retry errors) which just makes things worse.
 --
 This message posted from opensolaris.org
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS port to Linux

2009-10-23 Thread Joerg Schilling
Kyle McDonald kmcdon...@egenera.com wrote:

 Arguably that line might even be shifted from the act of compiling it, 
 to the act of actually loading (linking) it into the Kernel, so that 
 distributing a compiled module might even work the same way. I'm not so 
 sure about this though. Presumably compiling it before distribution 
 would require the use of include files from the kernel, and that seems a 
 grey area to me. Maybe clean room include files could be created?

In Germany/Europe, we have something called Wissenschaftliches Kleinzitat,
in the USA, there is fair use. For this reason, I don't believe that 
using include files or calling kernel functions is a problem.

Also note that the FSF was asked by the Open Source Initiative whether 
the GPL follows the 10 rules from the OSS definition at:

http://www.opensource.org/docs/definition.php

The FSF did reply to the OSI that the GPL has to be interpreted in a way that 
makes it OSS compliant.

I would like to direct you in particular to section 9 of the OSS definition.
People who claim to see problems usually ignore the rules from the OSS 
definition or from copyright law.

Also, looking at the substantiations of the adjudications from the lawsuits 
driven by Harald Welte shows that the German judges have the same doubts
about the legality of many claims from the GPL as you see in the GPL review
from Lawrence Rosen in http://www.rosenlaw.com/Rosen_Ch06.pdf

I would be relaxed even if I did plan to ship ZFS binaries for Linux.
If in doubt, ask a specialized, completely independent lawyer.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS port to Linux

2009-10-23 Thread Joerg Schilling
Bob Friesenhahn bfrie...@simple.dallas.tx.us wrote:

 On Fri, 23 Oct 2009, Kyle McDonald wrote:
 
  Along these lines, it's always struck me that most of the restrictions of 
  the 
  GPL fall on the entity who distrbutes the 'work' in question.

 A careful reading of GPLv2 shows that restrictions only apply when 
 distributing binaries.

These restrictions in substance require that you make everything (except 
things that are usually distributed with the system) available to allow a 
recompilation (including linking) of the GPLd work. You have to do this only 
in the case that you ship binaries of the GPLd work. If you ship a ZFS binary, 
I see no reason why someone could argue that you are shipping binaries of 
GPLd code ;-)


 There are a few vendors who have managed to distribute proprietary 
 drivers as binaries for Linux.  Nvidia is one such vendor.

But ZFS is not a proprietary driver; it is OSS.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS port to Linux

2009-10-23 Thread Joerg Schilling
David Dyer-Bennet d...@dd-b.net wrote:

 The problem with this, I think, is that to be used by any significant
 number of users, the module has to be included in a distribution, not just
 distributed by itself.  (And the different distributions have their own
 policies on what they will and won't consider including in terms of
 licenses.)

For this argument, I recommend reading the Open Source Definition at:
http://www.opensource.org/docs/definition.php, in particular section 9.

The FSF confirms that the GPL is an OSS-compliant license, so there is no
difference between shipping ZFS separately and shipping it as part of a distro.

Jörg

-- 
 EMail:jo...@schily.isdn.cs.tu-berlin.de (home) Jörg Schilling D-13353 Berlin
   j...@cs.tu-berlin.de(uni)  
   joerg.schill...@fokus.fraunhofer.de (work) Blog: 
http://schily.blogspot.com/
 URL:  http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cryptic vdev name from fmdump

2009-10-23 Thread Cindy Swearingen

Hi Sean,

A better way probably exists, but I use fmdump -eV to identify the
pool and the device information (vdev_path) that is listed like this:

# fmdump -eV | more
.
.
.


pool = test
pool_guid = 0x6de45047d7bde91d
pool_context = 0
pool_failmode = wait
vdev_guid = 0x2ab2d3ba9fc1922b
vdev_type = disk
vdev_path = /dev/dsk/c0t6d0s0

Then you can match the vdev_path device to the device in your storage
pool.
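
For example, a quick cross-check might look like this (the pool and device
names are just the ones from the sample output above, so substitute your
own):

# fmdump -eV | grep vdev_path | sort | uniq -c
# zpool status test | grep c0t6d0

The first command summarizes which devices appear in the error log and how
often; the second confirms where that device sits in your pool.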

You can also review the date/time stamps in this output to see how long
the device has had a problem.

It's probably a good idea to run a zpool scrub on this pool too.

Cindy


On 10/23/09 12:04, sean walmsley wrote:

This morning we got a fault management message from one of our production 
servers stating that a fault in one of our pools had been detected and fixed. 
Looking into the error using fmdump gives:

fmdump -v -u 90ea244e-1ea9-4bd6-d2be-e4e7a021f006
TIME UUID SUNW-MSG-ID
Oct 22 09:29:05.3448 90ea244e-1ea9-4bd6-d2be-e4e7a021f006 FMD-8000-4M Repaired
  100%  fault.fs.zfs.device

Problem in: zfs://pool=vol02/vdev=179e471c0732582
   Affects:   zfs://pool=vol02/vdev=179e471c0732582
   FRU: -
  Location: -

My question is: how do I relate the vdev name above (179e471c0732582) with an actual drive? I've 
checked these id's against the device ids (cXtYdZ - obviously no match) and against all of the disk 
serial numbers. I've also tried all of the zpool list and zpool status 
options with no luck.

I'm sure I'm missing something obvious here, but if anyone can point me in the 
right direction I'd appreciate it!

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Adam Cheal
Just submitted the bug yesterday, on the advice of James, so I don't have a 
number you can refer to yet...the change request number is 6894775 if that 
helps or is directly related to the future bugid.

From what I've seen/read, this problem has been around for a while but only rears 
its ugly head under heavy IO with large filesets, probably related to large 
metadata sets as you spoke of. We are using snv_118 x64, but it seems to appear 
in snv_123 and snv_125 as well from what I read here.

We've tried installing SSD's to act as a read-cache for the pool to reduce the 
metadata hits on the physical disks and as a last-ditch effort we even tried 
switching to the latest LSI-supplied itmpt driver from 2007 (from reading 
http://enginesmith.wordpress.com/2009/08/28/ssd-faults-finally-resolved/) and 
disabling the mpt driver but we ended up with the same timeout issues. In our 
case, the drives in the JBODs are all WD (model WD1002FBYS-18A6B0) 1TB 7.2k 
SATA drives.

In revisiting our architecture, we compared it to Sun's x4540 Thumper offering 
which uses the same controller with similar (though apparently customized) 
firmware and 48 disks. The difference is that they use 6 x LSI1068e controllers 
which each have to deal with only 8 disks...obviously better on performance but 
this architecture could be hiding the real IO issue by distributing the IO 
across so many controllers.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Zpool issue question

2009-10-23 Thread Cindy Swearingen

Hi Karim,

All ZFS storage pools are going to use some amount of space for
metadata and in this example it looks like 3 GB. This is what
the difference between zpool list and zfs list is telling you.
No other way exists to calculate the space that is consumed by
metadata.

pool space (199 GB) minus avail file system space (196 GB) = 3 GB

The db-smp.zpool/db-smp.zfs is consuming 180 GB and the 
db-smp.zpool/oraarch.zfs is consuming 15.8 GB, which adds up to 196 GB.


This pool is full so you need to reduce the space that the file systems
are consuming or add more disks to the pool.
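
For example (the device and snapshot names below are only placeholders, so
adjust them to your configuration), you could either grow the pool:

# zpool add db-smp.zpool c2t3d0

or find and remove data you no longer need, such as old snapshots:

# zfs list -r -t snapshot db-smp.zpool
# zfs destroy db-smp.zpool/db-smp.zfs@old-snapshot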

Cindy

On 10/23/09 06:55, Karim Akhssassi wrote:

Hi,

I'm Karim from Solaris software support; I need your help with 
this issue:


Why is the ZFS file system full while its zpool has 3.11 GB available?

zfs list -t filesystem | egrep "db-smp|NAME"
NAME                       USED  AVAIL  REFER  MOUNTPOINT
db-smp.zpool               196G      0     1K  legacy
db-smp.zpool/db-smp.zfs    180G      0  70.8G  /opt/quetzal.zone/data/quetzal.zone/root/opt/db-smp
db-smp.zpool/oraarch.zfs  15.8G      0  15.8G  /opt/quetzal.zone/data/quetzal.zone/root/opt/db-smp/oracle/SMP/oraarch/

zpool list | egrep "db-smp|NAME"
NAME          SIZE  USED  AVAIL  CAP  HEALTH  ALTROOT
db-smp.zpool  199G  196G  3.11G  98%  ONLINE  /opt/quetzal.zone/data/quetzal.zone/root


I found http://www.opensolaris.org/os/community/zfs/faq/#zfsspace

---
1) Why doesn't the space that is reported by the zpool list command and 
the zfs list command match?
The available space that is reported by the zpool list command is the 
amount of physical disk space. The zfs list command lists the usable 
space that is available to file systems, which is disk space minus ZFS 
redundancy metadata overhead, if any.

---

If this doc explains the above issue, how can I know (calculate) the 
ZFS redundancy metadata? How can I know if it exists?

Otherwise, how can we fix the above problem?

I appreciate your help

regards

A.Karim



___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] new google group for ZFS on OSX

2009-10-23 Thread Richard Elling

FYI,
The ZFS project on MacOS forge (zfs.macosforge.org) has provided the
following announcement:

	ZFS Project Shutdown  2009-10-23
	The ZFS project has been discontinued. The mailing list and
	repository will also be removed shortly.

The community is migrating to a new google group:
http://groups.google.com/group/zfs-macos

 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] new google group for ZFS on OSX

2009-10-23 Thread Tim Cook
On Fri, Oct 23, 2009 at 2:38 PM, Richard Elling richard.ell...@gmail.comwrote:

 FYI,
 The ZFS project on MacOS forge (zfs.macosforge.org) has provided the
 following announcement:

ZFS Project Shutdown2009-10-23
The ZFS project has been discontinued. The mailing list and
 repository will
also be removed shortly.

 The community is migrating to a new google group:
http://groups.google.com/group/zfs-macos

  -- richard



Any official word from Apple on the abandonment?

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] new google group for ZFS on OSX

2009-10-23 Thread Richard Elling

On Oct 23, 2009, at 12:42 PM, Tim Cook wrote:
On Fri, Oct 23, 2009 at 2:38 PM, Richard Elling richard.ell...@gmail.com 
 wrote:

FYI,
The ZFS project on MacOS forge (zfs.macosforge.org) has provided the
following announcement:

   ZFS Project Shutdown2009-10-23
   The ZFS project has been discontinued. The mailing list and  
repository will

   also be removed shortly.

The community is migrating to a new google group:
   http://groups.google.com/group/zfs-macos

 -- richard


Any official word from Apple on the abandonment?


I can't hear anything over the din of crickets...
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] bewailing of the n00b

2009-10-23 Thread Travis Tabbal
 - How can I effect OCE with ZFS? The traditional
 'back up all the data somewhere, add a drive,
 re-establish the file system/pools/whatever, then
 copy the data back' is not going to work because
 there will be nowhere to temporarily 'put' the
  data.

Add devices to the pool. Preferably in mirrors or raidz configurations. If you 
just add bare devices to the pool you are running RAID-0, no redundancy. You 
cannot add devices to a raidz, as mentioned. But you can add more raidz or 
mirror devices. You can also replace devices with larger ones. It would be nice 
to be able to add more devices to a raidz for home users like us, maybe we'll 
see it someday. For now, the capabilities we do have make it reasonable to deal 
with. 
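
As a rough sketch of what that looks like (pool and device names here are
made up, so substitute your own):

# zpool add tank mirror c2t0d0 c2t1d0                 (add another mirror pair)
# zpool add tank raidz c3t0d0 c3t1d0 c3t2d0 c3t3d0    (or add another raidz vdev)
# zpool replace tank c1t0d0 c1t5d0                    (swap a disk for a bigger one)

Once every disk in a vdev has been swapped for a larger one, the extra
capacity shows up; depending on the build you may need an export/import or
the autoexpand property to see it.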

 - Concordantly, Is ZFS affected by a RAID card
 that supports OCE? Or is this to no advantage?


Don't bother. Spend the money on more RAM, and drives. :) Do get a nice 
controller though. Supermicro makes a few nice units. I'm using 2 AOC-USAS-L8i 
cards. They work great, though you do have to mod the mounting bracket to get 
them to work in a standard case. These are based on LSI cards, I just found 
them cheaper than the same LSI branded card. Avoid the cheap $20 4-port jobs. 
I've had a couple of them die already. Thankfully, I didn't lose any data... I 
think... no ZFS on that box. 


 - RAID5/6 with ZFS: As I understand it, ZFS with
 raidz will provide the data/drive redundancy I seek
 [home network, with maybe two simultaneous users on
 at least a p...@1ghz/1Gb RAM storage server] so
 obtaining a RAID controller card is
  unnecessary/unhelpful. Yes?


Correct. Though I would increase the RAM personally, it's so cheap these days. 
My home fileserver has 8GB of ECC RAM. I'm also running Xen VMs though, so some 
of my RAM is used for running those. 

You can even do triple-redundant raidz with ZFS now, so you could lose 3 
drives without any data loss. For those that want really high availability, or 
really big arrays I suppose. I'm running 4x1.5TB in a raidz1, no problems. I do 
plan to keep a spare around though. I'll just use it to store backups to start 
with. If a drive goes bad, I'll drop it in and do a zpool replace. 

Don't worry about the command line. The ZFS based commands are pretty short and 
simple. Read up on zpool and zfs. Those are the commands you use the most for 
managing ZFS. There's also the ZFS best practices guide if you haven't seen it. 
Useful advice in there.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool with very different sized vdevs?

2009-10-23 Thread Travis Tabbal
Hmm.. I expected people to jump on me yelling that it's a bad idea. :) 

How about this, can I remove a vdev from a pool if the pool still has enough 
space to hold the data? So could I add it in and mess with it for a while 
without losing anything? I would expect the system to resilver the data onto 
the remaining vdevs, or tell me to go jump off a pier. :)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] zpool with very different sized vdevs?

2009-10-23 Thread Tim Cook
On Fri, Oct 23, 2009 at 3:05 PM, Travis Tabbal tra...@tabbal.net wrote:

 Hmm.. I expected people to jump on me yelling that it's a bad idea. :)

 How about this, can I remove a vdev from a pool if the pool still has
 enough space to hold the data? So could I add it in and mess with it for a
 while without losing anything? I would expect the system to resliver the
 data onto the remaining vdevs, or tell me to go jump off a pier. :)
 --



Jump off a pier.  Removing devices is not currently supported but it is in
the works.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread James C. McPherson

Adam Cheal wrote:

Just submitted the bug yesterday, under advice of James, so I don't have a number you can 
refer to you...the change request number is 6894775 if that helps or is 
directly related to the future bugid.


From what I seen/read this problem has been around for awhile but only rears 
its ugly head under heavy IO with large filesets, probably related to large 
metadata sets as you spoke of. We are using snv_118 x64 but it seems to appear 
in snv_123 and snv_125 as well from what I read here.


We've tried installing SSD's to act as a read-cache for the pool to reduce the metadata 
hits on the physical disks and as a last-ditch effort we even tried switching to the 
latest LSI-supplied itmpt driver from 2007 (from reading 
http://enginesmith.wordpress.com/2009/08/28/ssd-faults-finally-resolved/) and disabling 
the mpt driver but we ended up with the same timeout issues. In our case, the drives in 
the JBODs are all WD (model WD1002FBYS-18A6B0) 1TB 7.2k SATA drives.

In revisting our architecture, we compared it to Sun's x4540 Thumper offering which uses 
the same controller with similar (though apparently customized) firmware and 48 disks. 
The difference is that they use 6 x LSI1068e controllers which each have to deal with 
only 8 disks...obviously better on performance but this architecture could be 
hiding the real IO issue by distributing the IO across so many controllers.


Hi Adam,
I was watching the incoming queues all day yesterday for the
bug, but missed seeing it, not sure why.

I've now moved the bug to the appropriate category so it will
get attention from the right people.


Thanks,
James C. McPherson
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Bruno Sousa
Could the reason Sun's x4540 Thumper has 6 LSI controllers be some sort of 
hidden problem found by Sun where the HBA resets, and due to time-to-market 
pressure the quick and dirty solution was to spread the load over multiple 
HBAs instead of a software fix?


Just my 2 cents..


Bruno


Adam Cheal wrote:

Just submitted the bug yesterday, under advice of James, so I don't have a number you can 
refer to you...the change request number is 6894775 if that helps or is 
directly related to the future bugid.

From what I seen/read this problem has been around for awhile but only rears 
its ugly head under heavy IO with large filesets, probably related to large 
metadata sets as you spoke of. We are using snv_118 x64 but it seems to appear in 
snv_123 and snv_125 as well from what I read here.

We've tried installing SSD's to act as a read-cache for the pool to reduce the metadata 
hits on the physical disks and as a last-ditch effort we even tried switching to the 
latest LSI-supplied itmpt driver from 2007 (from reading 
http://enginesmith.wordpress.com/2009/08/28/ssd-faults-finally-resolved/) and disabling 
the mpt driver but we ended up with the same timeout issues. In our case, the drives in 
the JBODs are all WD (model WD1002FBYS-18A6B0) 1TB 7.2k SATA drives.

In revisting our architecture, we compared it to Sun's x4540 Thumper offering which uses 
the same controller with similar (though apparently customized) firmware and 48 disks. 
The difference is that they use 6 x LSI1068e controllers which each have to deal with 
only 8 disks...obviously better on performance but this architecture could be 
hiding the real IO issue by distributing the IO across so many controllers.
  




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cryptic vdev name from fmdump

2009-10-23 Thread sean walmsley
Thanks for this information.

We have a weekly scrub schedule, but I ran another just to be sure :-) It 
completed with 0 errors.

Running fmdump -eV gives:
TIME   CLASS
fmdump: /var/fm/fmd/errlog is empty

Dumping the faultlog (no -e) does give some output, but again there are no 
human readable identifiers:

... (some stuff omitted)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = fault.fs.zfs.device
certainty = 0x64
asru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x4fcdc2c9d60a5810
vdev = 0x179e471c0732582
(end asru)

resource = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x4fcdc2c9d60a5810
vdev = 0x179e471c0732582
(end resource)

(end fault-list[0])

So, I'm still stumped.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Bruno Sousa

Hi Cindy,

Thank you for the update, but it seems I can't see any information 
specific to that bug.
I can only see bug numbers 6702538 and 6615564, but according to their 
history, they were fixed quite some time ago.

Can you by any chance present the information about bug 6694909 ?

Thank you,
Bruno


Cindy Swearingen wrote:

Hi Bruno,

I see some bugs associated with these messages (6694909) that point to
an LSI firmware upgrade that cause these harmless errors to display.

According to the 6694909 comments, this issue is documented in the
release notes.

As they are harmless, I wouldn't worry about them.

Maybe someone from the driver group can comment further.

Cindy


On 10/22/09 05:40, Bruno Sousa wrote:

Hi all,

Recently I upgraded from snv_118 to snv_125, and suddenly I started 
to see these messages in /var/adm/messages:


Oct 22 12:54:37 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:54:37 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:47 SAN02  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:47 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:50 SAN02  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:50 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a



Is this a symptom of a disk error, or was some change made in the 
driver so that I now get more information that didn't appear in 
the past?


Thanks,
Bruno

I'm using an LSI Logic SAS1068E B3, and within lsiutil I see this 
behaviour:



1 MPT Port found

Port Name Chip Vendor/Type/RevMPT Rev  Firmware Rev  IOC
1.  mpt0  LSI Logic SAS1068E B3 105  011a 0

Select a device:  [1-1 or 0 to quit] 1

1.  Identify firmware, BIOS, and/or FCode
2.  Download firmware (update the FLASH)
4.  Download/erase BIOS and/or FCode (update the FLASH)
8.  Scan for devices
10.  Change IOC settings (interrupt coalescing)
13.  Change SAS IO Unit settings
16.  Display attached devices
20.  Diagnostics
21.  RAID actions
22.  Reset bus
23.  Reset target
42.  Display operating system names for devices
45.  Concatenate SAS firmware and NVDATA files
59.  Dump PCI config space
60.  Show non-default settings
61.  Restore default settings
66.  Show SAS discovery errors
69.  Show board manufacturing information
97.  Reset SAS link, HARD RESET
98.  Reset SAS link
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 20

1.  Inquiry Test
2.  WriteBuffer/ReadBuffer/Compare Test
3.  Read Test
4.  Write/Read/Compare Test
8.  Read Capacity / Read Block Limits Test
12.  Display phy counters
13.  Clear phy counters
14.  SATA SMART Read Test
15.  SEP (SCSI Enclosure Processor) Test
18.  Report LUNs Test
19.  Drive firmware download
20.  Expander firmware download
21.  Read Logical Blocks
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Diagnostics menu, select an option:  [1-99 or e/p/w or 0 to quit] 12

Adapter Phy 0:  Link Down, No Errors

Adapter Phy 1:  Link Down, No Errors

Adapter Phy 2:  Link Down, No Errors

Adapter Phy 3:  Link Down, No Errors

Adapter Phy 4:  Link Up, No Errors

Adapter Phy 5:  Link Up, No Errors

Adapter Phy 6:  Link Up, No Errors

Adapter Phy 7:  Link Up, No Errors

Expander (Handle 0009) Phy 0:  Link Up
 Invalid DWord Count  79,967,229
 Running Disparity Error Count63,036,893
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 1:  Link Up
 Invalid DWord Count  79,967,207
 Running Disparity Error Count78,339,626
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 2:  Link Up
 Invalid DWord Count  76,717,646
 Running Disparity Error Count73,334,563
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 3:  Link Up
 Invalid DWord Count  79,896,409
 Running Disparity Error Count   

Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Richard Elling

On Oct 23, 2009, at 1:48 PM, Bruno Sousa wrote:
Could Sun'x x4540 Thumper reason to have 6 LSI's some sort of  
hidden problems found by Sun where the HBA resets, and due to  
market time pressure the quick and dirty solution was to spread  
the load over multiple HBA's instead of software fix?


I don't think so. X4540 has 48 disks -- 6 controllers at 8 disks/ 
controller.
This is the same configuration as the X4500, which used a Marvell  
controller.

This decision leverages parts from the previous design.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cannot import 'rpool': one or more devices is currently unavailable

2009-10-23 Thread Victor Latushkin

Tommy McNeely wrote:
I have a system whose rpool has gone defunct. The rpool is made of a 
single disk, which is a RAID-5EE volume made of all 8 146G disks on the box. 
The RAID card is an Adaptec brand card. It was running nv_107, but it's 
currently net booted to nv_121. I have already checked in the RAID card 
BIOS, and it says the volume is optimal. We had a power outage in 
BRM07 on Tuesday, and the system appeared to boot back up, but then went 
wonky. I power cycled it, and it came back to a grub prompt because it 
couldn't read the filesystem.


We've been able to recover the pool by rolling back a few uberblocks - only 
a few log files had errors. The uberblock rollback project would make 
recovery in a case like this quite easy.


Victor


# uname -a
SunOS  5.11 snv_121 i86pc i386 i86pc

# zpool import
 pool: rpool
   id: 7197437773913332097
state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
   the '-f' flag.
  see: http://www.sun.com/msg/ZFS-8000-EY
config:

   rpool   ONLINE
 c0t0d0s0  ONLINE
# zpool import -f 7197437773913332097
cannot import 'rpool': one or more devices is currently unavailable
#

# zpool import -a -f -R /a
cannot import 'rpool': one or more devices is currently unavailable
# zdb -l /dev/dsk/c0t0d0s0

LABEL 0

   version=14
   name='rpool'
   state=0
   txg=742622
   pool_guid=7197437773913332097
   hostid=4930069
   hostname=''
   top_guid=5620634672424557591
   guid=5620634672424557591
   vdev_tree
   type='disk'
   id=0
   guid=5620634672424557591
   path='/dev/dsk/c0t0d0s0'
   devid='id1,s...@tsun_stk_raid_intefd1dfe0/a'
   phys_path='/p...@0,0/pci8086,3...@4/pci108e,2...@0/d...@0,0:a'
   whole_disk=0
   metaslab_array=24
   metaslab_shift=33
   ashift=9
   asize=880083730432
   is_log=0

LABEL 1

   version=14
   name='rpool'
   state=0
   txg=742622
   pool_guid=7197437773913332097
   hostid=4930069
   hostname=''
   top_guid=5620634672424557591
   guid=5620634672424557591
   vdev_tree
   type='disk'
   id=0
   guid=5620634672424557591
   path='/dev/dsk/c0t0d0s0'
   devid='id1,s...@tsun_stk_raid_intefd1dfe0/a'
   phys_path='/p...@0,0/pci8086,3...@4/pci108e,2...@0/d...@0,0:a'
   whole_disk=0
   metaslab_array=24
   metaslab_shift=33
   ashift=9
   asize=880083730432
   is_log=0

LABEL 2

   version=14
   name='rpool'
   state=0
   txg=742622
   pool_guid=7197437773913332097
   hostid=4930069
   hostname=''
   top_guid=5620634672424557591
   guid=5620634672424557591
   vdev_tree
   type='disk'
   id=0
   guid=5620634672424557591
   path='/dev/dsk/c0t0d0s0'
   devid='id1,s...@tsun_stk_raid_intefd1dfe0/a'
   phys_path='/p...@0,0/pci8086,3...@4/pci108e,2...@0/d...@0,0:a'
   whole_disk=0
   metaslab_array=24
   metaslab_shift=33
   ashift=9
   asize=880083730432
   is_log=0

LABEL 3

   version=14
   name='rpool'
   state=0
   txg=742622
   pool_guid=7197437773913332097
   hostid=4930069
   hostname=''
   top_guid=5620634672424557591
   guid=5620634672424557591
   vdev_tree
   type='disk'
   id=0
   guid=5620634672424557591
   path='/dev/dsk/c0t0d0s0'
   devid='id1,s...@tsun_stk_raid_intefd1dfe0/a'
   phys_path='/p...@0,0/pci8086,3...@4/pci108e,2...@0/d...@0,0:a'
   whole_disk=0
   metaslab_array=24
   metaslab_shift=33
   ashift=9
   asize=880083730432
   is_log=0
# zdb -cu -e -d /dev/dsk/c0t0d0s0
zdb: can't open /dev/dsk/c0t0d0s0: No such file or directory
# zdb -e rpool -cu
zdb: can't open rpool: No such device or address
# zdb -e 7197437773913332097
zdb: can't open 7197437773913332097: No such device or address
#

I obviously have no clue how to wield zdb.

Any help you can offer would be appreciated.

Thanks,
Tommy

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] heads up on SXCE build 125 (LU + mirrored root pools)

2009-10-23 Thread Karl Rossing

Is there a CR yet for this?

Thanks
Karl

Cindy Swearingen wrote:


Hi everyone,

Currently, the device naming changes in build 125 mean that you cannot
use Solaris Live Upgrade to upgrade or patch a ZFS root dataset in a
mirrored root pool.

If you are considering this release for the ZFS log device removal
feature, then also consider that you will not be able to patch or
upgrade the ZFS root dataset in a mirrored root pool in this release
with Solaris Live Upgrade. Unmirrored root pools are not impacted.

OpenSolaris releases are not impacted by the build 125 device naming
changes.

I don't have a CR yet that covers this problem, but we will keep you
posted.

Thanks,

Cindy


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cryptic vdev name from fmdump

2009-10-23 Thread Cindy Swearingen

I'm stumped too. Someone with more FM* experience needs to comment.

Cindy

On 10/23/09 14:52, sean walmsley wrote:

Thanks for this information.

We have a weekly scrub schedule, but I ran another just to be sure :-) It 
completed with 0 errors.

Running fmdump -eV gives:
TIME   CLASS
fmdump: /var/fm/fmd/errlog is empty

Dumping the faultlog (no -e) does give some output, but again there are no human 
readable identifiers:

... (some stuff omitted)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = fault.fs.zfs.device
certainty = 0x64
asru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x4fcdc2c9d60a5810
vdev = 0x179e471c0732582
(end asru)

resource = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x4fcdc2c9d60a5810
vdev = 0x179e471c0732582
(end resource)

(end fault-list[0])

So, I'm still stumped.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cryptic vdev name from fmdump

2009-10-23 Thread Eric Schrock

On 10/23/09 15:05, Cindy Swearingen wrote:

I'm stumped too. Someone with more FM* experience needs to comment.


Looks like your errlog may have been rotated out of existence - see if 
there is a .X or .gz version in /var/fm/fmd/errlog*.  The list.suspect 
fault should be including a location field that would contain the human 
readable name for the vdev, but this work (extending the libtopo scheme 
to support enumeration and label properties) hasn't yet been done. 
There is also a small change that needs to be made to fmd to support 
location for non-FRUs.  You should to able to do echo ::spa -c | mdb 
-k and look for that vdev id, assuming the vdev is still active on the 
system.


- Eric



Cindy

On 10/23/09 14:52, sean walmsley wrote:

Thanks for this information.

We have a weekly scrub schedule, but I ran another just to be sure :-) 
It completed with 0 errors.


Running fmdump -eV gives:
TIME   CLASS
fmdump: /var/fm/fmd/errlog is empty

Dumping the faultlog (no -e) does give some output, but again there 
are no human readable identifiers:


... (some stuff omitted)
(start fault-list[0])
nvlist version: 0
version = 0x0
class = fault.fs.zfs.device
certainty = 0x64
asru = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x4fcdc2c9d60a5810
vdev = 0x179e471c0732582
(end asru)

resource = (embedded nvlist)
nvlist version: 0
version = 0x0
scheme = zfs
pool = 0x4fcdc2c9d60a5810
vdev = 0x179e471c0732582
(end resource)

(end fault-list[0])

So, I'm still stumped.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
Eric Schrock, Fishworks            http://blogs.sun.com/eschrock
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] heads up on SXCE build 125 (LU + mirrored root pools)

2009-10-23 Thread Chris Du
Sorry, do you mean luupgrade from previous versions or from 125 to future 
versions?

I luupgrade from 124 to 125 with mirrored root pool and everything is working 
fine.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] PSARC 2009/571: ZFS deduplication properties

2009-10-23 Thread Craig S. Bell
I haven't seen any mention of it in this forum yet, so FWIW you might be 
interested in the details of ZFS deduplication mentioned in this recently-filed 
case.

Case log:  http://arc.opensolaris.org/caselog/PSARC/2009/571/
Discussion:  http://www.opensolaris.org/jive/thread.jspa?threadID=115507

Very nice -- I like the interaction with copies, and (like a few others) I 
think the default threshold could be much lower.  Like *many* others,  I look 
forward to trying it out.  =-)

Also see PSARC 2009/557 for a separate dedup capability within zfs send.

Case log:  http://arc.opensolaris.org/caselog/PSARC/2009/557/
Discussion:  http://www.opensolaris.org/jive/thread.jspa?threadID=115082

-cheers, CSB
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Apple shuts down open source ZFS project

2009-10-23 Thread Craig S. Bell
Sad to hear that Apple is apparently going in another direction.

http://www.macrumors.com/2009/10/23/apple-shuts-down-open-source-zfs-project/

-cheers, CSB
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cryptic vdev name from fmdump

2009-10-23 Thread Richard Elling

On Oct 23, 2009, at 3:19 PM, Eric Schrock wrote:


On 10/23/09 15:05, Cindy Swearingen wrote:

I'm stumped too. Someone with more FM* experience needs to comment.


Looks like your errlog may have been rotated out of existence - see  
if there is a .X or .gz version in /var/fm/fmd/errlog*.  The  
list.suspect fault should be including a location field that would  
contain the human readable name for the vdev, but this work  
(extending the libtopo scheme to support enumeration and label  
properties) hasn't yet been done. There is also a small change that  
needs to be made to fmd to support location for non-FRUs.  You  
should to able to do echo ::spa -c | mdb -k and look for that vdev  
id, assuming the vdev is still active on the system.


These are the guids, correct?  If so, then zdb -C will show them.
Conversion of hex to decimal or vice versa is an exercise for the reader.
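(A quick sketch, since mdb is already in play in this thread; the guid
below is just the one from Sean's output, so substitute your own:

# echo '0x4fcdc2c9d60a5810=E' | mdb

should print the unsigned 64-bit decimal form, which you can then look for
in the zdb -C output.)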
 -richard



- Eric


Cindy
On 10/23/09 14:52, sean walmsley wrote:

Thanks for this information.

We have a weekly scrub schedule, but I ran another just to be  
sure :-) It completed with 0 errors.


Running fmdump -eV gives:
TIME   CLASS
fmdump: /var/fm/fmd/errlog is empty

Dumping the faultlog (no -e) does give some output, but again  
there are no human readable identifiers:


... (some stuff omitted)
   (start fault-list[0])
   nvlist version: 0
   version = 0x0
   class = fault.fs.zfs.device
   certainty = 0x64
   asru = (embedded nvlist)
   nvlist version: 0
   version = 0x0
   scheme = zfs
   pool = 0x4fcdc2c9d60a5810
   vdev = 0x179e471c0732582
   (end asru)

   resource = (embedded nvlist)
   nvlist version: 0
   version = 0x0
   scheme = zfs
   pool = 0x4fcdc2c9d60a5810
   vdev = 0x179e471c0732582
   (end resource)

   (end fault-list[0])

So, I'm still stumped.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



--
Eric Schrock, Fishworks            http://blogs.sun.com/eschrock
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] heads up on SXCE build 125 (LU + mirrored root pools)

2009-10-23 Thread Cindy Swearingen

Probably if you try to use any LU operation after you have upgraded to
build 125.

cs

On 10/23/09 16:18, Chris Du wrote:

Sorry, do you mean luupgrade from previous versions or from 125 to future 
versions?

I luupgrade from 124 to 125 with mirrored root pool and everything is working 
fine.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Tim Cook
On Fri, Oct 23, 2009 at 3:48 PM, Bruno Sousa bso...@epinfante.com wrote:

 Could Sun'x x4540 Thumper reason to have 6 LSI's some sort of hidden
 problems found by Sun where the HBA resets, and due to market time pressure
 the quick and dirty solution was to spread the load over multiple HBA's
 instead of software fix?

 Just my 2 cents..


 Bruno


What else were you expecting them to do?  According to LSI's website, the
1068e in an x8 configuration is an 8-port card.
http://www.lsi.com/DistributionSystem/AssetDocument/files/docs/marketing_docs/storage_stand_prod/SCG_LSISAS1068E_PB_040407.pdf

While they could've used expanders, that just creates one more component
that can fail/have issues.  Looking at the diagram, they've taken the
absolute shortest I/O path possible, which is what I would hope to
see/expect.
http://www.sun.com/servers/x64/x4540/server_architecture.pdf

One drive per channel, 6 channels total.

I also wouldn't be surprised to find out that they found this the optimal
configuration from a performance/throughput/IOPS perspective as well.  Can't
seem to find those numbers published by LSI.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] PSARC 2009/571: ZFS deduplication properties

2009-10-23 Thread BJ Quinn
Anyone know if this means that this will actually show up in SNV soon, or 
whether it will make 2010.02?  (on disk dedup specifically)
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] heads up on SXCE build 125 (LU + mirrored root pools)

2009-10-23 Thread Kurt Schreiner
Hi,

On Mon, Oct 19, 2009 at 05:03:18PM -0600, Cindy Swearingen wrote:
 Currently, the device naming changes in build 125 mean that you cannot
 use Solaris Live Upgrade to upgrade or patch a ZFS root dataset in a
 mirrored root pool.
 [...]
Just ran into this yesterday... The change to get things going again
is ~ trivial:

-1014: diff -u /usr/lib/lu/lulib{.ori,}
--- /usr/lib/lu/lulib.ori   Thu Oct 22 22:42:19 2009
+++ /usr/lib/lu/lulib   Sat Oct 24 01:21:41 2009
@@ -236,6 +236,7 @@
start=`echo $blob | /usr/bin/grep -n $lgzd_pool | head -2 | tail +2 | cut -d: -f1`
start=`expr $start + 1`
echo $blob | tail +$start | awk '{print $1}' | while read dev; do
+   dev=`echo $dev | sed 's/mirror.*/mirror/'`
if [ -z $dev ]; then
   continue;
elif [ $dev = errors: ]; then

With this little hack luactivate, lucreate and ludelete (that's what I just
tested) are working again...

YMMV

Kurt
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Adam Cheal
I don't think there was any intention on Sun's part to ignore the 
problem...obviously their target market wants a performance-oriented box and 
the x4540 delivers that. Each 1068E controller chip supports 8 SAS PHY channels 
= 1 channel per drive = no contention for channels. The x4540 is a monster and 
performs like a dream with snv_118 (we have a few ourselves).

My issue is that implementing an archival-type solution demands a dense, simple 
storage platform that performs at a reasonable level, nothing more. Our design 
has the same controller chip (8 SAS PHY channels) driving 46 disks, so there is 
bound to be contention there especially in high-load situations. I just need it 
to work and handle load gracefully, not timeout and cause disk failures; at 
this point I can't even scrub the zpools to verify the data we have on there is 
valid. From a hardware perspective, the 3801E card is spec'ed to handle our 
architecture; the OS just seems to fall over somewhere though and not be able 
to throttle itself in certain intensive IO situations.

That said, I don't know whether to point the finger at LSI's firmware or 
mpt-driver/ZFS. Sun obviously has a good relationship with LSI as their 1068E 
is the recommended SAS controller chip and is used in their own products. At 
least we've got a bug filed now, and we can hopefully follow this through to 
find out where the system breaks down.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Tim Cook
On Fri, Oct 23, 2009 at 6:32 PM, Adam Cheal ach...@pnimedia.com wrote:

 I don't think there was any intention on Sun's part to ignore the
 problem...obviously their target market wants a performance-oriented box and
 the x4540 delivers that. Each 1068E controller chip supports 8 SAS PHY
 channels = 1 channel per drive = no contention for channels. The x4540 is a
 monster and performs like a dream with snv_118 (we have a few ourselves).

 My issue is that implementing an archival-type solution demands a dense,
 simple storage platform that performs at a reasonable level, nothing more.
 Our design has the same controller chip (8 SAS PHY channels) driving 46
 disks, so there is bound to be contention there especially in high-load
 situations. I just need it to work and handle load gracefully, not timeout
 and cause disk failures; at this point I can't even scrub the zpools to
 verify the data we have on there is valid. From a hardware perspective, the
 3801E card is spec'ed to handle our architecture; the OS just seems to fall
 over somewhere though and not be able to throttle itself in certain
 intensive IO situations.

 That said, I don't know whether to point the finger at LSI's firmware or
 mpt-driver/ZFS. Sun obviously has a good relationship with LSI as their
 1068E is the recommended SAS controller chip and is used in their own
 products. At least we've got a bug filed now, and we can hopefully follow
 this through to find out where the system breaks down.


Have you checked in with LSI to verify the IOPS ability of the chip?  Just
because it supports having 46 drives attached to one ASIC doesn't mean it
can actually service all 46 at once.  You're talking (VERY conservatively)
2800 IOPS.

Even ignoring that, I know for a fact that the chip can't handle raw
throughput numbers on 46 disks unless you've got some very severe raid
overhead.  That chip is good for roughly 2GB/sec each direction.  46 7200RPM
drives can fairly easily push 4x that amount in streaming IO loads.

Long story short, it appears you've got a 5 lb bag and a 50 lb load...

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Checksums

2009-10-23 Thread Tim Cook
So, from what I gather, even though the documentation appears to state
otherwise, default checksums have been changed to SHA256.  Making that
assumption, I have two questions.

First, is the default updated from fletcher2 to SHA256 automatically for a
pool that was created with an older version of zfs and then upgraded to the
latest?  Second, would all of the blocks be re-checksummed with a zfs
send/receive on the receiving side?
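
(For reference, the property I'm asking about is the one visible via
something like:

# zfs get -r checksum tank

with "tank" being a placeholder pool name; as far as I can tell that shows
the per-dataset setting, not what checksum existing blocks were actually
written with.)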

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cryptic vdev name from fmdump

2009-10-23 Thread sean walmsley
Eric and Richard - thanks for your responses.

I tried both:

  echo ::spa -c | mdb -k
  zdb -C (not much of a man page for this one!)

and was able to match the POOL id from the log (hex 4fcdc2c9d60a5810) with both 
outputs. As Richard pointed out, I needed to convert the hex value to decimal 
to get a match with the zdb output.

In neither case, however, was I able to get a match with the disk vdev id from 
the fmdump output.

It turns out that a disk in this machine was replaced about a month ago, and 
sure enough the vdev that was complaining at the time was the 0x179e471c0732582 
vdev that is now missing.
What's confusing is that the fmd message I posted about is dated Oct 22 whereas 
the original error and replacement happened back in September. An fmadm 
faulty on the machine currently doesn't return any issues.

After physically replacing the bad drive and issuing the zpool replace 
command, I think that we probably issued the fmadm repair uuid command in 
line with what Sun has asked us to do in the past. In our experience, if you 
don't do this then fmd will re-issue duplicate complaints regarding hardware 
failures after every reboot until you do. In this case, perhaps a repair 
wasn't really the appropriate command since we actually replaced the drive. 
Would a fmadm flush have been better? Perhaps a clean reboot is in order?

So, it looks like the root problem here is that fmd is confused rather than 
there being a real issue with ZFS. Despite this, we're happy to know that we 
can now match vdevs against physical devices using either the mdb trick or zdb.

We've followed Eric's work on ZFS device enumeration for the Fishworks project 
with great interest - hopefully this will eventually get extended to the fmdump 
output as suggested.

Sean Walmsley
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] cryptic vdev name from fmdump

2009-10-23 Thread Eric Schrock

On 10/23/09 16:56, sean walmsley wrote:

Eric and Richard - thanks for your responses.

I tried both:

  echo ::spa -c | mcb
  zdb -C (not much of a man page for this one!)

and was able to match the POOL id from the log (hex 4fcdc2c9d60a5810) with both 
outputs. As Richard pointed out, I needed to convert the hex value to decimal 
to get a match with the zdb output.

In neither case, however, was I able to get a match with the disk vdev id from 
the fmdump output.

It turns out that a disk in this machine was replaced about a month ago, and 
sure enough the vdev that was complaining at the time was the 0x179e471c0732582 
vdev that is now missing.
What's confusing is that the fmd message I posted about is dated Oct 22 whereas the 
original error and replacement happened back in September. An fmadm faulty on 
the machine currently doesn't return any issues.


That message indicates that a previous problem was repaired, not a new 
diagnosis.



After physically replacing the bad drive and issuing the zpool replace command, I think that we probably issued 
the fmadm repair uuid command in line with what Sun has asked us to do in the past. In our experience, if 
you don't do this then fmd will re-issue duplicate complaints regarding hardware failures after every reboot until you do. In 
this case, perhaps a repair wasn't really the appropriate command since we actually replaced the drive. Would a 
fmadm flush have been better? Perhaps a clean reboot is in order?

So, it looks like the root problem here is that fmd is confused rather than 
there being a real issue with ZFS. Despite this, we're happy to know that we 
can now match vdevs against physical devices using either the mdb trick or zdb.


This is fixed in build 127 via:

6889827 ZFS retire agent needs to do a better job of staying in sync


We've followed Eric's work on ZFS device enumeration for the Fishwork project 
with great interest - hopefully this will eventually get extended to the fmdump 
output as suggested.


Yep, we're working on it ;-)

- Eric

--
Eric Schrock, Fishworks            http://blogs.sun.com/eschrock
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Change physical path to a zpool.

2009-10-23 Thread Jon Aimone

Hi,

I have a functional OpenSolaris x64 system on which I need to physically
move the boot disk, meaning its physical device path will change and
probably its cXdX name.

When I do this the system fails to boot. The error messages indicate
that it's still trying to read from the original path.

I know the system can boot from the disk in its new location, because
it's Solaris complaining -- long after the boot loader is done.

How do I inform ZFS of the new path?

If this disk were simply a data pool, I could just import it and be
done, but this is the boot disk. The system never comes far enough up to
run zpool...

Do I need to boot from the LiveCD and then import the pool from its new
path?
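
(What I'm picturing, and I have no idea yet whether it is right, is
something along these lines from the LiveCD, assuming my root pool is
named rpool:

# zpool import -f -R /a rpool

on the theory that importing it once from the new location rewrites the
device path stored in the pool labels, after which a normal reboot should
work. Corrections welcome.)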

Please reply directly to me. I'm not on this alias yet.

--
~~~\0/
Cheers,
Jon.
{-%]

If you always do what you've always done, you'll always get what you've always 
gotten.

- Anon.

When someone asks you, Penny for your thoughts, and you put your two cents 
in, what happens to the other penny?

- G. Carlin (May 12, 1937 - June 22, 2008)


attachment: Jon_Aimone.vcf
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Change physical path to a zpool.

2009-10-23 Thread Jon Aimone

Hi,

Check that... I'm on the alias now...

Jon Aimone spake thusly, on or about 10/23/09 17:15:

Hi,

I have a functional OpenSolaris x64 system on which I need to physically
move the boot disk, meaning its physical device path will change and
probably its cXdX name.

When I do this the system fails to boot. The error messages indicate
that it's still trying to read from the original path.

I know the system can boot from the disk in its new location, because
it's Solaris complaining -- long after the boot loader is done.

How do I inform ZFS of the new path?

If this disk were simply a data pool, I could just import it and be
done, but this is the boot disk. The system never comes far enough up to
run zpool...

Do I need to boot from the LiveCD and then import the pool from its new
path?

Please reply directly to me. I'm not on this alias yet.

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss
  


--
~~~\0/
Cheers,
Jon.
{-%]

If you always do what you've always done, you'll always get what you've always 
gotten.
- Anon.

When someone asks you, Penny for your thoughts, and you put your two cents 
in, what happens to the other penny?
- G. Carlin (May 12, 1937 - June 22, 2008)

attachment: Jon_Aimone.vcf
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Adam Cheal
LSI's sales literature on that card specs 128 devices which I take with a few 
hearty grains of salt. I agree that with all 46 drives pumping out streamed 
data, the controller would be overworked BUT the drives will only deliver data 
as fast as the OS tells them to. Just because the speedometer says 200 mph max 
doesn't mean we should (or even can!) go that fast.

The IO intensive operations that trigger our timeout issues are a small 
percentage of the actual normal IO we do to the box. Most of the time the 
solution happily serves up archived data, but when it comes time to scrub or do 
mass operations on the entire dataset bad things happen. It seems a waste to 
architect a more expensive performance-oriented solution when you aren't going 
to use that performance the majority of the time. There is a balance between 
performance and functionality, but I still feel that we should be able to make 
this situation work.

Ideally, the OS could dynamically adapt to slower storage and throttle its IO 
requests accordingly. At the least, it could allow the user to specify some IO 
thresholds so we can cage the beast if need be. We've tried some manual 
tuning via kernel parameters to restrict the max queued operations per vdev, and 
also a scrub-related one (the specifics escape me), but it still manages to 
overload itself.
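
A sketch of the kind of /etc/system tuning being described here; zfs_vdev_max_pending is the per-vdev queue-depth cap discussed later in this thread, while zfs_scrub_limit is only my guess at the scrub-related tunable, so verify both against your build before relying on them:

  * cap the number of I/Os ZFS keeps queued per leaf vdev (default 35)
  set zfs:zfs_vdev_max_pending = 10
  * cap concurrent scrub I/Os per leaf vdev (name is an assumption)
  set zfs:zfs_scrub_limit = 10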
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Richard Elling

On Oct 23, 2009, at 4:46 PM, Tim Cook wrote:
On Fri, Oct 23, 2009 at 6:32 PM, Adam Cheal ach...@pnimedia.com  
wrote:
I don't think there was any intention on Sun's part to ignore the  
problem...obviously their target market wants a performance-oriented  
box and the x4540 delivers that. Each 1068E controller chip supports  
8 SAS PHY channels = 1 channel per drive = no contention for  
channels. The x4540 is a monster and performs like a dream with  
snv_118 (we have a few ourselves).


My issue is that implementing an archival-type solution demands a  
dense, simple storage platform that performs at a reasonable level,  
nothing more. Our design has the same controller chip (8 SAS PHY  
channels) driving 46 disks, so there is bound to be contention there  
especially in high-load situations. I just need it to work and  
handle load gracefully, not timeout and cause disk failures; at  
this point I can't even scrub the zpools to verify the data we have  
on there is valid. From a hardware perspective, the 3801E card is  
spec'ed to handle our architecture; the OS just seems to fall over  
somewhere though and not be able to throttle itself in certain  
intensive IO situations.


That said, I don't know whether to point the finger at LSI's  
firmware or mpt-driver/ZFS. Sun obviously has a good relationship  
with LSI as their 1068E is the recommended SAS controller chip and  
is used in their own products. At least we've got a bug filed now,  
and we can hopefully follow this through to find out where the  
system breaks down.



Have you checked in with LSI to verify the IOPS ability of the  
chip?  Just because it supports having 46 drives attached to one  
ASIC doesn't mean it can actually service all 46 at once.  You're  
talking (VERY conservatively) 2800 IOPS.


Tim has a valid point. By default, ZFS will queue 35 commands per disk.
For 46 disks that is 1,610 concurrent I/Os.  Historically, it has proven to be
relatively easy to crater performance or cause problems with very, very,
very expensive arrays that are easily overrun by Solaris. As a result, it is
not uncommon to see references to setting throttles, especially in older docs.

Fortunately, this is simple to test by reducing the number of I/Os ZFS
will queue.  See the Evil Tuning Guide
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29

The mpt source is not open, so the mpt driver's reaction to 1,610 concurrent
I/Os can only be guessed from afar -- public LSI docs mention a number of 511
concurrent I/Os for SAS1068, but it is not clear to me that is an explicit
limit.  If you have success with zfs_vdev_max_pending set to 10, then the
mystery might be solved. Use iostat to observe the wait and actv columns,
which show the number of transactions in the queues.  JCMP?

NB: sometimes a driver will have the limit be configurable. For example, to get
high performance out of a high-end array attached to a qlc card, I've set
the execution-throttle in /kernel/drv/qlc.conf to be more than two orders of
magnitude greater than its default of 32. /kernel/drv/mpt*.conf does not seem
to have a similar throttle.
 -- richard
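
A minimal sketch of the test being suggested, following the Evil Tuning Guide form (the value 10 is only an example; try it in a test window first):

  # drop the per-vdev queue depth from 35 to 10 on the running kernel
  echo zfs_vdev_max_pending/W0t10 | mdb -kw
  # watch the wait and actv queue columns while the scrub runs
  iostat -xnz 5

If the mpt timeouts stop with the smaller queue, that points at the controller's command limit rather than the disks.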

Even ignoring that, I know for a fact that the chip can't handle raw  
throughput numbers on 46 disks unless you've got some very severe  
raid overhead.  That chip is good for roughly 2GB/sec each  
direction.  46 7200RPM drives can fairly easily push 4x that amount  
in streaming IO loads.


Long story short, it appears you've got a 5 lb bag and a 50 lb load...

--Tim


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] heads up on SXCE build 125 (LU + mirrored root pools)

2009-10-23 Thread Chris Du
Which luupgrade do you use?
I uninstall the LU packages in the current build first, then install the new LU packages from 
the version I'm upgrading to.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Tim Cook
On Fri, Oct 23, 2009 at 7:17 PM, Adam Cheal ach...@pnimedia.com wrote:

 LSI's sales literature on that card specs 128 devices which I take with a
 few hearty grains of salt. I agree that with all 46 drives pumping out
 streamed data, the controller would be overworked BUT the drives will only
 deliver data as fast as the OS tells them to. Just because the speedometer
 says 200 mph max doesn't mean we should (or even can!) go that fast.

 The IO intensive operations that trigger our timeout issues are a small
 percentage of the actual normal IO we do to the box. Most of the time the
 solution happily serves up archived data, but when it comes time to scrub or
 do mass operations on the entire dataset bad things happen. It seems a waste
 to architect a more expensive performance-oriented solution when you aren't
 going to use that performance the majority of the time. There is a balance
 between performance and functionality, but I still feel that we should be
 able to make this situation work.

 Ideally, the OS could dynamically adapt to slower storage and throttle its
 IO requests accordingly. At the least, it could allow the user to specify
 some IO thresholds so we can cage the beast if need be. We've tried some
 manual tuning via kernel parameters to restrict max queued operations per
 vdev and also a scrub related one (specifics escape me), but it still
 manages to overload itself.
 --


Where are you planning on queueing up those requests?  The scrub, I can
understand wanting throttling, but what about your user workload?  Unless
you're talking about EXTREMELY  short bursts of I/O, what do you suggest the
OS do?  If you're sending 3000 IOPS at the box from a workstation, where is
that workload going to sit if you're only dumping 500 IOPS to disk?  The
only thing that will change is that your client will time out instead of your
disks.

I don't recall seeing what generates the I/O, but I do recall that it's
backup.  My assumption would be it's something coming in over the network,
in which case I'd say you're far, far better off throttling at the network
stack.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Tim Cook
On Fri, Oct 23, 2009 at 7:17 PM, Richard Elling richard.ell...@gmail.comwrote:


 Tim has a valid point. By default, ZFS will queue 35 commands per disk.
 For 46 disks that is 1,610 concurrent I/Os.  Historically, it has proven to
 be
 relatively easy to crater performance or cause problems with very, very,
 very expensive arrays that are easily overrun by Solaris. As a result, it
 is
 not uncommon to see references to setting throttles, especially in older
 docs.

 Fortunately, this is  simple to test by reducing the number of I/Os ZFS
 will queue.  See the Evil Tuning Guide

 http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29

 The mpt source is not open, so the mpt driver's reaction to 1,610
 concurrent
 I/Os can only be guessed from afar -- public LSI docs mention a number of
 511
 concurrent I/Os for SAS1068, but it is not clear to me that is an explicit
 limit.  If
 you have success with zfs_vdev_max_pending set to 10, then the mystery
 might be solved. Use iostat to observe the wait and actv columns, which
 show the number of transactions in the queues.  JCMP?

 NB sometimes a driver will have the limit be configurable. For example, to
 get
 high performance out of a high-end array attached to a qlc card, I've set
 the execution-throttle in /kernel/drv/qlc.conf to be more than two orders
 of
 magnitude greater than its default of 32. /kernel/drv/mpt*.conf does not
 seem
 to have a similar throttle.
  -- richard



I believe there's a caveat here though.  That really only helps if the total
I/O load is actually something the controller can handle.  If the sustained
I/O workload is still 1600 concurrent I/Os, lowering the batch won't
actually make any difference to the timeouts, will it?  It would obviously
eliminate burstiness (yes, I made that word up), but if the total sustained
I/O load is greater than the ASIC can handle, it's still going to fall over
and die with a queue of 10, correct?

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Checksums

2009-10-23 Thread Adam Leventhal
On Fri, Oct 23, 2009 at 06:55:41PM -0500, Tim Cook wrote:
 So, from what I gather, even though the documentation appears to state
 otherwise, default checksums have been changed to SHA256.  Making that
 assumption, I have two questions.

That's false. The default checksum has changed from fletcher2 to fletcher4;
that is to say, the definition of the value of 'on' has changed.

 First, is the default updated from fletcher2 to SHA256 automatically for a
 pool that was created with an older version of zfs and then upgraded to the
 latest?  Second, would all of the blocks be re-checksummed with a zfs
 send/receive on the receiving side?

As with all property changes, new writes get the new properties. Old data
is not rewritten.
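
A small sketch of what that means in practice (the dataset name is just an example):

  # see what 'on' currently resolves to, and set an explicit value if desired
  zfs get checksum tank/data
  zfs set checksum=sha256 tank/data

Only blocks written after the change carry the new checksum; on the receiving side of a zfs send | zfs receive the blocks are newly written, so they are checksummed according to whatever checksum property is in effect on the destination dataset.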

Adam

-- 
Adam Leventhal, Fishworks http://blogs.sun.com/ahl
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Checksums

2009-10-23 Thread Tim Cook
On Fri, Oct 23, 2009 at 7:19 PM, Adam Leventhal a...@eng.sun.com wrote:

 On Fri, Oct 23, 2009 at 06:55:41PM -0500, Tim Cook wrote:
  So, from what I gather, even though the documentation appears to state
  otherwise, default checksums have been changed to SHA256.  Making that
  assumption, I have two questions.

 That's false. The default checksum has changed from fletcher2 to fletcher4
 that is to say, the definition of the value of 'on' has changed.

  First, is the default updated from fletcher2 to SHA256 automatically for
 a
  pool that was created with an older version of zfs and then upgraded to
 the
  latest?  Second, would all of the blocks be re-checksummed with a zfs
  send/receive on the receiving side?

 As with all property changes, new writes get the new properties. Old data
 is not rewritten.

 Adam



Adam,

Thank you for the correction.  My next question is, do you happen to know
what the overhead difference between fletcher4 and SHA256 is?  Is the
checksumming multi-threaded in nature?  I know my fileserver has a lot of
spare cpu cycles, but it would be good to know if I'm going to take a
substantial hit in throughput moving from one to the other.

--Tim
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Sun Flash Accelerator F20

2009-10-23 Thread Eric D. Mudama

On Tue, Oct 20 at 21:54, Bob Friesenhahn wrote:

On Tue, 20 Oct 2009, Richard Elling wrote:


Intel:  X-25E read latency 75 microseconds


... but they don't say where it was measured or how big it was...


Probably measured using a logic analyzer and measuring the time from 
the last bit of the request going in, to the first bit of the 
response coming out.  It is not clear if this latency is a minimum, 
maximum, median, or average.  It is not clear if this latency is 
while the device is under some level of load, or if it is in a 
quiescent state.


This is one of the skimpiest specification sheets that I have ever 
seen for an enterprise product.


It may be skimpy compared to what you're used to, but it seems to
answer most of your questions, looking at the public X25-E data sheet
on intel.com:

The latency numbers clearly indicate typical, which I equate to
average (perhaps incorrectly), and regarding the system load, they're
measured doing 4KB reads or writes with a queue depth of 1 which is
traditionally considered very light loading.  Correct, it doesn't
explicitly state whether the data transfer phase via 3Gbit/s SATA is
included or not.  At 300MB/s the bus transfer is relatively small,
even compared to the quoted numbers.

The read and write performance IOPS numbers clearly indicate a SATA
queue depth of 32, with write caching enabled, and that every LBA on
the device has been written to (device 100% full) prior to
measurement, which answers some (granted not all) of the questions
about device preparations before measurement.  The full-pack IO cases
should be worst case, since the device has the minimum available spare
area for managing wear leveling, garbage collection, and other
features at that point.

The inverse of the IOPS figure under load gives you a number that you
can multiply by queue depth to get typical individual command latency
under load (assuming commands are completed in order, or mostly in
order). The IOPS figure also gives you the typical bandwidth under
load, which is about 13MB/s in full-pack random 4KB writes and about
140MB/s in full-pack random 4KB reads.
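
As a rough worked example (the IOPS figures are my recollection of the public X25-E data sheet, roughly 35,000 random 4KB reads and 3,300 random 4KB writes at queue depth 32, so treat them as approximate):

  reads:  32 / 35,000 IOPS ~= 0.9 ms per command under load
          35,000 x 4KB     ~= 140 MB/s
  writes: 32 / 3,300 IOPS  ~= 9.7 ms per command under load
          3,300 x 4KB      ~= 13 MB/s

which lines up with the 140MB/s and 13MB/s figures above.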

--eric

--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Adam Cheal
And therein lies the issue. The excessive load that causes the IO issues is 
almost always generated locally from a scrub or a local recursive ls used to 
warm up the SSD-based zpool cache with metadata. The regular network IO to the 
box is minimal and is very read-centric; once we load the box up with archived 
data (which generally happens in a short amount of time), we simply serve it 
out as needed.

As far as queueing goes, I would expect the system to queue bursts of IO in 
memory with appropriate timeouts, as required. These timeouts could either be 
manually or auto-magically adjusted to deal with the slower storage hardware. 
Obviously sustained intense IO requests would eventually blow up the queue so 
the goal here is to avoid creating those situations in the first place. We can 
throttle the network IO, if needed; I need the OS to know its own local IO 
boundaries though and not attempt to overwork itself during scrubs etc.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Richard Elling

On Oct 23, 2009, at 5:32 PM, Tim Cook wrote:
On Fri, Oct 23, 2009 at 7:17 PM, Richard Elling richard.ell...@gmail.com wrote:


Tim has a valid point. By default, ZFS will queue 35 commands per disk.
For 46 disks that is 1,610 concurrent I/Os.  Historically, it has proven to be
relatively easy to crater performance or cause problems with very, very,
very expensive arrays that are easily overrun by Solaris. As a result, it is
not uncommon to see references to setting throttles, especially in older docs.

Fortunately, this is simple to test by reducing the number of I/Os ZFS
will queue.  See the Evil Tuning Guide
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide#Device_I.2FO_Queue_Size_.28I.2FO_Concurrency.29

The mpt source is not open, so the mpt driver's reaction to 1,610 concurrent
I/Os can only be guessed from afar -- public LSI docs mention a number of 511
concurrent I/Os for SAS1068, but it is not clear to me that is an explicit
limit.  If you have success with zfs_vdev_max_pending set to 10, then the
mystery might be solved. Use iostat to observe the wait and actv columns,
which show the number of transactions in the queues.  JCMP?

NB: sometimes a driver will have the limit be configurable. For example, to get
high performance out of a high-end array attached to a qlc card, I've set
the execution-throttle in /kernel/drv/qlc.conf to be more than two orders of
magnitude greater than its default of 32. /kernel/drv/mpt*.conf does not seem
to have a similar throttle.
 -- richard



I believe there's a caveat here though.  That really only helps if  
the total I/O load is actually enough for the controller to handle.   
If the sustained I/O workload is still 1600 concurrent I/O's,  
lowering the batch won't actually cause any difference in the  
timeouts, will it?  It would obviously eliminate burstiness (yes, I  
made that word up), but if the total sustained I/O load is greater  
than the ASIC can handle, it's still going to fall over and die with  
a queue of 10, correct?


Yes, but since they are disks, and I'm assuming HDDs here, there is no
chance the disks will be faster than the host's ability to send I/Os ;-)
iostat will show what the queues look like.
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS port to Linux

2009-10-23 Thread Anurag Agarwal
Hi Joerg,

Thanks for this clarification. We understand that we can distribute the ZFS
binary under a non-GPL license, as long as it does not use GPL symbols.

Our plan regarding ZFS is to first port it to the Linux kernel and then make its
binary distributions available for various Linux distributions.
These binary distributions will be in the form of loadable kernel modules and
commands.

Once we are ready with the ZFS port, we will start sharing our plans for its
binary distributions.
Feel free to contact us if anyone is interested in the ZFS port on a specific
Linux distribution.

Regards,
Anurag.

On Sat, Oct 24, 2009 at 12:20 AM, Joerg Schilling 
joerg.schill...@fokus.fraunhofer.de wrote:

 David Dyer-Bennet d...@dd-b.net wrote:

  The problem with this, I think, is that to be used by any significant
  number of users, the module has to be included in a distribution, not
 just
  distributed by itself.  (And the different distributions have their own
  policies on what they will and won't consider including in terms of
  licenses.)

 For this argument, I recommend reading the Open Source Definition at:
 http://www.opensource.org/docs/definition.php; in particular, look at section 9.

 The FSF grants you that the GPL is an OSS compliant license, so there is no
 difference between shipping ZFS separately and shipping it as part of a
 distro.

 Jörg

 --
  
 EMail: jo...@schily.isdn.cs.tu-berlin.de (home)
        Jörg Schilling D-13353 Berlin
        j...@cs.tu-berlin.de (uni)
        joerg.schill...@fokus.fraunhofer.de (work)
 Blog:  http://schily.blogspot.com/
 URL:   http://cdrecord.berlios.de/private/ ftp://ftp.berlios.de/pub/schily
 ___
 zfs-discuss mailing list
 zfs-discuss@opensolaris.org
 http://mail.opensolaris.org/mailman/listinfo/zfs-discuss




-- 
Anurag Agarwal
CEO, Founder
KQ Infotech, Pune
www.kqinfotech.com
9881254401
Coordinator Akshar Bharati
www.aksharbharati.org
Spreading joy through reading
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Adam Cheal
Here is example of the pool config we use:

# zpool status
  pool: pool002
 state: ONLINE
 scrub: scrub stopped after 0h1m with 0 errors on Fri Oct 23 23:07:52 2009
config:

NAME STATE READ WRITE CKSUM
pool002  ONLINE   0 0 0
  raidz2 ONLINE   0 0 0
c9t18d0  ONLINE   0 0 0
c9t17d0  ONLINE   0 0 0
c9t55d0  ONLINE   0 0 0
c9t13d0  ONLINE   0 0 0
c9t15d0  ONLINE   0 0 0
c9t16d0  ONLINE   0 0 0
c9t11d0  ONLINE   0 0 0
c9t12d0  ONLINE   0 0 0
c9t14d0  ONLINE   0 0 0
c9t9d0   ONLINE   0 0 0
c9t8d0   ONLINE   0 0 0
c9t10d0  ONLINE   0 0 0
c9t29d0  ONLINE   0 0 0
c9t28d0  ONLINE   0 0 0
c9t27d0  ONLINE   0 0 0
c9t23d0  ONLINE   0 0 0
c9t25d0  ONLINE   0 0 0
c9t26d0  ONLINE   0 0 0
c9t21d0  ONLINE   0 0 0
c9t22d0  ONLINE   0 0 0
c9t24d0  ONLINE   0 0 0
c9t19d0  ONLINE   0 0 0
  raidz2 ONLINE   0 0 0
c9t30d0  ONLINE   0 0 0
c9t31d0  ONLINE   0 0 0
c9t32d0  ONLINE   0 0 0
c9t33d0  ONLINE   0 0 0
c9t34d0  ONLINE   0 0 0
c9t35d0  ONLINE   0 0 0
c9t36d0  ONLINE   0 0 0
c9t37d0  ONLINE   0 0 0
c9t38d0  ONLINE   0 0 0
c9t39d0  ONLINE   0 0 0
c9t40d0  ONLINE   0 0 0
c9t41d0  ONLINE   0 0 0
c9t42d0  ONLINE   0 0 0
c9t44d0  ONLINE   0 0 0
c9t45d0  ONLINE   0 0 0
c9t46d0  ONLINE   0 0 0
c9t47d0  ONLINE   0 0 0
c9t48d0  ONLINE   0 0 0
c9t49d0  ONLINE   0 0 0
c9t50d0  ONLINE   0 0 0
c9t51d0  ONLINE   0 0 0
c9t52d0  ONLINE   0 0 0
cache
  c8t2d0 ONLINE   0 0 0
  c8t3d0 ONLINE   0 0 0
spares
  c9t20d0    AVAIL
  c9t43d0    AVAIL

errors: No known data errors

  pool: rpool
 state: ONLINE
 scrub: none requested
config:

NAME  STATE READ WRITE CKSUM
rpool ONLINE   0 0 0
  mirror  ONLINE   0 0 0
c8t0d0s0  ONLINE   0 0 0
c8t1d0s0  ONLINE   0 0 0

errors: No known data errors

...and here is a snapshot of the system using iostat -indexC 5 during a scrub 
of pool002 (c8 is onboard AHCI controller, c9 is LSI SAS 3801E):

                     extended device statistics                  ---- errors ---
    r/s    w/s     kr/s   kw/s  wait  actv wsvc_t asvc_t  %w   %b s/w h/w trn tot device
    0.0    0.0      0.0    0.0   0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8
    0.0    0.0      0.0    0.0   0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t0d0
    0.0    0.0      0.0    0.0   0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t1d0
    0.0    0.0      0.0    0.0   0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t2d0
    0.0    0.0      0.0    0.0   0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t3d0
 8738.7    0.0 555346.1    0.0   0.1 345.0    0.0   39.5   0 3875   0   1   1   2 c9
  194.8    0.0  11936.9    0.0   0.0   7.9    0.0   40.3   0   87   0   0   0   0 c9t8d0
  194.6    0.0  12927.9    0.0   0.0   7.6    0.0   38.9   0   86   0   0   0   0 c9t9d0
  194.6    0.0  12622.6    0.0   0.0   8.1    0.0   41.7   0   90   0   0   0   0 c9t10d0
  201.6    0.0  13350.9    0.0   0.0   8.0    0.0   39.5   0   90   0   0   0   0 c9t11d0
  194.4    0.0  12902.3    0.0   0.0   7.8    0.0   40.1   0   88   0   0   0   0 c9t12d0
  194.6    0.0  12902.3    0.0   0.0   7.7    0.0   39.3   0   88   0   0   0   0 c9t13d0
  195.4    0.0  12479.0    0.0   0.0   8.5    0.0   43.4   0   92   0   0   0   0 c9t14d0
  197.6    0.0  13107.4    0.0   0.0   8.1    0.0   41.0   0   92   0   0   0   0 c9t15d0
  198.8    0.0  12918.1    0.0   0.0   8.2    0.0   41.4   0   92   0   0   0   0 c9t16d0
  201.0    0.0  13350.3    0.0   0.0   8.1    0.0   40.4   0   91   0   0   0   0 c9t17d0
  201.2    0.0  13325.0    0.0   0.0   7.8    0.0   38.5   0   88   0   0   0   0 c9t18d0
  200.6    0.0  13021.5    0.0   0.0   8.2    0.0   40.7   0   91   0   0   0   0 c9t19d0
    0.0    0.0      0.0    0.0   0.0   0.0    0.0    0.0   0    0   0   0   0   0 c9t20d0
  196.6    0.0  12991.9

Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-23 Thread Eric D. Mudama

On Tue, Oct 20 at 22:24, Frédéric VANNIERE wrote:


You can't use the Intel X25-E because it has a 32 or 64 MB volatile
cache that can be neither disabled nor flushed by ZFS.


I don't believe the above statement is correct.

According to anandtech who asked Intel:

http://www.anandtech.com/cpuchipsets/intel/showdoc.aspx?i=3403&p=10

the DRAM doesn't hold user data.  The article claims that data goes
through an internal 256KB buffer.

Is Solaris incapable of issuing a SATA FLUSH CACHE EXT command?


--
Eric D. Mudama
edmud...@mail.bounceswoosh.org

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-23 Thread Richard Elling

ok, see below...

On Oct 23, 2009, at 8:14 PM, Adam Cheal wrote:


Here is example of the pool config we use:

# zpool status
 pool: pool002
state: ONLINE
scrub: scrub stopped after 0h1m with 0 errors on Fri Oct 23 23:07:52  
2009

config:

   NAME STATE READ WRITE CKSUM
   pool002  ONLINE   0 0 0
 raidz2 ONLINE   0 0 0
   c9t18d0  ONLINE   0 0 0
   c9t17d0  ONLINE   0 0 0
   c9t55d0  ONLINE   0 0 0
   c9t13d0  ONLINE   0 0 0
   c9t15d0  ONLINE   0 0 0
   c9t16d0  ONLINE   0 0 0
   c9t11d0  ONLINE   0 0 0
   c9t12d0  ONLINE   0 0 0
   c9t14d0  ONLINE   0 0 0
   c9t9d0   ONLINE   0 0 0
   c9t8d0   ONLINE   0 0 0
   c9t10d0  ONLINE   0 0 0
   c9t29d0  ONLINE   0 0 0
   c9t28d0  ONLINE   0 0 0
   c9t27d0  ONLINE   0 0 0
   c9t23d0  ONLINE   0 0 0
   c9t25d0  ONLINE   0 0 0
   c9t26d0  ONLINE   0 0 0
   c9t21d0  ONLINE   0 0 0
   c9t22d0  ONLINE   0 0 0
   c9t24d0  ONLINE   0 0 0
   c9t19d0  ONLINE   0 0 0
 raidz2 ONLINE   0 0 0
   c9t30d0  ONLINE   0 0 0
   c9t31d0  ONLINE   0 0 0
   c9t32d0  ONLINE   0 0 0
   c9t33d0  ONLINE   0 0 0
   c9t34d0  ONLINE   0 0 0
   c9t35d0  ONLINE   0 0 0
   c9t36d0  ONLINE   0 0 0
   c9t37d0  ONLINE   0 0 0
   c9t38d0  ONLINE   0 0 0
   c9t39d0  ONLINE   0 0 0
   c9t40d0  ONLINE   0 0 0
   c9t41d0  ONLINE   0 0 0
   c9t42d0  ONLINE   0 0 0
   c9t44d0  ONLINE   0 0 0
   c9t45d0  ONLINE   0 0 0
   c9t46d0  ONLINE   0 0 0
   c9t47d0  ONLINE   0 0 0
   c9t48d0  ONLINE   0 0 0
   c9t49d0  ONLINE   0 0 0
   c9t50d0  ONLINE   0 0 0
   c9t51d0  ONLINE   0 0 0
   c9t52d0  ONLINE   0 0 0
   cache
 c8t2d0 ONLINE   0 0 0
 c8t3d0 ONLINE   0 0 0
   spares
 c9t20d0AVAIL
 c9t43d0AVAIL

errors: No known data errors

 pool: rpool
state: ONLINE
scrub: none requested
config:

   NAME  STATE READ WRITE CKSUM
   rpool ONLINE   0 0 0
 mirror  ONLINE   0 0 0
   c8t0d0s0  ONLINE   0 0 0
   c8t1d0s0  ONLINE   0 0 0

errors: No known data errors

...and here is a snapshot of the system using iostat -indexC 5  
during a scrub of pool002 (c8 is onboard AHCI controller, c9 is  
LSI SAS 3801E):


                     extended device statistics                  ---- errors ---
    r/s    w/s     kr/s   kw/s  wait  actv wsvc_t asvc_t  %w   %b s/w h/w trn tot device
    0.0    0.0      0.0    0.0   0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8
    0.0    0.0      0.0    0.0   0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t0d0
    0.0    0.0      0.0    0.0   0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t1d0
    0.0    0.0      0.0    0.0   0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t2d0
    0.0    0.0      0.0    0.0   0.0   0.0    0.0    0.0   0    0   0   0   0   0 c8t3d0
 8738.7    0.0 555346.1    0.0   0.1 345.0    0.0   39.5   0 3875   0   1   1   2 c9


You see 345 entries in the active queue. If the controller rolls over at
511 active entries, then it would explain why it would soon begin to
have difficulty.

Meanwhile, it is providing 8,738 IOPS and 555 MB/sec, which is quite
respectable.
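
As a quick sanity check on those aggregate numbers: 555,346 KB/s divided by 8,738.7 IOPS is roughly 64 KB per I/O, and 345 active commands spread across the 44 data disks is roughly 8 per disk, which matches the per-disk actv values of about 7.6 to 8.5 shown below.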

  194.8    0.0  11936.9    0.0   0.0   7.9    0.0   40.3   0   87   0   0   0   0 c9t8d0


These disks are doing almost 200 read IOPS, but are not 100% busy.
Average I/O size is 66 KB, which is not bad (lots of little I/Os could be
worse), but at only 11.9 MB/s, you are not near the media bandwidth.
Average service time is 40.3 milliseconds, which is not super, but may
be reflective of contention in the channel.
So there is more capacity to accept I/O commands, but...

  194.6    0.0  12927.9    0.0   0.0   7.6    0.0   38.9   0   86   0   0   0   0 c9t9d0
  194.6    0.0  12622.6    0.0   0.0   8.1    0.0   41.7   0   90   0   0   0   0 c9t10d0
  201.6    0.0  13350.9    0.0   0.0   8.0    0.0   39.5   0   90   0   0   0   0 c9t11d0
  194.4    0.0  12902.3    0.0   0.0   7.8    0.0   40.1   0   88   0   0   0   0 c9t12d0
  194.6    0.0  12902.3    0.0   0.0   7.7    0.0   39.3   0   88   0   0   0   0 c9t13d0