Re: [zfs-discuss] importing pool with missing/failed log device

2009-10-22 Thread Victor Latushkin

On 21.10.09 23:23, Paul B. Henson wrote:

I've had a case open for a while (SR #66210171) regarding the inability to
import a pool whose log device failed while the pool was off line.

I was told this was CR #6343667,


CR 6343667 synopsis is scrub/resilver has to start over when a snapshot is 
taken:
http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6343667

so I do not see how it can be related to log removal.
Could you please check bug number in question?

regards,
victor


which was supposedly fixed in patches
141444-09/141445-09. However, I recently upgraded a system to U8 which
includes that kernel patch, and still am unable to import a pool with a
failed log device:

r...@ike ~ # zpool import
  pool: export
id: 4066329346842580031
 state: UNAVAIL
status: One or more devices are missing from the system.
action: The pool cannot be imported. Attach the missing
devices and try again.
   see: http://www.sun.com/msg/ZFS-8000-6X
config:

export  UNAVAIL  missing device
  mirror    ONLINE
c0t0d0  ONLINE
c1t0d0  ONLINE
[...]
Additional devices are known to be part of this pool, though their
exact configuration cannot be determined.

I have not yet updated the pool to the new version included in U8, but I
was not told that was a prerequisite for the fix to apply.

Is this issue supposed to have been fixed by that CR, or did that resolve
some other issue and I was misinformed on my support ticket?
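
What I am ultimately hoping to be able to do, once the fix is in place, is
import the pool without the dead log device, roughly along these lines (the
-m flag is just my guess at how such a fix would be exposed, not something I
have verified in U8):

# zpool import -m export      # assumed flag: accept the pool even though its log device is missing
# zpool status export         # then deal with the failed log from within the imported pool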

Any information appreciated, thanks...

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Disk locating in OpenSolaris/Solaris 10

2009-10-22 Thread Bruno Sousa
If you use an LSI controller, you could install the LSI Logic MPT
Configuration Utility (lsiutil).

Example of its usage:

lsiutil

LSI Logic MPT Configuration Utility, Version 1.61, September 18, 2008

1 MPT Port found

Port Name      Chip Vendor/Type/Rev     MPT Rev  Firmware Rev  IOC
 1.  mpt0      LSI Logic SAS1068E B3      105        011a        0

Select a device:  [1-1 or 0 to quit] 1

1.  Identify firmware, BIOS, and/or FCode
2.  Download firmware (update the FLASH)
4.  Download/erase BIOS and/or FCode (update the FLASH)
8.  Scan for devices
10.  Change IOC settings (interrupt coalescing)
13.  Change SAS IO Unit settings
16.  Display attached devices
20.  Diagnostics
21.  RAID actions
22.  Reset bus
23.  Reset target
42.  Display operating system names for devices
45.  Concatenate SAS firmware and NVDATA files
59.  Dump PCI config space
60.  Show non-default settings
61.  Restore default settings
66.  Show SAS discovery errors
69.  Show board manufacturing information
97.  Reset SAS link, HARD RESET
98.  Reset SAS link
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 16

SAS1068E's links are down, down, down, down, 3.0 G, 3.0 G, 3.0 G, 3.0 G

 B___T     SASAddress      PhyNum  Handle  Parent  Type
        500605b000eea990            0001          SAS Initiator
        500605b000eea991            0002          SAS Initiator
        500605b000eea992            0003          SAS Initiator
        500605b000eea993            0004          SAS Initiator
        500605b000eea994            0005          SAS Initiator
        500605b000eea995            0006          SAS Initiator
        500605b000eea996            0007          SAS Initiator
        500605b000eea997            0008          SAS Initiator
        50030480003d95ff       4    0009    0005  Edge Expander
 0  10  50030480003d95c4       4    000a    0009  SATA Target
 0  11  50030480003d95c5       5    000b    0009  SATA Target
 0  12  50030480003d95c6       6    000c    0009  SATA Target
 0  13  50030480003d95c7       7    000d    0009  SATA Target
 0  14  50030480003d95c8       8    000e    0009  SATA Target
 0  15  50030480003d95c9       9    000f    0009  SATA Target
 0  17  50030480003d95ca      10    0010    0009  SATA Target
 0  16  50030480003d95cb      11    0011    0009  SATA Target
 0  18  50030480003d95cc      12    0012    0009  SATA Target
 0  19  50030480003d95cd      13    0013    0009  SATA Target
 0  20  50030480003d95ce      14    0014    0009  SATA Target
 0  21  50030480003d95cf      15    0015    0009  SATA Target
 0  22  50030480003d95d0      16    0016    0009  SATA Target
 0  23  50030480003d95d1      17    0017    0009  SATA Target
 0  24  50030480003d95d2      18    0018    0009  SATA Target
 0  25  50030480003d95d3      19    0019    0009  SATA Target
 0  26  50030480003d95d6      22    001a    0009  SATA Target
 0   8  50030480003d95fd      36    001b    0009  SAS Initiator and Target

The column PhyNum, in my case, points to the drive's disk slot in the
JBOD chassis.


However, I don't know how this works with multipath.
The ideal solution would be to use cfgadm with the hardware-specific option
to make the disk LED blink.
Something like the following, as seen in
http://docs.sun.com/app/docs/doc/816-5166/cfgadm-scsi-1m?a=view :



 Example 6 Display the Value of the Locator for a Disk

The following command displays the value of the locator for a disk. This 
example is specific to the SPARC Enterprise Server family:



# cfgadm -x locator c0::dsk/c0t6d0

The system responds with the following:

Disk        Led
c0t6d0      locator=on

But maybe this option is just for SPARC with SCSI?
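
If it does work on your hardware, toggling the LED would presumably be just:

# cfgadm -x locator=on c0::dsk/c0t6d0      (turn the locator LED on)
# cfgadm -x locator=off c0::dsk/c0t6d0     (turn it off again)

(untested here; the ap_id is the one from the docs.sun.com example, not from
my box).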

Bruno


SHOUJIN WANG wrote:

Hi there,
What I am trying to do is build a NAS storage server based on the following
hardware architecture:
Server -- SAS HBA -- SAS JBOD
I plugged 2 SAS HBA cards into an x86 box, and I also have 2 SAS I/O modules on the SAS JBOD. From each HBA card, I have one SAS cable that connects to the SAS JBOD.
Having configured MPT successfully on the server, I can see the single multipathed disks, like the following:

r...@super01:~# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
   0. c0t5000C5000D34BEDFd0 SEAGATE-ST31000640SS-0001-931.51GB
  /scsi_vhci/d...@g5000c5000d34bedf
   1. c0t5000C5000D34BF37d0 SEAGATE-ST31000640SS-0001-931.51GB
  /scsi_vhci/d...@g5000c5000d34bf37
   2. c0t5000C5000D34C727d0 SEAGATE-ST31000640SS-0001-931.51GB
  /scsi_vhci/d...@g5000c5000d34c727
   3. c0t5000C5000D34D0C7d0 SEAGATE-ST31000640SS-0001-931.51GB
  /scsi_vhci/d...@g5000c5000d34d0c7
   4. c0t5000C5000D34D85Bd0 SEAGATE-ST31000640SS-0001-931.51GB
  /scsi_vhci/d...@g5000c5000d34d85b

The problem is: if one of the disks fails, I don't know how to locate that disk in the
chassis. This makes failed-disk replacement difficult.

Is there any utility in OpenSolaris which can be used to locate/blink 

[zfs-discuss] SNV_125 MPT warning in logfile

2009-10-22 Thread Bruno Sousa

Hi all,

Recently I upgraded from snv_118 to snv_125, and suddenly I started to
see these messages in /var/adm/messages:


Oct 22 12:54:37 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:54:37 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:47 SAN02  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:47 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:50 SAN02  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:50 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a



Is this a symptom of a disk error, or was some change made in the
driver so that I now get more information which didn't appear in
the past?


Thanks,
Bruno

I'm using an LSI Logic SAS1068E B3, and within lsiutil I see this
behaviour:



1 MPT Port found

Port Name      Chip Vendor/Type/Rev     MPT Rev  Firmware Rev  IOC
 1.  mpt0      LSI Logic SAS1068E B3      105        011a        0

Select a device:  [1-1 or 0 to quit] 1

1.  Identify firmware, BIOS, and/or FCode
2.  Download firmware (update the FLASH)
4.  Download/erase BIOS and/or FCode (update the FLASH)
8.  Scan for devices
10.  Change IOC settings (interrupt coalescing)
13.  Change SAS IO Unit settings
16.  Display attached devices
20.  Diagnostics
21.  RAID actions
22.  Reset bus
23.  Reset target
42.  Display operating system names for devices
45.  Concatenate SAS firmware and NVDATA files
59.  Dump PCI config space
60.  Show non-default settings
61.  Restore default settings
66.  Show SAS discovery errors
69.  Show board manufacturing information
97.  Reset SAS link, HARD RESET
98.  Reset SAS link
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 20

1.  Inquiry Test
2.  WriteBuffer/ReadBuffer/Compare Test
3.  Read Test
4.  Write/Read/Compare Test
8.  Read Capacity / Read Block Limits Test
12.  Display phy counters
13.  Clear phy counters
14.  SATA SMART Read Test
15.  SEP (SCSI Enclosure Processor) Test
18.  Report LUNs Test
19.  Drive firmware download
20.  Expander firmware download
21.  Read Logical Blocks
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Diagnostics menu, select an option:  [1-99 or e/p/w or 0 to quit] 12

Adapter Phy 0:  Link Down, No Errors

Adapter Phy 1:  Link Down, No Errors

Adapter Phy 2:  Link Down, No Errors

Adapter Phy 3:  Link Down, No Errors

Adapter Phy 4:  Link Up, No Errors

Adapter Phy 5:  Link Up, No Errors

Adapter Phy 6:  Link Up, No Errors

Adapter Phy 7:  Link Up, No Errors

Expander (Handle 0009) Phy 0:  Link Up
 Invalid DWord Count  79,967,229
 Running Disparity Error Count63,036,893
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 1:  Link Up
 Invalid DWord Count  79,967,207
 Running Disparity Error Count78,339,626
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 2:  Link Up
 Invalid DWord Count  76,717,646
 Running Disparity Error Count73,334,563
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 3:  Link Up
 Invalid DWord Count  79,896,409
 Running Disparity Error Count76,199,329
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 4:  Link Up, No Errors

Expander (Handle 0009) Phy 5:  Link Up, No Errors

Expander (Handle 0009) Phy 6:  Link Up, No Errors

Expander (Handle 0009) Phy 7:  Link Up, No Errors

Expander (Handle 0009) Phy 8:  Link Up, No Errors

Expander (Handle 0009) Phy 9:  Link Up, No Errors

Expander (Handle 0009) Phy 10:  Link Up, No Errors

Expander (Handle 0009) Phy 11:  Link Up, No Errors

Expander (Handle 0009) Phy 12:  Link Up, No Errors

Expander (Handle 0009) Phy 13:  Link Up, No Errors

Expander (Handle 0009) Phy 14:  Link Up, No Errors

Expander (Handle 0009) 

Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-22 Thread Edward Ned Harvey
 Replacing failed disks is easy when PERC is doing the RAID. Just remove
 the failed drive and replace with a good one, and the PERC will rebuild
 automatically. 

Sorry, not correct.  When you replace a failed drive, the perc card doesn't
know for certain that the new drive you're adding is meant to be a
replacement.  For all it knows, you could coincidentally be adding new disks
for a new VirtualDevice which already contains data, during the failure
state of some other device.  So it will not automatically resilver (which
would be a permanently destructive process, applied to a disk which is not
*certainly* meant for destruction).

You have to open the PERC config interface and tell it this disk is a
replacement for the old disk (probably you're just saying "this disk is the
new global hotspare"), or else the new disk will sit there like a bump on a
log, doing nothing.
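
If Dell OpenManage is installed, I believe the same assignment can be done
from the OS instead of the PERC BIOS interface; the syntax below is from
memory and the controller/pdisk IDs are made up:

# omconfig storage pdisk controller=0 pdisk=0:0:5 action=assignglobalhotspare assign=yes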

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-22 Thread Edward Ned Harvey
 The Intel specified random write IOPS are with the cache enabled and
 without cache flushing.  They also carefully only use a limited span
 of the device, which fits most perfectly with how the device is built.

How do you know this?  This sounds much more detailed than any average
person could ever know...

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-22 Thread Ross
Actually, I think this is a case of crossed wires.  This issue was reported a 
while back on a news site for the X25-M G2.  Somebody pointed out that these 
devices have 8GB of cache, which is exactly the dataset size they use for the 
iops figures.

The X25-E datasheet however states that while write cache is enabled, the iops 
figures are over the entire drive.

And looking at the X25-M G2 datasheet again, it states that the measurements 
are over 8GB of range, but these come with 32MB of cache, so I think that was 
also a false alarm.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] raidz ZFS Best Practices wiki inconsistency

2009-10-22 Thread Frank Cusack

http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#RAID-Z_Configuration_Requirements_and_Recommendations
says that the number of disks in a RAIDZ should be (N+P) with
N = {2,4,8} and P = {1,2}.

But if you go down the page just a little further to the thumper
configuration examples, none of the 3 examples follow this recommendation!

I will have 10 disks to put into a RAIDZ.  I would like as little waste
as possible, so that means just 1 hot spare, and a 3,3,3 config for the
remaining 9 is not appealing.  Should I do a single 9-disk RAIDZ, per
the guideline, or should I do a 4+5 split?

This is for engineering data.  My workload isn't established yet but from
talking to the guys the working set would fit in a TB and just be local
to engineer workstations, while the file server will just store
infrequently used data. As such, I'm inclined to do a single 9 disk
RAIDZ and maximize the available disk space, which at the same time
follows the configuration guideline.

I'm pretty sure I already know the correct answer as I remember when
this guideline was created and why.

Besides just thinking out loud, I do want to emphasize the inconsistency
on the wiki and suggest that it be updated or a comment added.
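
For concreteness, the layout I am leaning toward would be something like
this (device names invented):

# zpool create tank raidz c1t0d0 c1t1d0 c1t2d0 c1t3d0 c1t4d0 \
    c1t5d0 c1t6d0 c1t7d0 c1t8d0 spare c1t9d0

i.e. a single 9-disk (8+1) raidz plus the tenth disk as a hot spare.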

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] raidz ZFS Best Practices wiki inconsistency

2009-10-22 Thread Cindy Swearingen

Thanks for your comments, Frank.

I will take a look at the inconsistencies.

Cindy

On 10/22/09 08:29, Frank Cusack wrote:
http://www.solarisinternals.com/wiki/index.php/ZFS_Best_Practices_Guide#RAID-Z_Configuration_Requirements_and_Recommendations 


says that the number of disks in a RAIDZ should be (N+P) with
N = {2,4,8} and P = {1,2}.

But if you go down the page just a little further to the thumper
configuration examples, none of the 3 examples follow this recommendation!

I will have 10 disks to put into a RAIDZ.  I would like as little waste
as possible, so that means just 1 hot spare, and a 3,3,3 config for the
remaining 9 is not appealing.  Should I do a single 9 disk RAIDZ, per
the guideline, or should I do 4,5.

This is for engineering data.  My workload isn't established yet but from
talking to the guys the working set would fit in a TB and just be local
to engineer workstations, while the file server will just store
infrequently used data. As such, I'm inclined to do a single 9 disk
RAIDZ and maximize the available disk space, which at the same time
follows the configuration guideline.

I'm pretty sure I already know the correct answer as I remember when
this guideline was created and why.

Besides just thinking out loud, I do want to emphasize the inconsistency
on the wiki and suggest that it be updated or a comment added.

-frank
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] importing pool with missing/failed log device

2009-10-22 Thread Paul B. Henson
On Thu, 22 Oct 2009, Victor Latushkin wrote:

 CR 6343667 synopsis is scrub/resilver has to start over when a snapshot is 
 taken:
 http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=6343667

 so I do not see how it can be related to log removal.
 Could you please check bug number in question?

Ack, my bad, too many open cases :(, sorry. The correct bug for the
inquiry is CR 6707530.

Thanks...

-- 
Paul B. Henson  |  (909) 979-6361  |  http://www.csupomona.edu/~henson/
Operating Systems and Network Analyst  |  hen...@csupomona.edu
California State Polytechnic University  |  Pomona CA 91768
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-22 Thread Bob Friesenhahn

On Thu, 22 Oct 2009, Marc Bevand wrote:


Bob Friesenhahn bfriesen at simple.dallas.tx.us writes:

For random write I/O, caching improves I/O latency not sustained I/O
throughput (which is what random write IOPS usually refer to). So Intel can't
cheat with caching. However they can cheat by benchmarking a brand new drive
instead of an aged one.


With FLASH devices, a sufficiently large write cache can improve 
random write I/O.  One can imagine that the wear-leveling logic 
could be used to do tricky remapping so that several random writes 
actually lead to sequential writes to the same FLASH superblock: 
only one superblock needs to be updated, and the parts of the old 
superblocks which would have been overwritten are marked as unused. 
This of course requires rather advanced remapping logic at a 
finer-grained resolution than the superblock.  When erased space 
becomes tight (or on a periodic basis), the data in several 
sparsely-used superblocks is migrated to a different superblock in a 
more compact way (along with the requisite logical block remapping) to 
reclaim space.  It is worth developing such remapping logic, since 
FLASH erasures and re-writes are so expensive.



They also carefully only use a limited span
of the device, which fits most perfectly with how the device is built.


AFAIK, for the X25-E series, they benchmark random write IOPS on a 100% span.
You may be confusing it with the X25-M series with which they actually clearly
disclose two performance numbers: 350 random write IOPS on 8GB span, and 3.3k
on 100% span. See
http://www.intel.com/cd/channel/reseller/asmo-na/eng/products/nand/tech/425265.htm


You are correct that I interpreted the benchmark scenarios from the 
X25-M series documentation.  It seems reasonable for the same 
manufacturer to use the same benchmark methodology for similar 
products.  Then again, they are still new at this.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer,http://www.GraphicsMagick.org/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-22 Thread Cindy Swearingen

Hi Bruno,

I see some bugs associated with these messages (6694909) that point to
an LSI firmware upgrade that causes these harmless errors to be displayed.

According to the 6694909 comments, this issue is documented in the
release notes.

As they are harmless, I wouldn't worry about them.

Maybe someone from the driver group can comment further.

Cindy


On 10/22/09 05:40, Bruno Sousa wrote:

Hi all,

Recently i upgrade from snv_118 to snv_125, and suddently i started to 
see this messages at /var/adm/messages :


Oct 22 12:54:37 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:54:37 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:47 SAN02  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:47 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:47 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:50 SAN02  mpt_handle_event_sync: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a
Oct 22 12:56:50 SAN02 scsi: [ID 243001 kern.warning] WARNING: 
/p...@0,0/pci10de,3...@a/pci1000,3...@0 (mpt0):
Oct 22 12:56:50 SAN02  mpt_handle_event: IOCStatus=0x8000, 
IOCLogInfo=0x3112011a



Is this a symptom of a disk error or some change was made in the 
driver?,that now i have more information, where in the past such 
information didn't appear?


Thanks,
Bruno

I'm using a LSI Logic SAS1068E B3 and i within lsiutil i have this 
behaviour :



1 MPT Port found

Port Name Chip Vendor/Type/RevMPT Rev  Firmware Rev  IOC
1.  mpt0  LSI Logic SAS1068E B3 105  011a 0

Select a device:  [1-1 or 0 to quit] 1

1.  Identify firmware, BIOS, and/or FCode
2.  Download firmware (update the FLASH)
4.  Download/erase BIOS and/or FCode (update the FLASH)
8.  Scan for devices
10.  Change IOC settings (interrupt coalescing)
13.  Change SAS IO Unit settings
16.  Display attached devices
20.  Diagnostics
21.  RAID actions
22.  Reset bus
23.  Reset target
42.  Display operating system names for devices
45.  Concatenate SAS firmware and NVDATA files
59.  Dump PCI config space
60.  Show non-default settings
61.  Restore default settings
66.  Show SAS discovery errors
69.  Show board manufacturing information
97.  Reset SAS link, HARD RESET
98.  Reset SAS link
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Main menu, select an option:  [1-99 or e/p/w or 0 to quit] 20

1.  Inquiry Test
2.  WriteBuffer/ReadBuffer/Compare Test
3.  Read Test
4.  Write/Read/Compare Test
8.  Read Capacity / Read Block Limits Test
12.  Display phy counters
13.  Clear phy counters
14.  SATA SMART Read Test
15.  SEP (SCSI Enclosure Processor) Test
18.  Report LUNs Test
19.  Drive firmware download
20.  Expander firmware download
21.  Read Logical Blocks
99.  Reset port
e   Enable expert mode in menus
p   Enable paged mode
w   Enable logging

Diagnostics menu, select an option:  [1-99 or e/p/w or 0 to quit] 12

Adapter Phy 0:  Link Down, No Errors

Adapter Phy 1:  Link Down, No Errors

Adapter Phy 2:  Link Down, No Errors

Adapter Phy 3:  Link Down, No Errors

Adapter Phy 4:  Link Up, No Errors

Adapter Phy 5:  Link Up, No Errors

Adapter Phy 6:  Link Up, No Errors

Adapter Phy 7:  Link Up, No Errors

Expander (Handle 0009) Phy 0:  Link Up
 Invalid DWord Count  79,967,229
 Running Disparity Error Count63,036,893
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 1:  Link Up
 Invalid DWord Count  79,967,207
 Running Disparity Error Count78,339,626
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 2:  Link Up
 Invalid DWord Count  76,717,646
 Running Disparity Error Count73,334,563
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 3:  Link Up
 Invalid DWord Count  79,896,409
 Running Disparity Error Count76,199,329
 Loss of DWord Synch Count   113
 Phy Reset Problem Count   0

Expander (Handle 0009) Phy 4:  Link Up, No Errors

Expander (Handle 0009) Phy 5:  Link Up, No Errors

Expander (Handle 0009) Phy 6:  Link Up, No Errors

Expander (Handle 0009) Phy 7:  Link Up, No 

Re: [zfs-discuss] Setting up an SSD ZIL - Need A Reality Check

2009-10-22 Thread Meilicke, Scott
Interesting. We must have different setups with our PERCs. Mine have  
always auto rebuilt.


--
Scott Meilicke

On Oct 22, 2009, at 6:14 AM, Edward Ned Harvey  
sola...@nedharvey.com wrote:


Replacing failed disks is easy when PERC is doing the RAID. Just  
remove
the failed drive and replace with a good one, and the PERC will  
rebuild

automatically.


Sorry, not correct.  When you replace a failed drive, the perc card  
doesn't

know for certain that the new drive you're adding is meant to be a
replacement.  For all it knows, you could coincidentally be adding  
new disks
for a new VirtualDevice which already contains data, during the  
failure
state of some other device.  So it will not automatically resilver  
(which
would be a permanently destructive process, applied to a disk which  
is not

*certainly* meant for destruction).

You have to open the perc config interface, tell it this disk is a
replacement for the old disk (probably you're just saying This disk  
is the
new global hotspare) or else the new disk will sit there like a  
bump on a

log.  Doing nothing.






___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] strange results ...

2009-10-22 Thread Marion Hakanson
jel+...@cs.uni-magdeburg.de said:
 2nd) Never had a Sun STK RAID INT before. Actually my intention was to create
 a zpool mirror of sd0 and sd1 for boot and logs, and a 2x2-way  zpool mirror
 with the 4 remaining disks. However, the controller seems not to support
 JBODs :( - which is also bad, since we can't simply put those disks into
 another machine with a different controller without data loss, because the
 controller seems to use its own format under the hood.

Yes, those Adaptec/STK internal RAID cards are annoying for use with ZFS.
You also cannot replace a failed disk without using the STK RAID software
to configure the new disk as a standalone volume (before zpool replace).
Fortunately you probably don't need to boot into the BIOS-level utility,
I think you can use the Adaptec StorMan utilities from within the OS, if
you remembered to install them.


  Also, the 256MB
 BB cache seems to be a little bit small for a ZIL, even if one knew how to
 configure it ...

Unless you have an external (non-NV cached) pool on the same server, you
wouldn't gain anything from setting up a separate ZIL in this case.  All
your internal drives have NV cache without doing anything special.


 So what would you recommend? Creating 2 appropriate STK INT arrays and using
 both as a single zpool device, i.e. without ZFS mirror devs and 2nd copies?  

Here's what we did:  Configure all internal disks as standalone volumes on
the RAID card.  All those volumes have the battery-backed cache enabled.
The first two 146GB drives got sliced in two:  the first half of each disk
became the boot/root mirror pool.  The 2nd half was used for a separate-ZIL
mirror, applied to an external SATA pool.

Our remaining internal drives were configured into a mirrored ZFS pool
for database transaction logs.  No need for a separate ZIL there, since
the internal drives effectively have NV cache as far as ZFS is concerned.
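
In zpool terms, the non-root part of that looks roughly like the following
(device names are placeholders; the root mirror itself was set up by the
installer):

# zpool add extpool log mirror c0t0d0s1 c0t1d0s1       # separate ZIL on the 2nd slices
# zpool create dblogs mirror c0t2d0 c0t3d0 mirror c0t4d0 c0t5d0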

Yes, the 256MB cache is small, but if it fills up, it is backed by the
10kRPM internal SAS drives, which should have decent latency when compared
to external SATA JBOD drives.  And even this tiny NV cache makes a huge
difference when used on an NFS server:
http://acc.ohsu.edu/~hakansom/j4400_bench.html

Regards,

Marion


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] [Fwd: snv_123: kernel memory leak?]

2009-10-22 Thread Robert Milkowski


anyone?
---BeginMessage---
Hi,

 ::status
debugging live kernel (64-bit) on mk-archive-1
operating system: 5.11 snv_123 (i86pc)
 ::system
set noexec_user_stack_log=0x1 [0t1]
set noexec_user_stack=0x1 [0t1]
set snooping=0x1 [0t1]
set zfs:zfs_arc_max=0x28000 [0t10737418240]
 
 ::memstat
Page Summary                Pages                MB  %Tot
------------     ----------------  ----------------  ----
Kernel                    1843301              7200   44%
ZFS File Data             1651701              6451   40%
Anon                       202473               790    5%
Exec and libs                 934                 3    0%
Page cache                   4369                17    0%
Free (cachelist)             2084                 8    0%
Free (freelist)            462699              1807   11%

Total                     4167561             16279
Physical                  4167560             16279
 

We are experiencing out-of-memory issues during the night on this server; 
our apps do not use much memory but do lots of disk and network IO. The 
server is an x4500 with 16GB of memory, and the filesystem is of course ZFS. 
I'm concerned with the kernel size here.
I'm getting applications crashing and errors like:

WARNING: Sorry, no swap space to grow stack for pid 5991 (cron)
WARNING: Sorry, no swap space to grow stack for pid 6092 (mv)
WARNING: Sorry, no swap space to grow stack for pid 18481 (ggrep)
WARNING: Sorry, no swap space to grow stack for pid 18562 (cron)
WARNING: /tmp: File system full, swap space limit exceeded
WARNING: /etc/svc/volatile: File system full, swap space limit exceeded

And no, we are not filling /tmp and besides /tmp is limited to 1GB anyway.

Couple of highlights from ::kmastat
[...]
cache                      buf     buf      buf      memory      alloc   alloc
name                      size   in use    total     in use     succeed   fail
------------------------- ----- -------- -------- ------------ ---------- -----
[...]
kmem_magazine_143          1152    53621    55065    75182080B    15082603  5195
[...]

vmem                           memory          memory          memory      alloc   alloc
name                           in use           total          import     succeed   fail
------------------------- -------------- --------------- -------------- ---------- -----
heap                       13564526592B  1077569126400B             0B    34565103     0
[...]
kmem_metadata                586997760B      623640576B     623640576B      912072     0
kmem_msb                     539336704B      539336704B     539336704B      864680  5198
[...]
kmem_firewall_va             141729792B      141729792B     141729792B      578915     0
kmem_firewall                        0B              0B             0B           0     0
kmem_oversize                141532041B      141729792B     141729792B      578904    11
[...]
kmem_va                    12148248576B    12148248576B   12148248576B    31994594     0
kmem_default                5738831872B     5738831872B    5738831872B   187061764     0
[...]
zfs_file_data               6734151680B    17070817280B             0B    90214511     0
zfs_file_data_buf           6734151680B     6734151680B    6734151680B    90214511     0
[...]

Is this memory fragmentation or something else?
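
If it would help, I can also try kernel memory leak detection next time,
though as far as I know that only produces useful output if kmem debugging
was enabled at boot, i.e. something like:

set kmem_flags=0xf                 (in /etc/system, followed by a reboot)

# echo ::findleaks | mdb -k        (afterwards, against the live kernel)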


A full ::kmastat output

 ::kmastat
cache                      buf     buf      buf      memory      alloc   alloc
name                      size   in use    total     in use     succeed   fail
------------------------- ----- -------- -------- ------------ ---------- -----
kmem_magazine_1              16     2893    32379      528384B    94677416     1
kmem_magazine_3              32     5111     5875      192512B    11584154     0
kmem_magazine_7              64    15917    17670     1167360B    24614491     0
kmem_magazine_15            128     4478     4619      610304B     6863634     0
kmem_magazine_31            256     7823    12150     3317760B     3485723     0
kmem_magazine_47            384     2796     3280     1343488B     1616387     0
kmem_magazine_63            512      550     3353     1961984B     1256504     0
kmem_magazine_95            768    13035    18095    14823424B    18231914     0
kmem_magazine_143          1152    53621    55065    75182080B    15082603  5195
kmem_slab_cache              72   822572  1785630   132980736B   275428877     0
kmem_bufctl_cache            24  6384507 12425301   304754688B   535280109     0
kmem_bufctl_audit_cache     192        0        0           0B           0     0
kmem_va_4096               4096  1091176  1998784  3892051968B    49062569     0
kmem_va_8192               8192    17696    18896   154796032B     4835863     0
kmem_va_12288             12288      451     3580    46923776B     9282713     0
kmem_va_16384             16384    26040   182176  2984771584B   106103361     0
kmem_va_20480             20480     5974     6984   152567808B     9298381     0
kmem_va_24576             24576       94      365     9568256B     2731797     0
kmem_va_28672             28672      270     1700    55705600B     9520293     0
kmem_va_32768             32768      406      620    20316160B     2796821     0
kmem_alloc_8                  8   203502 

Re: [zfs-discuss] ZFS disk failure question

2009-10-22 Thread Cindy Swearingen

Hi Jason,

Since spare replacement is an important process, I've rewritten this
section to provide 3 main examples, here:

http://docs.sun.com/app/docs/doc/817-2271/gcvcw?a=view

Scroll down the section:

Activating and Deactivating Hot Spares in Your Storage Pool

Example 4–7 Manually Replacing a Disk With a Hot Spare
Example 4–8 Detaching a Hot Spare After the Failed Disk is Replaced
Example 4–9 Detaching a Failed Disk and Using the Hot Spare

The third example is your scenario. I finally listened to the answer,
which is you must detach the original disk if you want to continue to
use the spare and replace the original disk later. It all works as
described.
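
In short, for that third scenario the sequence is along these lines (device
names are only examples):

# zpool detach tank c1t7d0        # detach the failed disk; the spare takes its place permanently
# zpool add tank spare c1t7d0     # later, add the physically replaced disk back as a new spare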

I see some other improvements coming with spare replacement and will
provide details when they are available.

Thanks,

Cindy

On 10/14/09 15:54, Jason Frank wrote:

See, I get overly literal when working on failed production storage
(and yes, I do have backups...)  I wasn't wanting to cancel the
in-progress spare replacement.  I had a completed spare replacement,
and I wanted to make it official.  So, that didn't really fit my
scenario either.

I'm glad you agree on the brevity of the detach subcommand man page.
I would guess that the intricacies of the failure modes would probably
lend themselves to richer content than a man page.

I'd really like to see some kind of web based wizard to walk through
it  I doubt I'd get motivated to write it myself though.

The web page Cindy pointed to does not cover how to make the
replacement official either.  It gets close.  But at the end, it
detaches the hot spare, and not the original disk.  Everything seems
to be close, but not quite there.  Of course, now that I've been
through this once, I'll remember all.  I'm just thinking of the
children.

Also, I wanted to try and reconstruct all of my steps from zpool
history -i tank.  According to that, zpool decided to replace t7 with
t11 this morning (why wasn't it last night?), and I offlined, onlined
and detached t7, and I was OK.  I did notice that the history records
internal scrubs, but not resilvers.  It also doesn't record failed
commands, or disk failures in a zpool.  It would be sweet to have a
line that said something like "marking vdev /dev/dsk/c8t7d0s0 as
UNAVAIL due to X read errors in Y minutes". Then we can really see
what happened.

Jason

On Wed, Oct 14, 2009 at 4:32 PM, Eric Schrock eric.schr...@sun.com wrote:

On 10/14/09 14:26, Jason Frank wrote:

Thank you, that did the trick.  That's not terribly obvious from the
man page though.  The man page says it detaches the devices from a
mirror, and I had a raidz2.  Since I'm messing with production data, I
decided I wasn't going to chance it when I was reading the man page.
You might consider changing the man page, and explaining a little more
what it means, maybe even what the circumstances look like where you
might use it.

This is covered in the Hot Spares section of the manpage:

An in-progress spare replacement can be cancelled by detach-
ing  the  hot  spare.  If  the  original  faulted  device is
detached, then the hot spare assumes its place in the confi-
guration,  and  is removed from the spare list of all active
pools.

It is true that the description for zpool detach is overly brief and could
be expanded to include this use case.

- Eric

--
Eric Schrock, Fishworks                    http://blogs.sun.com/eschrock


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] strange results ...

2009-10-22 Thread Robert Milkowski

Jens Elkner wrote:

Hmmm,

wondering about IMHO strange ZFS results ...

X4440:  4x6 2.8GHz cores (Opteron 8439 SE), 64 GB RAM
6x Sun STK RAID INT V1.0 (Hitachi H103012SCSUN146G SAS)
Nevada b124

Started with a simple test using zfs on c1t0d0s0: cd /var/tmp

(1) time sh -c 'mkfile 32g bla ; sync' 
0.16u 19.88s 5:04.15 6.5%

(2) time sh -c 'mkfile 32g blabla ; sync'
0.13u 46.41s 5:22.65 14.4%
(3) time sh -c 'mkfile 32g blablabla ; sync'
0.19u 26.88s 5:38.07 8.0%

chmod 644 b*
(4) time dd if=bla of=/dev/null bs=128k
262144+0 records in
262144+0 records out
0.26u 25.34s 6:06.16 6.9%
(5) time dd if=blabla of=/dev/null bs=128k
262144+0 records in
262144+0 records out
0.15u 26.67s 4:46.63 9.3%
(6) time dd if=blablabla of=/dev/null bs=128k
262144+0 records in
262144+0 records out
0.10u 20.56s 0:20.68 99.9%

So 1-3 is more or less expected (~97..108 MB/s write).
However 4-6 looks strange: 89, 114 and 1585 MB/s read!

Since the arc size is ~55+-2GB (at least arcstat.pl says so), I guess (6)
reads from memory completely. Hmm - maybe.
However, I would expect, that when repeating 5-6, 'blablabla' gets replaced
by 'bla' or 'blabla'. But the numbers say, that 'blablabla' is kept in the
cache, since I get almost the same results as in the first run (and zpool
iostat/arcstat.pl show for the blablabla almost no activity at all).
So is this a ZFS bug? Or does the OS some magic here?

  
IIRC, when ZFS detects sequential reads of a given file it will stop caching
blocks for it.
So because #6 was created last, all of its blocks are cached in the ARC;
then, when reading #4 and #5, ZFS detected sequential reads and did not
put that data in the cache, leaving the last-written file entirely cached.


While for many workloads this is the desired behavior, for many others it is
not (like parsing large log files with a grep-like tool, where the files do
not get cached...).
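
If you want to influence caching per dataset, you can also look at the
primarycache property (assuming your build is new enough to have it):

# zfs get primarycache tank/logs
# zfs set primarycache=metadata tank/logs     # cache only metadata, not file data, for this dataset
# zfs set primarycache=all tank/logs          # back to the default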



--
Robert Milkowski
http://milek.blogspot.com

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] Only a few days left for Online Registration: Solaris Security Summit Nov 3rd

2009-10-22 Thread Jennifer Bauer Scarpino

Hello All

There is still time to register online. You will also be able
to register on-site.

Just to give you an idea of the presentations that will be
given:

* Presentation: Kerberos Authentication for Web Security
* Presentation: Protecting Oracle Applications with Built-In Solaris 
Security Features

* Presentation: H/W based isolation and security for Virtual Machine Network
* Presentation: ZFS-Crypto Overview

Hope to see you there!!



-

To: Developers and Students

You are invited to participate in the first OpenSolaris Security Summit


Solaris Security Summit
Tuesday, November 3rd, 2009
Baltimore Marriott Waterfront
700 Aliceanna Street
Baltimore, Maryland 21202


Join us as we explore the latest trends of Solaris Security
technologies, as well as key insights from security community members,
technologists, and users.


You will also have the unique opportunity to hear from our keynote
speaker William Cheswick, Lead Member of the Technical Staff at AT&T
Labs.

Bio:

Ches is an early innovator in Internet security. He is known for his
work in firewalls, proxies, and Internet mapping at Bell Labs and Lumeta
Corp. He is best known for the book he co-authored with Steve Bellovin
and now Avi Rubin, "Firewalls and Internet Security: Repelling the Wily
Hacker."
Ches is now a member of the technical staff at AT&T Labs - Research in
Florham Park, NJ, where he is working on security, visualization, user
interfaces, and a variety of other things.


Registration now available!

http://wikis.sun.com/display/secsummit09/
http://www.usenix.org/events/lisa09/
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-22 Thread Adam Cheal
Cindy: How can I view the bug report you referenced? Standard methods show me 
the bug number is valid (6694909) but no content or notes. We are having 
similar messages appear with snv_118 with a busy LSI controller, especially 
during scrubbing, and I'd be interested to see what they mentioned in that 
report. Also, the LSI firmware updates for the LSISAS3081E (the controller we 
use) don't usually come with release notes indicating what has changed in each 
firmware revision, so I'm not sure where they got that idea from.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-22 Thread James C. McPherson

Adam Cheal wrote:

Cindy: How can I view the bug report you referenced? Standard methods
show my the bug number is valid (6694909) but no content or notes. We are
having similar messages appear with snv_118 with a busy LSI controller,
especially during scrubbing, and I'd be interested to see what they
mentioned in that report. Also, the LSI firmware updates for the
LSISAS3081E (the controller we use) don't usually come with release notes
indicating what has changed in each firmware revision, so I'm not sure
where they got that idea from.



Hi Adam,
unfortunately, you can't see that bug from outside.

The evaluation from LSI is very clear that this is a firmware issue
rather than a driver issue, and is claimed to be fixed in

LSI BIOS v6.26.00 FW 1.27.02
(aka Phase 15)


cheers,
James
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool with very different sized vdevs?

2009-10-22 Thread Travis Tabbal
I have a new array of 4x1.5TB drives running fine. I also have the old array of 
4x400GB drives in the box on a separate pool for testing. I was planning to 
have the old drives just be a backup file store, so I could keep snapshots and 
such over there for important files. 

I was wondering if it makes any sense to add the older drives to the new pool. 
Reliability might be lower as they are older drives, so if I were to lose 2 of 
them, things could get ugly. I'm just curious if it would make any sense to do 
something like this.
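
In other words, something like this, with the old drives as a second raidz
vdev in the existing pool (device names made up):

# zpool add tank raidz c2t0d0 c2t1d0 c2t2d0 c2t3d0

with the understanding that the pool would then only be as reliable as its
weakest vdev.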
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-22 Thread Adam Cheal
James: We are running Phase 16 on our LSISAS3801E's, and have also tried the 
recently released Phase 17 but it didn't help. All firmware NVRAM settings are 
default. Basically, when we put the disks behind this controller under load 
(e.g. scrubbing, recursive ls on large ZFS filesystem) we get this series of 
log entries that appear at random intervals:

scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@34,0 (sd49):
   incomplete read- retrying
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 
(mpt0):
   mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31110b00
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 
(mpt0):
   mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31110b00
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 
(mpt0):
   mpt_handle_event_sync: IOCStatus=0x8000, IOCLogInfo=0x31112000
scsi: [ID 243001 kern.warning] WARNING: /p...@0,0/pci8086,6...@4/pci1000,3...@0 
(mpt0):
   mpt_handle_event: IOCStatus=0x8000, IOCLogInfo=0x31112000
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   Log info 0x31110b00 received for target 40.
   scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   Log info 0x31110b00 received for target 40.
   scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   Log info 0x31110b00 received for target 40.
   scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   Log info 0x31110b00 received for target 40.
   scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc
scsi: [ID 107833 kern.warning] WARNING: 
/p...@0,0/pci8086,6...@4/pci1000,3...@0/s...@2d,0 (sd42):
   incomplete read- retrying
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   Rev. 8 LSI, Inc. 1068E found.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   mpt0 supports power management.
scsi: [ID 365881 kern.info] /p...@0,0/pci8086,6...@4/pci1000,3...@0 (mpt0):
   mpt0: IOC Operational.

It seems to be timing out accessing a disk, retrying, giving up and then doing 
a bus reset?

This is happening with random disks behind the controller and on multiple 
systems with the same hardware config. We are running snv_118 right now and was 
hoping this was some magic mpt-related bug that was going to be fixed in 
snv_125 but it doesn't look like it. The LSI3801E is driving 2 x 23-disk JBOD's 
which, albeit a dense solution, it should be able to handle. We are also using 
wide raidz2 vdevs (22 disks each, one per JBOD) which admittedly is slower 
performance-wise, but the goal here is density not performance. I would have 
hoped that the system would just slow down if there was IO contention, but 
not experience things like bus resets.

Your thoughts?
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-22 Thread James C. McPherson

Adam Cheal wrote:

James: We are running Phase 16 on our LSISAS3801E's, and have also tried
the recently released Phase 17 but it didn't help. All firmware NVRAM
settings are default. Basically, when we put the disks behind this
controller under load (e.g. scrubbing, recursive ls on large ZFS
filesystem) we get this series of log entries that appear at random
intervals:

...

It seems to be timing out accessing a disk, retrying, giving up and then
doing a bus reset?

This is happening with random disks behind the controller and on multiple
systems with the same hardware config. We are running snv_118 right now
and was hoping this was some magic mpt-related bug that was going to be
fixed in snv_125 but it doesn't look like it. The LSI3801E is driving 2 x
23-disk JBOD's which, albeit a dense solution, it should be able to
handle. We are also using wide raidz2 vdevs (22 disks each, one per JBOD)
which agreeably is slower performance-wise, but the goal here is density
not performance. I would have hoped that the system would just slow
down if there was IO contention, but not experience things like bus
resets.

Your thoughts?


ugh. New bug time - bugs.opensolaris.org, please select
Solaris / kernel / driver-mpt. In addition to the error
messages and description of when you see it, please provide
output from

cfgadm -lav
prtconf -v
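
e.g. captured to files that you can attach to the bug:

# cfgadm -lav > /var/tmp/cfgadm-lav.out 2>&1
# prtconf -v  > /var/tmp/prtconf-v.out  2>&1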

I'll see that it gets moved to the correct group asap.


Cheers,
James
--
Senior Kernel Software Engineer, Solaris
Sun Microsystems
http://blogs.sun.com/jmcp   http://www.jmcp.homeunix.com/blog
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] zpool getting in a stuck state?

2009-10-22 Thread Jeremy Kitchen

Hey folks!

We're using zfs-based file servers for our backups and we've been  
having some issues as of late with certain situations causing zfs/ 
zpool commands to hang.


Currently, it appears that raid3155 is in this broken state:

r...@homiebackup10:~# ps auxwww | grep zfs
root 15873  0.0  0.0 4216 1236 pts/2   S 15:56:54  0:00 grep zfs
root 13678  0.0  0.1 7516 2176 ?       S 14:18:00  0:00 zfs list -t filesystem raid3155/angels
root 13691  0.0  0.1 7516 2188 ?       S 14:18:04  0:00 zfs list -t filesystem raid3155/blazers
root 13731  0.0  0.1 7516 2200 ?       S 14:18:20  0:00 zfs list -t filesystem raid3155/broncos
root 13792  0.0  0.1 7516 2220 ?       S 14:18:51  0:00 zfs list -t filesystem raid3155/diamondbacks
root 13910  0.0  0.1 7516 2216 ?       S 14:19:52  0:00 zfs list -t filesystem raid3155/knicks
root 13911  0.0  0.1 7516 2196 ?       S 14:19:53  0:00 zfs list -t filesystem raid3155/lions
root 13916  0.0  0.1 7516 2220 ?       S 14:19:55  0:00 zfs list -t filesystem raid3155/magic
root 13933  0.0  0.1 7516 2232 ?       S 14:20:01  0:00 zfs list -t filesystem raid3155/mariners
root 13966  0.0  0.1 7516 2212 ?       S 14:20:11  0:00 zfs list -t filesystem raid3155/mets
root 13971  0.0  0.1 7516 2208 ?       S 14:20:21  0:00 zfs list -t filesystem raid3155/niners
root 13982  0.0  0.1 7516 2220 ?       S 14:20:32  0:00 zfs list -t filesystem raid3155/padres
root 14064  0.0  0.1 7516 2220 ?       S 14:21:03  0:00 zfs list -t filesystem raid3155/redwings
root 14123  0.0  0.1 7516 2212 ?       S 14:21:20  0:00 zfs list -t filesystem raid3155/seahawks
root 14323  0.0  0.1 7420 2184 ?       S 14:22:51  0:00 zfs allow zfsrcv create,mount,receive,share raid3155
root 15245  0.0  0.1 7468 2256 ?       S 15:17:59  0:00 zfs create raid3155/angels
root 15250  0.0  0.1 7468 2244 ?       S 15:18:03  0:00 zfs create raid3155/blazers
root 15256  0.0  0.1 7468 2248 ?       S 15:18:19  0:00 zfs create raid3155/broncos
root 15284  0.0  0.1 7468 2256 ?       S 15:18:51  0:00 zfs create raid3155/diamondbacks
root 15322  0.0  0.1 7468 2260 ?       S 15:19:51  0:00 zfs create raid3155/knicks
root 15332  0.0  0.1 7468 2260 ?       S 15:19:53  0:00 zfs create raid3155/magic
root 15333  0.0  0.1 7468 2236 ?       S 15:19:53  0:00 zfs create raid3155/lions
root 15345  0.0  0.1 7468 2264 ?       S 15:20:01  0:00 zfs create raid3155/mariners
root 15355  0.0  0.1 7468 2260 ?       S 15:20:10  0:00 zfs create raid3155/mets
root 15363  0.0  0.1 7468 2252 ?       S 15:20:20  0:00 zfs create raid3155/niners
root 15368  0.0  0.1 7468 2256 ?       S 15:20:33  0:00 zfs create raid3155/padres
root 15384  0.0  0.1 7468 2256 ?       S 15:21:01  0:00 zfs create raid3155/redwings
root 15389  0.0  0.1 7468 2264 ?       S 15:21:20  0:00 zfs create raid3155/seahawks


attempting to do a zpool list hangs, as does attempting to do a zpool  
status raid3155.  Rebooting the system (forcefully) seems to 'fix' the  
problem, but once it comes back up, doing a zpool list or zpool status  
shows no issues with any of the drives.


(after a reboot):
r...@homiebackup10:~# zpool list
NAME   SIZE   USED  AVAILCAP  HEALTH  ALTROOT
raid3066  32.5T  18.1T  14.4T55%  ONLINE  -
raid3154  32.5T  18.2T  14.3T55%  ONLINE  -
raid3155  32.5T  18.7T  13.8T57%  ONLINE  -
raid3156  32.5T  22.0T  10.5T67%  ONLINE  -
rpool 59.5G  14.1G  45.4G23%  ONLINE  -

We are using silmech storform iserv r505 machines with 3x silmech  
storform D55J jbod sas expanders connected to LSI Logic SAS1068E B3  
esas cards all containing 1.5TB seagate 7200.11 sata hard drives.  We  
make a single striped raidz2 pool out of each chassis giving us ~29TB  
of storage out of each 'brick' and we use rsync to copy the data from  
the machines to be backed up.


They're currently running OpenSolaris 2009.06 (snv_111b)

We have had issues with the backplanes on these machines, but this  
particular machine has been up and running for nearly a year without  
any problems.  It's currently at about 50% capacity on all pools.


I'm not really sure how to proceed from here as far as getting debug  
information while it's hung like this.  I saw someone with similar  
issues post a few days ago but don't see any replies.  The thread  
title is [zfs-discuss] Problem with resilvering and faulty disk.   
We've been seeing that issue as well while rebuilding these drives.


Any assistance with this would be greatly appreciated, and any  
information you folks might need to help troubleshoot this issue I can  
provide, just let me know what you need!
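
Next time it hangs I can try grabbing kernel stacks of the stuck commands,
if that's useful; my guess at the incantation is something like:

# echo "::pgrep zpool | ::walk thread | ::findstack -v" | mdb -k
# echo "::pgrep zfs | ::walk thread | ::findstack -v" | mdb -k

but I'm happy to run whatever you suggest instead.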


-Jeremy
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-22 Thread Adam Cheal
I've filed the bug, but was unable to include the prtconf -v output as the 
comments field only accepted 15000 chars total. Let me know if there is 
anything else I can provide/do to help figure this problem out as it is 
essentially preventing us from doing any kind of heavy IO to these pools, 
including scrubbing.
-- 
This message posted from opensolaris.org
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS disk failure question

2009-10-22 Thread Jason Frank
Thank you for your follow-up.  The doc looks great.  Having good
examples goes a long way to helping others that have my problem.

Ideally, the replacement would all happen magically, and I would have
had everything marked as good, with one failed disk (like a certain
other storage vendor that has it's beefs with Sun does).  But, I can
live with detaching them if I have to.

Another thing that would be nice would be to receive notification of
disk failures from the OS via email or SMS (like the vendor I
previously alluded to), but I know I'm talking crazy now.

Jason

On Thu, Oct 22, 2009 at 2:15 PM, Cindy Swearingen
cindy.swearin...@sun.com wrote:
 Hi Jason,

 Since spare replacement is an important process, I've rewritten this
 section to provide 3 main examples, here:

 http://docs.sun.com/app/docs/doc/817-2271/gcvcw?a=view

 Scroll down the section:

 Activating and Deactivating Hot Spares in Your Storage Pool

 Example 4–7 Manually Replacing a Disk With a Hot Spare
 Example 4–8 Detaching a Hot Spare After the Failed Disk is Replaced
 Example 4–9 Detaching a Failed Disk and Using the Hot Spare

 The third example is your scenario. I finally listened to the answer,
 which is you must detach the original disk if you want to continue to
 use the spare and replace the original disk later. It all works as
 described.

 I see some other improvements coming with spare replacement and will
 provide details when they are available.

 Thanks,

 Cindy

 On 10/14/09 15:54, Jason Frank wrote:

 See, I get overly literal when working on failed production storage
 (and yes, I do have backups...)  I wasn't wanting to cancel the
 in-progress spare replacement.  I had a completed spare replacement,
 and I wanted to make it official.  So, that didn't really fit my
 scenario either.

 I'm glad you agree on the brevity of the detach subcommand man page.
 I would guess that the intricacies of the failure modes would probably
 lend itself to richer content than a man page.

 I'd really like to see some kind of web based wizard to walk through
 it  I doubt I'd get motivated to write it myself though.

 The web page Cindy pointed to does not cover how to make the
 replacement official either.  It gets close.  But at the end, it
 detaches the hot spare, and not the original disk.  Everything seems
 to be close, but not quite there.  Of course, now that I've been
 through this once, I'll remember all.  I'm just thinking of the
 children.

 Also, I wanted to try and reconstruct all of my steps from zpool
 history -i tank.  According to that, zpool decided to replace t7 with
 t11 this morning (why wasn't it last night?), and I offlined, onlined
 and detach of t7 and I was OK.  I did notice that the history records
 internal scrubs, but not resilvers,  It also doesn't record failed
 commands, or disk failures in a zpool.  It would be sweet to have a
 line that said something like marking vdev  /dev/dsk/c8t7d0s0 as
 UNAVAIL due to X read errors in Y minutes, Then we can really see
 what happened.

 Jason

 On Wed, Oct 14, 2009 at 4:32 PM, Eric Schrock eric.schr...@sun.com
 wrote:

 On 10/14/09 14:26, Jason Frank wrote:

 Thank you, that did the trick.  That's not terribly obvious from the
 man page though.  The man page says it detaches the devices from a
 mirror, and I had a raidz2.  Since I'm messing with production data, I
 decided I wasn't going to chance it when I was reading the man page.
 You might consider changing the man page, and explaining a little more
 what it means, maybe even what the circumstances look like where you
 might use it.

 This is covered in the Hot Spares section of the manpage:

    An in-progress spare replacement can be cancelled by detach-
    ing  the  hot  spare.  If  the  original  faulted  device is
    detached, then the hot spare assumes its place in the confi-
    guration,  and  is removed from the spare list of all active
    pools.

 It is true that the description for zpool detach is overly brief and
 could
 be expanded to include this use case.

 - Eric

 --
 Eric Schrock, Fishworks                    http://blogs.sun.com/eschrock


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


[zfs-discuss] cannot import 'rpool': one or more devices is currently unavailable

2009-10-22 Thread Tommy McNeely
I have a system whose rpool has gone defunct. The rpool is made of a 
single disk, which is a RAID 5EE volume made of all 8 146G disks in the box. 
The RAID card is an Adaptec brand card.  It was running nv_107, but it's 
currently net-booted to nv_121. I have already checked in the RAID card 
BIOS, and it says the volume is optimal. We had a power outage in 
BRM07 on Tuesday, and the system appeared to boot back up, but then went 
wonky. I power cycled it, and it came back to a grub prompt because it 
couldn't read the filesystem.


# uname -a
SunOS  5.11 snv_121 i86pc i386 i86pc

# zpool import
 pool: rpool
   id: 7197437773913332097
state: ONLINE
status: The pool was last accessed by another system.
action: The pool can be imported using its name or numeric identifier and
   the '-f' flag.
  see: http://www.sun.com/msg/ZFS-8000-EY
config:

   rpool   ONLINE
 c0t0d0s0  ONLINE
# zpool import -f 7197437773913332097
cannot import 'rpool': one or more devices is currently unavailable
#

# zpool import -a -f -R /a
cannot import 'rpool': one or more devices is currently unavailable
# zdb -l /dev/dsk/c0t0d0s0

LABEL 0

   version=14
   name='rpool'
   state=0
   txg=742622
   pool_guid=7197437773913332097
   hostid=4930069
   hostname=''
   top_guid=5620634672424557591
   guid=5620634672424557591
   vdev_tree
   type='disk'
   id=0
   guid=5620634672424557591
   path='/dev/dsk/c0t0d0s0'
   devid='id1,s...@tsun_stk_raid_intefd1dfe0/a'
   phys_path='/p...@0,0/pci8086,3...@4/pci108e,2...@0/d...@0,0:a'
   whole_disk=0
   metaslab_array=24
   metaslab_shift=33
   ashift=9
   asize=880083730432
   is_log=0

LABEL 1

   version=14
   name='rpool'
   state=0
   txg=742622
   pool_guid=7197437773913332097
   hostid=4930069
   hostname=''
   top_guid=5620634672424557591
   guid=5620634672424557591
   vdev_tree
   type='disk'
   id=0
   guid=5620634672424557591
   path='/dev/dsk/c0t0d0s0'
   devid='id1,s...@tsun_stk_raid_intefd1dfe0/a'
   phys_path='/p...@0,0/pci8086,3...@4/pci108e,2...@0/d...@0,0:a'
   whole_disk=0
   metaslab_array=24
   metaslab_shift=33
   ashift=9
   asize=880083730432
   is_log=0

LABEL 2

   version=14
   name='rpool'
   state=0
   txg=742622
   pool_guid=7197437773913332097
   hostid=4930069
   hostname=''
   top_guid=5620634672424557591
   guid=5620634672424557591
   vdev_tree
   type='disk'
   id=0
   guid=5620634672424557591
   path='/dev/dsk/c0t0d0s0'
   devid='id1,s...@tsun_stk_raid_intefd1dfe0/a'
   phys_path='/p...@0,0/pci8086,3...@4/pci108e,2...@0/d...@0,0:a'
   whole_disk=0
   metaslab_array=24
   metaslab_shift=33
   ashift=9
   asize=880083730432
   is_log=0

LABEL 3

   version=14
   name='rpool'
   state=0
   txg=742622
   pool_guid=7197437773913332097
   hostid=4930069
   hostname=''
   top_guid=5620634672424557591
   guid=5620634672424557591
   vdev_tree
   type='disk'
   id=0
   guid=5620634672424557591
   path='/dev/dsk/c0t0d0s0'
   devid='id1,s...@tsun_stk_raid_intefd1dfe0/a'
   phys_path='/p...@0,0/pci8086,3...@4/pci108e,2...@0/d...@0,0:a'
   whole_disk=0
   metaslab_array=24
   metaslab_shift=33
   ashift=9
   asize=880083730432
   is_log=0
# zdb -cu -e -d /dev/dsk/c0t0d0s0
zdb: can't open /dev/dsk/c0t0d0s0: No such file or directory
# zdb -e rpool -cu
zdb: can't open rpool: No such device or address
# zdb -e 7197437773913332097
zdb: can't open 7197437773913332097: No such device or address
#

I obviously have no clue how to wield zdb.
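
As an aside, zdb generally wants its options ahead of the pool argument;
a hedged sketch of the usual forms (no promise they get any further with
this particular pool):

# zdb -l /dev/dsk/c0t0d0s0           # label dump against the device, as run above
# zdb -e -cu rpool                   # examine an unimported pool by name
# zdb -e -cu 7197437773913332097     # ...or by numeric pool guid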

Any help you can offer would be appreciated.

Thanks,
Tommy

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] ZFS disk failure question

2009-10-22 Thread Richard Elling

On Oct 22, 2009, at 12:29 PM, Jason Frank wrote:


 Thank you for your follow-up.  The doc looks great.  Having good
 examples goes a long way toward helping others who have my problem.

 Ideally, the replacement would all happen magically, and I would have
 had everything marked as good, with one failed disk (like a certain
 other storage vendor that has its beefs with Sun does).  But I can
 live with detaching them if I have to.


The zpool autoreplace property manages the policy for automatic
replacement in ZFS. I presume it will work for most cases, but am
less sure when a RAID controller hides the disk from the OS behind
a volume.  Does anyone have direct experience with this?
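
For reference, the property itself is just a pool-level setting (pool
name tank assumed):

# zpool get autoreplace tank      # off by default
# zpool set autoreplace=on tank   # let a new device found in the same physical
                                  # location replace the failed one automatically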


Another thing that would be nice would be to receive notification of
disk failures from the OS via email or SMS (like the vendor I
previously alluded to), but I know I'm talking crazy now.


Configure an SNMP monitor to do as you wish. FMA generates SNMP
traps when something like that occurs. Solaris ships with net-snmp;
see snmpd(1M) for more info.
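
A rough sketch of where to start; the service and config paths below are
an assumption of the usual Solaris 10/OpenSolaris layout, so check the
FMA and net-snmp docs for the exact MIB/module wiring:

# fmadm config | grep snmp-trapgen    # confirm the fmd SNMP trap plugin is loaded
# svcadm enable sma                   # net-snmp (System Management Agent) service
# vi /etc/sma/snmp/snmpd.conf         # add a trapsink/community pointing at your
                                      # monitor, then restart the sma service
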
 -- richard

___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] SNV_125 MPT warning in logfile

2009-10-22 Thread Carson Gaspar

On 10/22/09 4:07 PM, James C. McPherson wrote:

Adam Cheal wrote:

It seems to be timing out accessing a disk, retrying, giving up and then
doing a bus reset?

...

ugh. New bug time - bugs.opensolaris.org, please select
Solaris / kernel / driver-mpt. In addition to the error
messages and description of when you see it, please provide
output from

cfgadm -lav
prtconf -v

I'll see that it gets moved to the correct group asap.


FYI this is very similar to the behaviour I was seeing with my directly attached 
SATA disks on snv_118 (see the list archives for my original messages). I have 
not yet seen the error since I replaced my Hitachi 500 GB disks with Seagate 
1.5 TB disks, so it could very well have been some unfortunate LSI firmware / 
Hitachi drive firmware interaction.


carson:gandalf 0 $ gzcat /var/adm/messages.2.gz  | ggrep -4 mpt | tail -9
Oct  8 00:44:17 gandalf.taltos.org scsi: [ID 365881 kern.notice] 
/p...@0,0/pci8086,2...@1c/pci1000,3...@0 (mpt0):

Oct  8 00:44:17 gandalf.taltos.org  Log info 0x3113 received for target 
1.
Oct  8 00:44:17 gandalf.taltos.org  scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Oct  8 00:44:17 gandalf.taltos.org scsi: [ID 365881 kern.notice] 
/p...@0,0/pci8086,2...@1c/pci1000,3...@0 (mpt0):

Oct  8 00:44:17 gandalf.taltos.org  Log info 0x3113 received for target 
1.
Oct  8 00:44:17 gandalf.taltos.org  scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc
Oct  8 00:44:17 gandalf.taltos.org scsi: [ID 365881 kern.notice] 
/p...@0,0/pci8086,2...@1c/pci1000,3...@0 (mpt0):

Oct  8 00:44:17 gandalf.taltos.org  Log info 0x3113 received for target 
1.
Oct  8 00:44:17 gandalf.taltos.org  scsi_status=0x0, ioc_status=0x8048, 
scsi_state=0xc


carson:gandalf 1 $ gzcat /var/adm/messages.2.gz  | sed -ne 's,^.*\(Log 
info\),\1,p' | sort -u

Log info 0x31110b00 received for target 7.
Log info 0x3113 received for target 0.
Log info 0x3113 received for target 1.
Log info 0x3113 received for target 2.
Log info 0x3113 received for target 3.
Log info 0x3113 received for target 4.
Log info 0x3113 received for target 6.
Log info 0x3113 received for target 7.
Log info 0x3114 received for target 0.
Log info 0x3114 received for target 1.
Log info 0x3114 received for target 2.
Log info 0x3114 received for target 3.
Log info 0x3114 received for target 4.
Log info 0x3114 received for target 6.
Log info 0x3114 received for target 7.

carson:gandalf 0 $ gzcat /var/adm/messages.2.gz  | sed -ne 
's,^.*\(scsi_status\),\1,p' | sort -u

scsi_status=0x0, ioc_status=0x8048, scsi_state=0xc
scsi_status=0x0, ioc_status=0x804b, scsi_state=0xc

--
Carson
___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss


Re: [zfs-discuss] moving files from one fs to another, splittin/merging

2009-10-22 Thread David Turnbull

On 21/10/2009, at 7:39 AM, Mike Bo wrote:

Once data resides within a pool, there should be an efficient method  
of moving it from one ZFS file system to another. Think Link/Unlink  
vs. Copy/Remove.


I agree with this sentiment; it's certainly a surprise when you first  
notice it.


Here's my scenario... When I originally created a 3TB pool, I didn't  
know the best way to carve up the space, so I used a single, flat ZFS  
file system. Now that I'm more familiar with ZFS, managing the  
subdirectories as separate file systems would have made a lot more  
sense (separate policies, snapshots, etc.). The problem is that some  
of these directories contain tens of thousands of files and many  
hundreds of gigabytes. Copying this much data between file systems  
within the same disk pool just seems wrong.


I hope such a feature is possible and not too difficult to  
implement, because I'd like to see this capability in ZFS.


It doesn't seem unreasonable. It seems like the different properties  
available on the given datasets (recordsize, checksum, compression,  
encryption, copies, version, utf8only, casesensitivity) would have to  
match, or else fall back to blind copying?
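
A quick way to eyeball whether two datasets line up on those settings,
with hypothetical dataset names tank/a and tank/b:

# zfs get -o name,property,value \
    recordsize,checksum,compression,copies,utf8only,casesensitivity \
    tank/a tank/b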




Regards,
mikebo
--
This message posted from opensolaris.org


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss