[zfs-discuss] test

2011-04-15 Thread Jerry Kemp
I have not seen any email from this list in a couple of days.


[zfs-discuss] zpool scrub on b123

2011-04-15 Thread Karl Rossing

Hi,

One of our ZFS volumes seems to be having some errors, so I ran a zpool scrub, and it's currently showing the following.


-bash-3.2$ pfexec /usr/sbin/zpool status -x
  pool: vdipool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub in progress for 3h10m, 13.53% done, 20h16m to go
config:

        NAME         STATE     READ WRITE CKSUM
        vdipool      ONLINE       0     0     0
          raidz1     ONLINE       0     0     0
            c9t14d0  ONLINE       0     0    12  6K repaired
            c9t15d0  ONLINE       0     0    13  167K repaired
            c9t16d0  ONLINE       0     0    11  5.50K repaired
            c9t17d0  ONLINE       0     0    20  10K repaired
            c9t18d0  ONLINE       0     0    15  7.50K repaired
        spares
          c9t19d0    AVAIL

errors: No known data errors
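
(While the scrub runs I've just been re-checking it with something along these lines; the -v should also list any affected files, though so far it reports none:)

-bash-3.2$ pfexec /usr/sbin/zpool status -v vdipool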


I have another server connected to the same JBOD, using drives c8t1d0 through c8t13d0, and it doesn't seem to have any errors.


I'm wondering how it could have gotten so screwed up.

Karl







Re: [zfs-discuss] zpool scrub on b123

2011-04-15 Thread Cindy Swearingen

Hi Karl...

I just saw this same condition on another list. I think the poster
resolved it by replacing the HBA.

Drives go bad but they generally don't all go bad at once, so I would
suspect some common denominator like the HBA/controller, cables, and
so on.

See what FMA thinks by running fmdump like this:

# fmdump
TIME                 UUID                                 SUNW-MSG-ID
Apr 11 16:02:38.2262 ed0bdffe-3cf9-6f46-f20c-99e2b9a6f1cb ZFS-8000-D3
Apr 11 16:22:23.8401 d4157e2f-c46d-c1e9-c05b-f2d3e57f3893 ZFS-8000-D3
Apr 14 15:55:26.1918 71bd0b08-60c2-e114-e1bc-daa03d7b163f ZFS-8000-D3

This output will tell you when the problem started.
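
If you want more detail on any one of these, fmdump can also pull the full fault record and the underlying ereports; something like this should do it (using one of the UUIDs above):

# fmdump -u 71bd0b08-60c2-e114-e1bc-daa03d7b163f -V
# fmdump -eV | more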

If fmdump points to problems across multiple drives, which it probably will, I would run diagnostics on the HBA or have it replaced.

Always have good backups.

Thanks,

Cindy




Re: [zfs-discuss] zpool scrub on b123

2011-04-15 Thread Cindy Swearingen

D'oh. One more thing.

We had a problem in b120-123 that caused random checksum errors on RAIDZ 
configs. This info is still in the ZFS troubleshooting guide.


See if a zpool clear resolves these errors. If that works, then I would
upgrade to a more recent build and see if the problem is resolved
completely.
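
Roughly, once the current scrub finishes, something like:

# zpool clear vdipool
# zpool scrub vdipool
# zpool status -v vdipool

If the checksum counters stay at zero after the new scrub, the b120-b123 issue is the more likely culprit.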

If not, then see the HBA recommendation in my previous message.

Thanks,

Cindy



Re: [zfs-discuss] zpool scrub on b123

2011-04-15 Thread Karl Rossing

I'm going to wait until the scrub is complete before digging in any further.

I'm wondering if replacing the LSI SAS 3801E with an LSI SAS 9200-8e 
might help too.
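
Before swapping hardware I'll probably also check whether the controller has been logging resets or timeouts, something along the lines of:

-bash-3.2$ egrep -i 'mpt|scsi' /var/adm/messages | tail -20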


Karl



Re: [zfs-discuss] zpool scrub on b123

2011-04-15 Thread Karl Rossing

Would moving the pool to a Solaris 10U9 server fix the random RAIDZ errors?



Re: [zfs-discuss] zpool scrub on b123

2011-04-15 Thread Cindy Swearingen

Yes, the Solaris 10 9/10 release has the fix for RAIDZ checksum errors
if you have ruled out any hardware problems.
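
Moving the pool itself should just be an export on the b123 host and an import on the Solaris 10 host, roughly:

# zpool export vdipool
# zpool import vdipool

One caveat: compare 'zpool upgrade -v' on both systems first, since a pool can only be imported on a host that supports its on-disk version.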

cs