Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2013-01-14 Thread Cindy Swearingen

Hi Jamie,

Yes, that is correct.

The S11u1 version of this bug is:

https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=15852599

and has this notation which means Solaris 11.1 SRU 3.4:

Changeset pushed to build 0.175.1.3.0.4.0
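
For readers decoding the build string above, here is a minimal sketch of how the branch version might be split into its update and SRU parts. The helper name sru_from_branch and the assumed field layout (0.175.<update>.<sru>.<platform>.<build>.<nightly>) are illustrative assumptions based on the mapping stated in this message; this is not an Oracle tool.

```shell
#!/bin/sh
# sru_from_branch is a hypothetical helper, not part of Solaris.
# Assumed layout: 0.175.<update>.<sru>.<platform>.<build>.<nightly>
sru_from_branch() {
    # Split on "." and report the update ($3), SRU ($4) and build ($6) fields.
    echo "$1" | awk -F. '{ printf "Solaris 11.%s SRU %s.%s\n", $3, $4, $6 }'
}

# Example with the build string quoted in this message.
sru_from_branch 0.175.1.3.0.4.0
```

On a live system the installed branch should be visible in the output of `pkg info entire`, which can then be passed to the helper.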

Thanks,

Cindy

On 01/11/13 19:10, Jamie Krier wrote:

It appears this bug has been fixed in Solaris 11.1 SRU 3.4

7191375 15809921 SUNBT7191375 metadata rewrites should coordinate with l2arc


Cindy can you confirm?

Thanks


___
zfs-discuss mailing list
zfs-discuss@opensolaris.org
http://mail.opensolaris.org/mailman/listinfo/zfs-discuss



Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2013-01-14 Thread Tomas Forsman
On 14 January, 2013 - Cindy Swearingen sent me these 2,3K bytes:

 Hi Jamie,

 Yes, that is correct.

 The S11u1 version of this bug is:

 https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=15852599

Host oraclecorp.com not found: 3(NXDOMAIN)

Would oracle.internal be a better domain name?



/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2013-01-14 Thread Ian Collins

Cindy Swearingen wrote:

Hi Jamie,

Yes, that is correct.

The S11u1 version of this bug is:

https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=15852599

and has this notation which means Solaris 11.1 SRU 3.4:

Changeset pushed to build 0.175.1.3.0.4.0

Hello Cindy,

I really really hope this will be a public update.  Within a week of 
upgrading to 11.1 I hit this bug and I had to rebuild my main pool. I'm 
still restoring backups.


Without this fix, 11.1 is a bomb waiting to go off!

--
Ian.


Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2013-01-14 Thread Nico Williams
On Mon, Jan 14, 2013 at 1:48 PM, Tomas Forsman st...@acc.umu.se wrote:
 https://bug.oraclecorp.com/pls/bug/webbug_print.show?c_rptno=15852599

 Host oraclecorp.com not found: 3(NXDOMAIN)

 Would oracle.internal be a better domain name?

Things like that cannot be changed easily.  They (Oracle) are stuck
with that domain name for the foreseeable future.  Also, whoever thought
it up probably didn't consider leakage of internal URIs to the
outside.  *shrug*


Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2013-01-14 Thread Cindy Swearingen

I believe the bug.oraclecorp.com URL is accessible with a support
contract, but it's difficult for me to test.

I should have mentioned it. I apologize.

cs



Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2013-01-14 Thread Tomas Forsman
On 14 January, 2013 - Cindy Swearingen sent me these 1,0K bytes:

 I believe the bug.oraclecorp.com URL is accessible with a support
 contract, but its difficult for me to test.

Support contract or not, the domain is not exposed to the internet.

 I should have mentioned it. I apologize.

 cs



/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se


Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2013-01-11 Thread Jamie Krier
It appears this bug has been fixed in Solaris 11.1 SRU 3.4

7191375 15809921 SUNBT7191375 metadata rewrites should coordinate with l2arc

Cindy can you confirm?

Thanks




Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2013-01-04 Thread Robert Milkowski


 
 Illumos is not so good at dealing with huge memory systems but perhaps
 it is also more stable as well.

Well, I guess that it depends on your environment, but generally I would
expect S11 to be more stable, if only because of the sheer number of bugs
reported by paying customers and fixed by Oracle, fixes that Illumos is not
getting (lack of resources, limited usage, etc.).

-- 
Robert Milkowski
http://milek.blogspot.com




Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2013-01-04 Thread Richard Elling
On Jan 4, 2013, at 11:12 AM, Robert Milkowski rmilkow...@task.gda.pl wrote:

 
 Illumos is not so good at dealing with huge memory systems but perhaps
 it is also more stable as well.
 
 Well, I guess that it depends on your environment, but generally I would
 expect S11 to be more stable if only because the sheer amount of bugs
 reported by paid customers and bug fixes by Oracle that Illumos is not
 getting (lack of resource, limited usage, etc.).


This is a two-edged sword. Software reliability analysis shows that the
most reliable software is the software that is oldest and unchanged. But
people also want new functionality. So while Oracle has more changes
being implemented in Solaris, those changes destabilize it while simultaneously
improving reliability. Unfortunately, it is hard to get both wins. What is more
likely is that new features are being driven into Solaris 11 that are
destabilizing. By contrast, the number of new features being added to
illumos-gate (not to be confused with illumos-based distros) is relatively
modest, and in no case are they gratuitous.
 -- richard

--

richard.ell...@richardelling.com
+1-760-896-4422











Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2012-12-30 Thread cindy swearingen
Existing Solaris 10 releases are not impacted. S10u11 isn't released yet, so
I think we can assume that this upcoming Solaris 10 release will include a
preventative fix.

Thanks, Cindy

On Thu, Dec 27, 2012 at 11:11 PM, Andras Spitzer wsen...@gmail.com wrote:

 Josh,

 You mention that Oracle is preparing patches for both Solaris 11.2 and
 S10u11. Does that mean that the bug exists in Solaris 10 as well? I may be
 wrong, but Cindy mentioned the bug is only in Solaris 11.

 Regards,
 sendai



Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2012-12-28 Thread Andras Spitzer
Josh,

You mention that Oracle is preparing patches for both Solaris 11.2 and
S10u11. Does that mean that the bug exists in Solaris 10 as well? I may be
wrong, but Cindy mentioned the bug is only in Solaris 11.

Regards,
sendai


Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2012-12-27 Thread Josh Simon

Hi,

We were hit by this bug as well on Solaris 11 (2+ months ago). Our only 
options were to import the pool read-only and transfer the data off to 
another system or restore from backups.
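
A minimal sketch of the read-only import mentioned above. The pool name "tank" is a placeholder, and run is a hypothetical wrapper added here so the command can be printed and reviewed (DRY_RUN=1) before anything touches a damaged pool.

```shell
#!/bin/sh
# run is a hypothetical wrapper: with DRY_RUN=1 it prints the command
# instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Import a pool read-only so its data can be copied to another system.
import_readonly() {
    run zpool import -o readonly=on "$1"
}

# Print the command for a placeholder pool without running it.
DRY_RUN=1 import_readonly tank
```

Whether the import succeeds depends on how badly the pool is damaged; several posters in this thread describe it as a matter of luck.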


Oracle told us that the bug is caused by a race condition within 
read/re-write operations on the same block. There is a small window of 
opportunity for the in-memory data (in the ARC) for an individual data 
block to become corrupt when being re-written while a read request is 
in-flight for the same block and the pass is greater than 1.


Assuming the re-write operation completes first, the read operation 
overwrites the in-memory copy using the older/stale on-disk data 
(corruption). If the read is completed before the re-write no corruption 
is seen. It's a very specific set of circumstances needed to reproduce 
the issue. The reason why metaslabs are more commonly affected is due to 
the fact they're re-written within the same birthtime more frequently 
than any other object.


Solaris 11.1 has a new feature (ZIO Join) that allows multiple read 
requests for the same data block to issue just 1 IO instead of 
individual IOs for each request. The bug still exists in S11.1 but the 
new code reduces the window of opportunity for this bug to almost zero. 
The complete bug fix has already been implemented in Solaris 12 and is 
currently being tested in Solaris 11.2 and S10u11. From there it will be 
put into an SRU for S11.1 (I assume S11.0 as well).


I followed up with Oracle today and was told that their investigation 
uncovered that a rewrite may inherit a previous copy of a metadata block 
cached in L2ARC. As soon as the rewritten block is evicted from ARC, the 
next read will fetch a stale inherited copy from L2ARC. So not using 
L2ARC or cache devices sounds like a good idea to me!
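
For anyone acting on that advice, here is a rough sketch of finding the cache devices in a pool so they can be removed. list_cache_devs is a hypothetical helper that assumes the usual `zpool status` layout, where cache devices are listed under a line containing only the word "cache"; the pool name "tank" in the usage comment is a placeholder.

```shell
#!/bin/sh
# list_cache_devs is a hypothetical helper: it reads `zpool status` text
# on stdin and prints the device names listed under the "cache" heading.
list_cache_devs() {
    awk '
        # A line holding only the word "cache" opens the cache section.
        $1 == "cache" && NF == 1 { in_cache = 1; next }
        # A blank line, another one-word heading, or "errors:" closes it.
        in_cache && (NF == 0 || NF == 1 || $1 == "errors:") { in_cache = 0 }
        in_cache { print $1 }
    '
}

# Usage sketch (not executed here):
#   zpool status tank | list_cache_devs | while read dev; do
#       zpool remove tank "$dev"
#   done
```

Removing a cache device with `zpool remove` does not touch pool data, since L2ARC only holds copies of blocks that also live on the main pool devices.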


Hopefully this nasty bug is fixed soon :(

Thanks,

Josh Simon






Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2012-12-19 Thread Cindy Swearingen

Hi Everyone,

I was mistaken. The ZFSSA is not impacted by this bug.

I provided a set of steps below to help identify this problem.

If you file an SR, an IDR can be applied. Otherwise, you will
need to wait for the SRU.

Thanks,

Cindy

If you are running S11 or S11.1 and you have a ZFS storage
pool with separate cache devices, consider running these
steps to identify whether your pool is impacted.

Until an IDR is applied or the SRU is available, remove the
cache devices.

1. Export the pool.

This step is necessary because zdb needs to be run on a
quiet pool.

# zpool export pool-name

2. Run zdb to identify space map inconsistencies.

# zdb -emm pool-name

3. Based on running zdb, determine your next step:

A. If zdb completes successfully, scrub the pool.

# zpool import pool-name
# zpool scrub pool-name

If scrubbing the pool finds no issues, then your pool
is most likely not impacted by this problem.

If scrubbing the pool finds permanent metadata errors,
then you should open an SR.

B. If zdb doesn't complete successfully, open an SR.
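
The steps above can be sketched as a small script. This is only an illustration: run and check_pool are hypothetical names, and the sketch assumes that zdb exits with a non-zero status when it cannot complete. With DRY_RUN=1 the commands are printed instead of executed.

```shell
#!/bin/sh
# run is a hypothetical wrapper: with DRY_RUN=1 it prints the command
# instead of executing it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# check_pool walks through steps 1 to 3 for one pool.
check_pool() {
    pool=$1
    # Step 1: zdb needs a quiet pool, so export it first.
    run zpool export "$pool" || return 1
    # Step 2: look for space map inconsistencies.
    if run zdb -emm "$pool"; then
        # Step 3A: zdb completed, so import the pool again and scrub it.
        run zpool import "$pool" || return 1
        run zpool scrub "$pool" || return 1
        echo "scrub started on $pool; review 'zpool status -v $pool' when it completes"
    else
        # Step 3B: zdb did not complete; the pool is still exported here.
        echo "zdb did not complete on $pool; open an SR" >&2
        return 2
    fi
}

# Preview the sequence without running anything:
#   DRY_RUN=1 check_pool pool-name
```

The scrub result still has to be read by hand: permanent metadata errors in `zpool status -v` mean an SR should be opened, as described in step 3A.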




On 12/18/12 09:45, Cindy Swearingen wrote:

Hi Sol,

The appliance is affected as well.

I apologize. The MOS article is for internal diagnostics.

I'll provide a set of steps to identify this problem
as soon as I understand them better.

Thanks, Cindy

On 12/18/12 05:27, sol wrote:

*From:* Cindy Swearingen cindy.swearin...@oracle.com
No doubt. This is a bad bug and we apologize.
1. If you are running Solaris 11 or Solaris 11.1 and have separate
cache devices, you should remove them to avoid this problem.

How is the 7000-series storage appliance affected?

2. A MOS knowledge article (1497293.1) is available to help diagnose
this problem.

MOS isn't able to find this article when I search for it.




Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2012-12-18 Thread sol
From: Cindy Swearingen cindy.swearin...@oracle.com
No doubt. This is a bad bug and we apologize.
1. If you are running Solaris 11 or Solaris 11.1 and have separate
cache devices, you should remove them to avoid this problem.

How is the 7000-series storage appliance affected?

2. A MOS knowledge article (1497293.1) is available to help diagnose
this problem.

MOS isn't able to find this article when I search for it.


Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2012-12-18 Thread Cindy Swearingen

Hi Sol,

The appliance is affected as well.

I apologize. The MOS article is for internal diagnostics.

I'll provide a set of steps to identify this problem
as soon as I understand them better.

Thanks, Cindy

On 12/18/12 05:27, sol wrote:

*From:* Cindy Swearingen cindy.swearin...@oracle.com
No doubt. This is a bad bug and we apologize.
1. If you are running Solaris 11 or Solaris 11.1 and have separate
cache devices, you should remove them to avoid this problem.

How is the 7000-series storage appliance affected?

2. A MOS knowledge article (1497293.1) is available to help diagnose
this problem.

MOS isn't able to find this article when I search for it.




Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2012-12-17 Thread Cindy Swearingen

Hi Jamie,

No doubt. This is a bad bug and we apologize.

Below is a misconception that this bug is related to the VM2 project.
It is not. It's related to a problem that was introduced in the ZFS ARC
code.

If you would send me your SR number privately, we can work with the
support person to correct this misconception.

We agree with Thomas's advice that you should remove separate cache
devices to help alleviate this problem.

To summarize:

1. If you are running Solaris 11 or Solaris 11.1 and have separate
cache devices, you should remove them to avoid this problem.

When the SRU that fixes this problem is available, apply the SRU.

Solaris 10 releases are not impacted.

2. A MOS knowledge article (1497293.1) is available to help diagnose
this problem.

3. File a MOS SR to get access to the IDR.

4. We hope to have the SRU information available in a few days.

Thanks, Cindy

On 12/12/12 11:21, Jamie Krier wrote:

I've hit this bug on four of my Solaris 11 servers. Looking for anyone
else who has seen it, as well as comments/speculation on cause.

This bug is pretty bad.  If you are lucky you can import the pool
read-only and migrate it elsewhere.

I've also tried setting zfs:zfs_recover=1,aok=1 with varying results.


http://docs.oracle.com/cd/E26502_01/html/E28978/gmkgj.html#scrolltoc


Hardware platform:

Supermicro X8DAH

144GB ram

Supermicro sas2 jbods

LSI 9200-8e controllers (Phase 13 fw)

ZeusRAM log

ZeusIOPS sas l2arc

Seagate ST33000650SS sas drives


All four servers are running the same hardware, so at first I suspected
a problem there.  I opened a ticket with Oracle which ended with this email:

-

We strongly expect that this is a software issue because this problem
does not happen on Solaris 10. On Solaris 11, it happens with both the
SPARC and the X64 versions of Solaris.

We have quite a few customers who have seen this issue and we are in the
process of working on a fix. Because we do not know the source of the
problem yet, I cannot speculate on the time to fix. This particular
portion of Solaris 11 (the virtual memory sub-system) is quite different
than in Solaris 10. We re-wrote the memory management in order to get
ready for systems with much more memory than Solaris 10 was designed to
handle.

Because this is the memory management system, there is not expected to
be any work-around.

Depending on your company's requirements, one possibility is to use
Solaris 10 until this issue is resolved.

I apologize for any inconvenience that this bug may cause. We are
working on it as a Sev 1 Priority 1 in sustaining engineering.

-


I am thinking about switching to an Illumos distro, but wondering if
this problem may be present there as well.


Thanks


- Jamie





Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2012-12-14 Thread Jamie Krier
I have removed all L2arc devices as a precaution.  Has anyone seen this
error with no L2arc device configured?




Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2012-12-13 Thread Bob Friesenhahn

On Wed, 12 Dec 2012, Jamie Krier wrote:



I am thinking about switching to an Illumos distro, but wondering if this 
problem may be present there
as well. 


I believe that Illumos was forked before this new virtual memory 
sub-system was added to Solaris.  There have not been such reports on 
Illumos or OpenIndiana mailing lists and I don't recall seeing this 
issue in the bug trackers.

Illumos is not so good at dealing with huge memory systems but 
perhaps it is also more stable.


Bob
--
Bob Friesenhahn
bfrie...@simple.dallas.tx.us, http://www.simplesystems.org/users/bfriesen/
GraphicsMagick Maintainer, http://www.GraphicsMagick.org/


[zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2012-12-12 Thread Jamie Krier
I've hit this bug on four of my Solaris 11 servers. Looking for anyone else
who has seen it, as well as comments/speculation on cause.

This bug is pretty bad.  If you are lucky you can import the pool read-only
and migrate it elsewhere.

I've also tried setting zfs:zfs_recover=1,aok=1 with varying results.


http://docs.oracle.com/cd/E26502_01/html/E28978/gmkgj.html#scrolltoc


Hardware platform:

Supermicro X8DAH

144GB ram

Supermicro sas2 jbods

LSI 9200-8e controllers (Phase 13 fw)

ZeusRAM log

ZeusIOPS sas l2arc

Seagate ST33000650SS sas drives


All four servers are running the same hardware, so at first I suspected a
problem there.  I opened a ticket with Oracle which ended with this email:

-

We strongly expect that this is a software issue because this problem
does not happen on Solaris 10. On Solaris 11, it happens with both the
SPARC and the X64 versions of Solaris.

We have quite a few customers who have seen this issue and we are in the
process of working on a fix. Because we do not know the source of the
problem yet, I cannot speculate on the time to fix. This particular
portion of Solaris 11 (the virtual memory sub-system) is quite different
than in Solaris 10. We re-wrote the memory management in order to get
ready for systems with much more memory than Solaris 10 was designed to
handle.

Because this is the memory management system, there is not expected to
be any work-around.

Depending on your company's requirements, one possibility is to use
Solaris 10 until this issue is resolved.

I apologize for any inconvenience that this bug may cause. We are
working on it as a Sev 1 Priority 1 in sustaining engineering.

-


I am thinking about switching to an Illumos distro, but wondering if this
problem may be present there as well.


Thanks


- Jamie


Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2012-12-12 Thread Thomas Nau
Jamie
We ran into the same problem and had to migrate the pool while it was imported
read-only. On top of that, we were advised NOT to use an L2ARC. Maybe you
should consider that as well.

Thomas


 


Re: [zfs-discuss] Solaris 11 System Reboots Continuously Because of a ZFS-Related Panic (7191375)

2012-12-12 Thread Tomas Forsman
On 12 December, 2012 - Thomas Nau sent me these 7,3K bytes:

 Jamie
 We ran into the same problem and had to migrate the pool while it was
 imported read-only. On top of that, we were advised NOT to use an L2ARC.
 Maybe you should consider that as well.

We also ran into something similar, imported read-only and created a new
pool. A few months later, we ran into an L2ARC bug (15809921), for which
we've received an IDR that we have not applied yet.

This bug caused the following:
errors: Permanent errors have been detected in the following files:

metadata:0x132c1f

on a 3x3 mirrored pool (triple-mirroring), all 9 disks had checksum
errors.




/Tomas
-- 
Tomas Forsman, st...@acc.umu.se, http://www.acc.umu.se/~stric/
|- Student at Computing Science, University of Umeå
`- Sysadmin at {cs,acc}.umu.se