RE: Raid Arrays and Power Loss

2003-09-16 Thread Igor Neyman
Ian,

Thanks for sharing (seriously).

Igor Neyman, OCP DBA
[EMAIL PROTECTED]



-Original Message-
MacGregor, Ian A.
Sent: Monday, September 15, 2003 11:34 PM
To: Multiple recipients of list ORACLE-L

Last Friday was hot here, and rumor has it our 230 kV power line
sagged and touched some tree branches.  The local power company shut it
off, leaving our systems to depend on UPS.  About 30 minutes afterwards
one system produced these errors.  This was just before the system went
dead.

Fri Sep 12 12:58:40 2003
Errors in file /opt/oracle/admin/BBRO/bdump/bbro_ckpt_1420.trc:
ORA-00206: error in writing (block 3, # blocks 1) of controlfile
ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
ORA-27063: skgfospo: number of bytes read/written is incorrect
SVR4 Error: 5: I/O error
Additional information: -1
Additional information: 8192
Fri Sep 12 12:58:42 2003
Errors in file /opt/oracle/admin/BBRO/bdump/bbro_ckpt_1420.trc:
ORA-00221: error on write to controlfile
ORA-00206: error in writing (block 3, # blocks 1) of controlfile
ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
ORA-27063: skgfospo: number of bytes read/written is incorrect
SVR4 Error: 5: I/O error
Additional information: -1
Additional information: 8192
Fri Sep 12 12:58:42 2003
CKPT: terminating instance due to error 221
Instance terminated by CKPT, pid = 1420

-
Things look pretty shaky here.  When things were restarted the following
error was produced.
Fri Sep 12 13:32:01 2003
ORA-00204: error in reading (block 1, # blocks 1) of controlfile
ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
ORA-27091: skgfqio: unable to queue I/O
SVR4 Error: 6: No such device or address
Additional information: 1

The RAID array had not been powered on.

---
However 
Fri Sep 12 15:33:08 2003
ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3
Fri Sep 12 15:33:11 2003
ORA-205 signalled during: alter database  mount...

Now the file system is available, but the file itself has disappeared.
It was not corrupted, just gone.  We duplex a copy to an internal
disk, so recovery was easy.

However once this was fixed

Fri Sep 12 16:18:58 2003
Thread recovery: start rolling forward thread 1
Fri Sep 12 16:18:58 2003
Errors in file /opt/oracle/admin/BBRO/udump/bbro_ora_1804.trc:
ORA-00313: open failed for members of log group 3 of thread 1
ORA-00312: online log 3 thread 1: '/u2/oradata/BBRO/redo0301.log'
ORA-27037: unable to obtain file status
SVR4 Error: 2: No such file or directory
Additional information: 3
ORA-313 signalled during: ALTER DATABASE OPEN...

-
These files are on a RAID 1 LUN.  Both copies of the file are gone.
Again, not corrupted but gone.  I don't know whether using duplexing
rather than RAID 1 would have mattered here, but I am changing things so
that one group of redo logs is on internal disk and written via the
duplexing method.
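
As a reading aid for the alert-log excerpts above: the SVR4 error number changes at each stage (5, I/O error, as the array lost power; 6, no such device, while the array was off; 2, no such file, once the file system was back). The following is a minimal Python sketch, not part of the original post, that pulls the timestamp, file name and error number out of such text. The function name and the regular expressions are illustrative assumptions, and the sample is abbreviated from the excerpts above.

```python
import re

# Abbreviated sample assembled from the alert-log excerpts in this post.
ALERT_LOG = """\
Fri Sep 12 12:58:40 2003
ORA-00206: error in writing (block 3, # blocks 1) of controlfile
ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
SVR4 Error: 5: I/O error
Fri Sep 12 13:32:01 2003
ORA-00204: error in reading (block 1, # blocks 1) of controlfile
ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
SVR4 Error: 6: No such device or address
Fri Sep 12 16:18:58 2003
ORA-00312: online log 3 thread 1: '/u2/oradata/BBRO/redo0301.log'
SVR4 Error: 2: No such file or directory
"""

# Patterns are assumptions based only on the line shapes seen above.
TIMESTAMP = re.compile(r"^\w{3} \w{3} [ \d]\d \d\d:\d\d:\d\d \d{4}$")
QUOTED_PATH = re.compile(r"^ORA-\d+: .*'([^']+)'")
OS_ERROR = re.compile(r"^SVR4 Error: (\d+): (.+)$")


def summarize(text):
    """Return (timestamp, path, errno, message) for each SVR4 error line."""
    events = []
    stamp = path = None
    for line in text.splitlines():
        line = line.strip()
        if TIMESTAMP.match(line):
            # A new timestamp starts a new log entry; forget the old path.
            stamp, path = line, None
            continue
        m = QUOTED_PATH.match(line)
        if m:
            path = m.group(1)
            continue
        m = OS_ERROR.match(line)
        if m:
            events.append((stamp, path, int(m.group(1)), m.group(2)))
    return events


EVENTS = summarize(ALERT_LOG)
for event in EVENTS:
    print(event)
```

Grouping by error number this way separates "the device is unreachable" (5 and 6) from "the device is reachable but the file is not there" (2), which is the distinction the rest of the thread turns on.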




Ian MacGregor
Stanford Linear Accelerator Center
[EMAIL PROTECTED]

 

-- 
Please see the official ORACLE-L FAQ: http://www.orafaq.net
-- 
Author: MacGregor, Ian A.
  INET: [EMAIL PROTECTED]

Fat City Network Services-- 858-538-5051 http://www.fatcity.com
San Diego, California-- Mailing list and web hosting services
-
To REMOVE yourself from this mailing list, send an E-Mail message
to: [EMAIL PROTECTED] (note EXACT spelling of 'ListGuru') and in
the message BODY, include a line containing: UNSUB ORACLE-L
(or the name of mailing list you want to be removed from).  You may
also send the HELP command for other information (like subscribing).




RE: Raid Arrays and Power Loss

2003-09-16 Thread Jesse, Rich
For the curious, what brand/model RAID 1 are you using?  Size?

Rich

Rich Jesse   System/Database Administrator
[EMAIL PROTECTED]  Quad/Tech Inc, Sussex, WI USA




Re: Raid Arrays and Power Loss

2003-09-16 Thread zhu chao
Hi, what is your OS and filesystem?

Regards
zhu chao
msn:[EMAIL PROTECTED]
www.cnoug.org


RE: Raid Arrays and Power Loss

2003-09-16 Thread MacGregor, Ian A.
The OS is Solaris 8 (SunOS 5.8).  The file system is Veritas.

Ian MacGregor
Stanford Linear Accelerator Center
[EMAIL PROTECTED]


RE: Raid Arrays and Power Loss

2003-09-16 Thread Matthew Zito

Okay, core questions:

- As someone asked, what's the make/model of storage?
- Has your RAID array lost its config?  In other words, is the storage there,
  just with an empty vtoc/volume table/partition table (insert your particular
  OS nomenclature)?
- Is the filesystem good, just empty?  When you say the file is gone, is the
  /u1 directory empty, or is the filesystem structure there and just that file
  is gone?
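
The last question above (whole directory missing, directory empty, or just one file gone) can be checked mechanically. The following is a minimal Python sketch, not from the original message; the function name, the return strings and the file names in the demo are hypothetical, and the demo runs against a throwaway temporary directory rather than a real /u1 mount.

```python
import os
import tempfile


def classify(path):
    """Say which of the cases above applies to the given file path."""
    directory = os.path.dirname(path)
    if os.path.isfile(path):
        return "file present"
    if not os.path.isdir(directory):
        return "directory missing"
    if not os.listdir(directory):
        return "directory empty"
    return "file missing, siblings present"


# Demo: walk one hypothetical control-file path through all four states.
with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "oradata", "ctl01.ctl")  # hypothetical name
    r1 = classify(target)                      # parent directory not created yet
    os.makedirs(os.path.dirname(target))
    r2 = classify(target)                      # directory exists but is empty
    sibling = os.path.join(os.path.dirname(target), "other.dbf")
    open(sibling, "w").close()
    r3 = classify(target)                      # only the target is absent
    open(target, "w").close()
    r4 = classify(target)                      # target exists

RESULTS = [r1, r2, r3, r4]
print(RESULTS)
```

A "directory missing" or "directory empty" answer points toward the volume not being mounted or seen at all, while "file missing, siblings present" matches what the original post describes.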

Okay, I just saw your message that shows it's Solaris 8 + Veritas.  Here's
what probably happened.  The box was powered on without the RAID array
powered on, and consequently Veritas doesn't see the disk groups/volumes that
are on the RAID array.  Have you tried doing (as root):

vxconfigd -km enable

This will cause a rescan of the existing disk groups.  Afterwards, what
does a vxprint -hrt look like?

In general, power loss to a RAID array will not produce the results you
describe.  I think it's far more likely that a system-array interaction is
preventing proper access to your storage.

Thanks,
Matt

--
Matthew Zito
GridApp Systems
Email: [EMAIL PROTECTED]
Cell: 646-220-3551
Phone: 212-358-8211 x 359
http://www.gridapp.com


RE: Raid Arrays and Power Loss

2003-09-16 Thread MacGregor, Ian A.
The RAID array is a Sun A1000.  I'm not sure of the vintage, but the disks are 18 GB.
The RAID array did not lose its configuration.  The storage is still there.  Neither
affected file system was ever empty, but a couple of files were lost, one on each
file system.

The box is located at one of our interaction regions (IRs).  Some additional
information (results truncated):

[EMAIL PROTECTED] $ last reboot  

reboot    system boot    Fri Sep 12 15:32
reboot    system boot    Mon Aug 25 14:24

When the following error occurred, the RAID box was off:

  Fri Sep 12 13:32:01 2003
 ORA-00204: error in reading (block 1, # blocks 1) of controlfile
 ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
 ORA-27091: skgfqio: unable to queue I/O
 SVR4 Error: 6: No such device or address
 Additional information: 1

I had thought that the Unix box had already been rebooted, but that turns out to be
false.

After the box was rebooted with the RAID array on:

Fri Sep 12 15:33:08 2003
 ORA-00202: controlfile: '/u1/oradata/BBRO/BBROcntrl01.ctl'
 ORA-27037: unable to obtain file status
 SVR4 Error: 2: No such file or directory
 Additional information: 3
 Fri Sep 12 15:33:11 2003

The other files on /u1 were fine.  Concerning the other error:

Fri Sep 12 16:18:58 2003
 Thread recovery: start rolling forward thread 1
 Fri Sep 12 16:18:58 2003
 Errors in file /opt/oracle/admin/BBRO/udump/bbro_ora_1804.trc:
 ORA-00313: open failed for members of log group 3 of thread 1
 ORA-00312: online log 3 thread 1: '/u2/oradata/BBRO/redo0301.log'
 ORA-27037: unable to obtain file status
 SVR4 Error: 2: No such file or directory
 Additional information: 3

The other files on /u2 were fine.  The files in question just disappeared.  I know
this is not normal and RAID boxes do not normally lose files, but it's hard to argue
against the empirical evidence here that they can.  It may be that either I or the
folks down at IR-2 induced the problems.  But files were indeed lost on two different
LUNs.

My current thinking is that the two files were being written when the power was turned
off on the RAID array, or that there was not enough power to keep the disks spinning
because the UPS had been drained.  The battery for the cache was reporting low, but
only based on its number of hours in operation.  Should it not have maintained the cache?

Ian MacGregor
Stanford Linear Accelerator Center
[EMAIL PROTECTED] 








  
 




RE: Raid Arrays and Power Loss

2003-09-16 Thread Jesse, Rich
My Veritas-trained co-worker says they ran into the same situation in
class, and fsck was able to find the missing inodes and repair the damage.
We were thinking that Solaris not flushing its writes could be your
problem.  I was warned about that for HP-UX's syncer during training
and am told there's a similar function on Solaris.
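
If fsck does reconnect orphaned inodes, it conventionally places them in the file system's lost+found directory under names derived from the inode number, so the original file names are lost. The following is a minimal Python sketch, assuming that layout, which lists lost+found entries whose size matches what is expected for the missing file. The function name, the entry names and the sizes are hypothetical, nothing here is specific to Veritas, and the demo uses a temporary directory rather than a real file system.

```python
import os
import tempfile


def candidates(lost_found, expected_size):
    """Return regular files in lost_found that are exactly expected_size bytes."""
    matches = []
    for entry in sorted(os.listdir(lost_found)):
        full = os.path.join(lost_found, entry)
        if os.path.isfile(full) and os.path.getsize(full) == expected_size:
            matches.append(full)
    return matches


# Demo with made-up entries; real names and sizes will differ.
with tempfile.TemporaryDirectory() as root:
    lost_found = os.path.join(root, "lost+found")
    os.mkdir(lost_found)
    for name, size in (("#000123", 4096), ("#000456", 8192)):
        with open(os.path.join(lost_found, name), "wb") as fh:
            fh.write(b"\0" * size)
    FOUND = [os.path.basename(p) for p in candidates(lost_found, 8192)]

print(FOUND)
```

A size match only narrows the search; any candidate would still need to be verified before being renamed back into place.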

I'm just the messenger...

Rich

Rich Jesse   System/Database Administrator
[EMAIL PROTECTED]  Quad/Tech Inc, Sussex, WI USA




RE: Raid Arrays and Power Loss

2003-09-16 Thread MacGregor, Ian A.
Thanks.  I'll keep this in mind if it happens again.
