Re: [Veritas-bu] drives going down on media servers

2006-01-26 Thread Blaine Robison
Thanks for your input. I have already disabled RSM on the master.
Let me give the entire rundown of the system.

Master Server
Windows 2000, NBU 5.1 MP4

Media Servers
2 Sun 480s, Solaris 9, QLogic cards with Leadville drivers.

The master and media servers are on their own internal Gb network. They are
SAN-attached to the tape drives through Brocade switches. The drives are IBM
and HP LTO2.

When I run the backups on one media server, they run fine. When I try to
share the drives and run both media servers, I get "Permission denied" in the
messages files, then 84 and 85 errors. In the bptm log I see "external event
caused rewind". When I talked with STK, they told me the drive was getting
inquiries from another system during the backup. I am led to believe that the
SCSI reserve is not being handled properly between the servers. Since the SCSI
reserve is supposed to be initiated when the drive is opened, I would think the
drive would not accept any inquiries or SCSI commands from another host until
it was closed.
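
One caveat on the reasoning above, sketched as a toy Python model. Under
SCSI-2 RESERVE/RELEASE semantics a reserved drive still answers INQUIRY and
REQUEST SENSE from other initiators, and a reset drops the reservation. The
TapeDrive class and the host names media1/media2 are made up for this
illustration; this is not how NetBackup or the Leadville stack is implemented.

```python
from dataclasses import dataclass
from typing import Optional

# Commands a SCSI-2 target still services for a non-owning initiator while
# a RESERVE is in place (simplified; real devices allow a few more).
ALLOWED_WHILE_RESERVED = {"INQUIRY", "REQUEST SENSE"}


@dataclass
class TapeDrive:
    """Toy model of SCSI-2 RESERVE/RELEASE on a single tape drive."""
    reserved_by: Optional[str] = None

    def command(self, initiator: str, op: str) -> str:
        if op == "RESERVE":
            if self.reserved_by in (None, initiator):
                self.reserved_by = initiator
                return "GOOD"
            return "RESERVATION CONFLICT"
        if op == "RELEASE":
            # A RELEASE from a non-owner returns GOOD but changes nothing.
            if self.reserved_by == initiator:
                self.reserved_by = None
            return "GOOD"
        held_by_other = self.reserved_by not in (None, initiator)
        if held_by_other and op not in ALLOWED_WHILE_RESERVED:
            return "RESERVATION CONFLICT"
        return "GOOD"

    def reset(self) -> None:
        """A target or bus reset clears the reservation."""
        self.reserved_by = None


if __name__ == "__main__":
    drive = TapeDrive()
    print(drive.command("media1", "RESERVE"))  # GOOD
    print(drive.command("media2", "INQUIRY"))  # GOOD, even while reserved
    print(drive.command("media2", "REWIND"))   # RESERVATION CONFLICT
    drive.reset()                              # reservation is gone
    print(drive.command("media2", "REWIND"))   # GOOD
```

So inquiries reaching the drive during a backup do not by themselves prove the
reserve failed; a rewind or a reset getting through would be the stronger sign.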

My conclusion is that the Leadville HBA drivers are not handling the SCSI
reserve properly. But Sun says there is no problem and to call Veritas, while
Veritas tells me the error is returned by an ioctl in the OS and to call Sun.
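
With each vendor pointing at the other, it may help to line up the evidence
from both logs before the next call. Below is a minimal Python sketch that
pulls out the lines mentioning the two symptoms described above. The search
strings are taken from this message and may not match the exact wording in
your logs; the function name find_symptoms is made up for this example.

```python
import re
import sys
from typing import Iterable, List, Tuple

# Symptom strings as quoted in this thread; adjust if your logs word them
# differently.
PATTERNS = [
    re.compile(r"permission denied", re.IGNORECASE),
    re.compile(r"external event caused rewind", re.IGNORECASE),
]


def find_symptoms(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, text) for every line matching a symptom."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Pass one or more log files on the command line.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as handle:
            for number, text in find_symptoms(handle):
                print(f"{path}:{number}: {text}")
```

Run it on each media server against /var/adm/messages and whichever bptm log
file covers the failed job, then compare the timestamps of the hits.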

Thanks for your input. It is nice not to be all alone in this.




--- Dave Markham [EMAIL PROTECTED] wrote:

 Unbelievably, I saw this yesterday when a Windows guy asked me if I
 knew about it, seeing as I support NetBackup on Solaris.
 
 The fix he got, which worked, was to disable the Removable Storage Manager
 service. The errors are no more.
 
 That was on a Windows 2003 setup with NetBackup 5.1 MP4.
 
 Roger Dombrowski wrote:
 
  Hi Blaine,
 
  I have been trying to solve this problem for two sites that I'm working
  with right now, and we're not having much luck either. In my travels I've
  talked to a few folks who have seen this External Event issue caused by
  monitoring software. One client in particular found that one of Sun's
  monitoring tools was sending out SCSI inquiries and causing the external
  event rewinds.
 
  I also ran across a post on this mailing list that documents about 30
  such applications that have been known to cause this type of behaviour.
  Try searching this list for "external event". If I get a chance, I'll try
  to dig it up and send you the post I'm thinking of.
 
  Through the course of my research I've basically found that two things
  are trying to communicate with the drive, and most folks check out the
  data path (HBAs, switches, bridges, ...) to look for problems.
 
  Maybe the upgrade stepped on some SCSI reservation setting. If I find
  anything else, I'll post to the list...
 
  Blaine Robison wrote:
 
  I am having a similar issue. I have a Windows 2000 master and a pair of
  Sun 480s with 8 LTO2 drives shared between them. I get "External Event
  caused rewind" errors, and the tapes get frozen or the drives go down. I
  didn't have the problem until I upgraded to 5.1 MP4. I have gone over the
  entire configuration and cannot find a problem.
  Has anyone else seen this and found a resolution?
  --- [EMAIL PROTECTED] wrote:
 
   
 
  Have you tried /var/adm/messages (Solaris) or the equivalent log ?
 
  Regards
  Michael
 
  On Wed, 18 Jan 2006 15:00:24 +, Dave Markham wrote

 
  I have 1 master server and 2 media servers connected over fibre to
  an L700. I'm not sure what the switch in the middle is, as I didn't
  install the system and don't have any info on it.
 
  There are 5 drives in the L700, and 3 of them are shared with the SSO
  option to the master and both media servers.
 
  People, I have had an issue lately with drives not being visible to
  one of my media servers.
 
  I fixed this by unloading the fibre HBA using cfgadm and
  loading it again. The server can then see the devices under sgscan and
  under /dev/rmt.
 
  I also noticed the customer had removed an /etc/hosts entry that lets the
  media servers talk to each other by the correct name, so I put
  that back in and can now talk on port 13701 to each machine in the
  NBU setup.
 
  What's happening now, though, is that drives just keep going down on the
  media servers and backups are not working. I have ITC enabled, so
  each media server needs to lock 2 drives.
 
  I have looked at the bptm logs and can't see anything jumping out apart
  from many media requests for different tape IDs. I have looked in
  /usr/openv/volmgr/debug/ltid/ and the logs in there show
  shared drive info being communicated successfully to the master.
 
  Therefore I am now stuck and have no idea what's going wrong :(
 
  Any advice or pointers? Is there anything specific I should be
  looking for in the logs, or are there other important logs I'm not
  checking?
 
  Thanks
  ___
  Veritas-bu maillist  -  Veritas-bu@mailman.eng.auburn.edu
  http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
  
 
  -- 
  Cybercity Webhosting 

RE: [Veritas-bu] drives going down on media servers

2006-01-26 Thread Paul Keating
Sounds like SSO is misbehaving.



 -Original Message-
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf Of 
 Blaine Robison
 Sent: January 26, 2006 11:17 AM
 To: [EMAIL PROTECTED]; Roger Dombrowski
 Cc: [EMAIL PROTECTED]; veritas-bu@mailman.eng.auburn.edu
 Subject: Re: [Veritas-bu] drives going down on media servers
 
 

Re: [Veritas-bu] drives going down on media servers

2006-01-24 Thread Roger Dombrowski

Hi Blaine,

I have been trying to solve this problem for two sites that I'm working
with right now, and we're not having much luck either. In my travels I've
talked to a few folks who have seen this External Event issue caused by
monitoring software. One client in particular found that one of Sun's
monitoring tools was sending out SCSI inquiries and causing the external
event rewinds.

I also ran across a post on this mailing list that documents about 30
such applications that have been known to cause this type of behaviour.
Try searching this list for "external event". If I get a chance, I'll try
to dig it up and send you the post I'm thinking of.

Through the course of my research I've basically found that two things
are trying to communicate with the drive, and most folks check out the
data path (HBAs, switches, bridges, ...) to look for problems.

Maybe the upgrade stepped on some SCSI reservation setting. If I find
anything else, I'll post to the list...


 




--
Roger Dombrowski        dcVAST, Inc.
[EMAIL PROTECTED]       1327 Butterfield Rd.
ATT: (630) 964-6060     Suite 610
FAX: (630) 964-6069     Downers Grove, IL 60515



Re: [Veritas-bu] drives going down on media servers

2006-01-24 Thread Dave Markham
Unbelievably, I saw this yesterday when a Windows guy asked me if I
knew about it, seeing as I support NetBackup on Solaris.

The fix he got, which worked, was to disable the Removable Storage Manager
service. The errors are no more.

That was on a Windows 2003 setup with NetBackup 5.1 MP4.
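
For anyone who wants to script that fix, here is a rough Python sketch. It
assumes the Removable Storage service is registered under the short name
NtmsSvc (please confirm that on your own box before relying on it) and uses
the standard sc command. The function name rsm_disable_commands is made up
for this example, and the commands are only executed when run on Windows.

```python
import platform
import subprocess
from typing import List

# Assumption: short service name of Removable Storage; verify locally.
SERVICE = "NtmsSvc"


def rsm_disable_commands(service: str = SERVICE) -> List[List[str]]:
    """Build the sc commands that stop a service and disable its startup."""
    return [
        ["sc", "stop", service],
        # sc expects the value as a separate token after "start=".
        ["sc", "config", service, "start=", "disabled"],
    ]


if __name__ == "__main__":
    for cmd in rsm_disable_commands():
        if platform.system() == "Windows":
            # check=False: stopping an already stopped service is not fatal.
            subprocess.run(cmd, check=False)
        else:
            print("would run:", " ".join(cmd))
```

It needs an administrator prompt on the Windows master or media server.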



Re: [Veritas-bu] drives going down on media servers

2006-01-23 Thread Blaine Robison
I am having a similar issue. I have a Windows 2000 master and a pair of Sun
480s with 8 LTO2 drives shared between them. I get "External Event caused
rewind" errors, and the tapes get frozen or the drives go down. I didn't have
the problem until I upgraded to 5.1 MP4. I have gone over the entire
configuration and cannot find a problem.

Has anyone else seen this and found a resolution?

 


Blaine Robison
Solaris Certified System Administrator
Solaris Certified Network Administrator
Veritas Certified Professional
972-853-2459
214-578-5391
