Re: [Veritas-bu] drives going down on media servers
Thanks for your input. I have already disabled RSM on the master. Let me give the entire rundown of the system:

Master server: Windows 2000, NBU 5.1 MP4
Media servers: 2 Sun 480s, Solaris 9, QLogic cards with Leadville drivers

The master and media servers are on their own internal Gb network. They are SAN-attached to the tape drives through Brocade switches. The drives are IBM and HP LTO2.

When I run the backups on one media server, they run fine. When I try to share the drives and run both media servers, I get "Permission denied" in the messages files, then 84 and 85 errors. In the bptm log I see "external event caused rewind". After talking with STK, they told me the drive was getting inquiries from another system during the backup. I am led to believe that the SCSI reserve is not being handled properly between the servers. Since the SCSI reserve is supposed to be issued when the drive is opened, I would think the drive would not accept any inquiries or SCSI commands until it was closed. My conclusion is that the Leadville HBA drivers are not handling the SCSI reserve properly. But Sun says there is no problem, call Veritas. Veritas tells me the error is returned by ioctl in the OS, call Sun.

Thanks for your input; it is nice not to be all alone in this.

--- Dave Markham [EMAIL PROTECTED] wrote:
Unbelievably, I saw this just yesterday: a Windows guy asked me if I knew about it, seeing as I support NetBackup on Solaris. The fix he got, which worked, was to disable the Removable Storage Manager service. The errors are gone. That was on a Windows 2003 setup with NetBackup 5.1 MP4.

Roger Dombrowski wrote:
Hi Blaine,
I have been trying to solve this problem at two sites I'm working with right now, and we're not having much luck either. In my travels I've talked to a few folks who have seen this External Event issue caused by monitoring software. One client in particular found that one of Sun's monitoring tools was sending out SCSI inquiries and causing the external event rewinds. I also ran across a post on this mailing list that documents about 30 applications known to cause this type of behaviour. Try searching this list for "external event". If I get a chance, I'll dig it up and send you the post I'm thinking of. Through the course of my research I've basically found that two things are trying to communicate with the drive, and most folks check the data path (HBAs, switches, bridges, ...) to look for problems. Maybe the upgrade stepped on some SCSI reservation setting. If I find anything else, I'll post to the list...

Blaine Robison wrote:
I am having a similar issue. I have a Windows 2000 master and a pair of Sun 480s with 8 LTO2 drives shared between them. I get the External Event caused rewind error, and the tapes get frozen or the drives go down. I didn't have the problem until I upgraded to 5.1 MP4. I have gone over the entire configuration and cannot find a problem. Has anyone else seen this and found a resolution?

--- [EMAIL PROTECTED] wrote:
Have you tried /var/adm/messages (Solaris) or the equivalent log?
Regards
Michael

On Wed, 18 Jan 2006 15:00:24 +, Dave Markham wrote
I have 1 master server and 2 media servers connected over fibre to an L700. I'm not sure what the switch in the middle is, as I didn't install the system or have any info on it. There are 5 drives in the L700, and 3 of them are shared with the SSO option between the master and both media servers.

People, I have had an issue lately with drives not being visible to one of my media servers. I fixed this by unloading the fibre HBA using cfgadm and loading it again. It can then see the devices under sgscan and has seen them under /dev/rmt. I also noticed the customer had removed an /etc/hosts entry the media servers use to talk to each other by the correct name, so I put that back in and can now talk on port 13701 to each machine in the NBU setup.

What's happening now, though, is that drives just keep going down on the media servers and backups are not working. I have ITC enabled, so each media server needs to lock 2 drives. I have looked at the bptm logs and can't see anything jumping out apart from many media requests for different tape IDs. I have looked in /usr/openv/volmgr/debug/ltid/ and the logs in there show shared drive info being communicated successfully to the master. Therefore I am now stuck and have no idea what's going wrong :( Does anyone have any advice or pointers? Is there anything specific I should be looking for in the logs, or are there other important logs I'm not checking?
Thanks

___ Veritas-bu maillist - Veritas-bu@mailman.eng.auburn.edu http://mailman.eng.auburn.edu/mailman/listinfo/veritas-bu
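Dave's check that every host can reach every other one on port 13701 after restoring the /etc/hosts entry is easy to repeat from each server whenever drives start dropping. A minimal sketch using only the Python standard library: it confirms that the name resolves and that a TCP connection to the port opens, nothing more. It does not prove that the NetBackup daemon behind the port is healthy, and the host names in the usage comment are hypothetical.

```python
import socket


def can_reach(host: str, port: int = 13701, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Name resolution goes through the normal resolver, so a missing
    /etc/hosts entry shows up here as a failure as well.
    """
    try:
        # create_connection resolves the name and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers unknown host (socket.gaierror), connection refused and timeouts.
        return False


# Usage (host names are hypothetical; substitute your master and media servers):
#   for name in ("master", "media1", "media2"):
#       print(name, "reachable" if can_reach(name) else "NOT reachable")
```

Running it from every server, not just one, matters here, since a hosts entry can be missing on one side only.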
RE: [Veritas-bu] drives going down on media servers
Sounds like SSO is misbehaving.

-----Original Message-----
From: [EMAIL PROTECTED] On Behalf Of Blaine Robison
Sent: January 26, 2006 11:17 AM
To: [EMAIL PROTECTED]; Roger Dombrowski
Cc: [EMAIL PROTECTED]; veritas-bu@mailman.eng.auburn.edu
Subject: Re: [Veritas-bu] drives going down on media servers
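If SSO is the suspect, it helps to be precise about what a SCSI reservation does and does not block, because Blaine's reasoning earlier in this thread is that a reserved drive should refuse everything from other hosts until it is closed. As a rough guide to the SPC-2 RESERVE/RELEASE rules: most commands from a second initiator are rejected with RESERVATION CONFLICT, but a few, including INQUIRY, REQUEST SENSE and REPORT LUNS, are still allowed, and a target or LUN reset releases the reservation altogether. The toy model below is a simplified sketch of those rules, not NetBackup's, Solaris's or the drive's implementation; the host names and the partial allowed-command list are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional

GOOD = "GOOD"
RESERVATION_CONFLICT = "RESERVATION CONFLICT"

# Partial list of commands SPC-2 lets other initiators issue while a
# reservation is held; the standard allows a few more than this.
ALLOWED_WHILE_RESERVED = frozenset({"INQUIRY", "REQUEST SENSE", "REPORT LUNS"})


@dataclass
class SharedDrive:
    """Toy model of one tape drive seen by several hosts (initiators)."""
    holder: Optional[str] = None  # initiator currently holding the reservation

    def command(self, initiator: str, opcode: str) -> str:
        """Return the status this initiator would get for this command."""
        if opcode == "RESERVE":
            if self.holder in (None, initiator):
                self.holder = initiator
                return GOOD
            return RESERVATION_CONFLICT
        if opcode == "RELEASE":
            # A RELEASE from a non-holder completes but changes nothing.
            if self.holder == initiator:
                self.holder = None
            return GOOD
        if self.holder in (None, initiator) or opcode in ALLOWED_WHILE_RESERVED:
            return GOOD
        return RESERVATION_CONFLICT

    def reset(self) -> None:
        """A target or LUN reset releases an SPC-2 reservation."""
        self.holder = None
```

For example, after `drive.command("media1", "RESERVE")`, a call to `drive.command("media2", "INQUIRY")` still returns `GOOD` while `drive.command("media2", "WRITE")` returns `RESERVATION CONFLICT`; after `drive.reset()` the same WRITE returns `GOOD` again. In this model, a reservation alone would not keep another host's inquiries away from the drive.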
Re: [Veritas-bu] drives going down on media servers
Hi Blaine,

I have been trying to solve this problem at two sites I'm working with right now, and we're not having much luck either. In my travels I've talked to a few folks who have seen this External Event issue caused by monitoring software. One client in particular found that one of Sun's monitoring tools was sending out SCSI inquiries and causing the external event rewinds.

I also ran across a post on this mailing list that documents about 30 applications known to cause this type of behaviour. Try searching this list for "external event". If I get a chance, I'll dig it up and send you the post I'm thinking of.

Through the course of my research I've basically found that two things are trying to communicate with the drive, and most folks check the data path (HBAs, switches, bridges, ...) to look for problems. Maybe the upgrade stepped on some SCSI reservation setting. If I find anything else, I'll post to the list...

--
Roger Dombrowski
dcVAST, Inc.
1327 Butterfield Rd., Suite 610
Downers Grove, IL 60515
[EMAIL PROTECTED]
ATT: (630) 964-6060
FAX: (630) 964-6069
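Roger's observation that two things are trying to talk to the drive suggests lining up, per drive, the window in which one media server was using it against any activity on that drive from another host. A minimal sketch, assuming the timestamps have already been pulled out of the bptm logs and /var/adm/messages by some other means; the `Job` and `Access` records are hypothetical and do not mirror any NetBackup or Solaris log format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Job:
    """A window during which one host was using a drive (hypothetical layout)."""
    host: str
    drive: str
    start: datetime
    end: datetime


@dataclass(frozen=True)
class Access:
    """A single moment some host touched a drive (hypothetical layout)."""
    host: str
    drive: str
    when: datetime


def foreign_accesses(jobs: Iterable[Job],
                     accesses: Iterable[Access]) -> List[Tuple[Job, Access]]:
    """Return (job, access) pairs where a different host touched the job's
    drive while the job was running (window ends are inclusive)."""
    job_list = list(jobs)
    hits = []
    for access in accesses:
        for job in job_list:
            if (access.drive == job.drive
                    and access.host != job.host
                    and job.start <= access.when <= job.end):
                hits.append((job, access))
    return hits
```

The nested loop is quadratic, which is fine for a handful of drives and jobs; for large logs, sorting both lists by time first would be the natural refinement.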
Re: [Veritas-bu] drives going down on media servers
Unbelievably, I saw this just yesterday: a Windows guy asked me if I knew about it, seeing as I support NetBackup on Solaris. The fix he got, which worked, was to disable the Removable Storage Manager service. The errors are gone. That was on a Windows 2003 setup with NetBackup 5.1 MP4.
Re: [Veritas-bu] drives going down on media servers
I am having a similar issue. I have a Windows 2000 master and a pair of Sun 480s with 8 LTO2 drives shared between them. I get the External Event caused rewind error, and the tapes get frozen or the drives go down. I didn't have the problem until I upgraded to 5.1 MP4. I have gone over the entire configuration and cannot find a problem. Has anyone else seen this and found a resolution?

Blaine Robison
Solaris Certified System Administrator
Solaris Certified Network Administrator
Veritas Certified Professional
972-853-2459
214-578-5391
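Following Michael's suggestion earlier in the thread to check /var/adm/messages, a quick way to pull out the errors reported here (the External Event caused rewind message from the bptm log and the "Permission denied" lines from the Solaris messages file) is a case-insensitive substring scan. A minimal sketch, assuming the exact wording may differ between NetBackup and Solaris versions; the pattern list is illustrative, not exhaustive, and should be extended with whatever your own logs show.

```python
import re
from typing import Iterable, Iterator, Sequence, Tuple

# Phrases quoted in this thread; matched case-insensitively as substrings
# because the exact log wording may vary between versions.
PATTERNS = (
    re.compile(r"external event caused rewind", re.IGNORECASE),
    re.compile(r"permission denied", re.IGNORECASE),
)


def scan(lines: Iterable[str],
         patterns: Sequence[re.Pattern] = PATTERNS) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for every line matching any pattern."""
    for lineno, line in enumerate(lines, start=1):
        if any(p.search(line) for p in patterns):
            yield lineno, line.rstrip("\n")


# Usage:
#   with open("/var/adm/messages", errors="replace") as f:
#       for n, text in scan(f):
#           print(n, text)
```

Running the same scan on each media server and comparing the timestamps of the hits is a cheap way to see whether the errors line up with activity on the other host.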