Re: [Bacula-users] Destructive Tape Label Crossing! (was: Problem mounting Volume)

2006-11-15 Thread Ryan Novosielski

Last update to myself:

Seems as if what happened is that the wrong tape was inserted during the
wrong week. For reasons unknown (if anyone can tell me where to look,
I'd be very grateful), the tape, when inserted the wrong week, appears
to have been written to. The volume label appears to have been changed
to match the tape that it wanted, and written despite the fact that it
was not writable at that time (it may have been just about ready to be
recycled, but still, it was in the wrong pool anyway).

bls indicates that the WRONG tape contains my last week's backups. The
new tape does not appear to contain anything. I see no place to get a
media ID for any of this stuff, but I suspect both tapes have a mediaID
that's the same at this point.

This is seriously messed up, and even if I -- or my staff -- did
something to cause it, I really need to know what to be careful not to
do again.

Please let me know what I should provide to the list or how I should
troubleshoot this. I'm leaving everything as-is for now so that I have
all the evidence.

Thanks for any assistance you can provide.



Ryan Novosielski wrote:
 Here's a followup to myself. Apparently, I have two tapes called
 catalyst_BW1 and have no idea how I could have gotten into this
 situation. One of them is a brand-new tape:
 
 Volume Label:
 Id: Bacula 1.0 immortal
 VerNo : 11
 VolName   : catalyst_BW1
 PrevVolName   :
 VolFile   : 0
 LabelType : PRE_LABEL
 LabelSize : 168
 PoolName  : catalyst_FULL
 MediaType : DDS-4
 PoolType  : Backup
 HostName  : helios
 Date label written: 31-Oct-2006 08:51
 
 ...one of them has been around for a while, supposedly was not touched,
 is empty (this is probably OK), but for some reason does not have the
 right volume name anymore:
 
 Volume Label:
 Id: Bacula 1.0 immortal
 VerNo : 11
 VolName   : catalyst_BW1
 PrevVolName   :
 VolFile   : 0
 LabelType : VOL_LABEL
 LabelSize : 168
 PoolName  : catalyst_FULL
 MediaType : DDS-4
 PoolType  : Backup
 HostName  : helios
 Date label written: 31-Oct-2006 08:51
 
 I can't tell which media IDs these two tapes think they have. Is there
 any way with any of the commands that work directly on the tapes (i.e.,
 NOT the catalog) to check? It doesn't look like bls is interested in
 telling me.
 
 =R
 
 Ryan Novosielski wrote:
 OK, here's how I got into this mess:
 
 Operations staff has a calendar for which tape goes in when -- they
 wrote it out because they really don't have the know-how to check for
 themselves. Well, they messed up because I don't have a full backup
 scheduled for the fifth Tuesday:
 
 Schedule {
   Name = UMD-F13T-Inc
   Run = Level=Full Storage=helios_DAT72 1st,3rd tue at 21:00
   Run = Level=Incremental Storage=helios_DDS 1st,3rd mon,wed-fri at 23:00
   Run = Level=Incremental Storage=helios_DDS 2nd,4th-5th mon-fri at 23:00
 }
 
 Schedule {
   Name = UMD-F24T-Inc
   Run = Level=Full Storage=helios_DAT72 2nd,4th tue at 21:00
   Run = Level=Incremental Storage=helios_DDS 2nd,4th mon,wed-fri at 23:00
   Run = Level=Incremental Storage=helios_DDS 1st,3rd,5th mon-fri at 23:00
 }
 
 ...however, their calendar had a fifth Tuesday and messed up the
 rotation. Today, as a result, the wrong tape went into the drive. Here
 is my storage config:
 
 Device {
   Name = helios_DAT72   #
   Media Type = DDS-4
   Archive Device = /dev/rmt/1lbn
   AutomaticMount = yes;   # when device opened, read it
   AlwaysOpen = no;
   Volume Poll Interval = 30 minutes;
   Close on Poll = yes;
   RemovableMedia = yes;
   RandomAccess = no;
   Spool Directory = /usr/local/bacula/var/spool;
 }
 
 I meant to set AlwaysOpen to yes here, but apparently did not -- so now
 I'm even more confused. Anyway, what happened... it tried to run a
 backup, but it looked at the tape and saw that it was used and in the
 wrong pool, and rightly refused. We noticed the error and now have the
 proper tape in the drive. However:
 
 #umount
 Using default Catalog name=MyCatalog DB=bacula
 The defined Storage resources are:
  1: File
  2: helios_DDS
  3: helios_DAT72
 Select Storage resource (1-3): Unexpected question has been received.
 3
 3901 Device helios_DAT72 (/dev/rmt/1lbn) is already unmounted.
 #mount
 The defined Storage resources are:
  1: File
  2: helios_DDS
  3: helios_DAT72
 Select Storage resource (1-3): Unexpected question has been received.
 3
 3001 Mounted Volume: catalyst_BW1
 3001 Device helios_DAT72 (/dev/rmt/1lbn) is already mounted with
 Volume catalyst_BW1
 #
 
 ...as you can see, my tape is both already unmounted and already
 mounted, and claims that the tape that is in the drive is the tape that
 I've taken out of the drive and replaced with the right tape. It has
 requested the tape I'd 

Re: [Bacula-users] Destructive Tape Label Crossing!

2006-11-15 Thread Arno Lehmann
Hello,

On 11/15/2006 6:22 PM, Ryan Novosielski wrote:
 
 Last update to myself:
 
 Seems as if what happened is that the wrong tape was inserted during the
 wrong week. For reasons unknown (if anyone can tell me where to look,
 I'd be very grateful), the tape, when inserted the wrong week, appears
 to have been written to. The volume label appears to have been changed
 to match the tape that it wanted, and written despite the fact that it
 was not writable at that time (it may have been just about ready to be
 recycled, but still, it was in the wrong pool anyway).
 
 bls indicates that the WRONG tape contains my last week's backups. The
 new tape does not appear to contain anything. I see no place to get a
 media ID for any of this stuff, but I suspect both tapes have a mediaID
 that's the same at this point.

As far as I know, media IDs are only stored in the catalog. You might 
find something if you compare older catalog dumps, checking for changes 
regarding that volume, but I wouldn't want to do that :-)

 This is seriously messed up, and even if I -- or my staff -- did
 something to cause it, I really need to know what to be careful not to
 do again.

If Bacula accidentally overwrites a tape label I would consider that a bug.

That said, I can imagine situations where such a thing can happen (but 
never investigated it):
Imagine you have a tape inserted in a drive, rewound. The SD uses 
always open=yes and the polling stuff.

The tape is mounted, and thus Bacula knows for sure which tape is in the 
drive.

If you can change the tape without unmounting from Bacula and the drive 
doesn't inform the OS of that operation, or Bacula doesn't query that 
status from the OS, and you change the tape in between Bacula accesses, 
what you describe might happen.

(Keep in mind that this is mostly fiction, not science - I don't know if 
such a thing might happen with any tape drive, OS, or Bacula without 
indicating a bug.)
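
The hypothesis above can be made concrete with a toy model in Python. This is only a sketch of the guessed failure mode, not Bacula's code: the `Drive` and `StorageDaemon` classes and the `verify` flag are invented for illustration, and nothing here shows that a real storage daemon behaves this way.

```python
class Drive:
    """Toy tape drive holding one tape, modelled as a dict with a 'label' key."""

    def __init__(self, tape):
        self.tape = tape


class StorageDaemon:
    """Toy daemon that remembers the label it saw when the tape was mounted."""

    def __init__(self, drive):
        self.drive = drive
        self.mounted_label = None

    def mount(self):
        # Read the label once and cache it.
        self.mounted_label = self.drive.tape["label"]

    def rewrite_label(self, verify):
        # With verify=True the label is re-read from the tape before writing;
        # with verify=False the cached value is trusted blindly.
        if verify and self.drive.tape["label"] != self.mounted_label:
            raise RuntimeError("tape in drive is not the mounted volume")
        self.drive.tape["label"] = self.mounted_label


if __name__ == "__main__":
    drive = Drive({"label": "catalyst_BW1"})
    sd = StorageDaemon(drive)
    sd.mount()

    # Operator swaps tapes without telling the daemon.
    drive.tape = {"label": "combined_BW1"}

    sd.rewrite_label(verify=False)
    print(drive.tape["label"])  # the swapped-in tape now carries the cached name
```

In this model the swapped-in tape only keeps its own label if the daemon re-reads it before writing, which is the check a drive or OS "media changed" notification would normally trigger.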

 Please let me know what I should provide to the list or how I should
 troubleshoot this. I'm leaving everything as-is for now so that I have
 all the evidence.

What I'd do is to examine all my tapes to find the one that is missing. 
If there is exactly one tape label that can't be found, you know at 
least which volume got overwritten and can invalidate the jobs on it.
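
One way to do that bookkeeping, sketched in Python: note the volume name read from each physical tape, then compare that list with the volume names the catalog knows about. The function name and the way the two lists are gathered are assumptions for illustration. The sample data is illustrative as well, using volume names that appear in this thread.

```python
from collections import Counter


def compare_labels(catalog_volumes, labels_read):
    """Compare catalog volume names with the labels read off the tapes.

    Returns three sorted lists: names in the catalog that no tape carries,
    names carried by more than one tape, and names found on tape but
    absent from the catalog.
    """
    counts = Counter(labels_read)
    missing = sorted(set(catalog_volumes) - set(counts))
    duplicated = sorted(name for name, n in counts.items() if n > 1)
    unknown = sorted(set(counts) - set(catalog_volumes))
    return missing, duplicated, unknown


if __name__ == "__main__":
    # Illustrative data only: two catalog entries, two physical tapes.
    catalog = ["catalyst_BW1", "combined_BW1"]
    on_tape = ["catalyst_BW1", "catalyst_BW1"]

    missing, duplicated, unknown = compare_labels(catalog, on_tape)
    print("missing from tapes:", missing)
    print("on more than one tape:", duplicated)
    print("not in catalog:", unknown)
```

A name that shows up as missing while another shows up as duplicated points at the volume whose label was overwritten.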

Also, given the fact that your operators work by a list, you can 
probably determine when the wrong tape was inserted, perhaps even who 
did it :-)

Once this is sorted out, you should use that example as reason why your 
operators should be educated to use Bacula for tape management and not 
their calendars ;-)

 Thanks for any assistance you can provide.

I have seen such a thing myself, once, but that was during a beta test 
phase where I more or less tried to get such results, and it happened 
before the new locking mechanisms were implemented IIRC.

Arno

 
 

Re: [Bacula-users] Destructive Tape Label Crossing!

2006-11-15 Thread Ryan Novosielski

Arno Lehmann wrote:
 Hello,
 
 On 11/15/2006 6:22 PM, Ryan Novosielski wrote:
 Last update to myself:
 
 Seems as if what happened is that the wrong tape was inserted during the
 wrong week. For reasons unknown (if anyone can tell me where to look,
 I'd be very grateful), the tape, when inserted the wrong week, appears
 to have been written to. The volume label appears to have been changed
 to match the tape that it wanted, and written despite the fact that it
 was not writable at that time (it may have been just about ready to be
 recycled, but still, it was in the wrong pool anyway).
 
 bls indicates that the WRONG tape contains my last week's backups. The
 new tape does not appear to contain anything. I see no place to get a
 media ID for any of this stuff, but I suspect both tapes have a mediaID
 that's the same at this point.
 
 As far as I know, media IDs are only stored in the catalog. You might 
 find something if you compare older catalog dumps, checking for changes 
 regarding that volume, but I wouldn't want to do that :-)
 
 This is seriously messed up, and even if I -- or my staff -- did
 something to cause it, I really need to know what to be careful not to
 do again.
 
 If Bacula accidentally overwrites a tape label I would consider that a bug.

I guess I should see about filing one, I just don't really have a lot of
information to provide at this point.

 That said, I can imagine situations where such a thing can happen (but 
 never investigated it):
 Imagine you have a tape inserted in a drive, rewound. The SD uses 
 always open=yes and the polling stuff.
 
 The tape is mounted, and thus Bacula knows for sure which tape is in the 
 drive.
 
 If you can change the tape without unmounting from Bacula and the drive 
 doesn't inform the OS of that operation, or Bacula doesn't query that 
 status from the OS, and you change the tape in between Bacula accesses, 
 what you describe might happen.
 
 (Keep in mind that this is mostly fiction, not science - I don't know if 
 such a thing might happen with any tape drive, OS, or Bacula without 
 indicating a bug.)

I'd think so too. However, in this case it appears as if AlwaysOpen is
off for this drive. I've seen a case where this exact thing DID happen
to someone on this mailing list. Basically, the resolution was "don't do
that." However, here, I do not use that directive for this drive.
Theoretically, there's no way for this to have happened.

 Please let me know what I should provide to the list or how I should
 troubleshoot this. I'm leaving everything as-is for now so that I have
 all the evidence.
 
 What I'd do is to examine all my tapes to find the one that is missing. 
 If there is exactly one tape label that can't be found, you know at 
 least which volume got overwritten and can invalidate the jobs on it.
 
 Also, given the fact that your operators work by a list, you can 
 probably determine when the wrong tape was inserted, perhaps even who 
 did it :-)

I've basically done this.

The result is that combined_BW1, the tape incorrectly inserted last
week, is now catalyst_BW1. catalyst_BW1, consequently, is empty, and
combined_BW1 no longer exists. However, I still believe that in this
particular case, given the course of events, this should NOT have happened.

 Once this is sorted out, you should use that example as reason why your 
 operators should be educated to use Bacula for tape management and not 
 their calendars ;-)

Is there really an easy way for the staff to determine the next tape
though, when the storage devices and pools are defined in the schedule?
status dir does not show them in these cases (showing instead *unknown*).

 Thanks for any assistance you can provide.
 
 I have seen such a thing myself, once, but that was during a beta test 
 phase where I more or less tried to get such results, and it happened 
 before the new locking mechanisms were implemented IIRC.
 
 Arno
 
 

Re: [Bacula-users] Destructive Tape Label Crossing!

2006-11-15 Thread Arno Lehmann
Hi,

On 11/15/2006 10:05 PM, Ryan Novosielski wrote:
...
 Is there really an easy way for the staff to determine the next tape
 though, when the storage devices and pools are defined in the schedule?
 status dir does not show them in these cases (showing instead *unknown*).

Why not use the mails Bacula sends when it requests a new tape?

If your problem is getting the necessary tapes before they are 
requested, from off-site storage or a firesafe, then you could simply 
keep a small number of purged tapes from each volume available. Bacula 
is flexible enough to accept other tapes than the ones it requests if 
the tapes qualify.

Other than that, a more useful schedule listing would be nice, but then 
we'd all want Bacula to tell us which tape it will want next 
when the currently scheduled one fills :-)

Arno

-- 
IT-Service Lehmann[EMAIL PROTECTED]
Arno Lehmann  http://www.its-lehmann.de
