Re: [Bacula-users] btape test works, btape fill doesn't

2005-12-15 Thread Cameron Murphy
I'm not positive, but I think this might be a bug in the btape.c  
unfillcmd().   I ran into the same problems when I was setting up my 
adic Scalar 218 DLT7000 library under FreeBSD 5.4.  The problem comes 
when the unfill command tries to read back the first 1000 blocks.  It 
has a check in a function called quickie_count() (or something like 
that) that requires the current file on the tape to be 0, but it 
looks like the label command writes an EOF to end the label, so the 
unfill command gets and EOF somewhere around block 40??? (it's been a 
week or two), which bumps the file counter to 1, resulting in the 
ERROR! device at 1:1 count=1.  I'm not really familiar enough with the 
inner workings of bacula to say if the error is in the EOF after the 
label, or if the file check should be for file = 1.   I just 
changed the check to require that file is less than or equal to 1, 
and everything worked fine after that.  Without digging around some 
more, I can't say if it's correct or not, but everything is working fine 
for me now, and I haven't had any problems with my test restores.


The only difference is that I've been using mt seteotmodel 1, then 
using  fast forward space file=yes, and TWO EOF=no in the storage 
daemon config.   It the same setup as what's listed as the second option 
in the docs under FreeBSD tape support.   It's working fine, and it 
really speeds up restores.



Hope this helps,

Cameron

Kern Sibbald wrote:


On Thursday 15 December 2005 02:57, Joe Dollard wrote:
 


Hi,
  I'm in the process of setting up bacula for the first time for our
network but am getting errors when I run btape fill.  I'm running bacula
version 1.38.1 (installed from ports) on FreeBSD 5.4 with a Quantum SDLT
110/220 tape drive (which is inside a Overland LoaderXPress).  If I run
the btape test command the output indicates everything worked
successfully, however when running btape fill using single tape mode
btape gives me the following error:

btape: btape.c:2329 End of tape 108:0. VolumeCapacity=102,275,630,752.
Write rate = 9689.8 KB/s
Done writing ...
Done filling tape at 108:0. Now beginning re-read of tape ...
ERROR! device at 1:1 count=1
ERROR! device at 1:1 count=1
Reposition from 1:1 to 106:10374

I waited about 30 minutes after receiving this last message but btape
didn't display anything else.  Assuming it had hung I killed it with
control+c.
Output from btape fill, btape test and my bacula-sd.conf file are
included below.

Does anybody have any ideas about what might be wrong here?  I'm trying
to get bacula into production in the next few days, so any help would be
greatly appreciated.
   



I suspect that you killed it too quicly.  It takes some time to reposition 102 
GBytes on a tape, and probably 5-10 times longer on FreeBSD, which doesn't 
have fast forward space file. This requires Bacula to read every record -- if 
you think it is slow, a complaint to the SCSI tape driver writers might 
help ...


 


Thanks,
Joe


Here's the device definition from my bacula-sd.conf file:
Device {
Name = Overland
Description = Overland on FreeBSD
Media Type = SDLT-1
Archive Device = /dev/nsa0
AutomaticMount = yes;   # when device opened, read it
AlwaysOpen = yes
Offline On Unmount = no
Hardware End of Medium = no   #FreeBSD secific
BSF at EOM = yes  #FreeBSD secific
Backward Space Record = no#FreeBSD secific
Fast Forward Space File = no  #FreeBSD secific
TWO EOF = yes #FreeBSD secific
#  Autochanger = yes
#  Changer Device = /dev/pass0
#  Changer Command = /usr/local/sbin/mtx-changer %c %o %S %a %d
}

The device has an autochanger but I want to get it working without an
autochanger first.


Here's some of the output from btape fill:
[EMAIL PROTECTED] btape /dev/nsa0
Tape block granularity is 1024 bytes.
btape: butil.c:266 Using device: /dev/nsa0 for writing.
btape: btape.c:338 open device Overland (/dev/nsa0): OK
*fill

This command simulates Bacula writing to a tape.
It requires either one or two blank tapes, which it
will label and write.

If you have an autochanger configured, it will use
the tapes that are in slots 1 and 2, otherwise, you will
be prompted to insert the tapes when necessary.

It will print a status approximately
every 322 MB, and write an EOF every 3.2 GB.  If you have
selected the simple test option, after writing the first tape
it will rewind it and re-read the last block written.

If you have selected the multiple tape test, when the first tape
fills, it will ask for a second, and after writing a few more
blocks, it will stop.  Then it will begin re-reading the
two tapes.

This may take a long time -- hours! ...

Do you want to run the simplified test (s) with one tape
or the complete multiple tape (m) test: (s/m) s
Simple test (single tape) selected.
Wrote Volume label for volume TestVolume1.
Wrote Start of Session label.
Begin writing Bacula records to tape ...
Wrote blk_block=5000, 

Re: [Bacula-users] 1.38.1 on FreeBSD device not configured and mutex unlock failure

2005-11-20 Thread Cameron Murphy
Hey guys, I think I found the bug...  it looks like in autochanger.c at 
line 130 there is a test to see if the loaded slot is what is wanted.  
If not, it locks the drive, and loads the proper one.   The problem is 
the unlock doesn't come until line 177, which is after the conditional 
block. If you move unlock_changer(dcr) to line 165, it won't be 
unbalanced if the drive already had the right tape to begin with.


Hope this helps,

Cameron


Kern Sibbald wrote:


Hello,

This looks like a bug to me.  You are getting a mutex (lock) failure probably 
because a mutex is being cleared twice.  Normally this doesn't happen and 
FreeBSD passed all my regression scripts.  However, I see that in your case, 
Bacula did not recognize the tape as a Bacula labeled tape, so perhaps it 
took some error path where this problem occurs.


I'd appreciate it if you would submit a bug report on this titled SD crashes 
with mutex failure and include everything in your email.


Then, please make sure your binaries are not stripped by the installation 
proces so that the debugging symbols will be available, and run the Storage 
daemon under the debugger as described in the Kaboom chapter. When it dies, 
please get a traceback (as described in the manual) and add it to the bug 
report as a xxx.txt file.  This should give me a better idea where the 
problem is coming from. 

In the mean time, I will see if I can duplicate this.  If I can, I can fix it, 
if not, we may need to turn on mutex trace code in the source to figure out 
what is going wrong.


On Saturday 19 November 2005 07:02, Josh Endries wrote:
 


Hello,

I'm having trouble getting things working after upgrading to 1.38 on
my system. btape's normal and autochanger tests were all successful,
it's probably a FreeBSD issue but I don't know what. I have an
Exabyte VXA 1x10 2U autoloader.

19-Nov 00:14 backup-dir: Start Backup JobId 162,
Job=leto_Daily_Backup.2005-11-19_00.14.40
19-Nov 00:21 leto: DIR and FD clocks differ by 398 seconds, FD
automatically adjusting.
19-Nov 00:14 backup: 3301 Issuing autochanger loaded drive 0 command.
19-Nov 00:14 backup: 3302 Autochanger loaded drive 0, result is Slot 1.
*
\19-Nov 00:15 backup: 3307 Issuing autochanger unload slot 1, drive
0 command.
19-Nov 00:15 backup: 3304 Issuing autochanger load slot 6, drive 0
command.
*\
19-Nov 00:16 backup: 3305 Autochanger load slot 6, drive 0, status
is OK.
19-Nov 00:16 backup: 3301 Issuing autochanger loaded drive 0 command.
19-Nov 00:16 backup: 3302 Autochanger loaded drive 0, result is Slot 6.
*
19-Nov 00:18 backup: 3301 Issuing autochanger loaded drive 0 command.
19-Nov 00:18 backup: 3302 Autochanger loaded drive 0, result is Slot 6.
19-Nov 00:25 leto: leto_Daily_Backup.2005-11-19_00.14.40 Fatal error:
job.c:1594 Comm error with SD. bad response to Append Data.
ERR=Broken pipe
19-Nov 00:18 backup-dir: leto_Daily_Backup.2005-11-19_00.14.40 Error:
Bacula 1.38.1 (14Nov05): 19-Nov-2005 00:18:47
 JobId:  162
 Job:leto_Daily_Backup.2005-11-19_00.14.40
 Backup Level:   Differential, since=2005-09-17 01:00:04
 Client: leto
i386-portbld-freebsd5.4,freebsd,5.4-STABLE
 FileSet:leto 2005-09-13 20:26:12
 Pool:   Weekly
 Storage:Exabyte VXA AutoPak 1x10
 Scheduled time: 19-Nov-2005 00:14:36
 Start time: 19-Nov-2005 00:14:42
 End time:   19-Nov-2005 00:18:47
 Priority:   1
 FD Files Written:   0
 SD Files Written:   0
 FD Bytes Written:   0
 SD Bytes Written:   0
 Rate:   0.0 KB/s
 Software Compression:   None
 Volume name(s):
 Volume Session Id:  1
 Volume Session Time:1132376988
 Last Volume Bytes:  241,210,545
 Non-fatal FD errors:0
 SD Errors:  0
 FD termination status:  Error
 SD termination status:  Error
 Termination:*** Backup Error ***

19-Nov 00:18 backup-dir: message.c:454 Mail prog: bsmtp: bsmtp.c:277
Fatal connect error to localhost: ERR=Connection refused
19-Nov 00:18 backup-dir: leto_Daily_Backup.2005-11-19_00.14.40 Error:
message.c:465 Mail program terminated in error.
CMD=/usr/local/sbin/bsmtp -h localhost -f (Bacula) [EMAIL PROTECTED]
-s Bacula: Backup Fatal Error of leto Differential [EMAIL PROTECTED]
ERR=Child exited with code 1

The SD process actually dies during this process (at 00:25). This is
SD'd output:

backup: cram-md5.c:52 send: auth cram-md5
[EMAIL PROTECTED] ssl=0
backup: cram-md5.c:68 Authenticate OK 7jUWknxuD5xCh/seH//vIA
backup: cram-md5.c:114 sending resp to challenge: r+/jcih9CE+XEzwlz9+fzA
backup: dircmd.c:187 Message channel init completed.
backup: cram-md5.c:52 send: auth cram-md5
[EMAIL PROTECTED] ssl=0
backup: cram-md5.c:68 Authenticate OK X4+uVxsZG4+sd3+vmzU3ZC
backup: cram-md5.c:114 sending resp to challenge: 5R+GJxtYK6+fLw+Rnm/47D
backup: autochanger.c:217 run_prog:
/usr/local/share/bacula/my-changer /dev/pass0 loaded 6 /dev/nsa0 0
stat=0