Re: [Bacula-users] btape test works, btape fill doesn't
I'm not positive, but I think this might be a bug in the btape.c unfillcmd(). I ran into the same problems when I was setting up my adic Scalar 218 DLT7000 library under FreeBSD 5.4. The problem comes when the unfill command tries to read back the first 1000 blocks. It has a check in a function called quickie_count() (or something like that) that requires the current file on the tape to be 0, but it looks like the label command writes an EOF to end the label, so the unfill command gets and EOF somewhere around block 40??? (it's been a week or two), which bumps the file counter to 1, resulting in the ERROR! device at 1:1 count=1. I'm not really familiar enough with the inner workings of bacula to say if the error is in the EOF after the label, or if the file check should be for file = 1. I just changed the check to require that file is less than or equal to 1, and everything worked fine after that. Without digging around some more, I can't say if it's correct or not, but everything is working fine for me now, and I haven't had any problems with my test restores. The only difference is that I've been using mt seteotmodel 1, then using fast forward space file=yes, and TWO EOF=no in the storage daemon config. It the same setup as what's listed as the second option in the docs under FreeBSD tape support. It's working fine, and it really speeds up restores. Hope this helps, Cameron Kern Sibbald wrote: On Thursday 15 December 2005 02:57, Joe Dollard wrote: Hi, I'm in the process of setting up bacula for the first time for our network but am getting errors when I run btape fill. I'm running bacula version 1.38.1 (installed from ports) on FreeBSD 5.4 with a Quantum SDLT 110/220 tape drive (which is inside a Overland LoaderXPress). If I run the btape test command the output indicates everything worked successfully, however when running btape fill using single tape mode btape gives me the following error: btape: btape.c:2329 End of tape 108:0. VolumeCapacity=102,275,630,752. Write rate = 9689.8 KB/s Done writing ... Done filling tape at 108:0. Now beginning re-read of tape ... ERROR! device at 1:1 count=1 ERROR! device at 1:1 count=1 Reposition from 1:1 to 106:10374 I waited about 30 minutes after receiving this last message but btape didn't display anything else. Assuming it had hung I killed it with control+c. Output from btape fill, btape test and my bacula-sd.conf file are included below. Does anybody have any ideas about what might be wrong here? I'm trying to get bacula into production in the next few days, so any help would be greatly appreciated. I suspect that you killed it too quicly. It takes some time to reposition 102 GBytes on a tape, and probably 5-10 times longer on FreeBSD, which doesn't have fast forward space file. This requires Bacula to read every record -- if you think it is slow, a complaint to the SCSI tape driver writers might help ... Thanks, Joe Here's the device definition from my bacula-sd.conf file: Device { Name = Overland Description = Overland on FreeBSD Media Type = SDLT-1 Archive Device = /dev/nsa0 AutomaticMount = yes; # when device opened, read it AlwaysOpen = yes Offline On Unmount = no Hardware End of Medium = no #FreeBSD secific BSF at EOM = yes #FreeBSD secific Backward Space Record = no#FreeBSD secific Fast Forward Space File = no #FreeBSD secific TWO EOF = yes #FreeBSD secific # Autochanger = yes # Changer Device = /dev/pass0 # Changer Command = /usr/local/sbin/mtx-changer %c %o %S %a %d } The device has an autochanger but I want to get it working without an autochanger first. Here's some of the output from btape fill: [EMAIL PROTECTED] btape /dev/nsa0 Tape block granularity is 1024 bytes. btape: butil.c:266 Using device: /dev/nsa0 for writing. btape: btape.c:338 open device Overland (/dev/nsa0): OK *fill This command simulates Bacula writing to a tape. It requires either one or two blank tapes, which it will label and write. If you have an autochanger configured, it will use the tapes that are in slots 1 and 2, otherwise, you will be prompted to insert the tapes when necessary. It will print a status approximately every 322 MB, and write an EOF every 3.2 GB. If you have selected the simple test option, after writing the first tape it will rewind it and re-read the last block written. If you have selected the multiple tape test, when the first tape fills, it will ask for a second, and after writing a few more blocks, it will stop. Then it will begin re-reading the two tapes. This may take a long time -- hours! ... Do you want to run the simplified test (s) with one tape or the complete multiple tape (m) test: (s/m) s Simple test (single tape) selected. Wrote Volume label for volume TestVolume1. Wrote Start of Session label. Begin writing Bacula records to tape ... Wrote blk_block=5000,
Re: [Bacula-users] 1.38.1 on FreeBSD device not configured and mutex unlock failure
Hey guys, I think I found the bug... it looks like in autochanger.c at line 130 there is a test to see if the loaded slot is what is wanted. If not, it locks the drive, and loads the proper one. The problem is the unlock doesn't come until line 177, which is after the conditional block. If you move unlock_changer(dcr) to line 165, it won't be unbalanced if the drive already had the right tape to begin with. Hope this helps, Cameron Kern Sibbald wrote: Hello, This looks like a bug to me. You are getting a mutex (lock) failure probably because a mutex is being cleared twice. Normally this doesn't happen and FreeBSD passed all my regression scripts. However, I see that in your case, Bacula did not recognize the tape as a Bacula labeled tape, so perhaps it took some error path where this problem occurs. I'd appreciate it if you would submit a bug report on this titled SD crashes with mutex failure and include everything in your email. Then, please make sure your binaries are not stripped by the installation proces so that the debugging symbols will be available, and run the Storage daemon under the debugger as described in the Kaboom chapter. When it dies, please get a traceback (as described in the manual) and add it to the bug report as a xxx.txt file. This should give me a better idea where the problem is coming from. In the mean time, I will see if I can duplicate this. If I can, I can fix it, if not, we may need to turn on mutex trace code in the source to figure out what is going wrong. On Saturday 19 November 2005 07:02, Josh Endries wrote: Hello, I'm having trouble getting things working after upgrading to 1.38 on my system. btape's normal and autochanger tests were all successful, it's probably a FreeBSD issue but I don't know what. I have an Exabyte VXA 1x10 2U autoloader. 19-Nov 00:14 backup-dir: Start Backup JobId 162, Job=leto_Daily_Backup.2005-11-19_00.14.40 19-Nov 00:21 leto: DIR and FD clocks differ by 398 seconds, FD automatically adjusting. 19-Nov 00:14 backup: 3301 Issuing autochanger loaded drive 0 command. 19-Nov 00:14 backup: 3302 Autochanger loaded drive 0, result is Slot 1. * \19-Nov 00:15 backup: 3307 Issuing autochanger unload slot 1, drive 0 command. 19-Nov 00:15 backup: 3304 Issuing autochanger load slot 6, drive 0 command. *\ 19-Nov 00:16 backup: 3305 Autochanger load slot 6, drive 0, status is OK. 19-Nov 00:16 backup: 3301 Issuing autochanger loaded drive 0 command. 19-Nov 00:16 backup: 3302 Autochanger loaded drive 0, result is Slot 6. * 19-Nov 00:18 backup: 3301 Issuing autochanger loaded drive 0 command. 19-Nov 00:18 backup: 3302 Autochanger loaded drive 0, result is Slot 6. 19-Nov 00:25 leto: leto_Daily_Backup.2005-11-19_00.14.40 Fatal error: job.c:1594 Comm error with SD. bad response to Append Data. ERR=Broken pipe 19-Nov 00:18 backup-dir: leto_Daily_Backup.2005-11-19_00.14.40 Error: Bacula 1.38.1 (14Nov05): 19-Nov-2005 00:18:47 JobId: 162 Job:leto_Daily_Backup.2005-11-19_00.14.40 Backup Level: Differential, since=2005-09-17 01:00:04 Client: leto i386-portbld-freebsd5.4,freebsd,5.4-STABLE FileSet:leto 2005-09-13 20:26:12 Pool: Weekly Storage:Exabyte VXA AutoPak 1x10 Scheduled time: 19-Nov-2005 00:14:36 Start time: 19-Nov-2005 00:14:42 End time: 19-Nov-2005 00:18:47 Priority: 1 FD Files Written: 0 SD Files Written: 0 FD Bytes Written: 0 SD Bytes Written: 0 Rate: 0.0 KB/s Software Compression: None Volume name(s): Volume Session Id: 1 Volume Session Time:1132376988 Last Volume Bytes: 241,210,545 Non-fatal FD errors:0 SD Errors: 0 FD termination status: Error SD termination status: Error Termination:*** Backup Error *** 19-Nov 00:18 backup-dir: message.c:454 Mail prog: bsmtp: bsmtp.c:277 Fatal connect error to localhost: ERR=Connection refused 19-Nov 00:18 backup-dir: leto_Daily_Backup.2005-11-19_00.14.40 Error: message.c:465 Mail program terminated in error. CMD=/usr/local/sbin/bsmtp -h localhost -f (Bacula) [EMAIL PROTECTED] -s Bacula: Backup Fatal Error of leto Differential [EMAIL PROTECTED] ERR=Child exited with code 1 The SD process actually dies during this process (at 00:25). This is SD'd output: backup: cram-md5.c:52 send: auth cram-md5 [EMAIL PROTECTED] ssl=0 backup: cram-md5.c:68 Authenticate OK 7jUWknxuD5xCh/seH//vIA backup: cram-md5.c:114 sending resp to challenge: r+/jcih9CE+XEzwlz9+fzA backup: dircmd.c:187 Message channel init completed. backup: cram-md5.c:52 send: auth cram-md5 [EMAIL PROTECTED] ssl=0 backup: cram-md5.c:68 Authenticate OK X4+uVxsZG4+sd3+vmzU3ZC backup: cram-md5.c:114 sending resp to challenge: 5R+GJxtYK6+fLw+Rnm/47D backup: autochanger.c:217 run_prog: /usr/local/share/bacula/my-changer /dev/pass0 loaded 6 /dev/nsa0 0 stat=0