[Bacula-users] Migration needs volstat!=Append, but how to get there? [Was: Re: [7.0.5] Option to start spooling before volume mount?]
Regarding John Lockard's message of 26.10.2014 15:34 (localtime):
> I run into this issue with several of my servers and dealt with it by creating migrate jobs.

I came to the same conclusion. Spooling doesn't make sense in its current implementation for today's capacities.

But there's another behaviour I'm fighting with: Volume Use Duration, defined in the director's pool resource, only takes effect when I write to the pool a second time. This pool is used only as a buffer for a second backup, which is afterwards migrated to tape (and pruned immediately, so that restores use the first backup, which lives in another (disk) pool and has sensible retention periods). So every weekend I write to an auto-labeled volume of this pool to prepare tape food (which would be much easier if spooling could start without requiring the destination volume to be labeled and mounted!). Migration starts on Monday morning, not blocking other disk pools, so there's plenty of time to get the duplicate backup migrated. But until the next time I write into that special buffer pool (which no other job uses until the next weekend), the migration won't happen, because the volstat stays Append. I haven't found a way to get it into Used.

*How did you accomplish that?* A console command with the full volume name is no option, and no form of wildcard seems to work (e.g. telling all volumes of a specific pool to be updated to volstat=Used).

Thanks, -Harry

signature.asc Description: OpenPGP digital signature
--
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
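Lacking a wildcard, one workaround is to script bconsole. A hedged sketch, not a verified recipe: the pool name, the column layout of `list volumes`, and the `volstatus=` keyword are assumptions that should be checked against your bconsole version first.

```shell
# Hedged workaround sketch: flip every Append volume of one pool to Used
# by scripting bconsole.  Pool name, table layout, and the volstatus=
# keyword are assumptions -- verify against your bconsole version.

# Parse volume names out of `list volumes` table output (VolumeName is
# assumed to be the 2nd visible column, i.e. awk field $3 with -F'|').
extract_append_vols() {
  awk -F'|' '$4 ~ /Append/ { gsub(/^ +| +$/, "", $3); print $3 }'
}

# Intended use (needs a reachable Director, so commented out here):
# echo "list volumes pool=weekly-tapedup-buffer" | bconsole \
#   | extract_append_vols \
#   | while read -r vol; do
#       echo "update volume=${vol} volstatus=Used" | bconsole
#     done
```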
[Bacula-users] [7.0.5] Incorrect NextPool evaluation from Job resource
Hello,

testing one of the new 7.0.0 features, mentioned in chapter 2.1.3 of the great documentation: »The Next Pool concept has been extended in Bacula version 7.0.0 to allow you to specify the Next Pool directive in the Job resource as well.«

Job {
  Name = CopyTest
  Type = Copy
  JobDefs = GenericJobSettings   # see bacula-dir.conf
  Client = client1-fd
  Pool = quaterly-backups
  Selection Type = Job
  Selection Pattern = CLIENT2-periodic B2D netshare-Gruppen-HR
  Next Pool = weekly-tapedup-buffer
}

*run
Automatically selected Catalog: MyCatalog
Using Catalog MyCatalog
A job name must be specified.
The defined Job resources are:
…
15: CopyTest
…
Select Job resource (1-16): 15
Run Copy job
JobName:   CopyTest
Bootstrap: *None*
Client:    client1-fd
FileSet:   None
Pool:      quaterly-backups (From Job resource)
NextPool:  LTO4-Archive (From Job's NextPool resource)

^^^ Here's an inconsistency. The source is claimed to be "From Job's NextPool resource", but the value actually comes from the Pool resource:

Pool {
  Name = quaterly-backups
  …
  Next Pool = LTO4-Archive   # needed for copy job
}

A run-directive override works as expected:

*run nextpool=weekly-tapedup-buffer
A job name must be specified.
The defined Job resources are:
…
15: CopyTest
…
Select Job resource (1-16): 15
Run Copy job
JobName:   CopyTest
Bootstrap: *None*
Client:    client1-fd
FileSet:   None
Pool:      quaterly-backups (From Job resource)
NextPool:  weekly-tapedup-buffer (From Command input)

Thanks, -Harry
[Bacula-users] WinBin 7.0.5 on w2k3(x86), bat+tray-monitor - missing mingwm10.dll
Hello,

I just upgraded bat from 6.0.6 to 7.0.5. On W2k8 (amd64), bat now runs much more stably! On Windows Server 2003 (x86), bat refuses to start with the same error as bacula-tray-monitor: missing mingwm10.dll. Unfortunately, just taking the one from Bacula 6.0.6 doesn't solve the problem: the error message vanishes, but neither bat nor the tray-monitor actually starts. The FD works fine on 2k3, though.

Best regards, -Harry
[Bacula-users] [7.0.5] Option to start spooling before volume mount?
Hello,

I enable data spooling for almost every job, because my LTO4 drive's hardware compression allows streaming _my_ data at a little over 100 MByte/s average, which bacula-fd can't deliver (localhost FD-SD connections allow ~25 MB/s with 60+% CPU usage; SoftCompression is disabled; observed FD requests are 4KB/t only, but that's a completely different issue I have to look at more closely some time later). I'm aware that the numbers above are highly workload-dependent and not representative; they should just give an idea of why I need data spooling – nothing more.

Environment is bacula 7.0.5 (dir, sd and fd), FreeBSD 10.1 (amd64) (also dir, sd and fd), LACP-GbE (irrelevant since I'm using localhost sockets), 3.4GHz Xeon-E3v3, 8GB RAM and two LSI2008. A typical job is 100-500GB in size, so spooling at only 25 MB/s takes a significant amount of time. That's why I want to start the job as early as possible.

My problem is that data spooling starts _after_ the volume has been mounted and positioned. So if I start the job at midnight, not a single bit gets spooled until somebody feeds the correct tape the next morning :-( And "next morning" very often slips to lunchtime; those missing 4 hours are what breaks the timetable for my setup.

Is there an option I missed which would enable data spooling without requesting the volume first? That would save exactly the hours my setup needs as a buffer when somebody forgets to feed the tape. Or is there any reason why data spooling must be delayed? I can't imagine one: if something goes wrong at despooling, it makes no difference whether the volume was available before spooling or not.

Thanks, -Harry
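For reference, a sketch of the spooling directives that do exist in 7.0.5 – none of them changes *when* spooling starts, which is exactly the gap described above. Names and values below are illustrative assumptions, not taken from the poster's config:

```
Job {
  Name = example-archive-job       # hypothetical
  Spool Data = yes                 # spool to disk before writing to tape
  Spool Size = 200G                # per-job cap on spool usage
}
Device {
  Name = LTO4-drive                # hypothetical
  Spool Directory = /spool/bacula  # hypothetical path
  Maximum Spool Size = 1T          # cap across all concurrent jobs
}
```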
Re: [Bacula-users] [7.0.5] Option to start spooling before volume mount?
Regarding John Lockard's message of 26.10.2014 15:34 (localtime):
> I run into this issue with several of my servers and dealt with it by creating migrate jobs. First job goes to disk. Second job runs some reasonable time later and migrates the D2D job to tape. I had a number of key servers I did this for, with the advantage that I could offsite the tapes and keep the D2D job on disk till after the next backup had run. This way I had immediate recovery available as well as disaster recovery.

Well, this is indeed a good workaround. I'm already using job migration, but for other tasks (regular file backups). In that special case, the data to archive onto LTO is already backup data (iSCSI-exported WindowsBackupDrives). But of course I could do another intermediate backup. I guess I will, as long as I'm not occupying too much space on my spool device (2TB only), since my backup pool is pretty well filled and I don't like backing up the same data twice on the same spindle pool…

Having an option which allows spooling to start before the volume mount would make life easier, though.

Thanks, -Harry
[Bacula-users] First-Written timestamp kept on recycled volumes - renders Use Duration useless?
Hello,

unfortunately volume recycling doesn't work like I expected. I have a pool with 2 volumes (tapes), which I use for duplicating (hdd) backups every weekend. So if one tape is in the drive and the house is on fire, I have the tape from the weekend before. I have a "Use Duration" of 3 days and a "Volume Retention" of 9 days, so after 12 days the volume should get purged and recycled – theoretically. I haven't managed to get automatic recycling to work yet; it's a new setup and I got the retention periods wrong on the first shot, so I manually purged the volume and set it to recycle.

The problem is: recycled volumes keep their First-Written timestamp, so such a volume will always be marked "Used", since "Use Duration" was intentionally allowed to lapse during the volume's first use. But when a volume is recycled, a new "Use Duration" period should begin! Also, in my opinion, the label of the volume should be recreated when recycled, not copied (see the excerpt below for the obvious auto-label reason). What I don't understand at all is why the timestamps are not reset at volume recycling. That can't be intentional, can it?

Thanks, -Harry

Config excerpt from the affected pool resource:

Pool {
  Name = LTO4-weekly-Dups
  # Description:
  # Pool for copies of differential backups from the harddisk store (for
  # disaster recovery).
  Maximum Volumes = 2
  Pool Type = Backup
  Storage = Tandberg-LTO4
  Volume Use Duration = 3 days
  Volume Retention = 9 days     # keep 9 days after last write
  Recycle Current Volume = yes  # prune currently mounted tape
  LabelFormat = weekly-rotation-${NumVols:p/2/0/r}_${Year}-${Month:p/2/0/r}-${Day:p/2/0/r}
}
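The intended timetable can be sketched as plain date arithmetic. The dates and variable names below are illustrative assumptions, not Bacula fields:

```python
from datetime import datetime, timedelta

# Illustrative sketch of the intended volume lifecycle; the dates and
# variable names are assumptions, not Bacula API.
use_duration = timedelta(days=3)       # Volume Use Duration = 3 days
volume_retention = timedelta(days=9)   # Volume Retention = 9 days

first_written = datetime(2014, 1, 4)              # Saturday: tape written
last_written = first_written + timedelta(days=1)  # Sunday: last write

marked_used_at = first_written + use_duration  # should flip Append -> Used
prune_at = last_written + volume_retention     # eligible for pruning/recycling

print(marked_used_at.date())  # 2014-01-07
print(prune_at.date())        # 2014-01-14
```

The complaint above is that on recycling, `first_written` is carried over instead of being reset, so the recycled volume is immediately past its Use Duration.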
[Bacula-users] Copy/Migration utilization
Dear bacula insiders,

I got a replacement for my old DDS5 – an LTO4. The setup is D2D2T. The problem is not the setup, but the tape drive utilization. The disk storage can provide well over 300 MByte/s, and using tar with -b 126, or dump, or dd, I see 78-160 MB/s moving to the drive. So the problem is probably not hardware-related.

When I have bacula copy a job to tape, there are very often outages where no data at all is transferred to the tape. I found out that after each such break, `mt status` reports an incremented file number. When does bacula write a new file, and why so often? I'd like to treat my tape drive as gently as possible, so the job result of an average transfer rate of 65 MB/s makes me worry about massive repositioning. Can I optimize my setup so that fewer new files are written on tape? Or should the creation of a new file mark happen without interrupting the transfer, meaning something is wrong with my setup?

Another strange thing for me is the device utilization. When using tar, I see a %busy of ~90 constantly for the tape device, no matter whether the transfer rate is 80 MB/s or (with compressible material) 120 MB/s. When bacula writes 75 MB/s, I get only 68% busy reported ?!? – up to the break point (new file), after which I get 350% usage. And if compression allows 100 MB/s, the %busy rate decreases to 50!?! But this is probably OS-specific (FreeBSD); just in case, maybe someone could enlighten me why this doesn't happen with other tape writers...

Thanks in advance, -Harry
Re: [Bacula-users] Copy/Migration utilization
Adrian Reyer wrote on 10.07.2011 14:43 (localtime):
> Hi Harry,
> On Sun, Jul 10, 2011 at 01:38:52PM +0200, Harald Schmalzbauer wrote:
>> of a average transfer rate of 65MB/s makes me worrying about massive repositioning.
> AFAIK LTO drives have adaptive speeds, compared to older technologies. If the data comes in slower, the drive will just run slower at a somewhat constant speed. No more stop-and-go.

Hi Adrian,

thanks for your reply. I read about the throttling capability, but `mt status` shows the drive state "at rest" even during active transfers. The server is at a remote site, so I can't hear the mechanics, but I guess "at rest" means stopped, hence my worries about extensive repositioning.

>> Can I optimize my setup so that there won't be so many new files written on tape? Or should the creation of a new file mark been done without interruption of the transfer, and there's something wrong with my setup?
> Do you use 'Spool Data = yes'? To my understanding you can run multiple jobs to storage at the same time, but they end up interleaved. Spooling the data will write full jobs, or at least bigger chunks of a job, in one run.

I have no backup jobs using the tape drive, so no spool is in use. I only use the tape drive for migration (or sometimes copy) jobs. And in the disk pool I use Use Volume Once = yes, so every job has its own file without interleaved data, which has exactly the size the job summary reports.

Do you know if marking a new file on tape interrupts the LTO drive's streaming? Perhaps it shouldn't interrupt streaming, writing many files for one job is a well-chosen design, and I'm suffering from some other misconfiguration which leads to an interruption on marking a new file. If it's technically not possible to keep the drive streaming while marking a new file, then I'm interested in tweaks to avoid hundreds of new file marks per backup job. Do others see 208 files after 200G written to tape?
Wild guess: if one file mark is written per 1 GByte (which takes at most 12.8 s in my case), and this file mark interrupts the drive for roughly 4 seconds, then my transfer rate for uncompressible material drops from the usual 80 MB/s by 25% to ~60 MB/s. That would exactly match the numbers I'm seeing here... which means the drive is only streaming 75% of the time; the rest is spent repositioning :-( Maybe this was not an issue with slower tape drives – LTO2 would only suffer about a 6% performance loss, if my wild guess has any truth to it...

Thanks, -Harry
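The wild guess above checks out arithmetically. All inputs below are the poster's estimates, not measurements:

```python
# All numbers below are the poster's estimates from the message above.
file_size_mb = 1024        # one file mark per ~1 GB written
stream_rate = 80.0         # MB/s while streaming, uncompressible material
filemark_pause = 4.0       # s, assumed drive stall per EOF mark

write_time = file_size_mb / stream_rate                        # 12.8 s
effective_rate = file_size_mb / (write_time + filemark_pause)  # MB/s
streaming_fraction = write_time / (write_time + filemark_pause)

print(round(effective_rate, 1))      # ~61.0 MB/s, i.e. roughly a 25% loss
print(round(streaming_fraction, 2))  # ~0.76 -> streaming about 3/4 of the time
```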
Re: [Bacula-users] Copy/Migration utilization
Adrian Reyer wrote on 10.07.2011 15:58 (localtime):
> ... I found using Spool Data for copy jobs to be faster for my setup. I have fast local disks for spooling, but some of my disk storage is accessed via iSCSI on 1-GBit/s links. However, I am currently running a few copy jobs and the limiting factor seems to be my bacula-sd consuming 1 complete CPU, throttling me at 55MB/s. The CPU is an older 'AMD Athlon(tm) 64 X2 Dual Core Processor 3800+'.

Hmmm, interesting point. Is the storage daemon single-threaded? I'll check this, since I have an old dual-core Xeon, but I verified there was at least 25% idle CPU time. (ZFS compression is used, so the client-to-disk backup is uncompressed, avoiding interference with the later (migration) tape drive compression.)

>> Maybe this was not an issue with slower tape drives. LTO2 would only suffer from about 6% performance loss, if my wild guess has any truth...
> LTO4 as well here, and no ear next to the drive. However, 'mt status' won't run as the drive is in use by the copy jobs; how did you get that info?

On FreeBSD there's a special control device for every sequential-access device (/dev/sa0 and /dev/sa0.ctl, for example).

No luck so far finding technical end-user details on the design of LTO drives. I'm really wondering whether file marks are written to the tape or just to the cartridge memory chip. And even if they get written to the tape, is it unavoidable that streaming is interrupted? Questions over questions – hopefully some magneto-guru will read this...

Thanks, -Harry
Re: [Bacula-users] Copy/Migration utilization
Harald Schmalzbauer wrote on 10.07.2011 16:28 (localtime):
> [...]
> No luck so far finding technical end-user details on the design of LTO drives. I'm really wondering whether file marks are written to the tape or just to the cartridge memory chip. And even if they get written to the tape, is it unavoidable that streaming is interrupted?

Reading the great official bacula manual answered one of my questions and proved my wild guess correct. In chapter 19, page 188, one can find:

  Maximum File Size = size
  No more than size bytes will be written into a given logical file on the
  volume. Once this size is reached, an end of file mark is written on the
  volume and subsequent data are written into the next file. ...
  If you are configuring an LTO-3 or LTO-4 tape, you probably will want to
  set the Maximum File Size to 2GB to avoid making the drive stop to write
  an EOF mark.

I set it to 100G and the frequent interruptions vanished :-) Maybe the suggested 2GB is well chosen for LTO-3; I think for LTO-4 you need much larger values than 2GB.

Now I still have the problem that I don't get more than 110 MB/s from bacula to the drive, while %busy states 55% and the disks of the ZFS pool read only 15 MB/s, reflecting that the currently written material compresses at almost 2:1. I have seen transfer rates of 150 MB/s with tar... CPU is 83% idle, ZFS disks 10% busy and the tape drive ~50% idle. Why don't I get the 150 MB/s seen with tar ...?!?... I'll report if I find out, and thanks in advance for any hints.

-Harry
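The fix above lives in the SD's Device resource. A minimal hedged sketch – the drive name is assumed from the pool config in an earlier post, and other required directives such as Archive Device and Media Type are omitted:

```
Device {
  Name = Tandberg-LTO4       # assumed drive name
  Maximum File Size = 100G   # one EOF mark per 100 GB, instead of the
                             # roughly one per 1 GB the poster observed
}
```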
[Bacula-users] VSS for volumes without drive letter?!?
Dear all,

I have problems with the following fileset:

  File = C:/
  File = D:/
  File = D:/windvsw1/DATEV/DATEN

The last entry is a volume without a drive letter. How can I tell bacula that this is a volume for which VSS should be used? I get the expected "cannot backup because file is opened" error on all the database files on that volume.

Thanks, -Harry
[Bacula-users] Wrong Bytes Written count in canceled Job status notification
Hello bacula list,

I'm trying to set up a small backup concept with bacula 5.0.3. If a job is canceled, due to Max Run Time limits for example, the report states:

  FD Files Written: 0
  SD Files Written: 0
  FD Bytes Written: 0 (0 B)
  SD Bytes Written: 0 (0 B)
  Rate:             0.0 KB/s

That's not the truth; in fact some hundred gigabytes were written. Any hints? Known bug?

Thanks, -Harry
[Bacula-users] Max Wait Time not respected, Max Run Time exceeded with 0s run time
Hello bacula list,

I'm trying to set up a small backup concept with bacula 5.0.3. Unfortunately I ran into the same problem about one year ago with 5.0.1. I'm using a file-based SD. I had one job running and accessing SD1, so the next scheduled job had to wait. But it didn't wait the Max Wait Time of 2 hours; instead it started some time after the scheduled start and immediately terminated with 0 seconds runtime because "Max Run Time exceeded". Can somebody help? Here's the relevant part of the canceled job status:

  13-Jun 23:01 uruba-dir JobId 76: Fatal error: Max run time exceeded. Job canceled.
  13-Jun 23:01 uruba-dir JobId 76: Bacula uruba-dir 5.0.3 (04Aug10): 13-Jun-2011 23:01:15
  Build OS:        amd64-portbld-freebsd8.2 freebsd 8.2-RELEASE-p1
  JobId:           76
  Job:             WTS1-Complete.2011-06-13_22.31.01_52
  Backup Level:    Incremental, since=2011-06-13 21:12:31
  Client:          wts1b-fd 5.0.3 (04Aug10) Linux,Cross-compile,Win64
  FileSet:         Win2008_VSS_C_Drive 2011-06-06 11:00:01
  Pool:            Daily (From Run pool override)
  Catalog:         UrubaCatalog (From Client resource)
  Storage:         ZFS1 (From run override)
  Scheduled time:  13-Jun-2011 22:31:01
  Start time:      13-Jun-2011 23:01:15
  End time:        13-Jun-2011 23:01:15
  Elapsed time:    0 secs

Thanks, -Harry
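The gap between the two timestamps in the report can be computed directly (timestamps copied from the status output above):

```python
from datetime import datetime

# Timestamps from the canceled-job report above.
scheduled = datetime(2011, 6, 13, 22, 31, 1)
started = datetime(2011, 6, 13, 23, 1, 15)

wait = started - scheduled
print(wait)  # 0:30:14 -- the job waited ~30 minutes, then ran for 0 seconds
```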
Re: [Bacula-users] Max Wait Time not respected, Max Run Time exceeded with 0s run time
Jeremy Maes wrote on 14.06.2011 10:46 (localtime):
> On 14/06/2011 9:52, Harald Schmalzbauer wrote:
>> Hello bacula list, I'm trying to set up a small backup concept with bacula 5.0.3. Unfortunately I ran into the same problem about one year ago with 5.0.1. I'm using a file-based SD. I had one job running and accessing SD1, so the next scheduled job had to wait. But it didn't wait the Max Wait Time of 2 hours; instead it terminated with 0 seconds runtime because Max Run Time was exceeded. Can somebody help?
> This is because you are probably using the wrong kind of wait times for your job, or wrong values for them. A picture from the manual should show this clearly: The Max Run Time will start counting the moment the job tries to get hold of a storage volume. Max Wait Time does NOT get added to this time,

Thanks a lot for your help. I had looked at the illustration in the manual, and together with the status report I understand it as: the job start time is not the schedule time. Here's the excerpt of the status report:

  Scheduled time: 13-Jun-2011 22:31:01
  Start time:     13-Jun-2011 23:01:15

So, corresponding to the illustration, this should be the period "Wait time", limited by "Max Start Delay". The report states "Elapsed time: 0 secs". The definition of Run Time in the manual makes clear that Elapsed time should be the same:

  Max Run Time = time
  The time specifies the maximum allowed time that a job may run, counted
  from when the job starts, (not necessarily the same as when the job was
  scheduled).

What I observe is that the cancellation could only be correct if I had set Max Run Sched Time, but that isn't set at all:

  Max Run Sched Time = time
  The time specifies the maximum allowed time that a job may run, counted
  from when the job was scheduled. This can be useful to prevent jobs from
  running during working hours. We can see it like Max Start Delay + Max
  Run Time.
> so if your Max Run Time is shorter than the Max Wait Time it'll time out with the warning you're getting.

This corresponds neither to the manual nor to common sense, IMHO – or I don't understand the concept at all. My first problem of understanding is why my job gets started at 23:01:15, 30 minutes after it was scheduled. If it starts regardless of the SD state, meaning it had waited 30 minutes for the storage device, then the start time would be the schedule time, not 30 minutes later... But if I understood correctly, the job should be delayed ("Delay Time") before it gets started, waiting for the storage device to become available. Or are there other reasons why a job could be delayed, with a blocked storage device not being one of them?

> The solution I use is to not specify most of them, and only use a Max Run Sched Time. This will make sure the job finishes (or gets cancelled if it's not done) within a set amount of time after I schedule it. Though depending on the situation that might not be the best way to go...

In my case, unfortunately, that's not what I want to limit. It's only about limiting the time one client is allowed to transfer data.

Thanks, -Harry
Re: [Bacula-users] Max Wait Time not respected, Max Run Time exceeded with 0s run time
Jeremy Maes wrote on 14.06.2011 11:43 (localtime):
> On 14/06/2011 11:23, Harald Schmalzbauer wrote:
> [...]
>> The report states "Elapsed time: 0 secs". The definition of Run Time in the manual makes clear that Elapsed time should be the same:
> Before I make any more assumptions etc., could you show us your configs for the given job? What settings related to this have you set or not set?
Thanks, of course I can give you all the details:

JobDefs {
  Name = WindowsDefault
  Type = Backup
  Level = Incremental
  Messages = Standard
  Pool = Temp
  # Full Backup Pool = Weekly
  # Incremental Backup Pool = Daily
  Priority = 10
  Schedule = ServerCompleteBackup
  Write Bootstrap = /data/bacula-storage/BaculaFD/BootStrapRecords/%c.bsr
  Max Start Delay = 14400            # 4h to wait after scheduled start
  Max Run Time = 14400               # 4 hours to run after being really started
  Differential Max Run Time = 7200   # 2 hours for differentials after being started
  Incremental Max Run Time = 3600    # 1 hour for incrementals after being started
  Max Run Sched Time = 36000         # 10 hours to wait to start job as planned
  Max Wait Time = 7200               # 2h to wait for resources after job really started
  # Max Full Interval =              # If Full is older than this, a full will always be performed!
}

# Client (File Services) to backup
Client {
  Name = wts2b-fd
  Address =
  FDPort = 9102
  Catalog = UrubaCatalog
  Password =
  File Retention = 30 days    # 30 days
  Job Retention = 12 months   # twelve months
  AutoPrune = yes             # Prune expired Jobs/Files
}

Job {
  Name = WTS2-Complete
  Enabled = yes
  Client = wts2b-fd
  JobDefs = WindowsDefault
  Full Backup Pool = Monthly
  Differential Backup Pool = Weekly
  Incremental Backup Pool = Daily
  Storage = ZFS2
  FileSet = Win2008_VSS_C_Drive
  Schedule = ServerCompleteBackup
}

Schedule {
  Name = ServerCompleteBackup
  Run = Level=Full 1st sun at 21:01
  Run = Level=Differential 2nd-5th sun at 21:01
  Run = Level=Incremental mon-sat at 22:31
}

--- bacula-dir.conf

Director {   # define myself
  Name = uruba-dir
  DIRport = 9101   # where we listen for UA connections
  QueryFile = /usr/local/share/bacula/query.sql
  WorkingDirectory = /var/db/bacula
  PidDirectory = /var/run
  Maximum Concurrent Jobs = 4
  Password =    # Console password
  Messages = Daemon
}

# Include subfiles associated with configuration of clients.
# They define the bulk of the Clients, Jobs, and FileSets.
# Remember to reload the Director after adding a client file.
@|sh -c 'for f in /usr/local/etc/bacula/*.conf ; do echo @${f} ; done'

# Generic catalog service
Catalog {
  Name = UrubaCatalog
  # Uncomment the following line if you want the dbi driver
  # dbdriver = dbi:postgresql; dbaddress = 127.0.0.1; dbport =
  dbname = bacula; dbuser = bacula; dbpassword =
}

# Definition of file storage device
Storage {
  Name = ZFS1
  # Do not use "localhost" here
  Address = uruba   # N.B. Use a fully qualified name here
  SDPort = 9103
  Password =
  Device = raidzP1_datadir1
  Media Type = dev1file
  AllowCompression = no
}

Storage {
  Name = ZFS2
  # Do not use "localhost" here
  Address = uruba   # N.B. Use a fully qualified name here
  SDPort = 9103
  Password =
  Device = raidzP1_datadir2
  Media Type = dev2file
  AllowCompression = no
}

Storage {
  Name = ZFS3
  # Do not use "localhost" here
  Address = uruba   # N.B. Use a fully qualified name here
  SDPort = 9103
  Password =
  Device = raidzP1_datadir3
  Media Type = dev3file
  AllowCompression = no
}

# Definition of DDS tape storage device
Storage {
  Name = HP-DAT72
  Address
Re: [Bacula-users] Max Run Time exceeded with 0s run time!
Harald Schmalzbauer wrote on 08.04.2010 20:18 (localtime):
> On 08.04.2010 14:20, Matija Nalis wrote:
>> On Mon, Apr 05, 2010 at 12:46:25PM +0200, Harald Schmalzbauer wrote:
>>> Absurdly canceled job 47: Fatal error: Max run time exceeded. Job canceled.
>>>   Scheduled time: 04-Apr-2010 21:01:03
>>>   Start time:     04-Apr-2010 21:39:41
>>>   End time:       04-Apr-2010 21:39:41
>>>   Elapsed time:   0 secs
>>> ...
>>> Here's my conf regarding max times:
>>>   Max Start Delay = 14400          # 4h to wait after scheduled start
>>>   Max Run Time = 1800              # Half an hour to run after being really started
>>>   Incremental Max Run Time = 900   # 15 minutes for incrementals after being started
>>>   Max Run Sched Time = 36000       # 10 hours to wait to start job as planned
>>>   Max Wait Time = 7200             # 2h to wait for resources after job really started
>> Which version of bacula is that? There were bugs not too far ago where 'Max Wait Time' wrongly acted like 'Max Run Time'; maybe it had similar problems with other related directives too.
> I'm running 5.0.1. First job took longer than 30 minutes, so it was canceled. Second job took 9 minutes, so the start time of the third job is 39 minutes after scheduled. There's no other time limit which could fit; Max Wait Time is 2 hours.

Hello,

this weekend the same thing happened again. I intentionally set the Max Run Time to 30 mins, but it is not working: it hoses all subsequent jobs except the one immediately following. Again, to visualize my timetable (all jobs scheduled at 21:00h):

          job1       job2        job3          job4          job5
  21:00   running    waiting     waiting       waiting       waiting
  +30min  canceled   running     waiting       waiting       waiting
  later   -          finished    canceled      waiting       waiting
                     ok (8min)   (0s runtime)
  later   -          -           -             canceled      waiting
                                               (0s runtime)
  later   -          -           -             -             canceled
                                                             (0s runtime)

Why do jobs 3-5 get cancelled with 0s runtime? How can I file a bug report?

Another thing: the job canceled for running longer than 30 mins reports 0 bytes written, but in fact it should have been writing for 30 minutes.
Which seems to be the case if I compare Volume Bytes. At last volume usage: 162,598,409,235 (162.5 GB). The canceled job reports:

  Elapsed time:           30 mins 22 secs
  Priority:               10
  FD Files Written:       0
  SD Files Written:       0
  FD Bytes Written:       0 (0 B)
  SD Bytes Written:       0 (0 B)
  Rate:                   0.0 KB/s
  Software Compression:   None
  VSS:                    no
  Encryption:             no
  Accurate:               no
  Volume name(s):         MonthA
  Volume Session Id:      73
  Volume Session Time:    1270146159
  Last Volume Bytes:      174,597,640,742 (174.5 GB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Running
  Termination:            Backup Canceled

So it has in fact written 12 GB. Are the reports only valid for correctly terminated jobs? I think even for canceled or otherwise error-terminated jobs it should report as many correct values as possible.

Thanks,

-Harry

signature.asc Description: OpenPGP digital signature
--
Download Intel® Parallel Studio Eval
Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta.
http://p.sf.net/sfu/intel-sw-dev
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Max Run Time exceede with 0s run time!
On 08.04.2010 14:20, Matija Nalis wrote:
> On Mon, Apr 05, 2010 at 12:46:25PM +0200, Harald Schmalzbauer wrote:
>> Absurdly canceled job 47:
>> Fatal error: Max run time exceeded. Job canceled.
>> Scheduled time: 04-Apr-2010 21:01:03
>> Start time: 04-Apr-2010 21:39:41
>> End time: 04-Apr-2010 21:39:41
>> Elapsed time: 0 secs
> Hm, yeah, not much sense. Was that an incremental job? It looks like it has 'Incremental Max Run Sched Time' instead of 'Incremental Max Run Time'...

Thanks for your reply. All jobs were full backups.

> Did you try increasing it to see if that's really what is causing problems?

All other timeouts are bigger than one hour, so that really shouldn't be the problem. I haven't done any trial-and-error tests yet. I don't really understand why the first job after the canceled one finished fine but all other subsequent jobs were canceled immediately. This can't be a timer setting problem IMO; I guess it's more a logical error in the handling of subsequent jobs.

>> Here's my conf regarding max times:
>> Max Start Delay = 14400            # 4h to wait after scheduled start
>> Max Run Time = 1800                # half an hour to run after being really started
>> Incremental Max Run Time = 900     # 15 minutes for incrementals after being started
>> Max Run Sched Time = 36000         # 10 hours to wait to start job as planned
>> Max Wait Time = 7200               # 2h to wait for resources after job really started
> Which version of bacula is that? There were bugs not too long ago where 'Max Wait Time' wrongly acted like 'Max Run Time'; maybe it had similar problems with other related directives too.

I'm running 5.0.1. First job took longer than 30 minutes, so it was canceled. Second job took 9 minutes, so the start time of the third job is 39 minutes after scheduled. There's no other time limit which could fit; Max Wait Time is 2 hours.

> See http://www.bacula.org/en/dev-manual/New_Features.html

Thank you, should I file a bug report in mantis?
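For reference, the time-limit directives quoted in this thread belong in the director's Job resource. A minimal sketch of how they sit together; the Client, FileSet, Schedule, Storage, and Pool names here are hypothetical placeholders, not from the poster's configuration:

```conf
# bacula-dir.conf: Job resource with the five limits from the thread.
Job {
  Name = "Client1-Complete"
  Type = Backup
  Client = client1-fd                # hypothetical
  FileSet = "Full Set"               # hypothetical
  Schedule = "NightlyAt2100"         # hypothetical, fires at 21:00
  Storage = ZFS1                     # hypothetical
  Pool = Default                     # hypothetical
  Max Start Delay = 14400            # cancel if not started 4h after scheduled time
  Max Run Time = 1800                # cancel if running longer than 30 min
  Incremental Max Run Time = 900     # tighter 15-min limit for incrementals
  Max Run Sched Time = 36000         # 10h limit measured from the scheduled time
  Max Wait Time = 7200               # 2h limit on waiting for resources
}
```

Note the different reference points: Max Run Time counts from the actual start, while Max Start Delay and Max Run Sched Time count from the scheduled time, which keeps slipping for queued jobs that run one after another.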
-Harry
[Bacula-users] Max Run Time exceeded with 0s run time!
Dear all,

one of my jobs ran into the max run time limitation and was canceled (after 30 mins). Then the next job was started and finished correctly (another 8 mins), but all subsequent jobs were cancelled due to "Max run time exceeded". Here's the journal:

Timed out job 45:
  banana-dir: Fatal error: Max run time exceeded. Job canceled.
  banana-sd: JobId=45 Job=Client1-Complete.2010-04-04_21.01.03_30 marked to be canceled.
  Scheduled time: 04-Apr-2010 21:01:03
  Start time: 04-Apr-2010 21:01:05
  End time: 04-Apr-2010 21:31:26
  Elapsed time: 30 mins 21 secs

Correct job 46:
  Scheduled time: 04-Apr-2010 21:01:03
  Start time: 04-Apr-2010 21:31:31
  End time: 04-Apr-2010 21:39:37
  Elapsed time: 8 mins 6 secs

Absurdly canceled job 47:
  Fatal error: Max run time exceeded. Job canceled.
  Scheduled time: 04-Apr-2010 21:01:03
  Start time: 04-Apr-2010 21:39:41
  End time: 04-Apr-2010 21:39:41
  Elapsed time: 0 secs

Absurdly canceled job 48:
  Fatal error: Max run time exceeded. Job canceled.
  Scheduled time: 04-Apr-2010 21:01:03
  Start time: 04-Apr-2010 21:39:41
  End time: 04-Apr-2010 21:39:42
  Elapsed time: 1 sec

And so on. What am I missing? Here's my conf regarding max times:

Max Start Delay = 14400            # 4h to wait after scheduled start
Max Run Time = 1800                # half an hour to run after being really started
Incremental Max Run Time = 900     # 15 minutes for incrementals after being started
Max Run Sched Time = 36000         # 10 hours to wait to start job as planned
Max Wait Time = 7200               # 2h to wait for resources after job really started

Thanks for any help in advance.

-Harry