Re: [Bacula-users] make bat troubles in bacula 3.0.0
Hello Dirk, I would greatly appreciate a patch to be able to install bat 3.0.0 on CentOS 5 with stock Qt 4.2. I am getting the same errors as Jan Jaap. thanks! Stephen

Dirk Bartley wrote:
I have done a patch for being able to install on the older qt. If you want it, let me know. I'll hunt it down. The issue is not with the programming but the version of designer I was using. If I open with an older designer and solve a couple of issues it compiled just fine on centos. I'm trying not to keep so up to date lately. Dirk

On Wed, 2009-04-08 at 15:27 -0700, Kelvin Raywood wrote:
JanJaap Scholing janjaapschol...@hotmail.com:
I'm trying to install bat. ./configure --enable-lockmgr --with-mysql --enable-bat --disable-libtool looks ok. But when I make I see the following error messages: [snip] I use Debian 4 with qt4 installed (4.2.1-2+etch1), bacula 3.0.0 (latest svn). What can I do to solve this problem?

John Drescher wrote:
Install Qt 4.3 or greater. Ignore that. I was looking at the wrong class in the Qt docs. I think the first advice was correct.

I recently had the same problem myself building bacula-2.5.42-b2 on CentOS-5, which includes Qt 4.2.1. I grabbed the qt 4.3.4 srpm from Fedora-7 and did:

  rpmbuild --rebuild --define 'dist .el5' --define 'rhel 5' \
    qt4-4.3.4-14.fc7.src.rpm
  yum localinstall qt4-4.3.4-14.el5.x86_64.rpm \
    qt4-devel-4.3.4-14.el5.x86_64.rpm \
    qt4-x11-4.3.4-14.el5.x86_64.rpm

I installed qwt-devel from EPEL. BAT built and runs fine. Kel Raywood

-- Stephen Thompson, Berkeley Seismological Laboratory
[Bacula-users] Restore only file attributes (permissions, ACL, owner, group...)
Hello, I see that a restore attribute only feature has been added with 3.0.0, but I cannot find any documentation on how to run a restore with this feature. Is this a command line option to restore? Any help would be greatly appreciated. Stephen

-- Stephen Thompson, Berkeley Seismological Laboratory
[Bacula-users] job progress?
Hello, Is there any built-in/simple way to determine how far along a job is? Some kind of progress meter against a job size estimate? Even knowing how much has been put to tape at a given point would be nice. We have jobs that take more than 24 hours to run. :S The best I can see is looking at the JobMedia table and then multiplying the number of entries for a job by the file size for our tape media. Not even sure if that's accurate. Anything simpler? thanks! Stephen

-- Stephen Thompson, Berkeley Seismological Laboratory
Re: [Bacula-users] job progress?
This is what I couldn't seem to find -- the running job bytes to tape per jobname. thanks! Stephen

Ralf Gross wrote:
Stephen Thompson schrieb:
Is there any built-in/simple way to determine how far along a job is? Some kind of progress meter against a job size estimate? Even knowing how much has been put to tape at a given point would be nice. We have jobs that take more than 24 hours to run. :S The best I can see is looking at the JobMedia table and then multiplying the number of entries for a job by the file size for our tape media. Not even sure if that's accurate. Anything simpler?

status client= shows you how much data was backed up so far. Ralf

-- Stephen Thompson, Berkeley Seismological Laboratory
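For the archive: the suggestion above is a plain bconsole command. A rough illustration of what the output looks like while a backup is running; the client name and figures below are invented, and the exact fields vary by Bacula version, only the command itself comes from Ralf's reply:

  *status client=seismo70-fd
  ...
  Running Jobs:
  JobId 37802 Job seismo70.2009-04-21_08.42.17_05 is running.
      Backup Job started: 21-Apr-09 08:42
      Files=102,345 Bytes=87,654,321,000 Bytes/sec=12,345,678 Errors=0
      Files Examined=102,345
      Processing file: /data/archive/somefile

The Bytes= figure is what the FD has sent so far, which is usually close enough to "how much has been put to tape" for a progress estimate.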
Re: [Bacula-users] make bat troubles in bacula 3.0.0
Dirk, Thanks, though I'm still getting the error:

g++ -c -m64 -pipe -g -D_REENTRANT -Wall -W -DQT_GUI_LIB -DQT_CORE_LIB -DQT_SHARED -I/usr/lib64/qt4/mkspecs/linux-g++-64 -I. -I/usr/lib64/qt4/include/QtCore -I/usr/lib64/qt4/include/QtCore -I/usr/lib64/qt4/include/QtGui -I/usr/lib64/qt4/include/QtGui -I/usr/lib64/qt4/include -I.. -I. -Iconsole -Irestore -Iselect -I../../../../qwt/include -Imoc -Iui -o obj/main.o main.cpp
ui/ui_main.h: In member function 'void Ui_MainForm::setupUi(QMainWindow*)':
ui/ui_main.h:168: error: 'class QGridLayout' has no member named 'setLeftMargin'
ui/ui_main.h:169: error: 'class QGridLayout' has no member named 'setTopMargin'
ui/ui_main.h:170: error: 'class QGridLayout' has no member named 'setRightMargin'
ui/ui_main.h:171: error: 'class QGridLayout' has no member named 'setBottomMargin'
ui/ui_main.h:172: error: 'class QGridLayout' has no member named 'setHorizontalSpacing'
ui/ui_main.h:173: error: 'class QGridLayout' has no member named 'setVerticalSpacing'
ui/ui_main.h:224: error: 'class QGridLayout' has no member named 'setLeftMargin'
ui/ui_main.h:225: error: 'class QGridLayout' has no member named 'setTopMargin'
ui/ui_main.h:226: error: 'class QGridLayout' has no member named 'setRightMargin'
ui/ui_main.h:227: error: 'class QGridLayout' has no member named 'setBottomMargin'
ui/ui_main.h:228: error: 'class QGridLayout' has no member named 'setHorizontalSpacing'
ui/ui_main.h:229: error: 'class QGridLayout' has no member named 'setVerticalSpacing'
ui/ui_main.h:255: error: 'class QGridLayout' has no member named 'setLeftMargin'
ui/ui_main.h:256: error: 'class QGridLayout' has no member named 'setTopMargin'
ui/ui_main.h:257: error: 'class QGridLayout' has no member named 'setRightMargin'
ui/ui_main.h:258: error: 'class QGridLayout' has no member named 'setBottomMargin'
ui/ui_main.h:259: error: 'class QGridLayout' has no member named 'setHorizontalSpacing'
ui/ui_main.h:260: error: 'class QGridLayout' has no member named 'setVerticalSpacing'
ui/ui_main.h:264: error: 'class QHBoxLayout' has no member named 'setLeftMargin'
ui/ui_main.h:265: error: 'class QHBoxLayout' has no member named 'setTopMargin'
ui/ui_main.h:266: error: 'class QHBoxLayout' has no member named 'setRightMargin'
ui/ui_main.h:267: error: 'class QHBoxLayout' has no member named 'setBottomMargin'
make: *** [obj/main.o] Error 1

Also 2 of the diffs on main.ui and prefs.ui failed. Attached are the .rej files. thanks, Stephen

Dirk Bartley wrote:
This is the diff. Dirk

On Fri, 2009-04-17 at 09:14 -0700, Stephen Thompson wrote:
Hello Dirk, I would greatly appreciate a patch to be able to install bat 3.0.0 on CentOS 5 with stock Qt 4.2. I am getting the same errors as Jan Jaap. thanks! Stephen

Dirk Bartley wrote:
I have done a patch for being able to install on the older qt. If you want it, let me know. I'll hunt it down. The issue is not with the programming but the version of designer I was using. If I open with an older designer and solve a couple of issues it compiled just fine on centos. I'm trying not to keep so up to date lately. Dirk

On Wed, 2009-04-08 at 15:27 -0700, Kelvin Raywood wrote:
JanJaap Scholing janjaapschol...@hotmail.com:
I'm trying to install bat.
./configure --enable-lockmgr --with-mysql --enable-bat --disable-libtool looks ok. But when I make I see the following error messages: [snip] I use Debian 4 with qt4 installed (4.2.1-2+etch1), bacula 3.0.0 (latest svn). What can I do to solve this problem?

John Drescher wrote:
Install Qt 4.3 or greater. Ignore that. I was looking at the wrong class in the Qt docs. I think the first advice was correct.

I recently had the same problem myself building bacula-2.5.42-b2 on CentOS-5, which includes Qt 4.2.1. I grabbed the qt 4.3.4 srpm from Fedora-7 and did:

  rpmbuild --rebuild --define 'dist .el5' --define 'rhel 5' \
    qt4-4.3.4-14.fc7.src.rpm
  yum localinstall qt4-4.3.4-14.el5.x86_64.rpm \
    qt4-devel-4.3.4-14.el5.x86_64.rpm \
    qt4-x11-4.3.4-14.el5.x86_64.rpm

I installed qwt-devel from EPEL. BAT built and runs fine. Kel Raywood
[Bacula-users] Duplicate Job Control?
Hello, I was excited to take advantage of the Duplicate Job Control feature in 3.0.0 but it does not appear to be working. At first I assumed the defaults listed on the New Features documentation page, then later explicitly defined them in my jobdefs config:

  JobDefs {
    Name = DefaultJob
    ...
    Maximum Concurrent Jobs = 2
    Allow Duplicate Jobs = no
    Allow Higher Duplicates = yes
    Cancel Queued Duplicates = yes
    Cancel Running Duplicates = no
  }

I used to set 'per job' the max concurrent jobs to 1, which would not cancel duplicate jobs, but would force them to wait for the running job to finish. What I actually want is for the duplicate job (queued) to be canceled, so that the running job can complete and a redundant job is not then run immediately afterwards. So, I changed the maximum concurrent jobs to 2, hoping to see the duplicate job control cancel one of the duplicate jobs.

Here is what I actually experienced. I launched a job, waited for it to reach a running state, then launched an identical job (name, level, priority, etc). To my surprise the identical job reached a running state as well! I thought that according to the above config, the queued job of the same priority should be canceled rather than moved into a running state?

  JobId  Level    Name                                  Status
  ==
  37802  Increme  seismo70.2009-04-21_08.42.17_05       is running
  37803  Increme  seismo70.2009-04-21_08.42.30_06       is running

Anyone have any idea why this might not be working for me? Am I misunderstanding how this should work? thanks! Stephen, berkeley seismology laboratory
[Bacula-users] Canceling jobs in 3.0.0 results in Terminated with error
Hello everyone. I recently upgraded from 2.4.4 to 3.0.0. Everything went very smoothly and the new version is running at least as well as the previous. One peculiar thing I've noticed is that if I cancel a job, rather than the job being set to a "Canceled by user" state, it winds up being set to "Terminated with error". Anyone else notice this and/or have ideas why this might be happening? thanks! Stephen
Re: [Bacula-users] Canceling jobs in 3.0.0 results in Terminated with error
RE logs... Looks like the cancel request is put in, but results in an error. Note that the message about there being no SL500-changer with type LTO-3 in the SD resources seems bogus -- that's the storage device and type with which all the non-canceled jobs successfully run.

sample log (server lawson/client agentsmith):

  16-Apr 07:41 lawson-sd JobId 3
  15-Apr 20:00 lawson-dir JobId 36627: Start Backup JobId 36627, Job=agentsmith.2009-04-15_20.00.00_23
  16-Apr 08:31 lawson-sd JobId 36627: Job agentsmith.2009-04-15_20.00.00_23 marked to be canceled.
  16-Apr 08:31 lawson-sd JobId 36627: Failed command: Jmsg Job=agentsmith.2009-04-15_20.00.00_23 type=6 level=1239895886 lawson-sd JobId 36627: Job agentsmith.2009-04-15_20.00.00_23 marked to be canceled.
  16-Apr 08:31 lawson-sd JobId 36627: Fatal error: Device SL500-changer with MediaType LTO-3 requested by DIR not found in SD Device resources.
  16-Apr 08:31 lawson-dir JobId 36627: Fatal error: Storage daemon didn't accept Device SL500-changer because: 3924 Device SL500-changer not in SD Device resources.
  16-Apr 08:31 lawson-dir JobId 36627: Error: Bacula lawson-dir 3.0.0 (06Apr09): 16-Apr-2009 08:31:58
    Build OS:               i386-pc-solaris2.10 solaris 5.10
    JobId:                  36627
    Job:                    agentsmith.2009-04-15_20.00.00_23
    Backup Level:           Incremental, since=2009-04-14 20:01:47
    Client:                 agentsmith-fd 2.4.2 (26Jul08) x86_64-unknown-linux-gnu,redhat,Enterprise release
    FileSet:                agentsmith-fs 2008-08-15 12:09:36
    Pool:                   Incremental-Pool (From Run pool override)
    Catalog:                MyCatalog (From Client resource)
    Storage:                SL500-changer (From Job resource)
    Scheduled time:         15-Apr-2009 20:00:00
    Start time:             15-Apr-2009 20:00:00
    End time:               16-Apr-2009 08:31:58
    Elapsed time:           12 hours 31 mins 58 secs
    Priority:               10
    FD Files Written:       0
    SD Files Written:       0
    FD Bytes Written:       0 (0 B)
    SD Bytes Written:       0 (0 B)
    Rate:                   0.0 KB/s
    Software Compression:   None
    VSS:                    no
    Encryption:             no
    Accurate:               no
    Volume name(s):
    Volume Session Id:      7
    Volume Session Time:    1239837527
    Last Volume Bytes:      1 (1 B)
    Non-fatal FD errors:    0
    SD Errors:              0
    FD termination status:
    SD termination status:
    Termination:            *** Backup Error ***

snippet from SD resources file:

  . . .
  Autochanger {
    Name = SL500-changer
    Device = SL500-Drive-0
    Device = SL500-Drive-1
    Changer Command = /opt/bacula/scripts/mtx-changer %c %o %S %a %d
    Changer Device = /dev/changer
  }
  Device {
    Name = SL500-Drive-0
    Drive Index = 0
    Media Type = LTO-3
    Archive Device = /dev/rmt/0cbn
    AutomaticMount = yes;       # when device opened, read it
    AlwaysOpen = yes;
    RemovableMedia = yes;
    RandomAccess = no;
    AutoChanger = yes;
    AutoSelect = yes;
    Maximum block size = 262144   # 256kb
    Alert Command = sh -c 'tapeinfo -f %c |grep TapeAlert|cat'
    Maximum Spool Size = 140gb
    Maximum Job Spool Size = 50gb
    Spool Directory = /bacula/spool
  }
  . . .

Martin Simmons wrote:
On Wed, 22 Apr 2009 16:07:49 -0700, Stephen Thompson said:
I recently upgraded from 2.4.4 to 3.0.0. Everything went very smoothly and the new version is running at least as well as the previous. One peculiar thing I've noticed is that if I cancel a job, rather than the job being set to a "Canceled by user" state, it winds up being set to "Terminated with error". Anyone else notice this and/or have ideas why this might be happening?

What does the log output show?
__Martin

-- Stephen Thompson, Berkeley Seismological Laboratory
Re: [Bacula-users] duplicate jobs in 3.0
That's the behaviour I've seen when I set Maximum Concurrent Jobs = 1 under JobDefs. Then only one job with the same name can run at a time. What I was hoping for with duplicate job control was for the subsequent job(s) to be canceled so that they wouldn't run at all. thanks, Stephen

Silver Salonen wrote:
Hello. I noticed one thing today.. a big full backup was run on Friday, so it wasn't completed 24 hours later, but when the next job's time arrived, it wasn't run. I was very surprised, because I expected it to run, as has been the case without "allow duplicate jobs = no" with Bacula 2.x. When the full job completed, the scheduled (and not run) one started immediately and was correctly making an incremental backup. So it seems that duplicate job control does work, just not the way I expected, i.e. I expected it to be cancelled (I guess I thought that's what the Cancel Queued Duplicates directive does, but now I guess not) instead of being hidden and waiting somewhere back there.

-- Stephen Thompson, Berkeley Seismological Laboratory
Re: [Bacula-users] How to prevent to schedule a job if the same job is still running
Note this thread as well; it's possible the documentation is wrong (the source code seems to indicate so):

Dear All, Bacula (3.0.2) is configured to make daily backups of some systems. Full backups unfortunately take more than one day to complete and I want to avoid duplicate jobs starting (or being queued) before the full backup is completed. No duplicate job control directives are configured. If I understand the manual correctly (perhaps it's an interpretation error on my part) http://www.bacula.org/en/dev-manual/New_Features.html#SECTION00310 this should not happen. I had a quick look in the source code and found this code in src/dird/job.c:

  bool allow_duplicate_job(JCR *jcr)
  {
     JOB *job = jcr->job;
     JCR *djcr;                /* possible duplicate */
     if (job->AllowDuplicateJobs) {
        return true;
     }
     if (!job->AllowHigherDuplicates) {
        -- code related to Cancel Queued Duplicates and Cancel Running Duplicates here --
     }
     return true;
  }

Apparently Cancel Queued Duplicates and Cancel Running Duplicates are only evaluated when Allow Higher Duplicates is set to no - not the default. Is this an error in the documentation, the code, or me not correctly understanding the manual or code? Kind regards, Bram Vandoren.

Silver Salonen wrote:
On Monday 31 August 2009 09:33:22 Ralf Gross wrote:
Silver Salonen schrieb:
On Sunday 30 August 2009 13:58:44 Ralf Gross wrote:
Martina Mohrstein schrieb:
So my question is how could I prevent the schedule of a job when the same job is already running?

Maybe the new Duplicate Job Control feature in 3.0.x helps to prevent this? http://www.bacula.org/en/dev-manual/New_Features.html#515
- Allow Duplicate Jobs
- Allow Higher Duplicates
- Cancel Queued Duplicates
- Cancel Running Duplicates

I still haven't seen it working as it should (even in 3.0.2), but yes, one day it will; it'll be the most correct and easiest way to achieve this.

Did anyone file a bug report about that? I searched the bug database, but couldn't find a report about that. At least no open bug. Ralf

Hmm, I guess not then. But it has been reported several times on the list. So, any volunteers? :)

-- Stephen Thompson, Berkeley Seismological Laboratory
Re: [Bacula-users] How to prevent to schedule a job if the same job is still running
Hello, This works as reported for me as well, however, what I want to have in the first case is for the originally scheduled job to be canceled, not the duplicate. The reason being that my incrementals fall into a daily schedule, whereas my fulls are scheduled out-of-band, so I want the incremental to be canceled on the day that a full is scheduled. Given what you all say below, this doesn't seem possible with bacula's Duplicate Job Control. Correct? thanks! Stephen

Silver Salonen wrote:
On Monday 14 September 2009 15:59:24 Bram Vandoren wrote:
Hi All,
Silver Salonen wrote:
Hmm, I guess not then. But it has been reported several times on the list. So, any volunteers? :)

This configuration:
  Allow Higher Duplicates = no
  Cancel Queued Duplicates = yes
seems to work fine in my situation (some more testing is needed). It cancels the newly created duplicate job immediately.

This configuration:
  Allow Higher Duplicates = no
  Cancel Running Duplicates = yes
cancels the running job and starts the same one. If you have a job that takes more than 24h to complete and runs daily, it will never finish.

Hope it helps. When I find some time I will reopen the bug report. Cheers, Bram.

I'll try that on my servers with a few hundred jobs and report about it tomorrow :) But as the default options are Cancel Queued Duplicates = yes and Cancel Running Duplicates = no, the only needed option seems to be Allow Higher Duplicates = no. I myself had set Allow Duplicate Jobs = no, because I thought it includes Allow Higher Duplicates = no too. Whether the one option helps or not, I'll tell tomorrow.

-- Stephen Thompson, Berkeley Seismological Laboratory
Re: [Bacula-users] How to prevent to schedule a job if the same job is still running
Silver Salonen wrote:
Actually, you can do it - Allow Higher Duplicates really means ANY duplicate job, not only a higher one. I just tested it and an incremental job is cancelled if either a full or an incremental instance of the same job is still running. So in my case Allow Higher Duplicates did the trick :)

Really? This is exactly what I want and what I tried for when 3.x was first released, but my experiments showed that nothing was canceled. The jobs rather began running concurrently. I'll try this again. Are you saying to set Allow Higher Duplicates to Yes or No? Actually, could you possibly list what you have all the relevant values set to? I would most appreciate it. thanks, Stephen

-- Stephen Thompson, Berkeley Seismological Laboratory
Re: [Bacula-users] How to prevent to schedule a job if the same job is still running
Ah, thanks for the info, but this still is not the behavior that I am looking for. This does indeed cancel incrementals if a full is already running (actually even if a full is merely scheduled), but it goes both ways: it also cancels my fulls if an incremental is already running or scheduled. It's the scheduled part that causes me problems. I have incrementals scheduled to run every day. I then interject full jobs each day based on a script that determines which of my hosts are available for a full that day. This configuration immediately cancels my fulls, rather than letting them run and then canceling the corresponding incrementals when they are actually launched.

This might work out if all jobs have static (i.e. based on configuration files) schedules, but rather than controlling duplicates, it seems better at preventing administrator intervention, which is frustrating. I recognize I might have a unique situation (dynamically scheduling fulls based on availability rather than a regular calendar cycle), which is fine; I'll probably have to pull my incremental scheduling out of bacula and cron the injection of jobs via a script.

But to me, there is still a design issue with considering a scheduled job to be in duplicate conflict with a running job; it seems like it would make more sense to only apply that logic in the running queue (whether actually running or waiting for resources). Then Cancel Queued Duplicates would cancel any job that attempted to enter the running state if another job was already running. As it is now, it appears to cancel any job entering the running state even if another job is merely scheduled to run at some point in the future. Cancellations should happen on conflict, not on suspicion that conflict might arise in the future. But perhaps that's being too philosophical. :) Stephen

Silver Salonen wrote:
On Tuesday 15 September 2009 17:36:25 Stephen Thompson wrote:
Silver Salonen wrote:
Actually, you can do it - Allow Higher Duplicates really means ANY duplicate job, not only a higher one. I just tested it and an incremental job is cancelled if either a full or an incremental instance of the same job is still running. So in my case Allow Higher Duplicates did the trick :)

Really? This is exactly what I want and what I tried for when 3.x was first released, but my experiments showed that nothing was canceled. The jobs rather began running concurrently. I'll try this again. Are you saying to set Allow Higher Duplicates to Yes or No? Actually, could you possibly list what you have all the relevant values set to? I would most appreciate it. thanks, Stephen

Yeah, I was positively surprised today too :) I have just one option in every JobDefs for that: Allow Higher Duplicates = no

-- Stephen Thompson, Berkeley Seismological Laboratory
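For anyone skimming the archive later: the setting Silver describes lives in a JobDefs (or Job) resource. A minimal sketch of the configuration this sub-thread converged on - names are invented, not anyone's actual config, and the commented defaults are the ones the documentation claims:

  JobDefs {
    Name = "DefaultJob"               # hypothetical name
    Maximum Concurrent Jobs = 2       # so the duplicate is actually seen rather than just queued behind MCJ=1
    Allow Higher Duplicates = no      # the single directive Silver reports as making cancellation work
    # Cancel Queued Duplicates = yes    (documented default)
    # Cancel Running Duplicates = no    (documented default)
  }

Note the caveat from the rest of this thread: "duplicate" here also appears to cover jobs that are merely scheduled, not only ones that are running or queued.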
Re: [Bacula-users] bacula 5.0.1 and db issues - please, share your experience
I mostly use bat for restores (i.e. building the restore tree). I did nothing with my tables regarding indexing. I have whatever the bacula scripts create by default. In regards to tuning, I did play with changing the join and sort buffer sizes. I found a 'slight' increase in performance. By slight, I mean something like 4.5 vs 4.6 minutes for the same restore. Stephen

On 4/14/10 6:29 AM, Koldo Santisteban wrote:
Thanks for your answers. Stephen, do you use bconsole or bat? Perhaps the issue is only in bat. I admit that I only use bweb and bat (on Windows). Regarding your comments Stephen, my bacula server is smaller than yours, but my catalog size was 400 MB (Bacula has been working for 2 months). I didn't tune my database, but with version 3.0.3 that wasn't necessary. Which parameters are recommended for tuning? I think this info is very useful for people with the same issue as me... I see that some people say the better way is creating new indexes (someone says that this is the worst option), others say to customize mysql parameters... but I can't find any official info, and, at least in my case, I don't have enough time (or knowledge) to test bacula with new indexes, or to customize mysql/postgres... I miss this official info... Regards

On Tue, Apr 13, 2010 at 7:36 PM, Thomas Mueller tho...@chaschperli.ch wrote:
Am Tue, 13 Apr 2010 15:59:25 +0200 schrieb Koldo Santisteban:
Thanks for your answer. The first stage was with mysql 5.0.77, and it worked with bacula 3.0.3 without problems. I have used the same database and server with bacula 5.0.1. The bacula server + DB has 3.5 GB RAM with a Xeon processor. I have tested my environment installing postgres on the same server with an empty db. I create a full bacula server backup and then try to restore. I have found that the restore process works fine using bweb and bacula 5.0.1. What is the difference between bat and bweb?

Noticed too, bat takes forever on building trees on restores. bconsole is _much_ faster. - Thomas

-- Stephen Thompson, Berkeley Seismological Laboratory
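For reference, the "join and sort buffer sizes" mentioned above map to the MySQL server variables join_buffer_size and sort_buffer_size. A minimal my.cnf sketch of that kind of tweak - the values are illustrative only, not a recommendation from this thread, and as noted the gain observed here was marginal:

  [mysqld]
  # Buffers used for index-less joins and for ORDER BY / GROUP BY sorts.
  # Both are allocated per connection, so keep them modest.
  join_buffer_size = 8M
  sort_buffer_size = 8M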
[Bacula-users] 5.0.1 infinite email loop bug??
Hello, I have just now experienced a possible new bug with bacula 5.0.1. The symptoms are this: bacula-sd crashes, bacula-dir continues to run, and bacula-dir then spews out identical "Intervention needed" emails until manually restarted. The first time this happened over a weekend, and upon returning I found my inbox had about 120,000 bacula emails, all the SAME and of this type:

  15-Apr 10:02 client-fd JobId 11: Fatal error: backup.c:1048 Network send error to SD. ERR=Broken pipe

It happened again just now (second time since upgrading from 3.0.3 to 5.0.1) and I managed to stop the director with only a few thousand emails going out. So there are really 2 issues here:

1) Why does the director apparently get stuck in an infinite loop of sending the same email message? Is this a known bug?

2) Regarding the SD, I received one alert of this type, the rest like the above:

  15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT: dev->blocked()

A traceback like:

  ptrace: Operation not permitted.
  /var/bacula/work/29091: No such file or directory.
  $1 = 0
  /opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command file: No symbol exename in current context.

And a backtrace like:

  Attempt to dump current JCRs
  JCR=0x19a24888 JobId=10 name=client_1.2010-04-14_18.02.33_41 JobStatus=l
  use_count=1 JobType=B JobLevel=F
  sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35
  end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
  db=(nil) db_batch=(nil) batch_started=0
  JCR=0x1981b248 JobId=11 name=client_10.2010-04-14_20.00.15_04 JobStatus=R
  use_count=1 JobType=B JobLevel=I
  sched_time=15-Apr-2010 09:15 start_time=15-Apr-2010 09:15
  end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
  db=(nil) db_batch=(nil) batch_started=0
  Attempt to dump plugins. Hook count=0

Both clients and server seem healthy, except for the SD crash. Any ideas? thanks! Stephen

Further info:

My catalog...
  mysql-5.0.77 (64bit) MyISAM
  210Gb in size
  1,412,297,215 records in File table
  note: database built with bacula 2x scripts, upgraded with 3x scripts, then again with 5x scripts (i.e. nothing customized along the way)

My OS and hardware for the bacula DIR+SD server...
  Centos 5.4 (fully patched)
  8Gb RAM, 2Gb Swap
  1Tb EXT3 filesystem on external fiber RAID5 array (dedicated to database, incl. temp files)
  2 dual-core [AMD Opteron(tm) Processor 2220] CPUs
  StorageTek SL500 Library with 2 LTO3 Drives
Re: [Bacula-users] 5.0.1 infinite email loop bug??
Additionally, seems like the SD was possibly reading a new freshly-labeled tape when it crashed... Last items in bacula log besides alerts already mentioned: 15-Apr 09:31 server-sd JobId 10: Writing spooled data to Volume. Despooling 35,000,185,219 bytes ... 15-Apr 09:51 server-sd JobId 10: End of Volume FB0568 at 888:1414 on device SL500-Drive-1 (/dev/nst0). Write of 262144 bytes got -1. 15-Apr 09:51 server-sd JobId 10: Re-read of last block succeeded. 15-Apr 09:51 server-sd JobId 10: End of medium on Volume FB0568 Bytes=887,261,470,720 Blocks=3,384,635 at 15-Apr-2010 09:51. 15-Apr 09:51 server-sd JobId 10: 3307 Issuing autochanger unload slot 38, drive 1 command. 15-Apr 09:52 server-sd JobId 10: 3301 Issuing autochanger loaded? drive 1 command. 15-Apr 09:52 server-sd JobId 10: 3302 Autochanger loaded? drive 1, result: nothing loaded. 15-Apr 09:52 server-sd JobId 10: 3304 Issuing autochanger load slot 39, drive 1 command. 15-Apr 09:52 server-sd JobId 10: 3305 Autochanger load slot 39, drive 1, status is OK. 15-Apr 09:52 server-sd JobId 10: Volume FB0569 previously written, moving to end of data. Nothing but thousands of 'repetitive' alerts after that... thanks again, Stephen On 04/15/2010 10:25 AM, Stephen Thompson wrote: Hello, I have just now experienced a possible new bug with bacula 5.0.1. The symptoms are this: bacula-sd crashes bacula-dir continues to run bacula-dir then spews out identical Intervention needed emails until manually restarted The first time this happened over a weekend and upon returning I found my inbox has about 120,000 bacula emails, all the SAME and of this type: 15-Apr 10:02 client-fd JobId 11: Fatal error: backup.c:1048 Network send error to SD. ERR=Broken pipe It happened again just now (second time since upgrading from 3.0.3 to 5.0.1) and I managed to stop the director with only a few thousand emails going out. So there are really 2 issues here: 1) Why does the director apparently get stuck in an infinite loop of sending the same email message? Is this a known bug? 2) Regarding the SD, I received one alert of this type, the rest like the above: 15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT: dev-blocked() A traceback like: -- ptrace: Operation not permitted. /var/bacula/work/29091: No such file or directory. $1 = 0 /opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command file: No symbol exename in current context. -- And a bactrace like: -- Attempt to dump current JCRs JCR=0x19a24888 JobId=10 name=client_1.2010-04-14_18.02.33_41 JobStatus=l use_count=1 JobType=B JobLevel=F sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35 end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00 db=(nil) db_batch=(nil) batch_started=0 JCR=0x1981b248 JobId=11 name=client_10.2010-04-14_20.00.15_04 JobStatus=R use_count=1 JobType=B JobLevel=I sched_time=15-Apr-2010 09:15 start_time=15-Apr-2010 09:15 end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00 db=(nil) db_batch=(nil) batch_started=0 Attempt to dump plugins. Hook count=0 -- Both clients and server seem healthy, except for the SD crash. Any ideas? thanks! Stephen - Further info: My catalog... mysql-5.0.77 (64bit) MyISAM 210Gb in size 1,412,297,215 records in File table note: database built with bacula 2x scripts, upgraded with 3x scripts, then again with 5x scripts (i.e. nothing customized along the way) My OS hardware for bacula DIR+SD server... Centos 5.4 (fully patched) 8Gb RAM 2Gb Swap 1Tb EXT3 filesystem on external fiber RAID5 array (dedicated to database, incl. 
temp files) 2 dual-core [AMD Opteron(tm) Processor 2220] CPUs StorageTek SL500 Library with 2 LTO3 Drives

-- Stephen Thompson, Berkeley Seismological Laboratory
Re: [Bacula-users] [Bacula-devel] 5.0.1 infinite email loop bug??
Hello, Thanks for the response. No, it's nothing to do with mail configuration; 100% sure of that. (I know people say that all the time, but, seriously, it's the director). And by alerts, I do mean Messages in the bacula vernacular. The first time this crash happened, we received 120,000 Messages in the form of emails to our administrative account. The messages were identical both to each other and to the content of the $JOB.mail file in our bacula working directory (which is never removed automatically after one of these crashes - perhaps that causes the endless cycle). The same Message also appears to be written to our bacula log file each time an email is generated (or vice versa). It seems to me like it's possible for the director to get stuck in a loop and send the contents of that mail file again and again, infinitely. Both times we've had the SD crash (both have happened since upgrading to 5.0.1), the only thing that stopped the Message generation was stopping the director itself.

Of course, that's the annoying symptom. The more serious problem is the crash of our SD. Any pointers to getting ptrace working with the automatic scripts? thanks! Stephen

On 04/15/2010 12:40 PM, Kern Sibbald wrote:
On Thursday 15 April 2010 19:36:51 Stephen Thompson wrote:
Additionally, seems like the SD was possibly reading a new freshly-labeled tape when it crashed... Last items in bacula log besides alerts already mentioned:

In Bacula alerts refer to tape drive information stored concerning tape problems, so I am assuming you mean messages.

  15-Apr 09:31 server-sd JobId 10: Writing spooled data to Volume. Despooling 35,000,185,219 bytes ...
  15-Apr 09:51 server-sd JobId 10: End of Volume FB0568 at 888:1414 on device SL500-Drive-1 (/dev/nst0). Write of 262144 bytes got -1.
  15-Apr 09:51 server-sd JobId 10: Re-read of last block succeeded.
  15-Apr 09:51 server-sd JobId 10: End of medium on Volume FB0568 Bytes=887,261,470,720 Blocks=3,384,635 at 15-Apr-2010 09:51.
  15-Apr 09:51 server-sd JobId 10: 3307 Issuing autochanger unload slot 38, drive 1 command.
  15-Apr 09:52 server-sd JobId 10: 3301 Issuing autochanger loaded? drive 1 command.
  15-Apr 09:52 server-sd JobId 10: 3302 Autochanger loaded? drive 1, result: nothing loaded.
  15-Apr 09:52 server-sd JobId 10: 3304 Issuing autochanger load slot 39, drive 1 command.
  15-Apr 09:52 server-sd JobId 10: 3305 Autochanger load slot 39, drive 1, status is OK.
  15-Apr 09:52 server-sd JobId 10: Volume FB0569 previously written, moving to end of data.

Nothing but thousands of 'repetitive' alerts after that...

What exactly is repeated? There was a Bacula bug #1480 in message delivery that may be the same that you are experiencing; it was triggered by a misconfigured SMTP server or by a reference in Bacula to a non-existent SMTP server - and the simple solution is to make sure Bacula points to a valid functional SMTP server. This problem was not particular to version 5.0.1, but I think it was fixed after the release of 5.0.1. Please see the bugs database for more details. Kern

thanks again, Stephen

On 04/15/2010 10:25 AM, Stephen Thompson wrote:
Hello, I have just now experienced a possible new bug with bacula 5.0.1.
The symptoms are this: bacula-sd crashes bacula-dir continues to run bacula-dir then spews out identical Intervention needed emails until manually restarted The first time this happened over a weekend and upon returning I found my inbox has about 120,000 bacula emails, all the SAME and of this type: 15-Apr 10:02 client-fd JobId 11: Fatal error: backup.c:1048 Network send error to SD. ERR=Broken pipe It happened again just now (second time since upgrading from 3.0.3 to 5.0.1) and I managed to stop the director with only a few thousand emails going out. So there are really 2 issues here: 1) Why does the director apparently get stuck in an infinite loop of sending the same email message? Is this a known bug? 2) Regarding the SD, I received one alert of this type, the rest like the above: 15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT: dev-blocked() A traceback like: -- ptrace: Operation not permitted. /var/bacula/work/29091: No such file or directory. $1 = 0 /opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command file: No symbol exename in current context. -- And a bactrace like: -- Attempt to dump current JCRs JCR=0x19a24888 JobId=10 name=client_1.2010-04-14_18.02.33_41 JobStatus=l use_count=1 JobType=B JobLevel=F sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35 end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00 db=(nil) db_batch=(nil) batch_started=0 JCR=0x1981b248 JobId=11 name=client_10.2010-04-14_20.00.15_04 JobStatus=R use_count=1 JobType=B JobLevel=I sched_time=15-Apr
[Bacula-users] Warning about setting File/Job Retentions in Pool resource!
My, possibly mistaken, understanding of having File/Job Retention directives in a Pool resource was to be able to deviate from File/Job Retentions set by the Client resource AND to confine those retentions to the Pool where they are specified. What actually happens is that when using the Pool where the File/Job Retentions are specified, the retentions will apply to any File/Job's that were written to another Pool, overriding the Client resource.

Real life example: The Job Retention for all my clients defaults to 1 year and I have monthly full Pools that I keep for a year. I also have an incremental/differential pool that I recycle on a 60-90 day basis. When I set the File/Job Retention to 90 days for my incremental/differential Pool and ran a complete set of incrementals, the 90 day retention was then applied to all of those jobs, not just for the incremental/differential Pool where the 90 day period was set, but for all of my monthly full Pools as well! This effectively purged 9 months of my Catalog records. :(

Yes, I had a backup of the Catalog and yet it took 12 hours to restore. But, please note that it can be dangerous to use File/Job retentions in a Pool resource. thanks, Stephen

-- Stephen Thompson, Berkeley Seismological Laboratory
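To make the scenario concrete, here is a rough sketch of the kind of setup being described - the resource names and exact periods are invented, not Stephen's actual config - a Client with year-long catalog retention plus an incremental Pool that also sets its own File/Job Retention, which is the combination reported to purge records across all pools:

  Client {
    Name = "example-fd"            # hypothetical client
    File Retention = 1 year
    Job Retention = 1 year
    AutoPrune = yes
  }

  Pool {
    Name = "Incremental-Pool"      # recycled every 60-90 days
    Pool Type = Backup
    Volume Retention = 90 days
    # These are the directives the warning is about: per the report above,
    # they ended up overriding the Client's 1-year retention for jobs that
    # had also been written to other pools, not just this one.
    File Retention = 90 days
    Job Retention = 90 days
  }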
Re: [Bacula-users] Warning about setting File/Job Retentions in Pool resource!
For more clarity: What actually happens is that when writing to the Pool where the File/Job Retentions are specified, the retentions will apply to any File/Job's that were ALSO written to another Pool, thus overriding the Client resource regardless of Pool.

On 04/26/2010 11:52 AM, Stephen Thompson wrote:
My, possibly mistaken, understanding of having File/Job Retention directives in a Pool resource was to be able to deviate from File/Job Retentions set by the Client resource AND to confine those retentions to the Pool where they are specified. What actually happens is that when using the Pool where the File/Job Retentions are specified, the retentions will apply to any File/Job's that were written to another Pool, overriding the Client resource.

Real life example: The Job Retention for all my clients defaults to 1 year and I have monthly full Pools that I keep for a year. I also have an incremental/differential pool that I recycle on a 60-90 day basis. When I set the File/Job Retention to 90 days for my incremental/differential Pool and ran a complete set of incrementals, the 90 day retention was then applied to all of those jobs, not just for the incremental/differential Pool where the 90 day period was set, but for all of my monthly full Pools as well! This effectively purged 9 months of my Catalog records. :(

Yes, I had a backup of the Catalog and yet it took 12 hours to restore. But, please note that it can be dangerous to use File/Job retentions in a Pool resource.

-- Stephen Thompson, Berkeley Seismological Laboratory
[Bacula-users] 5.0.2 status of Restore jobs bytes read
Hello all, I can't say this with certainty, but I believe I've been experiencing a new oddity with bacula ever since I upgraded from 3.X to 5.X. After launching a restore job, the status of the job via bconsole always shows "is waiting on Storage Device". This continues on long past the loading and forwarding of tapes, until job completion. I could swear that there used to be more granular status messages, if only that it switched to "running" when not sending mtx commands. Am I mistaken? Also, a status of the Storage Daemon does not display bytes out for a Restore job the way it does bytes in for a Backup job. I don't know if this is how it's always been or not, but it seems like bytes read back out would be pretty handy to have. thanks!!! Stephen

-- Stephen Thompson, Berkeley Seismological Laboratory
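For completeness, the places one would normally poll for restore progress from bconsole are the director and storage daemon status screens; a sketch of the commands (the storage and client names here are just examples). In the situation described above they at least show the job entry and which volume the drive currently has mounted, even though no byte counter for the read side is reported:

  *status dir
  *status storage=SL500-changer
  *status client=someclient-fd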
[Bacula-users] mysql to postgresql conversion?
Hello, Anyone have an up to date howto on converting a mysql bacula database to postgresql? There are notes on the bacula site, but it appears that they may be out of date. thanks! Stephen

-- Stephen Thompson, Berkeley Seismological Laboratory
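No authoritative answer appears in this thread, but for the archive: the approach most of the older notes describe boils down to creating an empty PostgreSQL catalog with Bacula's own scripts and loading a data-only MySQL dump into it. A very rough sketch, with the usual caveats - untested here, the sed/clean-up step between dump and load is almost always more involved than shown, and sequence names may need adjusting:

  # 1. create an empty bacula catalog with the scripts shipped in bacula's scripts directory
  ./create_postgresql_database
  ./make_postgresql_tables
  ./grant_postgresql_privileges

  # 2. dump only the data from MySQL in (roughly) postgres-compatible form
  mysqldump -t -n -c --compatible=postgresql bacula > bacula-data.sql

  # 3. fix up quoting/escaping as needed, then load
  psql -U bacula bacula < bacula-data.sql

  # 4. reset the sequences so new inserts don't collide, e.g.:
  #    SELECT setval('job_jobid_seq', (SELECT MAX(jobid) FROM job));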
[Bacula-users] postgres tuning?
Hello everyone, We recently attempted a mysql to postgresql migration for our bacula 5.0.2 server. The data migration itself was successful, however we are disappointingly getting either the same or significantly worse performance out of the postgres db. I was hoping that someone might have some insight into this. Here is some background:

software:
  centos 5.5 (64bit)
  bacula 5.0.2 (64bit)
  postgresql 8.1.21 (64bit)
  (previously... mysql-5.0.77 (64bit) MyISAM)

database:
  select count(*) from File -> 1,439,626,558
  du -sk /var/lib/pgsql/data -> 346,236,136 /var/lib/pgsql/data

hardware:
  1Tb EXT3 external fibre-RAID storage
  8Gb RAM, 2Gb SWAP
  2 dual-core [AMD Opteron(tm) Processor 2220] CPUs

Some of the postgres tuning that I've attempted thus far (comments are either defaults or alternative settings I've tried without effect):

  #shared_buffers = 1000          # min 16 or max_connections*2, 8KB each
  shared_buffers = 262144         # 2Gb
  #work_mem = 1024                # min 64, size in KB
  work_mem = 524288               # 512Mb
  #maintenance_work_mem = 16384   # min 1024, size in KB
  maintenance_work_mem = 2097152  # 2Gb
  #checkpoint_segments = 3        # in logfile segments, min 1, 16MB each
  checkpoint_segments = 16
  #checkpoint_warning = 30        # in seconds, 0 is off
  checkpoint_warning = 16
  #effective_cache_size = 1000    # typically 8KB each
  #effective_cache_size = 262144  # 256Mb
  effective_cache_size = 6291456  # 6Gb
  #random_page_cost = 4           # units are one sequential page fetch cost
  random_page_cost = 2

Now, as to what I'm 'seeing'. Building restore trees is on par with my previous mysql db, but what I'm seeing as significantly worse are:

                                            mysql    postgresql
  Within Bat:
    1) Version Browser (large sample job)    3min       9min
    2) Restore Tree (average sample job)    40sec      25sec
    3) Restore Tree (large sample job)      10min     8.5min
    4) Jobs Run (1000 Records)              10sec       2min
  Within psql/mysql:
    1) select count(*) from File;            1sec      30min
  Catalog dump:
    1) mysqldump/pgdump                      2hrs       3hrs

I get a win on building Restore trees, but everywhere else, it's painfully slow. It makes the bat utility virtually unusable as an interface. Why the win (albeit moderate) in some cases but terrible responses in others?

I admit that I am not familiar with postgres at all, but I tried to walk through some of the postgres tuning documents, including the notes in the bacula manual, to arrive at the above settings. Also note that I've tried several variants on the configuration above (including the postgres defaults); I don't have a detailed play-by-play of the results, but the time results above seemed typical regardless of what settings I tweaked.

Any help would be greatly appreciated! Stephen

-- Stephen Thompson, Berkeley Seismological Laboratory
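One side note on the count(*) row above, since it is easy to read as a Postgres defect: in PostgreSQL, SELECT COUNT(*) always scans the table (MVCC means there is no cached exact row count the way MyISAM keeps one), so 30 minutes on a ~1.4-billion-row File table is not surprising and says little about Bacula query performance. If an approximate figure is enough, the planner statistics can be read instead; a small example (assumes ANALYZE has been run recently, so the estimate is reasonably fresh):

  -- approximate row count from planner statistics instead of a full scan
  SELECT relname, reltuples::bigint AS approx_rows
  FROM pg_class
  WHERE relname = 'file';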
Re: [Bacula-users] postgres tuning?
Correction: I didn't notice the 8k per unit settings at first with postgres 8.1. Should read: effective_cache_size = 786432# 6Gb On 06/04/2010 10:58 AM, Stephen Thompson wrote: Hello everyone, We recently attempted a mysql to postgresql migration for our bacula 5.0.2 server. The data migration itself was successful, however we are disappointly either getting the same or significantly worse performance out of the postgres db. I was hoping that someone might have some insight into this. Here is some background: software: centos 5.5 (64bit) bacula 5.0.2 (64bit) postgresql 8.1.21 (64bit) (previously... mysql-5.0.77 (64bit) MyISAM) database: select count(*) from File -- 1,439,626,558 du -sk /var/lib/pgsql/data -- 346,236,136 /var/lib/pgsql/data hardware: 1Tb EXT3 external fibre-RAID storage 8Gb RAM 2Gb SWAP 2 dual-core [AMD Opteron(tm) Processor 2220] CPUs Some of the postgres tuning that I've attempted thus far (comments are either default or alternatively settings I've tried without effect): #shared_buffers = 1000# min 16 or max_connections*2, 8KB each shared_buffers = 262144 # 2Gb #work_mem = 1024# min 64, size in KB work_mem = 524288 # 512Mb #maintenance_work_mem = 16384 # min 1024, size in KB maintenance_work_mem = 2097152 # 2Gb #checkpoint_segments = 3 # in logfile segments, min 1, 16MB each checkpoint_segments = 16 #checkpoint_warning = 30# in seconds, 0 is off checkpoint_warning = 16 #effective_cache_size = 1000# typically 8KB each #effective_cache_size = 262144 # 256Mb effective_cache_size = 6291456 # 6Gb #random_page_cost = 4 # units are one sequential page fetch cost random_page_cost = 2 Now, as to what I'm 'seeing'. Building restore trees are on par with my previous mysql db, but what I'm seeing as significantly worse are: mysql postgresql Within Bat: 1) Version Browser (large sample job) 3min 9min 2) Restore Tree (average sample job) 40sec25sec 3) Restore Tree (large sample job)10min 8.5min 2) Jobs Run (1000 Records)10sec 2min Within psql/mysql: 1) select count(*) from File; 1sec30min Catalog dump: 1) mysqldump/pgdump2hrs 3hrs I get a win on building Restore trees, but everywhere else, it's painfully slow. It makes the bat utility virtually unusable as an interface. Why the win (albeit moderate) in some cases but terrible responses in others? I admit that I am not familiar with postgres at all, but I tried to walk through some of the postgres tuning documents, including the notes in the bacula manual to arrive at the above settings. Also note that I've tried several variants on the configuration above (including the postgres defaults), don't have a detailed play by play of the results, but the time results above seemed typical regardless of what settings I tweaked. Any help would be greatly appreciated! Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit. See the prize list and enter to win: http://p.sf.net/sfu/thinkgeek-promo ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
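For reference, postgresql 8.1 does not accept unit suffixes, so shared_buffers and effective_cache_size are counted in 8KB pages while work_mem and maintenance_work_mem are counted in KB. A quick sanity check of the values above (arithmetic only, assuming the default 8KB block size):

shared_buffers       = 262144    # 262144 x 8KB = 2GB   (as intended)
work_mem             = 524288    # KB, so 512MB         (as intended)
maintenance_work_mem = 2097152   # KB, so 2GB           (as intended)
effective_cache_size = 786432    # 786432 x 8KB = 6GB   (the corrected value)
# the earlier effective_cache_size = 6291456 would have claimed 6291456 x 8KB = 48GB of cache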
Re: [Bacula-users] postgres tuning?
Thanks, yes it is Linux. I will look at those limits settings. And yes, I've built indexes and analyze (nothing to vacuum yet since it's a fresh import). Stephen On 06/04/2010 12:16 PM, Alan Brown wrote: On Fri, 4 Jun 2010, Stephen Thompson wrote: Correction: I didn't notice the 8k per unit settings at first with postgres 8.1. Should read: effective_cache_size = 786432# 6Gb Assuming this is linux, you need to tweak /etc/sysctl/limits.conf a little: postgres softmemlock unlimited postgres hardmemlock unlimited @postgres hardmemlock unlimited @postgres softmemlock unlimited bacula softmemlock unlimited bacula hardmemlock unlimited @bacula softmemlock unlimited @bacula hardmemlock unlimited postgres softrss unlimited postgres hardrss unlimited Don't forget to build the indexes and run analyse/vacuum commands. So far I'm finding Postgres is far more forgiving than MySQL and has far fewer parts to tune... On 06/04/2010 10:58 AM, Stephen Thompson wrote: Hello everyone, We recently attempted a mysql to postgresql migration for our bacula 5.0.2 server. The data migration itself was successful, however we are disappointly either getting the same or significantly worse performance out of the postgres db. I was hoping that someone might have some insight into this. Here is some background: software: centos 5.5 (64bit) bacula 5.0.2 (64bit) postgresql 8.1.21 (64bit) (previously... mysql-5.0.77 (64bit) MyISAM) database: select count(*) from File -- 1,439,626,558 du -sk /var/lib/pgsql/data -- 346,236,136 /var/lib/pgsql/data hardware: 1Tb EXT3 external fibre-RAID storage 8Gb RAM 2Gb SWAP 2 dual-core [AMD Opteron(tm) Processor 2220] CPUs Some of the postgres tuning that I've attempted thus far (comments are either default or alternatively settings I've tried without effect): #shared_buffers = 1000# min 16 or max_connections*2, 8KB each shared_buffers = 262144 # 2Gb #work_mem = 1024# min 64, size in KB work_mem = 524288 # 512Mb #maintenance_work_mem = 16384 # min 1024, size in KB maintenance_work_mem = 2097152 # 2Gb #checkpoint_segments = 3 # in logfile segments, min 1, 16MB each checkpoint_segments = 16 #checkpoint_warning = 30# in seconds, 0 is off checkpoint_warning = 16 #effective_cache_size = 1000# typically 8KB each #effective_cache_size = 262144 # 256Mb effective_cache_size = 6291456 # 6Gb #random_page_cost = 4 # units are one sequential page fetch cost random_page_cost = 2 Now, as to what I'm 'seeing'. Building restore trees are on par with my previous mysql db, but what I'm seeing as significantly worse are: mysql postgresql Within Bat: 1) Version Browser (large sample job)3min 9min 2) Restore Tree (average sample job)40sec25sec 3) Restore Tree (large sample job) 10min 8.5min 2) Jobs Run (1000 Records) 10sec 2min Within psql/mysql: 1) select count(*) from File;1sec30min Catalog dump: 1) mysqldump/pgdump 2hrs 3hrs I get a win on building Restore trees, but everywhere else, it's painfully slow. It makes the bat utility virtually unusable as an interface. Why the win (albeit moderate) in some cases but terrible responses in others? I admit that I am not familiar with postgres at all, but I tried to walk through some of the postgres tuning documents, including the notes in the bacula manual to arrive at the above settings. Also note that I've tried several variants on the configuration above (including the postgres defaults), don't have a detailed play by play of the results, but the time results above seemed typical regardless of what settings I tweaked. Any help would be greatly appreciated! 
Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit. See the prize list and enter to win: http://p.sf.net/sfu/thinkgeek-promo ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
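A side note on the limits file Alan mentions: on most Linux distributions the per-user limits live in /etc/security/limits.conf rather than /etc/sysctl/limits.conf, and entries take the form <domain> <type> <item> <value>. A sketch of what he appears to be describing, assuming postgres and bacula users/groups exist (changes apply at the next login or service restart):

# /etc/security/limits.conf (assumed path)
postgres    soft    memlock    unlimited
postgres    hard    memlock    unlimited
@postgres   soft    memlock    unlimited
@postgres   hard    memlock    unlimited
bacula      soft    memlock    unlimited
bacula      hard    memlock    unlimited
@bacula     soft    memlock    unlimited
@bacula     hard    memlock    unlimited
postgres    soft    rss        unlimited
postgres    hard    rss        unlimited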
Re: [Bacula-users] postgres tuning?
Yes, it's ext3. On 6/4/10 7:24 PM, Jon Schewe wrote: Which filesystem are you on too? I've found that ext3 is significantly faster than ext4 and xfs. On 06/04/2010 04:01 PM, Stephen Thompson wrote: Thanks, yes it is Linux. I will look at those limits settings. And yes, I've built indexes and analyze (nothing to vacuum yet since it's a fresh import). Stephen On 06/04/2010 12:16 PM, Alan Brown wrote: On Fri, 4 Jun 2010, Stephen Thompson wrote: Correction: I didn't notice the 8k per unit settings at first with postgres 8.1. Should read: effective_cache_size = 786432# 6Gb Assuming this is linux, you need to tweak /etc/sysctl/limits.conf a little: postgres softmemlock unlimited postgres hardmemlock unlimited @postgres hardmemlock unlimited @postgres softmemlock unlimited bacula softmemlock unlimited bacula hardmemlock unlimited @bacula softmemlock unlimited @bacula hardmemlock unlimited postgres softrss unlimited postgres hardrss unlimited Don't forget to build the indexes and run analyse/vacuum commands. So far I'm finding Postgres is far more forgiving than MySQL and has far fewer parts to tune... On 06/04/2010 10:58 AM, Stephen Thompson wrote: Hello everyone, We recently attempted a mysql to postgresql migration for our bacula 5.0.2 server. The data migration itself was successful, however we are disappointly either getting the same or significantly worse performance out of the postgres db. I was hoping that someone might have some insight into this. Here is some background: software: centos 5.5 (64bit) bacula 5.0.2 (64bit) postgresql 8.1.21 (64bit) (previously... mysql-5.0.77 (64bit) MyISAM) database: select count(*) from File --1,439,626,558 du -sk /var/lib/pgsql/data --346,236,136 /var/lib/pgsql/data hardware: 1Tb EXT3 external fibre-RAID storage 8Gb RAM 2Gb SWAP 2 dual-core [AMD Opteron(tm) Processor 2220] CPUs Some of the postgres tuning that I've attempted thus far (comments are either default or alternatively settings I've tried without effect): #shared_buffers = 1000# min 16 or max_connections*2, 8KB each shared_buffers = 262144 # 2Gb #work_mem = 1024# min 64, size in KB work_mem = 524288 # 512Mb #maintenance_work_mem = 16384 # min 1024, size in KB maintenance_work_mem = 2097152 # 2Gb #checkpoint_segments = 3 # in logfile segments, min 1, 16MB each checkpoint_segments = 16 #checkpoint_warning = 30# in seconds, 0 is off checkpoint_warning = 16 #effective_cache_size = 1000# typically 8KB each #effective_cache_size = 262144 # 256Mb effective_cache_size = 6291456 # 6Gb #random_page_cost = 4 # units are one sequential page fetch cost random_page_cost = 2 Now, as to what I'm 'seeing'. Building restore trees are on par with my previous mysql db, but what I'm seeing as significantly worse are: mysql postgresql Within Bat: 1) Version Browser (large sample job) 3min 9min 2) Restore Tree (average sample job) 40sec25sec 3) Restore Tree (large sample job)10min 8.5min 2) Jobs Run (1000 Records)10sec 2min Within psql/mysql: 1) select count(*) from File; 1sec30min Catalog dump: 1) mysqldump/pgdump2hrs 3hrs I get a win on building Restore trees, but everywhere else, it's painfully slow. It makes the bat utility virtually unusable as an interface. Why the win (albeit moderate) in some cases but terrible responses in others? I admit that I am not familiar with postgres at all, but I tried to walk through some of the postgres tuning documents, including the notes in the bacula manual to arrive at the above settings. 
Also note that I've tried several variants on the configuration above (including the postgres defaults), don't have a detailed play by play of the results, but the time results above seemed typical regardless of what settings I tweaked. Any help would be greatly appreciated! Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit. See the prize list and enter to win: http://p.sf.net/sfu
Re: [Bacula-users] postgres tuning?
On 06/07/2010 12:33 AM, Julien Cigar wrote: Stephen Thompson wrote: Hello everyone, We recently attempted a mysql to postgresql migration for our bacula 5.0.2 server. The data migration itself was successful, however we are disappointly either getting the same or significantly worse performance out of the postgres db. I was hoping that someone might have some insight into this. Here is some background: software: centos 5.5 (64bit) bacula 5.0.2 (64bit) postgresql 8.1.21 (64bit) Why 8.1 ..? 8.1 is more than 5 years old ... Yes, Alan Brown answered this for me, but, yeah, it's a restriction based on our policy to use Centos (5.5.) packages which is at postgresql 8.1. It might be worth trying a compiled version of the latest release to at least be able to compare and possibly make an argument for a non-Centos package in this case. (previously... mysql-5.0.77 (64bit) MyISAM) database: select count(*) from File -- 1,439,626,558 du -sk /var/lib/pgsql/data -- 346,236,136 /var/lib/pgsql/data hardware: 1Tb EXT3 external fibre-RAID storage 8Gb RAM 2Gb SWAP 2 dual-core [AMD Opteron(tm) Processor 2220] CPUs Some of the postgres tuning that I've attempted thus far (comments are either default or alternatively settings I've tried without effect): #shared_buffers = 1000 # min 16 or max_connections*2, 8KB each shared_buffers = 262144 # 2Gb This is too large, set shared_buffers to something like 256-512 MB I can try that. The postgres tuning documents I'd read said to try 1/4 the RAM as a starting point, which is how I arrived at 2Gb. #work_mem = 1024 # min 64, size in KB work_mem = 524288 # 512Mb Don't forget that work_mem is allocated *per-operation* (maybe several times per query). 512 MB seems too large for me Thanks, I can try reducing this and the shared_buffers. #maintenance_work_mem = 16384 # min 1024, size in KB maintenance_work_mem = 2097152 # 2Gb #checkpoint_segments = 3 # in logfile segments, min 1, 16MB each checkpoint_segments = 16 #checkpoint_warning = 30 # in seconds, 0 is off checkpoint_warning = 16 #effective_cache_size = 1000 # typically 8KB each #effective_cache_size = 262144 # 256Mb effective_cache_size = 6291456 # 6Gb 6GB seems OK to me #random_page_cost = 4 # units are one sequential page fetch cost random_page_cost = 2 only reduce random_page_cost if you have fast disks (SAS, ...) Thanks, I'll try putting this back to the default of 4. Now, as to what I'm 'seeing'. Building restore trees are on par with my previous mysql db, but what I'm seeing as significantly worse are: mysql postgresql Within Bat: 1) Version Browser (large sample job) 3min 9min 2) Restore Tree (average sample job) 40sec 25sec 3) Restore Tree (large sample job) 10min 8.5min 2) Jobs Run (1000 Records) 10sec 2min Within psql/mysql: 1) select count(*) from File; 1sec 30min Catalog dump: 1) mysqldump/pgdump 2hrs 3hrs I get a win on building Restore trees, but everywhere else, it's painfully slow. It makes the bat utility virtually unusable as an interface. Why the win (albeit moderate) in some cases but terrible responses in others? I admit that I am not familiar with postgres at all, but I tried to walk through some of the postgres tuning documents, including the notes in the bacula manual to arrive at the above settings. Also note that I've tried several variants on the configuration above (including the postgres defaults), don't have a detailed play by play of the results, but the time results above seemed typical regardless of what settings I tweaked. Any help would be greatly appreciated! 
Stephen It doesn't sound like I'm doing anything egregiously wrong. I am still surprised at how slow postgres is compared to mysql on the same hardware, after all I've read and heard about postgres superiority. Don't get me wrong, I understand its strengths, but for an application like Bacula it doesn't seem like many of its features are really needed, and if it runs more slowly... I may very well continue to run with mysql, which is rather disappointing. thanks! Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu 215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit. See the prize list and enter to win: http://p.sf.net/sfu/thinkgeek-promo ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
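Putting Julien's numbers into 8.1's units, a conservative postgresql.conf sketch for this 8GB machine might look like the following (illustrative values only, not a tested configuration):

shared_buffers       = 49152     # 384MB (49152 x 8KB), inside the suggested 256-512MB range
work_mem             = 65536     # 64MB per sort/hash operation instead of 512MB
maintenance_work_mem = 524288    # 512MB, for index builds and vacuum
effective_cache_size = 786432    # 6GB, roughly what the OS cache can hold
random_page_cost     = 4         # default; only lower it for storage with fast random I/O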
Re: [Bacula-users] postgres tuning?
On 06/07/2010 09:24 AM, Florian Heigl wrote: Hi all, Within psql/mysql: 1) select count(*) from File;  1sec  30min Disclaimer: I don't know a dime's worth of databases per se. But I spend a lot of time hunting other people's performance issues. :) I think you should start identifying the cause for this bit at the very first, as it shows the absolutely worst performance, and probably what slows this also slows the rest. My nose says this is really smelly and shouldn't even take as long without any indexes. I totally agree. This was my first cause for concern when, after the data import, I wanted to check that I still had the same number of rows. Can you verify whether the system swaps during your test commands or doesn't? No, it does not swap, in this case or any other one I've tested. sar -dp 100 1 | grep -v nodev will do nicely with the external storage. sdc1 is the database partition (including log and temp files)...

11:05:21 AM  DEV   tps     rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
11:07:01 AM  sdd   449.95  111886.88  4.32      248.67    1.54      3.41   2.11   94.85
11:07:01 AM  sdd1  449.95  111886.88  4.32      248.67    1.54      3.41   2.11   94.85
Average:     DEV   tps     rd_sec/s   wr_sec/s  avgrq-sz  avgqu-sz  await  svctm  %util
Average:     sdd   449.95  111886.88  4.32      248.67    1.54      3.41   2.11   94.85
Average:     sdd1  449.95  111886.88  4.32      248.67    1.54      3.41   2.11   94.85

also can you please let us know the iowait times (also from sar) and promise it's not a Raid5 array you have in use? (but, admittedly it doesn't explain the big difference between mysql and postgresql) Yes, it is RAID5; it was the only place I could go to get external space, with internal being inadequate. However, you point out that this isn't a fundamental problem, as mysql performance was mostly ok. The difference must be in the postgres config (or postgres itself). Also, I don't know if I would value RedHat supporting postgres 8.1 higher than running 8.4.1 :) Florian thanks! Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu 215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit. See the prize list and enter to win: http://p.sf.net/sfu/thinkgeek-promo ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
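For the iowait figures Florian asked about, something along these lines should do (standard sysstat/iostat invocations; interval and count are arbitrary):

sar -u 10 6       # CPU usage including %iowait
sar -dp 10 6      # per-device tps, throughput, await, svctm, %util
iostat -x 10 3    # alternative extended per-device view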
Re: [Bacula-users] postgres tuning?
On 06/07/2010 12:17 PM, Julien wrote: On Mon, 2010-06-07 at 18:24 +0200, Florian Heigl wrote: Hi all, Within psql/mysql: 1) select count(*) from File;1sec30min Is your MySQL database in the MyISAM format ? If yes, then it's perfectly normal. Yes. So it sounds like the row count is a red herring in saying there's a performance problem with postgres. I'll try to ignore that result then and concentrate on the queries like the one that produces the jobs run in the bat console. That and the version browser are where I'm seeing my worst results. Disclaimer: I don't know a dime's worth of databases per se. But I spend a lot of time hunting other peoples performance issues. :) I think you should start identifying the cause for this bit at the very first, as it shows the absolutely worst perfomance and probably what slows this also slows the rest. My nose says this is really smelly and should'nt even take as long without any indexes. Can you verify whether the system swaps during your test commands or doesn't? sar -dp 100 1 | grep -v nodev will do nicely with the external storage. also can you please let us know the iowait times (also from sar) and promise it's not a Raid5 array you have in use? (but, admittedly it doesn't explain for a big difference between mysql and postgresql) Also, I don't know if I would value RedHat supporting postgre 8.1 higher than running 8.4.1 :) Florian -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- ThinkGeek and WIRED's GeekDad team up for the Ultimate GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the lucky parental unit. See the prize list and enter to win: http://p.sf.net/sfu/thinkgeek-promo ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
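For what it's worth, the count difference is expected: MyISAM stores an exact row count in the table header, so an unqualified SELECT COUNT(*) returns instantly, while InnoDB and PostgreSQL both have to walk an index or the table. If an approximate figure is enough for a post-import sanity check, the optimizer statistics can be read instead (a sketch; the PostgreSQL estimate is only as fresh as the last ANALYZE):

-- MySQL, any engine: approximate row count from table metadata
SHOW TABLE STATUS LIKE 'File';

-- PostgreSQL: planner estimate for the file table
SELECT reltuples::bigint AS approx_rows FROM pg_class WHERE relname = 'file';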
Re: [Bacula-users] 5.0.1 infinite email loop bug??
After running for 3 months without this problem, it happened again last night. We are running 5.0.2 at this point. Stephen On 04/15/2010 10:25 AM, Stephen Thompson wrote: Hello, I have just now experienced a possible new bug with bacula 5.0.1. The symptoms are this: bacula-sd crashes bacula-dir continues to run bacula-dir then spews out identical Intervention needed emails until manually restarted The first time this happened over a weekend and upon returning I found my inbox has about 120,000 bacula emails, all the SAME and of this type: 15-Apr 10:02 client-fd JobId 11: Fatal error: backup.c:1048 Network send error to SD. ERR=Broken pipe It happened again just now (second time since upgrading from 3.0.3 to 5.0.1) and I managed to stop the director with only a few thousand emails going out. So there are really 2 issues here: 1) Why does the director apparently get stuck in an infinite loop of sending the same email message? Is this a known bug? 2) Regarding the SD, I received one alert of this type, the rest like the above: 15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT: dev-blocked() A traceback like: -- ptrace: Operation not permitted. /var/bacula/work/29091: No such file or directory. $1 = 0 /opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command file: No symbol exename in current context. -- And a bactrace like: -- Attempt to dump current JCRs JCR=0x19a24888 JobId=10 name=client_1.2010-04-14_18.02.33_41 JobStatus=l use_count=1 JobType=B JobLevel=F sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35 end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00 db=(nil) db_batch=(nil) batch_started=0 JCR=0x1981b248 JobId=11 name=client_10.2010-04-14_20.00.15_04 JobStatus=R use_count=1 JobType=B JobLevel=I sched_time=15-Apr-2010 09:15 start_time=15-Apr-2010 09:15 end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00 db=(nil) db_batch=(nil) batch_started=0 Attempt to dump plugins. Hook count=0 -- Both clients and server seem healthy, except for the SD crash. Any ideas? thanks! Stephen - Further info: My catalog... mysql-5.0.77 (64bit) MyISAM 210Gb in size 1,412,297,215 records in File table note: database built with bacula 2x scripts, upgraded with 3x scripts, then again with 5x scripts (i.e. nothing customized along the way) My OS hardware for bacula DIR+SD server... Centos 5.4 (fully patched) 8Gb RAM 2Gb Swap 1Tb EXT3 filesystem on external fiber RAID5 array (dedicated to database, incl. temp files) 2 dual-core [AMD Opteron(tm) Processor 2220] CPUs StorageTek SL500 Library with 2 LTO3 Drives -- Download Intel#174; Parallel Studio Eval Try the new software tools for yourself. Speed compiling, find bugs proactively, and fine-tune applications for parallel performance. See why Intel Parallel Studio got high marks during beta. http://p.sf.net/sfu/intel-sw-dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- The Palm PDK Hot Apps Program offers developers who use the Plug-In Development Kit to bring their C/C++ apps to Palm for a share of $1 Million in cash or HP Products. Visit us here for more details: http://p.sf.net/sfu/dev2dev-palm ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
[Bacula-users] 5.2.1 bat missing Browse Cataloged Files?
Hey all, Recently upgraded to 5.2.1, things mostly running well. I notice in BAT that the Browse Cataloged Files icon in the toolbar is greyed out and there is now a bRestore Page. Has the Browse Cataloged Files feature been deprecated (I use it a lot), and if so, why is the icon not entirely absent? If not, is there something that I could have done in my build to accidentally exclude that feature? thanks! Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu 215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- RSA(R) Conference 2012 Save $700 by Nov 18 Register now http://p.sf.net/sfu/rsa-sfdev2dev1 ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] 5.2.1 bat missing Browse Cataloged Files?
Thank you. I had not seen that. I'll explore the bRestore feature, the screen layout is a bit non-intuitive, but the functionality looks promising. Stephen On 11/3/11 7:23 PM, John Drescher wrote: On Thu, Nov 3, 2011 at 12:59 PM, Stephen Thompson step...@seismo.berkeley.edu wrote: Hey all, Recently upgraded to 5.2.1, things mostly running well. I notice in BAT that the Browse Cataloged Files icon in the toolbar is greyed out and there is now a bRestore Page. Has the Browse Cataloged Files feature been depreciated (I use it a lot), and if so, why is the icon not entirely absent. If not, is there something that I could have down in my build to accidentally excluded that feature? I found this in the release notes - The old bat version browser has been turned off since it does not work correctly and the brestore panel provides the same functionality http://voxel.dl.sourceforge.net/project/bacula/Win32_64/5.2.1/ReleaseNotes John -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- RSA(R) Conference 2012 Save $700 by Nov 18 Register now http://p.sf.net/sfu/rsa-sfdev2dev1 ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
[Bacula-users] possible 5.2.2 bug (incrementals being promoted to fulls)
FYI Not sure if anyone's seen or reported this, but I upgraded from 5.2.1 to 5.2.2 yesterday and during my backups last night, several jobs were promoted from Incremental to Full, even though their job configurations had not changed and they did have a valid Full backup from last week. I have never seen this happen before with bacula in general or my configuration in particular, so I thought it might be possible that a bug was introduced into 5.2.2. thanks, Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- All the data continuously generated in your IT infrastructure contains a definitive record of customers, application performance, security threats, fraudulent activity, and more. Splunk takes this data and makes sense of it. IT sense. And common sense. http://p.sf.net/sfu/splunk-novd2d ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] [Bacula-devel] possible 5.2.2 bug (incrementals being promoted to fulls)
I agree, it's unlikely a 'new' bug, but rather the restarting of my director during the upgrade that caused the problem to exhibit itself. Here is what happened in more detail. A week before the upgrade/director restart, the conf files for a significant number of jobs (~100) were changed and a reload of the director, and on the day they were changed we manually ran Fulls for each modified job, which were successfully completed as Fulls. Then on subsequent evenings (every day for a week) scheduled Incrementals ran successfully, and as Incrementals. When we upgrade to 5.2.2, we of course stopped our old director and started up the new one. That evening the 12 out of the ~100 jobs mentioned above had their scheduled Incrementals promoted to Fulls, and yes the message in the log says: No prior or suitable Full backup found in catalog. Doing FULL backup. However, this is not actually the case. There is a successful FULL a week old for each of the 12 jobs that were promoted, and the other jobs in the ~100 that were changed did not promote to a FULL. The dates on the conf files shows that they have not changed since the Full backups were made a week ago. Again, we've been using bacula for years now, have some degree of expertise with it, and we've never seen this before. Very strange... Stephen On 11/30/11 12:48 PM, Kern Sibbald wrote: Hello, Most likely you edited the .conf file and modified the FileSet. If that is the case, listing all the FileSets recorded in the database will show multiple copies of the FileSet record with different hashes. In most cases, other than changing the FileSet, Bacula clearly indicates why it is upgrading a level. In the case of a FileSet change, it prints a notice saying something like a valid Full could not be found. The probability that there is a new bug introduced between 5.2.1 and 5.2.2 is probably about 0.0001% since there were very few coding changes except for bug fixes. Regards, Kern On 11/30/2011 07:55 PM, Stephen Thompson wrote: FYI Not sure if anyone's seen or reported this, but I upgraded from 5.2.1 to 5.2.2 yesterday and during my backups last night, several jobs were promoted from Incremental to Full, even though their job configurations had not changed and they did have a valid Full backup from last week. I have never seen this happen before with bacula in general or my configuration in particular, so I thought it might be possible that a bug was introduced into 5.2.2. thanks, Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- All the data continuously generated in your IT infrastructure contains a definitive record of customers, application performance, security threats, fraudulent activity, and more. Splunk takes this data and makes sense of it. IT sense. And common sense. http://p.sf.net/sfu/splunk-novd2d ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
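One way to check Kern's theory is to look at the recorded FileSet hashes directly; any change to a FileSet's text (even whitespace) produces a new row with a new MD5, and the next run of a job using it is upgraded to Full. A sketch, using either bconsole or the catalog's FileSet table (column names from the standard schema):

*list filesets

SELECT FileSetId, FileSet, MD5, CreateTime
  FROM FileSet
 ORDER BY FileSet, CreateTime;

If the twelve promoted jobs show FileSet rows created around the time of the restart, that would point at a changed (or re-hashed) FileSet rather than a scheduler bug.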
[Bacula-users] Catalog backup while job running?
Hello, We were wondering if anyone using bacula had come up with a creative way to backup their Catalog. We understand the basic dilemna -- that one should not backup a database that is in use, because it's not a coherent view. Currently we've managed to keep our filesets and jobs small enough that we're able to run jobs (a few full jobs along with an incremental of all jobs) in under 24 hours and then (per the bacula manual) backup the Catalog afterwards when nothing is running. However, we're faced with some new jobs that may take 4 days to complete. We have multiple tape drives, so we can run a long-running job on one drive while we use another for nightly Incrementals without contention. But, we would also like to get nightly backup of the Catalog that reflects the changes introduced by the nightly Incrementals. So, my question is whether anyone had any ideas about the feasibility of getting a backup of the Catalog while a single long-running job is active? This could be in-band (database dump) or out-of-band (copy of database directory on filesystem or slave database server taken offline). We are using MySQL, but would not be opposed to switching to PostGRES if it buys us anything in this regard. What I wonder specifically (in creating my own solution) is: 1) If I backup the MySQL database directory, or sync to a slave server and create a dump from that, am I simply putting the active long-running job records at risk of being incoherent, or am I risking the integrity of the whole Catalog in doing so? 2) If I attempt a dump of the MySQL catalog and lock the tables while doing so, what will the results be to the active long-running job? Will it crap out or simply pause and wait for database access when it needs to read/write to the database? And if so, how long will it wait? thanks for reading, Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Try before you buy = See our experts in action! The most comprehensive online learning library for Microsoft developers is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3, Metro Style Apps, more. Free future releases when you subscribe now! http://p.sf.net/sfu/learndevnow-dev2 ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Catalog backup while job running?
On 02/06/2012 02:45 PM, Phil Stracchino wrote: On 02/06/2012 05:02 PM, Stephen Thompson wrote: So, my question is whether anyone had any ideas about the feasibility of getting a backup of the Catalog while a single long-running job is active? This could be in-band (database dump) or out-of-band (copy of database directory on filesystem or slave database server taken offline). We are using MySQL, but would not be opposed to switching to PostGRES if it buys us anything in this regard. What I wonder specifically (in creating my own solution) is: 1) If I backup the MySQL database directory, or sync to a slave server and create a dump from that, am I simply putting the active long-running job records at risk of being incoherent, or am I risking the integrity of the whole Catalog in doing so? 2) If I attempt a dump of the MySQL catalog and lock the tables while doing so, what will the results be to the active long-running job? Will it crap out or simply pause and wait for database access when it needs to read/write to the database? And if so, how long will it wait? Stephen, Three suggestions here. Route 1: Set up a replication slave and perform your backups from the slave. If the slave falls behind the master while you're dumping the DB, you don't really care all that much. It doesn't impact your production DB. This was one of my ideas to try, though I'm still wondering -- If my slave does fall behind my production while dumping DB, because a long-running job is active during the dump, will that dump of the DB simply be missing information about that running job, or will anything else in the Catalog be affected? Because ultimately, if I need to restore my Catalog from backup, I want to be able to search and restore from all completed jobs (the acceptable omission being the job running during the dump, because it wasn't complete at the time!) as well as continue to run future backup jobs as normal with that restored Catalog. Route 2: If you're not using InnoDB in MySQL, you should be by now. So look into the --skip-opt and --single-transaction options to mysqldump to dump all of the transactional tables consistently without locking them. Your grant tables will still need a read lock, but hey, you weren't planning on rewriting your grant tables every day, were you...? Thanks, I look into this. Without the locks, but dumping while a job is running, this still begs the question above -- Am I just putting the data associated with the running job (concurrent to the dump) at risk, or is there any risk that my Catalog will go screwy in a more broad fashion. For instance, a counter that's not incremented ...or some row that's written upon job completion that, since the dump was made before the 'long-running' job completed, causing more general mayhem than just missing records for the uncompleted job. I don't know the database layout, or logic of what's written to the database and when, to understand what risk I am at with Route 1 or Route 2. Route 3: Look into an alternate DB backup solution like mydumper or Percona XtraBackup. Route 4: Do you have the option of taking a snapshot of your MySQL datadir and backing up the snapshot? This can be viable if you have a small DB and fast copy-on-write snapshots. (It's the technique I'm using at the moment, though I'm considering a switch to mydumper.) Nope, not an option. thanks! 
Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Try before you buy = See our experts in action! The most comprehensive online learning library for Microsoft developers is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3, Metro Style Apps, more. Free future releases when you subscribe now! http://p.sf.net/sfu/learndevnow-dev2 ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
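For the record, a concrete form of Phil's Route 2 might look like this (a sketch; check the option list against the installed mysqldump, and note that --single-transaction only gives a consistent snapshot of InnoDB tables):

mysqldump --skip-opt --single-transaction \
          --quick --extended-insert --create-options \
          --add-drop-table --disable-keys --set-charset \
          bacula > /path/to/bacula-catalog.sql

--skip-opt turns off the whole --opt group (including --lock-tables), and the remaining flags add back the parts of --opt that are still wanted.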
[Bacula-users] new jobs have to wait for despooling to finish?
Hello, I was wondering if anyone could confirm what I've noticed on my own instance of bacula, which seems contrary to the Bacula manual. From DataSpooling section: If you are running multiple simultaneous jobs, Bacula will continue spooling other jobs while one is despooling to tape, provided there is sufficient spool file space. This seems to be true, only if the jobs in question were launched at the same time/concurrently. New jobs launched while a job is despooling, are launched into a running state, but they do not begin to spool until the existing job(s) finish despooling. This is very sad, because I just came into a windfall of spool space and I was hoping to run jobs back to back, such that while one set of jobs were despooling, I could have the next set spooling, and so on. I see an old bug 0001231 with a similar issue, which in the history it is pointed out that it may not be that new jobs can't spool while existing jobs despool, but that the new jobs cannot verify that they will have tape access, which is a step before spooling begins. I wonder if this is the state of affairs and if there are any plans to improve upon this inefficiency. thanks! Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
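For context, the spooling behaviour described above is driven by a handful of directives; a minimal sketch (directive names from the stock Director/SD configuration, values purely illustrative):

# bacula-dir.conf, Job or JobDefs resource
Spool Data = yes
Maximum Concurrent Jobs = 10

# bacula-sd.conf, Device resource
Spool Directory = /var/bacula/spool
Maximum Spool Size = 500G        # total spool space this device may use
Maximum Job Spool Size = 100G    # per-job cap so one job cannot take the whole spool

Even with ample spool space, the serialisation described above reportedly happens because a newly started job must reserve the drive before it begins spooling (as noted in bug 0001231).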
[Bacula-users] bacula table b2123??
Hello all, I notice that I have a table called b2123 that hasn't been written to in months (Nov 4 2011) and does not appear to be one of the tables created during install. Is this some kind of temp table that I can go ahead and drop? It looks like the File table only much much smaller. thanks! Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] bacula table b2123??
On 03/25/2012 11:36 AM, Stephen Thompson wrote: Hello all, I notice that I have a table called b2123 that hasn't been written to in months (Nov 4 2011) and does not appear to be one of the tables created during install. Is this some kind of temp table that I can go ahead and drop? It looks like the File table only much much smaller. thanks! Stephen To answer my own question, this appears to be a table left around by a bat that crashed. Solution, drop table b2123. thanks, Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
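If anyone else finds one of these leftovers, a quick way to confirm and remove it from the mysql client (after making sure no bat or director session is still attached to it):

SHOW TABLES LIKE 'b%';        -- orphaned batch tables show up as b<number>
SHOW CREATE TABLE b2123\G     -- optional: confirm it mirrors the File table layout
DROP TABLE b2123;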
Re: [Bacula-users] Catalog backup while job running?
On 02/06/2012 02:45 PM, Phil Stracchino wrote: On 02/06/2012 05:02 PM, Stephen Thompson wrote: So, my question is whether anyone had any ideas about the feasibility of getting a backup of the Catalog while a single long-running job is active? This could be in-band (database dump) or out-of-band (copy of database directory on filesystem or slave database server taken offline). We are using MySQL, but would not be opposed to switching to PostGRES if it buys us anything in this regard. What I wonder specifically (in creating my own solution) is: 1) If I backup the MySQL database directory, or sync to a slave server and create a dump from that, am I simply putting the active long-running job records at risk of being incoherent, or am I risking the integrity of the whole Catalog in doing so? 2) If I attempt a dump of the MySQL catalog and lock the tables while doing so, what will the results be to the active long-running job? Will it crap out or simply pause and wait for database access when it needs to read/write to the database? And if so, how long will it wait? Stephen, Three suggestions here. Route 1: Set up a replication slave and perform your backups from the slave. If the slave falls behind the master while you're dumping the DB, you don't really care all that much. It doesn't impact your production DB. Route 2: If you're not using InnoDB in MySQL, you should be by now. So look into the --skip-opt and --single-transaction options to mysqldump to dump all of the transactional tables consistently without locking them. Your grant tables will still need a read lock, but hey, you weren't planning on rewriting your grant tables every day, were you...? Well, we've made the leap from MyISAM to InnoDB, seems like we win on transactions, but lose on read speed. That aside, I'm seeing something unexpected. I am now able to successfully run jobs while I use mysqldump to dump the bacula Catalog, except at the very end of the dump there is some sort of contention. A few of my jobs (3-4 out of 150) that are attempting to despool attritbutes at the tail end of the dump yield this error: Fatal error: sql_create.c:860 Fill File table Query failed: INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name): ERR=Lock wait timeout exceeded; try restarting transaction I have successful jobs before and after this 'end of the dump' timeframe. It looks like I might be able to fix this by increasing my innodb_lock_wait_timeout, but I'd like to understand WHY I need to icnrease it. Anyone know what's happening at the end of a dump like this that would cause the above error? mysqldump -f --opt --skip-lock-tables --single-transaction bacula bacula.sql Is it the commit on this 'dump' transaction? thanks! Stephen Route 3: Look into an alternate DB backup solution like mydumper or Percona XtraBackup. Route 4: Do you have the option of taking a snapshot of your MySQL datadir and backing up the snapshot? This can be viable if you have a small DB and fast copy-on-write snapshots. (It's the technique I'm using at the moment, though I'm considering a switch to mydumper.) 
-- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- This SF email is sponsosred by: Try Windows Azure free for 90 days Click Here http://p.sf.net/sfu/sfd2d-msazure ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
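If raising the timeout turns out to be the pragmatic answer, the knob is innodb_lock_wait_timeout (default 50 seconds). A sketch; on older 5.0/5.1 servers this is read only at startup, so it goes in my.cnf and needs a mysqld restart:

SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';

# /etc/my.cnf
[mysqld]
innodb_lock_wait_timeout = 300    # seconds, illustrative value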
Re: [Bacula-users] Catalog backup while job running?
First off, thanks for the response Phil. On 04/02/2012 01:11 PM, Phil Stracchino wrote: On 04/02/2012 01:49 PM, Stephen Thompson wrote: Well, we've made the leap from MyISAM to InnoDB, seems like we win on transactions, but lose on read speed. If you're finding InnoDB slower than MyISAM on reads, your InnoDB buffer pool is probably too small. This is probably true, but I have limited system resources and my File table is almost 300Gb large. That aside, I'm seeing something unexpected. I am now able to successfully run jobs while I use mysqldump to dump the bacula Catalog, except at the very end of the dump there is some sort of contention. A few of my jobs (3-4 out of 150) that are attempting to despool attritbutes at the tail end of the dump yield this error: Fatal error: sql_create.c:860 Fill File table Query failed: INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name): ERR=Lock wait timeout exceeded; try restarting transaction I have successful jobs before and after this 'end of the dump' timeframe. It looks like I might be able to fix this by increasing my innodb_lock_wait_timeout, but I'd like to understand WHY I need to icnrease it. Anyone know what's happening at the end of a dump like this that would cause the above error? mysqldump -f --opt --skip-lock-tables --single-transaction bacula bacula.sql Is it the commit on this 'dump' transaction? --skip-lock-tables is referred to in the mysqldump documentation, but isn't actually a valid option. This is actually an increasingly horrible problem with mysqldump. It has been very poorly maintained, and has barely developed at all in ten or fifteen years. This has me confused. I have jobs that can run, and insert records into the File table, while I am dumping the Catalog. It's only at the tail-end that a few jobs get the error above. Wouldn't a locked File table cause all concurrent jobs to fail? Table locks are the default behavior of mysqldump, as part of the default --opt group. To override it, you actually have to use --skip-opt, than add back in the rest of the options from the --opt group that you actually wanted. There is *no way* to get mysqldump to Do The Right Thing for both transactional and non-transactional tables in the same run. it is simply not possible. My suggestion would be to look at mydumper instead. It has been written by a couple of former MySQL AB support engineers who started with a clean sheet of paper, and it is what mysqldump should have become ten years ago. It dumps tables in parallel, doesn't require exclusion of schemas that shouldn't be dumped because it knows they shouldn't be dumped, doesn't require long strings of arguments to tell it how to correctly handle transactional and non-transactional tables because it understands both and just Does The Right Thing on a table-by-table basis, can dump tables in parallel for better speed, can dump binlogs as well as tables, separates the data from the schemas... Give it a try. Thanks, I'll take a look at it. That said, I make my MySQL dump job a lower priority job and run it only after all other jobs have completed. This makes sure I get the most current possible data in my catalog dump. 
I just recently switched to a revised MySQL backup job that uses mydumper with the following simple shell script as a ClientRunBeforeJob on a separate host from the actual DB server. (Thus, if the backup client goes down, I still have the live DB, and if the DB server goes down, I still have the DB backups on disk.)

#!/bin/bash
RETAIN=5
USER=xx
PASS=xx
DUMPDIR=/dbdumps
HOST=babylon4
PORT=6446
TIMEOUT=300
FMT='%Y%m%d-%T'
DEST=${DUMPDIR}/${HOST}-$(date +${FMT})

for dir in $(ls -r ${DUMPDIR} | tail -n +${RETAIN})
do
    echo Deleting ${DUMPDIR}/${dir}
    rm -rf ${DUMPDIR}/${dir}
done

mydumper -Cce -h ${HOST} -p ${PORT} -u ${USER} --password=${PASS} -o ${DEST} -l ${TIMEOUT}

Then my Bacula fileset for the DB-backup job just backs up the entire /db-dumps directory. -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu 215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Better than sec? Nothing is better than sec when it comes to monitoring Big Data applications. Try Boundary one-second resolution app monitoring today. Free. http://p.sf.net/sfu/Boundary-dev2dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula
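For completeness, hooking a script like that into a catalog-backup job is just a ClientRunBeforeJob plus a FileSet pointing at the dump directory; a rough sketch with invented resource names (the usual Type/Level/Storage/Pool/Messages lines omitted):

# bacula-dir.conf
Job {
  Name = "CatalogDump"
  Client = db-backup-host-fd
  FileSet = "CatalogDumpFS"
  ClientRunBeforeJob = "/usr/local/sbin/mysql-dump.sh"
  Priority = 20          # higher number = lower priority, so it runs after the nightly jobs
}

FileSet {
  Name = "CatalogDumpFS"
  Include {
    Options { signature = MD5 }
    File = /dbdumps
  }
}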
Re: [Bacula-users] Catalog backup while job running?
On 4/3/12 3:28 AM, Martin Simmons wrote: On Mon, 02 Apr 2012 15:06:31 -0700, Stephen Thompson said: That aside, I'm seeing something unexpected. I am now able to successfully run jobs while I use mysqldump to dump the bacula Catalog, except at the very end of the dump there is some sort of contention. A few of my jobs (3-4 out of 150) that are attempting to despool attritbutes at the tail end of the dump yield this error: Fatal error: sql_create.c:860 Fill File table Query failed: INSERT INTO File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT batch.FileIndex, batch.JobId, Path.PathId, Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = Filename.Name): ERR=Lock wait timeout exceeded; try restarting transaction I have successful jobs before and after this 'end of the dump' timeframe. It looks like I might be able to fix this by increasing my innodb_lock_wait_timeout, but I'd like to understand WHY I need to icnrease it. Anyone know what's happening at the end of a dump like this that would cause the above error? mysqldump -f --opt --skip-lock-tables --single-transaction bacula bacula.sql Is it the commit on this 'dump' transaction? --skip-lock-tables is referred to in the mysqldump documentation, but isn't actually a valid option. This is actually an increasingly horrible problem with mysqldump. It has been very poorly maintained, and has barely developed at all in ten or fifteen years. This has me confused. I have jobs that can run, and insert records into the File table, while I am dumping the Catalog. It's only at the tail-end that a few jobs get the error above. Wouldn't a locked File table cause all concurrent jobs to fail? Are you sure that jobs are inserting records into the File table whilst they are running? With spooling, file records are not inserted until the end of the job. Likewise, in batch mode (as above), the File table is only updated once at the end. Yes, I have completed jobs before and after the problem jobs (which aren't always the same jobs, or happen at the same time, except that they seem to correlate with the end of the Catalog dump, which could also be the end of the File table dump, since it's 99% of the db). I can view the inserted records from jobs that complete while the Catalog dump is running. And I am spooling, so jobs are inserting all attrs at the end of the job. The jobs with the errors are clearly moving their records from the batch file to the File table at the conclusion of their run. I have never seen this before moving to InnoDB, but of course, I moved to InnoDB to be able to run my Catalog dump concurrently with jobs (knowing I won't capture the records from the running jobs). So at this point, I'm not sure if I'm getting the error because of something happening at the end of the dump, or if it's merely a 'collision' of jobs all wanting to insert batch records at the same time. I know that the Innodb engine has a lock wait timeout default of 50s, but I'm not sure who this was handled with MyISAM where I never saw this problem (but again, also, never ran my jobs concurrently with dump). Stephen __Martin -- Better than sec? Nothing is better than sec when it comes to monitoring Big Data applications. Try Boundary one-second resolution app monitoring today. Free. 
http://p.sf.net/sfu/Boundary-dev2dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Better than sec? Nothing is better than sec when it comes to monitoring Big Data applications. Try Boundary one-second resolution app monitoring today. Free. http://p.sf.net/sfu/Boundary-dev2dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Catalog backup while job running?
On 4/2/12 3:33 PM, Phil Stracchino wrote: On 04/02/2012 06:06 PM, Stephen Thompson wrote: First off, thanks for the response Phil. On 04/02/2012 01:11 PM, Phil Stracchino wrote: On 04/02/2012 01:49 PM, Stephen Thompson wrote: Well, we've made the leap from MyISAM to InnoDB, seems like we win on transactions, but lose on read speed. If you're finding InnoDB slower than MyISAM on reads, your InnoDB buffer pool is probably too small. This is probably true, but I have limited system resources and my File table is almost 300Gb large. Ah, well, sometimes there's only so much you can allocate. --skip-lock-tables is referred to in the mysqldump documentation, but isn't actually a valid option. This is actually an increasingly horrible problem with mysqldump. It has been very poorly maintained, and has barely developed at all in ten or fifteen years. This has me confused. I have jobs that can run, and insert records into the File table, while I am dumping the Catalog. It's only at the tail-end that a few jobs get the error above. Wouldn't a locked File table cause all concurrent jobs to fail? Hmm. I stand corrected. I've never seen it listed as an option in the man page, despite there being one reference to it, but I see that mysqldump --help does explain it even though the man page doesn't. In that case, the only thing I can think of is that you have multiple jobs trying to insert attributes at the same time and the last ones in line are timing out. (Locking the table for batch attribute insertion actually isn't necessary; MySQL can be configured to interleave auto_increment inserts. However, that's the way Bacula does it.) Don't know that I have any helpful suggestions there, then... sorry. Thanks again for the response, just bouncing this issue off someone is of help. You idea about the jobs simply running into contention for locks sounds reasonable, though I never saw this happening with MyISAM (in the 3+ years we've run bacula, and I see it the 2nd night into running InnoDB). If so, I wouldn't mind estimating the maximum time my jobs might have to wait for a lock, based on their size and concurrency, but I really hate just tweaking settings in the DB without knowing why I'm doing so, you know. I'd like to get to the bottom of what's causing the timeout. thanks, Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Better than sec? Nothing is better than sec when it comes to monitoring Big Data applications. Try Boundary one-second resolution app monitoring today. Free. http://p.sf.net/sfu/Boundary-dev2dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Catalog backup while job running?
On 04/03/2012 08:43 AM, Phil Stracchino wrote: Stephen, by the way, if you're not already aware of it: You probably want to set innodb_flush_log_at_trx_commit = 0. The default value of this setting is 1, which causes the log buffer to be written out to the log file and the logfile flushed to disk at every transaction commit. (Which obviously has a performance impact.) With a setting of 0, nothing is done at transaction commit, but the log buffer is written to the log file and the log file flushed to disk once per second. There is a potential with this setting that up to the last full second of transactions can be lost in the event of a mysqld crash, but ... if mysqld crashes in the middle of Bacula inserting attributes, that job is blown *anyway*, so there's really no loss. This is an interesting suggestion. I wonder if it's possible, since I'm running the dump as a single transaction, that my database is becoming unavailable during this flush, such that the 50-second timeout for the locks the jobs are requesting is exceeded. I would expect writes to a database to require more flushing than reads (i.e. a dump), but I wonder if this could explain the jobs failing at the tail end of the dump. I also suggest innodb_autoinc_lock_mode = 2, which allows InnoDB to interleave auto_increment inserts. This may possibly help with your locking problem. Keep in mind though that if you use this setting and you have replication running, your binlog_format must be set to MIXED or ROW. -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu 215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Better than sec? Nothing is better than sec when it comes to monitoring Big Data applications. Try Boundary one-second resolution app monitoring today. Free. http://p.sf.net/sfu/Boundary-dev2dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
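Collected as a my.cnf fragment, Phil's two suggestions would look roughly like this (a sketch; innodb_autoinc_lock_mode is read only at server startup, so a restart is required):

[mysqld]
innodb_flush_log_at_trx_commit = 0      # flush/sync the InnoDB log once per second instead of per commit
innodb_autoinc_lock_mode       = 2      # interleaved auto_increment inserts, no table-level AUTO-INC lock
binlog_format                  = MIXED  # needed (or ROW) with lock mode 2 if replication/binlogs are in use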
Re: [Bacula-users] Catalog backup while job running?
On 04/02/2012 03:33 PM, Phil Stracchino wrote: On 04/02/2012 06:06 PM, Stephen Thompson wrote: First off, thanks for the response Phil. On 04/02/2012 01:11 PM, Phil Stracchino wrote: On 04/02/2012 01:49 PM, Stephen Thompson wrote: Well, we've made the leap from MyISAM to InnoDB, seems like we win on transactions, but lose on read speed. If you're finding InnoDB slower than MyISAM on reads, your InnoDB buffer pool is probably too small. This is probably true, but I have limited system resources and my File table is almost 300Gb large. Ah, well, sometimes there's only so much you can allocate. --skip-lock-tables is referred to in the mysqldump documentation, but isn't actually a valid option. This is actually an increasingly horrible problem with mysqldump. It has been very poorly maintained, and has barely developed at all in ten or fifteen years. This has me confused. I have jobs that can run, and insert records into the File table, while I am dumping the Catalog. It's only at the tail-end that a few jobs get the error above. Wouldn't a locked File table cause all concurrent jobs to fail? Hmm. I stand corrected. I've never seen it listed as an option in the man page, despite there being one reference to it, but I see that mysqldump --help does explain it even though the man page doesn't. In that case, the only thing I can think of is that you have multiple jobs trying to insert attributes at the same time and the last ones in line are timing out. This appears to be the root cause. After running a few more nights, the coincidence with the Catalog dump was not maintained. It happens for a few jobs each night, at different times, different jobs, and sometimes when no Catalog dump is occurring. I think it's simply that a bunch of batch inserts wind up running at the same time and the last in line run out of time. Rather than setting my timeout arbitrarily large (10 minutes did not solve the problem), I am curious about what you say below. (Locking the table for batch attribute insertion actually isn't necessary; MySQL can be configured to interleave auto_increment inserts. However, that's the way Bacula does it.) Are you saying that if I turn on auto_increment inserts in MySQL, it won't matter whether or not bacula is asking for locks during batch inserts? Or does bacula also need to be configured (patched) not to use locks during batch inserts? And lastly, why does the bacula documentation claim that locks are 'essential' for batch inserts and you claim they are not? I'm surprised more folks running mysql InnoDB and bacula aren't having this problem since I stumbled upon it so easily. :) Perhaps the trend is MySQL MyISAM -- Postgres. Don't know that I have any helpful suggestions there, then... sorry. thanks! Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Better than sec? Nothing is better than sec when it comes to monitoring Big Data applications. Try Boundary one-second resolution app monitoring today. Free. http://p.sf.net/sfu/Boundary-dev2dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
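On the timeout question, a minimal sketch of how to inspect and adjust the setting most likely involved (assuming the 50-second figure mentioned earlier is InnoDB's default innodb_lock_wait_timeout; on older servers the variable is not dynamic and has to go in my.cnf, and raising it is of course a workaround rather than a fix for the contention itself):

    # Current lock wait timeout (InnoDB's default is 50 seconds):
    mysql -u root -p -e "SHOW VARIABLES LIKE 'innodb_lock_wait_timeout';"
    # Raise it for new sessions, where the variable is dynamic:
    mysql -u root -p -e "SET GLOBAL innodb_lock_wait_timeout = 600;"
    # See which transactions are currently waiting on which locks:
    mysql -u root -p -e "SHOW ENGINE INNODB STATUS\G"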
Re: [Bacula-users] Bacula MySQL Catalog binlog restore
On 04/05/2012 02:27 PM, Joe Nyland wrote: Hi, I've been using Bacula for a while now and I have a backup procedure in place for my MySQL databases, where I perform a full (dump) backup nightly, then incremental (bin log) backups every hour through the day to capture changes. I basically have a script which I have written which is run as a 'RunBeforeJob' from backup and runs either a mysqldump if the backup level is full, or flushes the bin logs if the level is incremental. I'm in the process of performing some test restores from these backups, as I would like to know the procedure is working correctly. I have no issue restoring the files from Bacula, however I'm having some issues restoring my catalog MySQL database from the binary logs created by MySQL. Specifically, I am getting messages like: ERROR 1146 (42S02) at line 105: Table 'bacula.batch' doesn't exist when I try to replay my log files against the database after it's been restored from the dump file. As far as I know the batch table is a temporary table created when inserting file attributes into the catalog during/after a backup job. I would have hoped, however, the creation of this table would have been included either in my database dump or earlier in my bin log. I believe this may be related to another thread on the list at the moment titled Catalog backup while job running? as this is, in effect, what I am doing - a full database dump whilst other jobs are running, but my reason for creating a new thread is that I am not getting any errors in my backup jobs, as the OP of the other thread is - I'm simply having issues rebuilding my database after restoring the said full dump. I would like to know if anyone is currently backing up their catalog database in such a way, and if so how they are overcoming this issue when restoring. My reason for backing up my catalog using binary logging is so that I can perform a point-in-time recovery of the catalog, should I lose it. I am not running a catalog backup in that way, but have thought about it. You're correct that the batch tables are temporary tables created so that jobs can do batch inserts of the file attributes. I did run into a similar problem to yours when I had a MySQL slave server out of sync with the master. The slave (much like your restore) was reading through binlogs to catch up and ran into a line that referred to a batch table, which didn't exist. In my case, it didn't exist because the slave never saw an earlier line that created the temporary batch table. I would imagine something similar is going on with your restore, where you are not actually applying all the changes since the Full dump (or did not capture all the changes since the Full dump), because somewhere you should have a line in your binlogs that creates the batch table before other lines refer to and try to use it. Also, keep in mind that these temporary batch tables are owned by threads, so if you start looking through your binlogs, you'll see many references to bacula.batch, but they are not all referring to the same table. Each thread is able to have its own bacula.batch table. Stephen Any input anyone can offer would be greatly appreciated. Thanks, Joe
Re: [Bacula-users] Bacula MySQL Catalog binlog restore
On 04/05/2012 03:19 PM, Joe Nyland wrote: On 5 Apr 2012, at 22:37, Stephen Thompson wrote: On 04/05/2012 02:27 PM, Joe Nyland wrote: Hi, I've been using Bacula for a while now and I have a backup procedure in place for my MySQL databases, where I perform a full (dump) backup nightly, then incremental (bin log) backups every hour through the day to capture changes. I basically have a script which I have written which is run as a 'RunBeforeJob' from backup and runs either a mysqldump if the backup level is full, or flushes the bin logs if the level is incremental. I'm in the process of performing some test restores from these backups, as I would like to know the procedure is working correctly. I have no issue restoring the files from Bacula, however I'm having some issues restoring my catalog MySQL database from the binary logs created by MySQL. Specifically, I am getting messages like: ERROR 1146 (42S02) at line 105: Table 'bacula.batch' doesn't exist when I try to replay my log files against the database after it's been restore from the dump file. As far as I know the batch table is a temporary table created when inserting file attributes into the catalog during/after a backup job. I would have hoped, however, the creation of this table would have been included in either my database/earlier in my bin log. I believe this may be related to another thread on the list at the moment titled Catalog backup while job running? as this is, in effect what I am doing - a full database dump whilst other jobs are running, but my reason for creating a new thread is that I am not getting any errors in my backup jobs, as the OP of the other thread is - I'm simply having issues rebuilding my database after restoring the said full dump. I would like to know if anyone is currently backing up their catalog database in such a way, and if so how they are overcoming this issue when restoring. My reason for backing up my catalog using binary logging is so that I can perform a point-in-time recovery of the catalog, should I loose it. I am not running a catalog backup in that way, but have thought about it. You're correct that the batch tables are temporary tables created so that jobs can do batch inserts of the file attributes. I did run into a similar problem to yours when I had a MySQL slave server out of sync with the master. The slave (much like your restore) was reading through binlogs to catch up and ran into a line that referred to a batch table, which didn't exist. In my case, it didn't exist because the slave never saw an earlier line that created the temporary batch table. I would imagine something similar is going on with your restore, where you are not actually applying all the changes since the Full dump (or did not capture all the changes since the Full dump), because somewhere you should have a line in your binlogs that create the batch table before other lines refer to and try to use it. Also, keep in mind that theses temporary batch tables are owned by threads, so if you start looking through your binlogs, you'll see many references to bacula.batch, but they are not all referring to the same table. Each thread is able to have it's own bacula.batch table. Stephen Any input anyone can offer would be greatly appreciated. Thanks, Joe -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 Hi Stephen, Thank you very much for your reply. 
I agree that it seems the creation of the batch table is not being captured, for some reason. As I think it may be useful, here's the line taken from my MySQL 'RunBeforeJob' script when the full backup is taken: mysqldump --all-databases --single-transaction --delete-master-logs --flush-logs --master-data --opt -u ${DBUSER} -p${DBPASS} > ${DST}/${HOST}_${DATE}_${TIME}.sql.dmp Can you spot anything there which could cause the creation of this/these temporary tables to not be included in the bin log? I've spent a while getting this list of options right and I'm not 100% sure I've got the correct combination, but it's possible I've missed something here. Sorry, I don't think I can be much help here. I'm wrangling with mysqldump myself at the moment since I moved from MyISAM tables to InnoDB and the documentation is very poor. Are you using InnoDB... If not, I'm not sure why --single-transaction is there, and if so, I wonder if it shouldn't come after --opt. The order of the options matters, and since --opt is the default, having it at the end of your line is only resetting anything you change earlier in the line back to the --opt defaults. Stephen Thanks, Joe
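To make the ordering point concrete, here is a minimal sketch of the same dump with --opt dropped (it is the default anyway) and the InnoDB-related options placed where they will not be overridden. It only illustrates the argument-order issue discussed in this thread and is not a tested recommendation; the shell variables are the poster's own, the output redirect is assumed, --skip-lock-tables is the option debated earlier in the "Catalog backup while job running?" thread, and --delete-master-logs is left out here since it is questioned later in this thread:

    mysqldump --single-transaction --skip-lock-tables \
        --flush-logs --master-data=1 --all-databases \
        -u ${DBUSER} -p${DBPASS} > ${DST}/${HOST}_${DATE}_${TIME}.sql.dmp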
Re: [Bacula-users] Bacula MySQL Catalog binlog restore
On 04/10/2012 07:51 AM, Joe Nyland wrote: -Original message- From: Joe Nylandj...@joenyland.co.uk Sent: Fri 06-04-2012 22:15 Subject: Re: [Bacula-users] Bacula MySQL Catalog binlog restore To: Bacula Usersbacula-users@lists.sourceforge.net; On 6 Apr 2012, at 00:08, Phil Stracchino wrote: On 04/05/2012 06:46 PM, Stephen Thompson wrote: On 04/05/2012 03:19 PM, Joe Nyland wrote: As I think it may be useful, here's the line taken from my MySQL 'RunBeforeJob' script when the full backup is taken: mysqldump --all-databases --single-transaction --delete-master-logs --flush-logs --master-data --opt -u ${DBUSER} -p${DBPASS} > ${DST}/${HOST}_${DATE}_${TIME}.sql.dmp Can you spot anything there which could cause the creation of this/these temporary tables to not be included in the bin log? I've spent a while getting this list of options right and I'm not 100% sure I've got the correct combination, but it's possible I've missed something here. Sorry, I don't think I can be much help here. I'm wrangling with mysqldump myself at the moment since I moved from MyISAM tables to InnoDB and the documentation is very poor. Are you using InnoDB... If not, I'm not sure why --single-transaction is there, and if so, I wonder if it shouldn't come after --opt. The order of the options matters, and since --opt is the default, having it at the end of your line is only resetting anything you change earlier in the line back to the --opt defaults. Since --opt is the default, there's no reason to ever explicitly specify it at all in the first place. And as we just discussed the other day, --single-transaction is ineffective without either --skip-lock-tables, or --skip-opt and adding back in the stuff from --opt that you want. -- Phil Stracchino, CDK#2 DoD#299792458 ICBM: 43.5607, -71.355 ala...@caerllewys.net ala...@metrocast.net p...@co.ordinate.org Renaissance Man, Unix ronin, Perl hacker, SQL wrangler, Free Stater It's not the years, it's the mileage. Thank you all for your input. Following your advice, I've now changed my mysqldump line in my script to: mysqldump --all-databases -u ${DBUSER} -p${DBPASS} --flush-logs --master-data=1 --delete-master-logs --opt > ${DST}/${HOST}_${DATE}_${TIME}.sql.dmp Re-reading the mysqldump reference manual (yet again!) I'm starting to wonder whether the '--delete-master-logs' option is causing some important transactions to be lost from the binary logs, which is the reason why the temporary table creation statements mentioned above are missing from my log file. My theory is that during the dump of the database, the temporary tables are created, then the dump finishes and deletes the binary logs, therefore removing any log of the temporary tables being created in the first place. Does that sound feasible? Thanks, Joe I'm a bit ashamed to admit I'm still battling this! I've removed '--delete-master-logs' from my mysqldump line, but it hasn't helped. For some reason, it seems as if the dump does not contain any mention of the temporary tables being created, neither do the binary logs, however there are statements which refer to bacula.batch, as if it should be there. Could it be that these statements refer to a bacula.batch table which was created by another thread prior to the mysql dump being created? ...and that's why the CREATE TEMPORARY TABLE bacula.batch statement is not in the binary logs after the full backup. Surely, if this were the case, the bacula.batch table would be included in the dump, would it not?
My fear is that because I am restoring binary logs, the binary log restores will be running under their own threads (after the main dump file had been restored) and thus will be unable to access temporary tables created by any other previous threads - making what I am trying to achieve impossible. I know this is becoming a little OT as it's largely to do with mysqldump and binary logging, but I hope someone can help. Any ideas how to overcome this? I wonder if you're running the backup while other jobs are running? If nothing else is running, then the dump shouldn't miss any of the temp tables, because there will be none during the dump. If you run it concurrently, consider this: Rather than blasting away your binlogs, keep them around for longer than the interval between your backups (i.e. keep them for at least 2 days if you dump every day). Then backup ALL binlogs when you do the incremental. Then if you need to restore, you should be able to intentionally go back farther in time in the binlogs, before the dump, and start syncing from there WITH errors temporarily disabled (or at least duplicate entry errors). This might/should let the import skip over stuff that the dump has already restored, but catch the stuff that it missed, like temp tables. Problem is you're likely to not know WHEN to start in the logs, though you
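A minimal sketch of the replay approach described above (the dump and binlog file names and the --start-datetime value are placeholders; mysql -f, which carries on past errors such as duplicate-entry failures, stands in for the "errors temporarily disabled" idea; try it on a scratch server before trusting it):

    # 1. Restore the most recent full dump:
    mysql -u root -p < host_full_dump.sql.dmp
    # 2. Replay binlogs from a point safely BEFORE that dump started, letting
    #    mysql -f skip statements the dump already contains:
    mysqlbinlog --start-datetime="2012-04-06 02:00:00" \
        mysql-bin.000123 mysql-bin.000124 | mysql -u root -p -f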
[Bacula-users] Fatal error: askdir.c:339 NULL Volume name. This shouldn't happen!!!
hello, Anyone run into this error before? We hadn't until we upgraded our bacula server from Centos 5.8 to Redhat 6.2, after which we of course had to recompile bacula. However, we used the same source, version, and options, the exception being that we added readline for improved bconsole functionality. Now every couple of days we have jobs error out like this: 21-May 20:04 SD JobId 236699: Fatal error: askdir.c:339 NULL Volume name. This shouldn't happen!!! 21-May 22:02 DIR JobId 236711: Fatal error: Catalog error updating Media record. sql_update.c:411 Update failed: affected_rows=0 for UPDATE Media SET VolJobs=0,VolFiles=0,VolBlocks=0,VolBytes=0,VolMounts=0,VolErrors=0,VolWrites=0,MaxVolBytes=0,VolStatus='',Slot=0,InChanger=0,VolReadTime=0,VolWriteTime=0,VolParts=0,LabelType=0,StorageId=0,PoolId=0,VolRetention=0,VolUseDuration=0,MaxVolJobs=0,MaxVolFiles=0,Enabled=0,LocationId=0,ScratchPoolId=0,RecyclePoolId=0,RecycleCount=0,Recycle=0,ActionOnPurge=0 WHERE VolumeName='' 23-May 22:02 SD JobId 237069: Fatal error: askdir.c:339 NULL Volume name. This shouldn't happen!!! There is nothing new or strange about our volumes, nothing in DB with null values. My only idea, which is sheer speculation is that we have in the past had some strange behaviours around tape contention, like a set of jobs running with a storage daemon with two drives, and a job being assigned to one drive will want the tape that is in use by another job on the other drive. That happened pretty rarely, though I was wondering if this might perhaps be the new outcome of that contention. Again, sheer speculation as I have nothing but the errors above to go on at the moment. thanks! Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
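For what it's worth, the "nothing in DB with null values" check mentioned above can be expressed as a query along these lines (a sketch against the stock Bacula Media table, assuming the catalog database is named bacula):

    mysql -u bacula -p bacula -e \
      "SELECT MediaId, VolumeName, VolStatus, PoolId
         FROM Media
        WHERE VolumeName IS NULL OR VolumeName = '';"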
Re: [Bacula-users] Config of bacula-dir.conf for two drive with one autochanger
Storage {
  Name = titi1
  Address = kraken              # N.B. Use a fully qualified name here
  SDPort = 9103
  Password = ***                # password for Storage daemon

  *Device = Drive-1 *           # must be same as Device in Storage daemon -- ???
  Media Type = LTO-5            # must be same as MediaType in Storage daemon
  Autochanger = yes             # enable for Autochanger device
}

Storage {
  Name = titi2
  Address = kraken              # N.B. Use a fully qualified name here
  SDPort = 9103
  Password = ***                # password for Storage daemon
  *Device = Drive-2 *           # must be same as Device in Storage daemon -- ???
  Media Type = LTO-5            # must be same as MediaType in Storage daemon
  Autochanger = yes             # enable for Autochanger device
}
[...]
-- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Fatal error: askdir.c:339 NULL Volume name. This shouldn't happen!!!
This (fingers crossed) may have been fixed with 5.2.9 which we upgraded to last week. It hasn't quite been long enough for me to be convinced the problem won't return, but I'm hopeful. Stephen On 5/24/12 7:08 AM, Stephen Thompson wrote: hello, Anyone run into this error before? We hadn't until we upgraded our bacula server from Centos 5.8 to Redhat 6.2, after which we of course had to recompile bacula. However, we used the same source, version, and options, the exception being that we added readline for improved bconsole functionality. Now every couple of days we have jobs error out like this: 21-May 20:04 SD JobId 236699: Fatal error: askdir.c:339 NULL Volume name. This shouldn't happen!!! 21-May 22:02 DIR JobId 236711: Fatal error: Catalog error updating Media record. sql_update.c:411 Update failed: affected_rows=0 for UPDATE Media SET VolJobs=0,VolFiles=0,VolBlocks=0,VolBytes=0,VolMounts=0,VolErrors=0,VolWrites=0,MaxVolBytes=0,VolStatus='',Slot=0,InChanger=0,VolReadTime=0,VolWriteTime=0,VolParts=0,LabelType=0,StorageId=0,PoolId=0,VolRetention=0,VolUseDuration=0,MaxVolJobs=0,MaxVolFiles=0,Enabled=0,LocationId=0,ScratchPoolId=0,RecyclePoolId=0,RecycleCount=0,Recycle=0,ActionOnPurge=0 WHERE VolumeName='' 23-May 22:02 SD JobId 237069: Fatal error: askdir.c:339 NULL Volume name. This shouldn't happen!!! There is nothing new or strange about our volumes, nothing in DB with null values. My only idea, which is sheer speculation is that we have in the past had some strange behaviours around tape contention, like a set of jobs running with a storage daemon with two drives, and a job being assigned to one drive will want the tape that is in use by another job on the other drive. That happened pretty rarely, though I was wondering if this might perhaps be the new outcome of that contention. Again, sheer speculation as I have nothing but the errors above to go on at the moment. thanks! Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Fatal error: askdir.c:339 NULL Volume name. This shouldn't happen!!!
Well, since we upgraded to 5.2.9 we have not seen the problem. Also when running 5.2.6 we were seeing it 2-3 times a week, during which we run hundreds of incrementals and several fulls per day. The error happened both with fulls and incrementals (which we have in two different LTO3 libraries). There was nothing amiss with our catalog or volumes, or at least nothing obvious. The error occurred when attempting to use different volumes (mostly previously used ones, including recycled), but those same volume were successful for other jobs that attempted to use them. Lastly, it wasn't reproducible, like I said it happened 2-3 time out of several hundred jobs, but it was happening over the course of a month or two while we ran 5.2.6 on RedHat 6.2. Here was our config for 5.2.6 PATH=/usr/lib64/qt4/bin:$PATH BHOME=/home/bacula EMAIL=bac...@seismo.berkeley.edu env CFLAGS='-g -O2' \ ./configure \ --prefix=$BHOME \ --sbindir=$BHOME/bin \ --sysconfdir=$BHOME/conf \ --with-working-dir=$BHOME/work \ --with-bsrdir=$BHOME/log \ --with-logdir=$BHOME/log \ --with-pid-dir=/var/run \ --with-subsys-dir=/var/run \ --with-dump-email=$EMAIL \ --with-job-email=$EMAIL \ --with-mysql \ --with-dir-user=bacula \ --with-dir-group=bacula \ --with-sd-user=bacula \ --with-sd-group=bacula \ --with-openssl \ --with-tcp-wrappers \ --enable-smartalloc \ --with-readline=/usr/include/readline \ --disable-conio \ --enable-bat \ | tee configure.out On 6/20/12 7:23 AM, Igor Blazevic wrote: On 18.06.2012 16:26, Stephen Thompson wrote: hello, Hello:) Anyone run into this error before? We hadn't until we upgraded our bacula server from Centos 5.8 to Redhat 6.2, after which we of course had to recompile bacula. However, we used the same source, version, and options, the exception being that we added readline for improved bconsole functionality. Can you post your config options, please? I've compiled versions 5.0.3 and 5.2.6 on RHEL 6.2 with following options: CFLAGS=-g -Wall ./configure \ --sysconfdir=/etc/bacula \ --with-dir-user=bacula \ --with-dir-group=bacula \ --with-sd-user=bacula \ --with-sd-group=bacula \ --with-fd-user=root \ --with-fd-group=root \ --with-dir-password=somepasswd \ --with-fd-password=somepasswd \ --with-sd-password=somepasswd \ --with-mon-dir-password=somepasswd \ --with-mon-fd-password=somepasswd \ --with-mon-sd-password=somepasswd \ --with-working-dir=/var/lib/bacula \ --with-scriptdir=/etc/bacula/scripts \ --with-smtp-host=localhost \ --with-subsys-dir=/var/lib/bacula/lock/subsys \ --with-pid-dir=/var/lib/bacula/run \ --enable-largefile \ --disable-tray-monitor \ --enable-build-dird \ --enable-build-stored \ --with-openssl \ --with-tcp-wrappers \ --with-python \ --enable-smartalloc \ --with-x \ --enable-bat \ --disable-libtool \ --with-postgresql \ --with-readline=/usr/include/readline \ --disable-conio and can atest that everything works just fine although I only used NEW volumes with it. Maybe there is something amiss with your catalog or volume media? -- Igor Blažević -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. 
Re: [Bacula-users] Fatal error: askdir.c:339 NULL Volume name. This shouldn't happen!!!
Update. We have seen the problem 2-3 times this past month running 5.2.9 on Redhat 6.2, much less frequent than before but still there. Stephen On 6/20/12 7:40 AM, Stephen Thompson wrote: Well, since we upgraded to 5.2.9 we have not seen the problem. Also when running 5.2.6 we were seeing it 2-3 times a week, during which we run hundreds of incrementals and several fulls per day. The error happened both with fulls and incrementals (which we have in two different LTO3 libraries). There was nothing amiss with our catalog or volumes, or at least nothing obvious. The error occurred when attempting to use different volumes (mostly previously used ones, including recycled), but those same volume were successful for other jobs that attempted to use them. Lastly, it wasn't reproducible, like I said it happened 2-3 time out of several hundred jobs, but it was happening over the course of a month or two while we ran 5.2.6 on RedHat 6.2. Here was our config for 5.2.6 PATH=/usr/lib64/qt4/bin:$PATH BHOME=/home/bacula EMAIL=bac...@seismo.berkeley.edu env CFLAGS='-g -O2' \ ./configure \ --prefix=$BHOME \ --sbindir=$BHOME/bin \ --sysconfdir=$BHOME/conf \ --with-working-dir=$BHOME/work \ --with-bsrdir=$BHOME/log \ --with-logdir=$BHOME/log \ --with-pid-dir=/var/run \ --with-subsys-dir=/var/run \ --with-dump-email=$EMAIL \ --with-job-email=$EMAIL \ --with-mysql \ --with-dir-user=bacula \ --with-dir-group=bacula \ --with-sd-user=bacula \ --with-sd-group=bacula \ --with-openssl \ --with-tcp-wrappers \ --enable-smartalloc \ --with-readline=/usr/include/readline \ --disable-conio \ --enable-bat \ | tee configure.out On 6/20/12 7:23 AM, Igor Blazevic wrote: On 18.06.2012 16:26, Stephen Thompson wrote: hello, Hello:) Anyone run into this error before? We hadn't until we upgraded our bacula server from Centos 5.8 to Redhat 6.2, after which we of course had to recompile bacula. However, we used the same source, version, and options, the exception being that we added readline for improved bconsole functionality. Can you post your config options, please? I've compiled versions 5.0.3 and 5.2.6 on RHEL 6.2 with following options: CFLAGS=-g -Wall ./configure \ --sysconfdir=/etc/bacula \ --with-dir-user=bacula \ --with-dir-group=bacula \ --with-sd-user=bacula \ --with-sd-group=bacula \ --with-fd-user=root \ --with-fd-group=root \ --with-dir-password=somepasswd \ --with-fd-password=somepasswd \ --with-sd-password=somepasswd \ --with-mon-dir-password=somepasswd \ --with-mon-fd-password=somepasswd \ --with-mon-sd-password=somepasswd \ --with-working-dir=/var/lib/bacula \ --with-scriptdir=/etc/bacula/scripts \ --with-smtp-host=localhost \ --with-subsys-dir=/var/lib/bacula/lock/subsys \ --with-pid-dir=/var/lib/bacula/run \ --enable-largefile \ --disable-tray-monitor \ --enable-build-dird \ --enable-build-stored \ --with-openssl \ --with-tcp-wrappers \ --with-python \ --enable-smartalloc \ --with-x \ --enable-bat \ --disable-libtool \ --with-postgresql \ --with-readline=/usr/include/readline \ --disable-conio and can atest that everything works just fine although I only used NEW volumes with it. Maybe there is something amiss with your catalog or volume media? 
[Bacula-users] bacula jobs use volumes from the wrong pool - bug?
: 3302 Autochanger loaded? drive 0, result: nothing loaded. 02-Jul 21:16 SD JobId 243957: 3304 Issuing autochanger load slot 7, drive 0 command. 02-Jul 21:16 SD JobId 243957: 3305 Autochanger load slot 7, drive 0, status is OK. 02-Jul 21:16 SD JobId 243957: Recycled volume FB0162 on device SL500-Drive-0 (/dev/SL500-Drive-0), all previous data lost. 02-Jul 21:16 SD JobId 243957: New volume FB0162 mounted on device SL500-Drive-0 (/dev/SL500-Drive-0) at 02-Jul-2012 21:16. 03-Jul 00:36 SD JobId 243957: End of Volume FB0162 at 295:4268 on device SL500-Drive-0 (/dev/SL500-Drive-0). Write of 262144 bytes got -1. 03-Jul 00:37 SD JobId 243957: Re-read of last block succeeded. 03-Jul 00:37 SD JobId 243957: End of medium on Volume FB0162 Bytes=591,088,321,536 Blocks=2,254,823 at 03-Jul-2012 00:37. 03-Jul 00:37 SD JobId 243957: 3307 Issuing autochanger unload slot 7, drive 0 command. 03-Jul 00:38 DIR JobId 243957: Recycled volume FB0164 03-Jul 00:38 SD JobId 243957: 3301 Issuing autochanger loaded? drive 0 command. 03-Jul 00:38 SD JobId 243957: 3302 Autochanger loaded? drive 0, result: nothing loaded. 03-Jul 00:38 SD JobId 243957: 3301 Issuing autochanger loaded? drive 0 command. 03-Jul 00:38 SD JobId 243957: 3302 Autochanger loaded? drive 0, result: nothing loaded. 03-Jul 00:38 SD JobId 243957: 3304 Issuing autochanger load slot 9, drive 0 command. 03-Jul 00:39 SD JobId 243957: 3305 Autochanger load slot 9, drive 0, status is OK. 03-Jul 00:39 SD JobId 243957: Recycled volume FB0164 on device SL500-Drive-0 (/dev/SL500-Drive-0), all previous data lost. 03-Jul 00:39 SD JobId 243957: New volume FB0164 mounted on device SL500-Drive-0 (/dev/SL500-Drive-0) at 03-Jul-2012 00:39. 03-Jul 01:08 SD JobId 243957: Despooling elapsed time = 10:58:19, Transfer rate = 52.72 M Bytes/second 03-Jul 01:08 SD JobId 243957: Sending spooled attrs to the Director. Despooling 2,500,634,015 bytes ... 03-Jul 01:31 DIR JobId 243957: Bacula DIR 5.2.9 (11Jun12): Build OS: x86_64-unknown-linux-gnu redhat Enterprise release JobId: 243957 Job:JOB.2012-07-01_20.00.04_03 Backup Level: Full Client: FD 5.2.6 (21Feb12) i386-pc-solaris2.10,solaris,5.10 FileSet:FS 2012-02-01 20:00:23 Pool: Full-Pool (From Job resource) Catalog:MyCatalog (From Client resource) Storage:SL500-changer (From Job resource) Scheduled time: 01-Jul-2012 20:00:04 Start time: 01-Jul-2012 20:04:02 End time: 03-Jul-2012 01:31:35 Elapsed time: 1 day 5 hours 27 mins 33 secs Priority: 10 FD Files Written: 6,915,330 SD Files Written: 6,915,330 FD Bytes Written: 2,080,113,652,089 (2.080 TB) SD Bytes Written: 2,081,613,855,190 (2.081 TB) Rate: 19613.9 KB/s Software Compression: None VSS:no Encryption: no Accurate: yes Volume name(s): IM0094|FB0161|FB0158|FB0162|FB0164 Volume Session Id: 147 Volume Session Time:1340291913 Last Volume Bytes: 96,966,779,904 (96.96 GB) Non-fatal FD errors:0 SD Errors: 0 FD termination status: OK SD termination status: OK Termination:Backup OK 03-Jul 01:31 DIR JobId 243957: Begin pruning Jobs older than 10 years . 03-Jul 01:31 DIR JobId 243957: No Jobs found to prune. 03-Jul 01:31 DIR JobId 243957: Begin pruning Files. 03-Jul 01:31 DIR JobId 243957: No Files found to prune. 03-Jul 01:31 DIR JobId 243957: End auto prune Note: FB0161|FB0158|FB0162|FB0164 are all in Full-Pool, whereas the first tape used in the job, IM0094 is in Incremental-Pool. Anyone have ideas why it would be using a volume from a pool to which the job has not been associated? 
thanks, Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] bacula jobs use volumes from the wrong pool - bug?
On 07/05/2012 10:46 AM, Stephen Thompson wrote: Hello, Running 5.2.9, though I believe we've seen this sporadically in earlier versions. Jobs are using volumes that are in pools to which they have not been assigned. This is likely a bug as I don't see anything peculiar about our configuration. We are using a tape library with 2 drives, both set to autoselect. The library contains volumes that are properly assigned (i.e. database entries for volumes look fine) to various pools, including a Full pool and an Incremental pool. Twice in the past week, Full jobs which specify the use of the Full pool using jobdefs are using volumes from the Incremental pool. I haven't narrowed down all the details, but I believe it's if the Incremental volume is already loaded in a drive when the Full job in question is launched. Example: 01-Jul 22:00 DIR JobId 244098: Fatal error: JobId 243957 already running. Duplicate job not allowed. 01-Jul 20:03 DIR JobId 243957: Start Backup JobId 243957, Job=JOB.2012-07-01_20.00.04_03 01-Jul 20:04 DIR JobId 243957: Using Device SL500-Drive-0 01-Jul 20:42 SD JobId 243957: Spooling data ... 02-Jul 14:01 SD JobId 243957: Job write elapsed time = 17:19:00, Transfer rate = 33.39 M Bytes/second 02-Jul 14:01 SD JobId 243957: Committing spooled data to Volume FB0161. Despooling 2,082,572,994,002 bytes ... 02-Jul 15:38 SD JobId 243957: End of Volume IM0094 at 462:1559 on device SL500-Drive-0 (/dev/SL500-Drive-0). Write of 262144 bytes got -1. 02-Jul 15:38 SD JobId 243957: Re-read of last block succeeded. 02-Jul 15:38 SD JobId 243957: End of medium on Volume IM0094 Bytes=924,097,352,704 Blocks=3,525,155 at 02-Jul-2012 15:38. 02-Jul 15:38 SD JobId 243957: 3307 Issuing autochanger unload slot 148, drive 0 command. 02-Jul 15:39 SD JobId 243957: 3301 Issuing autochanger loaded? drive 0 command. 02-Jul 15:39 SD JobId 243957: 3302 Autochanger loaded? drive 0, result: nothing loaded. 02-Jul 15:39 SD JobId 243957: 3301 Issuing autochanger loaded? drive 0 command. 02-Jul 15:39 SD JobId 243957: 3302 Autochanger loaded? drive 0, result: nothing loaded. 02-Jul 15:39 SD JobId 243957: 3301 Issuing autochanger loaded? drive 1 command. 02-Jul 15:39 SD JobId 243957: 3302 Autochanger loaded? drive 1, result: nothing loaded. 02-Jul 15:39 SD JobId 243957: 3304 Issuing autochanger load slot 6, drive 0 command. 02-Jul 15:40 SD JobId 243957: 3305 Autochanger load slot 6, drive 0, status is OK. 02-Jul 15:40 SD JobId 243957: Volume FB0161 previously written, moving to end of data. 02-Jul 15:40 SD JobId 243957: Ready to append to end of Volume FB0161 at file=7. 02-Jul 15:40 SD JobId 243957: New volume FB0161 mounted on device SL500-Drive-0 (/dev/SL500-Drive-0) at 02-Jul-2012 15:40. 02-Jul 18:19 SD JobId 243957: End of Volume FB0161 at 274:5411 on device SL500-Drive-0 (/dev/SL500-Drive-0). Write of 262144 bytes got -1. 02-Jul 18:19 SD JobId 243957: Re-read of last block succeeded. 02-Jul 18:19 SD JobId 243957: End of medium on Volume FB0161 Bytes=535,390,861,312 Blocks=2,042,367 at 02-Jul-2012 18:19. 02-Jul 18:19 SD JobId 243957: 3307 Issuing autochanger unload slot 6, drive 0 command. 02-Jul 18:20 SD JobId 243957: 3301 Issuing autochanger loaded? drive 0 command. 02-Jul 18:20 SD JobId 243957: 3302 Autochanger loaded? drive 0, result: nothing loaded. 02-Jul 18:20 SD JobId 243957: 3301 Issuing autochanger loaded? drive 0 command. 02-Jul 18:20 SD JobId 243957: 3302 Autochanger loaded? drive 0, result: nothing loaded. 02-Jul 18:20 SD JobId 243957: 3301 Issuing autochanger loaded? drive 1 command. 
02-Jul 18:20 SD JobId 243957: 3302 Autochanger loaded? drive 1, result: nothing loaded. 02-Jul 18:20 SD JobId 243957: 3304 Issuing autochanger load slot 4, drive 0 command. 02-Jul 18:21 SD JobId 243957: 3305 Autochanger load slot 4, drive 0, status is OK. 02-Jul 18:21 SD JobId 243957: Volume FB0158 previously written, moving to end of data. 02-Jul 18:21 SD JobId 243957: Ready to append to end of Volume FB0158 at file=1. 02-Jul 18:21 SD JobId 243957: New volume FB0158 mounted on device SL500-Drive-0 (/dev/SL500-Drive-0) at 02-Jul-2012 18:21. 02-Jul 21:14 SD JobId 243957: End of Volume FB0158 at 274:1785 on device SL500-Drive-0 (/dev/SL500-Drive-0). Write of 262144 bytes got -1. 02-Jul 21:14 SD JobId 243957: Re-read of last block succeeded. 02-Jul 21:14 SD JobId 243957: End of medium on Volume FB0158 Bytes=546,439,698,432 Blocks=2,084,507 at 02-Jul-2012 21:14. 02-Jul 21:14 SD JobId 243957: 3307 Issui02-Jul 22:00 DIR JobId 244271: Fatal error: JobId 244133 already running. Duplicate job not allowed. This line is btw as it is in the log file. Looks a bit mangled, like two lines combined in one line 02-Jul 21:14 SD JobId 243957: 3307 Issui and 02-Jul 22:00 DIR JobId 244271: Fatal error: JobId 244133 already running. Duplicate job not allowed. 02-Jul 22:00 DIR JobId 244268: Fatal error: JobId 243957 already running. Duplicate job
Re: [Bacula-users] bacula jobs use volumes from the wrong pool - bug?
Hello again, Here's something even stranger... Another Full job logs that it's written to a volume in the Full pool (FB0956), but then the status output of the job lists a volume in the Incremental pool (IM0093). This Incremental volume was never even mentioned in the log as a volume to which the job despooled. 22-Jun 20:00 DIR JobId 242323: Start Backup JobId 242323, Job=JOB.2012-06-22_20.00.02_06 22-Jun 20:01 DIR JobId 242323: Using Device SL500-Drive-1 22-Jun 20:06 SD JobId 242323: 3301 Issuing autochanger loaded? drive 1 command. 22-Jun 20:06 SD JobId 242323: 3302 Autochanger loaded? drive 1, result: nothing loaded. 22-Jun 20:06 SD JobId 242323: 3301 Issuing autochanger loaded? drive 1 command. 22-Jun 20:06 SD JobId 242323: 3302 Autochanger loaded? drive 1, result: nothing loaded. 22-Jun 20:06 SD JobId 242323: 3304 Issuing autochanger load slot 138, drive 1 command. 22-Jun 20:07 SD JobId 242323: 3305 Autochanger load slot 138, drive 1, status is OK. 22-Jun 20:07 SD JobId 242323: Volume FB0956 previously written, moving to end of data. 22-Jun 20:08 SD JobId 242323: Ready to append to end of Volume FB0956 at file=4. 22-Jun 20:08 SD JobId 242323: Spooling data ... 23-Jun 00:01 SD JobId 242323: Job write elapsed time = 03:53:01, Transfer rate = 10.80 M Bytes/second 23-Jun 00:01 SD JobId 242323: Committing spooled data to Volume FB0956. Despooling 151,089,481,092 bytes ... 23-Jun 01:28 SD JobId 242323: Despooling elapsed time = 01:27:07, Transfer rate = 28.90 M Bytes/second 23-Jun 01:28 SD JobId 242323: Sending spooled attrs to the Director. Despooling 99,242,108 bytes ... 23-Jun 01:30 DIR JobId 242323: Bacula DIR 5.2.9 (11Jun12): Build OS: x86_64-unknown-linux-gnu redhat Enterprise release JobId: 242323 Job:JOB.2012-06-22_20.00.02_06 Backup Level: Full Client: FD 5.2.6 (21Feb12) x86_64-unknown-linux-gnu,redhat, FileSet:FS 2012-01-22 20:00:03 Pool: Full-Pool-2012-06 (From Job resource) Catalog:MyCatalog (From Client resource) Storage:SL500-changer (From Job resource) Scheduled time: 22-Jun-2012 20:00:02 Start time: 22-Jun-2012 20:01:29 End time: 23-Jun-2012 01:30:15 Elapsed time: 5 hours 28 mins 46 secs Priority: 10 FD Files Written: 337,622 SD Files Written: 337,622 FD Bytes Written: 150,974,593,977 (150.9 GB) SD Bytes Written: 151,023,845,596 (151.0 GB) Rate: 7653.6 KB/s Software Compression: None VSS:no Encryption: no Accurate: yes Volume name(s): IM0093 Volume Session Id: 9 Volume Session Time:1340291913 Last Volume Bytes: 723,076,936,704 (723.0 GB) Non-fatal FD errors:0 SD Errors: 0 FD termination status: OK SD termination status: OK Termination:Backup OK 23-Jun 01:30 DIR JobId 242323: Begin pruning Jobs older than 10 years . 23-Jun 01:30 DIR JobId 242323: No Jobs found to prune. 23-Jun 01:30 DIR JobId 242323: Begin pruning Files. 23-Jun 01:30 DIR JobId 242323: No Files found to prune. 23-Jun 01:30 DIR JobId 242323: End auto prune. Note: FB0956 is in the Full pool, IM0093 in the Incremental. The vast majority of our jobs are being successful, but when something like this happens, I lost all faith that I even have the backups I think I have!!! I just tried a test restore of this particular job, and it did in fact use IM0093 to restore from. Ugh. Stephen On 07/05/2012 10:46 AM, Stephen Thompson wrote: Hello, Running 5.2.9, though I believe we've seen this sporadically in earlier versions. Jobs are using volumes that are in pools to which they have not been assigned. This is likely a bug as I don't see anything peculiar about our configuration. 
We are using a tape library with 2 drives, both set to autoselect. The library contains volumes that are properly assigned (i.e. database entries for volumes look fine) to various pools, including a Full pool and an Incremental pool. Twice in the past week, Full jobs which specify the use of the Full pool using jobdefs are using volumes from the Incremental pool. I haven't narrowed down all the details, but I believe it's if the Incremental volume is already loaded in a drive when the Full job in question is launched. Example: 01-Jul 22:00 DIR JobId 244098: Fatal error: JobId 243957 already running. Duplicate job not allowed. 01-Jul 20:03 DIR JobId 243957: Start Backup JobId 243957, Job=JOB.2012-07-01_20.00.04_03 01-Jul 20:04 DIR JobId 243957: Using Device SL500-Drive-0 01-Jul 20:42 SD JobId 243957: Spooling data ... 02-Jul 14:01 SD JobId 243957: Job write elapsed time = 17:19:00, Transfer rate = 33.39 M Bytes/second 02-Jul 14:01 SD JobId 243957
Re: [Bacula-users] bacula jobs use volumes from the wrong pool - bug?
On 07/06/2012 11:01 AM, Martin Simmons wrote: On Thu, 05 Jul 2012 11:35:15 -0700, Stephen Thompson said: Hello again, Here's something even stranger... Another Full job logs that it's written to a volume in the Full pool (FB0956), but then the status output of the job lists a volume in the Incremental pool (IM0093). This Incremental volume was never even mentioned in the log as a volume to which the job despooled. It could be a database problem (the volumes listed in the status output come from a query). What is the output of the sql commands below? SELECT VolumeName,JobMedia.* FROM JobMedia,Media WHERE JobMedia.JobId=242323 AND JobMedia.MediaId=Media.MediaId; SELECT MediaId,VolumeName FROM Media WHERE Media.VolumeName in ('IM0093','FB0956'); Looks like it did in fact write to the Incremental tape IM0093 instead of the requested Full tape BUT logged that it wrote to a Full tape FB0956. This begs the questions 1) Why is it writing to a tape in another pool? and 2) Why is logging that it wrote to a different tape than it did? mysql SELECT VolumeName,JobMedia.* FROM JobMedia,Media WHERE JobMedia.JobId=242323 AND JobMedia.MediaId=Media.MediaId; ++++-++---+---+-++--+--+ | VolumeName | JobMediaId | JobId | MediaId | FirstIndex | LastIndex | StartFile | EndFile | StartBlock | EndBlock | VolIndex | ++++-++---+---+-++--+--+ | IM0093 |1946327 | 242323 |1072 | 1 | 429 | 285 | 285 | 4851 | 7628 |1 | | IM0093 |1946330 | 242323 |1072 |429 | 435 | 286 | 286 | 0 | 7628 |2 | | IM0093 |1946332 | 242323 |1072 |435 | 438 | 287 | 287 | 0 | 7628 |3 | | IM0093 |1946334 | 242323 |1072 |438 | 446 | 288 | 288 | 0 | 7628 |4 | | IM0093 |1946338 | 242323 |1072 |446 | 446 | 289 | 289 | 0 | 7628 |5 | | IM0093 |1946344 | 242323 |1072 |446 | 484 | 290 | 290 | 0 | 7628 |6 | | IM0093 |1946347 | 242323 |1072 |484 | 727 | 291 | 291 | 0 | 7628 |7 | | IM0093 |1946351 | 242323 |1072 |727 | 727 | 292 | 292 | 0 | 7628 |8 | | IM0093 |1946357 | 242323 |1072 |727 | 6237 | 293 | 293 | 0 | 7628 |9 | | IM0093 |1946360 | 242323 |1072 | 6237 | 9134 | 294 | 294 | 0 | 7628 | 10 | | IM0093 |1946363 | 242323 |1072 | 9134 | 12816 | 295 | 295 | 0 | 7628 | 11 | | IM0093 |1946368 | 242323 |1072 | 12816 | 12950 | 296 | 296 | 0 | 7628 | 12 | | IM0093 |1946371 | 242323 |1072 | 12950 | 12985 | 297 | 297 | 0 | 7628 | 13 | | IM0093 |1946374 | 242323 |1072 | 12985 | 13140 | 298 | 298 | 0 | 7628 | 14 | | IM0093 |1946378 | 242323 |1072 | 13140 | 13181 | 299 | 299 | 0 | 7628 | 15 | | IM0093 |1946383 | 242323 |1072 | 13181 | 13283 | 300 | 300 | 0 | 7628 | 16 | | IM0093 |1946386 | 242323 |1072 | 13283 | 19855 | 301 | 301 | 0 | 7628 | 17 | | IM0093 |1946390 | 242323 |1072 | 19855 | 26710 | 302 | 302 | 0 | 7628 | 18 | | IM0093 |1946391 | 242323 |1072 | 26710 | 33532 | 303 | 303 | 0 | 7628 | 19 | | IM0093 |1946394 | 242323 |1072 | 33532 | 40378 | 304 | 304 | 0 | 7628 | 20 | | IM0093 |1946397 | 242323 |1072 | 40378 | 47275 | 305 | 305 | 0 | 7628 | 21 | | IM0093 |1946400 | 242323 |1072 | 47275 | 54271 | 306 | 306 | 0 | 7628 | 22 | | IM0093 |1946403 | 242323 |1072 | 54271 | 58872 | 307 | 307 | 0 | 7628 | 23 | | IM0093 |1946406 | 242323 |1072 | 58872 | 58872 | 308 | 308 | 0 | 7628 | 24 | | IM0093 |1946409 | 242323 |1072 | 58872 | 58873 | 309 | 309 | 0 | 7628 | 25 | | IM0093 |1946412 | 242323 |1072 | 58873 | 58873 | 310 | 310 | 0 | 7628 | 26 | | IM0093
Re: [Bacula-users] bacula jobs use volumes from the wrong pool - bug?
On 07/09/12 11:37, Martin Simmons wrote: On Fri, 06 Jul 2012 11:12:35 -0700, Stephen Thompson said: On 07/06/2012 11:01 AM, Martin Simmons wrote: On Thu, 05 Jul 2012 11:35:15 -0700, Stephen Thompson said: Hello again, Here's something even stranger... Another Full job logs that it's written to a volume in the Full pool (FB0956), but then the status output of the job lists a volume in the Incremental pool (IM0093). This Incremental volume was never even mentioned in the log as a volume to which the job despooled. It could be a database problem (the volumes listed in the status output come from a query). What is the output of the sql commands below? SELECT VolumeName,JobMedia.* FROM JobMedia,Media WHERE JobMedia.JobId=242323 AND JobMedia.MediaId=Media.MediaId; SELECT MediaId,VolumeName FROM Media WHERE Media.VolumeName in ('IM0093','FB0956'); Looks like it did in fact write to the Incremental tape IM0093 instead of the requested Full tape BUT logged that it wrote to a Full tape FB0956. This begs the questions 1) Why is it writing to a tape in another pool? and 2) Why is logging that it wrote to a different tape than it did? You could verify that IM0093 contains the data by using bls -j with the tape loaded (but not mounted in Bacula). It looks like you have concurrent jobs (non-consecutive JobMediaId values). Was another job trying to use IM0093? Maybe IM0093 was in another drive and Bacula mixed up the drives somehow? Yes, I believe that FB0956 was in one drive and IM0093 in the other, though I do not understand how bacula 'mixed up' which volume to use, or which drive a particular volume was in. Not sure how closely related this is, but I've seen cases occasionally where bacula will say that it cannot mount a certain volume in Drive0 and requires user intervention, only to find that the volume requested is already mounted and in use in Drive1 by other jobs. So it is possible for bacula either to lose track of which drive a volume is in or to not be sure if a volume is already in use. I did a partial restore of the job and it did in fact load and read off IM0093 successfully. So in some sense I know what happened, I just don't know why it happened or how to prevent it (other than isolating jobs, but that defeats the point of concurrency). Stephen __Martin -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
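For anyone wanting to run the same verification, the bls check Martin suggests might look roughly like this (a sketch only: the volume and device names are the ones from this thread, and the tape has to be loaded in the drive, e.g. via mtx, while Bacula is not using it):

    # List the jobs recorded on the volume, straight from the tape
    # (add -c /path/to/bacula-sd.conf if the default config location is not used):
    bls -j -V IM0093 /dev/SL500-Drive-0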
Re: [Bacula-users] bacula jobs use volumes from the wrong pool - bug?
On 07/10/2012 10:53 AM, Martin Simmons wrote: On Mon, 09 Jul 2012 12:55:14 -0700, Stephen Thompson said: On 07/09/12 11:37, Martin Simmons wrote: On Fri, 06 Jul 2012 11:12:35 -0700, Stephen Thompson said: On 07/06/2012 11:01 AM, Martin Simmons wrote: On Thu, 05 Jul 2012 11:35:15 -0700, Stephen Thompson said: Hello again, Here's something even stranger... Another Full job logs that it's written to a volume in the Full pool (FB0956), but then the status output of the job lists a volume in the Incremental pool (IM0093). This Incremental volume was never even mentioned in the log as a volume to which the job despooled. It could be a database problem (the volumes listed in the status output come from a query). What is the output of the sql commands below? SELECT VolumeName,JobMedia.* FROM JobMedia,Media WHERE JobMedia.JobId=242323 AND JobMedia.MediaId=Media.MediaId; SELECT MediaId,VolumeName FROM Media WHERE Media.VolumeName in ('IM0093','FB0956'); Looks like it did in fact write to the Incremental tape IM0093 instead of the requested Full tape BUT logged that it wrote to a Full tape FB0956. This begs the questions 1) Why is it writing to a tape in another pool? and 2) Why is logging that it wrote to a different tape than it did? You could verify that IM0093 contains the data by using bls -j with the tape loaded (but not mounted in Bacula). It looks like you have concurrent jobs (non-consecutive JobMediaId values). Was another job trying to use IM0093? Maybe IM0093 was in another drive and Bacula mixed up the drives somehow? Yes, I believe that FB0956 was in one drive and IM0093 in the other, though I do not understand how bacula 'mixed up' which volume to use, or which drive a particular volume was in. Not sure how closely related this is, but I've seen cases occasionally where bacula will say that it cannot mount a certain volume in Drive0 and requires user intervention, only to find that the volume requested is already mounted and in use in Drive1 by other jobs. So it is possible for bacula either to lose track of which drive a volume is in or to not be sure if a volume is already in use. I did a partial restore of the job and it did in fact load and read off IM0093 successfully. So in some sense I know what happened, I just don't know why it happened or how to prevent it (other than isolating jobs, but that defeats the point of concurrency). You could try upgrading to 5.2.10. If that doesn't fix it, then reporting it in the bug tracker might be the next step (http://www.bacula.org/en/?page=bugs). Already upgraded. We'll see if it happens again. thanks, Stephen __Martin -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. 
Re: [Bacula-users] Fatal error: askdir.c:339 NULL Volume name. This shouldn't happen!!!
Update. We are still seeing this in 5.2.10 as well. It seems to happen more often towards the beginning of a series of jobs, when a tape is first chosen (i.e. not when a job is directly using a tape that's already been chosen and loaded into a drive by a previous job). Stephen On 7/5/12 7:44 AM, Stephen Thompson wrote: Update. We have seen the problem 2-3 times this past month running 5.2.9 on Redhat 6.2, much less frequent than before but still there. Stephen On 6/20/12 7:40 AM, Stephen Thompson wrote: Well, since we upgraded to 5.2.9 we have not seen the problem. Also when running 5.2.6 we were seeing it 2-3 times a week, during which we run hundreds of incrementals and several fulls per day. The error happened both with fulls and incrementals (which we have in two different LTO3 libraries). There was nothing amiss with our catalog or volumes, or at least nothing obvious. The error occurred when attempting to use different volumes (mostly previously used ones, including recycled), but those same volume were successful for other jobs that attempted to use them. Lastly, it wasn't reproducible, like I said it happened 2-3 time out of several hundred jobs, but it was happening over the course of a month or two while we ran 5.2.6 on RedHat 6.2. Here was our config for 5.2.6 PATH=/usr/lib64/qt4/bin:$PATH BHOME=/home/bacula EMAIL=bac...@seismo.berkeley.edu env CFLAGS='-g -O2' \ ./configure \ --prefix=$BHOME \ --sbindir=$BHOME/bin \ --sysconfdir=$BHOME/conf \ --with-working-dir=$BHOME/work \ --with-bsrdir=$BHOME/log \ --with-logdir=$BHOME/log \ --with-pid-dir=/var/run \ --with-subsys-dir=/var/run \ --with-dump-email=$EMAIL \ --with-job-email=$EMAIL \ --with-mysql \ --with-dir-user=bacula \ --with-dir-group=bacula \ --with-sd-user=bacula \ --with-sd-group=bacula \ --with-openssl \ --with-tcp-wrappers \ --enable-smartalloc \ --with-readline=/usr/include/readline \ --disable-conio \ --enable-bat \ | tee configure.out On 6/20/12 7:23 AM, Igor Blazevic wrote: On 18.06.2012 16:26, Stephen Thompson wrote: hello, Hello:) Anyone run into this error before? We hadn't until we upgraded our bacula server from Centos 5.8 to Redhat 6.2, after which we of course had to recompile bacula. However, we used the same source, version, and options, the exception being that we added readline for improved bconsole functionality. Can you post your config options, please? I've compiled versions 5.0.3 and 5.2.6 on RHEL 6.2 with following options: CFLAGS=-g -Wall ./configure \ --sysconfdir=/etc/bacula \ --with-dir-user=bacula \ --with-dir-group=bacula \ --with-sd-user=bacula \ --with-sd-group=bacula \ --with-fd-user=root \ --with-fd-group=root \ --with-dir-password=somepasswd \ --with-fd-password=somepasswd \ --with-sd-password=somepasswd \ --with-mon-dir-password=somepasswd \ --with-mon-fd-password=somepasswd \ --with-mon-sd-password=somepasswd \ --with-working-dir=/var/lib/bacula \ --with-scriptdir=/etc/bacula/scripts \ --with-smtp-host=localhost \ --with-subsys-dir=/var/lib/bacula/lock/subsys \ --with-pid-dir=/var/lib/bacula/run \ --enable-largefile \ --disable-tray-monitor \ --enable-build-dird \ --enable-build-stored \ --with-openssl \ --with-tcp-wrappers \ --with-python \ --enable-smartalloc \ --with-x \ --enable-bat \ --disable-libtool \ --with-postgresql \ --with-readline=/usr/include/readline \ --disable-conio and can atest that everything works just fine although I only used NEW volumes with it. Maybe there is something amiss with your catalog or volume media? 
-- Igor Blažević
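When this error shows up, it can help to confirm from bconsole what the director and the catalog think about the pool and volumes at that moment. A minimal sketch; the Storage resource, pool, and volume names below are placeholders for your own:

  # in bconsole
  list volumes pool=Incremental-Pool    # VolStatus, InChanger and Slot for each volume
  llist volume=VOLNAME                  # full catalog record, including VolErrors and LastWritten
  status storage=Inc-changer            # what the storage daemon has reserved or mounted right now
  update slots storage=Inc-changer      # resync the catalog's Slot/InChanger flags with the changer

If the catalog's InChanger/Slot information has drifted from what the library actually holds, resyncing it with update slots is a cheap first step before digging deeper.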
[Bacula-users] bacula confused about volumes
Hey all, I've been meaning to post about this for awhile, but it comes up pretty rarely (maybe once every few months running hundreds of job a night). With an autochanger with 2 drives, each set to AutoSelect, it's possible for bacula to want the same volume in both drives at the same time, which creates an Operator Intervention situation. Here's an example where apparently previous jobs were using a particular volume in one drive and somehow jobs assigned to the other drives wanted the exact same volume, causing them to pause and require operator intervention. sd_C4 Version: 5.2.10 (28 June 2012) x86_64-unknown-linux-gnu redhat Enterprise release Daemon started 23-Jul-12 10:13. Jobs: run=295, running=3. Heap: heap=135,168 smbytes=2,089,365 max_bytes=3,689,580 bufs=299 max_bufs=396 Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8 mode=0,0 Running Jobs: Writing: Incremental Backup job AAA JobId=247971 Volume=IM0081 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=9 Writing: Incremental Backup job BBB JobId=247973 Volume=IM0081 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=13 Writing: Incremental Backup job CCC JobId=247975 Volume=IM0081 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=15 Writing: Incremental Backup job DDD JobId=247976 Volume=IM0081 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=18 Jobs waiting to reserve a drive: Terminated Jobs: JobId LevelFiles Bytes Status FinishedName === XXX Device status: Autochanger C4-changer with devices: C4-Drive-0 (/dev/C4-Drive-0) C4-Drive-1 (/dev/C4-Drive-1) Device C4-Drive-0 (/dev/C4-Drive-0) is not open. Device is BLOCKED waiting for mount of volume IM0081, Pool:Incremental-Pool Media type: LTO-3 Drive 0 is not loaded. Device C4-Drive-1 (/dev/C4-Drive-1) is mounted with: Volume: IM0081 Pool:Incremental-Pool Media type: LTO-3 Slot 32 is loaded in drive 1. Total Bytes=369,270,534,144 Blocks=1,408,808 Bytes/block=262,115 Positioned at File=203 Block=0 Used Volume status: IM0070 on device C4-Drive-1 (/dev/C4-Drive-1) Reader=0 writers=0 devres=0 volinuse=0 IM0081 on device C4-Drive-0 (/dev/C4-Drive-0) Reader=0 writers=0 devres=4 volinuse=0 Anyone else have this happen? Race condition? thanks, Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
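When jobs block like this, the usual manual way out (short of waiting) is to free the wanted volume and tell the blocked drive what to mount. A rough bconsole sketch, assuming a director Storage resource named C4-changer and the drive/slot numbers from the status output above (adjust for your site):

  # in bconsole
  status storage=C4-changer                  # confirm which drive holds IM0081 and which is blocked
  unmount storage=C4-changer drive=1         # release the volume from the drive that is merely holding it
  mount storage=C4-changer drive=0 slot=32   # load it where the waiting jobs expect it
  # or point the blocked drive at a different appendable volume instead:
  mount storage=C4-changer drive=0 slot=33

This only clears the immediate jam; it says nothing about why the director reserved the same volume for both drives in the first place.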
Re: [Bacula-users] Long running jobs and BackupCatalog
The enterprise version may have a pause feature, but the open-source one does not. We run a slave database server and make a daily dump from that, knowing that it will not preserve the records being made for running jobs, but since the running jobs aren't complete when the dump begins, they wouldn't be useful records to have anyway (and we're willing to be behind by a day on our backups if a disaster were to occur). It's also possible to run a transactional engine on your master db and do a dump while jobs are running, but we found the dump times to be ridiculously high (like 12+ hours). Our Catalog is something like 300Gb. There are other options out there as well, like using a snapshot of your underlying filesystem, but, yeah, a pause feature sure would be nice for many many reasons. Stephen On 8/2/12 6:36 AM, Clark, Patricia A. wrote: Because I have quite a few long running jobs, my BackupCatalog job is not getting run more than once or twice per week. I understand the potential instability of backing up the catalog while there are running jobs. Is there anything in the bacula pipeline that would pause running jobs so that the catalog could be backed up? Say a snapshot capability? Patti Clark Information International Associates, Inc. Linux Administrator and subcontractor to: Research and Development Systems Support Oak Ridge National Laboratory -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
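For what it's worth, the slave-dump approach is just a plain logical dump taken on the replica, so the master (and the running jobs) never see the load. A minimal sketch assuming MySQL, a replica host called db-replica, and a catalog database named bacula; the host, credentials and paths are placeholders:

  #!/bin/sh
  # Dump the Bacula catalog from a replication slave.
  # --single-transaction gives a consistent snapshot for InnoDB without locking;
  # --quick streams rows instead of buffering the (large) tables in memory.
  DUMPDIR=/backup/catalog
  mysqldump -h db-replica -u bacula -p"$BACULA_DB_PASS" \
      --single-transaction --quick \
      bacula > "$DUMPDIR/bacula-$(date +%F).sql"

The resulting file can then be swept up by an ordinary file-level backup job, much like the stock make_catalog_backup.pl arrangement, only pointed at the replica instead of the master.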
Re: [Bacula-users] bacula confused about volumes
We're seeing this with a lot more frequency, though we've changed no configuration. Jobs are often left waiting an entire run in order to use a volume that's in use by the other drive within a 2 drive changer. Stephen On 7/25/12 7:38 AM, Stephen Thompson wrote: Hey all, I've been meaning to post about this for awhile, but it comes up pretty rarely (maybe once every few months running hundreds of job a night). With an autochanger with 2 drives, each set to AutoSelect, it's possible for bacula to want the same volume in both drives at the same time, which creates an Operator Intervention situation. Here's an example where apparently previous jobs were using a particular volume in one drive and somehow jobs assigned to the other drives wanted the exact same volume, causing them to pause and require operator intervention. sd_C4 Version: 5.2.10 (28 June 2012) x86_64-unknown-linux-gnu redhat Enterprise release Daemon started 23-Jul-12 10:13. Jobs: run=295, running=3. Heap: heap=135,168 smbytes=2,089,365 max_bytes=3,689,580 bufs=299 max_bufs=396 Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8 mode=0,0 Running Jobs: Writing: Incremental Backup job AAA JobId=247971 Volume=IM0081 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=9 Writing: Incremental Backup job BBB JobId=247973 Volume=IM0081 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=13 Writing: Incremental Backup job CCC JobId=247975 Volume=IM0081 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=15 Writing: Incremental Backup job DDD JobId=247976 Volume=IM0081 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0) spooling=0 despooling=0 despool_wait=0 Files=0 Bytes=0 Bytes/sec=0 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=18 Jobs waiting to reserve a drive: Terminated Jobs: JobId LevelFiles Bytes Status FinishedName === XXX Device status: Autochanger C4-changer with devices: C4-Drive-0 (/dev/C4-Drive-0) C4-Drive-1 (/dev/C4-Drive-1) Device C4-Drive-0 (/dev/C4-Drive-0) is not open. Device is BLOCKED waiting for mount of volume IM0081, Pool:Incremental-Pool Media type: LTO-3 Drive 0 is not loaded. Device C4-Drive-1 (/dev/C4-Drive-1) is mounted with: Volume: IM0081 Pool:Incremental-Pool Media type: LTO-3 Slot 32 is loaded in drive 1. Total Bytes=369,270,534,144 Blocks=1,408,808 Bytes/block=262,115 Positioned at File=203 Block=0 Used Volume status: IM0070 on device C4-Drive-1 (/dev/C4-Drive-1) Reader=0 writers=0 devres=0 volinuse=0 IM0081 on device C4-Drive-0 (/dev/C4-Drive-0) Reader=0 writers=0 devres=4 volinuse=0 Anyone else have this happen? Race condition? thanks, Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] BAT and qt vesrion
You can also use the depkgs-qt from the bacula website. It contains the necessary QT, which you can statically link without installing the non-RedHat QT on your system. Stephen

On 08/09/2012 12:55 PM, Thomas Lohman wrote:
I downloaded the latest stable QT open source version (4.8.2 at the time) and built it before building Bacula 5.2.10. Bat seems to work fine with it. If you do this, just be aware that the first time you build it, it will probably find the older 4.6.x RH QT libraries and embed their location in the shared library path, so when you go to use it, it won't work. The first time I built it, I told it to explicitly look in its own source tree for its libraries (by setting LDFLAGS), installed that version, and then re-built it, telling it to now look in the install directory. --tom

I tried to compile bacula-5.2.10 with BAT on a RHEL6.2 server. I found that BAT did not get installed because it needs qt version 4.7.4 or higher, but RHEL6.2 has version qt-4.6.2-24 as the latest. I would like to know what the others are doing about this issue? Uthra
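For anyone taking the build-your-own-Qt route that Thomas describes, the rough shape of it is below. This is only a sketch; the version, the install prefix, and the assumption that bat picks up whichever qmake is first in PATH should all be checked against your own setup:

  # build a private Qt, kept out of the system paths
  tar xzf qt-everywhere-opensource-src-4.8.2.tar.gz
  cd qt-everywhere-opensource-src-4.8.2
  ./configure -prefix /opt/qt-4.8.2 -opensource -confirm-license
  make && make install

  # then build bacula against it
  export PATH=/opt/qt-4.8.2/bin:$PATH
  export LDFLAGS="-L/opt/qt-4.8.2/lib -Wl,-rpath,/opt/qt-4.8.2/lib"
  cd /path/to/bacula-5.2.10
  ./configure --enable-bat ...your usual options...
  make && make install

Setting the rpath up front is one way to avoid the gotcha Thomas mentions, where the first build silently links bat against the older system Qt.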
Re: [Bacula-users] Bacula 5.2.11: Director crashes
We updated our bacula server from 5.2.10 to 5.2.11 earlier today. A few hours later the bacula-dir crashed. This is on RedHat 6.3. No traceback generated. Stephen On 09/12/2012 05:45 AM, Uwe Schuerkamp wrote: Hi folks, I updated one of our bacula servers to 5.2.11 today (CentOS 6.x, compiled from source), but sadly the director crashes after a couple of copy jobs which were due this morning. Any idea how to go about debugging the issue? The server has a dir-bactrace file, but it appears to be empty, also the last couple of lines in the log file don't give away much beyond the selected jobids for copying. All the best, Uwe -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
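A couple of generic ways to get more than an empty bactrace out of a crashing director; nothing here is specific to 5.2.11, and the paths and debug level are only examples:

  # 1. run the director in the foreground with debug output
  /opt/bacula/bin/bacula-dir -f -d 200 -c /opt/bacula/conf/bacula-dir.conf

  # 2. or let it daemonize but allow core dumps, then inspect the core
  ulimit -c unlimited
  /opt/bacula/bin/bacula-dir -c /opt/bacula/conf/bacula-dir.conf
  # ... after the crash:
  gdb /opt/bacula/bin/bacula-dir /path/to/core
  (gdb) thread apply all bt

A binary built with debugging symbols (CFLAGS='-g') makes the resulting backtrace far more useful.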
[Bacula-users] LTO3 tape capacity (variable?)
Hello all, This is not likely a bacula question, but on the chance that it is, or that the experience is here on this list, I figured I would ask. We've been using LTO3 tapes with bacula for a few years now. Recently I've noticed how variable our tape capacity is, ranging from 200-800 GB. Is that strictly governed by the compressibility of the actual data being backed up? Or is there some chance that bacula isn't squeezing as much onto my tapes as I would expect? 200 GB is not very much! thanks, Stephen
Re: [Bacula-users] LTO3 tape capacity (variable?)
Thanks for the info, John. Is there anyone else in the bacula community with LTO3's seeing this behaviour? I don't believe (but am not 100% sure) that I'm having any hardware-related issues. Not sure what to make of this. About 25% of tapes in a monthly run (70 tapes) are under the 400Gb native, but then the other 75% are above it, some even hitting the 800Gb top. Stephen On 09/24/2012 12:02 PM, John Drescher wrote: This is not likely a bacula questions, but in the chance that it is, or the experience on this list, I figured I would ask. We've been using LTO3 tapes with bacula for a few years now. Recently I've noticed how variable our tape capacity it, ranging from 200-800 Gb. Is that strictly governed by the compressibility of the actual data being backed up? Or is there some chance that bacula isn't squeezing as much onto my tapes as I would expect? 200Gb is not very much! These tapes are 400GB native. If you get substantially less than that you have a configuration problem (you set limits on the volume size or duration) or a hardware problem. Compression should be handled entirely and automatically by the tape drive. Bacula does not enable or disable hardware compression it just passes the data to the drive and writes as much as it can up until it hits its first hardware error. At that point bacula calls the tape full and verifies that it can read the last block. I believe if it can't read the last block this block will be the first block written on the next volume. John -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
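Since it is the drive, not bacula, that decides whether compression happens, it is worth confirming that compression is actually enabled on the drives producing the small tapes. A hedged sketch; the sg/nst device nodes are examples, tapeinfo comes with the mtx package, and lsscsi is only used to map the nodes:

  lsscsi -g                                       # find the /dev/sg node for each tape drive
  tapeinfo -f /dev/sg3 | egrep -i 'comp|density'  # look for the DataCompEnabled/DataCompCapable fields
  mt -f /dev/nst0 status                          # LTO-3 normally reports density code 0x44

If compression turns out to be disabled on one drive but enabled on the others, that alone would explain a large spread in capacity on compressible data.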
Re: [Bacula-users] LTO3 tape capacity (variable?)
Thanks everyone for the suggestions, they at least give me somewhere to look, as I was running low on ideas. More info... The tape in question have only been used once or twice. The library is a StorageTek whose SLConsole reports no media (or drive) errors, though I will look into those linux-based tools. Our Sun/Oracle service engineer claims that our drives do not require cleaning tapes. Does that sound legit? Our throughput is pretty reasonable for our hardware -- we do use disk staging and get something like 60Mb/s to tape. Lastly, the tapes that get 200 vs 800 are from the same batch of tapes, same number of uses, and used by the same pair of SL500 drives. That's primarily why I wondered if it could be data dependent (or a bacula bug). thanks! Stephen On 09/25/12 02:29, Cejka Rudolf wrote: We've been using LTO3 tapes with bacula for a few years now. Recently I've noticed how variable our tape capacity it, ranging from 200-800 Gb. Is that strictly governed by the compressibility of the actual data being backed up? Hello, the lower bound 200 GB on 400 GB LTO-3 tapes is not possible due to the drive compression, because it always compares, if compressed data are shorter that original. In other case, it writes data uncompressed. So, in all cases, you should see atleast 400 000 000 000 bytes written on all tapes. Or is there some chance that bacula isn't squeezing as much onto my tapes as I would expect? 200Gb is not very much! In bacula, look mainly for the reasons, why there is just 200 GB written. If the tape is full, think about these: - Weared tapes. Typical tape service life is written as 200 full cycles. However, read http://www.xma4govt.co.uk/Libraries/Manufacturer/ultriumwhitepaper_EEE.sflb where they experienced problems with some tapes just only after 30 cycles! How many cycles could you have with your tapes? - Do you use disk staging, so that tape writes are done at full speed? Do you have a good disk staging? Considering using SSDs for staging is very wise. If data rate is lower that 1/3 to 1/2 of native tape speed (based on drive vendor, HP or IBM), then drive has to perform tape repositions, which means another important excessive drive and tape wearing. My experiences are, that even HW RAID-0 with four 10k disks could not be sufficient and when there are data writes and reads in parallel, it could not put 80 MB/s to the drive, typically just 50-70 MB/s, which is still acceptable for LTO-3, but not good. Currently, I have 4 x 450 GB SSDs HW RAID-0 with over 1500 GB/s without problem running writes and reads in parallel and just after that I hope that it is really sufficient for = LTO-3 staging and putting drives and tapes wearing to minimum. - Dirty heads. You can enforce cleaning cycle, but then return to the two points above and other suggestiong, like using some monitoring like ltt on Linux (or I have some home made reporting tool using camcontrol on FreeBSD), where it would be possible to ensure, that your problem are weared tapes, or something else. Best regards. -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. 
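On the staging point raised above, the relevant knobs on the bacula side are data spooling on the SD device plus SpoolData on the jobs. A minimal sketch, assuming a fast local filesystem mounted at /spool; the sizes are purely illustrative:

  # bacula-sd.conf (tape Device resource)
  Device {
    Name = SL500-Drive-0
    ...
    Spool Directory = /spool
    Maximum Spool Size = 300g        # shared across the device
    Maximum Job Spool Size = 50g     # per job, forces despooling in chunks
  }

  # bacula-dir.conf (JobDefs or Job resource)
  JobDefs {
    Name = DefaultJob
    ...
    Spool Data = yes
  }

With spooling in place the tape only ever sees large sequential despool writes, so the question becomes whether the spool filesystem can be read back at or above the drive's native rate.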
Re: [Bacula-users] LTO3 tape capacity (variable?)
On 09/25/2012 10:43 AM, Alan Brown wrote: On 25/09/12 17:43, Stephen Thompson wrote: Our Sun/Oracle service engineer claims that our drives do not require cleaning tapes. Does that sound legit? In general: true (as in, Don't do it as a scheduled item), but all LTO drives require cleaning tapes from time to time and sometimes benefit from loading one even if the clean light isn't on. It primarily depends on the cleanliness of the room where the drive is. Our throughput is pretty reasonable for our hardware -- we do use disk staging and get something like 60Mb/s to tape. 60Mb/s is _slow_ for LTO3. You need to take a serious look at what you're using as stage disk and consider using a raid0 array of SSDs in order to keep up. Why do you say that's slow when the max speed appears to be 80? http://en.wikipedia.org/wiki/Linear_Tape-Open Lastly, the tapes that get 200 vs 800 are from the same batch of tapes, same number of uses, and used by the same pair of SL500 drives. That's primarily why I wondered if it could be data dependent (or a bacula bug). What happens if you mark the volumes as append and put them back in the library? I've seen transient scsi errors result in tapes being marked as full. What does smartctl show for the drive and tape in question? (run this against the /dev/sg of the tape drive) -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
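The mark-it-append experiment Alan suggests is a one-liner from bconsole; the volume name is a placeholder:

  # in bconsole
  update volume=VOLNAME volstatus=Append
  llist volume=VOLNAME            # confirm VolStatus is back to Append

If the drive immediately hits end-of-medium again on the next write, the tape really is full at that block size and the shortfall is overhead or compression, not a premature Full flag.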
Re: [Bacula-users] LTO3 tape capacity (variable?)
On 09/25/2012 11:17 AM, Konstantin Khomoutov wrote: On Tue, 25 Sep 2012 11:00:07 -0700 Stephen Thompson step...@seismo.berkeley.edu wrote: 60Mb/s is _slow_ for LTO3. You need to take a serious look at what you're using as stage disk and consider using a raid0 array of SSDs in order to keep up. Why do you say that's slow when the max speed appears to be 80? http://en.wikipedia.org/wiki/Linear_Tape-Open It's quite logical, that to not starve the consumer, the producer should be at least as fast or faster, so you have to provide at least 80 Mb/s sustained read rate from your spooling media to be sure the tape drive is kept busy. No, I mean, there's slow and there's __SLOW__. He seemed to be indicating that it was unacceptably slow. I understand it's not optimal. Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Live Security Virtual Conference Exclusive live event will cover all the ways today's security and threat landscape has changed and how IT managers can respond. Discussions will include endpoint security, mobile security and the latest in malware threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] LTO3 tape capacity (variable?)
On 09/25/2012 02:29 PM, Cejka Rudolf wrote: Stephen Thompson wrote (2012/09/25): The tape in question have only been used once or twice. Do you mean just one or two drive loads and unloads? Yes, I mean the tapes have only been in a drive once or twice, possibly for a dozen sequential jobs while in the drive, but only in and out of the drive once or twice. I have seen this 200-300Gb capacity on new tapes as well as used. I see it in both my SL500 library as well as my C4 library, which is a combined 4 LTO3 drives (2 in each library). The library is a StorageTek whose SLConsole reports no media (or drive) errors, though I will look into those linux-based tools. There are several types of errors, recoverable and non-recoverable, and I'm afraid that you see just non-recoverable, but it is too late to see them. Our Sun/Oracle service engineer claims that our drives do not require cleaning tapes. Does that sound legit? If you are interested, you can study http://www.tarconis.com/documentos/LTO_Cleaning_wp.pdf ;o) So in HP case, it is possible to agree. However, you still have to have atleast one cleaning cartridge prepared ;o) Our throughput is pretty reasonable for our hardware -- we do use disk staging and get something like 60Mb/s to tape. HP LTO-3 drive can slow down physical speed to 27 MB/s, IBM LTO-3 to 40 MB/s. Native speed is 80 MB/s, bot all these speeds are after compression. If you have 60 MB/s before compression and there are some places with somewhat better compression than 2:1, then you are not able to feed HP LTO-3. For IBM drive, it is suffucient to have places with just 2:1 to need repositions. Lastly, the tapes that get 200 vs 800 are from the same batch of tapes, same number of uses, and used by the same pair of SL500 drives. That's primarily why I wondered if it could be data dependent (or a bacula bug). And what about the reason to switch to the next tape? Do you have something like this in your reports? 22-Sep 02:22 backup-sd JobId 74990: End of Volume 1 at 95:46412 on device drive0 (/dev/nsa0). Write of 65536 bytes got 0. 22-Sep 02:22 backup-sd JobId 74990: Re-read of last block succeeded. 22-Sep 02:22 backup-sd JobId 74990: End of medium on Volume 1 Bytes=381,238,317,056 Blocks=5,817,238 at 22-Sep-2012 02:22. Here's an example of a tape that had one job and only wrote ~278Gb to the tape: 10-Sep 10:08 sd-SL500 JobId 256773: Recycled volume FB0095 on device SL500-Drive-1 (/dev/SL500-Drive-1), all previous data lost. 10-Sep 10:08 sd-SL500 JobId 256773: New volume FB0095 mounted on device SL500-Drive-1 (/dev/SL500-Drive-1) at 10-Sep-2012 10:08. 10-Sep 13:02 sd-SL500 JobId 256773: End of Volume FB0095 at 149:5906 on device SL500-Drive-1 (/dev/SL500-Drive-1). Write of 262144 bytes got -1. 10-Sep 13:02 sd-SL500 JobId 256773: Re-read of last block succeeded. 10-Sep 13:02 sd-SL500 JobId 256773: End of medium on Volume FB0095 Bytes=299,532,813,312 Blocks=1,142,627 at 10-Sep-2012 13:02. Do not you use something from the following things in bacula configuration? UseVolumeOnce Maximum Volume Jobs Maximum Volume Bytes Volume Use Duration ? No, none of those are configured. Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- How fast is your code? 3 out of 4 devs don\\\'t know how their code performs in production. Find out how slow your code is with AppDynamics Lite. http://ad.doubleclick.net/clk;262219672;13503038;z? 
Re: [Bacula-users] LTO3 tape capacity (variable?)
On 09/26/2012 02:35 PM, Stephen Thompson wrote: On 09/25/2012 02:29 PM, Cejka Rudolf wrote: Stephen Thompson wrote (2012/09/25): The tape in question have only been used once or twice. Do you mean just one or two drive loads and unloads? Yes, I mean the tapes have only been in a drive once or twice, possibly for a dozen sequential jobs while in the drive, but only in and out of the drive once or twice. I have seen this 200-300Gb capacity on new tapes as well as used. I think I pointed this out before, but I also have used and new tapes with 400-800Gb on them. It seems really hit or miss, though the tapes with 400Gb or less are probably a 1/3 of my tapes. The other 2/3 have above 400Gb. I see it in both my SL500 library as well as my C4 library, which is a combined 4 LTO3 drives (2 in each library). The library is a StorageTek whose SLConsole reports no media (or drive) errors, though I will look into those linux-based tools. There are several types of errors, recoverable and non-recoverable, and I'm afraid that you see just non-recoverable, but it is too late to see them. Our Sun/Oracle service engineer claims that our drives do not require cleaning tapes. Does that sound legit? If you are interested, you can study http://www.tarconis.com/documentos/LTO_Cleaning_wp.pdf ;o) So in HP case, it is possible to agree. However, you still have to have atleast one cleaning cartridge prepared ;o) Our throughput is pretty reasonable for our hardware -- we do use disk staging and get something like 60Mb/s to tape. HP LTO-3 drive can slow down physical speed to 27 MB/s, IBM LTO-3 to 40 MB/s. Native speed is 80 MB/s, bot all these speeds are after compression. If you have 60 MB/s before compression and there are some places with somewhat better compression than 2:1, then you are not able to feed HP LTO-3. For IBM drive, it is suffucient to have places with just 2:1 to need repositions. Lastly, the tapes that get 200 vs 800 are from the same batch of tapes, same number of uses, and used by the same pair of SL500 drives. That's primarily why I wondered if it could be data dependent (or a bacula bug). And what about the reason to switch to the next tape? Do you have something like this in your reports? 22-Sep 02:22 backup-sd JobId 74990: End of Volume 1 at 95:46412 on device drive0 (/dev/nsa0). Write of 65536 bytes got 0. 22-Sep 02:22 backup-sd JobId 74990: Re-read of last block succeeded. 22-Sep 02:22 backup-sd JobId 74990: End of medium on Volume 1 Bytes=381,238,317,056 Blocks=5,817,238 at 22-Sep-2012 02:22. Here's an example of a tape that had one job and only wrote ~278Gb to the tape: 10-Sep 10:08 sd-SL500 JobId 256773: Recycled volume FB0095 on device SL500-Drive-1 (/dev/SL500-Drive-1), all previous data lost. 10-Sep 10:08 sd-SL500 JobId 256773: New volume FB0095 mounted on device SL500-Drive-1 (/dev/SL500-Drive-1) at 10-Sep-2012 10:08. 10-Sep 13:02 sd-SL500 JobId 256773: End of Volume FB0095 at 149:5906 on device SL500-Drive-1 (/dev/SL500-Drive-1). Write of 262144 bytes got -1. 10-Sep 13:02 sd-SL500 JobId 256773: Re-read of last block succeeded. 10-Sep 13:02 sd-SL500 JobId 256773: End of medium on Volume FB0095 Bytes=299,532,813,312 Blocks=1,142,627 at 10-Sep-2012 13:02. Do not you use something from the following things in bacula configuration? UseVolumeOnce Maximum Volume Jobs Maximum Volume Bytes Volume Use Duration ? No, none of those are configured. 
Stephen
Re: [Bacula-users] LTO3 tape capacity (variable?)
On 09/25/2012 10:43 AM, Alan Brown wrote: On 25/09/12 17:43, Stephen Thompson wrote: Our Sun/Oracle service engineer claims that our drives do not require cleaning tapes. Does that sound legit? In general: true (as in, Don't do it as a scheduled item), but all LTO drives require cleaning tapes from time to time and sometimes benefit from loading one even if the clean light isn't on. It primarily depends on the cleanliness of the room where the drive is. Our throughput is pretty reasonable for our hardware -- we do use disk staging and get something like 60Mb/s to tape. 60Mb/s is _slow_ for LTO3. You need to take a serious look at what you're using as stage disk and consider using a raid0 array of SSDs in order to keep up. Lastly, the tapes that get 200 vs 800 are from the same batch of tapes, same number of uses, and used by the same pair of SL500 drives. That's primarily why I wondered if it could be data dependent (or a bacula bug). What happens if you mark the volumes as append and put them back in the library? I haven't had a lot of time to look into this today, but I do this quick test and it immediately marks the volume Full again. 27-Sep 14:20 sd-SL500 JobId 260069: Volume FB0763 previously written, moving to end of data. 27-Sep 14:21 sd-SL500 JobId 260069: Ready to append to end of Volume FB0763 at file=110. 27-Sep 14:21 sd-SL500 JobId 260069: Spooling data ... 27-Sep 14:21 sd-SL500 JobId 260069: Job write elapsed time = 00:00:01, Transfer rate = 759.3 K Bytes/second 27-Sep 14:21 sd-SL500 JobId 260069: Committing spooled data to Volume FB0763. Despooling 762,358 bytes ... 27-Sep 14:21 sd-SL500 JobId 260069: End of Volume FB0763 at 110:1 on device SL500-Drive-0 (/dev/SL500-Drive-0). Write of 262144 bytes got -1. 27-Sep 14:21 sd-SL500 JobId 260069: Re-read of last block succeeded. 27-Sep 14:21 sd-SL500 JobId 260069: End of medium on Volume FB0763 Bytes=219,730,936,832 Blocks=838,207 at 27-Sep-2012 14:21. 27-Sep 14:21 sd-SL500 JobId 260069: 3307 Issuing autochanger unload slot 36, drive 0 command. I've seen transient scsi errors result in tapes being marked as full. What does smartctl show for the drive and tape in question? (run this against the /dev/sg of the tape drive) -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Everyone hates slow websites. So do we. Make your web apps faster with AppDynamics Download AppDynamics Lite for free today: http://ad.doubleclick.net/clk;258768047;13503038;j? http://info.appdynamics.com/FreeJavaPerformanceDownload.html ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] LTO3 tape capacity (variable?)
On 9/27/12 6:17 PM, Alan Brown wrote: On 27/09/12 22:25, Stephen Thompson wrote: What happens if you mark the volumes as append and put them back in the library? I haven't had a lot of time to look into this today, but I do this quick test and it immediately marks the volume Full again. Then it really is full and the rest is down to overheads. Consider using larger block sizes. Aren't these considered reasonable settings for LTO3? Maximum block size = 262144 # 256kb Maximum File Size = 2gb thanks for the help! Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Got visibility? Most devs has no idea what their production app looks like. Find out how fast your code is with AppDynamics Lite. http://ad.doubleclick.net/clk;262219671;13503038;y? http://info.appdynamics.com/FreeJavaPerformanceDownload.html ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] LTO3 tape capacity (variable?)
Hi, I ran some btape tests today to verify that I'd be improving throughput by changing blocksize from 256KB to 2MB, and found that this does indeed appear to be true in terms of increasing compression efficiency, but it doesn't seem to affect incompressible data much, if at all. Still, it seems worth changing and I thank you for pointing me in that direction.

More importantly, I realized that my testing 6 months ago was not on all 4 of my drives, but only 2 of them. Today, I discovered one of my drives (untested in the past) is getting 1/2 the throughput for random data writes as the others!!

btape *speed file_size=4 nb_file=4 skip_raw

                    SL500 Drive 0   SL500 Drive 1   C4 Drive 0   C4 Drive 1
256KB block size:
  Zeros  =          92.86 MB/s      92.36 MB/s      91.38 MB/s   92.86 MB/s
  Random =          63.16 MB/s      27.53 MB/s      63.39 MB/s   63.60 MB/s
2MB block size:
  Zeros  =          123.5 MB/s      122.7 MB/s      122.7 MB/s   122.7 MB/s
  Random =          62.24 MB/s      28.44 MB/s      63.62 MB/s   63.62 MB/s
                                    ^
thanks, Stephen

On 09/28/2012 05:08 AM, Alan Brown wrote:
On 28/09/12 02:38, Stephen Thompson wrote:
Aren't these considered reasonable settings for LTO3?
Maximum block size = 262144 # 256kb
Maximum File Size = 2gb
Not really. Change maximum file size to 10Gb and maximum block size to 2M. You _must_ set all open tapes to used and restart the storage daemon when changing the block size. Bacula can't cope with varying maximum sizes on a tape. Even with those changes, if you have a lot of small, incompressible files you'll see high tape overheads.
thanks for the help! Stephen
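Putting Alan's numbers into the storage daemon configuration looks roughly like this. The figures simply mirror his suggestion rather than a tested optimum, and, as he says, every currently appendable tape has to be marked Used (and the SD restarted) before the change, because bacula cannot mix block sizes on one volume:

  # bacula-sd.conf (per tape Device resource)
  Device {
    Name = SL500-Drive-0
    ...
    Maximum Block Size = 2097152    # 2 MB
    Maximum File Size = 10g
  }

  # then, in bconsole, retire the tapes written with the old block size:
  update volume=VOLNAME volstatus=Used

A larger Maximum File Size mainly reduces the number of file marks (and tape stops) per volume; the block size is what showed up in the btape numbers above.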
Re: [Bacula-users] LTO3 tape capacity (variable?)
On 10/01/2012 03:52 PM, James Harper wrote: Hi, I ran some btape tests today to verify that I'd be improving throughput by changing blocksize from 256KB to 2MB and found that this does indeed appear to be true in terms of increasing compression efficiency, but it doesn't seem to affect incompressible data much, if at all. Still, it seems worth changing and I thank you for pointing me in that direction. More importantly, I realized that my testing 6 months ago was not on all 4 of my drives, but only 2 of them. Today, I discovered one of my drives (untested in the past) is getting 1/2 the throughput for random data writes as the others!! Is it definitely LTO3 and definitely using LTO3 media? LTO2 was about half the speed, including using LTO2 media in an LTO3 drive. James Yes, all 4 drives are HP Ultrium 3 drives. And the same LTO3 bacula volume was used in all 4 testing runs today. All drives are connected via 2Gb fiber. All tests were done independent of each other with no other activity on the backup server during the time of the testing. Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Got visibility? Most devs has no idea what their production app looks like. Find out how fast your code is with AppDynamics Lite. http://ad.doubleclick.net/clk;262219671;13503038;y? http://info.appdynamics.com/FreeJavaPerformanceDownload.html ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] LTO3 tape capacity (variable?)
On 10/1/12 4:06 PM, Alan Brown wrote:
On 01/10/12 23:38, Stephen Thompson wrote:
More importantly, I realized that my testing 6 months ago was not on all 4 of my drives, but only 2 of them. Today, I discovered one of my drives (untested in the past) is getting 1/2 the throughput for random data writes as the others!!
smartctl -a /dev/sg(drive) will tell you a lot
Put a cleaning tape in it

Cleaning tape did not improve results. I see some errors in the counter log on the problem drive, but I see even more errors on another drive which isn't having a throughput problem (specifically SL500 Drive 1 has the lower throughput, but C4 Drive 1 actually has a higher error count).

SL500 Drive 0 (~60MB/s random data throughput)
=
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:         0        0         0         0          0          0.000           0

SL500 Drive 1 (~30MB/s random data throughput)
=
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:     10454        0         0         0     821389          0.000           0

C4 Drive 0 (~60MB/s random data throughput)
==
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          2        0         0         0          2          0.000           0
write:         0        0         0         0          0          0.000           0

C4 Drive 1 (~60MB/s random data throughput)
==
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          2        0         0         0          2          0.000           0
write:     18961        0         0         0      48261          0.000           0

Stephen
Re: [Bacula-users] LTO3 tape capacity (variable?)
Correction, the non-problem drive has a higher ECC fast error count, but the problem drive has a significantly higher Corrective algorithm invocations count. On 10/1/12 5:33 PM, Stephen Thompson wrote: On 10/1/12 4:06 PM, Alan Brown wrote: On 01/10/12 23:38, Stephen Thompson wrote: More importantly, I realized that my testing 6 months ago was not on all 4 of my drives, but only 2 of them. Today, I discovered one of my drives (untested in the past) is getting 1/2 the throughput for random data writes as the others!! smartctl -a /dev/sg(drive) will tell you a lot Put a cleaning tape in it Cleaning tape did not improve results. I see some errors in the counter log on the problem drive, but I see even more errors on another drive which isn't having a throughput problem (specifically SL500 Drive 1 is the lower throughput, but C4 Drive 1 actually has a higher error count). SL500 Drive 0 (~60MB/s random data throughput) = Error counter log: Errors Corrected by Total Correction GigabytesTotal ECC rereads/errors algorithm processeduncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 00 0 0 0 0.000 0 write: 00 0 0 0 0.000 0 SL500 Drive 1 (~30MB/s random data throughput) = Error counter log: Errors Corrected by Total Correction GigabytesTotal ECC rereads/errors algorithm processeduncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 00 0 0 0 0.000 0 write: 104540 0 0 821389 0.000 0 C4 Drive 0 (~60MB/s random data throughput) == Error counter log: Errors Corrected by Total Correction GigabytesTotal ECC rereads/errors algorithm processeduncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 20 0 0 2 0.000 0 write: 00 0 0 0 0.000 0 C4 Drive 1 (~60MB/s random data throughput) == Error counter log: Errors Corrected by Total Correction GigabytesTotal ECC rereads/errors algorithm processeduncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 20 0 0 2 0.000 0 write: 189610 0 0 48261 0.000 0 Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Don't let slow site performance ruin your business. Deploy New Relic APM Deploy New Relic app performance management and know exactly what is happening inside your Ruby, Python, PHP, Java, and .NET app Try New Relic at no cost today and get our sweet Data Nerd shirt too! http://p.sf.net/sfu/newrelic-dev2dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] LTO3 tape capacity (variable?)
Thank you everyone for your help! Oracle replaced the drive and while it's not running with as high a throughput as I would like, it's at least up at the 60MB/s (random data) that my other drives are at, rather than it's previous 30MB/s. I'm still going to experiment with some of the ideas that were tossed out and see if I can't get even better throughput of for bacula. thanks again, Stephen On 10/2/12 2:47 AM, Alan Brown wrote: On 02/10/12 01:35, Stephen Thompson wrote: Correction, the non-problem drive has a higher ECC fast error count, but the problem drive has a significantly higher Corrective algorithm invocations count. What that means is that it rewrote the data, which accounts for the lower throughput. LTO drives read as they write and if there are errors, they write again. If a cleaning tape doesn't work then you need to get the drive looked at/replaced under warranty. On 10/1/12 5:33 PM, Stephen Thompson wrote: On 10/1/12 4:06 PM, Alan Brown wrote: On 01/10/12 23:38, Stephen Thompson wrote: More importantly, I realized that my testing 6 months ago was not on all 4 of my drives, but only 2 of them. Today, I discovered one of my drives (untested in the past) is getting 1/2 the throughput for random data writes as the others!! smartctl -a /dev/sg(drive) will tell you a lot Put a cleaning tape in it Cleaning tape did not improve results. I see some errors in the counter log on the problem drive, but I see even more errors on another drive which isn't having a throughput problem (specifically SL500 Drive 1 is the lower throughput, but C4 Drive 1 actually has a higher error count). SL500 Drive 0 (~60MB/s random data throughput) = Error counter log: Errors Corrected by Total Correction GigabytesTotal ECC rereads/errors algorithm processeduncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 00 0 0 0 0.000 0 write: 00 0 0 0 0.000 0 SL500 Drive 1 (~30MB/s random data throughput) = Error counter log: Errors Corrected by Total Correction GigabytesTotal ECC rereads/errors algorithm processeduncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 00 0 0 0 0.000 0 write: 104540 0 0 821389 0.000 0 C4 Drive 0 (~60MB/s random data throughput) == Error counter log: Errors Corrected by Total Correction GigabytesTotal ECC rereads/errors algorithm processeduncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 20 0 0 2 0.000 0 write: 00 0 0 0 0.000 0 C4 Drive 1 (~60MB/s random data throughput) == Error counter log: Errors Corrected by Total Correction GigabytesTotal ECC rereads/errors algorithm processeduncorrected fast | delayed rewrites corrected invocations [10^9 bytes] errors read: 20 0 0 2 0.000 0 write: 189610 0 0 48261 0.000 0 Stephen -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Don't let slow site performance ruin your business. Deploy New Relic APM Deploy New Relic app performance management and know exactly what is happening inside your Ruby, Python, PHP, Java, and .NET app Try New Relic at no cost today and get our sweet Data Nerd shirt too! http://p.sf.net/sfu/newrelic-dev2dev ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] Is tape filling up too early?
I recently found out that I had a bad tape drive. With the tape in the drive run the following and see if it says there are errors: smartctl -a /dev/nst0 If there are errors, it's wasting tape and hence less capacity. Stephen On 10/17/2012 11:14 AM, Sergio Belkin wrote: Hi folks I'm using LTO3 tapes and are filling up too fast. They have supposedly 800 GB. I know that never reach that capacity, but I am somewhat surprised that is full with only ~ 333 GB!! (lesser than a half) If I issue a list media pool command I get | MediaId | VolumeName | VolStatus | Enabled | VolBytes| VolFiles | VolRetention | Recycle | Slot | InChanger | MediaType | LastWritten | +-+--+---+-+-+--+--+-+--+---+---+-+ | 100 | LUNOCT12LTO3 | Full | 1 | 421,590,177,792 | 431 | 31,536,000 | 0 |0 | 0 | LTO3 | 2012-10-16 08:11:08 | Output of mt -f /dev/nst0 status SCSI 2 tape drive: File number=0, block number=0, partition=0. Tape block size 0 bytes. Density code 0x44 (no translation). Soft error count since last status=0 General status bits on (4101): BOT ONLINE IM_REP_EN The volume was recycled with 'mt -f /dev/nst0 rewind;mt -f /dev/nst0 weof' My storage daemon config is as follow Storage { # definition of myself Name = superbackup-sd SDPort = 9103 # Director's port WorkingDirectory = /var/bacula/working Pid Directory = /var/run Maximum Concurrent Jobs = 20 } Director { Name = superbackup-dir Password = ucuc } Director { Name = superbackup-mon Password = ucuc Monitor = yes } Device { Name = LTO3 Media Type = LTO3 Archive Device = /dev/nst0 #modificar a 1 para usar el DAT4S AutomaticMount = yes; # when device opened, read it AlwaysOpen = yes; RemovableMedia = yes; Maximum Spool Size = 30g Maximum Job Spool Size = 20gb Spool Directory = /var/spool/bacula #Maximum Network Buffer Size = 10240 #Hardware end of medium = No; Fast Forward Space File = yes #TWO EOF = yes } Messages { Name = Standard director = supernoc-dir = all } You have new mail in /var/spool/mail/root Could you suggest me something to improve it? Thanks in advance! -- Stephen Thompson Berkeley Seismological Laboratory step...@seismo.berkeley.edu215 McCone Hall # 4760 404.538.7077 (phone) University of California, Berkeley 510.643.5811 (fax) Berkeley, CA 94720-4760 -- Everyone hates slow websites. So do we. Make your web apps faster with AppDynamics Download AppDynamics Lite for free today: http://p.sf.net/sfu/appdyn_sfd2d_oct ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
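Beyond smartctl, it is worth ruling out a catalog-side limit or accumulated write errors on that particular volume before blaming the media. A short bconsole sketch, using the volume name from the listing above:

  # in bconsole
  llist volume=LUNOCT12LTO3     # check VolErrors, MaxVolBytes and VolStatus
  llist pools                   # check Maximum Volume Bytes / Jobs / Use Duration on the pool

If MaxVolBytes is 0 and VolErrors is 0, then the VolBytes figure in the listing is simply where the drive reported end-of-medium, and data compressibility plus block-size overhead are the remaining suspects.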
[Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
Hello all, I've had the following problem for ages (meaning multiple major revisions of bacula) and I've seen this come up from time to time on the mailing list, but I've never actually seen a resolution (please point me to one if it's been found). background: I run monthly Fulls and nightly Incrementals. I have a 2 drive autochanger dedicated to my Incrementals. I launch something like ~150 Incremental jobs each night. I am configured for 8 concurrent jobs on the Storage Daemon. PROBLEM: The first job(s) grab one of the 2 devices available in the changer (which is set to AutoSelect) and either load a tape, or use a tape from the previous evening. All tapes in the changer are in the same Incremenal-Pool. The second jobs(s) grab the other of the 2 devices available in the changer, but want to use the same tape that's just been mounted (or put into use) on the jobs that got launched first. They will often literal wait the entire evening until 100's of jobs run through on only one device, until that tape is freed up, at which point it is unmounted from the first device and moved to the second. Note, the behaviour seems to be to round-robin my 8 concurrency limit between the 2 available drives, which mean 4 jobs will run, and 4 jobs will block on waiting for the wanted Volume. When the original 4 jobs are completed (not at the same time) additional jobs are launched that keep that wanted Volume in use. LOG: 03-Nov 22:00 DIRECTOR JobId 267433: Start Backup JobId 267433, Job=JOB. 2012-11-03_22.00.00_0403-Nov 22:00 DIRECTOR JobId 267433: Using Device L100-Drive-003-Nov 22:00 DIRECTOR JobId 267433: Sending Accurate information. 03-Nov 22:00 sd_L100_ JobId 267433: 3307 Issuing autochanger unload slot 82, drive 0 command. 03-Nov 22:06 lawson-sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1 (/dev/L100-Drive-1) 03-Nov 22:09 sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1 (/dev/L100-Drive-1) 03-Nov 22:09 sd_L100_ JobId 267433: Warning: mount.c:217 Open device L100-Drive-0 (/dev/L100-Drive-0) Volume IM0108 failed: ERR=dev.c:513 Unable to open device L100-Drive-0 (/dev/L100-Drive-0): ERR=No medium found . . . CONFIGS (partial and seem pretty straight-forward): Schedule { Name = DefaultSchedule Run = Level=Incremental sat-thu at 22:00 Run = Level=Differential fri at 22:00 } JobDefs { Name = DefaultJob Type = Backup Level = Full Schedule = DefaultSchedule Incremental Backup Pool = Incremental-Pool Differential Backup Pool = Incremental-Pool } Pool { Name = Incremental-Pool Pool Type = Backup Storage = L100-changer } Storage { Name = L100-changer Device = L100-changer Media Type = LTO-3 Autochanger = yes Maximum Concurrent Jobs = 8 } Autochanger { Name = L100-changer Device = L100-Drive-0 Device = L100-Drive-1 Changer Device = /dev/L100-changer } Device { Name = L100-Drive-0 Drive Index = 0 Media Type = LTO-3 Archive Device = /dev/L100-Drive-0 AutomaticMount = yes; AlwaysOpen = yes; RemovableMedia = yes; RandomAccess = no; AutoChanger = yes; AutoSelect = yes; } Device { Name = L100-Drive-1 Drive Index = 0 Media Type = LTO-3 Archive Device = /dev/L100-Drive-1 AutomaticMount = yes; AlwaysOpen = yes; RemovableMedia = yes; RandomAccess = no; AutoChanger = yes; AutoSelect = yes; } thanks! 
Stephen
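One knob that comes up in exactly this situation is the Job-level directive Prefer Mounted Volumes. It is not a confirmed fix for the behaviour described above, only the setting most often suggested for it, and the manual cautions that changing it can increase tape mounts and swaps, so treat it as something to test:

  # bacula-dir.conf
  JobDefs {
    Name = DefaultJob
    ...
    Prefer Mounted Volumes = no   # allow a job to reserve an idle drive and a different
                                  # volume instead of queueing behind the mounted one
  }

With the default of yes, concurrent jobs gravitate toward the volume that is already mounted, which matches the symptom of every scheduled job listing the same volume.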
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
On 11/5/12 7:59 AM, John Drescher wrote: I've had the following problem for ages (meaning multiple major revisions of bacula) and I've seen this come up from time to time on the mailing list, but I've never actually seen a resolution (please point me to one if it's been found). background: I run monthly Fulls and nightly Incrementals. I have a 2 drive autochanger dedicated to my Incrementals. I launch something like ~150 Incremental jobs each night. I am configured for 8 concurrent jobs on the Storage Daemon. PROBLEM: The first job(s) grab one of the 2 devices available in the changer (which is set to AutoSelect) and either load a tape, or use a tape from the previous evening. All tapes in the changer are in the same Incremenal-Pool. The second jobs(s) grab the other of the 2 devices available in the changer, but want to use the same tape that's just been mounted (or put into use) on the jobs that got launched first. They will often literal wait the entire evening until 100's of jobs run through on only one device, until that tape is freed up, at which point it is unmounted from the first device and moved to the second. Note, the behaviour seems to be to round-robin my 8 concurrency limit between the 2 available drives, which mean 4 jobs will run, and 4 jobs will block on waiting for the wanted Volume. When the original 4 jobs are completed (not at the same time) additional jobs are launched that keep that wanted Volume in use. LOG: 03-Nov 22:00 DIRECTOR JobId 267433: Start Backup JobId 267433, Job=JOB. 2012-11-03_22.00.00_0403-Nov 22:00 DIRECTOR JobId 267433: Using Device L100-Drive-003-Nov 22:00 DIRECTOR JobId 267433: Sending Accurate information. 03-Nov 22:00 sd_L100_ JobId 267433: 3307 Issuing autochanger unload slot 82, drive 0 command. 03-Nov 22:06 lawson-sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1 (/dev/L100-Drive-1) 03-Nov 22:09 sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1 (/dev/L100-Drive-1) 03-Nov 22:09 sd_L100_ JobId 267433: Warning: mount.c:217 Open device L100-Drive-0 (/dev/L100-Drive-0) Volume IM0108 failed: ERR=dev.c:513 Unable to open device L100-Drive-0 (/dev/L100-Drive-0): ERR=No medium found . . . CONFIGS (partial and seem pretty straight-forward): Schedule { Name = DefaultSchedule Run = Level=Incremental sat-thu at 22:00 Run = Level=Differential fri at 22:00 } JobDefs { Name = DefaultJob Type = Backup Level = Full Schedule = DefaultSchedule Incremental Backup Pool = Incremental-Pool Differential Backup Pool = Incremental-Pool } Pool { Name = Incremental-Pool Pool Type = Backup Storage = L100-changer } Storage { Name = L100-changer Device = L100-changer Media Type = LTO-3 Autochanger = yes Maximum Concurrent Jobs = 8 } Autochanger { Name = L100-changer Device = L100-Drive-0 Device = L100-Drive-1 Changer Device = /dev/L100-changer } Device { Name = L100-Drive-0 Drive Index = 0 Media Type = LTO-3 Archive Device = /dev/L100-Drive-0 AutomaticMount = yes; AlwaysOpen = yes; RemovableMedia = yes; RandomAccess = no; AutoChanger = yes; AutoSelect = yes; } Device { Name = L100-Drive-1 Drive Index = 0 Media Type = LTO-3 Archive Device = /dev/L100-Drive-1 AutomaticMount = yes; AlwaysOpen = yes; RemovableMedia = yes; RandomAccess = no; AutoChanger = yes; AutoSelect = yes; } I do not have a good solution but I know by default bacula does not want to load the same pool into more than 1 storage device at the same time. 
John, I think it's something in the automated logic, because if I launch jobs by hand (same pool across 2 tape devices in the same autochanger) everything works fine. I think it has more to do with the Scheduler assigning the same Volume to all jobs and then not wanting to change that choice if that Volume is in use. If I do a status on the Director, for instance, and see the next day's jobs lined up under Scheduled Jobs, they all have the same Volume listed.

thanks,
Stephen
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
On 11/05/12 08:03, Stephen Thompson wrote:
[snip: original problem description, log, and configs quoted from the messages above]
I think it has more to do with the Scheduler assigning the same Volume to all jobs and then not wanting to change that choice if that Volume is in use.

I also use Accurate backups, which can take a bit before a job gets back to volume/drive assignments, so it might be a race condition: when the blocking jobs start, they still want the same Volume as the jobs that are running, because the running jobs are still setting up their Accurate backup and haven't been solidly assigned that Volume yet. I don't know. It's rather annoying, especially as we attempt to ramp up our backup capacity. Lastly, it doesn't ALWAYS happen, though it does seem to happen more often than not. If I do a status on the Director, for instance, and see the next day's jobs lined up under Scheduled Jobs, they all have the same Volume listed.
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
On 11/05/2012 01:17 PM, Josh Fisher wrote:
On 11/5/2012 11:03 AM, Stephen Thompson wrote:
[snip: original problem description, log, and configs quoted from the messages above]
I think it has more to do with the Scheduler assigning the same Volume to all jobs and then not wanting to change that choice if that Volume is in use.

When both jobs start at the same time and same priority, they see the exact same next available volume for the pool, and so both select the same volume. When they select different drives, it is a problem, since the volume can only be in one drive. When you start the jobs manually, I assume you are starting them at different times. This works because the first job is up and running with the volume loaded before the second job begins its selection process. One way to handle this issue is to have a different Schedule for each job and start the jobs at different times.
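For illustration, a rough sketch of the staggered-Schedule idea Josh describes, written in the style of the configs above (the second schedule name and the 5-minute offset are assumptions, not from the thread):

Schedule {
  Name = DefaultSchedule-A
  Run = Level=Incremental sat-thu at 22:00
  Run = Level=Differential fri at 22:00
}
Schedule {
  Name = DefaultSchedule-B
  Run = Level=Incremental sat-thu at 22:05
  Run = Level=Differential fri at 22:05
}

Half of the jobs would point at one schedule and half at the other, so the second wave only begins volume selection after the first wave already has a volume mounted.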
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
Going to try this out.

Stephen

On 11/05/2012 02:40 PM, Josh Fisher wrote:
On 11/5/2012 4:28 PM, Stephen Thompson wrote:
On 11/05/2012 01:17 PM, Josh Fisher wrote:
On 11/5/2012 11:03 AM, Stephen Thompson wrote:
On 11/5/12 7:59 AM, John Drescher wrote:
[snip: quoted thread repeated from the messages above]
Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1
No such luck. I already have Prefer Mounted Volumes = no set for all jobs. That's apparently not a solution.

Stephen

On 11/5/12 2:57 PM, Stephen Thompson wrote:
Going to try this out.
[snip: quoted thread repeated from the messages above]
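For reference, the directive being discussed lives in the Director's Job (or JobDefs) resource; a minimal sketch using the JobDefs from the configs above (as noted, it did not solve the problem in this thread):

JobDefs {
  Name = DefaultJob
  Type = Backup
  Level = Full
  Schedule = DefaultSchedule
  Prefer Mounted Volumes = no
}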
[Bacula-users] Fwd: Re: wanted on DEVICE-0, is in use by device DEVICE-1
A quick test of this scenario seems to work: leaving Prefer Mounted Volumes = yes (the default) and setting each of the autochanger's drives to allow half of the total concurrency limit. This per-device setting seems to allow multiple drives to use the same Pool. Not very well documented, IMHO.

Stephen

-------- Original Message --------
Return-Path: bob_het...@hotmail.com

Are you using the setting prefer mounted volumes = yes or no? If you had it set to yes, then you'd never use the 2nd tape drive, but if you set it to no, sometimes you'd hit a deadlock. I used to have an environment with more than a hundred daily jobs and would hit a contention issue occasionally. The developers eventually abandoned that code in favor of setting the maximum concurrent jobs per device:
http://www.bacula.org/5.2.x-manuals/en/main/main/New_Features_in_5_0_0.html#SECTION0091

In addition, another problem I hit occasionally would appear after upgrading the OS. If you update your system you may need to rebuild bacula. Before I started rebuilding bacula at the end of system updates I would hit race conditions and process crashes.

Bob
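For reference, a sketch of the per-device setting described above, added to the Device resources in bacula-sd.conf using the device names from the earlier configs (the value of 4 assumes the Storage resource's 8-job limit is split evenly between the two drives):

Device {
  Name = L100-Drive-0
  Archive Device = /dev/L100-Drive-0
  Media Type = LTO-3
  AutoChanger = yes
  AutoSelect = yes
  Maximum Concurrent Jobs = 4   # half of the Storage resource's limit of 8
}
Device {
  Name = L100-Drive-1
  Archive Device = /dev/L100-Drive-1
  Media Type = LTO-3
  AutoChanger = yes
  AutoSelect = yes
  Maximum Concurrent Jobs = 4
}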
Re: [Bacula-users] Migrating from myisam to innodb
Another perspective...

I've personally found that if your memory is limited (my bacula db server has 8GB of RAM), then for a bacula database mysql performs _better_ than postgres. My File table currently has 2,856,394,323 rows. I've seen so many recommendations here and elsewhere about postgres being an obvious choice over mysql, but in real-life practice we've found at our site that mysql gave us better results (even after weeks of tuning postgres).

Our hybrid solution is to run mysql INNODB as the active database, to avoid the table locking that causes all kinds of problems, especially for operator access to bconsole. However, due to the painfully slow dumps from INNODB, we have a slave mysql server running MYISAM that we use for regular ole mysql dumps. In general this works out fairly well for us.

The only unresolved issue we have is that some of the bacula queries can take a while to return. I've tracked it down to the way the db engine responds to the query; the odd thing is that the first time these queries run they are quick, and then the mysql engine changes the recipe (execution plan) it uses to a slower one. I haven't figured out why, or how to keep it running the quick way.

Stephen

On 03/01/2013 03:16 AM, Uwe Schuerkamp wrote:
On Tue, Feb 26, 2013 at 04:23:20PM, Alan Brown wrote:
On 26/02/13 09:42, Uwe Schuerkamp wrote: for the record I'd like to give you some stats from our recent myisam - innodb conversion.
For the sizes you're talking about, I'd recommend:
1: A _lot_ more memory. 100GB or so.
and even more strongly:
2: Postgresql. Mysql is fast and good for small databases, but postgresql scales to large sizes with a lot less pain and suffering. Conversion here was relatively painless.

Hi Alan, list: can you point me to some good conversion guides and esp. utilities? I checked the postgres documentation wiki, but half of the scripts linked there seem to be dead. I tried converting a mysql dump to pg using my2pg.pl, but the poor script ran out of memory 30 minutes into the conversion on the test machine (Centos 6, 8GB RAM ;-)

I'm hoping our File table will get a lot smaller over time now that we've moved away from copy jobs for the time being, so the conversion should also get easier as tape volumes with millions of files on them get recycled and pruned.

All the best, Uwe
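As a rough illustration of the dump side of that hybrid setup (database name, paths, and credentials are assumptions; adapt to your site):

# taken on the MyISAM replica, where table locking during the dump
# does not block the running Director
mysqldump --opt bacula > /backup/bacula-catalog.sql

# dumping directly from the InnoDB master would normally use
# --single-transaction to avoid locking, at the cost of the slower
# InnoDB dump described above
mysqldump --single-transaction bacula > /backup/bacula-catalog.sql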
[Bacula-users] duplicate job storage device bug?
Hey all,

Figured I'd throw this out there before opening a ticket, in case this is already known or I'm just confused.

We use duplicate job control for the following reason: We run nightly Incrementals of _all_ jobs. Then, rather than running Fulls on a cyclic schedule, we run them back-to-back, injecting a few at a time via scripts. Note, we also have two tape libraries (and two SDs), one for Incremental Pools and one for Full Pools. Where duplicate job control comes in is that we want a running Incremental to be canceled if a Full of the same job is launched on any given night, since the Full, in our case, should take precedence and be run immediately.

What we see is that the Full does indeed cancel the running Incremental and then runs itself, HOWEVER the Full job takes on the storage properties (storage device) of the canceled Incremental job rather than using its own settings. The Full job then expects its Full Pool tape to be in the Incremental tape library, which it is not, and the job stalls for operator intervention.

Here are some config snippets:
Maximum Concurrent Jobs = 2
Allow Duplicate Jobs = no
Cancel Lower Level Duplicates = yes
Cancel Running Duplicates = no
Cancel Queued Duplicates = no

Log snippets:

(incremental launches)
03-Aug 04:05 DIRECTOR JobId 316646: Start Backup JobId 316646, Job=CLIENT.2013-08-02_22.01.01_50
03-Aug 04:05 DIRECTOR JobId 316646: Using Device L100-Drive-0 to write.

(full launches and cancels incremental)
03-Aug 06:20 DIRECTOR JobId 316677: Cancelling duplicate JobId=316646.
03-Aug 06:20 DIRECTOR JobId 316677: 2001 Job sutter_5.2013-08-02_22.01.01_50 marked to be canceled.
03-Aug 06:20 DIRECTOR JobId 316677: Cancelling duplicate JobId=316646.
03-Aug 06:20 DIRECTOR JobId 316677: 2901 Job sutter_5.2013-08-02_22.01.01_50 not found.
03-Aug 06:20 DIRECTOR JobId 316677: 3904 Job sutter_5.2013-08-02_22.01.01_50 not found.
03-Aug 08:20 DIRECTOR JobId 316677: Start Backup JobId 316677, Job=sutter_5.2013-08-03_06.20.02_04

(full complains that the volume it tried to load is an incremental tape instead of a full tape)
03-Aug 08:22 DIRECTOR JobId 316677: Using Device L100-Drive-0 to write.
03-Aug 08:22 SD_L100_ JobId 316677: 3304 Issuing autochanger load slot 72, drive 0 command.
03-Aug 08:23 SD_L100_ JobId 316677: 3305 Autochanger load slot 72, drive 0, status is OK.
03-Aug 08:23 SD_L100_ JobId 316677: Warning: Director wanted Volume FB0718. Current Volume IM0097 not acceptable because: 1998 Volume IM0097 catalog status is Full, not in Pool.

NOTE: The Full job launch command was:
run job=sutter_5 level=Full storage=SL500-Drive-1 yes
and yet, apparently due to the duplicate job cancellation, the Full job instead attempted to use storage=L100-Drive-0.

thanks,
Stephen
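For context, the two-library split described above (Incrementals in one library, Fulls in the other) is normally expressed by giving each Pool its own Storage, roughly as sketched below; the SL500 changer resource name is an assumption based on the run command shown:

Pool {
  Name = Incremental-Pool
  Pool Type = Backup
  Storage = L100-changer      # incremental library
}
Pool {
  Name = Full-Pool
  Pool Type = Backup
  Storage = SL500-changer     # full library (name assumed)
}

The report above is that the duplicate-job cancellation appears to override this and leaves the Full job on the canceled Incremental's storage device.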
Re: [Bacula-users] choosing database.
The answer may partly come from how much RAM the system running the database has. I've seen numerous preferences for postgres on this mailing list, but I've personally found that on my 8GB RAM system I get better performance out of mysql. We back up about 130+ hosts: incrementals nightly, differentials weekly, fulls monthly (~40TB).

Stephen

On 9/19/13 8:06 AM, Mauro wrote:
Hello. I'm using bacula on a linux debian system. I have to back up about 30 hosts. I've chosen postgresql as the database. What do you think? Better mysql or postgres?
Re: [Bacula-users] choosing database.
On 09/19/2013 08:51 AM, Mauro wrote:
On 19 September 2013 17:20, Stephen Thompson <step...@seismo.berkeley.edu> wrote:
The answer may partly come from how much RAM the system running the database has. I've seen numerous preferences for postgres on this mailing list, but I've personally found that on my 8GB RAM system I get better performance out of mysql. We back up about 130+ hosts: incrementals nightly, differentials weekly, fulls monthly (~40TB).

In my case the ram is not a problem; the bacula server is in a virtual machine (I'm using xen), and my ram is currently 4G but I can increase it. I have to back up about 30 hosts, four of which have a lot of data to be backed up. One has about 80G of data, multimedia files and other things. I've always used postgres for all my needs, so I thought to use it also for the bacula server.

Given what you're going to back up, I don't think it's really going to matter which database you choose. Pick whichever database you're more familiar with, as that's likely going to be the only difference you'll notice between them.

Also, in this discussion folks don't always immediately bring up retention, as that (along with the number, not size, of the files you back up) is going to determine your database size. Since 90+% of the bacula database is the File table, that's where good or poor performance is going to show itself. We have a 300-400GB File table and get reasonable performance from mysql and 8GB of RAM. We run the innodb engine for bacula itself (less blocking than myisam), and the myisam engine on a slave server for catalog dumps (faster dumps than innodb).

Stephen
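If you want to see where your own catalog stands, a quick check of table sizes and row counts looks roughly like this (assuming a MySQL catalog database named bacula; table_rows is only an estimate for InnoDB):

mysql -e "SELECT table_name, table_rows,
                 ROUND((data_length + index_length)/1024/1024/1024, 1) AS size_gb
          FROM information_schema.tables
          WHERE table_schema = 'bacula'
          ORDER BY data_length + index_length DESC;"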
[Bacula-users] bconsole 7.0.2 storage status issue
Hello,

I want to confirm something new I'm seeing in 7.0.2 with bconsole. I have multiple storage daemons with multiple devices. It used to be (in 5.2.13) that running status and then choosing 2: Storage in bconsole would present a list of storage devices to query. Now it immediately returns only the status of the first device I have configured for my Director. A mount command, in comparison, still presents me with what I am used to -- the list of devices to choose from.

Is this a feature? A bug?

thanks,
Stephen
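As a possible workaround until the behaviour is clarified, the storage resource can be named explicitly on the command line so that no selection list is needed (the resource name here is just an example, not from this message):

*status storage=L100-changer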
Re: [Bacula-users] Bug when canceling a job in bconsole on 7.0.2?
I believe I've seen this unwanted behaviour as well. I cannot test right now, as I have a job running that I can't afford to have accidentally canceled, but this past weekend I attempted to cancel a running Incremental job by number (as I have done successfully many times in the past), and somehow a different Full job that was also running at the time got canceled as well.

Stephen

On 4/28/14 7:15 PM, Bill Arlofski wrote:
Whoops... Clicked send too soon. Just a follow-up. I went ahead and chose #1 in the list to see if it would cancel both jobs. It did:

*can
Select Job(s):
1: JobId=25775 Job=Helpdesk.2014-04-28_20.30.00_52
2: JobId=25776 Job=Postbooks.2014-04-28_20.30.00_53
Choose Job list to cancel (1-2): 1
JobId=25775 Job=Helpdesk.2014-04-28_20.30.00_52
JobId=25776 Job=Postbooks.2014-04-28_20.30.00_53
Confirm cancel of 2 Jobs (yes/no): yes
2001 Job Helpdesk.2014-04-28_20.30.00_52 marked to be canceled.
3000 JobId=25775 Job=Helpdesk.2014-04-28_20.30.00_52 marked to be canceled.
2001 Job Postbooks.2014-04-28_20.30.00_53 marked to be canceled.
3000 JobId=25776 Job=Postbooks.2014-04-28_20.30.00_53 marked to be canceled.
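One way to reduce the ambiguity while this is being sorted out is to cancel by JobId directly rather than answering the selection list (the JobId below is simply taken from the quoted output):

*cancel jobid=25775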
Re: [Bacula-users] Fatal error: askdir.c:340 NULL Volume name. This shouldn't happen!!!
Hello,

I believe this bug is present in version 7.0.3. I just had it happen last night, much like I saw about 2 years ago. I run hundreds of incrementals each night across 2 LTO tape drives, running with a concurrency limit so that jobs start whenever others finish (i.e., I cannot stagger their start times). I'm assuming this is again a race condition, but one that, as an end user, I really cannot work around. So far the problem is not frequent, but it does still appear to be an issue.

thanks,
Stephen

On 02/20/2014 09:30 AM, Kern Sibbald wrote:
Hello Wolfgang,

The drive is allocated first. Your analysis is correct, but obviously something is wrong. I don't think this is happening any more with the Enterprise version, so it will very likely be fixed in the next release, as we will backport (or flowback) to the community version some rather massive changes we have made during the freeze.

If you want to see what is going on a little more, turn on a debug level in the SD of about 100. Likewise you can set a debug level in the SD of, say, 1 or 2; then when you do a status, if Bacula is having difficulties reserving a drive, it will print out more detailed information on what is going on -- this last is most effective if jobs end up waiting because a resource (drive or volume) is not available.

Best regards, Kern

On 02/17/2014 11:54 PM, Wolfgang Denk wrote:
Dear Kern Sibbald,

In message 5301db23.6010...@sibbald.com you wrote:
[Kern] Were you careful to change the actual volume retention period in the catalog entry for the volume? That requires a manual step after changing the conf file. You can check two ways:
[Wolfgang] Yes, I was. "list volumes" shows the new retention period for all volumes.
[Kern] 1. Look at the full output from all the jobs and see if any volumes were recycled while the batch of jobs ran.
[Wolfgang] Not in this run, and not in any of the last 15 or so before that.
[Kern] 2. Do a llist on all the volumes that were used during the period the problem happened and see if they were freshly recycled and that the retention period is set to your new value.
[Wolfgang] The retention period is as expected; no recycling happened.
[Kern] In any case, I will look over your previous emails to see if I see anything that could point to a problem, and I will look at the bug report, but without a test case this is one of those nightmare bugs that take huge resources and time to fix.
[Wolfgang] Hm... I wonder why the DIR allocates two (DRIVE, VOLUME) pairs for two simultaneously running jobs but does not use the volume currently mounted in the respective drive, picking the one in the other drive instead. I would expect that when a job starts, either a volume or a drive is selected first:
- if the drive is selected first, and it has a tape loaded which is in the right pool and in status append, then there should be no need to ask for any other tape.
- if the volume is allocated first, and it is already loaded in a suitable drive, then that drive should be used, and not the other one.

Best regards, Wolfgang Denk
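For reference, Kern's debugging suggestion can be applied from bconsole roughly like this (the storage resource name is an assumption; pick the SD involved in the failure):

*setdebug level=100 storage=L100-changer
(reproduce the problem and check the SD output, then turn debugging back off)
*setdebug level=0 storage=L100-changer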
Re: [Bacula-users] Bug when canceling a job in bconsole on 7.0.2?
I may be able to test at the end of the month. Right now I have continuous jobs running that I'd rather not inadvertently cancel.

Stephen

On 5/22/14 8:37 AM, Bill Arlofski wrote:
On 05/22/14 11:28, Kern Sibbald wrote:
Hello Bill, I have also pushed a patch that may well fix the problem you are having with cancel. I have never been able to reproduce the problem, but I did yet another rewrite of the sellist routine as well as designed a number of tests, none of which ever failed. However, in the process I noticed that the source code that called the sellist methods was using the wrong calling sequence (my own fault). I am pretty sure that is what was causing your problem. In any case, this new code is in the current git public repo and I would appreciate it if you would test it. Best regards, Kern

Hi Kern, I saw that you wrote the above as an add-on to another thread; I am posting it here so that this thread is complete too. I currently don't have time to test this, but perhaps Stephen, who is also seeing this issue, might. I will test it as soon as I have some free time, unless of course Stephen or someone else has confirmed that the patch fixes the issue. Thanks Kern!

Bill
--
Bill Arlofski
Reverse Polarity, LLC
http://www.revpol.com/
-- Not responsible for anything below this line --
Re: [Bacula-users] RESTORE PRUNED FILE (WITH CATALOG BACKUPS)
If you have the flexibility to do this, the simplest way might be to: restore the catalog from tape; shut down bacula; temporarily move aside your up-to-date database and put the restored database in its place (this likely means loading the database from a dump file); do your file restore now that you have a version of the database containing the purged files; then, once the restore is complete, shut down bacula again and move your up-to-date database back into place.

Stephen

On 5/29/14 6:49 AM, david parada wrote:
Thanks John, I am not very confident with BSCAN. Can you tell me an example of how to add files to the catalog again using your way? Kind regards, David
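A rough sketch of that procedure for a MySQL catalog (database name, dump file paths, credentials, and the way you stop/start the daemons are all assumptions; adapt to your site):

# stop the Director (and ideally all Bacula daemons) first
# 1. set the current catalog aside
mysqldump --single-transaction bacula > /root/bacula-current.sql
# 2. load the restored, older catalog dump in its place
mysql -e "DROP DATABASE bacula; CREATE DATABASE bacula;"
mysql bacula < /root/bacula-restored-from-tape.sql
# 3. start Bacula, run the file restore, then stop Bacula again
# 4. put the current catalog back
mysql -e "DROP DATABASE bacula; CREATE DATABASE bacula;"
mysql bacula < /root/bacula-current.sql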
Re: [Bacula-users] RESTORE PRUNED FILE (WITH CATALOG BACKUPS)
I didn't mention this, but of course you would not want to run any other jobs (or really do anything with bacula at all!) while running the old database, beyond the restore of the files; otherwise those changes won't make it into the up-to-date database you ultimately run with.

On 5/29/14 7:21 AM, Stephen Thompson wrote:
[snip: catalog-swap suggestion quoted from the previous message]
Re: [Bacula-users] Bug when canceling a job in bconsole on 7.0.2?
Version 7.0.4 does not appear to have the job-canceling issue I saw in 7.0.2. Yay! ...and thanks.

On 5/22/14 8:37 AM, Bill Arlofski wrote:
[snip: Kern's patch announcement and Bill's reply, quoted from earlier in this thread]
[Bacula-users] issue with setuid/gid on restored files
Sorry if I have not researched this enough before bringing it to the list, but what I'm seeing is very odd; someone else must have run into this before me.

If I restore a setuid or setgid file, the file is restored without the setuid/setgid bit set. However, the directory containing the file (which did not have its setuid/setgid bit set during the backup) winds up with the setuid/setgid bit set. If I restore both the directory and the file, the directory ends up with the proper non-setuid/setgid attributes, but the file once again ends up without the setuid/setgid bit set. I'm assuming the directory has the bit set during an interim stage of the restore, but it is then properly set when its attributes are set during the restore (which must happen after the files it contains).

I can't say authoritatively, but I don't believe this is the way bacula used to behave for me. And to say the least, this is far from acceptable. I discovered this during a bare metal restore, and have loads of issues from no setuid or setgid bits being set on the restored system.

thanks,
Stephen
Re: [Bacula-users] issue with setuid/gid on restored files
I'm running 7.0.4. Here's an example...

(before backup)
# ls -ld /bin
dr-xr-xr-x 2 root root 4096 Jul 22 09:56 /bin
# ls -l /bin/ping
-rwsr-xr-x 1 root root 40760 Sep 17 2013 /bin/ping

(after restore selecting file /bin/ping)
# ls -ld /bin
drwsr-xr-x 2 root root 4096 Jul 22 14:38 bin
# ls -l /bin/ping
-rwxr-xr-x 1 root root 40760 Sep 17 2013 ping

(after restore selecting file /bin/ping and directory /bin)
# ls -ld /bin
dr-xr-xr-x 2 root root 4096 Jul 22 14:38 bin
# ls -l /bin/ping
-rwxr-xr-x 1 root root 40760 Sep 17 2013 ping

In the first restore case, it looks like the dir has user-write permission as well, which isn't right, but perhaps that comes from the umask of the restore, since the directory wasn't part of the restore selection. However, the setuid bit certainly wouldn't be coming from the umask. I'm jumping to the conclusion that whatever sets the setuid bit is messing up and applying it to the parent directory instead of to the file.

Stephen

On 7/22/14 2:58 PM, Stephen Thompson wrote:
[snip: original problem description quoted from the previous message]
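A quick way to sanity-check a restored system for this problem is to list what actually carries setuid/setgid bits and compare it against a known-good host; the find invocation below is generic (on RPM-based systems, rpm's --setperms/--setugids options can also reapply the modes recorded in the package database, but check your distribution's documentation):

# files with setuid or setgid set on the local filesystem
find / -xdev \( -perm -4000 -o -perm -2000 \) -type f -ls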