Re: [Bacula-users] make bat troubles in bacula 3.0.0

2009-04-17 Thread Stephen Thompson

Hello Dirk,

I would greatly appreciate a patch to be able to install bat 3.0.0
on CentOS 5 with the stock Qt 4.2.  I am getting the same errors as JanJaap.

thanks!
Stephen


Dirk Bartley wrote:
 I have done a patch for being able to install on the older Qt.  If you
 want it, let me know.  I'll hunt it down.  The issue is not with the
 programming but the version of designer I was using.  If I open it with an
 older designer and solve a couple of issues, it compiles just fine on
 CentOS.
 
 I'm trying not to keep so up to date lately.
 
 Dirk
 
 
 On Wed, 2009-04-08 at 15:27 -0700, Kelvin Raywood wrote:
 JanJaap Scholing janjaapschol...@hotmail.com:
 I'm trying to install bat.
 ./configure --enable-lockmgr --with-mysql --enable-bat --disable-libtool
 looks ok

 But when I make I see the following error messages:
 [snip]

 I use Debian 4 with qt4 installed (4.2.1-2+etch1), bacula 3.0.0 (latest 
 svn)
 What can I do to solve this problem?
 John Drescher wrote:
 Install Qt 4.3 or greater.

 Ignore that. I was looking at the wrong class in the Qt docs.
 I think the first advice was correct.  I recently had the same problem 
 myself building bacula-2.5.42-b2 on CentOS-5, which includes Qt 4.2.1.

 I grabbed the qt 4.3.4 SRPM from Fedora-7 and did:

   rpmbuild --rebuild --define 'dist .el5' --define 'rhel 5' \
       qt4-4.3.4-14.fc7.src.rpm

   yum localinstall qt4-4.3.4-14.el5.x86_64.rpm \
       qt4-devel-4.3.4-14.el5.x86_64.rpm \
       qt4-x11-4.3.4-14.el5.x86_64.rpm

 I installed qwt-devel from EPEL.

 BAT built and runs fine.

 Kel Raywood

 
 


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



[Bacula-users] Restore only file attributes (permissions, ACL, owner, group...)

2009-04-17 Thread Stephen Thompson
Hello,

I see that a restore attribute only feature has been added with 3.0.0,
but I cannot find any documentation on how to run a restore with this 
feature.  Is this a command line option to restore?

Any help would be greatly appreciated.
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



[Bacula-users] job progress?

2009-04-17 Thread Stephen Thompson


Hello,

Is there any built-in/simple way to determine how far along a job is?
Some kind of progress meter against a job size estimate?

Even knowing how much has been put to tape at a given point would be 
nice.  We have jobs that take more than 24 hours to run.  :S

The best I can see is looking at the JobMedia table and then multiplying 
the number of entries for a job by the file size for our tape media.
Not even sure if that's accurate.
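
(To make that concrete, the catalog query I have in mind is roughly the one
below -- the database name and job id are placeholders, and the multiplier
would be whatever file size the tape device is configured for:)

   JOBID=12345   # the running job's id, e.g. from "status dir"
   mysql -N -e "SELECT COUNT(*) FROM JobMedia WHERE JobId=${JOBID};" bacula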

Anything simpler?

thanks!
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



Re: [Bacula-users] job progress?

2009-04-20 Thread Stephen Thompson

This is what I couldn't seem to find -- the running job bytes to tape 
per jobname.
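
(For the archives: the command Ralf points at below is just the following
from bconsole -- the client name is a placeholder -- and the running job's
byte count shows up in its output:)

   *status client=seismo70-fd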

thanks!
Stephen


Ralf Gross wrote:
 Stephen Thompson wrote:
 Is there any built-in/simple way to determine how far along a job is?
 Some kind of progress meter against a job size estimate?

 Even knowing how much has been put to tape at a given point would be 
 nice.  We have jobs that take more than 24 hours to run.  :S

 The best I can see is looking at the JobMedia table and then multiplying 
 the number of entries for a job by the file size for our tape media.
 Not even sure if that's accurate.

 Anything simpler?
 
 status client= shows you how much data was backed up so far.
 
 Ralf
 


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



Re: [Bacula-users] make bat troubles in bacula 3.0.0

2009-04-20 Thread Stephen Thompson


Dirk,

Thanks, though I'm still getting the error:

g++ -c -m64 -pipe -g -D_REENTRANT -Wall -W  -DQT_GUI_LIB -DQT_CORE_LIB 
-DQT_SHARED -I/usr/lib64/qt4/mkspecs/linux-g++-64 -I. 
-I/usr/lib64/qt4/include/QtCore -I/usr/lib64/qt4/include/QtCore 
-I/usr/lib64/qt4/include/QtGui -I/usr/lib64/qt4/include/QtGui 
-I/usr/lib64/qt4/include -I.. -I. -Iconsole -Irestore -Iselect 
-I../../../../qwt/include -Imoc -Iui -o obj/main.o main.cpp
ui/ui_main.h: In member function 'void Ui_MainForm::setupUi(QMainWindow*)':
ui/ui_main.h:168: error: 'class QGridLayout' has no member named 'setLeftMargin'
ui/ui_main.h:169: error: 'class QGridLayout' has no member named 'setTopMargin'
ui/ui_main.h:170: error: 'class QGridLayout' has no member named 'setRightMargin'
ui/ui_main.h:171: error: 'class QGridLayout' has no member named 'setBottomMargin'
ui/ui_main.h:172: error: 'class QGridLayout' has no member named 'setHorizontalSpacing'
ui/ui_main.h:173: error: 'class QGridLayout' has no member named 'setVerticalSpacing'
ui/ui_main.h:224: error: 'class QGridLayout' has no member named 'setLeftMargin'
ui/ui_main.h:225: error: 'class QGridLayout' has no member named 'setTopMargin'
ui/ui_main.h:226: error: 'class QGridLayout' has no member named 'setRightMargin'
ui/ui_main.h:227: error: 'class QGridLayout' has no member named 'setBottomMargin'
ui/ui_main.h:228: error: 'class QGridLayout' has no member named 'setHorizontalSpacing'
ui/ui_main.h:229: error: 'class QGridLayout' has no member named 'setVerticalSpacing'
ui/ui_main.h:255: error: 'class QGridLayout' has no member named 'setLeftMargin'
ui/ui_main.h:256: error: 'class QGridLayout' has no member named 'setTopMargin'
ui/ui_main.h:257: error: 'class QGridLayout' has no member named 'setRightMargin'
ui/ui_main.h:258: error: 'class QGridLayout' has no member named 'setBottomMargin'
ui/ui_main.h:259: error: 'class QGridLayout' has no member named 'setHorizontalSpacing'
ui/ui_main.h:260: error: 'class QGridLayout' has no member named 'setVerticalSpacing'
ui/ui_main.h:264: error: 'class QHBoxLayout' has no member named 'setLeftMargin'
ui/ui_main.h:265: error: 'class QHBoxLayout' has no member named 'setTopMargin'
ui/ui_main.h:266: error: 'class QHBoxLayout' has no member named 'setRightMargin'
ui/ui_main.h:267: error: 'class QHBoxLayout' has no member named 'setBottomMargin'

make: *** [obj/main.o] Error 1

Also, two of the diffs, on main.ui and prefs.ui, failed to apply.
The .rej files are attached.
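
(For anyone following along: applying a diff like this looks roughly as
below, assuming bat lives under src/qt-console -- the patch file name is
made up, and the -p level depends on how the diff was generated.  Hunks
that fail to apply are what end up in the attached .rej files.)

   cd bacula/src/qt-console
   patch -p0 < bat-qt4.2.diff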

thanks,
Stephen





Dirk Bartley wrote:

This is the diff.

Dirk


On Fri, 2009-04-17 at 09:14 -0700, Stephen Thompson wrote:

Hello Dirk,

I would greatly appreciate a patch to be able to install bat 3.0.0
on CentOS 5 with the stock Qt 4.2.  I am getting the same errors as JanJaap.

thanks!
Stephen


Dirk Bartley wrote:

I have done a patch for being able to install on the older Qt.  If you
want it, let me know.  I'll hunt it down.  The issue is not with the
programming but the version of designer I was using.  If I open it with an
older designer and solve a couple of issues, it compiles just fine on
CentOS.

I'm trying not to keep so up to date lately.

Dirk


On Wed, 2009-04-08 at 15:27 -0700, Kelvin Raywood wrote:

JanJaap Scholing janjaapschol...@hotmail.com:

I'm trying to install bat.
./configure --enable-lockmgr --with-mysql --enable-bat --disable-libtool
looks ok

But when I make I see the following error messages:

[snip]


I use Debian 4 with qt4 installed (4.2.1-2+etch1), bacula 3.0.0 (latest svn)
What can I do to solve this problem?

John Drescher wrote:

Install Qt 4.3 or greater.


Ignore that. I was looking at the wrong class in the Qt docs.
I think the first advice was correct.  I recently had the same problem 
myself building bacula-2.5.42-b2 on CentOS-5, which includes Qt 4.2.1.


I grabbed qt 4.3.4 srpm from Fedora-7 and did

  rpmbuild --rebuild --define 'dist .el5' --define 'rhel 5' \
qt4-4.3.4-14.fc7.src.rpm

  yum localinstall qt4-4.3.4-14.el5.x86_64.rpm \
   qt4-devel-4.3.4-14.el5.x86_64.rpm \
   qt4-x11-4.3.4-14.el5.x86_64.rpm

I installed qwt-devel from EPEL.

BAT built and runs fine.

Kel Raywood




[Bacula-users] Duplicate Job Control?

2009-04-21 Thread Stephen Thompson


Hello,

I was excited to take advantage of the Duplicate Job Control feature in 
3.0.0 but it does not appear to be working.  At first I assumed the 
defaults listed on the New Features documentation page, then later 
explicitly defined them in my jobdefs config:

JobDefs {
   Name = DefaultJob
   ...
   Maximum Concurrent Jobs = 2
   Allow Duplicate Jobs = no
   Allow Higher Duplicates = yes
   Cancel Queued Duplicates = yes
   Cancel Running Duplicates = no
}

I used to set the per-job Maximum Concurrent Jobs to 1, which would not 
cancel duplicate jobs, but would force them to wait for the running job 
to finish.

What I actually want is for the duplicate job (queued) to be canceled,
so that the running job can complete and a redundant job is not then run 
immediately afterwards.  So, I changed the maximum concurrent jobs to 2,
hoping to see the duplicate job control cancel one of the duplicate jobs.

Here is what I actually experienced.  I launched a job, waited for it to 
reach a running state, then launched an identical job (name, level, 
priority, etc).  To my surprise the identical job reached a running 
state as well!  I thought that according to the above config, the queued 
job of the same priority should be canceled rather than moved into a 
running state?

  JobId Level   Name   Status
==
  37802 Increme  seismo70.2009-04-21_08.42.17_05 is running
  37803 Increme  seismo70.2009-04-21_08.42.30_06 is running
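
(For reference, the duplicate is easy to produce: just start the same job
twice from bconsole before the first finishes -- something along these
lines; the exact run options here are assumed, not copied from a session:)

   run job=seismo70 level=Incremental yes
   run job=seismo70 level=Incremental yes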


Anyone have any idea why this might not be working for me?
Am I misunderstanding how this should work?

thanks!
Stephen
berkeley seismology laboratory
--






[Bacula-users] Canceling jobs in 3.0.0 results in Terminated with error

2009-04-22 Thread Stephen Thompson

Hello everyone.

I recently upgraded from 2.4.4 to 3.0.0.  Everything went very smoothly 
and the new version is running at least as well as the previous one.

One peculiar thing I've noticed is that if I cancel a job, rather than 
the job being set to a "Canceled by user" state, it winds up being set 
to "Terminated with error".

Anyone else notice this and/or have ideas why this might be happening?

thanks!
Stephen




Re: [Bacula-users] Canceling jobs in 3.0.0 results in Terminated with error

2009-04-23 Thread Stephen Thompson

RE logs...

Looks like the cancel request is put in, but results in an error.
Note that the message about there being no SL500-changer with type 
LTO-3 in the SD resources seems bogus -- that's the storage device and 
type with which all the non-canceled jobs successfully run.


sample log (server lawson/client agentsmith):

16-Apr 07:41 lawson-sd JobId 315-Apr 20:00 lawson-dir JobId 36627: Start 
Backup JobId 36627, Job=agentsmith.2009-04-15_20.00.00_23
16-Apr 08:31 lawson-sd JobId 36627: Job 
agentsmith.2009-04-15_20.00.00_23 marked to be canceled.
16-Apr 08:31 lawson-sd JobId 36627: Failed command: Jmsg 
Job=agentsmith.2009-04-15_20.00.00_23 type=6 level=1239895886 lawson-sd 
JobId 36627: Job agentsmith.2009-04-15_20.00.00_23 marked to be canceled.

16-Apr 08:31 lawson-sd JobId 36627: Fatal error:
  Device SL500-changer with MediaType LTO-3 requested by DIR not 
found in SD Device resources.
16-Apr 08:31 lawson-dir JobId 36627: Fatal error:
  Storage daemon didn't accept Device SL500-changer because:
  3924 Device SL500-changer not in SD Device resources.
16-Apr 08:31 lawson-dir JobId 36627: Error: Bacula lawson-dir 3.0.0 
(06Apr09): 16-Apr-2009 08:31:58
   Build OS:   i386-pc-solaris2.10 solaris 5.10
   JobId:  36627
   Job:agentsmith.2009-04-15_20.00.00_23
   Backup Level:   Incremental, since=2009-04-14 20:01:47
   Client: agentsmith-fd 2.4.2 (26Jul08) 
x86_64-unknown-linux-gnu,redhat,Enterprise release
   FileSet:agentsmith-fs 2008-08-15 12:09:36
   Pool:   Incremental-Pool (From Run pool override)
   Catalog:MyCatalog (From Client resource)
   Storage:SL500-changer (From Job resource)
   Scheduled time: 15-Apr-2009 20:00:00
   Start time: 15-Apr-2009 20:00:00
   End time:   16-Apr-2009 08:31:58
   Elapsed time:   12 hours 31 mins 58 secs
   Priority:   10
   FD Files Written:   0
   SD Files Written:   0
   FD Bytes Written:   0 (0 B)
   SD Bytes Written:   0 (0 B)
   Rate:   0.0 KB/s
   Software Compression:   None
   VSS:no
   Encryption: no
   Accurate:   no
   Volume name(s):
   Volume Session Id:  7
   Volume Session Time:1239837527
   Last Volume Bytes:  1 (1 B)
   Non-fatal FD errors:0
   SD Errors:  0
   FD termination status:
   SD termination status:
   Termination:*** Backup Error ***

snippet from SD resources file:

.
.
.
Autochanger {
   Name = SL500-changer
   Device = SL500-Drive-0
   Device = SL500-Drive-1
   Changer Command = /opt/bacula/scripts/mtx-changer %c %o %S %a %d
   Changer Device = /dev/changer
}
Device {
   Name = SL500-Drive-0
   Drive Index = 0
   Media Type = LTO-3
   Archive Device = /dev/rmt/0cbn
   AutomaticMount = yes;   # when device opened, read it
   AlwaysOpen = yes;
   RemovableMedia = yes;
   RandomAccess = no;
   AutoChanger = yes;
   AutoSelect = yes;
   Maximum block size = 262144   # 256kb
   Alert Command = sh -c 'tapeinfo -f %c |grep TapeAlert|cat'
   Maximum Spool Size = 140gb
   Maximum Job Spool Size = 50gb
   Spool Directory = /bacula/spool
}
.
.
.
















Martin Simmons wrote:
 On Wed, 22 Apr 2009 16:07:49 -0700, Stephen Thompson said:
 I recently upgraded from 2.4.4 to 3.0.0.  Everything went very smoothly 
 and the new version is running at least as well as the previous one.

 One peculiar thing I've noticed is that if I cancel a job, rather than 
 the job being set to a "Canceled by user" state, it winds up being set 
 to "Terminated with error".

 Anyone else notice this and/or have ideas why this might be happening?
 
 What does the log output show?
 
 __Martin
 


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760


Re: [Bacula-users] duplicate jobs in 3.0

2009-05-04 Thread Stephen Thompson


That's the behaviour I've seen when I set   Maximum Concurrent Jobs = 1
under JobDefs.  Then only one job with the same name can run at a time.
What I was hoping for with duplicate job control was for the subsequent 
job(s) to be canceled so that they wouldn't run at all.
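
(To spell out the two setups being compared -- both are sketches; the
duplicate-control directives are the ones from the JobDefs I posted earlier:)

JobDefs {
   Name = DefaultJob
   ...
   Maximum Concurrent Jobs = 1        # duplicates queue up and wait
}

JobDefs {
   Name = DefaultJob
   ...
   Maximum Concurrent Jobs = 2
   Allow Duplicate Jobs = no          # what I hoped would cancel the
   Cancel Queued Duplicates = yes     # queued duplicate outright
}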

thanks,
Stephen



Silver Salonen wrote:
 Hello.
 
 I noticed one thing today... a big full backup was run on Friday, so it wasn't 
 completed 24 hours later, but when the next job's time arrived, it wasn't run. 
 I was very surprised, because I expected it to run, as has been the case 
 without "allow duplicate jobs = no" with Bacula 2.x. When the full job 
 completed, the scheduled (and not run) one started immediately and was 
 correctly making an incremental backup.

 So it seems that duplicate job control does work, just not the way I expected, 
 i.e. I expected it to be cancelled (I guess I thought it's the Cancel 
 Queued Duplicates directive, but now I guess not) instead of being hidden 
 and waiting somewhere back there.
 


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



Re: [Bacula-users] How to prevent to schedule a job if the same job is still running

2009-08-31 Thread Stephen Thompson


Note this thread as well; it's possible the documentation is wrong (the 
source code seems to indicate so):

Dear All,
Bacula (3.0.2) is configured to make daily backups of some systems. Full
backups unfortunately take more then one day to complete and I want to
avoid that duplicate jobs start (or are queued) before the full backup
is completed.

No duplicate job control directives are configured. If I understand the
manual correctly (perhaps it's an interpretation error of me)
http://www.bacula.org/en/dev-manual/New_Features.html#SECTION00310
this should not happen.

I had a quick look in the source code and found this code in src/dird/job.c:

bool allow_duplicate_job(JCR *jcr)
{
   JOB *job = jcr->job;
   JCR *djcr;                /* possible duplicate */

   if (job->AllowDuplicateJobs) {
      return true;
   }
   if (!job->AllowHigherDuplicates) {
      /* ... code related to Cancel Queued Duplicates and
             Cancel Running Duplicates here ... */
   }
   return true;
}

Apparently Cancel Queued Duplicates and Cancel Running Duplicates
are only evaluated when Allow Higher Duplicates is set to no, which is not
the default. Is this an error in the documentation or the code, or am I not
correctly understanding the manual or code?

Kind regards,
Bram Vandoren.







Silver Salonen wrote:
 On Monday 31 August 2009 09:33:22 Ralf Gross wrote:
 Silver Salonen wrote:
 On Sunday 30 August 2009 13:58:44 Ralf Gross wrote:
 Martina Mohrstein wrote:
 So my question is how could I prevent the schedule of a job when the
 same job is already running?
 Maybe the new Duplicate Job Control feature in 3.0.x helps to prevent
 this?

 http://www.bacula.org/en/dev-manual/New_Features.html#515

 - Allow Duplicate Jobs
 - Allow Higher Duplicates
 - Cancel Queued Duplicates
 - Cancel Running Duplicates

 I still haven't seen it working as it should (even in 3.0.2), but yes, one day 
 it will, and it'll be the most correct and easiest way to achieve this.
 Did anyone file a bug report about that? I searched the bug database,
 but couldn't find a report about that. At least no open bug.

 Ralf
 
 Hmm, I guess not then. But it has been reported several times in the list. 
 So, any volunteers? :)
 


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



Re: [Bacula-users] How to prevent to schedule a job if the same job is still running

2009-09-14 Thread Stephen Thompson

Hello,

This works as reported for me as well, however, what I want to have in 
the first case is for the originally scheduled job to be canceled, not 
the duplicate.  The reason being that my incrementals fall into a daily 
schedule, whereas my fulls are scheduled out-of-band, so I want the 
incremental to be canceled on the day that a full is scheduled.

Given what you all say below, this doesn't seem possible with bacula's 
Duplicate Job Control.  Correct?

thanks!
Stephen


Silver Salonen wrote:
 On Monday 14 September 2009 15:59:24 Bram Vandoren wrote:
 Hi All,

 Silver Salonen wrote:
 Hmm, I guess not then. But it has been reported several times in the
 list. So, any volunteers? :)
 This configuration:
   Allow Higher Duplicates = no
   Cancel Queued Duplicates = yes

 Seems to work fine in my situation (some more testing is needed). It
 cancels the newly created duplicate job immediately.

 This configuration:
   Allow Higher Duplicates = no
   Cancel Running Duplicates = yes
 cancels the running job and starts the same one. If you have a job that
 takes more than 24h to complete a runs daily, it will never finish.

 Hope it helps.

 When I find some try I will reopen the bug report.

 Cheers,
 Bram.
 
 I'll try that on my servers with a few hundred jobs and report about it 
 tomorrow :)
 
 But as default options are Cancel Queued Duplicates = yes and Cancel 
 Running Duplicates = no, the only needed option seems to be Allow Higher 
 Duplicates = no. I myself had set Allow Duplicate Jobs = no, because I 
 thought it includes  Allow Higher Duplicates = no too. Whether the one 
 option helps or not, I'll tell tomorrow.
 


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



Re: [Bacula-users] How to prevent to schedule a job if the same job is still running

2009-09-15 Thread Stephen Thompson
Silver Salonen wrote:
 Actually, you can do it - Allow Higher Duplicates really means ANY duplicate 
 job, not only a higher one. I just tested it and an incremental job is 
 cancelled if either a full or an incremental instance of the same job is still 
 running.
 
 So in my case Allow Higher Duplicates did the trick :)
 

Really?

This is exactly what I want and what I tried for when 3.x was first 
released, but my experiments showed that nothing was canceled.  The jobs 
rather began running concurrently.

I'll try this again.  Are you saying to set Allow Higher Duplicates to 
Yes or No?  Actually, could you possibly list what you have all the 
relevant values set to?  I would most appreciate it.

thanks,
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



Re: [Bacula-users] How to prevent to schedule a job if the same job is still running

2009-09-15 Thread Stephen Thompson


Ah, thanks for the info, but this still is not the behavior that I am 
looking for.  This does indeed cancel incrementals if a full is already 
running (actually even if a full is merely scheduled), but it goes both 
ways, it also cancels my fulls if an incremental is already running or 
scheduled.  It's the scheduled part that causes me problems.  I have 
incrementals scheduled to run every day.  I then interject full jobs 
each day based on a script that determines which of my hosts are 
available for a full that day.

This configuration immediately cancels my fulls, rather than letting 
them run and then canceling the corresponding incrementals when they are 
actually launched.  This might work out if all jobs have static (i.e. 
based on configuration files) schedules, but rather than controlling 
duplicates, it seems better at preventing administrator intervention, 
which is frustrating.

I recognize I might have a unique situation (dynamically scheduling 
fulls based on availability rather than a regular calendar cycle) which 
is fine; I'll probably have to pull my incremental scheduling out of 
bacula and cron the injection of jobs via a script.  But to me, there is 
still a design issue with considering a scheduled job to be in duplicate 
conflict with a running job; it seems like it would make more sense to 
only apply that logic in the running queue (whether actually running or 
waiting for resources).  Then Cancel Queued Duplicates would cancel 
any job that attempted to enter the running state if another job was 
already running.  As it is now, it appears to cancel any job entering 
the running state even if another job is merely scheduled to run at some 
point in the future.  Cancellations should happen on conflict, not on 
suspicion that conflict might arise in the future.
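
(By "cron the injection of jobs via a script" I mean nothing fancier than
the sketch below -- the job name, schedule, and bconsole path are all
placeholders, and the availability check would live in the script itself:)

   # crontab entry: submit the daily incremental from outside bacula's scheduler
   0 20 * * * echo "run job=seismo70 level=Incremental yes" | /opt/bacula/bin/bconsole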

But perhaps that's being too philosophical.  :)

Stephen





Silver Salonen wrote:
 On Tuesday 15 September 2009 17:36:25 Stephen Thompson wrote:
 Silver Salonen wrote:
 Actually, you can do it - Allow Higher Duplicates really means ANY
 duplicate job, not only a higher one. I just tested it and an incremental
 job is cancelled if either full or incremental instance of the same job
 is still running.

 So in my case Allow Higher Duplicates did the trick :)
 Really?

 This is exactly what I want and what I tried for when 3.x was first
 released, but my experiments showed that nothing was canceled.  The jobs
 rather began running concurrently.

 I'll try this again.  Are you saying to set Allow Higher Duplicates to
 Yes or No?  Actually, could you possibly list what you have all the
 relevant values set to?  I would most appreciate it.

 thanks,
 Stephen
 
 Yeah, I was positively surprised today too :)
 
 I have just one option in every JobDefs for that:
 Allow Higher Duplicates = no
 


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



Re: [Bacula-users] bacula 5.0.1 and db issues - please, share your experience

2010-04-14 Thread Stephen Thompson


I mostly use bat for restores (i.e. building restore tree).

I did nothing with my tables regarding indexing.  I have whatever the 
bacula scripts create by default.

In regard to tuning, I did play with changing the join and sort buffer 
sizes.  I found a 'slight' increase in performance.  By 'slight', I mean 
something like 4.5 vs. 4.6 minutes for the same restore.
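
(Concretely, the kind of my.cnf settings I was playing with -- the values
here are only examples, not a recommendation:)

   [mysqld]
   join_buffer_size = 8M
   sort_buffer_size = 8M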

Stephen




On 4/14/10 6:29 AM, Koldo Santisteban wrote:
 Thanks for your answers.
 Stephen, do you use bconsole or bat? Perhaps the issue is only in bat. I
 recognize that I only use bweb and bat (on Windows).
 Regarding your comments, Stephen, my bacula server is smaller than yours,
 but my catalog size was 400 Mb (Bacula has been working for 2 months).
 I didn't tune my database, but with the 3.0.3 version that wasn't necessary.
 Which parameters are recommended to tune? This info, I think, is very
 useful for people with the same issue as me... I see that some people
 say that the better way is creating new indexes (someone says that this
 is the worst option), others say to customize the mysql parameters... but I
 can't find any official info, and, at least in my case, I don't have
 enough time (and knowledge) to test bacula with some new indexes, or
 to customize mysql/postgres... I miss this official info...
 Regards

 On Tue, Apr 13, 2010 at 7:36 PM, Thomas Mueller tho...@chaschperli.ch wrote:

 On Tue, 13 Apr 2010 15:59:25 +0200, Koldo Santisteban wrote:

   thanks for your answer.
   The first stage was with mysql 5.0.77, and it worked with bacula 3.0.3 without
   problems. I have used the same database and server with bacula 5.0.1.
   The bacula server + DB box has 3.5 Gb RAM and a Xeon processor. I have
   tested my environment by installing postgres on the same server with an
   empty db. I create a full bacula server backup and then try to
   restore. I have found that the restore process works fine using bweb and
   bacula 5.0.1; what is the difference between bat and bweb?

 noticed too, bat takes forever on building trees on restores.
 bconsole is
 _much_ faster.

 - Thomas


 





-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



[Bacula-users] 5.0.1 infinite email loop bug??

2010-04-15 Thread Stephen Thompson

Hello,

I have just now experienced a possible new bug with bacula 5.0.1.

The symptoms are this:

bacula-sd crashes
bacula-dir continues to run
bacula-dir then spews out identical Intervention needed emails until 
manually restarted

The first time this happened over a weekend and upon returning I found 
my inbox has about 120,000 bacula emails, all the SAME and of this type:

15-Apr 10:02 client-fd JobId 11: Fatal error: backup.c:1048 Network 
send error to SD. ERR=Broken pipe

It happened again just now (second time since upgrading from 3.0.3 to 
5.0.1) and I managed to stop the director with only a few thousand 
emails going out.

So there are really 2 issues here:

1)
Why does the director apparently get stuck in an infinite loop of 
sending the same email message?  Is this a known bug?

2)
Regarding the SD, I received one alert of this type, the rest like the 
above:

  15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT: dev->blocked()

A traceback like:
--
ptrace: Operation not permitted.
/var/bacula/work/29091: No such file or directory.
$1 = 0
/opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command file:
No symbol exename in current context.
--

And a backtrace like:
--
Attempt to dump current JCRs
JCR=0x19a24888 JobId=10 name=client_1.2010-04-14_18.02.33_41 JobStatus=l
 use_count=1
 JobType=B JobLevel=F
 sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35
 end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
 db=(nil) db_batch=(nil) batch_started=0
JCR=0x1981b248 JobId=11 name=client_10.2010-04-14_20.00.15_04 
JobStatus=R
 use_count=1
 JobType=B JobLevel=I
 sched_time=15-Apr-2010 09:15 start_time=15-Apr-2010 09:15
 end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
 db=(nil) db_batch=(nil) batch_started=0
Attempt to dump plugins. Hook count=0
--

Both clients and server seem healthy, except for the SD crash.
Any ideas?


thanks!
Stephen


-
Further info:

My catalog...

 mysql-5.0.77 (64bit) MyISAM
 210Gb in size
 1,412,297,215 records in File table
 note: database built with bacula 2x scripts,
 upgraded with 3x scripts, then again with 5x scripts
 (i.e. nothing customized along the way)

My OS  hardware for bacula DIR+SD server...

 Centos 5.4 (fully patched)
 8Gb RAM
 2Gb Swap
 1Tb EXT3 filesystem on external fiber RAID5 array
 (dedicated to database, incl. temp files)
 2 dual-core [AMD Opteron(tm) Processor 2220] CPUs
 StorageTek SL500 Library with 2 LTO3 Drives







Re: [Bacula-users] [Bacula-devel] 5.0.1 infinite email loop bug??

2010-04-15 Thread Stephen Thompson


Additionally, seems like the SD was possibly reading a new 
freshly-labeled tape when it crashed...  Last items in bacula log 
besides alerts already mentioned:


15-Apr 09:31 server-sd JobId 10: Writing spooled data to Volume. 
Despooling 35,000,185,219 bytes ...
15-Apr 09:51 server-sd JobId 10: End of Volume FB0568 at 888:1414 
on device SL500-Drive-1 (/dev/nst0). Write of 262144 bytes got -1.
15-Apr 09:51 server-sd JobId 10: Re-read of last block succeeded.
15-Apr 09:51 server-sd JobId 10: End of medium on Volume FB0568 
Bytes=887,261,470,720 Blocks=3,384,635 at 15-Apr-2010 09:51.
15-Apr 09:51 server-sd JobId 10: 3307 Issuing autochanger unload 
slot 38, drive 1 command.
15-Apr 09:52 server-sd JobId 10: 3301 Issuing autochanger loaded? 
drive 1 command.
15-Apr 09:52 server-sd JobId 10: 3302 Autochanger loaded? drive 1, 
result: nothing loaded.
15-Apr 09:52 server-sd JobId 10: 3304 Issuing autochanger load slot 
39, drive 1 command.
15-Apr 09:52 server-sd JobId 10: 3305 Autochanger load slot 39, 
drive 1, status is OK.
15-Apr 09:52 server-sd JobId 10: Volume FB0569 previously written, 
moving to end of data.

Nothing but thousands of 'repetitive' alerts after that...

thanks again,
Stephen



On 04/15/2010 10:25 AM, Stephen Thompson wrote:

 Hello,

 I have just now experienced a possible new bug with bacula 5.0.1.

 The symptoms are this:

 bacula-sd crashes
 bacula-dir continues to run
 bacula-dir then spews out identical Intervention needed emails until
 manually restarted

 The first time this happened over a weekend and upon returning I found
 my inbox has about 120,000 bacula emails, all the SAME and of this type:

 15-Apr 10:02 client-fd JobId 11: Fatal error: backup.c:1048 Network
 send error to SD. ERR=Broken pipe

 It happened again just now (second time since upgrading from 3.0.3 to
 5.0.1) and I managed to stop the director with only a few thousand
 emails going out.

 So there are really 2 issues here:

 1)
 Why does the director apparently get stuck in an infinite loop of
 sending the same email message?  Is this a known bug?

 2)
 Regarding the SD, I received one alert of this type, the rest like the
 above:

 15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT: dev->blocked()

 A traceback like:
 --
 ptrace: Operation not permitted.
 /var/bacula/work/29091: No such file or directory.
 $1 = 0
 /opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command file:
 No symbol exename in current context.
 --

 And a backtrace like:
 --
 Attempt to dump current JCRs
 JCR=0x19a24888 JobId=10 name=client_1.2010-04-14_18.02.33_41 JobStatus=l
   use_count=1
   JobType=B JobLevel=F
   sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35
   end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
   db=(nil) db_batch=(nil) batch_started=0
 JCR=0x1981b248 JobId=11 name=client_10.2010-04-14_20.00.15_04
 JobStatus=R
   use_count=1
   JobType=B JobLevel=I
   sched_time=15-Apr-2010 09:15 start_time=15-Apr-2010 09:15
   end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
   db=(nil) db_batch=(nil) batch_started=0
 Attempt to dump plugins. Hook count=0
 --

 Both clients and server seem healthy, except for the SD crash.
 Any ideas?


 thanks!
 Stephen


 -
 Further info:

 My catalog...

   mysql-5.0.77 (64bit) MyISAM
   210Gb in size
   1,412,297,215 records in File table
   note: database built with bacula 2x scripts,
   upgraded with 3x scripts, then again with 5x scripts
   (i.e. nothing customized along the way)

 My OS  hardware for bacula DIR+SD server...

   Centos 5.4 (fully patched)
   8Gb RAM
   2Gb Swap
   1Tb EXT3 filesystem on external fiber RAID5 array
   (dedicated to database, incl. temp files)
   2 dual-core [AMD Opteron(tm) Processor 2220] CPUs
   StorageTek SL500 Library with 2 LTO3 Drives







-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760


Re: [Bacula-users] [Bacula-devel] 5.0.1 infinite email loop bug??

2010-04-15 Thread Stephen Thompson

Hello,

Thanks for the response.

No, it's nothing to do with mail configuration; 100% sure of that.
(I know people say that all the time, but, seriously, it's the director).

And by alerts, I do mean Messages in the bacula vernacular.

The first time this crash happened, we received 120,000 Messages in the 
form of emails to our administrative account.  The messages were 
identical both to each other and to the content of the $JOB.mail file in 
our bacula working directory (which is never removed automatically after 
one of these crashes - perhaps that causes the endless cycle).  The same 
Message also appears to be written to our bacula log file each time an 
email is generated (or vice versa).
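
(The leftovers are easy to spot after one of these crashes -- the working
directory below is ours; yours will differ:)

   ls -l /var/bacula/work/*.mail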

It seems to me like it's possible for the director to get stuck in a 
loop and send the contents of that mail file again and again, 
infinitely.  Both times we've had the SD crash (both have happened since 
upgrading to 5.0.1), the only thing that stopped the Message generation 
was stopping the director itself.

Of course, that's the annoying symptom.  The more serious problem is 
the crash of our SD.  Any pointers to getting ptrace working with the 
automatic scripts?

thanks!
Stephen






On 04/15/2010 12:40 PM, Kern Sibbald wrote:
 On Thursday 15 April 2010 19:36:51 Stephen Thompson wrote:
 Additionally, seems like the SD was possibly reading a new
 freshly-labeled tape when it crashed...  Last items in bacula log
 besides alerts already mentioned:

 In Bacula alerts refer to tape drive information stored concerning tape
 problems, so I am assuming you mean messages.



 15-Apr 09:31 server-sd JobId 10: Writing spooled data to Volume.
 Despooling 35,000,185,219 bytes ...
 15-Apr 09:51 server-sd JobId 10: End of Volume FB0568 at 888:1414
 on device SL500-Drive-1 (/dev/nst0). Write of 262144 bytes got -1.
 15-Apr 09:51 server-sd JobId 10: Re-read of last block succeeded.
 15-Apr 09:51 server-sd JobId 10: End of medium on Volume FB0568
 Bytes=887,261,470,720 Blocks=3,384,635 at 15-Apr-2010 09:51.
 15-Apr 09:51 server-sd JobId 10: 3307 Issuing autochanger unload
 slot 38, drive 1 command.
 15-Apr 09:52 server-sd JobId 10: 3301 Issuing autochanger loaded?
 drive 1 command.
 15-Apr 09:52 server-sd JobId 10: 3302 Autochanger loaded? drive 1,
 result: nothing loaded.
 15-Apr 09:52 server-sd JobId 10: 3304 Issuing autochanger load slot
 39, drive 1 command.
 15-Apr 09:52 server-sd JobId 10: 3305 Autochanger load slot 39,
 drive 1, status is OK.
 15-Apr 09:52 server-sd JobId 10: Volume FB0569 previously written,
 moving to end of data.

 Nothing but thousands of 'repetitive' alerts after that...

 What exactly is repeated?

 There was a Bacula bug #1480 in message delivery that may be the same that you
 are experiencing, it was triggered by a misconfigured SMTP server or by a
 reference in Bacula to a non-existent SMTP server  - and the simple solution
 is to make sure Bacula points to a valid functional SMTP server.  This
 problem was not particular to version 5.0.1, but I think it was fixed after
 the release of 5.0.1.  Please see the bugs database for more details.

 Kern


 thanks again,
 Stephen

 On 04/15/2010 10:25 AM, Stephen Thompson wrote:
 Hello,

 I have just now experienced a possible new bug with bacula 5.0.1.

 The symptoms are this:

 bacula-sd crashes
 bacula-dir continues to run
 bacula-dir then spews out identical Intervention needed emails until
 manually restarted

 The first time this happened over a weekend and upon returning I found
 my inbox has about 120,000 bacula emails, all the SAME and of this type:

 15-Apr 10:02 client-fd JobId 11: Fatal error: backup.c:1048 Network
 send error to SD. ERR=Broken pipe

 It happened again just now (second time since upgrading from 3.0.3 to
 5.0.1) and I managed to stop the director with only a few thousand
 emails going out.

 So there are really 2 issues here:

 1)
 Why does the director apparently get stuck in an infinite loop of
 sending the same email message?  Is this a known bug?

 2)
 Regarding the SD, I received one alert of this type, the rest like the
 above:

 15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT: dev->blocked()

 A traceback like:
 --
 ptrace: Operation not permitted.
 /var/bacula/work/29091: No such file or directory.
 $1 = 0
 /opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command
 file: No symbol exename in current context.
 --

 And a backtrace like:
 --
 Attempt to dump current JCRs
 JCR=0x19a24888 JobId=10 name=client_1.2010-04-14_18.02.33_41
 JobStatus=l use_count=1
JobType=B JobLevel=F
sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35
end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
db=(nil) db_batch=(nil) batch_started=0
 JCR=0x1981b248 JobId=11 name=client_10.2010-04-14_20.00.15_04
 JobStatus=R
use_count=1
JobType=B JobLevel=I
sched_time=15-Apr

[Bacula-users] Warning about setting File/Job Retentions in Pool resource!

2010-04-26 Thread Stephen Thompson



My, possibly mistaken, understanding of having File/Job Retention 
directives in a Pool resource was to be able to deviate from File/Job 
Retentions set by the Client resource AND to confine those retentions to 
the Pool where they are specified.

What actually happens is that when using the Pool where the File/Job 
Retentions are specified, the retentions will apply to any File/Job's 
that were written to another Pool, overriding the Client resource.

Real life example:

The Job Retention for all my clients defaults to 1 year and I have 
monthly full Pools that I keep for a year.  I also have an 
incremental/differential pool that I recycle on a 60-90 day basis.

When I set the File/Job Retention to 90 days for my 
incremental/differential Pool and ran a complete set of incrementals, 
the 90 day retention was then applied to all of those jobs, not just for 
the incremental/differential Pool where the 90 day period was set, but 
for all of my monthly full Pools as well!  This effectively purged 9 
months of my Catalog records.  :(
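
(For clarity, the kind of Pool definition I mean is sketched below -- this
is not my actual config, just the shape of it:)

Pool {
   Name = Incremental-Pool
   Pool Type = Backup
   ...
   File Retention = 90 days   # these two are what ended up overriding
   Job Retention = 90 days    # the Client resource across all Pools
}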

Yes, I had a backup of the Catalog and yet it took 12 hours to restore.

But, please note that it can be dangerous to use File/Job retentions in 
a Pool resource.

thanks,
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



Re: [Bacula-users] Warning about setting File/Job Retentions in Pool resource!

2010-04-26 Thread Stephen Thompson


For more clarity:
What actually happens is that when writing to the Pool where the 
File/Job Retentions are specified, the retentions will apply to any 
File/Job's that were ALSO written to another Pool, thus overriding the 
Client resource regardless of Pool.



On 04/26/2010 11:52 AM, Stephen Thompson wrote:



 My, possibly mistaken, understanding of having File/Job Retention
 directives in a Pool resource was to be able to deviate from File/Job
 Retentions set by the Client resource AND to confine those retentions to
 the Pool where they are specified.

 What actually happens is that when using the Pool where the File/Job
 Retentions are specified, the retentions will apply to any File/Job's
 that were written to another Pool, overriding the Client resource.

 Real life example:

 The Job Retention for all my clients defaults to 1 year and I have
 monthly full Pools that I keep for a year.  I also have an
 incremental/differential pool that I recycle on a 60-90 day basis.

 When I set the File/Job Retention to 90 days for my
 incremental/differential Pool and ran a complete set of incrementals,
 the 90 day retention was then applied to all of those jobs, not just for
 the incremental/differential Pool where the 90 day period was set, but
 for all of my monthly full Pools as well!  This effectively purged 9
 months of my Catalog records.  :(

 Yes, I had a backup of the Catalog and yet it took 12 hours to restore.

 But, please note that it can be dangerous to use File/Job retentions in
 a Pool resource.

 thanks,
 Stephen


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu    215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



[Bacula-users] 5.0.2 status of Restore jobs bytes read

2010-05-10 Thread Stephen Thompson


Hello all,

I can't say this with certainty, but I believe I've been experiencing 
a new oddity with bacula ever since I upgraded from 3.X to 5.X.

After launching a restore job, the status of the job via bconsole is 
always "is waiting on Storage Device".  This continues long past 
the loading and forwarding of tapes, until job completion.  I could swear 
that there used to be more granular status messages, if only that it 
switched to "running" when not sending mtx commands.

Am I mistaken?

Also, a status of the Storage Daemon does not display bytes out for a 
Restore job the way it does bytes in for a Backup job.  I don't know if 
this is how it's always been or not, but it seems like a "bytes read back 
out" figure would be pretty handy to have.

thanks!!!
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--

___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


[Bacula-users] mysql to postgresql conversion?

2010-05-25 Thread Stephen Thompson

Hello,

Anyone have an up to date howto on converting a mysql bacula database to 
postgresql?

There are notes on the bacula site, but it appears that they may be out 
of date.

thanks!
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--

___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


[Bacula-users] postgres tuning?

2010-06-04 Thread Stephen Thompson


Hello everyone,

We recently attempted a mysql to postgresql migration for our bacula 
5.0.2 server.  The data migration itself was successful; however, we are 
disappointingly getting either the same or significantly worse performance 
out of the postgres db.

I was hoping that someone might have some insight into this.

Here is some background:

software:
   centos 5.5 (64bit)
   bacula 5.0.2 (64bit)
   postgresql 8.1.21 (64bit)
   (previously... mysql-5.0.77 (64bit) MyISAM)

database:
   select count(*) from File -- 1,439,626,558
   du -sk /var/lib/pgsql/data -- 346,236,136 /var/lib/pgsql/data

hardware:
   1Tb EXT3 external fibre-RAID storage
   8Gb RAM
   2Gb SWAP
   2 dual-core [AMD Opteron(tm) Processor 2220] CPUs


Some of the postgres tuning that I've attempted thus far (comments are 
either default or alternatively settings I've tried without effect):

#shared_buffers = 1000# min 16 or max_connections*2, 8KB each
shared_buffers = 262144 # 2Gb
#work_mem = 1024# min 64, size in KB
work_mem = 524288   # 512Mb
#maintenance_work_mem = 16384   # min 1024, size in KB
maintenance_work_mem = 2097152  # 2Gb
#checkpoint_segments = 3  # in logfile segments, min 1, 16MB each
checkpoint_segments = 16
#checkpoint_warning = 30# in seconds, 0 is off
checkpoint_warning = 16
#effective_cache_size = 1000# typically 8KB each
#effective_cache_size = 262144  # 256Mb
effective_cache_size = 6291456  # 6Gb
#random_page_cost = 4 # units are one sequential page fetch cost
random_page_cost = 2

Now, as to what I'm 'seeing'.  Building restore trees is on par with my 
previous mysql db, but what I'm seeing as significantly worse is:

                                          mysql    postgresql
Within Bat:
1) Version Browser (large sample job)      3min          9min
2) Restore Tree (average sample job)      40sec         25sec
3) Restore Tree (large sample job)        10min        8.5min
4) Jobs Run (1000 Records)                10sec          2min

Within psql/mysql:
1) select count(*) from File;              1sec         30min

Catalog dump:
1) mysqldump/pgdump                        2hrs          3hrs


I get a win on building Restore trees, but everywhere else, it's 
painfully slow.  It makes the bat utility virtually unusable as an 
interface.  Why the win (albeit moderate) in some cases but terrible 
responses in others?
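
(For anyone curious how I plan to dig further: capturing the slow 
statements and running them through EXPLAIN ANALYZE seems like the next 
step.  A made-up example for illustration, not the actual query bat issues:

  EXPLAIN ANALYZE
  SELECT jobid, name, starttime, jobstatus, jobbytes
    FROM job
   ORDER BY jobid DESC
   LIMIT 1000;

Setting log_min_duration_statement = 1000 in postgresql.conf should log 
anything slower than a second, so I can see what bat actually sends.)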

I admit that I am not familiar with postgres at all, but I tried to walk 
through some of the postgres tuning documents, including the notes in 
the bacula manual to arrive at the above settings.  Also note that I've 
tried several variants on the configuration above (including the 
postgres defaults), don't have a detailed play by play of the results, 
but the time results above seemed typical regardless of what settings I 
tweaked.

Any help would be greatly appreciated!
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu/thinkgeek-promo
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] postgres tuning?

2010-06-04 Thread Stephen Thompson


Correction:
I didn't notice the 8KB-per-unit setting at first with postgres 8.1.
It should read:
effective_cache_size = 786432   # 6Gb
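
(The arithmetic, for anyone following along: the unit is an 8KB page, so 
6GB = 6 x 1024 x 1024 KB / 8KB = 786432 pages.  The earlier value of 
6291456 would have claimed roughly 48GB of assumed cache on an 8GB box.)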


On 06/04/2010 10:58 AM, Stephen Thompson wrote:


 Hello everyone,

 We recently attempted a mysql to postgresql migration for our bacula
 5.0.2 server.  The data migration itself was successful, however we are
 disappointly either getting the same or significantly worse performance
 out of the postgres db.

 I was hoping that someone might have some insight into this.

 Here is some background:

 software:
 centos 5.5 (64bit)
 bacula 5.0.2 (64bit)
 postgresql 8.1.21 (64bit)
 (previously... mysql-5.0.77 (64bit) MyISAM)

 database:
 select count(*) from File --  1,439,626,558
 du -sk /var/lib/pgsql/data --  346,236,136 /var/lib/pgsql/data

 hardware:
 1Tb EXT3 external fibre-RAID storage
 8Gb RAM
 2Gb SWAP
 2 dual-core [AMD Opteron(tm) Processor 2220] CPUs


 Some of the postgres tuning that I've attempted thus far (comments are
 either default or alternatively settings I've tried without effect):

 #shared_buffers = 1000# min 16 or max_connections*2, 8KB each
 shared_buffers = 262144 # 2Gb
 #work_mem = 1024# min 64, size in KB
 work_mem = 524288   # 512Mb
 #maintenance_work_mem = 16384   # min 1024, size in KB
 maintenance_work_mem = 2097152  # 2Gb
 #checkpoint_segments = 3  # in logfile segments, min 1, 16MB each
 checkpoint_segments = 16
 #checkpoint_warning = 30# in seconds, 0 is off
 checkpoint_warning = 16
 #effective_cache_size = 1000# typically 8KB each
 #effective_cache_size = 262144  # 256Mb
 effective_cache_size = 6291456  # 6Gb
 #random_page_cost = 4 # units are one sequential page fetch cost
 random_page_cost = 2

 Now, as to what I'm 'seeing'.  Building restore trees are on par with my
 previous mysql db, but what I'm seeing as significantly worse are:

                                           mysql    postgresql
 Within Bat:
 1) Version Browser (large sample job)      3min          9min
 2) Restore Tree (average sample job)      40sec         25sec
 3) Restore Tree (large sample job)        10min        8.5min
 4) Jobs Run (1000 Records)                10sec          2min

 Within psql/mysql:
 1) select count(*) from File;              1sec         30min

 Catalog dump:
 1) mysqldump/pgdump                        2hrs          3hrs


 I get a win on building Restore trees, but everywhere else, it's
 painfully slow.  It makes the bat utility virtually unusable as an
 interface.  Why the win (albeit moderate) in some cases but terrible
 responses in others?

 I admit that I am not familiar with postgres at all, but I tried to walk
 through some of the postgres tuning documents, including the notes in
 the bacula manual to arrive at the above settings.  Also note that I've
 tried several variants on the configuration above (including the
 postgres defaults), don't have a detailed play by play of the results,
 but the time results above seemed typical regardless of what settings I
 tweaked.

 Any help would be greatly appreciated!
 Stephen


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu/thinkgeek-promo
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] postgres tuning?

2010-06-04 Thread Stephen Thompson


Thanks, yes, it is Linux.  I will look at those limits settings.
And yes, I've built the indexes and run analyze (nothing to vacuum yet 
since it's a fresh import).

Stephen

On 06/04/2010 12:16 PM, Alan Brown wrote:
 On Fri, 4 Jun 2010, Stephen Thompson wrote:



 Correction:
 I didn't notice the 8k per unit settings at first with postgres 8.1.
 Should read:
 effective_cache_size = 786432# 6Gb

 Assuming this is linux, you need to tweak /etc/security/limits.conf a
 little:

 postgres   soft    memlock    unlimited
 postgres   hard    memlock    unlimited
 @postgres  hard    memlock    unlimited
 @postgres  soft    memlock    unlimited
 bacula     soft    memlock    unlimited
 bacula     hard    memlock    unlimited
 @bacula    soft    memlock    unlimited
 @bacula    hard    memlock    unlimited

 postgres   soft    rss        unlimited
 postgres   hard    rss        unlimited


 Don't forget to build the indexes and run analyse/vacuum commands.

 So far I'm finding Postgres is far more forgiving than MySQL and has far
 fewer parts to tune...





 On 06/04/2010 10:58 AM, Stephen Thompson wrote:


 Hello everyone,

 We recently attempted a mysql to postgresql migration for our bacula
 5.0.2 server.  The data migration itself was successful, however we are
 disappointly either getting the same or significantly worse performance
 out of the postgres db.

 I was hoping that someone might have some insight into this.

 Here is some background:

 software:
  centos 5.5 (64bit)
  bacula 5.0.2 (64bit)
  postgresql 8.1.21 (64bit)
  (previously... mysql-5.0.77 (64bit) MyISAM)

 database:
  select count(*) from File --   1,439,626,558
  du -sk /var/lib/pgsql/data --   346,236,136 /var/lib/pgsql/data

 hardware:
  1Tb EXT3 external fibre-RAID storage
  8Gb RAM
  2Gb SWAP
  2 dual-core [AMD Opteron(tm) Processor 2220] CPUs


 Some of the postgres tuning that I've attempted thus far (comments are
 either default or alternatively settings I've tried without effect):

 #shared_buffers = 1000# min 16 or max_connections*2, 8KB each
 shared_buffers = 262144 # 2Gb
 #work_mem = 1024# min 64, size in KB
 work_mem = 524288   # 512Mb
 #maintenance_work_mem = 16384   # min 1024, size in KB
 maintenance_work_mem = 2097152  # 2Gb
 #checkpoint_segments = 3  # in logfile segments, min 1, 16MB each
 checkpoint_segments = 16
 #checkpoint_warning = 30# in seconds, 0 is off
 checkpoint_warning = 16
 #effective_cache_size = 1000# typically 8KB each
 #effective_cache_size = 262144  # 256Mb
 effective_cache_size = 6291456  # 6Gb
 #random_page_cost = 4 # units are one sequential page fetch cost
 random_page_cost = 2

 Now, as to what I'm 'seeing'.  Building restore trees are on par with my
 previous mysql db, but what I'm seeing as significantly worse are:

                                           mysql    postgresql
 Within Bat:
 1) Version Browser (large sample job)      3min          9min
 2) Restore Tree (average sample job)      40sec         25sec
 3) Restore Tree (large sample job)        10min        8.5min
 4) Jobs Run (1000 Records)                10sec          2min

 Within psql/mysql:
 1) select count(*) from File;              1sec         30min

 Catalog dump:
 1) mysqldump/pgdump                        2hrs          3hrs


 I get a win on building Restore trees, but everywhere else, it's
 painfully slow.  It makes the bat utility virtually unusable as an
 interface.  Why the win (albeit moderate) in some cases but terrible
 responses in others?

 I admit that I am not familiar with postgres at all, but I tried to walk
 through some of the postgres tuning documents, including the notes in
 the bacula manual to arrive at the above settings.  Also note that I've
 tried several variants on the configuration above (including the
 postgres defaults), don't have a detailed play by play of the results,
 but the time results above seemed typical regardless of what settings I
 tweaked.

 Any help would be greatly appreciated!
 Stephen






-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu/thinkgeek-promo
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] postgres tuning?

2010-06-07 Thread Stephen Thompson


Yes, it's ext3.



On 6/4/10 7:24 PM, Jon Schewe wrote:
 Which filesystem are you on too? I've found that ext3 is significantly
 faster than ext4 and xfs.

 On 06/04/2010 04:01 PM, Stephen Thompson wrote:

 Thanks, yes it is Linux.  I will look at those limits settings.
 And yes, I've built indexes and analyze (nothing to vacuum yet since
 it's a fresh import).

 Stephen

 On 06/04/2010 12:16 PM, Alan Brown wrote:

 On Fri, 4 Jun 2010, Stephen Thompson wrote:



 Correction:
 I didn't notice the 8k per unit settings at first with postgres 8.1.
 Should read:
 effective_cache_size = 786432# 6Gb

 Assuming this is linux, you need to tweak /etc/security/limits.conf a
 little:

 postgres   soft    memlock    unlimited
 postgres   hard    memlock    unlimited
 @postgres  hard    memlock    unlimited
 @postgres  soft    memlock    unlimited
 bacula     soft    memlock    unlimited
 bacula     hard    memlock    unlimited
 @bacula    soft    memlock    unlimited
 @bacula    hard    memlock    unlimited

 postgres   soft    rss        unlimited
 postgres   hard    rss        unlimited


 Don't forget to build the indexes and run analyse/vacuum commands.

 So far I'm finding Postgres is far more forgiving than MySQL and has far
 fewer parts to tune...





 On 06/04/2010 10:58 AM, Stephen Thompson wrote:


 Hello everyone,

 We recently attempted a mysql to postgresql migration for our bacula
 5.0.2 server.  The data migration itself was successful, however we are
 disappointly either getting the same or significantly worse performance
 out of the postgres db.

 I was hoping that someone might have some insight into this.

 Here is some background:

 software:
   centos 5.5 (64bit)
   bacula 5.0.2 (64bit)
   postgresql 8.1.21 (64bit)
   (previously... mysql-5.0.77 (64bit) MyISAM)

 database:
   select count(*) from File --1,439,626,558
   du -sk /var/lib/pgsql/data --346,236,136 /var/lib/pgsql/data

 hardware:
   1Tb EXT3 external fibre-RAID storage
   8Gb RAM
   2Gb SWAP
   2 dual-core [AMD Opteron(tm) Processor 2220] CPUs


 Some of the postgres tuning that I've attempted thus far (comments are
 either default or alternatively settings I've tried without effect):

 #shared_buffers = 1000# min 16 or max_connections*2, 8KB each
 shared_buffers = 262144 # 2Gb
 #work_mem = 1024# min 64, size in KB
 work_mem = 524288   # 512Mb
 #maintenance_work_mem = 16384   # min 1024, size in KB
 maintenance_work_mem = 2097152  # 2Gb
 #checkpoint_segments = 3  # in logfile segments, min 1, 16MB each
 checkpoint_segments = 16
 #checkpoint_warning = 30# in seconds, 0 is off
 checkpoint_warning = 16
 #effective_cache_size = 1000# typically 8KB each
 #effective_cache_size = 262144  # 256Mb
 effective_cache_size = 6291456  # 6Gb
 #random_page_cost = 4 # units are one sequential page fetch cost
 random_page_cost = 2

 Now, as to what I'm 'seeing'.  Building restore trees are on par with my
 previous mysql db, but what I'm seeing as significantly worse are:

                                           mysql    postgresql
 Within Bat:
 1) Version Browser (large sample job)      3min          9min
 2) Restore Tree (average sample job)      40sec         25sec
 3) Restore Tree (large sample job)        10min        8.5min
 4) Jobs Run (1000 Records)                10sec          2min

 Within psql/mysql:
 1) select count(*) from File;              1sec         30min

 Catalog dump:
 1) mysqldump/pgdump                        2hrs          3hrs


 I get a win on building Restore trees, but everywhere else, it's
 painfully slow.  It makes the bat utility virtually unusable as an
 interface.  Why the win (albeit moderate) in some cases but terrible
 responses in others?

 I admit that I am not familiar with postgres at all, but I tried to walk
 through some of the postgres tuning documents, including the notes in
 the bacula manual to arrive at the above settings.  Also note that I've
 tried several variants on the configuration above (including the
 postgres defaults), don't have a detailed play by play of the results,
 but the time results above seemed typical regardless of what settings I
 tweaked.

 Any help would be greatly appreciated!
 Stephen








-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu

Re: [Bacula-users] postgres tuning?

2010-06-07 Thread Stephen Thompson
On 06/07/2010 12:33 AM, Julien Cigar wrote:
 Stephen Thompson wrote:

 Hello everyone,

 We recently attempted a mysql to postgresql migration for our bacula
 5.0.2 server. The data migration itself was successful, however we are
 disappointly either getting the same or significantly worse
 performance out of the postgres db.

 I was hoping that someone might have some insight into this.

 Here is some background:

 software:
 centos 5.5 (64bit)
 bacula 5.0.2 (64bit)
 postgresql 8.1.21 (64bit)

 Why 8.1 ..? 8.1 is more than 5 years old ...


Yes, Alan Brown answered this for me, but, yeah, it's a restriction 
based on our policy of using CentOS (5.5) packages, which puts us at 
postgresql 8.1.  It might be worth trying a compiled version of the 
latest release, to at least be able to compare and possibly make an 
argument for a non-CentOS package in this case.


 (previously... mysql-5.0.77 (64bit) MyISAM)

 database:
 select count(*) from File -- 1,439,626,558
 du -sk /var/lib/pgsql/data -- 346,236,136 /var/lib/pgsql/data

 hardware:
 1Tb EXT3 external fibre-RAID storage
 8Gb RAM
 2Gb SWAP
 2 dual-core [AMD Opteron(tm) Processor 2220] CPUs


 Some of the postgres tuning that I've attempted thus far (comments are
 either default or alternatively settings I've tried without effect):

 #shared_buffers = 1000 # min 16 or max_connections*2, 8KB each
 shared_buffers = 262144 # 2Gb

 This is too large, set shared_buffers to something like 256-512 MB


I can try that.
The postgres tuning documents I'd read said to try 1/4 the RAM as a 
starting point, which is how I arrived at 2Gb.


 #work_mem = 1024 # min 64, size in KB
 work_mem = 524288 # 512Mb

 Don't forget that work_mem is allocated *per-operation* (maybe several
 times per query). 512 MB seems too large for me


Thanks, I can try reducing this and the shared_buffers.


 #maintenance_work_mem = 16384 # min 1024, size in KB
 maintenance_work_mem = 2097152 # 2Gb
 #checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
 checkpoint_segments = 16
 #checkpoint_warning = 30 # in seconds, 0 is off
 checkpoint_warning = 16
 #effective_cache_size = 1000 # typically 8KB each
 #effective_cache_size = 262144 # 256Mb
 effective_cache_size = 6291456 # 6Gb

 6GB seems OK to me

 #random_page_cost = 4 # units are one sequential page fetch cost
 random_page_cost = 2


 only reduce random_page_cost if you have fast disks (SAS, ...)

Thanks, I'll try putting this back to the default of 4.

 Now, as to what I'm 'seeing'. Building restore trees are on par with
 my previous mysql db, but what I'm seeing as significantly worse are:

                                           mysql    postgresql
 Within Bat:
 1) Version Browser (large sample job)      3min          9min
 2) Restore Tree (average sample job)      40sec         25sec
 3) Restore Tree (large sample job)        10min        8.5min
 4) Jobs Run (1000 Records)                10sec          2min

 Within psql/mysql:
 1) select count(*) from File;              1sec         30min

 Catalog dump:
 1) mysqldump/pgdump                        2hrs          3hrs


 I get a win on building Restore trees, but everywhere else, it's
 painfully slow. It makes the bat utility virtually unusable as an
 interface. Why the win (albeit moderate) in some cases but terrible
 responses in others?

 I admit that I am not familiar with postgres at all, but I tried to
 walk through some of the postgres tuning documents, including the
 notes in the bacula manual to arrive at the above settings. Also note
 that I've tried several variants on the configuration above (including
 the postgres defaults), don't have a detailed play by play of the
 results, but the time results above seemed typical regardless of what
 settings I tweaked.

 Any help would be greatly appreciated!
 Stephen



It doesn't sound like I'm doing anything egregiously wrong.

I am still surprised at how slow postgres is compared to mysql on the 
same hardware, after all I've read and heard about postgres superiority.
Don't get me wrong, I understand its strengths, but for an application 
like Bacula it doesn't seem like many of its features are really 
needed, and if it runs more slowly...  I may very well continue to run 
with mysql, which is rather disappointing.

thanks!
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu/thinkgeek-promo
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] postgres tuning?

2010-06-07 Thread Stephen Thompson


On 06/07/2010 09:24 AM, Florian Heigl wrote:
 Hi all,

 Within psql/mysql:
 1) select count(*) from File;1sec30min

 Disclaimer: I don't know a dime's worth of databases per se. But I
 spend a lot of time hunting other peoples performance issues. :)

 I think you should start identifying the cause for this bit at the
 very first, as it shows the absolutely worst perfomance and probably
 what slows this also slows the rest.
 My nose says this is really smelly and should'nt even take as long
 without any indexes.

I totally agree.  This was my first cause for concern: after the data 
import, I wanted to check that I still had the same number of rows.


 Can you verify whether the system swaps during your test commands or doesn't?


No, it does not swap, in this case or any other one I've tested.

 sar -dp 100 1 | grep -v nodev will do nicely with the external storage.


sdc1 is the database partition (including log and temp files)...

11:05:21 AM       DEV       tps   rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz   await  svctm  %util
11:07:01 AM       sdd    449.95  111886.88      4.32    248.67      1.54    3.41   2.11  94.85
11:07:01 AM      sdd1    449.95  111886.88      4.32    248.67      1.54    3.41   2.11  94.85

Average:          DEV       tps   rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz   await  svctm  %util
Average:          sdd    449.95  111886.88      4.32    248.67      1.54    3.41   2.11  94.85
Average:         sdd1    449.95  111886.88      4.32    248.67      1.54    3.41   2.11  94.85


 also can you please let us know the iowait times (also from sar) and
 promise it's not a Raid5 array you have in use?
 (but, admittedly it doesn't explain for a big difference between mysql
 and postgresql)


Yes, it is RAID5; it was the only place I could go to get external 
space, with internal being inadequate.  However, as you point out, this 
isn't a fundamental problem, since mysql performance was mostly ok.
The difference must be in the postgres config (or postgres itself).


 Also, I don't know if I would value RedHat supporting postgre 8.1
 higher than running 8.4.1  :)


 Florian


thanks!
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu/thinkgeek-promo
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] postgres tuning?

2010-06-07 Thread Stephen Thompson
On 06/07/2010 12:17 PM, Julien wrote:
 On Mon, 2010-06-07 at 18:24 +0200, Florian Heigl wrote:
 Hi all,

 Within psql/mysql:
 1) select count(*) from File;1sec30min


 Is your MySQL database in the MyISAM format ? If yes, then it's
 perfectly normal.


Yes.
So it sounds like the row count is a red herring as far as claiming 
there's a performance problem with postgres.  I'll try to ignore that 
result, then, and concentrate on the queries like the one that produces 
the "jobs run" list in the bat console.  That and the version browser 
are where I'm seeing my worst results.
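
(As an aside, when an approximate row count is good enough, the planner 
statistics can be read directly; a sketch, assuming the stock lowercase 
postgres schema for bacula:

  SELECT reltuples::bigint AS approx_rows
    FROM pg_class WHERE relname = 'file';

That returns in milliseconds regardless of table size.)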


 Disclaimer: I don't know a dime's worth of databases per se. But I
 spend a lot of time hunting other peoples performance issues. :)

 I think you should start identifying the cause for this bit at the
 very first, as it shows the absolutely worst perfomance and probably
 what slows this also slows the rest.
 My nose says this is really smelly and should'nt even take as long
 without any indexes.

 Can you verify whether the system swaps during your test commands or doesn't?

 sar -dp 100 1 | grep -v nodev will do nicely with the external storage.

 also can you please let us know the iowait times (also from sar) and
 promise it's not a Raid5 array you have in use?
 (but, admittedly it doesn't explain for a big difference between mysql
 and postgresql)

 Also, I don't know if I would value RedHat supporting postgre 8.1
 higher than running 8.4.1  :)


 Florian




-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
ThinkGeek and WIRED's GeekDad team up for the Ultimate 
GeekDad Father's Day Giveaway. ONE MASSIVE PRIZE to the 
lucky parental unit.  See the prize list and enter to win: 
http://p.sf.net/sfu/thinkgeek-promo
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] 5.0.1 infinite email loop bug??

2010-07-28 Thread Stephen Thompson

After running for 3 months without this problem, it happened again last 
night.  We are running 5.0.2 at this point.

Stephen



On 04/15/2010 10:25 AM, Stephen Thompson wrote:

 Hello,

 I have just now experienced a possible new bug with bacula 5.0.1.

 The symptoms are this:

 bacula-sd crashes
 bacula-dir continues to run
 bacula-dir then spews out identical Intervention needed emails until
 manually restarted

 The first time this happened over a weekend and upon returning I found
 my inbox has about 120,000 bacula emails, all the SAME and of this type:

 15-Apr 10:02 client-fd JobId 11: Fatal error: backup.c:1048 Network
 send error to SD. ERR=Broken pipe

 It happened again just now (second time since upgrading from 3.0.3 to
 5.0.1) and I managed to stop the director with only a few thousand
 emails going out.

 So there are really 2 issues here:

 1)
 Why does the director apparently get stuck in an infinite loop of
 sending the same email message?  Is this a known bug?

 2)
 Regarding the SD, I received one alert of this type, the rest like the
 above:

 15-Apr 10:02 server-sd: ERROR in lock.c:268 Failed ASSERT: dev->blocked()

 A traceback like:
 --
 ptrace: Operation not permitted.
 /var/bacula/work/29091: No such file or directory.
 $1 = 0
 /opt/bacula-5.0.1/scripts/btraceback.gdb:2: Error in sourced command file:
 No symbol exename in current context.
 --

 And a bactrace like:
 --
 Attempt to dump current JCRs
 JCR=0x19a24888 JobId=10 name=client_1.2010-04-14_18.02.33_41 JobStatus=l
   use_count=1
   JobType=B JobLevel=F
   sched_time=14-Apr-2010 21:35 start_time=14-Apr-2010 21:35
   end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
   db=(nil) db_batch=(nil) batch_started=0
 JCR=0x1981b248 JobId=11 name=client_10.2010-04-14_20.00.15_04
 JobStatus=R
   use_count=1
   JobType=B JobLevel=I
   sched_time=15-Apr-2010 09:15 start_time=15-Apr-2010 09:15
   end_time=31-Dec-1969 16:00 wait_time=31-Dec-1969 16:00
   db=(nil) db_batch=(nil) batch_started=0
 Attempt to dump plugins. Hook count=0
 --

 Both clients and server seem healthy, except for the SD crash.
 Any ideas?


 thanks!
 Stephen


 -
 Further info:

 My catalog...

   mysql-5.0.77 (64bit) MyISAM
   210Gb in size
   1,412,297,215 records in File table
   note: database built with bacula 2x scripts,
   upgraded with 3x scripts, then again with 5x scripts
   (i.e. nothing customized along the way)

 My OS  hardware for bacula DIR+SD server...

   Centos 5.4 (fully patched)
   8Gb RAM
   2Gb Swap
   1Tb EXT3 filesystem on external fiber RAID5 array
   (dedicated to database, incl. temp files)
   2 dual-core [AMD Opteron(tm) Processor 2220] CPUs
   StorageTek SL500 Library with 2 LTO3 Drives





 --
 Download Intel#174; Parallel Studio Eval
 Try the new software tools for yourself. Speed compiling, find bugs
 proactively, and fine-tune applications for parallel performance.
 See why Intel Parallel Studio got high marks during beta.
 http://p.sf.net/sfu/intel-sw-dev
 ___
 Bacula-users mailing list
 Bacula-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/bacula-users


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
The Palm PDK Hot Apps Program offers developers who use the
Plug-In Development Kit to bring their C/C++ apps to Palm for a share
of $1 Million in cash or HP Products. Visit us here for more details:
http://p.sf.net/sfu/dev2dev-palm
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


[Bacula-users] 5.2.1 bat missing Browse Cataloged Files?

2011-11-03 Thread Stephen Thompson


Hey all,

Recently upgraded to 5.2.1, things mostly running well.
I notice in BAT that the Browse Cataloged Files icon in the toolbar is 
greyed out and there is now a bRestore page.  Has the Browse Cataloged 
Files feature been deprecated (I use it a lot), and if so, why is the 
icon not entirely absent?  If not, is there something I could have 
done in my build to accidentally exclude that feature?

thanks!
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
RSA(R) Conference 2012
Save $700 by Nov 18
Register now
http://p.sf.net/sfu/rsa-sfdev2dev1
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] 5.2.1 bat missing Browse Cataloged Files?

2011-11-04 Thread Stephen Thompson


Thank you. I had not seen that.

I'll explore the bRestore feature; the screen layout is a bit 
non-intuitive, but the functionality looks promising.

Stephen



On 11/3/11 7:23 PM, John Drescher wrote:
 On Thu, Nov 3, 2011 at 12:59 PM, Stephen Thompson
 step...@seismo.berkeley.edu  wrote:


 Hey all,

 Recently upgraded to 5.2.1, things mostly running well.
 I notice in BAT that the Browse Cataloged Files icon in the toolbar is
 greyed out and there is now a bRestore Page.  Has the Browse Cataloged
 Files feature been depreciated (I use it a lot), and if so, why is the
 icon not entirely absent.  If not, is there something that I could have
 down in my build to accidentally excluded that feature?


 I found this in the release notes - The old bat version browser has
 been turned off since it does not
work correctly and the brestore panel provides the same functionality

 http://voxel.dl.sourceforge.net/project/bacula/Win32_64/5.2.1/ReleaseNotes

 John

-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
RSA(R) Conference 2012
Save $700 by Nov 18
Register now
http://p.sf.net/sfu/rsa-sfdev2dev1
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


[Bacula-users] possible 5.2.2 bug (incrementals being promoted to fulls)

2011-11-30 Thread Stephen Thompson


FYI

Not sure if anyone's seen or reported this, but I upgraded from 5.2.1 to 
5.2.2 yesterday and during my backups last night, several jobs were 
promoted from Incremental to Full, even though their job configurations 
had not changed and they did have a valid Full backup from last week.

I have never seen this happen before with bacula in general or my 
configuration in particular, so I thought it might be possible that a 
bug was introduced into 5.2.2.

thanks,
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
All the data continuously generated in your IT infrastructure 
contains a definitive record of customers, application performance, 
security threats, fraudulent activity, and more. Splunk takes this 
data and makes sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-novd2d
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] [Bacula-devel] possible 5.2.2 bug (incrementals being promoted to fulls)

2011-12-01 Thread Stephen Thompson


I agree, it's unlikely to be a 'new' bug; more likely the restart of my 
director during the upgrade caused the problem to exhibit itself.

Here is what happened in more detail.

A week before the upgrade/director restart, the conf files for a 
significant number of jobs (~100) were changed and the director was 
reloaded.  On the day they were changed we manually ran Fulls for 
each modified job, which completed successfully as Fulls.  Then on 
subsequent evenings (every day for a week) scheduled Incrementals ran 
successfully, and as Incrementals.

When we upgraded to 5.2.2, we of course stopped our old director and 
started up the new one.  That evening, 12 out of the ~100 jobs 
mentioned above had their scheduled Incrementals promoted to Fulls, and 
yes, the message in the log says:

 No prior or suitable Full backup found in catalog. Doing FULL backup.

However, this is not actually the case.  There is a successful Full, a 
week old, for each of the 12 jobs that were promoted, and the other jobs 
among the ~100 that were changed were not promoted to a Full.

The dates on the conf files show that they have not changed since the 
Full backups were made a week ago.
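
One way to check the duplicate-FileSet theory directly against the 
catalog (a sketch, assuming the standard schema) would be:

  SELECT FileSetId, FileSet, MD5, CreateTime
    FROM FileSet
   ORDER BY FileSet, CreateTime;

If one of the affected jobs' FileSets shows more than one row with 
different MD5 hashes, that would explain the director deciding that no 
suitable Full exists for the current FileSet.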

Again, we've been using bacula for years now, have some degree of 
expertise with it, and we've never seen this before.

Very strange...
Stephen







On 11/30/11 12:48 PM, Kern Sibbald wrote:
 Hello,

 Most likely you edited the .conf file and modified the
 FileSet. If that is the case, listing all the FileSets recorded
 in the database will show multiple copies of the FileSet
 record with different hashes.

 In most cases, other than changing the FileSet, Bacula
 clearly indicates why it is upgrading a level. In the case
 of a FileSet change, it prints a notice saying something
 like a valid Full could not be found.

 The probability that there is a new bug introduced between
 5.2.1 and 5.2.2 is probably about 0.0001% since there were very
 few coding changes except for bug fixes.

 Regards,
 Kern

 On 11/30/2011 07:55 PM, Stephen Thompson wrote:

 FYI

 Not sure if anyone's seen or reported this, but I upgraded from 5.2.1 to
 5.2.2 yesterday and during my backups last night, several jobs were
 promoted from Incremental to Full, even though their job configurations
 had not changed and they did have a valid Full backup from last week.

 I have never seen this happen before with bacula in general or my
 configuration in particular, so I thought it might be possible that a
 bug was introduced into 5.2.2.

 thanks,
 Stephen

-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
All the data continuously generated in your IT infrastructure 
contains a definitive record of customers, application performance, 
security threats, fraudulent activity, and more. Splunk takes this 
data and makes sense of it. IT sense. And common sense.
http://p.sf.net/sfu/splunk-novd2d
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


[Bacula-users] Catalog backup while job running?

2012-02-06 Thread Stephen Thompson


Hello,

We were wondering if anyone using bacula had come up with a creative way 
to back up their Catalog.  We understand the basic dilemma -- that one 
should not back up a database that is in use, because the result is not 
a coherent view.

Currently we've managed to keep our filesets and jobs small enough that 
we're able to run jobs (a few full jobs along with an incremental of all 
jobs) in under 24 hours and then (per the bacula manual) back up the 
Catalog afterwards when nothing is running.  However, we're faced with 
some new jobs that may take 4 days to complete.  We have multiple tape 
drives, so we can run a long-running job on one drive while we use 
another for nightly Incrementals without contention.  But we would also 
like to get a nightly backup of the Catalog that reflects the changes 
introduced by the nightly Incrementals.

So, my question is whether anyone has any ideas about the feasibility of 
getting a backup of the Catalog while a single long-running job is 
active.  This could be in-band (a database dump) or out-of-band (a copy 
of the database directory on the filesystem, or a slave database server 
taken offline).  We are using MySQL, but would not be opposed to 
switching to PostgreSQL if it buys us anything in this regard.

What I wonder specifically (in creating my own solution) is:
1) If I back up the MySQL database directory, or sync to a slave server 
and create a dump from that, am I simply putting the active 
long-running job's records at risk of being incoherent, or am I risking 
the integrity of the whole Catalog in doing so?
2) If I attempt a dump of the MySQL catalog and lock the tables while 
doing so, what will the result be for the active long-running job? 
Will it crap out, or simply pause and wait for database access when it 
needs to read/write to the database?  And if so, how long will it wait?


thanks for reading,
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Try before you buy = See our experts in action!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-dev2
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Catalog backup while job running?

2012-02-06 Thread Stephen Thompson
On 02/06/2012 02:45 PM, Phil Stracchino wrote:
 On 02/06/2012 05:02 PM, Stephen Thompson wrote:
 So, my question is whether anyone had any ideas about the feasibility of
 getting a backup of the Catalog while a single long-running job is
 active?  This could be in-band (database dump) or out-of-band (copy of
 database directory on filesystem or slave database server taken
 offline).  We are using MySQL, but would not be opposed to switching to
 PostGRES if it buys us anything in this regard.

 What I wonder specifically (in creating my own solution) is:
 1) If I backup the MySQL database directory, or sync to a slave server
 and create a dump from that, am I simply putting the active
 long-running job records at risk of being incoherent, or am I risking
 the integrity of the whole Catalog in doing so?
 2) If I attempt a dump of the MySQL catalog and lock the tables while
 doing so, what will the results be to the active long-running job?
 Will it crap out or simply pause and wait for database access when it
 needs to read/write to the database?  And if so, how long will it wait?

 Stephen,
 Three suggestions here.

 Route 1:
 Set up a replication slave and perform your backups from the slave.  If
 the slave falls behind the master while you're dumping the DB, you don't
 really care all that much.  It doesn't impact your production DB.


This was one of my ideas to try, though I'm still wondering -- if my 
slave does fall behind production while dumping the DB, because a 
long-running job is active during the dump, will that dump of the DB 
simply be missing information about the running job, or will anything 
else in the Catalog be affected?  Because ultimately, if I need to 
restore my Catalog from backup, I want to be able to search and restore 
from all completed jobs (the acceptable omission being the job running 
during the dump, because it wasn't complete at the time!) as well as 
continue to run future backup jobs as normal with that restored Catalog.

 Route 2:
 If you're not using InnoDB in MySQL, you should be by now.  So look into
 the --skip-opt and --single-transaction options to mysqldump to dump all
 of the transactional tables consistently without locking them.  Your
 grant tables will still need a read lock, but hey, you weren't planning
 on rewriting your grant tables every day, were you...?


Thanks, I'll look into this.  Without the locks, but dumping while a job 
is running, this still raises the question above -- am I just putting the 
data associated with the running job (concurrent with the dump) at risk, 
or is there any risk that my Catalog will go screwy in a broader fashion?
For instance, a counter that doesn't get incremented, or some row that 
is only written upon job completion, where the dump being made before 
the long-running job completed causes more general mayhem than just 
missing records for the uncompleted job.  I don't know the database 
layout, or the logic of what's written to the database and when, well 
enough to understand what risk I am at with Route 1 or Route 2.
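
For what it's worth, my reading of Route 2 is a command along these 
lines (just a sketch of the flags from the mysqldump docs, assuming all 
of the bacula tables are InnoDB):

  mysqldump --skip-opt --single-transaction --quick --extended-insert \
            --create-options --add-drop-table --set-charset \
            bacula > bacula.sql

with --single-transaction providing a consistent snapshot of the InnoDB 
tables without taking read locks.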

 Route 3:
 Look into an alternate DB backup solution like mydumper or Percona
 XtraBackup.

 Route 4:
 Do you have the option of taking a snapshot of your MySQL datadir and
 backing up the snapshot?  This can be viable if you have a small DB and
 fast copy-on-write snapshots.  (It's the technique I'm using at the
 moment, though I'm considering a switch to mydumper.)

Nope, not an option.




thanks!
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Try before you buy = See our experts in action!
The most comprehensive online learning library for Microsoft developers
is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3,
Metro Style Apps, more. Free future releases when you subscribe now!
http://p.sf.net/sfu/learndevnow-dev2
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


[Bacula-users] new jobs have to wait for despooling to finish?

2012-03-19 Thread Stephen Thompson

Hello,

I was wondering if anyone could confirm what I've noticed on my own 
instance of bacula, which seems contrary to the Bacula manual.

 From DataSpooling section:
 If you are running multiple simultaneous jobs, Bacula will continue spooling 
 other jobs while one is despooling to tape, provided there is sufficient 
 spool file space.

This seems to be true only if the jobs in question were launched at the 
same time/concurrently.  New jobs launched while a job is despooling 
are put into a "running" state, but they do not begin to spool 
until the existing job(s) finish despooling.

This is very sad, because I just came into a windfall of spool space and 
I was hoping to run jobs back to back, such that while one set of jobs 
were despooling, I could have the next set spooling, and so on.
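
For reference, the directives involved are roughly these (a generic 
sketch, not my literal config):

  # bacula-dir.conf, in the Job or JobDefs resource
  Spool Data = yes

  # bacula-sd.conf, in the Device resource
  Spool Directory = /data/spool        # example path
  Maximum Spool Size = 500 GB          # example size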

I see an old bug (0001231) with a similar issue; in its history it is 
pointed out that it may not be that new jobs can't spool while existing 
jobs despool, but rather that the new jobs cannot verify that they will 
have tape access, which is a step that happens before spooling begins.

I wonder if this is still the state of affairs and if there are any 
plans to improve upon this inefficiency.

thanks!
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here 
http://p.sf.net/sfu/sfd2d-msazure
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


[Bacula-users] bacula table b2123??

2012-03-25 Thread Stephen Thompson


Hello all,

I notice that I have a table called b2123 that hasn't been written to in 
months (Nov 4 2011) and does not appear to be one of the tables created 
during install.  Is this some kind of temp table that I can go ahead and 
drop?

It looks like the File table, only much, much smaller.

thanks!
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here 
http://p.sf.net/sfu/sfd2d-msazure
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] bacula table b2123??

2012-03-26 Thread Stephen Thompson
On 03/25/2012 11:36 AM, Stephen Thompson wrote:


 Hello all,

 I notice that I have a table called b2123 that hasn't been written to in
 months (Nov 4 2011) and does not appear to be one of the tables created
 during install.  Is this some kind of temp table that I can go ahead and
 drop?

 It looks like the File table only much much smaller.

 thanks!
 Stephen

To answer my own question, this appears to be a table left around by a 
bat session that crashed.  Solution: drop table b2123.
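
If anyone else runs into this, something along these lines should list 
any similar leftovers before dropping them (assumes the catalog database 
is named 'bacula'):

  SELECT table_name FROM information_schema.tables
   WHERE table_schema = 'bacula' AND table_name REGEXP '^b[0-9]+$';

  DROP TABLE b2123;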

thanks,
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here 
http://p.sf.net/sfu/sfd2d-msazure
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Catalog backup while job running?

2012-04-02 Thread Stephen Thompson
On 02/06/2012 02:45 PM, Phil Stracchino wrote:
 On 02/06/2012 05:02 PM, Stephen Thompson wrote:
 So, my question is whether anyone had any ideas about the feasibility of
 getting a backup of the Catalog while a single long-running job is
 active?  This could be in-band (database dump) or out-of-band (copy of
 database directory on filesystem or slave database server taken
 offline).  We are using MySQL, but would not be opposed to switching to
 PostGRES if it buys us anything in this regard.

 What I wonder specifically (in creating my own solution) is:
 1) If I backup the MySQL database directory, or sync to a slave server
 and create a dump from that, am I simply putting the active
 long-running job records at risk of being incoherent, or am I risking
 the integrity of the whole Catalog in doing so?
 2) If I attempt a dump of the MySQL catalog and lock the tables while
 doing so, what will the results be to the active long-running job?
 Will it crap out or simply pause and wait for database access when it
 needs to read/write to the database?  And if so, how long will it wait?

 Stephen,
 Three suggestions here.

 Route 1:
 Set up a replication slave and perform your backups from the slave.  If
 the slave falls behind the master while you're dumping the DB, you don't
 really care all that much.  It doesn't impact your production DB.

 Route 2:
 If you're not using InnoDB in MySQL, you should be by now.  So look into
 the --skip-opt and --single-transaction options to mysqldump to dump all
 of the transactional tables consistently without locking them.  Your
 grant tables will still need a read lock, but hey, you weren't planning
 on rewriting your grant tables every day, were you...?



Well, we've made the leap from MyISAM to InnoDB; it seems like we win on 
transactions, but lose on read speed.

That aside, I'm seeing something unexpected.  I am now able to 
successfully run jobs while I use mysqldump to dump the bacula Catalog, 
except that at the very end of the dump there is some sort of contention. 
A few of my jobs (3-4 out of 150) that attempt to despool 
attributes at the tail end of the dump yield this error:

Fatal error: sql_create.c:860 Fill File table Query failed: INSERT INTO 
File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT 
batch.FileIndex, batch.JobId, Path.PathId, 
Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch 
JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name = 
Filename.Name): ERR=Lock wait timeout exceeded; try restarting transaction

I have successful jobs before and after this 'end of the dump' timeframe.

It looks like I might be able to fix this by increasing my 
innodb_lock_wait_timeout, but I'd like to understand WHY I need to 
increase it.  Anyone know what's happening at the end of a dump like 
this that would cause the above error?

mysqldump -f --opt --skip-lock-tables --single-transaction bacula \
  > bacula.sql

Is it the commit on this 'dump' transaction?
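
(If raising the timeout does turn out to be the answer, it's a one-line 
change; the value below is just an example, not a recommendation:

  # /etc/my.cnf, [mysqld] section
  innodb_lock_wait_timeout = 300   # seconds, default is 50

but I'd still rather understand the contention than paper over it.)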

thanks!
Stephen





 Route 3:
 Look into an alternate DB backup solution like mydumper or Percona
 XtraBackup.

 Route 4:
 Do you have the option of taking a snapshot of your MySQL datadir and
 backing up the snapshot?  This can be viable if you have a small DB and
 fast copy-on-write snapshots.  (It's the technique I'm using at the
 moment, though I'm considering a switch to mydumper.)




-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
This SF email is sponsosred by:
Try Windows Azure free for 90 days Click Here 
http://p.sf.net/sfu/sfd2d-msazure
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Catalog backup while job running?

2012-04-02 Thread Stephen Thompson


First off, thanks for the response Phil.


On 04/02/2012 01:11 PM, Phil Stracchino wrote:
 On 04/02/2012 01:49 PM, Stephen Thompson wrote:
 Well, we've made the leap from MyISAM to InnoDB, seems like we win on
 transactions, but lose on read speed.

 If you're finding InnoDB slower than MyISAM on reads, your InnoDB buffer
 pool is probably too small.

This is probably true, but I have limited system resources and my File 
table alone is almost 300GB.
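
(For the record, the knob Phil is referring to is innodb_buffer_pool_size; 
something like the following would be the usual starting point on a 
mostly-dedicated 8GB box -- the value is illustrative, not what I'm 
actually running:

  # /etc/my.cnf, [mysqld] section
  innodb_buffer_pool_size = 4G

I simply don't have that much RAM to spare alongside the director and 
storage daemon.)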


 That aside, I'm seeing something unexpected.  I am now able to
 successfully run jobs while I use mysqldump to dump the bacula Catalog,
 except at the very end of the dump there is some sort of contention.  A
 few of my jobs (3-4 out of 150) that are attempting to despool
 attritbutes at the tail end of the dump yield this error:

 Fatal error: sql_create.c:860 Fill File table Query failed: INSERT INTO
 File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT
 batch.FileIndex, batch.JobId, Path.PathId,
 Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch
 JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name =
 Filename.Name): ERR=Lock wait timeout exceeded; try restarting transaction

 I have successful jobs before and after this 'end of the dump' timeframe.

 It looks like I might be able to fix this by increasing my
 innodb_lock_wait_timeout, but I'd like to understand WHY I need to
 icnrease it.  Anyone know what's happening at the end of a dump like
 this that would cause the above error?

 mysqldump -f --opt --skip-lock-tables --single-transaction bacula
   bacula.sql

 Is it the commit on this 'dump' transaction?

 --skip-lock-tables is referred to in the mysqldump documentation, but
 isn't actually a valid option.  This is actually an increasingly
 horrible problem with mysqldump.  It has been very poorly maintained,
 and has barely developed at all in ten or fifteen years.


This has me confused.  I have jobs that can run, and insert records into 
the File table, while I am dumping the Catalog.  It's only at the 
tail-end that a few jobs get the error above.  Wouldn't a locked File 
table cause all concurrent jobs to fail?


 Table locks are the default behavior of mysqldump, as part of the
 default --opt group.  To override it, you actually have to use
 --skip-opt, than add back in the rest of the options from the --opt
 group that you actually wanted.  There is *no way* to get mysqldump to
 Do The Right Thing for both transactional and non-transactional tables
 in the same run.  it is simply not possible.

 My suggestion would be to look at mydumper instead.  It has been written
 by a couple of former MySQL AB support engineers who started with a
 clean sheet of paper, and it is what mysqldump should have become ten
 years ago.  It dumps tables in parallel, doesn't require exclusion of
 schemas that shouldn't be dumped because it knows they shouldn't be
 dumped, doesn't require long strings of arguments to tell it how to
 correctly handle transactional and non-transactional tables because it
 understands both and just Does The Right Thing on a table-by-table
 basis, can dump tables in parallel for better speed, can dump binlogs as
 well as tables, separates the data from the schemas...

 Give it a try.


Thanks, I'll take a look at it.


 That said, I make my MySQL dump job a lower priority job and run it only
 after all other jobs have completed.  This makes sure I get the most
 current possible data in my catalog dump.  I just recently switched to a
 revised MySQL backup job that uses mydumper with the following simple
 shell script as a ClientRunBeforeJob on a separate host from the actual
 DB server.  (Thus, if the backup client goes down, I still have the live
 DB, and if the DB server goes down, I still have the DB backups on disk.)


 #!/bin/bash

 RETAIN=5
 USER=xx
 PASS=xx
 DUMPDIR=/dbdumps
 HOST=babylon4
 PORT=6446
 TIMEOUT=300
 FMT='%Y%m%d-%T'
 DEST=${DUMPDIR}/${HOST}-$(date +${FMT})

 for dir in $(ls -r ${DUMPDIR} | tail -n +${RETAIN})
 do
 echo Deleting ${DUMPDIR}/${dir}
 rm -rf ${DUMPDIR}/${dir}
 done

 mydumper -Cce -h ${HOST} -p ${PORT} -u ${USER} --password=${PASS} -o
 ${DEST} -l ${TIMEOUT}


 Then my Bacula fileset for the DB-backup job just backs up the entire
 /db-dumps directory.




-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Better than sec? Nothing is better than sec when it comes to
monitoring Big Data applications. Try Boundary one-second 
resolution app monitoring today. Free.
http://p.sf.net/sfu/Boundary-dev2dev
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula

Re: [Bacula-users] Catalog backup while job running?

2012-04-03 Thread Stephen Thompson


On 4/3/12 3:28 AM, Martin Simmons wrote:
 On Mon, 02 Apr 2012 15:06:31 -0700, Stephen Thompson said:

 That aside, I'm seeing something unexpected.  I am now able to
 successfully run jobs while I use mysqldump to dump the bacula Catalog,
 except at the very end of the dump there is some sort of contention.  A
 few of my jobs (3-4 out of 150) that are attempting to despool
 attritbutes at the tail end of the dump yield this error:

 Fatal error: sql_create.c:860 Fill File table Query failed: INSERT INTO
 File (FileIndex, JobId, PathId, FilenameId, LStat, MD5, DeltaSeq) SELECT
 batch.FileIndex, batch.JobId, Path.PathId,
 Filename.FilenameId,batch.LStat, batch.MD5, batch.DeltaSeq FROM batch
 JOIN Path ON (batch.Path = Path.Path) JOIN Filename ON (batch.Name =
 Filename.Name): ERR=Lock wait timeout exceeded; try restarting transaction

 I have successful jobs before and after this 'end of the dump' timeframe.

 It looks like I might be able to fix this by increasing my
 innodb_lock_wait_timeout, but I'd like to understand WHY I need to
 increase it.  Anyone know what's happening at the end of a dump like
 this that would cause the above error?

 mysqldump -f --opt --skip-lock-tables --single-transaction bacula > bacula.sql

 Is it the commit on this 'dump' transaction?

 --skip-lock-tables is referred to in the mysqldump documentation, but
 isn't actually a valid option.  This is actually an increasingly
 horrible problem with mysqldump.  It has been very poorly maintained,
 and has barely developed at all in ten or fifteen years.


 This has me confused.  I have jobs that can run, and insert records into
 the File table, while I am dumping the Catalog.  It's only at the
 tail-end that a few jobs get the error above.  Wouldn't a locked File
 table cause all concurrent jobs to fail?

 Are you sure that jobs are inserting records into the File table whilst they
 are running?  With spooling, file records are not inserted until the end of
 the job.

 Likewise, in batch mode (as above), the File table is only updated once at the
 end.


Yes, I have jobs that complete both before and after the problem jobs 
(which aren't always the same jobs, and don't always happen at the same 
time, except that they seem to correlate with the end of the Catalog 
dump, which could also be the end of the File table dump, since that 
table is 99% of the db).

I can view the inserted records from jobs that complete while the 
Catalog dump is running.  And I am spooling, so jobs are inserting all 
attrs at the end of the job.  The jobs with the errors are clearly 
moving their records from the batch file to the File table at the 
conclusion of their run.

I had never seen this before moving to InnoDB, but of course, I moved 
to InnoDB to be able to run my Catalog dump concurrently with jobs 
(knowing I won't capture the records from the running jobs).  So at this 
point, I'm not sure if I'm getting the error because of something 
happening at the end of the dump, or if it's merely a 'collision' of 
jobs all wanting to insert batch records at the same time.  I know that 
the InnoDB engine has a lock wait timeout default of 50s, but I'm not 
sure how this was handled with MyISAM, where I never saw this problem 
(but then again, I also never ran my jobs concurrently with the dump).
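
For reference, this is how I would check and raise that timeout; on 
MySQL 5.5+ the variable is dynamic, on older servers it has to go into 
my.cnf and needs a restart (the 300 below is just an example value):

   -- current value, in seconds (default 50)
   SHOW GLOBAL VARIABLES LIKE 'innodb_lock_wait_timeout';
   -- applies to connections opened after this statement
   SET GLOBAL innodb_lock_wait_timeout = 300;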

Stephen




 __Martin

 --
 Better than sec? Nothing is better than sec when it comes to
 monitoring Big Data applications. Try Boundary one-second
 resolution app monitoring today. Free.
 http://p.sf.net/sfu/Boundary-dev2dev
 ___
 Bacula-users mailing list
 Bacula-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/bacula-users

-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Better than sec? Nothing is better than sec when it comes to
monitoring Big Data applications. Try Boundary one-second 
resolution app monitoring today. Free.
http://p.sf.net/sfu/Boundary-dev2dev
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Catalog backup while job running?

2012-04-03 Thread Stephen Thompson


On 4/2/12 3:33 PM, Phil Stracchino wrote:
 On 04/02/2012 06:06 PM, Stephen Thompson wrote:


 First off, thanks for the response Phil.


 On 04/02/2012 01:11 PM, Phil Stracchino wrote:
 On 04/02/2012 01:49 PM, Stephen Thompson wrote:
 Well, we've made the leap from MyISAM to InnoDB, seems like we win on
 transactions, but lose on read speed.

 If you're finding InnoDB slower than MyISAM on reads, your InnoDB buffer
 pool is probably too small.

 This is probably true, but I have limited system resources and my File
 table is almost 300Gb large.

 Ah, well, sometimes there's only so much you can allocate.

 --skip-lock-tables is referred to in the mysqldump documentation, but
 isn't actually a valid option.  This is actually an increasingly
 horrible problem with mysqldump.  It has been very poorly maintained,
 and has barely developed at all in ten or fifteen years.


 This has me confused.  I have jobs that can run, and insert records into
 the File table, while I am dumping the Catalog.  It's only at the
 tail-end that a few jobs get the error above.  Wouldn't a locked File
 table cause all concurrent jobs to fail?

 Hmm.  I stand corrected.  I've never seen it listed as an option in the
 man page, despite there being one reference to it, but I see that
 mysqldump --help does explain it even though the man page doesn't.

 In that case, the only thing I can think of is that you have multiple
 jobs trying to insert attributes at the same time and the last ones in
 line are timing out.

 (Locking the table for batch attribute insertion actually isn't
 necessary; MySQL can be configured to interleave auto_increment inserts.
   However, that's the way Bacula does it.)

 Don't know that I have any helpful suggestions there, then...  sorry.




Thanks again for the response, just bouncing this issue off someone is 
of help.

Your idea about the jobs simply running into contention for locks sounds 
reasonable, though I never saw this happen with MyISAM (in the 3+ years 
we've run bacula), and I see it on the 2nd night of running InnoDB.
If so, I wouldn't mind estimating the maximum time my jobs might have to 
wait for a lock, based on their size and concurrency, but I really hate 
just tweaking settings in the DB without knowing why I'm doing so, you 
know.  I'd like to get to the bottom of what's causing the timeout.

thanks,
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Better than sec? Nothing is better than sec when it comes to
monitoring Big Data applications. Try Boundary one-second 
resolution app monitoring today. Free.
http://p.sf.net/sfu/Boundary-dev2dev
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Catalog backup while job running?

2012-04-03 Thread Stephen Thompson
On 04/03/2012 08:43 AM, Phil Stracchino wrote:

 Stephen, by the way, if you're not already aware of it:  You probably
 want to set innodb_flush_log_at_trx_commit = 0.

 The default value of this setting is 1, which causes the log buffer to
 be written out to the log file and the logfile flushed to disk at every
 transaction commit.  (Which obviously has a performance impact.)  With a
 setting of 0, nothing is done at transaction commit, but the log buffer
 is written to the log file and the log file flushed to disk once per
 second.  There is a potential with this setting that up to the last full
 second of transactions can be lost in the event of a mysqld crash, but
 ... if mysqld crashes in the middle of Bacula inserting attributes, that
 job is blown *anyway*, so there's really no loss.


This is an interesting suggestion.

I wonder whether, since I'm running the dump as a single transaction, 
my database is becoming unavailable during this flush, such that the 
50-second timeout on the locks the jobs are requesting is exceeded.  I 
would expect writes to a database to require more flushing than reads 
(i.e. a dump), but I wonder if this could explain the jobs failing at 
the tail-end of the dump.




 I also suggest innodb_autoinc_lock_mode = 2, which allows InnoDB to
 interleave auto_increment inserts.  This may possibly help with your
 locking problem.  Keep in mind though that if you use this setting and
 you have replication running, your binlog_format must be set to MIXED or
 ROW.
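
For reference, in my.cnf those two suggestions would look roughly like 
this (binlog_format only matters if binary logging/replication is in use):

   [mysqld]
   innodb_flush_log_at_trx_commit = 0   # flush the InnoDB log once per second
                                        # instead of at every commit
   innodb_autoinc_lock_mode = 2         # interleaved auto_increment inserts
   binlog_format = MIXED                # must be MIXED or ROW with lock mode 2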




-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Better than sec? Nothing is better than sec when it comes to
monitoring Big Data applications. Try Boundary one-second 
resolution app monitoring today. Free.
http://p.sf.net/sfu/Boundary-dev2dev
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Catalog backup while job running?

2012-04-05 Thread Stephen Thompson
On 04/02/2012 03:33 PM, Phil Stracchino wrote:
 On 04/02/2012 06:06 PM, Stephen Thompson wrote:


 First off, thanks for the response Phil.


 On 04/02/2012 01:11 PM, Phil Stracchino wrote:
 On 04/02/2012 01:49 PM, Stephen Thompson wrote:
 Well, we've made the leap from MyISAM to InnoDB, seems like we win on
 transactions, but lose on read speed.

 If you're finding InnoDB slower than MyISAM on reads, your InnoDB buffer
 pool is probably too small.

 This is probably true, but I have limited system resources and my File
 table is almost 300Gb large.

 Ah, well, sometimes there's only so much you can allocate.

 --skip-lock-tables is referred to in the mysqldump documentation, but
 isn't actually a valid option.  This is actually an increasingly
 horrible problem with mysqldump.  It has been very poorly maintained,
 and has barely developed at all in ten or fifteen years.


 This has me confused.  I have jobs that can run, and insert records into
 the File table, while I am dumping the Catalog.  It's only at the
 tail-end that a few jobs get the error above.  Wouldn't a locked File
 table cause all concurrent jobs to fail?

 Hmm.  I stand corrected.  I've never seen it listed as an option in the
 man page, despite there being one reference to it, but I see that
 mysqldump --help does explain it even though the man page doesn't.

 In that case, the only thing I can think of is that you have multiple
 jobs trying to insert attributes at the same time and the last ones in
 line are timing out.



This appears to be the root cause.  After running a few more nights, the 
coincidence with the Catalog dump was not maintained.  It happens for a 
few jobs each night, at different times, different jobs, and sometimes 
when no Catalog dump is occurring.

I think it's simply that a bunch of batch inserts wind up running at the 
same time and the last in line run out of time.  Rather than setting my 
timeout arbitrarily large (10 minutes did not solve the problem), I am 
curious about what you say below.

 (Locking the table for batch attribute insertion actually isn't
 necessary; MySQL can be configured to interleave auto_increment inserts.
   However, that's the way Bacula does it.)

Are you saying that if I turn on interleaved auto_increment inserts in 
MySQL, it won't matter whether or not bacula is asking for locks during 
batch inserts?  Or does bacula also need to be configured (patched) not 
to use locks during batch inserts?

And lastly, why does the bacula documentation claim that locks are 
'essential' for batch inserts while you claim they are not?

I'm surprised more folks running MySQL InnoDB and bacula aren't having 
this problem, since I stumbled upon it so easily.  :)  Perhaps the trend 
is MySQL MyISAM -> Postgres.



 Don't know that I have any helpful suggestions there, then...  sorry.




thanks!
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Better than sec? Nothing is better than sec when it comes to
monitoring Big Data applications. Try Boundary one-second 
resolution app monitoring today. Free.
http://p.sf.net/sfu/Boundary-dev2dev
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Bacula MySQL Catalog binlog restore

2012-04-05 Thread Stephen Thompson
On 04/05/2012 02:27 PM, Joe Nyland wrote:
 Hi,

 I've been using Bacula for a while now and I have a backup procedure in place 
 for my MySQL databases, where I perform a full (dump) backup nightly, then 
 incremental (bin log) backups every hour through the day to capture changes.

 I basically have a script which I have written which is run as a 
 'RunBeforeJob' from backup and runs either a mysqldump if the backup level is 
 full, or flushes the bin logs if the level is incremental.

 I'm in the process of performing some test restores from these backups, as I 
 would like to know the procedure is working correctly.

 I have no issue restoring the files from Bacula, however I'm having some 
 issues restoring my catalog MySQL database from the binary logs created by 
 MySQL. Specifically, I am getting messages like:

   ERROR 1146 (42S02) at line 105: Table 'bacula.batch' doesn't exist

 when I try to replay my log files against the database after it's been 
 restored from the dump file. As far as I know the batch table is a temporary 
 table created when inserting file attributes into the catalog during/after a 
 backup job. I would have hoped, however, the creation of this table would 
 have been included in either my database/earlier in my bin log.

 I believe this may be related to another thread on the list at the moment 
 titled Catalog backup while job running? as this is, in effect what I am 
 doing - a full database dump whilst other jobs are running, but my reason for 
 creating a new thread is that I am not getting any errors in my backup jobs, 
 as the OP of the other thread is - I'm simply having issues rebuilding my 
 database after restoring the said full dump.

 I would like to know if anyone is currently backing up their catalog database 
 in such a way, and if so how they are overcoming this issue when restoring. 
 My reason for backing up my catalog using binary logging is so that I can 
 perform a point-in-time recovery of the catalog, should I lose it.



I am not running a catalog backup in that way, but have thought about it.
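
The kind of level-switching RunBeforeJob Joe describes would look 
roughly like the sketch below (illustrative only, not his actual script; 
bacula can hand the level to the script via the %l substitution, e.g. 
ClientRunBeforeJob = "/path/db-before.sh %l"):

   #!/bin/bash
   # credentials and paths are placeholders
   LEVEL="$1"                      # "Full", "Incremental" or "Differential"
   if [ "$LEVEL" = "Full" ]; then
       # nightly full: dump everything and rotate the binlogs
       mysqldump --all-databases --single-transaction --flush-logs \
                 --master-data=2 > /var/backups/catalog-full.sql
   else
       # hourly incremental: close out the current binlog so the finished
       # ones can be picked up by the backup fileset
       mysql -e "FLUSH LOGS"
   fi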

You're correct that the batch tables are temporary tables created so 
that jobs can do batch inserts of the file attributes.

I did run into a similar problem to yours when I had a MySQL slave 
server out of sync with the master.  The slave (much like your restore) 
was reading through binlogs to catch up and ran into a line that 
referred to a batch table, which didn't exist.  In my case, it didn't 
exist because the slave never saw an earlier line that created the 
temporary batch table.

I would imagine something similar is going on with your restore, where 
you are not actually applying all the changes since the Full dump (or 
did not capture all the changes since the Full dump), because somewhere 
you should have a line in your binlogs that create the batch table 
before other lines refer to and try to use it.

Also, keep in mind that these temporary batch tables are owned by 
threads, so if you start looking through your binlogs, you'll see many 
references to bacula.batch, but they are not all referring to the same 
table.  Each thread is able to have its own bacula.batch table.
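
If you want to see which statements actually refer to the batch table, 
something along these lines can help (the binlog file name is a 
placeholder):

   mysqlbinlog mysql-bin.000123 | grep -n -i 'bacula.batch'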


Stephen







 Any input anyone can offer would be greatly appreciated.

 Thanks,

 Joe
 --
 Better than sec? Nothing is better than sec when it comes to
 monitoring Big Data applications. Try Boundary one-second
 resolution app monitoring today. Free.
 http://p.sf.net/sfu/Boundary-dev2dev
 ___
 Bacula-users mailing list
 Bacula-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/bacula-users


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Better than sec? Nothing is better than sec when it comes to
monitoring Big Data applications. Try Boundary one-second 
resolution app monitoring today. Free.
http://p.sf.net/sfu/Boundary-dev2dev
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Bacula MySQL Catalog binlog restore

2012-04-05 Thread Stephen Thompson
On 04/05/2012 03:19 PM, Joe Nyland wrote:
 On 5 Apr 2012, at 22:37, Stephen Thompson wrote:

 On 04/05/2012 02:27 PM, Joe Nyland wrote:
 Hi,

 I've been using Bacula for a while now and I have a backup procedure in 
 place for my MySQL databases, where I perform a full (dump) backup nightly, 
 then incremental (bin log) backups every hour through the day to capture 
 changes.

 I basically have a script which I have written which is run as a 
 'RunBeforeJob' from backup and runs either a mysqldump if the backup level 
 is full, or flushes the bin logs if the level is incremental.

 I'm in the process of performing some test restores from these backups, as 
 I would like to know the procedure is working correctly.

 I have no issue restoring the files from Bacula, however I'm having some 
 issues restoring my catalog MySQL database from the binary logs created by 
 MySQL. Specifically, I am getting messages like:

 ERROR 1146 (42S02) at line 105: Table 'bacula.batch' doesn't exist

 when I try to replay my log files against the database after it's been 
 restored from the dump file. As far as I know the batch table is a temporary 
 table created when inserting file attributes into the catalog during/after 
 a backup job. I would have hoped, however, the creation of this table would 
 have been included in either my database/earlier in my bin log.

 I believe this may be related to another thread on the list at the moment 
 titled Catalog backup while job running? as this is, in effect what I am 
 doing - a full database dump whilst other jobs are running, but my reason 
 for creating a new thread is that I am not getting any errors in my backup 
 jobs, as the OP of the other thread is - I'm simply having issues 
 rebuilding my database after restoring the said full dump.

 I would like to know if anyone is currently backing up their catalog 
 database in such a way, and if so how they are overcoming this issue when 
 restoring. My reason for backing up my catalog using binary logging is so 
 that I can perform a point-in-time recovery of the catalog, should I lose 
 it.



 I am not running a catalog backup in that way, but have thought about it.

 You're correct that the batch tables are temporary tables created so that 
 jobs can do batch inserts of the file attributes.

 I did run into a similar problem to yours when I had a MySQL slave server 
 out of sync with the master.  The slave (much like your restore) was reading 
 through binlogs to catch up and ran into a line that referred to a batch 
 table, which didn't exist.  In my case, it didn't exist because the slave 
 never saw an earlier line that created the temporary batch table.

 I would imagine something similar is going on with your restore, where you 
 are not actually applying all the changes since the Full dump (or did not 
 capture all the changes since the Full dump), because somewhere you should 
 have a line in your binlogs that create the batch table before other lines 
 refer to and try to use it.

 Also, keep in mind that these temporary batch tables are owned by threads, 
 so if you start looking through your binlogs, you'll see many references to 
 bacula.batch, but they are not all referring to the same table.  Each thread 
 is able to have its own bacula.batch table.


 Stephen


 Any input anyone can offer would be greatly appreciated.

 Thanks,

 Joe


 --
 Stephen Thompson   Berkeley Seismological Laboratory
 step...@seismo.berkeley.edu215 McCone Hall # 4760
 404.538.7077 (phone)   University of California, Berkeley
 510.643.5811 (fax) Berkeley, CA 94720-4760

 Hi Stephen,

 Thank you very much for your reply.

 I agree that it seems the creation of the batch table is not being captured, 
 for some reason.

 As I think it may be useful, here's the line taken from my MySQL 
 'RunBeforeJob' script when the full backup is taken:

   mysqldump --all-databases --single-transaction --delete-master-logs \
             --flush-logs --master-data --opt -u ${DBUSER} -p${DBPASS} \
             > ${DST}/${HOST}_${DATE}_${TIME}.sql.dmp

 Can you spot anything there which could cause the creation of this/these 
 temporary tables to not be included in the bin log? I've spent a while 
 getting this list of options right and I'm not 100% sure I've got the correct 
 combination, but it's possible I've missed something here.


Sorry, I don't think I can be much help here.  I'm wrangling with 
mysqldump myself at the moment since I moved from MyISAM tables to 
InnoDB and the documentation is very poor.

Are you using InnoDB?  If not, I'm not sure why --single-transaction 
is there, and if so, I wonder if it shouldn't come after --opt.  The 
order of options matters, and since --opt is the default, having it at 
the end of your line only resets anything you changed earlier in the 
line back to the --opt defaults.

Stephen




 Thanks,

 Joe

Re: [Bacula-users] Bacula MySQL Catalog binlog restore

2012-04-10 Thread Stephen Thompson
On 04/10/2012 07:51 AM, Joe Nyland wrote:
 -Original message-
 From: Joe Nyland j...@joenyland.co.uk
 Sent: Fri 06-04-2012 22:15
 Subject:  Re: [Bacula-users] Bacula MySQL Catalog binlog restore
 To:   Bacula Users bacula-users@lists.sourceforge.net;
 On 6 Apr 2012, at 00:08, Phil Stracchino wrote:

 On 04/05/2012 06:46 PM, Stephen Thompson wrote:
 On 04/05/2012 03:19 PM, Joe Nyland wrote:
 As I think it may be useful, here's the line taken from my MySQL
 'RunBeforeJob' script when the full backup is taken:

 mysqldump --all-databases --single-transaction --delete-master-logs
 --flush-logs --master-data --opt -u ${DBUSER} -p${DBPASS}
 ${DST}/${HOST}_${DATE}_${TIME}.sql.dmp

 Can you spot anything there which could cause the creation of
 this/these temporary tables to not be included in the bin log? I've
 spent a while getting this list of options right and I'm not 100%
 sure I've got the correct combination, but it's possible I've
 missed something here.


 Sorry, I don't think I can be much help here.  I'm wrangling with
 mysqldump myself at the moment since I moved from MyISAM tables to
 InnoDB and the documentation is very poor.

 Are you using InnoDB...  If not, I'm not sure why
 --single-transaction is there, and if so, I wonder if it shouldn't
 come after --opt.  The options order matter and since --opt is the
 default, having it at the end of your line is only resetting anything
 you change earlier in the line back to the --opt defaults.

 Since --opt is the default, there's no reason to ever explicitly specify
 it at all in the first place.

 And as we just discussed the other day, --single-transaction is
 ineffective without either --skip-lock-tables, or --skip-opt and adding
 back in the stuff from  --opt that you want.
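
Putting Phil's two points together, a dump line for an InnoDB catalog 
would look something like this (a sketch reusing Joe's variables, not a 
tested command; keep --flush-logs/--master-data if you need the binlog 
coordinates):

   mysqldump --single-transaction --skip-lock-tables --flush-logs \
             --master-data=2 --all-databases -u ${DBUSER} -p${DBPASS} \
             > ${DST}/${HOST}_${DATE}_${TIME}.sql.dmp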


 --
   Phil Stracchino, CDK#2 DoD#299792458 ICBM: 43.5607, -71.355
   ala...@caerllewys.net   ala...@metrocast.net   p...@co.ordinate.org
   Renaissance Man, Unix ronin, Perl hacker, SQL wrangler, Free Stater
  It's not the years, it's the mileage.

 Thank you all for you input.

 Following your advice, I've now changed my mysqldump line in my script to:

  mysqldump --all-databases -u ${DBUSER} -p${DBPASS} --flush-logs \
            --master-data=1 --delete-master-logs --opt \
            > ${DST}/${HOST}_${DATE}_${TIME}.sql.dmp

 Re-reading the mysqldump reference manual (yet again!) I'm starting to wonder
 whether the '--delete-master-logs' option is causing some important
 transactions to be lost from the binary logs, which is the reason why the
 temporary table creation statements mentioned above are missing from my log
 file. My theory is that during the dump of the database, the temporary tables
 are created, then the dump finishes and deletes the binary logs, therefore
 removing any log of the temporary tables being created in the first place. 
 Does
 that sound feasible?

 Thanks,

 Joe

 I'm a bit ashamed to admit I'm still battling this! I've removed 
 '--delete-master-logs' from my mysqldump line, but it hasn't helped.

 For some reason, it seems as if the dump does not contain any mention of the 
 temporary tables being created, and neither do the binary logs; however, there 
 are statements which refer to bacula.batch, as if it should be there.

 Could it be that these statements refer to a bacula.batch table which was 
 created by another thread prior to the mysql dump being created? ...and 
 that's why the CREATE TEMPORARY TABLE bacula.batch statement is not in the 
 binary logs after the full backup. Surely, if this were the case, the 
 bacula.batch table would be included in the dump, would it not?

 My fear is that because I am restoring binary logs, the binary log restores 
 will be running under their own threads (after the main dump file had been 
 restored) and thus will be unable to access temporary tables created by any 
 other previous threads - making what I am trying to achieve impossible.

 I know this is becoming a little OT as it's largely to do with mysqldump and 
 binary logging, but I hope someone can help.

 Any ideas how to overcome this?


I wonder if you're running the backup while other jobs are running?

If nothing else is running, then the dump shouldn't miss any of the temp 
tables, because there will be none during the dump.

If you run it concurrently, consider this:

Rather than blasting away your binlogs, keep them around for longer than 
the interval between your backups (i.e. keep them for at least 2 days if 
you dump every day).  Then back up ALL binlogs when you do the 
incremental.  If you then need to restore, you should be able to 
intentionally go back farther in time in the binlogs, to a point before 
the dump, and start replaying from there WITH errors temporarily 
disabled (or at least duplicate-entry errors).  This might/should let 
the import skip over stuff that the dump has already restored, but catch 
the stuff that it missed, like the temp tables.
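
Roughly, the replay could look like this (a sketch only: the binlog 
names and the start time are placeholders, the start time being any 
point that safely predates the dump, and -f/--force tells the mysql 
client to carry on past duplicate-entry errors):

   mysqlbinlog --start-datetime="2012-04-09 02:00:00" \
               mysql-bin.000123 mysql-bin.000124 mysql-bin.000125 \
               | mysql -u root -p -f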

Problem is you're likely to not know WHEN to start in the logs, though 
you

[Bacula-users] Fatal error: askdir.c:339 NULL Volume name. This shouldn't happen!!!

2012-05-24 Thread Stephen Thompson

hello,

Anyone run into this error before?

We hadn't until we upgraded our bacula server from Centos 5.8 to Redhat 
6.2, after which we of course had to recompile bacula.  However, we used 
the same source, version, and options, the exception being that we added 
readline for improved bconsole functionality.

Now every couple of days we have jobs error out like this:


21-May 20:04 SD JobId 236699: Fatal error: askdir.c:339 NULL Volume 
name. This shouldn't happen!!!


21-May 22:02 DIR JobId 236711: Fatal error: Catalog error updating Media 
record. sql_update.c:411 Update failed: affected_rows=0 for UPDATE Media 
SET 
VolJobs=0,VolFiles=0,VolBlocks=0,VolBytes=0,VolMounts=0,VolErrors=0,VolWrites=0,MaxVolBytes=0,VolStatus='',Slot=0,InChanger=0,VolReadTime=0,VolWriteTime=0,VolParts=0,LabelType=0,StorageId=0,PoolId=0,VolRetention=0,VolUseDuration=0,MaxVolJobs=0,MaxVolFiles=0,Enabled=0,LocationId=0,ScratchPoolId=0,RecyclePoolId=0,RecycleCount=0,Recycle=0,ActionOnPurge=0
 
WHERE VolumeName=''


23-May 22:02 SD JobId 237069: Fatal error: askdir.c:339 NULL Volume 
name. This shouldn't happen!!!


There is nothing new or strange about our volumes, nothing in DB with 
null values.

My only idea, which is sheer speculation, is that we have in the past had 
some strange behaviours around tape contention: with a set of jobs 
running against a storage daemon with two drives, a job assigned to one 
drive would want the tape that was in use by another job on the other 
drive.  That happened pretty rarely, though I was wondering if this 
might perhaps be a new outcome of that contention.  Again, sheer 
speculation, as I have nothing but the errors above to go on at the moment.


thanks!
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Config of bacula-dir.conf for two drive with one autochanger

2012-06-14 Thread Stephen Thompson
 Storage {
   Name = titi1
   Address = kraken        # N.B. Use a fully qualified name here
   SDPort = 9103
   Password = ***          # password for Storage daemon
   Device = Drive-1        # must be same as Device in Storage daemon -- ???
   Media Type = LTO-5      # must be same as MediaType in Storage daemon
   Autochanger = yes       # enable for Autochanger device
 }

 Storage {
   Name = titi2
   Address = kraken        # N.B. Use a fully qualified name here
   SDPort = 9103
   Password = ***          # password for Storage daemon
   Device = Drive-2        # must be same as Device in Storage daemon -- ???
   Media Type = LTO-5      # must be same as MediaType in Storage daemon
   Autochanger = yes       # enable for Autochanger device
 }
 [...]







 --
 Live Security Virtual Conference
 Exclusive live event will cover all the ways today's security and
 threat landscape has changed and how IT managers can respond. Discussions
 will include endpoint security, mobile security and the latest in malware
 threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/



 ___
 Bacula-users mailing list
 Bacula-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/bacula-users



-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Fatal error: askdir.c:339 NULL Volume name. This shouldn't happen!!!

2012-06-18 Thread Stephen Thompson


This (fingers crossed) may have been fixed with 5.2.9 which we upgraded 
to last week.  It hasn't quite been long enough for me to be convinced 
the problem won't return, but I'm hopeful.

Stephen



On 5/24/12 7:08 AM, Stephen Thompson wrote:

 hello,

 Anyone run into this error before?

 We hadn't until we upgraded our bacula server from Centos 5.8 to Redhat
 6.2, after which we of course had to recompile bacula.  However, we used
 the same source, version, and options, the exception being that we added
 readline for improved bconsole functionality.

 Now every couple of days we have jobs error out like this:


 21-May 20:04 SD JobId 236699: Fatal error: askdir.c:339 NULL Volume
 name. This shouldn't happen!!!


 21-May 22:02 DIR JobId 236711: Fatal error: Catalog error updating Media
 record. sql_update.c:411 Update failed: affected_rows=0 for UPDATE Media
 SET
 VolJobs=0,VolFiles=0,VolBlocks=0,VolBytes=0,VolMounts=0,VolErrors=0,VolWrites=0,MaxVolBytes=0,VolStatus='',Slot=0,InChanger=0,VolReadTime=0,VolWriteTime=0,VolParts=0,LabelType=0,StorageId=0,PoolId=0,VolRetention=0,VolUseDuration=0,MaxVolJobs=0,MaxVolFiles=0,Enabled=0,LocationId=0,ScratchPoolId=0,RecyclePoolId=0,RecycleCount=0,Recycle=0,ActionOnPurge=0
 WHERE VolumeName=''


 23-May 22:02 SD JobId 237069: Fatal error: askdir.c:339 NULL Volume
 name. This shouldn't happen!!!


 There is nothing new or strange about our volumes, nothing in DB with
 null values.

 My only idea, which is sheer speculation, is that we have in the past had
 some strange behaviours around tape contention, like a set of jobs
 running with a storage daemon with two drives, and a job being assigned
 to one drive will want the tape that is in use by another job on the
 other drive.   That happened pretty rarely, though I was wondering if
 this might perhaps be the new outcome of that contention.  Again, sheer
 speculation, as I have nothing but the errors above to go on at the moment.


 thanks!
 Stephen

-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Fatal error: askdir.c:339 NULL Volume name. This shouldn't happen!!!

2012-06-20 Thread Stephen Thompson


Well, since we upgraded to 5.2.9 we have not seen the problem.
Also, when running 5.2.6 we were seeing it 2-3 times a week, during 
which time we were running hundreds of incrementals and several fulls 
per day.  The error happened with both fulls and incrementals (which we 
have in two different LTO3 libraries).  There was nothing amiss with our 
catalog or volumes, or at least nothing obvious.  The error occurred 
when attempting to use various volumes (mostly previously used ones, 
including recycled), but those same volumes were successful for other 
jobs that attempted to use them.  Lastly, it wasn't reproducible; as I 
said, it happened 2-3 times out of several hundred jobs, but it kept 
happening over the course of a month or two while we ran 5.2.6 on RedHat 
6.2.

Here was our config for 5.2.6


PATH=/usr/lib64/qt4/bin:$PATH
BHOME=/home/bacula
EMAIL=bac...@seismo.berkeley.edu

env CFLAGS='-g -O2' \
 ./configure \
 --prefix=$BHOME \
 --sbindir=$BHOME/bin \
 --sysconfdir=$BHOME/conf \
 --with-working-dir=$BHOME/work \
 --with-bsrdir=$BHOME/log \
 --with-logdir=$BHOME/log \
 --with-pid-dir=/var/run \
 --with-subsys-dir=/var/run \
 --with-dump-email=$EMAIL \
 --with-job-email=$EMAIL \
 --with-mysql \
 --with-dir-user=bacula \
 --with-dir-group=bacula \
 --with-sd-user=bacula \
 --with-sd-group=bacula \
--with-openssl \
--with-tcp-wrappers \
 --enable-smartalloc \
 --with-readline=/usr/include/readline \
 --disable-conio \
 --enable-bat \
 | tee configure.out




On 6/20/12 7:23 AM, Igor Blazevic wrote:
 On 18.06.2012 16:26, Stephen Thompson wrote:


 hello,

 Hello:)


 Anyone run into this error before?

 We hadn't until we upgraded our bacula server from Centos 5.8 to Redhat
 6.2, after which we of course had to recompile bacula.  However, we used
 the same source, version, and options, the exception being that we added
 readline for improved bconsole functionality.

 Can you post your config options, please? I've compiled versions 5.0.3 and
 5.2.6 on RHEL 6.2 with the following options:

 CFLAGS=-g -Wall ./configure \
--sysconfdir=/etc/bacula \
--with-dir-user=bacula \
--with-dir-group=bacula \
--with-sd-user=bacula \
--with-sd-group=bacula \
--with-fd-user=root \
--with-fd-group=root \
--with-dir-password=somepasswd \
--with-fd-password=somepasswd \
--with-sd-password=somepasswd \
--with-mon-dir-password=somepasswd \
--with-mon-fd-password=somepasswd \
--with-mon-sd-password=somepasswd \
--with-working-dir=/var/lib/bacula \
--with-scriptdir=/etc/bacula/scripts \
--with-smtp-host=localhost \
--with-subsys-dir=/var/lib/bacula/lock/subsys \
--with-pid-dir=/var/lib/bacula/run \
--enable-largefile \
--disable-tray-monitor \
--enable-build-dird  \
--enable-build-stored \
--with-openssl \
--with-tcp-wrappers \
--with-python \
--enable-smartalloc \
--with-x \
--enable-bat \
--disable-libtool \
--with-postgresql \
--with-readline=/usr/include/readline \
--disable-conio

 and can attest that everything works just fine, although I have only used NEW
 volumes with it. Maybe there is something amiss with your catalog or
 volume media?





 --

 Igor Blažević



-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Fatal error: askdir.c:339 NULL Volume name. This shouldn't happen!!!

2012-07-05 Thread Stephen Thompson


Update.  We have seen the problem 2-3 times this past month running 
5.2.9 on Redhat 6.2; much less frequent than before, but still there.

Stephen



On 6/20/12 7:40 AM, Stephen Thompson wrote:


 Well, since we upgraded to 5.2.9 we have not seen the problem.
 Also when running 5.2.6 we were seeing it 2-3 times a week, during which
 we run hundreds of incrementals and several fulls per day.
 The error happened both with fulls and incrementals (which we have in
 two different LTO3 libraries).  There was nothing amiss with our catalog
 or volumes, or at least nothing obvious.  The error occurred when
 attempting to use different volumes (mostly previously used ones,
 including recycled), but those same volumes were successful for other
 jobs that attempted to use them.  Lastly, it wasn't reproducible; like I
 said, it happened 2-3 times out of several hundred jobs, but it was
 happening over the course of a month or two while we ran 5.2.6 on RedHat
 6.2.

 Here was our config for 5.2.6


 PATH=/usr/lib64/qt4/bin:$PATH
 BHOME=/home/bacula
 EMAIL=bac...@seismo.berkeley.edu

 env CFLAGS='-g -O2' \
   ./configure \
   --prefix=$BHOME \
   --sbindir=$BHOME/bin \
   --sysconfdir=$BHOME/conf \
   --with-working-dir=$BHOME/work \
   --with-bsrdir=$BHOME/log \
   --with-logdir=$BHOME/log \
   --with-pid-dir=/var/run \
   --with-subsys-dir=/var/run \
   --with-dump-email=$EMAIL \
   --with-job-email=$EMAIL \
   --with-mysql \
   --with-dir-user=bacula \
   --with-dir-group=bacula \
   --with-sd-user=bacula \
   --with-sd-group=bacula \
   --with-openssl \
   --with-tcp-wrappers \
   --enable-smartalloc \
   --with-readline=/usr/include/readline \
   --disable-conio \
   --enable-bat \
   | tee configure.out




 On 6/20/12 7:23 AM, Igor Blazevic wrote:
 On 18.06.2012 16:26, Stephen Thompson wrote:


 hello,

 Hello:)


 Anyone run into this error before?

 We hadn't until we upgraded our bacula server from Centos 5.8 to Redhat
 6.2, after which we of course had to recompile bacula.  However, we used
 the same source, version, and options, the exception being that we added
 readline for improved bconsole functionality.

 Can you post your config options, please? I've compiled versions 5.0.3 and
 5.2.6 on RHEL 6.2 with the following options:

 CFLAGS=-g -Wall ./configure \
 --sysconfdir=/etc/bacula \
 --with-dir-user=bacula \
 --with-dir-group=bacula \
 --with-sd-user=bacula \
 --with-sd-group=bacula \
 --with-fd-user=root \
 --with-fd-group=root \
 --with-dir-password=somepasswd \
 --with-fd-password=somepasswd \
 --with-sd-password=somepasswd \
 --with-mon-dir-password=somepasswd \
 --with-mon-fd-password=somepasswd \
 --with-mon-sd-password=somepasswd \
 --with-working-dir=/var/lib/bacula \
 --with-scriptdir=/etc/bacula/scripts \
 --with-smtp-host=localhost \
 --with-subsys-dir=/var/lib/bacula/lock/subsys \
 --with-pid-dir=/var/lib/bacula/run \
 --enable-largefile \
 --disable-tray-monitor \
 --enable-build-dird  \
 --enable-build-stored \
 --with-openssl \
 --with-tcp-wrappers \
 --with-python \
 --enable-smartalloc \
 --with-x \
 --enable-bat \
 --disable-libtool \
 --with-postgresql \
 --with-readline=/usr/include/readline \
 --disable-conio

 and can attest that everything works just fine, although I have only used NEW
 volumes with it. Maybe there is something amiss with your catalog or
 volume media?





 --

 Igor Blažević




-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


[Bacula-users] bacula jobs use volumes from the wrong pool - bug?

2012-07-05 Thread Stephen Thompson
: 3302 Autochanger loaded? drive 0, 
result: nothing loaded.
02-Jul 21:16 SD JobId 243957: 3304 Issuing autochanger load slot 7, 
drive 0 command.
02-Jul 21:16 SD JobId 243957: 3305 Autochanger load slot 7, drive 0, 
status is OK.
02-Jul 21:16 SD JobId 243957: Recycled volume FB0162 on device 
SL500-Drive-0 (/dev/SL500-Drive-0), all previous data lost.
02-Jul 21:16 SD JobId 243957: New volume FB0162 mounted on device 
SL500-Drive-0 (/dev/SL500-Drive-0) at 02-Jul-2012 21:16.
03-Jul 00:36 SD JobId 243957: End of Volume FB0162 at 295:4268 on 
device SL500-Drive-0 (/dev/SL500-Drive-0). Write of 262144 bytes got -1.
03-Jul 00:37 SD JobId 243957: Re-read of last block succeeded.
03-Jul 00:37 SD JobId 243957: End of medium on Volume FB0162 
Bytes=591,088,321,536 Blocks=2,254,823 at 03-Jul-2012 00:37.
03-Jul 00:37 SD JobId 243957: 3307 Issuing autochanger unload slot 7, 
drive 0 command.
03-Jul 00:38 DIR JobId 243957: Recycled volume FB0164
03-Jul 00:38 SD JobId 243957: 3301 Issuing autochanger loaded? drive 
0 command.
03-Jul 00:38 SD JobId 243957: 3302 Autochanger loaded? drive 0, 
result: nothing loaded.
03-Jul 00:38 SD JobId 243957: 3301 Issuing autochanger loaded? drive 
0 command.
03-Jul 00:38 SD JobId 243957: 3302 Autochanger loaded? drive 0, 
result: nothing loaded.
03-Jul 00:38 SD JobId 243957: 3304 Issuing autochanger load slot 9, 
drive 0 command.
03-Jul 00:39 SD JobId 243957: 3305 Autochanger load slot 9, drive 0, 
status is OK.
03-Jul 00:39 SD JobId 243957: Recycled volume FB0164 on device 
SL500-Drive-0 (/dev/SL500-Drive-0), all previous data lost.
03-Jul 00:39 SD JobId 243957: New volume FB0164 mounted on device 
SL500-Drive-0 (/dev/SL500-Drive-0) at 03-Jul-2012 00:39.
03-Jul 01:08 SD JobId 243957: Despooling elapsed time = 10:58:19, 
Transfer rate = 52.72 M Bytes/second
03-Jul 01:08 SD JobId 243957: Sending spooled attrs to the Director. 
Despooling 2,500,634,015 bytes ...
03-Jul 01:31 DIR JobId 243957: Bacula DIR 5.2.9 (11Jun12):

   Build OS:   x86_64-unknown-linux-gnu redhat Enterprise 
release
   JobId:  243957
   Job:JOB.2012-07-01_20.00.04_03
   Backup Level:   Full
   Client: FD 5.2.6 (21Feb12) 
i386-pc-solaris2.10,solaris,5.10
   FileSet:FS 2012-02-01 20:00:23
   Pool:   Full-Pool (From Job resource)
   Catalog:MyCatalog (From Client resource)
   Storage:SL500-changer (From Job resource)
   Scheduled time: 01-Jul-2012 20:00:04
   Start time: 01-Jul-2012 20:04:02
   End time:   03-Jul-2012 01:31:35
   Elapsed time:   1 day 5 hours 27 mins 33 secs
   Priority:   10
   FD Files Written:   6,915,330
   SD Files Written:   6,915,330
   FD Bytes Written:   2,080,113,652,089 (2.080 TB)
   SD Bytes Written:   2,081,613,855,190 (2.081 TB)
   Rate:   19613.9 KB/s
   Software Compression:   None
   VSS:no
   Encryption: no
   Accurate:   yes
   Volume name(s): IM0094|FB0161|FB0158|FB0162|FB0164
   Volume Session Id:  147
   Volume Session Time:1340291913
   Last Volume Bytes:  96,966,779,904 (96.96 GB)
   Non-fatal FD errors:0
   SD Errors:  0
   FD termination status:  OK
   SD termination status:  OK
   Termination:Backup OK

03-Jul 01:31 DIR JobId 243957: Begin pruning Jobs older than 10 years .
03-Jul 01:31 DIR JobId 243957: No Jobs found to prune.
03-Jul 01:31 DIR JobId 243957: Begin pruning Files.
03-Jul 01:31 DIR JobId 243957: No Files found to prune.
03-Jul 01:31 DIR JobId 243957: End auto prune


Note: FB0161|FB0158|FB0162|FB0164 are all in Full-Pool, whereas the 
first tape used in the job, IM0094, is in Incremental-Pool.


Anyone have ideas why it would be using a volume from a pool to which 
the job has not been associated?

thanks,
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760


--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] bacula jobs use volumes from the wrong pool - bug?

2012-07-05 Thread Stephen Thompson
On 07/05/2012 10:46 AM, Stephen Thompson wrote:


 Hello,

 Running 5.2.9, though I believe we've seen this sporadically in earlier
 versions.  Jobs are using volumes that are in pools to which they have
 not been assigned.

 This is likely a bug as I don't see anything peculiar about our
 configuration.  We are using a tape library with 2 drives, both set to
 autoselect. The library contains volumes that are properly assigned
 (i.e. database entries for volumes look fine) to various pools,
 including a Full pool and an Incremental pool.

 Twice in the past week, Full jobs which specify the use of the Full pool
 using jobdefs are using volumes from the Incremental pool.  I haven't
 narrowed down all the details, but I believe it's if the Incremental
 volume is already loaded in a drive when the Full job in question is
 launched.


 Example:

 01-Jul 22:00 DIR JobId 244098: Fatal error: JobId 243957 already
 running. Duplicate job not allowed.
 01-Jul 20:03 DIR JobId 243957: Start Backup JobId 243957,
 Job=JOB.2012-07-01_20.00.04_03
 01-Jul 20:04 DIR JobId 243957: Using Device SL500-Drive-0
 01-Jul 20:42 SD JobId 243957: Spooling data ...
 02-Jul 14:01 SD JobId 243957: Job write elapsed time = 17:19:00,
 Transfer rate = 33.39 M Bytes/second
 02-Jul 14:01 SD JobId 243957: Committing spooled data to Volume
 FB0161. Despooling 2,082,572,994,002 bytes ...
 02-Jul 15:38 SD JobId 243957: End of Volume IM0094 at 462:1559 on
 device SL500-Drive-0 (/dev/SL500-Drive-0). Write of 262144 bytes got -1.
 02-Jul 15:38 SD JobId 243957: Re-read of last block succeeded.
 02-Jul 15:38 SD JobId 243957: End of medium on Volume IM0094
 Bytes=924,097,352,704 Blocks=3,525,155 at 02-Jul-2012 15:38.
 02-Jul 15:38 SD JobId 243957: 3307 Issuing autochanger unload slot
 148, drive 0 command.
 02-Jul 15:39 SD JobId 243957: 3301 Issuing autochanger loaded? drive
 0 command.
 02-Jul 15:39 SD JobId 243957: 3302 Autochanger loaded? drive 0,
 result: nothing loaded.
 02-Jul 15:39 SD JobId 243957: 3301 Issuing autochanger loaded? drive
 0 command.
 02-Jul 15:39 SD JobId 243957: 3302 Autochanger loaded? drive 0,
 result: nothing loaded.
 02-Jul 15:39 SD JobId 243957: 3301 Issuing autochanger loaded? drive
 1 command.
 02-Jul 15:39 SD JobId 243957: 3302 Autochanger loaded? drive 1,
 result: nothing loaded.
 02-Jul 15:39 SD JobId 243957: 3304 Issuing autochanger load slot 6,
 drive 0 command.
 02-Jul 15:40 SD JobId 243957: 3305 Autochanger load slot 6, drive 0,
 status is OK.
 02-Jul 15:40 SD JobId 243957: Volume FB0161 previously written,
 moving to end of data.
 02-Jul 15:40 SD JobId 243957: Ready to append to end of Volume
 FB0161 at file=7.
 02-Jul 15:40 SD JobId 243957: New volume FB0161 mounted on device
 SL500-Drive-0 (/dev/SL500-Drive-0) at 02-Jul-2012 15:40.
 02-Jul 18:19 SD JobId 243957: End of Volume FB0161 at 274:5411 on
 device SL500-Drive-0 (/dev/SL500-Drive-0). Write of 262144 bytes got -1.
 02-Jul 18:19 SD JobId 243957: Re-read of last block succeeded.
 02-Jul 18:19 SD JobId 243957: End of medium on Volume FB0161
 Bytes=535,390,861,312 Blocks=2,042,367 at 02-Jul-2012 18:19.
 02-Jul 18:19 SD JobId 243957: 3307 Issuing autochanger unload slot 6,
 drive 0 command.
 02-Jul 18:20 SD JobId 243957: 3301 Issuing autochanger loaded? drive
 0 command.
 02-Jul 18:20 SD JobId 243957: 3302 Autochanger loaded? drive 0,
 result: nothing loaded.
 02-Jul 18:20 SD JobId 243957: 3301 Issuing autochanger loaded? drive
 0 command.
 02-Jul 18:20 SD JobId 243957: 3302 Autochanger loaded? drive 0,
 result: nothing loaded.
 02-Jul 18:20 SD JobId 243957: 3301 Issuing autochanger loaded? drive
 1 command.
 02-Jul 18:20 SD JobId 243957: 3302 Autochanger loaded? drive 1,
 result: nothing loaded.
 02-Jul 18:20 SD JobId 243957: 3304 Issuing autochanger load slot 4,
 drive 0 command.
 02-Jul 18:21 SD JobId 243957: 3305 Autochanger load slot 4, drive 0,
 status is OK.
 02-Jul 18:21 SD JobId 243957: Volume FB0158 previously written,
 moving to end of data.
 02-Jul 18:21 SD JobId 243957: Ready to append to end of Volume
 FB0158 at file=1.
 02-Jul 18:21 SD JobId 243957: New volume FB0158 mounted on device
 SL500-Drive-0 (/dev/SL500-Drive-0) at 02-Jul-2012 18:21.
 02-Jul 21:14 SD JobId 243957: End of Volume FB0158 at 274:1785 on
 device SL500-Drive-0 (/dev/SL500-Drive-0). Write of 262144 bytes got -1.
 02-Jul 21:14 SD JobId 243957: Re-read of last block succeeded.
 02-Jul 21:14 SD JobId 243957: End of medium on Volume FB0158
 Bytes=546,439,698,432 Blocks=2,084,507 at 02-Jul-2012 21:14.

 02-Jul 21:14 SD JobId 243957: 3307 Issui02-Jul 22:00 DIR JobId
 244271: Fatal error: JobId 244133 already running. Duplicate job not
 allowed.

This line is, by the way, exactly as it appears in the log file.  It 
looks a bit mangled, like two lines combined into one: 02-Jul 21:14 SD 
JobId 243957: 3307 Issui and 02-Jul 22:00 DIR JobId 244271: Fatal 
error: JobId 244133 already running. Duplicate job not allowed.


 02-Jul 22:00 DIR JobId 244268: Fatal error: JobId 243957 already
 running. Duplicate job

Re: [Bacula-users] bacula jobs use volumes from the wrong pool - bug?

2012-07-05 Thread Stephen Thompson


Hello again,

Here's something even stranger...  Another Full job logs that it's 
written to a volume in the Full pool (FB0956), but then the status 
output of the job lists a volume in the Incremental pool (IM0093).  This 
Incremental volume was never even mentioned in the log as a volume to 
which the job despooled.


22-Jun 20:00 DIR JobId 242323: Start Backup JobId 242323, 
Job=JOB.2012-06-22_20.00.02_06
22-Jun 20:01 DIR JobId 242323: Using Device SL500-Drive-1
22-Jun 20:06 SD JobId 242323: 3301 Issuing autochanger loaded? drive 
1 command.
22-Jun 20:06 SD JobId 242323: 3302 Autochanger loaded? drive 1, 
result: nothing loaded.
22-Jun 20:06 SD JobId 242323: 3301 Issuing autochanger loaded? drive 
1 command.
22-Jun 20:06 SD JobId 242323: 3302 Autochanger loaded? drive 1, 
result: nothing loaded.
22-Jun 20:06 SD JobId 242323: 3304 Issuing autochanger load slot 138, 
drive 1 command.
22-Jun 20:07 SD JobId 242323: 3305 Autochanger load slot 138, drive 
1, status is OK.
22-Jun 20:07 SD JobId 242323: Volume FB0956 previously written, 
moving to end of data.
22-Jun 20:08 SD JobId 242323: Ready to append to end of Volume 
FB0956 at file=4.
22-Jun 20:08 SD JobId 242323: Spooling data ...
23-Jun 00:01 SD JobId 242323: Job write elapsed time = 03:53:01, 
Transfer rate = 10.80 M Bytes/second
23-Jun 00:01 SD JobId 242323: Committing spooled data to Volume 
FB0956. Despooling 151,089,481,092 bytes ...
23-Jun 01:28 SD JobId 242323: Despooling elapsed time = 01:27:07, 
Transfer rate = 28.90 M Bytes/second
23-Jun 01:28 SD JobId 242323: Sending spooled attrs to the Director. 
Despooling 99,242,108 bytes ...
23-Jun 01:30 DIR JobId 242323: Bacula DIR 5.2.9 (11Jun12):

   Build OS:   x86_64-unknown-linux-gnu redhat Enterprise 
release
   JobId:  242323
   Job:JOB.2012-06-22_20.00.02_06
   Backup Level:   Full
   Client: FD 5.2.6 (21Feb12) 
x86_64-unknown-linux-gnu,redhat,
   FileSet:FS 2012-01-22 20:00:03
   Pool:   Full-Pool-2012-06 (From Job resource)
   Catalog:MyCatalog (From Client resource)
   Storage:SL500-changer (From Job resource)
   Scheduled time: 22-Jun-2012 20:00:02
   Start time: 22-Jun-2012 20:01:29
   End time:   23-Jun-2012 01:30:15
   Elapsed time:   5 hours 28 mins 46 secs
   Priority:   10
   FD Files Written:   337,622
   SD Files Written:   337,622
   FD Bytes Written:   150,974,593,977 (150.9 GB)
   SD Bytes Written:   151,023,845,596 (151.0 GB)
   Rate:   7653.6 KB/s
   Software Compression:   None
   VSS:no
   Encryption: no
   Accurate:   yes
   Volume name(s): IM0093
   Volume Session Id:  9
   Volume Session Time:1340291913
   Last Volume Bytes:  723,076,936,704 (723.0 GB)
   Non-fatal FD errors:0
   SD Errors:  0
   FD termination status:  OK
   SD termination status:  OK
   Termination:Backup OK

23-Jun 01:30 DIR JobId 242323: Begin pruning Jobs older than 10 years .
23-Jun 01:30 DIR JobId 242323: No Jobs found to prune.
23-Jun 01:30 DIR JobId 242323: Begin pruning Files.
23-Jun 01:30 DIR JobId 242323: No Files found to prune.
23-Jun 01:30 DIR JobId 242323: End auto prune.



Note:  FB0956 is in the Full pool, IM0093 in the Incremental.


The vast majority of our jobs are successful, but when something 
like this happens, I lose all faith that I even have the backups I think 
I have!!!

I just tried a test restore of this particular job, and it did in fact 
use IM0093 to restore from.  Ugh.

Stephen





On 07/05/2012 10:46 AM, Stephen Thompson wrote:


 Hello,

 Running 5.2.9, though I believe we've seen this sporadically in earlier
 versions.  Jobs are using volumes that are in pools to which they have
 not been assigned.

 This is likely a bug as I don't see anything peculiar about our
 configuration.  We are using a tape library with 2 drives, both set to
 autoselect. The library contains volumes that are properly assigned
 (i.e. database entries for volumes look fine) to various pools,
 including a Full pool and an Incremental pool.

 Twice in the past week, Full jobs which specify the use of the Full pool
 using jobdefs are using volumes from the Incremental pool.  I haven't
 narrowed down all the details, but I believe it's if the Incremental
 volume is already loaded in a drive when the Full job in question is
 launched.


 Example:

 01-Jul 22:00 DIR JobId 244098: Fatal error: JobId 243957 already
 running. Duplicate job not allowed.
 01-Jul 20:03 DIR JobId 243957: Start Backup JobId 243957,
 Job=JOB.2012-07-01_20.00.04_03
 01-Jul 20:04 DIR JobId 243957: Using Device SL500-Drive-0
 01-Jul 20:42 SD JobId 243957: Spooling data ...
 02-Jul 14:01 SD JobId 243957: Job write elapsed time = 17:19:00,
 Transfer rate = 33.39 M Bytes/second
 02-Jul 14:01 SD JobId 243957

Re: [Bacula-users] bacula jobs use volumes from the wrong pool - bug?

2012-07-06 Thread Stephen Thompson
On 07/06/2012 11:01 AM, Martin Simmons wrote:
 On Thu, 05 Jul 2012 11:35:15 -0700, Stephen Thompson said:

 Hello again,

 Here's something even stranger...  Another Full job logs that it's
 written to a volume in the Full pool (FB0956), but then the status
 output of the job lists a volume in the Incremental pool (IM0093).  This
 Incremental volume was never even mentioned in the log as a volume to
 which the job despooled.

 It could be a database problem (the volumes listed in the status output come
 from a query).  What is the output of the sql commands below?

 SELECT VolumeName,JobMedia.* FROM JobMedia,Media WHERE JobMedia.JobId=242323 
 AND JobMedia.MediaId=Media.MediaId;

 SELECT MediaId,VolumeName FROM Media WHERE Media.VolumeName in 
 ('IM0093','FB0956');


Looks like it did in fact write to the Incremental tape IM0093 instead 
of the requested Full tape, BUT logged that it wrote to the Full tape 
FB0956.  This raises two questions: 1) Why is it writing to a tape in 
another pool? and 2) Why is it logging that it wrote to a different tape 
than it actually did?
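
For reference, a quick way to double-check which pool the catalog 
currently thinks each of these volumes belongs to is a join against the 
Pool table; a minimal sketch assuming the standard Bacula MySQL schema:

mysql> SELECT VolumeName, Pool.Name AS PoolName
    ->   FROM Media JOIN Pool ON Media.PoolId = Pool.PoolId
    ->  WHERE VolumeName IN ('IM0093','FB0956');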




mysql> SELECT VolumeName,JobMedia.* FROM JobMedia,Media WHERE 
JobMedia.JobId=242323 AND JobMedia.MediaId=Media.MediaId;
+------------+------------+--------+---------+------------+-----------+-----------+---------+------------+----------+----------+
| VolumeName | JobMediaId | JobId  | MediaId | FirstIndex | LastIndex | StartFile | EndFile | StartBlock | EndBlock | VolIndex |
+------------+------------+--------+---------+------------+-----------+-----------+---------+------------+----------+----------+
| IM0093     |    1946327 | 242323 |    1072 |          1 |       429 |       285 |     285 |       4851 |     7628 |        1 |
| IM0093     |    1946330 | 242323 |    1072 |        429 |       435 |       286 |     286 |          0 |     7628 |        2 |
| IM0093     |    1946332 | 242323 |    1072 |        435 |       438 |       287 |     287 |          0 |     7628 |        3 |
| IM0093     |    1946334 | 242323 |    1072 |        438 |       446 |       288 |     288 |          0 |     7628 |        4 |
| IM0093     |    1946338 | 242323 |    1072 |        446 |       446 |       289 |     289 |          0 |     7628 |        5 |
| IM0093     |    1946344 | 242323 |    1072 |        446 |       484 |       290 |     290 |          0 |     7628 |        6 |
| IM0093     |    1946347 | 242323 |    1072 |        484 |       727 |       291 |     291 |          0 |     7628 |        7 |
| IM0093     |    1946351 | 242323 |    1072 |        727 |       727 |       292 |     292 |          0 |     7628 |        8 |
| IM0093     |    1946357 | 242323 |    1072 |        727 |      6237 |       293 |     293 |          0 |     7628 |        9 |
| IM0093     |    1946360 | 242323 |    1072 |       6237 |      9134 |       294 |     294 |          0 |     7628 |       10 |
| IM0093     |    1946363 | 242323 |    1072 |       9134 |     12816 |       295 |     295 |          0 |     7628 |       11 |
| IM0093     |    1946368 | 242323 |    1072 |      12816 |     12950 |       296 |     296 |          0 |     7628 |       12 |
| IM0093     |    1946371 | 242323 |    1072 |      12950 |     12985 |       297 |     297 |          0 |     7628 |       13 |
| IM0093     |    1946374 | 242323 |    1072 |      12985 |     13140 |       298 |     298 |          0 |     7628 |       14 |
| IM0093     |    1946378 | 242323 |    1072 |      13140 |     13181 |       299 |     299 |          0 |     7628 |       15 |
| IM0093     |    1946383 | 242323 |    1072 |      13181 |     13283 |       300 |     300 |          0 |     7628 |       16 |
| IM0093     |    1946386 | 242323 |    1072 |      13283 |     19855 |       301 |     301 |          0 |     7628 |       17 |
| IM0093     |    1946390 | 242323 |    1072 |      19855 |     26710 |       302 |     302 |          0 |     7628 |       18 |
| IM0093     |    1946391 | 242323 |    1072 |      26710 |     33532 |       303 |     303 |          0 |     7628 |       19 |
| IM0093     |    1946394 | 242323 |    1072 |      33532 |     40378 |       304 |     304 |          0 |     7628 |       20 |
| IM0093     |    1946397 | 242323 |    1072 |      40378 |     47275 |       305 |     305 |          0 |     7628 |       21 |
| IM0093     |    1946400 | 242323 |    1072 |      47275 |     54271 |       306 |     306 |          0 |     7628 |       22 |
| IM0093     |    1946403 | 242323 |    1072 |      54271 |     58872 |       307 |     307 |          0 |     7628 |       23 |
| IM0093     |    1946406 | 242323 |    1072 |      58872 |     58872 |       308 |     308 |          0 |     7628 |       24 |
| IM0093     |    1946409 | 242323 |    1072 |      58872 |     58873 |       309 |     309 |          0 |     7628 |       25 |
| IM0093     |    1946412 | 242323 |    1072 |      58873 |     58873 |       310 |     310 |          0 |     7628 |       26 |
| IM0093

Re: [Bacula-users] bacula jobs use volumes from the wrong pool - bug?

2012-07-09 Thread Stephen Thompson
On 07/09/12 11:37, Martin Simmons wrote:
 On Fri, 06 Jul 2012 11:12:35 -0700, Stephen Thompson said:

 On 07/06/2012 11:01 AM, Martin Simmons wrote:
 On Thu, 05 Jul 2012 11:35:15 -0700, Stephen Thompson said:

 Hello again,

 Here's something even stranger...  Another Full job logs that it's
 written to a volume in the Full pool (FB0956), but then the status
 output of the job lists a volume in the Incremental pool (IM0093).  This
 Incremental volume was never even mentioned in the log as a volume to
 which the job despooled.

 It could be a database problem (the volumes listed in the status output come
 from a query).  What is the output of the sql commands below?

 SELECT VolumeName,JobMedia.* FROM JobMedia,Media WHERE 
 JobMedia.JobId=242323 AND JobMedia.MediaId=Media.MediaId;

 SELECT MediaId,VolumeName FROM Media WHERE Media.VolumeName in 
 ('IM0093','FB0956');


 Looks like it did in fact write to the Incremental tape IM0093 instead
 of the requested Full tape BUT logged that it wrote to a Full tape
 FB0956.  This begs the questions 1) Why is it writing to a tape in
 another pool? and 2) Why is logging that it wrote to a different tape
 than it did?

 You could verify that IM0093 contains the data by using bls -j with the tape
 loaded (but not mounted in Bacula).

 It looks like you have concurrent jobs (non-consecutive JobMediaId values).
 Was another job trying to use IM0093?  Maybe IM0093 was in another drive and
 Bacula mixed up the drives somehow?


Yes, I believe that FB0956 was in one drive and IM0093 in the other, 
though I do not understand how bacula 'mixed up' which volume to use, or 
which drive a particular volume was in.

Not sure how closely related this is, but I've seen cases occasionally 
where bacula will say that it cannot mount a certain volume in Drive0 
and requires user intervention, only to find that the volume requested 
is already mounted and in use in Drive1 by other jobs.  So it is 
possible for bacula either to lose track of which drive a volume is in 
or to not be sure if a volume is already in use.

I did a partial restore of the job and it did in fact load and read off 
IM0093 successfully.  So in some sense I know what happened, I just 
don't know why it happened or how to prevent it (other than isolating 
jobs, but that defeats the point of concurrency).
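
For reference, the offline check Martin suggested would look something 
like the following, run while the drive is idle in Bacula; the config 
path and device name here are assumptions based on our setup, not 
verified values:

  # list the job records present on volume IM0093 straight off the tape
  bls -j -c /home/bacula/conf/bacula-sd.conf -V IM0093 SL500-Drive-1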

Stephen




 __Martin

 --
 Live Security Virtual Conference
 Exclusive live event will cover all the ways today's security and
 threat landscape has changed and how IT managers can respond. Discussions
 will include endpoint security, mobile security and the latest in malware
 threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
 ___
 Bacula-users mailing list
 Bacula-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/bacula-users



-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] bacula jobs use volumes from the wrong pool - bug?

2012-07-10 Thread Stephen Thompson
On 07/10/2012 10:53 AM, Martin Simmons wrote:
 On Mon, 09 Jul 2012 12:55:14 -0700, Stephen Thompson said:

 On 07/09/12 11:37, Martin Simmons wrote:
 On Fri, 06 Jul 2012 11:12:35 -0700, Stephen Thompson said:

 On 07/06/2012 11:01 AM, Martin Simmons wrote:
 On Thu, 05 Jul 2012 11:35:15 -0700, Stephen Thompson said:

 Hello again,

 Here's something even stranger...  Another Full job logs that it's
 written to a volume in the Full pool (FB0956), but then the status
 output of the job lists a volume in the Incremental pool (IM0093).  This
 Incremental volume was never even mentioned in the log as a volume to
 which the job despooled.

 It could be a database problem (the volumes listed in the status output 
 come
 from a query).  What is the output of the sql commands below?

 SELECT VolumeName,JobMedia.* FROM JobMedia,Media WHERE 
 JobMedia.JobId=242323 AND JobMedia.MediaId=Media.MediaId;

 SELECT MediaId,VolumeName FROM Media WHERE Media.VolumeName in 
 ('IM0093','FB0956');


 Looks like it did in fact write to the Incremental tape IM0093 instead
 of the requested Full tape BUT logged that it wrote to a Full tape
 FB0956.  This begs the questions 1) Why is it writing to a tape in
 another pool? and 2) Why is logging that it wrote to a different tape
 than it did?

 You could verify that IM0093 contains the data by using bls -j with the tape
 loaded (but not mounted in Bacula).

 It looks like you have concurrent jobs (non-consecutive JobMediaId values).
 Was another job trying to use IM0093?  Maybe IM0093 was in another drive and
 Bacula mixed up the drives somehow?


 Yes, I believe that FB0956 was in one drive and IM0093 in the other,
 though I do not understand how bacula 'mixed up' which volume to use, or
 which drive a particular volume was in.

 Not sure how closely related this is, but I've seen cases occasionally
 where bacula will say that it cannot mount a certain volume in Drive0
 and requires user intervention, only to find that the volume requested
 is already mounted and in use in Drive1 by other jobs.  So it is
 possible for bacula either to lose track of which drive a volume is in
 or to not be sure if a volume is already in use.

 I did a partial restore of the job and it did in fact load and read off
 IM0093 successfully.  So in some sense I know what happened, I just
 don't know why it happened or how to prevent it (other than isolating
 jobs, but that defeats the point of concurrency).

 You could try upgrading to 5.2.10.  If that doesn't fix it, then reporting it
 in the bug tracker might be the next step
 (http://www.bacula.org/en/?page=bugs).


Already upgraded.  We'll see if it happens again.

thanks,
Stephen



 __Martin

 --
 Live Security Virtual Conference
 Exclusive live event will cover all the ways today's security and
 threat landscape has changed and how IT managers can respond. Discussions
 will include endpoint security, mobile security and the latest in malware
 threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
 ___
 Bacula-users mailing list
 Bacula-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/bacula-users



-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Fatal error: askdir.c:339 NULL Volume name. This shouldn't happen!!!

2012-07-15 Thread Stephen Thompson


Update.  We are still seeing this in 5.2.10 as well.
It seems to happen more often towards the beginning of a series of jobs, 
when a tape is first chosen (i.e. not when a job is directly using a 
tape that's already been chosen and loaded into a drive by a previous job).

Stephen



On 7/5/12 7:44 AM, Stephen Thompson wrote:


 Update.  We have seen the problem 2-3 times this past month running
 5.2.9 on Redhat 6.2, much less frequent than before but still there.

 Stephen



 On 6/20/12 7:40 AM, Stephen Thompson wrote:


 Well, since we upgraded to 5.2.9 we have not seen the problem.
 Also when running 5.2.6 we were seeing it 2-3 times a week, during which
 we run hundreds of incrementals and several fulls per day.
 The error happened both with fulls and incrementals (which we have in
 two different LTO3 libraries).  There was nothing amiss with our catalog
 or volumes, or at least nothing obvious.  The error occurred when
 attempting to use different volumes (mostly previously used ones,
 including recycled), but those same volume were successful for other
 jobs that attempted to use them.  Lastly, it wasn't reproducible, like I
 said it happened 2-3 time out of several hundred jobs, but it was
 happening over the course of a month or two while we ran 5.2.6 on RedHat
 6.2.

 Here was our config for 5.2.6


 PATH=/usr/lib64/qt4/bin:$PATH
 BHOME=/home/bacula
 EMAIL=bac...@seismo.berkeley.edu

 env CFLAGS='-g -O2' \
./configure \
--prefix=$BHOME \
--sbindir=$BHOME/bin \
--sysconfdir=$BHOME/conf \
--with-working-dir=$BHOME/work \
--with-bsrdir=$BHOME/log \
--with-logdir=$BHOME/log \
--with-pid-dir=/var/run \
--with-subsys-dir=/var/run \
--with-dump-email=$EMAIL \
--with-job-email=$EMAIL \
--with-mysql \
--with-dir-user=bacula \
--with-dir-group=bacula \
--with-sd-user=bacula \
--with-sd-group=bacula \
  --with-openssl \
  --with-tcp-wrappers \
--enable-smartalloc \
--with-readline=/usr/include/readline \
--disable-conio \
--enable-bat \
| tee configure.out




 On 6/20/12 7:23 AM, Igor Blazevic wrote:
 On 18.06.2012 16:26, Stephen Thompson wrote:


 hello,

 Hello:)


 Anyone run into this error before?

 We hadn't until we upgraded our bacula server from Centos 5.8 to Redhat
 6.2, after which we of course had to recompile bacula.  However, we used
 the same source, version, and options, the exception being that we added
 readline for improved bconsole functionality.

 Can you post your config options, please? I've compiled versions 5.0.3 and
 5.2.6 on RHEL 6.2 with following options:

 CFLAGS="-g -Wall" ./configure \
  --sysconfdir=/etc/bacula \
  --with-dir-user=bacula \
  --with-dir-group=bacula \
  --with-sd-user=bacula \
  --with-sd-group=bacula \
  --with-fd-user=root \
  --with-fd-group=root \
  --with-dir-password=somepasswd \
  --with-fd-password=somepasswd \
  --with-sd-password=somepasswd \
  --with-mon-dir-password=somepasswd \
  --with-mon-fd-password=somepasswd \
  --with-mon-sd-password=somepasswd \
  --with-working-dir=/var/lib/bacula \
  --with-scriptdir=/etc/bacula/scripts \
  --with-smtp-host=localhost \
  --with-subsys-dir=/var/lib/bacula/lock/subsys \
  --with-pid-dir=/var/lib/bacula/run \
  --enable-largefile \
  --disable-tray-monitor \
  --enable-build-dird  \
  --enable-build-stored \
  --with-openssl \
  --with-tcp-wrappers \
  --with-python \
  --enable-smartalloc \
  --with-x \
  --enable-bat \
  --disable-libtool \
  --with-postgresql \
  --with-readline=/usr/include/readline \
  --disable-conio

 and can attest that everything works just fine, although I only used NEW
 volumes with it. Maybe there is something amiss with your catalog or
 volume media?





 --

 Igor Blažević





-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


[Bacula-users] bacula confused about volumes

2012-07-25 Thread Stephen Thompson

Hey all,

I've been meaning to post about this for a while, but it comes up pretty 
rarely (maybe once every few months, running hundreds of jobs a night).

With an autochanger with 2 drives, each set to AutoSelect, it's possible 
for bacula to want the same volume in both drives at the same time, 
which creates an Operator Intervention situation.

Here's an example where previous jobs were apparently using a particular 
volume in one drive and somehow jobs assigned to the other drive wanted 
the exact same volume, causing them to pause and require operator 
intervention.


sd_C4 Version: 5.2.10 (28 June 2012) x86_64-unknown-linux-gnu redhat 
Enterprise release
Daemon started 23-Jul-12 10:13. Jobs: run=295, running=3.
  Heap: heap=135,168 smbytes=2,089,365 max_bytes=3,689,580 bufs=299 
max_bufs=396
  Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8 mode=0,0

Running Jobs:
Writing: Incremental Backup job AAA JobId=247971 Volume=IM0081
 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0)
 spooling=0 despooling=0 despool_wait=0
 Files=0 Bytes=0 Bytes/sec=0
 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=9
Writing: Incremental Backup job BBB JobId=247973 Volume=IM0081
 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0)
 spooling=0 despooling=0 despool_wait=0
 Files=0 Bytes=0 Bytes/sec=0
 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=13
Writing: Incremental Backup job CCC JobId=247975 Volume=IM0081
 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0)
 spooling=0 despooling=0 despool_wait=0
 Files=0 Bytes=0 Bytes/sec=0
 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=15
Writing: Incremental Backup job DDD JobId=247976 Volume=IM0081
 pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0)
 spooling=0 despooling=0 despool_wait=0
 Files=0 Bytes=0 Bytes/sec=0
 FDReadSeqNo=6 in_msg=6 out_msg=4 fd=18


Jobs waiting to reserve a drive:


Terminated Jobs:
  JobId  LevelFiles  Bytes   Status   FinishedName
===
XXX


Device status:
Autochanger C4-changer with devices:
C4-Drive-0 (/dev/C4-Drive-0)
C4-Drive-1 (/dev/C4-Drive-1)
Device C4-Drive-0 (/dev/C4-Drive-0) is not open.
 Device is BLOCKED waiting for mount of volume IM0081,
Pool:Incremental-Pool
Media type:  LTO-3
 Drive 0 is not loaded.
Device C4-Drive-1 (/dev/C4-Drive-1) is mounted with:
 Volume:  IM0081
 Pool:Incremental-Pool
 Media type:  LTO-3
 Slot 32 is loaded in drive 1.
 Total Bytes=369,270,534,144 Blocks=1,408,808 Bytes/block=262,115
 Positioned at File=203 Block=0


Used Volume status:
IM0070 on device C4-Drive-1 (/dev/C4-Drive-1)
 Reader=0 writers=0 devres=0 volinuse=0
IM0081 on device C4-Drive-0 (/dev/C4-Drive-0)
 Reader=0 writers=0 devres=4 volinuse=0




Anyone else have this happen?
Race condition?

thanks,
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Long running jobs and BackupCatalog

2012-08-02 Thread Stephen Thompson


The enterprise version may have a pause feature, but the open-source one 
does not.

We run a slave database server and make a daily dump from that, knowing 
that it will not preserve the records being made for running jobs, but 
since the running jobs aren't complete when the dump begins, they 
wouldn't be useful records to have anyway (and we're willing to be 
behind by a day on our backups if a disaster were to occur).
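
Roughly speaking, the slave-side dump is nothing more exotic than a 
nightly mysqldump of the catalog; a minimal sketch, with the database 
name, credentials file, and output path as placeholders rather than our 
exact setup:

  # dump the catalog on the replication slave so the master database
  # the director writes to is never tied up by the dump
  mysqldump --defaults-extra-file=/etc/bacula/my-dump.cnf \
      --single-transaction --quick bacula \
      | gzip > /backup/catalog/bacula-$(date +%F).sql.gz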

It's also possible to run a transactional engine on your master db and 
do a dump while jobs are running, but we found the dump times to be 
ridiculously high (like 12+ hours).  Our Catalog is something like 300Gb.

There are other options out there as well, like using a snapshot of your 
underlying filesystem, but, yeah, a pause feature sure would be nice for 
many many reasons.

Stephen




On 8/2/12 6:36 AM, Clark, Patricia A. wrote:
 Because I have quite a few long running jobs, my BackupCatalog job is not 
 getting run more than once or twice per week.  I understand the potential 
 instability of backing up the catalog while there are running jobs.  Is there 
 anything in the bacula pipeline that would pause running jobs so that the 
 catalog could be backed up?  Say a snapshot capability?

 Patti Clark
 Information International Associates, Inc.
 Linux Administrator and subcontractor to:
 Research and Development Systems Support Oak Ridge National Laboratory


 --
 Live Security Virtual Conference
 Exclusive live event will cover all the ways today's security and
 threat landscape has changed and how IT managers can respond. Discussions
 will include endpoint security, mobile security and the latest in malware
 threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
 ___
 Bacula-users mailing list
 Bacula-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/bacula-users


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] bacula confused about volumes

2012-08-05 Thread Stephen Thompson


We're seeing this a lot more frequently, though we've changed no 
configuration.  Jobs are often left waiting for an entire run in order to 
use a volume that's in use by the other drive within a 2-drive changer.

Stephen



On 7/25/12 7:38 AM, Stephen Thompson wrote:

 Hey all,

 I've been meaning to post about this for awhile, but it comes up pretty
 rarely (maybe once every few months running hundreds of job a night).

 With an autochanger with 2 drives, each set to AutoSelect, it's possible
 for bacula to want the same volume in both drives at the same time,
 which creates an Operator Intervention situation.

 Here's an example where apparently previous jobs were using a particular
 volume in one drive and somehow jobs assigned to the other drives wanted
 the exact same volume, causing them to pause and require operator
 intervention.


 sd_C4 Version: 5.2.10 (28 June 2012) x86_64-unknown-linux-gnu redhat
 Enterprise release
 Daemon started 23-Jul-12 10:13. Jobs: run=295, running=3.
Heap: heap=135,168 smbytes=2,089,365 max_bytes=3,689,580 bufs=299
 max_bufs=396
Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8 mode=0,0

 Running Jobs:
 Writing: Incremental Backup job AAA JobId=247971 Volume=IM0081
   pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0)
   spooling=0 despooling=0 despool_wait=0
   Files=0 Bytes=0 Bytes/sec=0
   FDReadSeqNo=6 in_msg=6 out_msg=4 fd=9
 Writing: Incremental Backup job BBB JobId=247973 Volume=IM0081
   pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0)
   spooling=0 despooling=0 despool_wait=0
   Files=0 Bytes=0 Bytes/sec=0
   FDReadSeqNo=6 in_msg=6 out_msg=4 fd=13
 Writing: Incremental Backup job CCC JobId=247975 Volume=IM0081
   pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0)
   spooling=0 despooling=0 despool_wait=0
   Files=0 Bytes=0 Bytes/sec=0
   FDReadSeqNo=6 in_msg=6 out_msg=4 fd=15
 Writing: Incremental Backup job DDD JobId=247976 Volume=IM0081
   pool=Incremental-Pool device=C4-Drive-0 (/dev/C4-Drive-0)
   spooling=0 despooling=0 despool_wait=0
   Files=0 Bytes=0 Bytes/sec=0
   FDReadSeqNo=6 in_msg=6 out_msg=4 fd=18
 

 Jobs waiting to reserve a drive:
 

 Terminated Jobs:
JobId  LevelFiles  Bytes   Status   FinishedName
 ===
 XXX
 

 Device status:
 Autochanger C4-changer with devices:
  C4-Drive-0 (/dev/C4-Drive-0)
  C4-Drive-1 (/dev/C4-Drive-1)
 Device C4-Drive-0 (/dev/C4-Drive-0) is not open.
   Device is BLOCKED waiting for mount of volume IM0081,
  Pool:Incremental-Pool
  Media type:  LTO-3
   Drive 0 is not loaded.
 Device C4-Drive-1 (/dev/C4-Drive-1) is mounted with:
   Volume:  IM0081
   Pool:Incremental-Pool
   Media type:  LTO-3
   Slot 32 is loaded in drive 1.
   Total Bytes=369,270,534,144 Blocks=1,408,808 Bytes/block=262,115
   Positioned at File=203 Block=0
 

 Used Volume status:
 IM0070 on device C4-Drive-1 (/dev/C4-Drive-1)
   Reader=0 writers=0 devres=0 volinuse=0
 IM0081 on device C4-Drive-0 (/dev/C4-Drive-0)
   Reader=0 writers=0 devres=4 volinuse=0
 



 Anyone else have this happen?
 Race condition?

 thanks,
 Stephen


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] BAT and qt vesrion

2012-08-13 Thread Stephen Thompson

You can also use the depkgs-qt package from the bacula website.
It contains the necessary Qt, which you can link statically without 
installing a non-Red Hat Qt on your system.
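
If you go the build-your-own-Qt route Thomas describes below, the rough 
shape of it is something like this; the version, prefix, and rpath 
handling are illustrative assumptions, and depkgs-qt wraps most of this 
up for you:

  # build a private Qt and keep bat pointed at it instead of the
  # system 4.6.x libraries (paths and versions are examples only)
  tar xzf qt-everywhere-opensource-src-4.8.2.tar.gz
  cd qt-everywhere-opensource-src-4.8.2
  ./configure -prefix /opt/qt-4.8.2 -opensource -confirm-license
  make && make install

  # then build bacula with the new qmake first in PATH, and an rpath
  # so bat finds the right libraries at run time
  export PATH=/opt/qt-4.8.2/bin:$PATH
  export LDFLAGS="-Wl,-rpath,/opt/qt-4.8.2/lib"
  ./configure --enable-bat [other options] && make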

Stephen



On 08/09/2012 12:55 PM, Thomas Lohman wrote:
 I downloaded the latest stable QT open source version (4.8.2 at the
 time) and built it before building Bacula 5.2.10.  Bat seems to work
 fine with it.  If you do this, just be aware that the first time you
 build it, it will probably find the older 4.6.x RH QT libraries and
 embed their location in the shared library path so when you go to use
 it, it won't work.  The first time I built it, I told it to explicitly
 look in it's own source tree for it's libraries (by setting LDFLAGS),
 installed that version and then re-built it again telling it to now look
 in the install directory.


 --tom

 I tried to compile bacula-5.2.10 with BAT on a RHEL6.2 server. I
 found that BAT did not get installed because it needs qt version
 4.7.4 or higher but RHEL6.2 has version qt-4.6.2-24 as the latest.  I
 would like to know what the others are doing about this issue?

 Uthra

 --
 Live Security Virtual Conference
 Exclusive live event will cover all the ways today's security and
 threat landscape has changed and how IT managers can respond. Discussions
 will include endpoint security, mobile security and the latest in malware
 threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
 ___
 Bacula-users mailing list
 Bacula-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/bacula-users



-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Bacula 5.2.11: Director crashes

2012-09-12 Thread Stephen Thompson


We updated our bacula server from 5.2.10 to 5.2.11 earlier today.
A few hours later the bacula-dir crashed.  This is on RedHat 6.3.

No traceback generated.
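
(For what it's worth, when the automatic btraceback output comes up 
empty, a stack trace can sometimes still be pulled by hand if a core 
file was written to the working directory; the binary and core paths 
below are guesses based on our install prefix, not actual output:)

  gdb -batch -ex "thread apply all bt" \
      /home/bacula/bin/bacula-dir /home/bacula/work/core.12345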


Stephen




On 09/12/2012 05:45 AM, Uwe Schuerkamp wrote:
 Hi folks,

 I updated one of our bacula servers to 5.2.11 today (CentOS 6.x,
 compiled from source), but sadly the director crashes after a couple
 of copy jobs which were due this morning. Any idea how to go about
 debugging the issue?

 The server has a dir-bactrace file, but it appears to be empty, also
 the last couple of lines in the log file don't give away much beyond
 the selected jobids for copying.

 All the best,

 Uwe



-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


[Bacula-users] LTO3 tape capacity (variable?)

2012-09-24 Thread Stephen Thompson

Hello all,

This is probably not a bacula question, but on the chance that it is, or 
that the experience on this list can help, I figured I would ask.

We've been using LTO3 tapes with bacula for a few years now.  Recently 
I've noticed how variable our tape capacity is, ranging from 200-800 Gb. 
Is that strictly governed by the compressibility of the actual data 
being backed up?  Or is there some chance that bacula isn't squeezing as 
much onto my tapes as I would expect?

200Gb is not very much!

thanks,
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] LTO3 tape capacity (variable?)

2012-09-24 Thread Stephen Thompson



Thanks for the info, John.

Is anyone else in the bacula community with LTO3s seeing this 
behaviour?  I don't believe (but am not 100% sure) that I'm having any 
hardware-related issues.

Not sure what to make of this.  About 25% of the tapes in a monthly run 
(70 tapes) come in under the 400Gb native capacity, while the other 75% 
are above it, some even hitting the 800Gb compressed maximum.

Stephen



On 09/24/2012 12:02 PM, John Drescher wrote:
 This is not likely a bacula questions, but in the chance that it is, or
 the experience on this list, I figured I would ask.

 We've been using LTO3 tapes with bacula for a few years now.  Recently
 I've noticed how variable our tape capacity it, ranging from 200-800 Gb.
Is that strictly governed by the compressibility of the actual data
 being backed up?  Or is there some chance that bacula isn't squeezing as
 much onto my tapes as I would expect?

 200Gb is not very much!

 These tapes are 400GB native. If you get substantially less than that
 you have a configuration problem (you set limits on the volume size or
 duration) or a hardware problem. Compression should be handled
 entirely and automatically by the tape drive. Bacula does not enable
 or disable hardware compression it just passes the data to the drive
 and writes as much as it can up until it hits its first hardware
 error. At that point bacula calls the tape full and verifies that it
 can read the last block. I believe if it can't read the last block
 this block will be the first block written on the next volume.

 John



-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] LTO3 tape capacity (variable?)

2012-09-25 Thread Stephen Thompson



Thanks everyone for the suggestions, they at least give me somewhere to 
look, as I was running low on ideas.


More info...

The tapes in question have only been used once or twice.

The library is a StorageTek whose SLConsole reports no media (or drive) 
errors, though I will look into those linux-based tools.

Our Sun/Oracle service engineer claims that our drives do not require 
cleaning tapes.  Does that sound legit?

Our throughput is pretty reasonable for our hardware -- we do use disk 
staging and get something like 60Mb/s to tape.
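
(By disk staging I just mean Bacula's data spooling; the directives 
involved look roughly like the sketch below, with the path and size 
invented for illustration rather than copied from our config:)

  # in the Job (or JobDefs) resource
  Spool Data = yes

  # in the SD Device resource
  Spool Directory = /bacula/spool
  Maximum Spool Size = 200 GB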

Lastly, the tapes that get 200 vs 800 are from the same batch of tapes, 
same number of uses, and used by the same pair of SL500 drives.  That's 
primarily why I wondered if it could be data dependent (or a bacula bug).


thanks!
Stephen


On 09/25/12 02:29, Cejka Rudolf wrote:
 We've been using LTO3 tapes with bacula for a few years now.  Recently I've
 noticed how variable our tape capacity it, ranging from 200-800 Gb.
Is that strictly governed by the compressibility of the actual data being
 backed up?

 Hello,
the lower bound 200 GB on 400 GB LTO-3 tapes is not possible due
 to the drive compression, because it always compares, if compressed
 data are shorter that original. In other case, it writes data uncompressed.
 So, in all cases, you should see atleast 400 000 000 000 bytes written
 on all tapes.

 Or is there some chance that bacula isn't squeezing as much
 onto my tapes as I would expect? 200Gb is not very much!

 In bacula, look mainly for the reasons, why there is just 200 GB written.
 If the tape is full, think about these:

 - Weared tapes. Typical tape service life is written as 200 full cycles.
However, read
http://www.xma4govt.co.uk/Libraries/Manufacturer/ultriumwhitepaper_EEE.sflb
where they experienced problems with some tapes just only after
30 cycles! How many cycles could you have with your tapes?

 - Do you use disk staging, so that tape writes are done at full speed?
Do you have a good disk staging? Considering using SSDs for staging
is very wise. If data rate is lower that 1/3 to 1/2 of native tape
speed (based on drive vendor, HP or IBM), then drive has to perform
tape repositions, which means another important excessive drive and
tape wearing.

My experiences are, that even HW RAID-0 with four 10k disks could not
be sufficient and when there are data writes and reads in parallel,
it could not put 80 MB/s to the drive, typically just 50-70 MB/s,
which is still acceptable for LTO-3, but not good.

Currently, I have 4 x 450 GB SSDs HW RAID-0 with over 1500 GB/s without
problem running writes and reads in parallel and just after that I hope
that it is really sufficient for = LTO-3 staging and putting drives and
tapes wearing to minimum.

 - Dirty heads. You can enforce cleaning cycle, but then return to the
two points above and other suggestiong, like using some monitoring
like ltt on Linux (or I have some home made reporting tool using
camcontrol on FreeBSD), where it would be possible to ensure, that
your problem are weared tapes, or something else.

 Best regards.



-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] LTO3 tape capacity (variable?)

2012-09-25 Thread Stephen Thompson
On 09/25/2012 10:43 AM, Alan Brown wrote:
 On 25/09/12 17:43, Stephen Thompson wrote:
 Our Sun/Oracle service engineer claims that our drives do not require
 cleaning tapes.  Does that sound legit?

 In general: true (as in, Don't do it as a scheduled item), but all LTO
 drives require cleaning tapes from time to time and sometimes benefit
 from loading one even if the clean light isn't on. It primarily
 depends on the cleanliness of the room where the drive is.

 Our throughput is pretty reasonable for our hardware -- we do use disk
 staging and get something like 60Mb/s to tape.

 60Mb/s is _slow_ for LTO3. You need to take a serious look at what
 you're using as stage disk and consider using a raid0 array of SSDs in
 order to keep up.


Why do you say that's slow when the max speed appears to be 80?

http://en.wikipedia.org/wiki/Linear_Tape-Open




 Lastly, the tapes that get 200 vs 800 are from the same batch of tapes,
 same number of uses, and used by the same pair of SL500 drives.  That's
 primarily why I wondered if it could be data dependent (or a bacula bug).


 What happens if you mark the volumes as append and put them back in
 the library?

 I've seen transient scsi errors result in tapes being marked as full.

 What does smartctl show for the drive and tape in question? (run this
 against the /dev/sg of the tape drive)





-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] LTO3 tape capacity (variable?)

2012-09-25 Thread Stephen Thompson
On 09/25/2012 11:17 AM, Konstantin Khomoutov wrote:
 On Tue, 25 Sep 2012 11:00:07 -0700
 Stephen Thompson step...@seismo.berkeley.edu wrote:

 60Mb/s is _slow_ for LTO3. You need to take a serious look at what
 you're using as stage disk and consider using a raid0 array of SSDs
 in order to keep up.
 Why do you say that's slow when the max speed appears to be 80?
 http://en.wikipedia.org/wiki/Linear_Tape-Open
 It's quite logical, that to not starve the consumer, the producer
 should be at least as fast or faster, so you have to provide at least
 80 Mb/s sustained read rate from your spooling media to be sure the
 tape drive is kept busy.


No, I mean, there's slow and there's __SLOW__.  He seemed to be 
indicating that it was unacceptably slow.  I understand it's not optimal.

Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] LTO3 tape capacity (variable?)

2012-09-26 Thread Stephen Thompson
On 09/25/2012 02:29 PM, Cejka Rudolf wrote:
 Stephen Thompson wrote (2012/09/25):
 The tape in question have only been used once or twice.

 Do you mean just one or two drive loads and unloads?


Yes, I mean the tapes have only been in a drive once or twice, possibly 
for a dozen sequential jobs while in the drive, but only in and out of 
the drive once or twice.

I have seen this 200-300Gb capacity on new tapes as well as used.

I see it in both my SL500 library and my C4 library, which together 
have 4 LTO3 drives (2 in each library).


 The library is a StorageTek whose SLConsole reports no media (or drive)
 errors, though I will look into those linux-based tools.

 There are several types of errors, recoverable and non-recoverable, and
 I'm afraid that you see just non-recoverable, but it is too late to see
 them.

 Our Sun/Oracle service engineer claims that our drives do not require
 cleaning tapes.  Does that sound legit?

 If you are interested, you can study
 http://www.tarconis.com/documentos/LTO_Cleaning_wp.pdf ;o)
 So in HP case, it is possible to agree. However, you still
 have to have atleast one cleaning cartridge prepared ;o)

 Our throughput is pretty reasonable for our hardware -- we do use disk
 staging and get something like 60Mb/s to tape.

 HP LTO-3 drive can slow down physical speed to 27 MB/s, IBM LTO-3
 to 40 MB/s. Native speed is 80 MB/s, bot all these speeds are after
 compression. If you have 60 MB/s before compression and there are
 some places with somewhat better compression than 2:1, then you are not
 able to feed HP LTO-3. For IBM drive, it is suffucient to have places
 with just 2:1 to need repositions.

 Lastly, the tapes that get 200 vs 800 are from the same batch of tapes,
 same number of uses, and used by the same pair of SL500 drives.  That's
 primarily why I wondered if it could be data dependent (or a bacula bug).

 And what about the reason to switch to the next tape? Do you have something
 like this in your reports?

 22-Sep 02:22 backup-sd JobId 74990: End of Volume 1 at 95:46412 on device 
 drive0 (/dev/nsa0). Write of 65536 bytes got 0.
 22-Sep 02:22 backup-sd JobId 74990: Re-read of last block succeeded.
 22-Sep 02:22 backup-sd JobId 74990: End of medium on Volume 1 
 Bytes=381,238,317,056 Blocks=5,817,238 at 22-Sep-2012 02:22.


Here's an example of a tape that had one job and only wrote ~278Gb to 
the tape:

10-Sep 10:08 sd-SL500 JobId 256773: Recycled volume FB0095 on device 
SL500-Drive-1 (/dev/SL500-Drive-1), all previous data lost.
10-Sep 10:08 sd-SL500 JobId 256773: New volume FB0095 mounted on 
device SL500-Drive-1 (/dev/SL500-Drive-1) at 10-Sep-2012 10:08.
10-Sep 13:02 sd-SL500 JobId 256773: End of Volume FB0095 at 149:5906 
on device SL500-Drive-1 (/dev/SL500-Drive-1). Write of 262144 bytes 
got -1.
10-Sep 13:02 sd-SL500 JobId 256773: Re-read of last block succeeded.
10-Sep 13:02 sd-SL500 JobId 256773: End of medium on Volume FB0095 
Bytes=299,532,813,312 Blocks=1,142,627 at 10-Sep-2012 13:02.


 Do not you use something from the following things in bacula configuration?
  UseVolumeOnce
  Maximum Volume Jobs
  Maximum Volume Bytes
  Volume Use Duration
 ?


No, none of those are configured.


Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
How fast is your code?
3 out of 4 devs don\\\'t know how their code performs in production.
Find out how slow your code is with AppDynamics Lite.
http://ad.doubleclick.net/clk;262219672;13503038;z?
http://info.appdynamics.com/FreeJavaPerformanceDownload.html
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] LTO3 tape capacity (variable?)

2012-09-26 Thread Stephen Thompson
On 09/26/2012 02:35 PM, Stephen Thompson wrote:
 On 09/25/2012 02:29 PM, Cejka Rudolf wrote:
 Stephen Thompson wrote (2012/09/25):
 The tape in question have only been used once or twice.

 Do you mean just one or two drive loads and unloads?


 Yes, I mean the tapes have only been in a drive once or twice, possibly
 for a dozen sequential jobs while in the drive, but only in and out of
 the drive once or twice.

 I have seen this 200-300Gb capacity on new tapes as well as used.


I think I pointed this out before, but I also have used and new tapes 
with 400-800Gb on them.  It seems really hit or miss, though the tapes 
with 400Gb or less are probably about 1/3 of my tapes.  The other 2/3 
hold more than 400Gb.



 I see it in both my SL500 library as well as my C4 library, which is a
 combined 4 LTO3 drives (2 in each library).


 The library is a StorageTek whose SLConsole reports no media (or drive)
 errors, though I will look into those linux-based tools.

 There are several types of errors, recoverable and non-recoverable, and
 I'm afraid that you see just non-recoverable, but it is too late to see
 them.

 Our Sun/Oracle service engineer claims that our drives do not require
 cleaning tapes.  Does that sound legit?

 If you are interested, you can study
 http://www.tarconis.com/documentos/LTO_Cleaning_wp.pdf ;o)
 So in HP case, it is possible to agree. However, you still
 have to have atleast one cleaning cartridge prepared ;o)

 Our throughput is pretty reasonable for our hardware -- we do use disk
 staging and get something like 60Mb/s to tape.

 HP LTO-3 drive can slow down physical speed to 27 MB/s, IBM LTO-3
 to 40 MB/s. Native speed is 80 MB/s, bot all these speeds are after
 compression. If you have 60 MB/s before compression and there are
 some places with somewhat better compression than 2:1, then you are not
 able to feed HP LTO-3. For IBM drive, it is suffucient to have places
 with just 2:1 to need repositions.

 Lastly, the tapes that get 200 vs 800 are from the same batch of tapes,
 same number of uses, and used by the same pair of SL500 drives.  That's
 primarily why I wondered if it could be data dependent (or a bacula bug).

 And what about the reason to switch to the next tape? Do you have something
 like this in your reports?

 22-Sep 02:22 backup-sd JobId 74990: End of Volume 1 at 95:46412 on device 
 drive0 (/dev/nsa0). Write of 65536 bytes got 0.
 22-Sep 02:22 backup-sd JobId 74990: Re-read of last block succeeded.
 22-Sep 02:22 backup-sd JobId 74990: End of medium on Volume 1 
 Bytes=381,238,317,056 Blocks=5,817,238 at 22-Sep-2012 02:22.


 Here's an example of a tape that had one job and only wrote ~278Gb to
 the tape:

 10-Sep 10:08 sd-SL500 JobId 256773: Recycled volume FB0095 on device
 SL500-Drive-1 (/dev/SL500-Drive-1), all previous data lost.
 10-Sep 10:08 sd-SL500 JobId 256773: New volume FB0095 mounted on
 device SL500-Drive-1 (/dev/SL500-Drive-1) at 10-Sep-2012 10:08.
 10-Sep 13:02 sd-SL500 JobId 256773: End of Volume FB0095 at 149:5906
 on device SL500-Drive-1 (/dev/SL500-Drive-1). Write of 262144 bytes
 got -1.
 10-Sep 13:02 sd-SL500 JobId 256773: Re-read of last block succeeded.
 10-Sep 13:02 sd-SL500 JobId 256773: End of medium on Volume FB0095
 Bytes=299,532,813,312 Blocks=1,142,627 at 10-Sep-2012 13:02.


 Do not you use something from the following things in bacula configuration?
   UseVolumeOnce
   Maximum Volume Jobs
   Maximum Volume Bytes
   Volume Use Duration
 ?


 No, none of those are configured.


 Stephen



-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
How fast is your code?
3 out of 4 devs don\\\'t know how their code performs in production.
Find out how slow your code is with AppDynamics Lite.
http://ad.doubleclick.net/clk;262219672;13503038;z?
http://info.appdynamics.com/FreeJavaPerformanceDownload.html
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] LTO3 tape capacity (variable?)

2012-09-27 Thread Stephen Thompson
On 09/25/2012 10:43 AM, Alan Brown wrote:
 On 25/09/12 17:43, Stephen Thompson wrote:
 Our Sun/Oracle service engineer claims that our drives do not require
 cleaning tapes.  Does that sound legit?

 In general: true (as in, Don't do it as a scheduled item), but all LTO
 drives require cleaning tapes from time to time and sometimes benefit
 from loading one even if the clean light isn't on. It primarily
 depends on the cleanliness of the room where the drive is.

 Our throughput is pretty reasonable for our hardware -- we do use disk
 staging and get something like 60Mb/s to tape.

 60Mb/s is _slow_ for LTO3. You need to take a serious look at what
 you're using as stage disk and consider using a raid0 array of SSDs in
 order to keep up.

 Lastly, the tapes that get 200 vs 800 are from the same batch of tapes,
 same number of uses, and used by the same pair of SL500 drives.  That's
 primarily why I wondered if it could be data dependent (or a bacula bug).


 What happens if you mark the volumes as append and put them back in
 the library?


I haven't had a lot of time to look into this today, but I did this quick 
test and it immediately marked the volume Full again.

27-Sep 14:20 sd-SL500 JobId 260069: Volume FB0763 previously written, 
moving to end of data.
27-Sep 14:21 sd-SL500 JobId 260069: Ready to append to end of Volume 
FB0763 at file=110.
27-Sep 14:21 sd-SL500 JobId 260069: Spooling data ...
27-Sep 14:21 sd-SL500 JobId 260069: Job write elapsed time = 00:00:01, 
Transfer rate = 759.3 K Bytes/second
27-Sep 14:21 sd-SL500 JobId 260069: Committing spooled data to Volume 
FB0763. Despooling 762,358 bytes ...
27-Sep 14:21 sd-SL500 JobId 260069: End of Volume FB0763 at 110:1 on 
device SL500-Drive-0 (/dev/SL500-Drive-0). Write of 262144 bytes got -1.
27-Sep 14:21 sd-SL500 JobId 260069: Re-read of last block succeeded.
27-Sep 14:21 sd-SL500 JobId 260069: End of medium on Volume FB0763 
Bytes=219,730,936,832 Blocks=838,207 at 27-Sep-2012 14:21.
27-Sep 14:21 sd-SL500 JobId 260069: 3307 Issuing autochanger unload 
slot 36, drive 0 command.





 I've seen transient scsi errors result in tapes being marked as full.

 What does smartctl show for the drive and tape in question? (run this
 against the /dev/sg of the tape drive)





-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://ad.doubleclick.net/clk;258768047;13503038;j?
http://info.appdynamics.com/FreeJavaPerformanceDownload.html
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] LTO3 tape capacity (variable?)

2012-09-27 Thread Stephen Thompson


On 9/27/12 6:17 PM, Alan Brown wrote:
 On 27/09/12 22:25, Stephen Thompson wrote:
 What happens if you mark the volumes as append and put them back in
 the library?



 I haven't had a lot of time to look into this today, but I do this
 quick test and it immediately marks the volume Full again.


 Then it really is full and the rest is down to overheads.

 Consider using larger block sizes.




Aren't these considered reasonable settings for LTO3?

   Maximum block size = 262144   # 256kb
   Maximum File Size = 2gb



thanks for the help!
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Got visibility?
Most devs has no idea what their production app looks like.
Find out how fast your code is with AppDynamics Lite.
http://ad.doubleclick.net/clk;262219671;13503038;y?
http://info.appdynamics.com/FreeJavaPerformanceDownload.html
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] LTO3 tape capacity (variable?)

2012-10-01 Thread Stephen Thompson



Hi,

I ran some btape tests today to verify that I'd be improving throughput 
by changing the block size from 256KB to 2MB, and that does indeed 
appear to be true where the drive can compress the data, but it 
doesn't seem to affect incompressible data much, if at all.  Still, it 
seems worth changing, and I thank you for pointing me in that direction.
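
For anyone following along, the production change being tested here is 
just the two SD Device directives discussed earlier in the thread; a 
sketch with the device name taken from this thread and the values per 
Alan's suggestion, not yet a verified config:

  Device {
    Name = SL500-Drive-0
    Archive Device = /dev/SL500-Drive-0
    Media Type = LTO-3
    Autochanger = yes
    Maximum Block Size = 2M      # was 262144 (256KB)
    Maximum File Size = 10G      # was 2GB
    # per Alan: mark all open tapes Used and restart the SD after
    # changing the block size
  }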

More importantly, I realized that my testing 6 months ago was not on all 
4 of my drives, but only on 2 of them.  Today, I discovered one of my 
drives (untested in the past) is getting half the throughput of the 
others for random data writes!

btape
*speed file_size=4 nb_file=4 skip_raw


               SL500 Drive 0   SL500 Drive 1   C4 Drive 0    C4 Drive 1

256KB block size:
  Zeros  =      92.86 MB/s      92.36 MB/s     91.38 MB/s    92.86 MB/s
  Random =      63.16 MB/s      27.53 MB/s     63.39 MB/s    63.60 MB/s

2MB block size:
  Zeros  =     123.5  MB/s     122.7  MB/s    122.7  MB/s   122.7  MB/s
  Random =      62.24 MB/s      28.44 MB/s    63.62 MB/s    63.62 MB/s

                                ^ SL500 Drive 1's random rate is the anomaly

thanks,
Stephen





On 09/28/2012 05:08 AM, Alan Brown wrote:
 On 28/09/12 02:38, Stephen Thompson wrote:

 Aren't these considered reasonable settings for LTO3?

 Maximum block size = 262144   # 256kb
 Maximum File Size = 2gb

 Not really.

 Change maximum file size to 10Gb and maximum block size to 2M

 You _must_ set all open tapes to used and restart the storage daemon
 when changing the block size. Bacula can't cope with varying maximum
 sizes on a tape

 Even with those changes, if you have a lot of small, incompressible
 files you'll see high tape overheads.




 thanks for the help!
 Stephen





-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Got visibility?
Most devs has no idea what their production app looks like.
Find out how fast your code is with AppDynamics Lite.
http://ad.doubleclick.net/clk;262219671;13503038;y?
http://info.appdynamics.com/FreeJavaPerformanceDownload.html
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] LTO3 tape capacity (variable?)

2012-10-01 Thread Stephen Thompson
On 10/01/2012 03:52 PM, James Harper wrote:

 Hi,

 I ran some btape tests today to verify that I'd be improving throughput by
 changing blocksize from 256KB to 2MB and found that this does indeed
 appear to be true in terms of increasing compression efficiency, but it
 doesn't seem to affect incompressible data much, if at all.  Still, it seems
 worth changing and I thank you for pointing me in that direction.

 More importantly, I realized that my testing 6 months ago was not on all
 4 of my drives, but only 2 of them.  Today, I discovered one of my drives
 (untested in the past) is getting 1/2 the throughput for random data writes 
 as
 the others!!


 Is it definitely LTO3 and definitely using LTO3 media? LTO2 was about half 
 the speed, including using LTO2 media in an LTO3 drive.

 James


Yes, all 4 drives are HP Ultrium 3 drives.
And the same LTO3 bacula volume was used in all 4 testing runs today.
All drives are connected via 2Gb fiber.
All tests were done independently of each other, with no other activity 
on the backup server during the testing.


Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Got visibility?
Most devs has no idea what their production app looks like.
Find out how fast your code is with AppDynamics Lite.
http://ad.doubleclick.net/clk;262219671;13503038;y?
http://info.appdynamics.com/FreeJavaPerformanceDownload.html
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] LTO3 tape capacity (variable?)

2012-10-01 Thread Stephen Thompson


On 10/1/12 4:06 PM, Alan Brown wrote:
 On 01/10/12 23:38, Stephen Thompson wrote:

 More importantly, I realized that my testing 6 months ago was not on
 all 4 of my drives, but only 2 of them.  Today, I discovered one of my
 drives (untested in the past) is getting 1/2 the throughput for random
 data writes as the others!!

 smartctl -a /dev/sg(drive) will tell you a lot

 Put a cleaning tape in it







A cleaning tape did not improve the results.

I see some errors in the counter log on the problem drive, but I see 
even more errors on another drive which isn't having a throughput 
problem (specifically, SL500 Drive 1 is the low-throughput drive, but C4 
Drive 1 actually has a higher error count).



SL500 Drive 0 (~60MB/s random data throughput)
==============================================
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:         0        0         0         0          0          0.000           0


SL500 Drive 1 (~30MB/s random data throughput)
==============================================
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          0        0         0         0          0          0.000           0
write:     10454        0         0         0     821389          0.000           0


C4 Drive 0 (~60MB/s random data throughput)
===========================================
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          2        0         0         0          2          0.000           0
write:         0        0         0         0          0          0.000           0


C4 Drive 1 (~60MB/s random data throughput)
===========================================
Error counter log:
           Errors Corrected by           Total   Correction     Gigabytes    Total
               ECC          rereads/    errors   algorithm      processed    uncorrected
           fast | delayed   rewrites  corrected  invocations   [10^9 bytes]  errors
read:          2        0         0         0          2          0.000           0
write:     18961        0         0         0      48261          0.000           0




Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



Re: [Bacula-users] LTO3 tape capacity (variable?)

2012-10-01 Thread Stephen Thompson



Correction: the non-problem drive has a higher ECC fast error count, 
but the problem drive has a significantly higher Correction algorithm 
invocations count.


On 10/1/12 5:33 PM, Stephen Thompson wrote:


 On 10/1/12 4:06 PM, Alan Brown wrote:
 On 01/10/12 23:38, Stephen Thompson wrote:

 More importantly, I realized that my testing 6 months ago was not on
 all 4 of my drives, but only 2 of them.  Today, I discovered one of my
 drives (untested in the past) is getting 1/2 the throughput for random
 data writes as the others!!

 smartctl -a /dev/sg(drive) will tell you a lot

 Put a cleaning tape in it







 Cleaning tape did not improve results.

 I see some errors in the counter log on the problem drive, but I see
 even more errors on another drive which isn't having a throughput
 problem (specifically SL500 Drive 1 is the lower throughput, but C4
 Drive 1 actually has a higher error count).



 SL500 Drive 0 (~60MB/s random data throughput)
 =
 Error counter log:
  Errors Corrected by   Total   Correction
 GigabytesTotal
  ECC  rereads/errors   algorithm
 processeduncorrected
  fast | delayed   rewrites  corrected  invocations   [10^9
 bytes]  errors
 read:  00 0 0  0  0.000
 0
 write: 00 0 0  0  0.000
 0


 SL500 Drive 1 (~30MB/s random data throughput)
 =
 Error counter log:
  Errors Corrected by   Total   Correction
 GigabytesTotal
  ECC  rereads/errors   algorithm
 processeduncorrected
  fast | delayed   rewrites  corrected  invocations   [10^9
 bytes]  errors
 read:  00 0 0  0  0.000
 0
 write: 104540 0 0 821389  0.000
 0


 C4 Drive 0 (~60MB/s random data throughput)
 ==
 Error counter log:
  Errors Corrected by   Total   Correction
 GigabytesTotal
  ECC  rereads/errors   algorithm
 processeduncorrected
  fast | delayed   rewrites  corrected  invocations   [10^9
 bytes]  errors
 read:  20 0 0  2  0.000
 0
 write: 00 0 0  0  0.000
 0


 C4 Drive 1 (~60MB/s random data throughput)
 ==
 Error counter log:
  Errors Corrected by   Total   Correction
 GigabytesTotal
  ECC  rereads/errors   algorithm
 processeduncorrected
  fast | delayed   rewrites  corrected  invocations   [10^9
 bytes]  errors
 read:  20 0 0  2  0.000
 0
 write: 189610 0 0  48261  0.000
 0




 Stephen


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



Re: [Bacula-users] LTO3 tape capacity (variable?)

2012-10-05 Thread Stephen Thompson

Thank you everyone for your help!

Oracle replaced the drive and while it's not running with as high a 
throughput as I would like, it's at least up at the 60MB/s (random data) 
that my other drives are at, rather than its previous 30MB/s.

I'm still going to experiment with some of the ideas that were tossed 
out and see if I can't get even better throughput for bacula.

thanks again,
Stephen



On 10/2/12 2:47 AM, Alan Brown wrote:
 On 02/10/12 01:35, Stephen Thompson wrote:


 Correction, the non-problem drive has a higher ECC fast error count,
 but the problem drive has a significantly higher Corrective algorithm
 invocations count.


 What that means is that it rewrote the data, which accounts for the
 lower throughput.

 LTO drives read as they write and if there are errors, they write again.

 If a cleaning tape doesn't work then you need to get the drive looked
 at/replaced under warranty.


 On 10/1/12 5:33 PM, Stephen Thompson wrote:

 On 10/1/12 4:06 PM, Alan Brown wrote:
 On 01/10/12 23:38, Stephen Thompson wrote:
 More importantly, I realized that my testing 6 months ago was not on
 all 4 of my drives, but only 2 of them.  Today, I discovered one of my
 drives (untested in the past) is getting 1/2 the throughput for random
 data writes as the others!!
 smartctl -a /dev/sg(drive) will tell you a lot

 Put a cleaning tape in it






 Cleaning tape did not improve results.

 I see some errors in the counter log on the problem drive, but I see
 even more errors on another drive which isn't having a throughput
 problem (specifically SL500 Drive 1 is the lower throughput, but C4
 Drive 1 actually has a higher error count).



 SL500 Drive 0 (~60MB/s random data throughput)
 =
 Error counter log:
   Errors Corrected by   Total   Correction
 GigabytesTotal
   ECC  rereads/errors   algorithm
 processeduncorrected
   fast | delayed   rewrites  corrected  invocations   [10^9
 bytes]  errors
 read:  00 0 0  0  0.000
  0
 write: 00 0 0  0  0.000
  0


 SL500 Drive 1 (~30MB/s random data throughput)
 =
 Error counter log:
   Errors Corrected by   Total   Correction
 GigabytesTotal
   ECC  rereads/errors   algorithm
 processeduncorrected
   fast | delayed   rewrites  corrected  invocations   [10^9
 bytes]  errors
 read:  00 0 0  0  0.000
  0
 write: 104540 0 0 821389  0.000
  0


 C4 Drive 0 (~60MB/s random data throughput)
 ==
 Error counter log:
   Errors Corrected by   Total   Correction
 GigabytesTotal
   ECC  rereads/errors   algorithm
 processeduncorrected
   fast | delayed   rewrites  corrected  invocations   [10^9
 bytes]  errors
 read:  20 0 0  2  0.000
  0
 write: 00 0 0  0  0.000
  0


 C4 Drive 1 (~60MB/s random data throughput)
 ==
 Error counter log:
   Errors Corrected by   Total   Correction
 GigabytesTotal
   ECC  rereads/errors   algorithm
 processeduncorrected
   fast | delayed   rewrites  corrected  invocations   [10^9
 bytes]  errors
 read:  20 0 0  2  0.000
  0
 write: 189610 0 0  48261  0.000
  0




 Stephen




-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



Re: [Bacula-users] Is tape filling up too early?

2012-10-17 Thread Stephen Thompson




I recently found out that I had a bad tape drive.

With the tape in the drive, run the following and see if it reports any 
errors:

smartctl -a /dev/nst0


If there are errors, the drive is wasting tape and hence you get less 
capacity.
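
As an illustration of what to look for in the "Error counter log" 
section, this is the write row from the drive that turned out to be bad 
on my system (from the smartctl output I posted in the recent LTO3 
capacity thread); all those correction algorithm invocations meant the 
drive was rewriting blocks, which costs both speed and tape:

write:     10454        0         0         0     821389          0.000           0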

Stephen



On 10/17/2012 11:14 AM, Sergio Belkin wrote:
 Hi folks

 I'm using LTO3 tapes and are filling up too fast. They have supposedly
 800 GB. I know that never reach that capacity, but I am somewhat
 surprised that is full with only ~ 333 GB!!  (lesser than a half)


 If I issue a list media pool command I get

 | MediaId | VolumeName   | VolStatus | Enabled | VolBytes        | VolFiles | VolRetention | Recycle | Slot | InChanger | MediaType | LastWritten         |
 +---------+--------------+-----------+---------+-----------------+----------+--------------+---------+------+-----------+-----------+---------------------+
 |     100 | LUNOCT12LTO3 | Full      |       1 | 421,590,177,792 |      431 |   31,536,000 |       0 |    0 |         0 | LTO3      | 2012-10-16 08:11:08 |


 Output of mt -f /dev/nst0  status

 SCSI 2 tape drive:
 File number=0, block number=0, partition=0.
 Tape block size 0 bytes. Density code 0x44 (no translation).
 Soft error count since last status=0
 General status bits on (4101):
   BOT ONLINE IM_REP_EN

 The volume was recycled with 'mt -f /dev/nst0 rewind;mt -f /dev/nst0 weof'

 My storage daemon config is as follow

 Storage { # definition of myself
Name = superbackup-sd
SDPort = 9103  # Director's port
WorkingDirectory = /var/bacula/working
Pid Directory = /var/run
Maximum Concurrent Jobs = 20

 }
 Director {
Name = superbackup-dir
Password = ucuc
 }
 Director {
Name = superbackup-mon
Password = ucuc
Monitor = yes
 }
 Device {
Name = LTO3
Media Type = LTO3
Archive Device = /dev/nst0  #modificar a 1 para usar el DAT4S
AutomaticMount = yes;   # when device opened, read it
AlwaysOpen = yes;
RemovableMedia = yes;
Maximum Spool Size = 30g
Maximum Job Spool Size = 20gb
Spool Directory = /var/spool/bacula
#Maximum Network Buffer Size =  10240
#Hardware end of medium = No;
Fast Forward Space File = yes
#TWO EOF = yes
 }

 Messages {
Name = Standard
director = supernoc-dir = all
 }
 You have new mail in /var/spool/mail/root


 Could you suggest me something to improve it?

 Thanks in advance!



-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



[Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1

2012-11-05 Thread Stephen Thompson

Hello all,

I've had the following problem for ages (meaning multiple major 
revisions of bacula) and I've seen this come up from time to time on the 
mailing list, but I've never actually seen a resolution (please point me 
to one if it's been found).


background:

I run monthly Fulls and nightly Incrementals.  I have a 2 drive 
autochanger dedicated to my Incrementals.  I launch something like ~150 
Incremental jobs each night.  I am configured for 8 concurrent jobs on 
the Storage Daemon.


PROBLEM:

The first job(s) grab one of the 2 devices available in the changer 
(which is set to AutoSelect) and either load a tape, or use a tape from 
the previous evening.  All tapes in the changer are in the same 
Incremental-Pool.

The second job(s) grab the other of the 2 devices available in the 
changer, but want to use the same tape that's just been mounted (or put 
into use) by the jobs that were launched first.  They will often 
literally wait the entire evening, while hundreds of jobs run through on 
only one device, until that tape is freed up, at which point it is 
unmounted from the first device and moved to the second.

Note, the behaviour seems to be to round-robin my 8-job concurrency 
limit between the 2 available drives, which means 4 jobs will run and 4 
jobs will block waiting for the wanted Volume.  When the original 4 jobs 
are completed (not at the same time), additional jobs are launched that 
keep that wanted Volume in use.


LOG:

03-Nov 22:00 DIRECTOR JobId 267433: Start Backup JobId 267433, Job=JOB.2012-11-03_22.00.00_04
03-Nov 22:00 DIRECTOR JobId 267433: Using Device L100-Drive-0
03-Nov 22:00 DIRECTOR JobId 267433: Sending Accurate information.
03-Nov 22:00 sd_L100_ JobId 267433: 3307 Issuing autochanger unload 
slot 82, drive 0 command.
03-Nov 22:06 lawson-sd_L100_ JobId 267433: Warning: Volume IM0108 
wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device 
L100-Drive-1 (/dev/L100-Drive-1)
03-Nov 22:09 sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on 
L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1 
(/dev/L100-Drive-1)
03-Nov 22:09 sd_L100_ JobId 267433: Warning: mount.c:217 Open device 
L100-Drive-0 (/dev/L100-Drive-0) Volume IM0108 failed: ERR=dev.c:513 
Unable to open device L100-Drive-0 (/dev/L100-Drive-0): ERR=No medium 
found
.
.
.


CONFIGS (partial and seem pretty straight-forward):

Schedule {
   Name = DefaultSchedule
   Run = Level=Incremental   sat-thu at 22:00
   Run = Level=Differential  fri at 22:00
}

JobDefs {
   Name = DefaultJob
   Type = Backup
   Level = Full
   Schedule = DefaultSchedule
   Incremental Backup Pool = Incremental-Pool
   Differential Backup Pool = Incremental-Pool
}

Pool {
   Name = Incremental-Pool
   Pool Type = Backup
   Storage = L100-changer
}

Storage {
   Name = L100-changer
   Device = L100-changer
   Media Type = LTO-3
   Autochanger = yes
   Maximum Concurrent Jobs = 8
}

Autochanger {
   Name = L100-changer
   Device = L100-Drive-0
   Device = L100-Drive-1
   Changer Device = /dev/L100-changer
}

Device {
   Name = L100-Drive-0
   Drive Index = 0
   Media Type = LTO-3
   Archive Device = /dev/L100-Drive-0
   AutomaticMount = yes;
   AlwaysOpen = yes;
   RemovableMedia = yes;
   RandomAccess = no;
   AutoChanger = yes;
   AutoSelect = yes;
}

Device {
   Name = L100-Drive-1
   Drive Index = 0
   Media Type = LTO-3
   Archive Device = /dev/L100-Drive-1
   AutomaticMount = yes;
   AlwaysOpen = yes;
   RemovableMedia = yes;
   RandomAccess = no;
   AutoChanger = yes;
   AutoSelect = yes;
}



thanks!
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1

2012-11-05 Thread Stephen Thompson


On 11/5/12 7:59 AM, John Drescher wrote:
 I've had the following problem for ages (meaning multiple major
 revisions of bacula) and I've seen this come up from time to time on the
 mailing list, but I've never actually seen a resolution (please point me
 to one if it's been found).


 background:

 I run monthly Fulls and nightly Incrementals.  I have a 2 drive
 autochanger dedicated to my Incrementals.  I launch something like ~150
 Incremental jobs each night.  I am configured for 8 concurrent jobs on
 the Storage Daemon.


 PROBLEM:

 The first job(s) grab one of the 2 devices available in the changer
 (which is set to AutoSelect) and either load a tape, or use a tape from
 the previous evening.  All tapes in the changer are in the same
 Incremenal-Pool.

 The second jobs(s) grab the other of the 2 devices available in the
 changer, but want to use the same tape that's just been mounted (or put
 into use) on the jobs that got launched first.  They will often literal
 wait the entire evening until 100's of jobs run through on only one
 device, until that tape is freed up, at which point it is unmounted from
 the first device and moved to the second.

 Note, the behaviour seems to be to round-robin my 8 concurrency limit
 between the 2 available drives, which mean 4 jobs will run, and 4 jobs
 will block on waiting for the wanted Volume.  When the original 4 jobs
 are completed (not at the same time) additional jobs are launched that
 keep that wanted Volume in use.


 LOG:

 03-Nov 22:00 DIRECTOR JobId 267433: Start Backup JobId 267433, Job=JOB.
 2012-11-03_22.00.00_0403-Nov 22:00 DIRECTOR JobId 267433: Using Device
 L100-Drive-003-Nov 22:00 DIRECTOR JobId 267433: Sending Accurate
 information.
 03-Nov 22:00 sd_L100_ JobId 267433: 3307 Issuing autochanger unload
 slot 82, drive 0 command.
 03-Nov 22:06 lawson-sd_L100_ JobId 267433: Warning: Volume IM0108
 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device
 L100-Drive-1 (/dev/L100-Drive-1)
 03-Nov 22:09 sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on
 L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1
 (/dev/L100-Drive-1)
 03-Nov 22:09 sd_L100_ JobId 267433: Warning: mount.c:217 Open device
 L100-Drive-0 (/dev/L100-Drive-0) Volume IM0108 failed: ERR=dev.c:513
 Unable to open device L100-Drive-0 (/dev/L100-Drive-0): ERR=No medium
 found
 .
 .
 .


 CONFIGS (partial and seem pretty straight-forward):

 Schedule {
 Name = DefaultSchedule
 Run = Level=Incremental   sat-thu at 22:00
 Run = Level=Differential  fri at 22:00
 }

 JobDefs {
 Name = DefaultJob
 Type = Backup
 Level = Full
 Schedule = DefaultSchedule
 Incremental Backup Pool = Incremental-Pool
 Differential Backup Pool = Incremental-Pool
 }

 Pool {
 Name = Incremental-Pool
 Pool Type = Backup
 Storage = L100-changer
 }

 Storage {
 Name = L100-changer
 Device = L100-changer
 Media Type = LTO-3
 Autochanger = yes
 Maximum Concurrent Jobs = 8
 }

 Autochanger {
 Name = L100-changer
 Device = L100-Drive-0
 Device = L100-Drive-1
 Changer Device = /dev/L100-changer
 }

 Device {
 Name = L100-Drive-0
 Drive Index = 0
 Media Type = LTO-3
 Archive Device = /dev/L100-Drive-0
 AutomaticMount = yes;
 AlwaysOpen = yes;
 RemovableMedia = yes;
 RandomAccess = no;
 AutoChanger = yes;
 AutoSelect = yes;
 }

 Device {
 Name = L100-Drive-1
 Drive Index = 0
 Media Type = LTO-3
 Archive Device = /dev/L100-Drive-1
 AutomaticMount = yes;
 AlwaysOpen = yes;
 RemovableMedia = yes;
 RandomAccess = no;
 AutoChanger = yes;
 AutoSelect = yes;
 }


 I do not have a good solution but I know by default bacula does not
 want to load the same pool into more than 1 storage device at the same
 time.

 John


I think it's something in the automated logic.  If I launch jobs by hand 
(same pool across 2 tape devices in the same autochanger), everything 
works fine.  I think it has more to do with the Scheduler assigning the 
same Volume to all jobs and then not wanting to change that choice if 
that Volume is in use.

If I do a status on the Director, for instance, and see the jobs for the 
next day lined up in Scheduled jobs, they all have the same Volume listed.

thanks,
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
404.538.7077 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760


Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1

2012-11-05 Thread Stephen Thompson
On 11/05/12 08:03, Stephen Thompson wrote:


 On 11/5/12 7:59 AM, John Drescher wrote:
 I've had the following problem for ages (meaning multiple major
 revisions of bacula) and I've seen this come up from time to time on the
 mailing list, but I've never actually seen a resolution (please point me
 to one if it's been found).


 background:

 I run monthly Fulls and nightly Incrementals.  I have a 2 drive
 autochanger dedicated to my Incrementals.  I launch something like ~150
 Incremental jobs each night.  I am configured for 8 concurrent jobs on
 the Storage Daemon.


 PROBLEM:

 The first job(s) grab one of the 2 devices available in the changer
 (which is set to AutoSelect) and either load a tape, or use a tape from
 the previous evening.  All tapes in the changer are in the same
 Incremenal-Pool.

 The second jobs(s) grab the other of the 2 devices available in the
 changer, but want to use the same tape that's just been mounted (or put
 into use) on the jobs that got launched first.  They will often literal
 wait the entire evening until 100's of jobs run through on only one
 device, until that tape is freed up, at which point it is unmounted from
 the first device and moved to the second.

 Note, the behaviour seems to be to round-robin my 8 concurrency limit
 between the 2 available drives, which mean 4 jobs will run, and 4 jobs
 will block on waiting for the wanted Volume.  When the original 4 jobs
 are completed (not at the same time) additional jobs are launched that
 keep that wanted Volume in use.


 LOG:

 03-Nov 22:00 DIRECTOR JobId 267433: Start Backup JobId 267433, Job=JOB.
 2012-11-03_22.00.00_0403-Nov 22:00 DIRECTOR JobId 267433: Using Device
 L100-Drive-003-Nov 22:00 DIRECTOR JobId 267433: Sending Accurate
 information.
 03-Nov 22:00 sd_L100_ JobId 267433: 3307 Issuing autochanger unload
 slot 82, drive 0 command.
 03-Nov 22:06 lawson-sd_L100_ JobId 267433: Warning: Volume IM0108
 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device
 L100-Drive-1 (/dev/L100-Drive-1)
 03-Nov 22:09 sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on
 L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1
 (/dev/L100-Drive-1)
 03-Nov 22:09 sd_L100_ JobId 267433: Warning: mount.c:217 Open device
 L100-Drive-0 (/dev/L100-Drive-0) Volume IM0108 failed: ERR=dev.c:513
 Unable to open device L100-Drive-0 (/dev/L100-Drive-0): ERR=No medium
 found
 .
 .
 .


 CONFIGS (partial and seem pretty straight-forward):

 Schedule {
  Name = DefaultSchedule
  Run = Level=Incremental   sat-thu at 22:00
  Run = Level=Differential  fri at 22:00
 }

 JobDefs {
  Name = DefaultJob
  Type = Backup
  Level = Full
  Schedule = DefaultSchedule
  Incremental Backup Pool = Incremental-Pool
  Differential Backup Pool = Incremental-Pool
 }

 Pool {
  Name = Incremental-Pool
  Pool Type = Backup
  Storage = L100-changer
 }

 Storage {
  Name = L100-changer
  Device = L100-changer
  Media Type = LTO-3
  Autochanger = yes
  Maximum Concurrent Jobs = 8
 }

 Autochanger {
  Name = L100-changer
  Device = L100-Drive-0
  Device = L100-Drive-1
  Changer Device = /dev/L100-changer
 }

 Device {
  Name = L100-Drive-0
  Drive Index = 0
  Media Type = LTO-3
  Archive Device = /dev/L100-Drive-0
  AutomaticMount = yes;
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  AutoChanger = yes;
  AutoSelect = yes;
 }

 Device {
  Name = L100-Drive-1
  Drive Index = 0
  Media Type = LTO-3
  Archive Device = /dev/L100-Drive-1
  AutomaticMount = yes;
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  AutoChanger = yes;
  AutoSelect = yes;
 }


 I do not have a good solution but I know by default bacula does not
 want to load the same pool into more than 1 storage device at the same
 time.

 John


 I think it's something in the automated logic.  Because if I launch jobs
 by hand (same pool across 2 tapes devices in same autochanger)
 everything works fine.  I think it has more to do with the Scheduler
 assigning the same same Volume to all jobs and then not wanting to
 change that choice if that Volume is in use.


I also use Accurate backups, which can take a while before the job gets 
back to volume/drive assignments, so it might be a race condition: when 
the blocking jobs start, they still want the same Volume as the jobs 
that are running, because those jobs are still setting up the Accurate 
backup and haven't been solidly assigned that Volume yet.  I don't know. 
It's rather annoying, especially as we attempt to ramp up our backup 
capacity.

Lastly, it doesn't ALWAYS happen, though it does seem to happen more 
often than not.



 If I do a status on the Director for instance and see the jobs for the
 next day lined up in Scheduled jobs, they all have the same Volume

Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1

2012-11-05 Thread Stephen Thompson
On 11/05/2012 01:17 PM, Josh Fisher wrote:

 On 11/5/2012 11:03 AM, Stephen Thompson wrote:

 On 11/5/12 7:59 AM, John Drescher wrote:
 I've had the following problem for ages (meaning multiple major
 revisions of bacula) and I've seen this come up from time to time on the
 mailing list, but I've never actually seen a resolution (please point me
 to one if it's been found).


 background:

 I run monthly Fulls and nightly Incrementals.  I have a 2 drive
 autochanger dedicated to my Incrementals.  I launch something like ~150
 Incremental jobs each night.  I am configured for 8 concurrent jobs on
 the Storage Daemon.


 PROBLEM:

 The first job(s) grab one of the 2 devices available in the changer
 (which is set to AutoSelect) and either load a tape, or use a tape from
 the previous evening.  All tapes in the changer are in the same
 Incremenal-Pool.

 The second jobs(s) grab the other of the 2 devices available in the
 changer, but want to use the same tape that's just been mounted (or put
 into use) on the jobs that got launched first.  They will often literal
 wait the entire evening until 100's of jobs run through on only one
 device, until that tape is freed up, at which point it is unmounted from
 the first device and moved to the second.

 Note, the behaviour seems to be to round-robin my 8 concurrency limit
 between the 2 available drives, which mean 4 jobs will run, and 4 jobs
 will block on waiting for the wanted Volume.  When the original 4 jobs
 are completed (not at the same time) additional jobs are launched that
 keep that wanted Volume in use.


 LOG:

 03-Nov 22:00 DIRECTOR JobId 267433: Start Backup JobId 267433, Job=JOB.
 2012-11-03_22.00.00_0403-Nov 22:00 DIRECTOR JobId 267433: Using Device
 L100-Drive-003-Nov 22:00 DIRECTOR JobId 267433: Sending Accurate
 information.
 03-Nov 22:00 sd_L100_ JobId 267433: 3307 Issuing autochanger unload
 slot 82, drive 0 command.
 03-Nov 22:06 lawson-sd_L100_ JobId 267433: Warning: Volume IM0108
 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device
 L100-Drive-1 (/dev/L100-Drive-1)
 03-Nov 22:09 sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on
 L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1
 (/dev/L100-Drive-1)
 03-Nov 22:09 sd_L100_ JobId 267433: Warning: mount.c:217 Open device
 L100-Drive-0 (/dev/L100-Drive-0) Volume IM0108 failed: ERR=dev.c:513
 Unable to open device L100-Drive-0 (/dev/L100-Drive-0): ERR=No medium
 found
 .
 .
 .


 CONFIGS (partial and seem pretty straight-forward):

 Schedule {
   Name = DefaultSchedule
   Run = Level=Incremental   sat-thu at 
 22:00
   Run = Level=Differential  fri at 
 22:00
 }

 JobDefs {
   Name = DefaultJob
   Type = Backup
   Level = Full
   Schedule = DefaultSchedule
   Incremental Backup Pool = Incremental-Pool
   Differential Backup Pool = Incremental-Pool
 }

 Pool {
   Name = Incremental-Pool
   Pool Type = Backup
   Storage = L100-changer
 }

 Storage {
   Name = L100-changer
   Device = L100-changer
   Media Type = LTO-3
   Autochanger = yes
   Maximum Concurrent Jobs = 8
 }

 Autochanger {
   Name = L100-changer
   Device = L100-Drive-0
   Device = L100-Drive-1
   Changer Device = /dev/L100-changer
 }

 Device {
   Name = L100-Drive-0
   Drive Index = 0
   Media Type = LTO-3
   Archive Device = /dev/L100-Drive-0
   AutomaticMount = yes;
   AlwaysOpen = yes;
   RemovableMedia = yes;
   RandomAccess = no;
   AutoChanger = yes;
   AutoSelect = yes;
 }

 Device {
   Name = L100-Drive-1
   Drive Index = 0
   Media Type = LTO-3
   Archive Device = /dev/L100-Drive-1
   AutomaticMount = yes;
   AlwaysOpen = yes;
   RemovableMedia = yes;
   RandomAccess = no;
   AutoChanger = yes;
   AutoSelect = yes;
 }

 I do not have a good solution but I know by default bacula does not
 want to load the same pool into more than 1 storage device at the same
 time.

 John

 I think it's something in the automated logic.  Because if I launch jobs
 by hand (same pool across 2 tapes devices in same autochanger)
 everything works fine.  I think it has more to do with the Scheduler
 assigning the same same Volume to all jobs and then not wanting to
 change that choice if that Volume is in use.

 When both jobs start at the same time and same priority, they see the
 same exact next available volume for the pool, and so both select the
 same volume. When they select different drives, it is a problem, since
 the volume can only be in one drive.

 When you start the jobs manually, I assume you are starting them at
 different times. This works, because the first job is up and running
 with the volume loaded before the second job begins its selection
 process. One way to handle this issue is to have a different Schedule
 for each job and start the jobs at different times with one

Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1

2012-11-05 Thread Stephen Thompson


Going to try this out.

Stephen



On 11/05/2012 02:40 PM, Josh Fisher wrote:

 On 11/5/2012 4:28 PM, Stephen Thompson wrote:
 On 11/05/2012 01:17 PM, Josh Fisher wrote:
 On 11/5/2012 11:03 AM, Stephen Thompson wrote:
 On 11/5/12 7:59 AM, John Drescher wrote:
 I've had the following problem for ages (meaning multiple major
 revisions of bacula) and I've seen this come up from time to time on the
 mailing list, but I've never actually seen a resolution (please point me
 to one if it's been found).


 background:

 I run monthly Fulls and nightly Incrementals.  I have a 2 drive
 autochanger dedicated to my Incrementals.  I launch something like ~150
 Incremental jobs each night.  I am configured for 8 concurrent jobs on
 the Storage Daemon.


 PROBLEM:

 The first job(s) grab one of the 2 devices available in the changer
 (which is set to AutoSelect) and either load a tape, or use a tape from
 the previous evening.  All tapes in the changer are in the same
 Incremenal-Pool.

 The second jobs(s) grab the other of the 2 devices available in the
 changer, but want to use the same tape that's just been mounted (or put
 into use) on the jobs that got launched first.  They will often literal
 wait the entire evening until 100's of jobs run through on only one
 device, until that tape is freed up, at which point it is unmounted from
 the first device and moved to the second.

 Note, the behaviour seems to be to round-robin my 8 concurrency limit
 between the 2 available drives, which mean 4 jobs will run, and 4 jobs
 will block on waiting for the wanted Volume.  When the original 4 jobs
 are completed (not at the same time) additional jobs are launched that
 keep that wanted Volume in use.


 LOG:

 03-Nov 22:00 DIRECTOR JobId 267433: Start Backup JobId 267433, Job=JOB.
 2012-11-03_22.00.00_0403-Nov 22:00 DIRECTOR JobId 267433: Using Device
 L100-Drive-003-Nov 22:00 DIRECTOR JobId 267433: Sending Accurate
 information.
 03-Nov 22:00 sd_L100_ JobId 267433: 3307 Issuing autochanger unload
 slot 82, drive 0 command.
 03-Nov 22:06 lawson-sd_L100_ JobId 267433: Warning: Volume IM0108
 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device
 L100-Drive-1 (/dev/L100-Drive-1)
 03-Nov 22:09 sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on
 L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1
 (/dev/L100-Drive-1)
 03-Nov 22:09 sd_L100_ JobId 267433: Warning: mount.c:217 Open device
 L100-Drive-0 (/dev/L100-Drive-0) Volume IM0108 failed: ERR=dev.c:513
 Unable to open device L100-Drive-0 (/dev/L100-Drive-0): ERR=No medium
 found
 .
 .
 .


 CONFIGS (partial and seem pretty straight-forward):

 Schedule {
 Name = DefaultSchedule
 Run = Level=Incremental   sat-thu at 
 22:00
 Run = Level=Differential  fri at 
 22:00
 }

 JobDefs {
 Name = DefaultJob
 Type = Backup
 Level = Full
 Schedule = DefaultSchedule
 Incremental Backup Pool = Incremental-Pool
 Differential Backup Pool = Incremental-Pool
 }

 Pool {
 Name = Incremental-Pool
 Pool Type = Backup
 Storage = L100-changer
 }

 Storage {
 Name = L100-changer
 Device = L100-changer
 Media Type = LTO-3
 Autochanger = yes
 Maximum Concurrent Jobs = 8
 }

 Autochanger {
 Name = L100-changer
 Device = L100-Drive-0
 Device = L100-Drive-1
 Changer Device = /dev/L100-changer
 }

 Device {
 Name = L100-Drive-0
 Drive Index = 0
 Media Type = LTO-3
 Archive Device = /dev/L100-Drive-0
 AutomaticMount = yes;
 AlwaysOpen = yes;
 RemovableMedia = yes;
 RandomAccess = no;
 AutoChanger = yes;
 AutoSelect = yes;
 }

 Device {
 Name = L100-Drive-1
 Drive Index = 0
 Media Type = LTO-3
 Archive Device = /dev/L100-Drive-1
 AutomaticMount = yes;
 AlwaysOpen = yes;
 RemovableMedia = yes;
 RandomAccess = no;
 AutoChanger = yes;
 AutoSelect = yes;
 }

 I do not have a good solution but I know by default bacula does not
 want to load the same pool into more than 1 storage device at the same
 time.

 John

 I think it's something in the automated logic.  Because if I launch jobs
 by hand (same pool across 2 tapes devices in same autochanger)
 everything works fine.  I think it has more to do with the Scheduler
 assigning the same same Volume to all jobs and then not wanting to
 change that choice if that Volume is in use.
 When both jobs start at the same time and same priority, they see the
 same exact next available volume for the pool, and so both select the
 same volume. When they select different drives, it is a problem, since
 the volume can only be in one drive.

 When you start the jobs manually, I assume you are starting them at
 different times. This works, because the first job is up

Re: [Bacula-users] wanted on DEVICE-0, is in use by device DEVICE-1

2012-11-05 Thread Stephen Thompson


No such luck.  I already have Prefer Mounted Volumes = no set for all 
jobs.  That's apparently not a solution.

Stephen



On 11/5/12 2:57 PM, Stephen Thompson wrote:


 Going to try this out.

 Stephen



 On 11/05/2012 02:40 PM, Josh Fisher wrote:

 On 11/5/2012 4:28 PM, Stephen Thompson wrote:
 On 11/05/2012 01:17 PM, Josh Fisher wrote:
 On 11/5/2012 11:03 AM, Stephen Thompson wrote:
 On 11/5/12 7:59 AM, John Drescher wrote:
 I've had the following problem for ages (meaning multiple major
 revisions of bacula) and I've seen this come up from time to time on the
 mailing list, but I've never actually seen a resolution (please point me
 to one if it's been found).


 background:

 I run monthly Fulls and nightly Incrementals.  I have a 2 drive
 autochanger dedicated to my Incrementals.  I launch something like ~150
 Incremental jobs each night.  I am configured for 8 concurrent jobs on
 the Storage Daemon.


 PROBLEM:

 The first job(s) grab one of the 2 devices available in the changer
 (which is set to AutoSelect) and either load a tape, or use a tape from
 the previous evening.  All tapes in the changer are in the same
 Incremenal-Pool.

 The second jobs(s) grab the other of the 2 devices available in the
 changer, but want to use the same tape that's just been mounted (or put
 into use) on the jobs that got launched first.  They will often literal
 wait the entire evening until 100's of jobs run through on only one
 device, until that tape is freed up, at which point it is unmounted from
 the first device and moved to the second.

 Note, the behaviour seems to be to round-robin my 8 concurrency limit
 between the 2 available drives, which mean 4 jobs will run, and 4 jobs
 will block on waiting for the wanted Volume.  When the original 4 jobs
 are completed (not at the same time) additional jobs are launched that
 keep that wanted Volume in use.


 LOG:

 03-Nov 22:00 DIRECTOR JobId 267433: Start Backup JobId 267433, Job=JOB.
 2012-11-03_22.00.00_0403-Nov 22:00 DIRECTOR JobId 267433: Using Device
 L100-Drive-003-Nov 22:00 DIRECTOR JobId 267433: Sending Accurate
 information.
 03-Nov 22:00 sd_L100_ JobId 267433: 3307 Issuing autochanger unload
 slot 82, drive 0 command.
 03-Nov 22:06 lawson-sd_L100_ JobId 267433: Warning: Volume IM0108
 wanted on L100-Drive-0 (/dev/L100-Drive-0) is in use by device
 L100-Drive-1 (/dev/L100-Drive-1)
 03-Nov 22:09 sd_L100_ JobId 267433: Warning: Volume IM0108 wanted on
 L100-Drive-0 (/dev/L100-Drive-0) is in use by device L100-Drive-1
 (/dev/L100-Drive-1)
 03-Nov 22:09 sd_L100_ JobId 267433: Warning: mount.c:217 Open device
 L100-Drive-0 (/dev/L100-Drive-0) Volume IM0108 failed: ERR=dev.c:513
 Unable to open device L100-Drive-0 (/dev/L100-Drive-0): ERR=No medium
 found
 .
 .
 .


 CONFIGS (partial and seem pretty straight-forward):

 Schedule {
  Name = DefaultSchedule
  Run = Level=Incremental   sat-thu 
 at 22:00
  Run = Level=Differential  fri 
 at 22:00
 }

 JobDefs {
  Name = DefaultJob
  Type = Backup
  Level = Full
  Schedule = DefaultSchedule
  Incremental Backup Pool = Incremental-Pool
  Differential Backup Pool = Incremental-Pool
 }

 Pool {
  Name = Incremental-Pool
  Pool Type = Backup
  Storage = L100-changer
 }

 Storage {
  Name = L100-changer
  Device = L100-changer
  Media Type = LTO-3
  Autochanger = yes
  Maximum Concurrent Jobs = 8
 }

 Autochanger {
  Name = L100-changer
  Device = L100-Drive-0
  Device = L100-Drive-1
  Changer Device = /dev/L100-changer
 }

 Device {
  Name = L100-Drive-0
  Drive Index = 0
  Media Type = LTO-3
  Archive Device = /dev/L100-Drive-0
  AutomaticMount = yes;
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  AutoChanger = yes;
  AutoSelect = yes;
 }

 Device {
  Name = L100-Drive-1
  Drive Index = 0
  Media Type = LTO-3
  Archive Device = /dev/L100-Drive-1
  AutomaticMount = yes;
  AlwaysOpen = yes;
  RemovableMedia = yes;
  RandomAccess = no;
  AutoChanger = yes;
  AutoSelect = yes;
 }

 I do not have a good solution but I know by default bacula does not
 want to load the same pool into more than 1 storage device at the same
 time.

 John

 I think it's something in the automated logic.  Because if I launch jobs
 by hand (same pool across 2 tapes devices in same autochanger)
 everything works fine.  I think it has more to do with the Scheduler
 assigning the same same Volume to all jobs and then not wanting to
 change that choice if that Volume is in use.
 When both jobs start at the same time and same priority, they see the
 same exact next available volume for the pool, and so both select the
 same volume. When

[Bacula-users] Fwd: Re: wanted on DEVICE-0, is in use by device DEVICE-1

2012-11-06 Thread Stephen Thompson


A quick test of this scenario seems to work:
Leave Prefer Mounted Volumes = yes (the default).
Set each drive in the autochanger to half of the total concurrency 
limit.  This per-device setting seems to allow multiple drives to use 
the same Pool.

Not very well documented IMHO.
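
For the archives, this is roughly what that looks like against the 
Device resources I posted earlier -- a sketch only; the value of 4 
assumes my SD-wide limit of 8 split evenly across the 2 drives:

Device {
   Name = L100-Drive-0
   Drive Index = 0
   Media Type = LTO-3
   Archive Device = /dev/L100-Drive-0
   AutomaticMount = yes;
   AlwaysOpen = yes;
   RemovableMedia = yes;
   RandomAccess = no;
   AutoChanger = yes;
   AutoSelect = yes;
   # per-device limit (the 5.0 feature Bob linked); half of the 8-job SD limit
   Maximum Concurrent Jobs = 4
}

with the same Maximum Concurrent Jobs = 4 line added to L100-Drive-1.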

Stephen





 Original Message 
Return-Path:bob_het...@hotmail.com





Are you using the setting:

prefer mounted volumes=yes or no
?

If you had it set to yes, then you'd never use the 2nd tape drive, but
if you set it to no, sometimes you'd hit a deadlock.

I used to have an environment with more than a hundred daily jobs and
would hit a contention issue occasionally.  The developers eventually
abandoned that code in favor of setting the maximum concurrent jobs per
device

http://www.bacula.org/5.2.x-manuals/en/main/main/New_Features_in_5_0_0.html#SECTION0091


In addition, another problem I hit occasionally would appear after
upgrading the OS.  If you update your system you may need to rebuild
bacula.  Before I started rebuilding bacula at the end of system updates
I would hit race conditions and process crashes.

  Bob






Re: [Bacula-users] Migrating from myisam to innodb

2013-03-01 Thread Stephen Thompson


Another perspective...

I've personally found that if your memory is limited (my bacula db 
server has 8Gb of RAM), then for a bacula database mysql performs 
_better_ than postgres.  My File table currently has 2,856,394,323 rows.

I've seen so many recommendations here and elsewhere about postgres 
being an obvious choice over mysql, but in real life practice, we've 
found at our site that mysql gave us better results (even after weeks of 
tuning postgres).

Our hybrid solution is to run mysql INNODB as the active database so as 
to avoid table locking, which causes all kinds of problems, especially 
for operator access to bconsole.  However, due to the painfully slow 
dumps from INNODB, we have a slave mysql server running MYISAM that we 
use for regular old mysql dumps.
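
A sketch of the kind of dump I mean, run against the MYISAM slave -- the 
host name "db-slave", the catalog name "bacula", and the output path are 
made up here:

   # dump the whole catalog from the MyISAM slave and compress it
   mysqldump -h db-slave --opt bacula | gzip > /backup/bacula-catalog.sql.gz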

In general this works out fairly well for us.

The only unresolved issue we have is that some of the bacula queries can 
take a while to return.  I've tracked it down to the way the db engine 
is responding to the query, but the odd thing is that the first time 
these queries run they are quick; then the mysql engine changes the 
query plan it uses to a slower one.  I haven't figured out why, or how 
to keep it running the quick way.
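
One way to see which plan mysql picks for a given query is EXPLAIN.  A 
minimal sketch, with a made-up JobId, against the standard File table:

   EXPLAIN SELECT COUNT(*) FROM File WHERE JobId=12345;

Running it while the query is still fast, and again after it goes slow, 
shows whether the chosen index changes.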

Stephen




On 03/01/2013 03:16 AM, Uwe Schuerkamp wrote:
 On Tue, Feb 26, 2013 at 04:23:20PM +, Alan Brown wrote:
 On 26/02/13 09:42, Uwe Schuerkamp wrote:


 for the record I'd like to give you some stats from our recent myisam
 - innodb conversion.


 For the sizes you're talking about, I'd recommend:

 1: A _lot_ more memory. 100Gb or so.

 and even more strongly:

 2: Postgresql


 Mysql is fast and good for small databases, but postgresql scales to
 large sizes with a lot less pain and suffering. Conversion here was
 relatively painless.



 Hi Alan  list,

 can you point me to some good conversion guides and esp. utlities? I
 checked the postgres documentation wiki, but half of the scripts
 linked there are dead it seems. I tried converting a mysql dump to pg
 using my2pg.pl, but the poor script ran out of memory 30 minutes into
 the conversion on the test machine (Centos 6, 8GB RAM ;-)

 I'm hoping our File table will get a lot smaller now over time as
 we've moved away from copy jobs for the time being, so the conversion
 should also get easier as tape volumes with millions of files on them
 get recycled and pruned.

 All the best, Uwe



-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
510.214.6506 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



[Bacula-users] duplicate job storage device bug?

2013-08-03 Thread Stephen Thompson


Hey all,

Figured I'd throw this out there before opening a ticket in case this is 
already known or I'm just confused.

We use duplicate job control for the following reason:  We run nightly 
Incrementals of _all_ jobs.  Then rather than running Fulls on a cyclic 
schedule, we run them back-to-back, injecting a few at a time via 
scripts.  Note, we also have two tape libraries (and two SDs), one for 
Incremental Pools and one for Full Pools.

Where duplicate job control comes in is that we want a running 
Incremental to be canceled if a Full of the same job is launched on any 
given night since the Full, in our case, should take precedence and be 
run immediately.  What we see is that the Full does indeed cancel the 
running Incremental and then runs itself; HOWEVER, the Full job takes on 
the storage properties (storage device) of the canceled Incremental job 
rather than using its own settings.  The Full job then expects its Full 
Pool tape to be in the Incremental tape library, which it is not, and 
the job stalls waiting for operator intervention.

Here's some config snippets:

   Maximum Concurrent Jobs = 2
   Allow Duplicate Jobs = no
   Cancel Lower Level Duplicates = yes
   Cancel Running Duplicates = no
   Cancel Queued Duplicates = no

Log snippets:

(incremental launches)
03-Aug 04:05 DIRECTOR JobId 316646: Start Backup JobId 316646, 
Job=CLIENT.2013-08-02_22.01.01_50
03-Aug 04:05 DIRECTOR JobId 316646: Using Device L100-Drive-0 to write.

(full launches and cancels incremental)
03-Aug 06:20 DIRECTOR JobId 316677: Cancelling duplicate JobId=316646.
03-Aug 06:20 DIRECTOR JobId 316677: 2001 Job 
sutter_5.2013-08-02_22.01.01_50 marked to be canceled.
03-Aug 06:20 DIRECTOR JobId 316677: Cancelling duplicate JobId=316646.
03-Aug 06:20 DIRECTOR JobId 316677: 2901 Job 
sutter_5.2013-08-02_22.01.01_50 not found.
03-Aug 06:20 DIRECTOR JobId 316677: 3904 Job 
sutter_5.2013-08-02_22.01.01_50 not found.
03-Aug 08:20 DIRECTOR JobId 316677: Start Backup JobId 316677, 
Job=sutter_5.2013-08-03_06.20.02_04

(full complains that the volume it tried to load is an incremental tape 
instead of a full tape)
03-Aug 08:22 DIRECTOR JobId 316677: Using Device L100-Drive-0 to write.
03-Aug 08:22 SD_L100_ JobId 316677: 3304 Issuing autochanger load slot 
72, drive 0 command.
03-Aug 08:23 SD_L100_ JobId 316677: 3305 Autochanger load slot 72, 
drive 0, status is OK.
03-Aug 08:23 SD_L100_ JobId 316677: Warning: Director wanted Volume 
FB0718.
 Current Volume IM0097 not acceptable because:
 1998 Volume IM0097 catalog status is Full, not in Pool.

NOTE: The Full job launch command was "run job=sutter_5 level=Full 
storage=SL500-Drive-1 yes", and yet, apparently, due to the duplicate 
job cancellation, the Full job instead attempted to use storage=L100-Drive-0.


thanks,
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
510.214.6506 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



Re: [Bacula-users] choosing database.

2013-09-19 Thread Stephen Thompson


The answer may partly come from how much RAM the system running the 
database has.  I've seen numerous preferences for postgres on this 
mailing list, but I've personally found on my 8Gb RAM system, I get 
better performance out of mysql.  We backup about 130+ hosts, 
incrementals nightly, differentials weekly, fulls monthly (~40TB).

Stephen


On 9/19/13 8:06 AM, Mauro wrote:
 Hello.
 I'm using bacula in a linux debian system.
 I've to backup about 30 hosts.
 I've choose postresql as database.
 What do you think about?
 Better mysql or postgres?




-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
510.214.6506 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



Re: [Bacula-users] choosing database.

2013-09-19 Thread Stephen Thompson
On 09/19/2013 08:51 AM, Mauro wrote:
 On 19 September 2013 17:20, Stephen Thompson
 step...@seismo.berkeley.edu mailto:step...@seismo.berkeley.edu wrote:



 The answer may partly come from how much RAM the system running the
 database has.  I've seen numerous preferences for postgres on this
 mailing list, but I've personally found on my 8Gb RAM system, I get
 better performance out of mysql.  We backup about 130+ hosts,
 incrementals nightly, differentials weekly, fulls monthly (~40TB).


 In my case the ram is not a problem, bacula server is in a virtual
 machine, I'm using xen, actually my ram is 4G but I can increase.
 I've to backup about 30 host, four of which have a lot of data to be
 backed up.
 One has about 80G of data, multimedia files and other.
 I've always used postgres for all my needs so I though to use it also
 for bacula server.


Given what you're going to backup, I don't think it's really going to 
matter which database you choose.  Pick whichever database you're more 
familiar with, as that's likely going to be the only difference you'll 
notice between them.

Also, in this discussion folks don't always immediately bring up 
retention, which (along with the number, not the size, of files you back 
up) is going to determine your database size.  Since 90+% of the bacula 
database is the File table, that's where good or poor performance is 
going to exhibit itself.

We have a 300-400Gb File table and get reasonable performance from mysql 
and 8Gb of RAM.  We run the innodb engine for bacula itself (less 
blocking than myisam), and the myisam engine on a slave server for 
catalog dumps (faster dumps than innodb).
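
If you want to see where the space actually goes, something like this 
works in mysql -- a sketch, assuming the catalog database is named 
"bacula":

   SELECT table_name,
          ROUND((data_length + index_length)/1024/1024/1024, 1) AS size_gb
     FROM information_schema.tables
    WHERE table_schema = 'bacula'
    ORDER BY size_gb DESC;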


Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
510.664.9177 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



[Bacula-users] bconsole 7.0.2 storage status issue

2014-04-16 Thread Stephen Thompson

Hello,

Wanting to confirm something new I'm seeing in 7.0.2 with bconsole.  I 
have multiple storage daemons with multiple devices.  It used to be (in 
5.2.13) that a "status" and then "2: Storage" in bconsole would present 
a list of storage devices to query.  Now it immediately returns only the 
status of the first device I have configured for my Director.  A "mount" 
command, in comparison, will present me with what I am used to -- the 
list of devices to choose from.  Is this a feature?  A bug?

thanks,
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
510.214.6506 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



Re: [Bacula-users] Bug when canceling a job in bconsole on 7.0.2?

2014-04-29 Thread Stephen Thompson


I believe I've seen this unwanted behaviour as well.  I cannot test 
right now, as I have a job running that I can't afford to accidentally 
cancel, but this past weekend I attempted to cancel a running 
Incremental job by number (as I have done successfully many times in the 
past), and somehow a different Full job that was also running at the 
time got canceled as well.
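
By "cancel by number" I mean the usual bconsole form, e.g. (jobid made up):

   cancel jobid=123456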

Stephen


On 4/28/14 7:15 PM, Bill Arlofski wrote:

 Whoops... Clicked send too soon.

 Just a follow-up.

 I went ahead and chose #1 in the list to see if it would cancel both jobs. It 
 did:

 *can
 Select Job(s):
   1: JobId=25775 Job=Helpdesk.2014-04-28_20.30.00_52
   2: JobId=25776 Job=Postbooks.2014-04-28_20.30.00_53
 Choose Job list to cancel (1-2): 1
 JobId=25775 Job=Helpdesk.2014-04-28_20.30.00_52
 JobId=25776 Job=Postbooks.2014-04-28_20.30.00_53
 Confirm cancel of 2 Jobs (yes/no): yes
 2001 Job Helpdesk.2014-04-28_20.30.00_52 marked to be canceled.
 3000 JobId=25775 Job=Helpdesk.2014-04-28_20.30.00_52 marked to be canceled.
 2001 Job Postbooks.2014-04-28_20.30.00_53 marked to be canceled.
 3000 JobId=25776 Job=Postbooks.2014-04-28_20.30.00_53 marked to be canceled.



-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
510.214.6506 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760



Re: [Bacula-users] Fatal error: askdir.c:340 NULL Volume name. This shouldn't happen!!!

2014-05-05 Thread Stephen Thompson


Hello,

I believe this bug is present in version 7.0.3.

I just had it happen last night, much like I saw about 2 years ago.  I 
run hundreds of incrementals each night across 2 LTO tape drives, 
running with a concurrency limit so that jobs start whenever others 
finish (i.e. I cannot stagger their start times).  I'm assuming this is 
again a race condition, but one that, as an end-user, I really cannot 
work around.

So far the problem is not frequent, but it does still appear to be an issue.

thanks,
Stephen




On 02/20/2014 09:30 AM, Kern Sibbald wrote:
 Hello Wolfgang,

 The drive is allocated first.  Your analysis is correct, but
 obviously something is wrong.  I don't think this is happening
 any more with the Enterprise version, so it will very likely
 be fixed in the next release as we will backport (or flowback)
 some rather massive changes we have made in the last
 during the freeze to the community version.

 If you want to see what is going on a little more, turn on
 a debug level in the SD of about 100.  Likewise you can set a debug
 level in the SD of say 1 or 2, then when you do a status,
 if Bacula is having difficulties reserving a drive, it will print
 out more detailed information on what is going on -- this last
 is most effective if jobs end up waiting because a resource
 (drive or volume) is not available.

 Best regards,
 Kern

 On 02/17/2014 11:54 PM, Wolfgang Denk wrote:
 Dear Kern Sibbald,

 In message 5301db23.6010...@sibbald.com you wrote:
 Were you careful to change the actual volume retention period in
 the catalog entry for the volume?  That requires a manual step after
 changing the conf file.  You can check two ways:
 Yes, I was. list volumes shows the new retention period for all
 volumes.

 1. Look at the full output from all the jobs and see if any
 volumes were recycled while the batch of jobs ran.
 Not in this run, and not in any of the last 15 or so before that.

 2. Do a llist on all the volumes that were used during the
 period the problem happened and see if they were freshly
 recycled and that the retention period is set to your new
 value.
 retention period is as expected, no recycling happened.

 In any case, I will look over your previous emails to see if I see
 anything that could point to a problem, and I will look at the bug
 report, but without a test case, this is one of those nightmare
 bugs that take huge resources and time to fix.
 Hm... I wonder why the DIR allocates two pairs of (DRIVE, VOLUME) for
 two simultaneously running jobs, using not the volume currently
 mounted in the respective drive but the one in the other drive.  I would
 expect that, when a job starts, either a volume or a drive is selected
 first:

 - if the drive is selected first, and it has a tape loaded which is in
   the right pool and in Append status, then there should be no need
   to ask for any other tape.
 - if the volume is allocated first, and it is already loaded in a
   suitable drive, then that drive should be used, and not the other
   one.

 Best regards,

 Wolfgang Denk



 --
 Managing the Performance of Cloud-Based Applications
 Take advantage of what the Cloud has to offer - Avoid Common Pitfalls.
 Read the Whitepaper.
 http://pubads.g.doubleclick.net/gampad/clk?id=121054471iu=/4140/ostg.clktrk
 ___
 Bacula-users mailing list
 Bacula-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/bacula-users



-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
510.664.9177 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Is your legacy SCM system holding you back? Join Perforce May 7 to find out:
• 3 signs your SCM is hindering your productivity
• Requirements for releasing software faster
• Expert tips and advice for migrating your SCM now
http://p.sf.net/sfu/perforce
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Bug when canceling a job in bconsole on 7.0.2?

2014-05-23 Thread Stephen Thompson


I may be able to test at the end of the month.  Right now I have 
continuous jobs running that I'd rather not inadvertently cancel.

Stephen


On 5/22/14 8:37 AM, Bill Arlofski wrote:
 On 05/22/14 11:28, Kern Sibbald wrote:
 Hello Bill,

 I have also pushed a patch that may well fix the problem you are
 having with cancel.  I have never been able to reproduce the problem,
 but I did yet another rewrite of the sellist routine as well as
 designed a number of tests, none of which ever failed.  However, in
 the process I noticed that the source code that called the sellist
 methods was using the wrong calling sequence (my own fault).  I am
 pretty sure that is what was causing your problem.  In any case, this
 new code is in the current git public repo and I would appreciate it
 if you would test it.

 Best regards,
 Kern


 Hi Kern, I saw that you wrote the above as an add-on to another thread;
 I am posting it here so that this thread is complete too.

 I currently don't have time to test this, but perhaps Stephen who is
 also seeing this issue might.

 I will test it as soon as I have some free time, unless of course
 Stephen or someone else has confirmed that the patch fixes the issue.

 Thanks Kern!


 Bill


 --
 Bill Arlofski
 Reverse Polarity, LLC
 http://www.revpol.com/
 -- Not responsible for anything below this line --


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
510.214.6506 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Accelerate Dev Cycles with Automated Cross-Browser Testing - For FREE
Instantly run your Selenium tests across 300+ browser/OS combos.
Get unparalleled scalability from the best Selenium testing platform available
Simple to use. Nothing to install. Get started now for free.
http://p.sf.net/sfu/SauceLabs
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] RESTORE PRUNED FILE (WITH CATALOG BACKUPS)

2014-05-29 Thread Stephen Thompson


If you have the flexibility to do this, the simplest way might be to 
restore the catalog from tape, shut down bacula, temporarily move aside 
your up-to-date database, and put the restored database in its place 
(this likely means restoring the database from a dump file).  Do your 
restore now that you have a version of the database with the purged 
files; then, once the restore is complete, shut down bacula and move 
your up-to-date database back into place.
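
For what it's worth, with a MySQL catalog the swap might look roughly like
this (the database name, file paths, and init script are placeholders for
whatever your site uses, and credentials are omitted):

   # stop the director so nothing writes to the catalog
   /etc/init.d/bacula-dir stop
   # keep a copy of the current, up-to-date catalog
   mysqldump bacula > /tmp/bacula-current.sql
   # load the restored (older) catalog dump in its place
   mysql bacula < /path/to/restored-catalog-dump.sql
   /etc/init.d/bacula-dir start
   # ...run the file restore against the old catalog...
   /etc/init.d/bacula-dir stop
   # put the up-to-date catalog back
   mysql bacula < /tmp/bacula-current.sql
   /etc/init.d/bacula-dir start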


Stephen


On 5/29/14 6:49 AM, david parada wrote:
 Thanks John,

 I am not very confident with bscan.  Can you give me an example of how to
 add the files back to the catalog using your way?

 Kind regards,


 David

 +--
 |This was sent by david.par...@techex.es via Backup Central.
 |Forward SPAM to ab...@backupcentral.com.
 +--



 --
 Time is money. Stop wasting it! Get your web API in 5 minutes.
 www.restlet.com/download
 http://p.sf.net/sfu/restlet
 ___
 Bacula-users mailing list
 Bacula-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/bacula-users


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
510.214.6506 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Time is money. Stop wasting it! Get your web API in 5 minutes.
www.restlet.com/download
http://p.sf.net/sfu/restlet
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] RESTORE PRUNED FILE (WITH CATALOG BACKUPS)

2014-05-29 Thread Stephen Thompson


I didn't mention this, but of course you would not want to run any 
other jobs (or really do anything with bacula at all!) beyond the 
restore of the files while running the old database; otherwise those 
changes won't make it into the up-to-date database you ultimately run 
with.

On 5/29/14 7:21 AM, Stephen Thompson wrote:


 If you have the flexibility to do this, the simplest way might be to
 restore the catalog from tape, shut down bacula, temporarily move aside
 your up-to-date database, and put the restored database in its place
 (this likely means restoring the database from a dump file).  Do your
 restore now that you have a version of the database with the purged
 files; then, once the restore is complete, shut down bacula and move
 your up-to-date database back into place.


 Stephen


 On 5/29/14 6:49 AM, david parada wrote:
 Thanks John,

 I am not very confident with bscan.  Can you give me an example of how to
 add the files back to the catalog using your way?

 Kind regards,


 David

 +--
 |This was sent by david.par...@techex.es via Backup Central.
 |Forward SPAM to ab...@backupcentral.com.
 +--



 --
 Time is money. Stop wasting it! Get your web API in 5 minutes.
 www.restlet.com/download
 http://p.sf.net/sfu/restlet
 ___
 Bacula-users mailing list
 Bacula-users@lists.sourceforge.net
 https://lists.sourceforge.net/lists/listinfo/bacula-users



-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
510.214.6506 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Time is money. Stop wasting it! Get your web API in 5 minutes.
www.restlet.com/download
http://p.sf.net/sfu/restlet
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] Bug when canceling a job in bconsole on 7.0.2?

2014-06-10 Thread Stephen Thompson


Version 7.0.4 does not appear to have the job-canceling issue I saw in 
7.0.2.  Yay! ...and thanks.
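
For the record, a couple of ways to exercise it from bconsole (the jobids
here are placeholders; "help cancel" shows the accepted selection syntax):

   *cancel jobid=25775
   *cancel jobid=25775,25776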



On 5/22/14 8:37 AM, Bill Arlofski wrote:
 On 05/22/14 11:28, Kern Sibbald wrote:
 Hello Bill,

 I have also pushed a patch that may well fix the problem you are
 having with cancel.  I have never been able to reproduce the problem,
 but I did yet another rewrite of the sellist routine as well as
 designed a number of tests, none of which ever failed.  However, in
 the process I noticed that the source code that called the sellist
 methods was using the wrong calling sequence (my own fault).  I am
 pretty sure that is what was causing your problem.  In any case, this
 new code is in the current git public repo and I would appreciate it
 if you would test it.

 Best regards,
 Kern


 Hi Kern, I saw that you wrote the above as an add-on to another thread;
 I am posting it here so that this thread is complete too.

 I currently don't have time to test this, but perhaps Stephen who is
 also seeing this issue might.

 I will test it as soon as I have some free time, unless of course
 Stephen or someone else has confirmed that the patch fixes the issue.

 Thanks Kern!


 Bill


 --
 Bill Arlofski
 Reverse Polarity, LLC
 http://www.revpol.com/
 -- Not responsible for anything below this line --


-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
510.214.6506 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
HPCC Systems Open Source Big Data Platform from LexisNexis Risk Solutions
Find What Matters Most in Your Big Data with HPCC Systems
Open Source. Fast. Scalable. Simple. Ideal for Dirty Data.
Leverages Graph Analysis for Fast Processing & Easy Data Exploration
http://p.sf.net/sfu/hpccsystems
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


[Bacula-users] issue with setuid/gid on restored files

2014-07-22 Thread Stephen Thompson


Sorry if I have not researched this enough before bringing it to the 
list, but what I'm seeing is very odd.  Someone else must have run into 
this before me.

If I restore a setuid or setgid file, the file is restored without the 
setuid/setgid bit set.  However, the directory containing the file 
(which did not have its setuid/setgid bit set during the backup) winds 
up with the setuid/setgid bit being set.

If I restore both the directory and the file, the directory ends up with 
the proper non-setuid/setgid attributes, but the file once again ends 
up without the setuid/setgid bit set.  I'm assuming the directory has 
the bit set during an interim stage of the restore, but it is then 
corrected when its attributes are set during the restore (which must 
happen after the files it contains).

I can't say authoritatively, but I don't believe this is the way bacula 
used to behave for me.  And to say the least, this is far from 
acceptable.  I discovered this during a bare metal restore, and have 
loads of issues from setuid and setgid bits not being set on the 
restored system.
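
For anyone wanting to gauge the damage, GNU find can list what still
carries the bits on the restored system, for comparison against a
known-good host (the path is a placeholder):

   # files that still have the setuid or setgid bit set
   find / -xdev -type f -perm /6000 -ls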

thanks,
Stephen
-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
510.214.6506 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Want fast and easy access to all the code in your enterprise? Index and
search up to 200,000 lines of code with a free copy of Black Duck
Code Sight - the same software that powers the world's largest code
search on Ohloh, the Black Duck Open Hub! Try it now.
http://p.sf.net/sfu/bds
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] issue with setuid/gid on restored files

2014-07-22 Thread Stephen Thompson

I'm running 7.0.4.



Here's an example...

(before backup)
# ls -ld /bin
dr-xr-xr-x 2 root root 4096 Jul 22 09:56 /bin
# ls -l /bin/ping
-rwsr-xr-x 1 root root 40760 Sep 17  2013 /bin/ping

(after restore selecting file /bin/ping)
# ls -ld  /bin
drwsr-xr-x 2 root root 4096 Jul 22 14:38 bin
# ls -l /bin/ping
-rwxr-xr-x 1 root root 40760 Sep 17  2013 ping

(after restore selecting file /bin/ping and directory /bin)
# ls -ld  /bin
dr-xr-xr-x 2 root root 4096 Jul 22 14:38 bin
# ls -l /bin/ping
-rwxr-xr-x 1 root root 40760 Sep 17  2013 ping


In the first restore case, it looks like the directory has user-write 
permission as well, which isn't right; perhaps that comes from the umask 
of the restore, since the directory wasn't part of the restore selection. 
However, the setuid bit certainly wouldn't be coming from the umask. 
I'm jumping to the conclusion that whatever is setting the setuid bit is 
messing up and applying it to the parent directory instead of to the file.

Stephen





On 7/22/14 2:58 PM, Stephen Thompson wrote:


 Sorry if I have not researched this enough before bringing it to the
 list, but what I'm seeing is very odd.  Someone else must have run into
 this before me.

 If I restore a setuid or setgid file, the file is restored without the
 setuid/setgid bit set.  However, the directory containing the file
 (which did not have its setuid/setgid bit set during the backup) winds
 up with the setuid/setgid bit being set.

 If I restore both the directory and the file, the directory ends up with
 the proper non-setuid/setgid attributes, but the file once again ends
 up without the setuid/setgid bit set.  I'm assuming the directory has
 the bit set during an interim stage of the restore, but it is then
 corrected when its attributes are set during the restore (which must
 happen after the files it contains).

 I can't say authoritatively, but I don't believe this is the way bacula
 used to behave for me.  And to say the least, this is far from
 acceptable.  I discovered this during a bare metal restore, and have
 loads of issues from setuid and setgid bits not being set on the
 restored system.

 thanks,
 Stephen

-- 
Stephen Thompson   Berkeley Seismological Laboratory
step...@seismo.berkeley.edu215 McCone Hall # 4760
510.214.6506 (phone)   University of California, Berkeley
510.643.5811 (fax) Berkeley, CA 94720-4760

--
Want fast and easy access to all the code in your enterprise? Index and
search up to 200,000 lines of code with a free copy of Black Duck
Code Sight - the same software that powers the world's largest code
search on Ohloh, the Black Duck Open Hub! Try it now.
http://p.sf.net/sfu/bds
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users

