Re: [Bacula-users] SD crashes when working with S3 (Ceph)

2020-05-14 Thread Martin Simmons
Is this still loading the driver from
/usr/lib64/bacula-sd-cloud-driver-9.6.3.so?  It is a little strange that you
have bacula in /opt/bacula/bin/bacula-sd but the plugins are in /usr/lib64.

Please also post the output from:

objdump -t /usr/lib64/bacula-sd-cloud-driver-9.6.3.so | grep _driver

Do you also have /opt/bacula/plugins/bacula-sd-cloud-driver-9.6.3.so?
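
The two checks above can be scripted. The following is a minimal sketch, not part of Bacula: `find_driver_copies` is a hypothetical helper name, and the two directories are simply the ones mentioned in this thread. It lists every copy of the driver it finds and runs the same objdump command on each.

```shell
#!/bin/sh
# Hypothetical helper (not part of Bacula): print each directory's copy of
# the named driver file, if that copy exists.
# Usage: find_driver_copies DRIVER_FILE DIR...
find_driver_copies() {
    driver=$1
    shift
    for dir in "$@"; do
        if [ -e "$dir/$driver" ]; then
            printf '%s\n' "$dir/$driver"
        fi
    done
}

# Directories taken from this thread; adjust them for your installation.
find_driver_copies bacula-sd-cloud-driver-9.6.3.so /usr/lib64 /opt/bacula/plugins |
while IFS= read -r path; do
    echo "== $path"
    # Same command as above: list symbols whose names contain "_driver".
    objdump -t "$path" | grep _driver
done
```

If both directories print a path, two copies of the driver are installed, and it is worth confirming which one the SD actually opens (the "Open SD driver at ..." debug line shows this).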

__Martin


> On Thu, 14 May 2020 08:49:14 +0200, Phillip Dale said:
> 
> I could not get much information out of that traceback. Hopefully this helps, 
> so here is the traceback file I got:
> 
> [New LWP 19474]
> [New LWP 19470]
> [New LWP 19315]
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> 0x7feb40a409a3 in select () from /usr/lib64/libc.so.6
> $1 = "12-May-2020 22:44:03\000\000\000\000\000\000\000\000\000"
> $2 = '\000' 
> $3 = 0x231aeb8 "bacula-sd"
> $4 = 0x231aef8 "/opt/bacula/bin/bacula-sd"
> $5 = 0x0
> $6 = '\000' 
> $7 = 0x7feb420b443f "9.6.3 (09 March 2020)"
> $8 = 0x7feb420b4463 "x86_64-pc-linux-gnu"
> $9 = 0x7feb420b4477 "redhat"
> $10 = 0x7feb420b447e "(Core)"
> $11 = "backup.novalocal", '\000' 
> $12 = 0x7feb420b4455 "redhat (Core)"
> Environment variable "TestName" not defined.
> #0  0x7feb40a409a3 in select () from /usr/lib64/libc.so.6
> #1  0x7feb4204e6cc in bnet_thread_server (addrs=0x231f6f8, 
> max_clients=41, client_wq=0x630d80  _workq>, handle_client_request=0x40ebd8 ) 
> at bnet_server.c:166
> #2  0x0040a347 in main (argc=0, argv=0x7fffaaa99890) at stored.c:327
> 
> Thread 4 (Thread 0x7feb387e3700 (LWP 19315)):
> #0  0x7feb41e1cde2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from 
> /usr/lib64/libpthread.so.0
> #1  0x7feb420a1ae1 in watchdog_thread (arg=0x0) at watchdog.c:299
> #2  0x7feb41e18ea5 in start_thread () from /usr/lib64/libpthread.so.0
> #3  0x7feb40a498dd in clone () from /usr/lib64/libc.so.6
> 
> Thread 3 (Thread 0x7feb38fe4700 (LWP 19470)):
> #0  0x7feb41e201d9 in waitpid () from /usr/lib64/libpthread.so.0
> #1  0x7feb42093f88 in signal_handler (sig=11) at signal.c:233
> #2  <signal handler called>
> #3  0x7feb41e1ad00 in pthread_mutex_lock () from 
> /usr/lib64/libpthread.so.0
> #4  0x7feb420abf31 in lmgr_p (m=0x10) at lockmgr.c:106
> #5  0x7feb420ae7b8 in lock_guard::lock_guard (this=0x7feb38fe3510, 
> mutex=...) at ../lib/lockmgr.h:289
> #6  0x7feb37dd8296 in cloud_proxy::volume_lookup (this=0x0, 
> volume=0x7feb3000acc8 "Vol-0003") at cloud_parts.c:229
> #7  0x7feb37dd1658 in cloud_dev::probe_cloud_proxy (this=0x7feb3000a878, 
> dcr=0x7feb300146d8, VolName=0x7feb3000acc8 "Vol-0003", force=false) at
> cloud_dev.c:1217
> #8  0x7feb37dd0593 in cloud_dev::open_device (this=0x7feb3000a878, 
> dcr=0x7feb300146d8, omode=2) at cloud_dev.c:1025
> #9  0x7feb4251ce8a in DCR::mount_next_write_volume (this=0x7feb300146d8) 
> at mount.c:191
> #10 0x7feb424f6d32 in acquire_device_for_append (dcr=0x7feb300146d8) at 
> acquire.c:420
> #11 0x0040c325 in do_append_data (jcr=0x7feb38e8) at append.c:102
> #12 0x00416ed7 in append_data_cmd (jcr=0x7feb38e8) at 
> fd_cmds.c:263
> #13 0x00416b68 in do_client_commands (jcr=0x7feb38e8) at 
> fd_cmds.c:218
> #14 0x0041688a in run_job (jcr=0x7feb38e8) at fd_cmds.c:167
> #15 0x00418c58 in run_cmd (jcr=0x7feb38e8) at job.c:240
> #16 0x0040f196 in handle_connection_request (arg=0x2340ef8) at 
> dircmd.c:242
> #17 0x7feb420a2b54 in workq_server (arg=0x630d80 ) at 
> workq.c:372
> #18 0x7feb41e18ea5 in start_thread () from /usr/lib64/libpthread.so.0
> #19 0x7feb40a498dd in clone () from /usr/lib64/libc.so.6
> 
> Thread 2 (Thread 0x7feb368e2700 (LWP 19474)):
> #0  0x7feb41e1cde2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from 
> /usr/lib64/libpthread.so.0
> #1  0x7feb420a294b in workq_server (arg=0x630d80 ) at 
> workq.c:349
> #2  0x7feb41e18ea5 in start_thread () from /usr/lib64/libpthread.so.0
> #3  0x7feb40a498dd in clone () from /usr/lib64/libc.so.6
> 
> Thread 1 (Thread 0x7feb42b90880 (LWP 19313)):
> #0  0x7feb40a409a3 in select () from /usr/lib64/libc.so.6
> #1  0x7feb4204e6cc in bnet_thread_server (addrs=0x231f6f8, 
> max_clients=41, client_wq=0x630d80 , 
> handle_client_request=0x40ebd8 ) at 
> bnet_server.c:166
> #2  0x0040a347 in main (argc=0, argv=0x7fffaaa99890) at stored.c:327
> #0  0x7feb40a409a3 in select () from /usr/lib64/libc.so.6
> No symbol table info available.
> #1  0x7feb4204e6cc in bnet_thread_server (addrs=0x231f6f8, 
> max_clients=41, client_wq=0x630d80 , 
> handle_client_request=0x40ebd8 ) at 
> bnet_server.c:166
> 166   if ((stat = select(maxfd + 1, , NULL, NULL, NULL)) < 0) 
> {
> maxfd = 5
> sockset = {fds_bits = {32, 0 }}
> clilen = 16
> turnon = 1
> buf = "188.95.226.225", '\000' 
> allbuf = "0.0.0.0:9103 \000\000\000\060\215\251\252\377\177\000\000 
> 

Re: [Bacula-users] SD crashes when working with S3 (Ceph)

2020-05-14 Thread Radosław Korzeniewski
Hello,

On Thu, 14 May 2020 at 08:50, Phillip Dale wrote:

> I could not get much information out of that traceback. Hopefully this
> helps, so here is the traceback file I got:
>

It is an almost perfect traceback. :)


> Thread 3 (Thread 0x7feb38fe4700 (LWP 19470)):
> #0  0x7feb41e201d9 in waitpid () from /usr/lib64/libpthread.so.0
> #1  0x7feb42093f88 in signal_handler (sig=11) at signal.c:233
> #2  <signal handler called>
> #3  0x7feb41e1ad00 in pthread_mutex_lock () from
> /usr/lib64/libpthread.so.0
> #4  0x7feb420abf31 in lmgr_p (m=0x10) at lockmgr.c:106
> #5  0x7feb420ae7b8 in lock_guard::lock_guard (this=0x7feb38fe3510,
> mutex=...) at ../lib/lockmgr.h:289
> #6  0x7feb37dd8296 in cloud_proxy::volume_lookup (this=0x0,
> volume=0x7feb3000acc8 "Vol-0003") at cloud_parts.c:229
>

The problem is here ^^^: the variable "this" should not be NULL (0x0), and
this is what causes your SIGSEGV.
The author of the plugin should check why it gets NULL and correct the code
to avoid such problems.
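
A quick way to spot this kind of frame in a long traceback is to search for a NULL "this". The following is a minimal sketch, assuming the traceback has been saved as a text file in gdb's usual frame format; `null_this_frames` is a hypothetical helper name, not a Bacula tool. The m=0x10 passed to lmgr_p in frame #4 would be consistent with a mutex member located 0x10 bytes into the NULL object, though that is an inference from the addresses and has not been checked against the source.

```shell
#!/bin/sh
# Hypothetical helper: print traceback frames whose C++ "this" pointer is NULL.
# Reads the named files, or standard input when no file is given.
null_this_frames() {
    grep -E '\(this=0x0[,)]' "$@"
}

# Example, using the traceback path reported in this thread:
#   null_this_frames /opt/bacula/working/bacula.19313.traceback
```

Only the first line of each frame is matched, so the output names the function that was called on the NULL object.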

best regards
-- 
Radosław Korzeniewski
rados...@korzeniewski.net
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


Re: [Bacula-users] SD crashes when working with S3 (Ceph)

2020-05-14 Thread Phillip Dale
I could not get much information out of that traceback. Hopefully this helps, 
so here is the traceback file I got:

[New LWP 19474]
[New LWP 19470]
[New LWP 19315]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x7feb40a409a3 in select () from /usr/lib64/libc.so.6
$1 = "12-May-2020 22:44:03\000\000\000\000\000\000\000\000\000"
$2 = '\000' 
$3 = 0x231aeb8 "bacula-sd"
$4 = 0x231aef8 "/opt/bacula/bin/bacula-sd"
$5 = 0x0
$6 = '\000' 
$7 = 0x7feb420b443f "9.6.3 (09 March 2020)"
$8 = 0x7feb420b4463 "x86_64-pc-linux-gnu"
$9 = 0x7feb420b4477 "redhat"
$10 = 0x7feb420b447e "(Core)"
$11 = "backup.novalocal", '\000' 
$12 = 0x7feb420b4455 "redhat (Core)"
Environment variable "TestName" not defined.
#0  0x7feb40a409a3 in select () from /usr/lib64/libc.so.6
#1  0x7feb4204e6cc in bnet_thread_server (addrs=0x231f6f8, max_clients=41, 
client_wq=0x630d80 , handle_client_request=0x40ebd8 ) at 
bnet_server.c:166
#2  0x0040a347 in main (argc=0, argv=0x7fffaaa99890) at stored.c:327

Thread 4 (Thread 0x7feb387e3700 (LWP 19315)):
#0  0x7feb41e1cde2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from 
/usr/lib64/libpthread.so.0
#1  0x7feb420a1ae1 in watchdog_thread (arg=0x0) at watchdog.c:299
#2  0x7feb41e18ea5 in start_thread () from /usr/lib64/libpthread.so.0
#3  0x7feb40a498dd in clone () from /usr/lib64/libc.so.6

Thread 3 (Thread 0x7feb38fe4700 (LWP 19470)):
#0  0x7feb41e201d9 in waitpid () from /usr/lib64/libpthread.so.0
#1  0x7feb42093f88 in signal_handler (sig=11) at signal.c:233
#2  <signal handler called>
#3  0x7feb41e1ad00 in pthread_mutex_lock () from /usr/lib64/libpthread.so.0
#4  0x7feb420abf31 in lmgr_p (m=0x10) at lockmgr.c:106
#5  0x7feb420ae7b8 in lock_guard::lock_guard (this=0x7feb38fe3510, 
mutex=...) at ../lib/lockmgr.h:289
#6  0x7feb37dd8296 in cloud_proxy::volume_lookup (this=0x0, 
volume=0x7feb3000acc8 "Vol-0003") at cloud_parts.c:229
#7  0x7feb37dd1658 in cloud_dev::probe_cloud_proxy (this=0x7feb3000a878, 
dcr=0x7feb300146d8, VolName=0x7feb3000acc8 "Vol-0003", force=false) at
cloud_dev.c:1217
#8  0x7feb37dd0593 in cloud_dev::open_device (this=0x7feb3000a878, 
dcr=0x7feb300146d8, omode=2) at cloud_dev.c:1025
#9  0x7feb4251ce8a in DCR::mount_next_write_volume (this=0x7feb300146d8) at 
mount.c:191
#10 0x7feb424f6d32 in acquire_device_for_append (dcr=0x7feb300146d8) at 
acquire.c:420
#11 0x0040c325 in do_append_data (jcr=0x7feb38e8) at append.c:102
#12 0x00416ed7 in append_data_cmd (jcr=0x7feb38e8) at fd_cmds.c:263
#13 0x00416b68 in do_client_commands (jcr=0x7feb38e8) at 
fd_cmds.c:218
#14 0x0041688a in run_job (jcr=0x7feb38e8) at fd_cmds.c:167
#15 0x00418c58 in run_cmd (jcr=0x7feb38e8) at job.c:240
#16 0x0040f196 in handle_connection_request (arg=0x2340ef8) at 
dircmd.c:242
#17 0x7feb420a2b54 in workq_server (arg=0x630d80 ) at 
workq.c:372
#18 0x7feb41e18ea5 in start_thread () from /usr/lib64/libpthread.so.0
#19 0x7feb40a498dd in clone () from /usr/lib64/libc.so.6

Thread 2 (Thread 0x7feb368e2700 (LWP 19474)):
#0  0x7feb41e1cde2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from 
/usr/lib64/libpthread.so.0
#1  0x7feb420a294b in workq_server (arg=0x630d80 ) at 
workq.c:349
#2  0x7feb41e18ea5 in start_thread () from /usr/lib64/libpthread.so.0
#3  0x7feb40a498dd in clone () from /usr/lib64/libc.so.6

Thread 1 (Thread 0x7feb42b90880 (LWP 19313)):
#0  0x7feb40a409a3 in select () from /usr/lib64/libc.so.6
#1  0x7feb4204e6cc in bnet_thread_server (addrs=0x231f6f8, max_clients=41, 
client_wq=0x630d80 , handle_client_request=0x40ebd8 
) at bnet_server.c:166
#2  0x0040a347 in main (argc=0, argv=0x7fffaaa99890) at stored.c:327
#0  0x7feb40a409a3 in select () from /usr/lib64/libc.so.6
No symbol table info available.
#1  0x7feb4204e6cc in bnet_thread_server (addrs=0x231f6f8, max_clients=41, 
client_wq=0x630d80 , handle_client_request=0x40ebd8 
) at bnet_server.c:166
166   if ((stat = select(maxfd + 1, , NULL, NULL, NULL)) < 0) {
maxfd = 5
sockset = {fds_bits = {32, 0 }}
clilen = 16
turnon = 1
buf = "188.95.226.225", '\000' 
allbuf = "0.0.0.0:9103 \000\000\000\060\215\251\252\377\177\000\000 
\215\251\252\377\177\000\000!\000\000\000\000\000\000\000$\274UA\353\177\000\000\000\000\000\000\000\000\000\000hG\271B\353\177\000\000\000\200\271B\353\177\000\000U\301UA\353\177\000\000\320\277\225@\353\177\000\000\360\027TA\353\177\000\000\000\000\000\000\001\000\000\000t\004\000\000\001\000\000\000\334\367\063\002\000\000\000\000\350\215\251\252\377\177\000\000\300\215\251\252\377\177\000\000\001\000\000\000\000\000\000\000hG\271B\353\177\000\000\370\254\271B\353\177\000\000\230\251\271B\353\177\000\000\217`\230B\353\177\000\000\000\000\000\000\000\000\000\000hG\271B\353\177\000\000\001\000\000\000\377\177\000\000\000\000\000\000\000\000\000\000"...
stat = 0
tlog = 0
fd_ptr = 0x0
sockfds = { = {}, 

Re: [Bacula-users] SD crashes when working with S3 (Ceph)

2020-05-13 Thread Martin Simmons
> On Wed, 13 May 2020 15:39:56 +0200, Phillip Dale said:
> 
> Hi all,
> 
> I just joined this list, so not sure if this should go here or in the 
> development list. I have the same issue that Rick Tuk has from his post on 
> May 07.
> 
> I am running on CentOS 7 and everything works fine until I try to use Ceph S3 
> or Amazon S3 storage. At this time, the bacula-sd crashes. My setup is very 
> similar to the one in his post.
> Not sure about where to go from here. Hoping for some help.
> 
> Here is the traceback from running bacula-sd with -q20:
> 
> backup.novalocal-sd: init_dev.c:437-0 Open SD driver at 
> /usr/lib64/bacula-sd-cloud-driver-9.6.3.so
> backup.novalocal-sd: init_dev.c:442-0 Lookup "BaculaSDdriver" in driver=cloud
> backup.novalocal-sd: init_dev.c:444-0 Driver=cloud entry point=7feb37dcc907
> backup.novalocal-sd: stored.c:615-0 SD init done CephStorage (0x7feb30008818)
> backup.novalocal-sd: init_dev.c:469-0 SD driver=cloud is already loaded.
> backup.novalocal-sd: stored.c:615-0 SD init done S3CloudStorage 
> (0x7feb3000a878)
> backup.novalocal-sd: stored.c:615-0 SD init done TmpFileStorage 
> (0x7feb3000c928)
> backup.novalocal-sd: bnet_server.c:86-0 Addresses 0.0.0.0:9103
> List plugins. Hook count=0
> Bacula interrupted by signal 11: Segmentation violation
> Kaboom! bacula-sd, backup.novalocal-sd got signal 11 - Segmentation violation 
> at 12-May-2020 22:44:03. Attempting traceback.
> Kaboom! exepath=/opt/bacula/bin/
> Calling: /opt/bacula/bin/btraceback /opt/bacula/bin/bacula-sd 19313 
> /opt/bacula/working
> It looks like the traceback worked...
> LockDump: /opt/bacula/working/bacula.19313.traceback

Did it send you an email with the traceback?  That might contain more
information.

If you can't find the email, then look in
/opt/bacula/working/bacula.19313.traceback.
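
If the SD output was captured to a file, the traceback location can be pulled out of it instead of being retyped. The following is a minimal sketch, assuming the "LockDump:" line has the form shown in the log above; `lockdump_path` is a hypothetical helper name and `sd-output.log` is a made-up file name.

```shell
#!/bin/sh
# Hypothetical helper: print the path that follows "LockDump: " in SD output.
# Reads the named files, or standard input when no file is given.
lockdump_path() {
    sed -n 's/^LockDump: //p' "$@"
}

# Example (sd-output.log is an assumed capture of the SD's console output):
#   less "$(lockdump_path sd-output.log)"
```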

__Martin




[Bacula-users] SD crashes when working with S3 (Ceph)

2020-05-13 Thread Phillip Dale


Hi all,

I just joined this list, so not sure if this should go here or in the 
development list. I have the same issue that Rick Tuk has from his post on May 
07.

I am running on CentOS 7 and everything works fine until I try to use Ceph S3 
or Amazon S3 storage. At this time, the bacula-sd crashes. My setup is very 
similar to the one in his post.
Not sure about where to go from here. Hoping for some help.

Here is the traceback from running bacula-sd with -q20:

backup.novalocal-sd: init_dev.c:437-0 Open SD driver at 
/usr/lib64/bacula-sd-cloud-driver-9.6.3.so
backup.novalocal-sd: init_dev.c:442-0 Lookup "BaculaSDdriver" in driver=cloud
backup.novalocal-sd: init_dev.c:444-0 Driver=cloud entry point=7feb37dcc907
backup.novalocal-sd: stored.c:615-0 SD init done CephStorage (0x7feb30008818)
backup.novalocal-sd: init_dev.c:469-0 SD driver=cloud is already loaded.
backup.novalocal-sd: stored.c:615-0 SD init done S3CloudStorage (0x7feb3000a878)
backup.novalocal-sd: stored.c:615-0 SD init done TmpFileStorage (0x7feb3000c928)
backup.novalocal-sd: bnet_server.c:86-0 Addresses 0.0.0.0:9103
List plugins. Hook count=0
Bacula interrupted by signal 11: Segmentation violation
Kaboom! bacula-sd, backup.novalocal-sd got signal 11 - Segmentation violation 
at 12-May-2020 22:44:03. Attempting traceback.
Kaboom! exepath=/opt/bacula/bin/
Calling: /opt/bacula/bin/btraceback /opt/bacula/bin/bacula-sd 19313 
/opt/bacula/working
It looks like the traceback worked...
LockDump: /opt/bacula/working/bacula.19313.traceback
backup.novalocal-sd: lockmgr.c:1221-8 lockmgr disabled
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 536 bytes at 2321f08 from bsockcore.c:157
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 280 bytes at 233fb68 from jcr.c:388
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 280 bytes at 233ff18 from jcr.c:390
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 536 bytes at 2340c28 from jcr.c:386
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 280 bytes at 7feb3f68 from jcr.c:384
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 280 bytes at 7feb30008068 from lib/mem_pool.h:85
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 4120 bytes at 2341c78 from bsockcore.c:156
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 4120 bytes at 2342cc8 from bsock.c:101
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 536 bytes at 7feb3000e2a8 from record_util.c:251
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 536 bytes at 7feb3000e5e8 from bsockcore.c:157
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 448 bytes at 2340ef8 from bsock.c:852
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 7 bytes at 2341138 from bsock.c:854
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 15 bytes at 23410f8 from bsock.c:855
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 16 bytes at 2341178 from workq.c:198
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 24 bytes at 7feb3f18 from jcr.c:372
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 32 bytes at 7feb30009d58 from dircmd.c:194
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 24 bytes at 7feb30009db8 from askdir.c:575
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 154 bytes at 7feb3000f198 from job.c:126
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 154 bytes at 7feb3000f268 from job.c:129
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 154 bytes at 7feb3000f338 from job.c:132
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 154 bytes at 7feb3000f408 from job.c:141
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 40 bytes at 7feb300141d8 from job.c:161
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 32 bytes at 7feb30014238 from reserve.c:283
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 80 bytes at 7feb30014518 from alist.c:55
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 32 bytes at 7feb300145a8 from reserve.c:306
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 15 bytes at 7feb30014608 from reserve.c:321
backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: 
backup.novalocal-sd 80 bytes at 7feb30014648 from alist.c:55

Re: [Bacula-users] SD crashes

2012-02-17 Thread joenyland
On 16 Feb, 2012, at 05:25 PM, Martin Simmons mar...@lispworks.com wrote:

> On Wed, 15 Feb 2012 17:46:33 +, Joe Nyland said:
>
> On 15 Feb 2012, at 16:52, Martin Simmons wrote:
>
> On Wed, 15 Feb 2012 13:31:10 + (GMT), Joe Nyland said:
>
> On 14 Feb, 2012, at 03:47 PM, Martin Simmons mar...@lispworks.com wrote:
>
> On Tue, 14 Feb 2012 10:34:31 -0500, John Drescher said:
>
> I've been running the SD using the following command (I know the combination
> of options I have used may be excessive, but I wanted as much chance of
> catching the error as I could!) since yesterday afternoon:
>
>   sudo bacula-sd -c /etc/bacula/bacula-sd.conf -d 100 -dt -f -u bacula -g tape -m -v | tee -a /mnt/array/bacula-sd.screen.log
>
> However, (as luck would have it) I've not seen the behaviour I originally
> reported whilst running with debug options.
>
> Is there any way in which running the SD with the combination of options I
> have used above, could cause any different behaviour of the SD? Or interfere
> in any way with it? I'm asking, because I have re-enabled all of the backup
> jobs I have on the server, and I have still not seen it crash again.
>
> Could be a timing issue that the delay in writing the log causes the
> bad behavior to not happen. Those types of problems are hard to
> debug.
>
> Running it under gdb without the debug options is a better approach in that
> case.
>
> http://www.bacula.org/5.2.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION0064
>
> When it 'crashes' (though it sounds more like 'hangs' is a better word),
> interrupt gdb with Ctrl-c to get back to the gdb shell window (as in step 8).
>
> __Martin
>
> Martin and John, thank you for your replies.
>
> Since yesterday afternoon, bacula-sd has been running under gdb using the
> instructions in the manual for my Bacula version, however I've still not seen
> the issue originally reported.
>
> I agree with you both that by running it under a debugging process, it seems
> a delay is introduced which is suppressing the error in some way. Is my best
> bet just to leave bacula-sd running under gdb and hope that my full backups
> over the weekend may highlight the issue? Or is there another way I could
> debug this?
>
> Assuming the error causes the SD to hang (rather than exit), then you could
> run it without gdb and then attach gdb to it when it hangs (use gdb -p $pid).
>
> __Martin
>
> Ok, that sounds reasonable.
>
> One question, does bacula-sd need to be running with "-s no signals (for
> debugging)" or will gdb be able to provide enough info without this option?
> (My default options on Ubuntu 10.04 are: -c config file -u user -g group)
>
> Would using the -s option introduce the sort of delay we mentioned earlier,
> and thus limit my chances of reproducing the issue?
>
> I don't think -s will have any effect on delays, but you shouldn't need it if
> you attach gdb to the hanging process.
>
> __Martin

Martin,

Thanks for confirming. I thought as much, but didn't want to waste the
opportunity of catching the error, so thought it best to check.

As luck would have it, however, I've not seen the SD hang since re-running it
without any of the additional debugging options and not running it under gdb
(so back to how it has been running normally, leading up to this issue). It's
annoying that it seemed to be every couple of hours it would hang, then I'd
have to restart everything. Now it seems I can't even get it

Re: [Bacula-users] SD crashes

2012-02-16 Thread Martin Simmons
 On Wed, 15 Feb 2012 17:46:33 +, Joe Nyland said:
 
 On 15 Feb 2012, at 16:52, Martin Simmons wrote:
 
  On Wed, 15 Feb 2012 13:31:10 + (GMT), Joe Nyland said:
  
  On 14 Feb, 2012,at 03:47 PM, Martin Simmons mar...@lispworks.com wrote:
  
  On Tue, 14 Feb 2012 10:34:31 -0500, John Drescher said:
  
 I've been running the SD using the following command (I know the 
 combination
 of options I have used may be excessive, but I wanted as much chance of
 catching the error as I could!) since yesterday afternoon:
   sudo bacula-sd -c /etc/bacula/bacula-sd.conf -d 100 -dt -f -u bacula -g
 tape -m -v | tee -a /mnt/array/bacula-sd.screen.log
 
 However, (as luck would have it) I've not seen the behaviour I originally
 reported whilst running with debug options.
 
 Is there any way in which running the SD with the combination of options I
 have used above, could cause any different behaviour of the SD? Or 
 interfere
 in any way with it? I'm asking, because I have re-enabled all of the backup
 jobs I have on the server, and I have still not seen it crash again.
 
  
  Could be a timing issue that the delay in writing the log causes the
  bad behavior to not happen.. Those types of problems are hard to
  debug.
  
  Running it under gdb without the debug options is a better approach in that
  case.
  
  http://www.bacula.org/5.2.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION0064
  
  When it 'crashes' (though it sounds more like 'hangs' is a better word),
  interrupt gdb with Ctrl-c to get back to the gdb shell window (as in step 
  8).
  
  __Martin
  
  
  
  
  Martin and John, thank you for your replies.
  
  Since yesterday afternoon, bacula-sd has been running under gdb using 
  the instructions in the manual for my Bacula version, however I've still 
  not seen the issue originally reported.
  
  I agree with you both that by running it under a debugging process, it 
  seems a delay is introduced which is suppressing the error in some way. Is 
  my best bet just to leave bacula-sd running under gdb and hope that my 
  full backups over the weekend may highlight the issue? Or is there another 
  way I could debug this?
  
  Assuming the error causes the SD to hang (rather than exit), then you could
  run it without gdb and then attach gdb to it when it hangs (use gdb -p 
  $pid).
  
  __Martin
  
 
 Ok, that sounds reasonable.
 
 One question, does bacula-sd need to be running with "-s no signals (for
 debugging)" or will gdb be able to provide enough info without this option?
 (My default options on Ubuntu 10.04 are: -c config file -u user -g group)
 
 Would using the -s option introduce the sort of delay we mentioned earlier, 
 and thus limit my chances of reproducing the issue?

I don't think -s will have any effect on delays, but you shouldn't need it if
you attach gdb to the hanging process.

__Martin



Re: [Bacula-users] SD crashes

2012-02-15 Thread Joe Nyland
On 14 Feb, 2012, at 03:47 PM, Martin Simmons mar...@lispworks.com wrote:

> On Tue, 14 Feb 2012 10:34:31 -0500, John Drescher said:
>
> I've been running the SD using the following command (I know the combination
> of options I have used may be excessive, but I wanted as much chance of
> catching the error as I could!) since yesterday afternoon:
>
>   sudo bacula-sd -c /etc/bacula/bacula-sd.conf -d 100 -dt -f -u bacula -g tape -m -v | tee -a /mnt/array/bacula-sd.screen.log
>
> However, (as luck would have it) I've not seen the behaviour I originally
> reported whilst running with debug options.
>
> Is there any way in which running the SD with the combination of options I
> have used above, could cause any different behaviour of the SD? Or interfere
> in any way with it? I'm asking, because I have re-enabled all of the backup
> jobs I have on the server, and I have still not seen it crash again.
>
> Could be a timing issue that the delay in writing the log causes the
> bad behavior to not happen. Those types of problems are hard to
> debug.
>
> Running it under gdb without the debug options is a better approach in that
> case.
>
> http://www.bacula.org/5.2.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION0064
>
> When it 'crashes' (though it sounds more like 'hangs' is a better word),
> interrupt gdb with Ctrl-c to get back to the gdb shell window (as in step 8).
>
> __Martin

Martin and John, thank you for your replies.

Since yesterday afternoon, bacula-sd has been running under gdb using the
instructions in the manual for my Bacula version, however I've still not seen
the issue originally reported.

I agree with you both that by running it under a debugging process, it seems
a delay is introduced which is suppressing the error in some way. Is my best
bet just to leave bacula-sd running under gdb and hope that my full backups
over the weekend may highlight the issue? Or is there another way I could
debug this?

Thanks,
Joe


Re: [Bacula-users] SD crashes

2012-02-15 Thread Martin Simmons
 On Wed, 15 Feb 2012 13:31:10 + (GMT), Joe Nyland said:
 
 On 14 Feb, 2012,at 03:47 PM, Martin Simmons mar...@lispworks.com wrote:
 
   On Tue, 14 Feb 2012 10:34:31 -0500, John Drescher said:
  
I've been running the SD using the following command (I know the 
combination
of options I have used may be excessive, but I wanted as much chance of
catching the error as I could!) since yesterday afternoon:
   sudo bacula-sd -c /etc/bacula/bacula-sd.conf -d 100 -dt -f -u bacula 
-g
tape -m -v | tee -a /mnt/array/bacula-sd.screen.log
   
However, (as luck would have it) I've not seen the behaviour I 
originally
reported whilst running with debug options.
   
Is there any way in which running the SD with the combination of 
options I
have used above, could cause any different behaviour of the SD? Or 
interfere
in any way with it? I'm asking, because I have re-enabled all of the backup
jobs I have on the server, and I have still not seen it crash again.
   
  
   Could be a timing issue that the delay in writing the log causes the
   bad behavior to not happen.. Those types of problems are hard to
   debug.
 
  Running it under gdb without the debug options is a better approach in that
  case.
 
  http://www.bacula.org/5.2.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION0064
 
  When it 'crashes' (though it sounds more like 'hangs' is a better word),
  interrupt gdb with Ctrl-c to get back to the gdb shell window (as in step 
  8).
 
  __Martin
 
   
  
 
 Martin and John, thank you for your replies.
 
 Since yesterday afternoon, bacula-sd has been running under gdb using the 
 instructions in the manual for my Bacula version, however I've still not seen 
 the issue originally reported.
 
 I agree with you both that by running it under a debugging process, it seems 
 a delay is introduced which is suppressing the error in some way. Is my best 
 bet just to leave bacula-sd running under gdb and hope that my full backups 
 over the weekend may highlight the issue? Or is there another way I could 
 debug this?

Assuming the error causes the SD to hang (rather than exit), then you could
run it without gdb and then attach gdb to it when it hangs (use gdb -p $pid).
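
The attach-on-hang approach can be sketched as a small shell helper. This is only an illustration: it assumes gdb and pgrep are installed, that the daemon's process name is exactly bacula-sd, and that it is run as root. The function name and the default output path are hypothetical.

```shell
#!/bin/sh
# Sketch: capture backtraces of every thread in a hung bacula-sd.
# Assumptions (not from the thread): process name "bacula-sd", run as root,
# hypothetical default output file /tmp/bacula-sd.bt.

dump_sd_backtrace() {
    out=${1:-/tmp/bacula-sd.bt}

    # -o picks the oldest matching process, -x requires an exact name match.
    pid=$(pgrep -o -x bacula-sd) || {
        echo "bacula-sd is not running" >&2
        return 1
    }

    # Attach, print all thread backtraces, then exit; because gdb attached
    # to an existing process, it detaches on exit and leaves the daemon alive.
    gdb -p "$pid" -batch -ex "thread apply all bt" > "$out" 2>&1 || return 1
    echo "backtrace of PID $pid written to $out"
}
```

Calling `dump_sd_backtrace /tmp/bacula-sd.bt` once the SD stops responding should leave a file that can be posted to the list.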

__Martin



Re: [Bacula-users] SD crashes

2012-02-15 Thread Joe Nyland
On 15 Feb 2012, at 16:52, Martin Simmons wrote:

 On Wed, 15 Feb 2012 13:31:10 + (GMT), Joe Nyland said:
 
 On 14 Feb, 2012,at 03:47 PM, Martin Simmons mar...@lispworks.com wrote:
 
 On Tue, 14 Feb 2012 10:34:31 -0500, John Drescher said:
 
 I've been running the SD using the following command (I know the combination
 of options I have used may be excessive, but I wanted as much chance of
 catching the error as I could!) since yesterday afternoon:
   sudo bacula-sd -c /etc/bacula/bacula-sd.conf -d 100 -dt -f -u bacula -g tape -m -v | tee -a /mnt/array/bacula-sd.screen.log
 
 However (as luck would have it), I've not seen the behaviour I originally
 reported whilst running with debug options.
 
 Is there any way in which running the SD with the combination of options I
 have used above could cause any different behaviour of the SD, or interfere
 with it in any way? I'm asking because I have re-enabled all of the backup
 jobs I have on the server, and I have still not seen it crash again.
 
 
 It could be a timing issue: the delay in writing the log may prevent
 the bad behavior from happening. Those types of problems are hard to
 debug.
 
 Running it under gdb without the debug options is a better approach in that
 case.
 
 http://www.bacula.org/5.2.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION0064
 
 When it 'crashes' (though it sounds more like 'hangs' is a better word),
 interrupt gdb with Ctrl-c to get back to the gdb shell window (as in step 8).
 
 __Martin
 
 
 
 
 Martin and John, thank you for your replies.
 
 Since yesterday afternoon, bacula-sd has been running under gdb using the 
 instructions in the manual for my Bacula version; however, I've still not 
 seen the issue originally reported.
 
 I agree with you both that by running it under a debugging process, it seems 
 a delay is introduced which is suppressing the error in some way. Is my best 
 bet just to leave bacula-sd running under gdb and hope that my full backups 
 over the weekend may highlight the issue? Or is there another way I could 
 debug this?
 
 Assuming the error causes the SD to hang (rather than exit), then you could
 run it without gdb and then attach gdb to it when it hangs (use gdb -p $pid).
 
 __Martin
 

Ok, that sounds reasonable.

One question: does bacula-sd need to be running with -s (no signals, for
debugging), or will gdb be able to provide enough info without this option?
(My default options on Ubuntu 10.04 are: -c <config file> -u <user> -g <group>)

Would using the -s option introduce the sort of delay we mentioned earlier, and 
thus limit my chances of reproducing the issue?

Thank you for your continued help.

Joe


Re: [Bacula-users] SD crashes

2012-02-14 Thread Joe Nyland
On 13 Feb, 2012,at 02:30 PM, "Joe Nyland" joenyl...@me.com wrote:
On 13 Feb, 2012,at 02:11 PM, John Drescher dresche...@gmail.com wrote:
2012/2/13 Joe Nyland joenyl...@me.com:
 Hello everyone,

 I hope someone would be able to offer any suggestions of why I am seeing the
 following behaviour in my current Bacula setup:

 Since the tail end of last week, I have been having issues with my MySQL
 backups in Bacula, where they would randomly appear to 'crash', normally
 when performing a copy of a backup to another pool - but I'm not sure yet if
 this is the trigger.

 Running 'status dir' after one of these 'crashes' gives the following output
 for the running jobs:

 Running Jobs:
 Console connected at 12-Feb-12 15:53
 Console connected at 13-Feb-12 06:58
  JobId Level   Name                       Status
 ==
   2107 Full    WebServer1_MySQL_Copy.2012-02-13_04.30.00_28 is running  Crashed Job
   2108 Full    WebServer1_MySQL.2012-02-13_04.30.00_29 is running  Crashed Job
   2111 Full    MythTVServer1_MySQL.2012-02-13_05.00.00_32 is waiting for higher priority jobs to finish
   2113 Full    TestServer_MySQL.2012-02-13_05.00.00_34 is waiting execution
   2114 Full    MythTVServer1_MySQL_Copy.2012-02-13_05.30.00_35 is waiting execution
   2115 Full    WebServer1_MySQL_Copy.2012-02-13_05.30.00_36 is waiting execution
   2116 Full    WebServer1_MySQL.2012-02-13_05.30.00_37 has a fatal error
   2117 Full    TestServer_MySQL_Copy.2012-02-13_05.30.00_38 is waiting execution
   2121 Full    MythTVServer1_MySQL_Copy.2012-02-13_06.30.00_42 is waiting execution
   2122 Full    WebServer1_MySQL_Copy.2012-02-13_06.30.00_43 is waiting execution
   2123 Full    WebServer1_MySQL.2012-02-13_06.30.00_44 has a fatal error
   2124 Full    TestServer_MySQL_Copy.2012-02-13_06.30.00_45 is waiting execution
   2125 Full    MythTVServer1_MySQL.2012-02-13_07.00.00_47 has a fatal error
   2126 Full    WebServer1_MySQL.2012-02-13_07.00.00_48 has a fatal error

 Once the above appears, I am unable to view the status of any storage
 resource on my SD:

 *status storage=FileServer1_Full
 Connecting to Storage daemon FileServer1_Full at FileServer1:9103

 FileServer1-sd Version: 5.0.1 (24 February 2010) x86_64-pc-linux-gnu ubuntu 10.04
 Daemon started 12-Feb-12 15:53, 92 Jobs run since started.
  Heap: heap=1,671,168 smbytes=1,188,608 max_bytes=1,388,208 bufs=577 max_bufs=994
 Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8

 Running Jobs:
 Reading: Full Copy job WebServer1_MySQL_Copy JobId=2107 Volume="WebServer1_MySQL_1325"
     pool="WebServer1_MySQL" device="WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1)
     Files=4 Bytes=164,924 Bytes/sec=17
     FDSocket closed

 Jobs waiting to reserve a drive:

 Terminated Jobs:
  JobId  Level    Files      Bytes   Status   Finished        Name
 ===
   2091  Full          2    92.45 K  OK       13-Feb-12 03:30 TestServer_MySQL_Copy
   2096  Full          5    2.258 M  OK       13-Feb-12 03:30 MythTVServer1_MySQL_Copy
   2098  Full          4    164.9 K  OK       13-Feb-12 03:30 WebServer1_MySQL_Copy
   2100  Full          2    92.45 K  OK       13-Feb-12 03:30 TestServer_MySQL_Copy
   2078  Full      1,145    2.942 G  OK       13-Feb-12 03:31 SVN_Copy
   2102  Full          5    2.259 M  OK       13-Feb-12 04:01 MythTVServer1_MySQL
   2103  Full          4    164.9 K  OK       13-Feb-12 04:01 WebServer1_MySQL
   2104  Full          2    92.37 K  OK       13-Feb-12 04:01 TestServer_MySQL
   2105  Full          5    2.259 M  OK       13-Feb-12 04:30 MythTVServer1_MySQL_Copy
   2109  Full          2    92.37 K  OK       13-Feb-12 04:30 TestServer_MySQL_Copy

 Device status:
 Device "Default" (/mnt/backup/Bacula) is not open.
 snip
 Device "WebServer1_Inc" (/mnt/backup/Bacula/WebServer1/Incremental) is not open.
 Device "WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1) is mounted with:
     Volume:      WebServer1_MySQL_1325
     Pool:        WebServer1_MySQL
     Media type:  File
     Total Bytes Read=0 Blocks Read=0 Bytes/block=0
     Positioned at File=0 Block=0
 Device "WebServer1_MySQL_Copy" (/mnt/mac_backup/Bacula/Databases/WebServer1) is not open.
 Device "WebServer1_Full_Copy" (/mnt/mac_backup/Bacula/WebServer1/Full) is not open.
 Device "WebServer1_Inc_Copy" (/mnt/mac_backup/Bacula/WebServer1/Incrementals) is not open.
 snip
 Device "SharedData_Diff" (/mnt/backup/Bacula/Shared/Differential) is not open.

 Used Volume status:

 NOTE: bconsole appears to crash here - no further output is produced, and
 bconsole does not respond to any key presses. I have to Ctrl + C to exit out
 from bconsole. Furthermore, the only way I can clear out the failed jobs
 from the 'Running jobs queue' is to exit from bconsole, issue 'sudo service
 bacula-sd stop' twice, then restart the SD and restart bacula-director.

 What I have is for 4 of my clients I run a MySQL backup hourly at 00:00,
 01:00, etc. I then copy the MySQL backups to another storage resource on my
 SD at 00:30, 01:30, etc. The MySQL databases which I am 

Re: [Bacula-users] SD crashes

2012-02-14 Thread John Drescher
 I've been running the SD using the following command (I know the combination
 of options I have used may be excessive, but I wanted as much chance of
 catching the error as I could!) since yesterday afternoon:
    sudo bacula-sd -c /etc/bacula/bacula-sd.conf -d 100 -dt -f -u bacula -g tape -m -v | tee -a /mnt/array/bacula-sd.screen.log

 However (as luck would have it), I've not seen the behaviour I originally
 reported whilst running with debug options.

 Is there any way in which running the SD with the combination of options I
 have used above could cause any different behaviour of the SD, or interfere
 with it in any way? I'm asking because I have re-enabled all of the backup
 jobs I have on the server, and I have still not seen it crash again.


It could be a timing issue: the delay in writing the log may prevent
the bad behavior from happening. Those types of problems are hard to
debug.

John



Re: [Bacula-users] SD crashes

2012-02-14 Thread Martin Simmons
 On Tue, 14 Feb 2012 10:34:31 -0500, John Drescher said:
 
  I've been running the SD using the following command (I know the combination
  of options I have used may be excessive, but I wanted as much chance of
  catching the error as I could!) since yesterday afternoon:
     sudo bacula-sd -c /etc/bacula/bacula-sd.conf -d 100 -dt -f -u bacula -g tape -m -v | tee -a /mnt/array/bacula-sd.screen.log
 
  However (as luck would have it), I've not seen the behaviour I originally
  reported whilst running with debug options.
 
  Is there any way in which running the SD with the combination of options I
  have used above could cause any different behaviour of the SD, or interfere
  with it in any way? I'm asking because I have re-enabled all of the backup
  jobs I have on the server, and I have still not seen it crash again.
 
 
 It could be a timing issue: the delay in writing the log may prevent
 the bad behavior from happening. Those types of problems are hard to
 debug.

Running it under gdb without the debug options is a better approach in that
case.

http://www.bacula.org/5.2.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION0064

When it 'crashes' (though it sounds more like 'hangs' is a better word),
interrupt gdb with Ctrl-c to get back to the gdb shell window (as in step 8).
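
As a rough sketch of what running the SD under gdb without the debug options might look like: the binary path below is an assumption for a packaged install, and -f, -s and -c are the foreground, no-signals and config-file options of bacula-sd. The helper name is hypothetical, and it only prints the command so that it can be checked before use.

```shell
#!/bin/sh
# Sketch: build the gdb command line for running bacula-sd in the foreground.
# The default binary path is an assumption; adjust it to your installation.

sd_gdb_cmd() {
    sd_bin=${1:-/usr/sbin/bacula-sd}
    sd_conf=${2:-/etc/bacula/bacula-sd.conf}
    # --args hands everything after the program name to bacula-sd:
    #   -f  stay in the foreground
    #   -s  no signal handling by the daemon
    #   -c  configuration file to load
    printf 'gdb --args %s -f -s -c %s\n' "$sd_bin" "$sd_conf"
}

sd_gdb_cmd
```

Run the printed command as root and type `run` at the (gdb) prompt. Once the SD hangs and Ctrl-c has returned control to gdb, `thread apply all bt` prints a backtrace of every thread.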

__Martin



Re: [Bacula-users] SD crashes

2012-02-13 Thread Adrian Reyer
Hi Joe,

On Mon, Feb 13, 2012 at 07:21:03AM +, Joe Nyland wrote:
 I hope someone would be able to offer any suggestions of why I am seeing the 
 following behaviour in my current Bacula setup:
 Since the tail end of last week, I have been having issues with my MySQL 
 backups in Bacula, where they would randomly appear to 'crash', normally when 
 performing a copy of a backup to another pool - but I'm not sure yet if this 
 is the trigger.

With bacula 5.0.3 I had frequent crashes on Copy Jobs as I ran out of
memory. The SD box has only 4GB RAM; I have now added 8GB swap and it seems
to run fine.
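
One way to check whether memory pressure is involved, sketched here under the assumption of a Linux host whose kernel log is readable via dmesg, is to count OOM-killer messages. The function name is hypothetical, and the patterns are the usual kernel wording, which can vary between kernel versions.

```shell
#!/bin/sh
# Sketch: count kernel OOM-killer messages in log text read from stdin.

count_oom_events() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the function's status at success.
    grep -c -i -E 'out of memory|oom-killer' || true
}
```

For example, `dmesg | count_oom_events` alongside `free -m` gives a quick indication of whether the SD box is running short of memory.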

 NOTE: bconsole appears to crash here - no further output is produced, and 
 bconsole does not respond to any key presses. I have to Ctrl + C to exit out 
 from bconsole. Furthermore, the only way I can clear out the failed jobs from 
 the 'Running jobs queue' is to exit from bconsole, issue 'sudo service 
 bacula-sd stop' twice, then restart the SD and restart bacula-director.

Here bacula-sd crashes and goes missing from the process list.

I have another issue I have not been able to track down so far. The tape
changer seems to claim it has 0 slots now and then, and bacula-sd really
dislikes that. It seems to happen mostly when tapes are moving and some
'mtx status'-like command is issued. If this happens, I need to stop
bacula-sd; it will take some time to umount the tape (bacula-sd has 'D'
state in 'ps'), and only afterwards can it be started again and all is fine.
'update slots' without a restart won't help, even though 'mtx status' gives
correct output again. Perhaps this is comparable to your "issue 'sudo
service bacula-sd stop' twice".
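
To catch the moment the changer reports 0 slots, the slot count could be pulled out of the header line of 'mtx status' output. This sketch assumes the usual "Storage Changer <device>:<n> Drives, <m> Slots" header format; the function name and the /dev/sg3 device in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: extract the slot count from "mtx status" output read on stdin.

changer_slot_count() {
    # Print only the number captured before "Slots" on the header line.
    sed -n 's/.*Storage Changer [^:]*:[0-9][0-9]* Drives, \([0-9][0-9]*\) Slots.*/\1/p' | head -n 1
}
```

Something like `mtx -f /dev/sg3 status | changer_slot_count` could then be logged periodically and compared with the times at which bacula-sd misbehaves.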

Regards,
Adrian
-- 
LiHAS - Adrian Reyer - Hessenwiesenstraße 10 - D-70565 Stuttgart
Fon: +49 (7 11) 78 28 50 90 - Fax:  +49 (7 11) 78 28 50 91
Mail: li...@lihas.de - Web: http://lihas.de
Linux, Netzwerke, Consulting & Support - USt-ID: DE 227 816 626 Stuttgart



Re: [Bacula-users] SD crashes

2012-02-13 Thread John Drescher
2012/2/13 Joe Nyland joenyl...@me.com:
 Hello everyone,

 I hope someone would be able to offer any suggestions of why I am seeing the
 following behaviour in my current Bacula setup:

 Since the tail end of last week, I have been having issues with my MySQL
 backups in Bacula, where they would randomly appear to 'crash', normally
 when performing a copy of a backup to another pool - but I'm not sure yet if
 this is the trigger.

 Running 'status dir' after one of these 'crashes' gives the following output
 for the running jobs:

 Running Jobs:
 Console connected at 12-Feb-12 15:53
 Console connected at 13-Feb-12 06:58
  JobId Level   Name                       Status
 ==
   2107 Full    WebServer1_MySQL_Copy.2012-02-13_04.30.00_28 is running
 Crashed Job
   2108 Full    WebServer1_MySQL.2012-02-13_04.30.00_29 is running Crashed
 Job
   2111 Full    MythTVServer1_MySQL.2012-02-13_05.00.00_32 is waiting for
 higher priority jobs to finish
   2113 Full    TestServer_MySQL.2012-02-13_05.00.00_34 is waiting execution
   2114 Full    MythTVServer1_MySQL_Copy.2012-02-13_05.30.00_35 is waiting
 execution
   2115 Full    WebServer1_MySQL_Copy.2012-02-13_05.30.00_36 is waiting
 execution
   2116 Full    WebServer1_MySQL.2012-02-13_05.30.00_37 has a fatal error
   2117 Full    TestServer_MySQL_Copy.2012-02-13_05.30.00_38 is waiting
 execution
   2121 Full    MythTVServer1_MySQL_Copy.2012-02-13_06.30.00_42 is waiting
 execution
   2122 Full    WebServer1_MySQL_Copy.2012-02-13_06.30.00_43 is waiting
 execution
   2123 Full    WebServer1_MySQL.2012-02-13_06.30.00_44 has a fatal error
   2124 Full    TestServer_MySQL_Copy.2012-02-13_06.30.00_45 is waiting
 execution
   2125 Full    MythTVServer1_MySQL.2012-02-13_07.00.00_47 has a fatal error
   2126 Full    WebServer1_MySQL.2012-02-13_07.00.00_48 has a fatal error
 

 Once the above appears, I am unable to view the status of any storage
 resource on my SD:

 *status storage=FileServer1_Full
 Connecting to Storage daemon FileServer1_Full at FileServer1:9103

 FileServer1-sd Version: 5.0.1 (24 February 2010) x86_64-pc-linux-gnu ubuntu
 10.04
 Daemon started 12-Feb-12 15:53, 92 Jobs run since started.
  Heap: heap=1,671,168 smbytes=1,188,608 max_bytes=1,388,208 bufs=577
 max_bufs=994
 Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8

 Running Jobs:
 Reading: Full Copy job WebServer1_MySQL_Copy JobId=2107
 Volume=WebServer1_MySQL_1325
     pool=WebServer1_MySQL device=WebServer1_MySQL
 (/mnt/backup/Bacula/Databases/WebServer1)
     Files=4 Bytes=164,924 Bytes/sec=17
     FDSocket closed
 

 Jobs waiting to reserve a drive:
 

 Terminated Jobs:
  JobId  Level    Files      Bytes   Status   Finished        Name
 ===
   2091  Full          2    92.45 K  OK       13-Feb-12 03:30
 TestServer_MySQL_Copy
   2096  Full          5    2.258 M  OK       13-Feb-12 03:30
 MythTVServer1_MySQL_Copy
   2098  Full          4    164.9 K  OK       13-Feb-12 03:30
 WebServer1_MySQL_Copy
   2100  Full          2    92.45 K  OK       13-Feb-12 03:30
 TestServer_MySQL_Copy
   2078  Full      1,145    2.942 G  OK       13-Feb-12 03:31 SVN_Copy
   2102  Full          5    2.259 M  OK       13-Feb-12 04:01
 MythTVServer1_MySQL
   2103  Full          4    164.9 K  OK       13-Feb-12 04:01
 WebServer1_MySQL
   2104  Full          2    92.37 K  OK       13-Feb-12 04:01
 TestServer_MySQL
   2105  Full          5    2.259 M  OK       13-Feb-12 04:30
 MythTVServer1_MySQL_Copy
   2109  Full          2    92.37 K  OK       13-Feb-12 04:30
 TestServer_MySQL_Copy
 

 Device status:
 Device Default (/mnt/backup/Bacula) is not open.
 snip
 Device WebServer1_Inc (/mnt/backup/Bacula/WebServer1/Incremental) is not
 open.
 Device WebServer1_MySQL (/mnt/backup/Bacula/Databases/WebServer1) is
 mounted with:
     Volume:      WebServer1_MySQL_1325
     Pool:        WebServer1_MySQL
     Media type:  File
     Total Bytes Read=0 Blocks Read=0 Bytes/block=0
     Positioned at File=0 Block=0
 Device WebServer1_MySQL_Copy (/mnt/mac_backup/Bacula/Databases/WebServer1)
 is not open.
 Device WebServer1_Full_Copy (/mnt/mac_backup/Bacula/WebServer1/Full) is
 not open.
 Device WebServer1_Inc_Copy
 (/mnt/mac_backup/Bacula/WebServer1/Incrementals) is not open.
 snip
 Device SharedData_Diff (/mnt/backup/Bacula/Shared/Differential) is not
 open.
 

 Used Volume status:

 NOTE: bconsole appears to crash here - no further output is produced, and
 bconsole does not respond to any key presses. I have to Ctrl + C to exit out
 from bconsole. Furthermore, the only way I can clear out the failed jobs
 from the 'Running jobs queue' is to exit from bconsole, issue 'sudo service
 bacula-sd stop' twice, then restart the SD and restart bacula-director.


 What I have is for 4 of my clients I run a MySQL backup hourly at 00:00,
 01:00, etc. I then copy the MySQL backups to another storage 

Re: [Bacula-users] SD crashes

2012-02-13 Thread Joe Nyland
On 13 Feb, 2012,at 11:37 AM, Adrian Reyer bacula-li...@lihas.de wrote:
 Hi Joe,

 On Mon, Feb 13, 2012 at 07:21:03AM +, Joe Nyland wrote:
  I hope someone would be able to offer any suggestions of why I am seeing the
  following behaviour in my current Bacula setup:
  Since the tail end of last week, I have been having issues with my MySQL
  backups in Bacula, where they would randomly appear to 'crash', normally when
  performing a copy of a backup to another pool - but I'm not sure yet if this
  is the trigger.

 With bacula 5.0.3 I had frequent crashes on Copy Jobs as I ran out of
 memory. The SD box has only 4GB RAM; I have now added 8GB swap and it seems
 to run fine.

  NOTE: bconsole appears to crash here - no further output is produced, and
  bconsole does not respond to any key presses. I have to Ctrl + C to exit out
  from bconsole. Furthermore, the only way I can clear out the failed jobs from
  the 'Running jobs queue' is to exit from bconsole, issue 'sudo service
  bacula-sd stop' twice, then restart the SD and restart bacula-director.

 Here bacula-sd crashes and goes missing from the process list.

 I have another issue I have not been able to track down so far. The tape
 changer seems to claim it has 0 slots now and then, and bacula-sd really
 dislikes that. It seems to happen mostly when tapes are moving and some
 'mtx status'-like command is issued. If this happens, I need to stop
 bacula-sd; it will take some time to umount the tape (bacula-sd has 'D'
 state in 'ps'), and only afterwards can it be started again and all is fine.
 'update slots' without a restart won't help, even though 'mtx status' gives
 correct output again. Perhaps this is comparable to your "issue 'sudo
 service bacula-sd stop' twice".

 Regards,
 Adrian
 -- 
 LiHAS - Adrian Reyer - Hessenwiesenstraße 10 - D-70565 Stuttgart
 Fon: +49 (7 11) 78 28 50 90 - Fax: +49 (7 11) 78 28 50 91
 Mail: li...@lihas.de - Web: http://lihas.de
 Linux, Netzwerke, Consulting & Support - USt-ID: DE 227 816 626 Stuttgart

Hi Adrian,

Thanks for your reply.

I hadn't considered RAM as being the cause of the problem, mainly because
other backup jobs back up far more (and far larger) files to this same SD
without issue. It seems to be only when I introduced MySQL backups of
different servers (including the Bacula catalog server) into the mix that I
started to see this behaviour.

To test my current theory, I am disabling the MySQL backup and copy jobs for
FileServer1 only, so that the Bacula database is not backed up in Bacula, as
this resides on FileServer1. I'm starting to wonder whether the process of
backing up my catalog at the same time that several other backup jobs are
running (and completing, in the case of the smaller DBs) is somehow causing
this problem. However, this doesn't explain why the SD appears to be
crashing :-(.

In the meantime, I have found this bug, which was forwarded on from Debian
bugs: http://bugs.bacula.org/view.php?id=1098. However, it appears to be for
Bacula 2.2.8 :-( Another mention of the issue is here:
http://adsm.org/lists/html/Bacula-users/2009-12/msg00140.html but that's for
Bacula 3.0.3.

Any other ideas?

Thank you.

Joe


Re: [Bacula-users] SD crashes

2012-02-13 Thread Joe Nyland
On 13 Feb, 2012,at 02:11 PM, John Drescher dresche...@gmail.com wrote:
2012/2/13 Joe Nyland joenyl...@me.com:
 Hello everyone,

 I hope someone would be able to offer any suggestions of why I am seeing the
 following behaviour in my current Bacula setup:

 Since the tail end of last week, I have been having issues with my MySQL
 backups in Bacula, where they would randomly appear to 'crash', normally
 when performing a copy of a backup to another pool - but I'm not sure yet if
 this is the trigger.

 Running 'status dir' after one of these 'crashes' gives the following output
 for the running jobs:

 Running Jobs:
 Console connected at 12-Feb-12 15:53
 Console connected at 13-Feb-12 06:58
  JobId Level   Name                       Status
 ==
   2107 Full    WebServer1_MySQL_Copy.2012-02-13_04.30.00_28 is running  Crashed Job
   2108 Full    WebServer1_MySQL.2012-02-13_04.30.00_29 is running  Crashed Job
   2111 Full    MythTVServer1_MySQL.2012-02-13_05.00.00_32 is waiting for higher priority jobs to finish
   2113 Full    TestServer_MySQL.2012-02-13_05.00.00_34 is waiting execution
   2114 Full    MythTVServer1_MySQL_Copy.2012-02-13_05.30.00_35 is waiting execution
   2115 Full    WebServer1_MySQL_Copy.2012-02-13_05.30.00_36 is waiting execution
   2116 Full    WebServer1_MySQL.2012-02-13_05.30.00_37 has a fatal error
   2117 Full    TestServer_MySQL_Copy.2012-02-13_05.30.00_38 is waiting execution
   2121 Full    MythTVServer1_MySQL_Copy.2012-02-13_06.30.00_42 is waiting execution
   2122 Full    WebServer1_MySQL_Copy.2012-02-13_06.30.00_43 is waiting execution
   2123 Full    WebServer1_MySQL.2012-02-13_06.30.00_44 has a fatal error
   2124 Full    TestServer_MySQL_Copy.2012-02-13_06.30.00_45 is waiting execution
   2125 Full    MythTVServer1_MySQL.2012-02-13_07.00.00_47 has a fatal error
   2126 Full    WebServer1_MySQL.2012-02-13_07.00.00_48 has a fatal error

 Once the above appears, I am unable to view the status of any storage
 resource on my SD:

 *status storage=FileServer1_Full
 Connecting to Storage daemon FileServer1_Full at FileServer1:9103

 FileServer1-sd Version: 5.0.1 (24 February 2010) x86_64-pc-linux-gnu ubuntu 10.04
 Daemon started 12-Feb-12 15:53, 92 Jobs run since started.
  Heap: heap=1,671,168 smbytes=1,188,608 max_bytes=1,388,208 bufs=577 max_bufs=994
 Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8

 Running Jobs:
 Reading: Full Copy job WebServer1_MySQL_Copy JobId=2107 Volume="WebServer1_MySQL_1325"
     pool="WebServer1_MySQL" device="WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1)
     Files=4 Bytes=164,924 Bytes/sec=17
     FDSocket closed

 Jobs waiting to reserve a drive:

 Terminated Jobs:
  JobId  Level    Files      Bytes   Status   Finished        Name
 ===
   2091  Full          2    92.45 K  OK       13-Feb-12 03:30 TestServer_MySQL_Copy
   2096  Full          5    2.258 M  OK       13-Feb-12 03:30 MythTVServer1_MySQL_Copy
   2098  Full          4    164.9 K  OK       13-Feb-12 03:30 WebServer1_MySQL_Copy
   2100  Full          2    92.45 K  OK       13-Feb-12 03:30 TestServer_MySQL_Copy
   2078  Full      1,145    2.942 G  OK       13-Feb-12 03:31 SVN_Copy
   2102  Full          5    2.259 M  OK       13-Feb-12 04:01 MythTVServer1_MySQL
   2103  Full          4    164.9 K  OK       13-Feb-12 04:01 WebServer1_MySQL
   2104  Full          2    92.37 K  OK       13-Feb-12 04:01 TestServer_MySQL
   2105  Full          5    2.259 M  OK       13-Feb-12 04:30 MythTVServer1_MySQL_Copy
   2109  Full          2    92.37 K  OK       13-Feb-12 04:30 TestServer_MySQL_Copy

 Device status:
 Device "Default" (/mnt/backup/Bacula) is not open.
 snip
 Device "WebServer1_Inc" (/mnt/backup/Bacula/WebServer1/Incremental) is not open.
 Device "WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1) is mounted with:
     Volume:      WebServer1_MySQL_1325
     Pool:        WebServer1_MySQL
     Media type:  File
     Total Bytes Read=0 Blocks Read=0 Bytes/block=0
     Positioned at File=0 Block=0
 Device "WebServer1_MySQL_Copy" (/mnt/mac_backup/Bacula/Databases/WebServer1) is not open.
 Device "WebServer1_Full_Copy" (/mnt/mac_backup/Bacula/WebServer1/Full) is not open.
 Device "WebServer1_Inc_Copy" (/mnt/mac_backup/Bacula/WebServer1/Incrementals) is not open.
 snip
 Device "SharedData_Diff" (/mnt/backup/Bacula/Shared/Differential) is not open.

 Used Volume status:

 NOTE: bconsole appears to crash here - no further output is produced, and
 bconsole does not respond to any key presses. I have to Ctrl + C to exit out
 from bconsole. Furthermore, the only way I can clear out the failed jobs
 from the 'Running jobs queue' is to exit from bconsole, issue 'sudo service
 bacula-sd stop' twice, then restart the SD and restart bacula-director.

 What I have is for 4 of my clients I run a MySQL backup hourly at 00:00,
 01:00, etc. I then copy the MySQL backups to another storage resource on my
 SD at 00:30, 01:30, etc. The MySQL databases which I am backing up are
 relatively small, the biggest of which is my Bacula 

[Bacula-users] SD crashes

2012-02-12 Thread Joe Nyland
Hello everyone,

I hope someone would be able to offer any suggestions of why I am seeing the 
following behaviour in my current Bacula setup:

Since the tail end of last week, I have been having issues with my MySQL 
backups in Bacula, where they would randomly appear to 'crash', normally when 
performing a copy of a backup to another pool - but I'm not sure yet if this is 
the trigger.

Running 'status dir' after one of these 'crashes' gives the following output 
for the running jobs:

Running Jobs:
Console connected at 12-Feb-12 15:53
Console connected at 13-Feb-12 06:58
 JobId Level   Name                       Status
==
  2107 Full    WebServer1_MySQL_Copy.2012-02-13_04.30.00_28 is running  Crashed Job
  2108 Full    WebServer1_MySQL.2012-02-13_04.30.00_29 is running  Crashed Job
  2111 Full    MythTVServer1_MySQL.2012-02-13_05.00.00_32 is waiting for higher priority jobs to finish
  2113 Full    TestServer_MySQL.2012-02-13_05.00.00_34 is waiting execution
  2114 Full    MythTVServer1_MySQL_Copy.2012-02-13_05.30.00_35 is waiting execution
  2115 Full    WebServer1_MySQL_Copy.2012-02-13_05.30.00_36 is waiting execution
  2116 Full    WebServer1_MySQL.2012-02-13_05.30.00_37 has a fatal error
  2117 Full    TestServer_MySQL_Copy.2012-02-13_05.30.00_38 is waiting execution
  2121 Full    MythTVServer1_MySQL_Copy.2012-02-13_06.30.00_42 is waiting execution
  2122 Full    WebServer1_MySQL_Copy.2012-02-13_06.30.00_43 is waiting execution
  2123 Full    WebServer1_MySQL.2012-02-13_06.30.00_44 has a fatal error
  2124 Full    TestServer_MySQL_Copy.2012-02-13_06.30.00_45 is waiting execution
  2125 Full    MythTVServer1_MySQL.2012-02-13_07.00.00_47 has a fatal error
  2126 Full    WebServer1_MySQL.2012-02-13_07.00.00_48 has a fatal error


Once the above appears, I am unable to view the status of any storage resource 
on my SD:

*status storage=FileServer1_Full
Connecting to Storage daemon FileServer1_Full at FileServer1:9103

FileServer1-sd Version: 5.0.1 (24 February 2010) x86_64-pc-linux-gnu ubuntu 
10.04
Daemon started 12-Feb-12 15:53, 92 Jobs run since started.
 Heap: heap=1,671,168 smbytes=1,188,608 max_bytes=1,388,208 bufs=577 
max_bufs=994
Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8

Running Jobs:
Reading: Full Copy job WebServer1_MySQL_Copy JobId=2107 
Volume=WebServer1_MySQL_1325
pool=WebServer1_MySQL device=WebServer1_MySQL 
(/mnt/backup/Bacula/Databases/WebServer1)
Files=4 Bytes=164,924 Bytes/sec=17
FDSocket closed


Jobs waiting to reserve a drive:


Terminated Jobs:
 JobId  Level    Files      Bytes   Status   Finished        Name
===
  2091  Full          2    92.45 K  OK       13-Feb-12 03:30 TestServer_MySQL_Copy
  2096  Full          5    2.258 M  OK       13-Feb-12 03:30 MythTVServer1_MySQL_Copy
  2098  Full          4    164.9 K  OK       13-Feb-12 03:30 WebServer1_MySQL_Copy
  2100  Full          2    92.45 K  OK       13-Feb-12 03:30 TestServer_MySQL_Copy
  2078  Full      1,145    2.942 G  OK       13-Feb-12 03:31 SVN_Copy
  2102  Full          5    2.259 M  OK       13-Feb-12 04:01 MythTVServer1_MySQL
  2103  Full          4    164.9 K  OK       13-Feb-12 04:01 WebServer1_MySQL
  2104  Full          2    92.37 K  OK       13-Feb-12 04:01 TestServer_MySQL
  2105  Full          5    2.259 M  OK       13-Feb-12 04:30 MythTVServer1_MySQL_Copy
  2109  Full          2    92.37 K  OK       13-Feb-12 04:30 TestServer_MySQL_Copy


Device status:
Device Default (/mnt/backup/Bacula) is not open.
snip
Device WebServer1_Inc (/mnt/backup/Bacula/WebServer1/Incremental) is not open.
Device WebServer1_MySQL (/mnt/backup/Bacula/Databases/WebServer1) is mounted with:
    Volume:      WebServer1_MySQL_1325
    Pool:        WebServer1_MySQL
    Media type:  File
    Total Bytes Read=0 Blocks Read=0 Bytes/block=0
    Positioned at File=0 Block=0
Device WebServer1_MySQL_Copy (/mnt/mac_backup/Bacula/Databases/WebServer1) is 
not open.
Device WebServer1_Full_Copy (/mnt/mac_backup/Bacula/WebServer1/Full) is not 
open.
Device WebServer1_Inc_Copy (/mnt/mac_backup/Bacula/WebServer1/Incrementals) 
is not open.
snip
Device SharedData_Diff (/mnt/backup/Bacula/Shared/Differential) is not open.


Used Volume status:

NOTE: bconsole appears to crash here - no further output is produced, and 
bconsole does not respond to any key presses. I have to Ctrl + C to exit out 
from bconsole. Furthermore, the only way I can clear out the failed jobs from 
the 'Running jobs queue' is to exit from bconsole, issue 'sudo service 
bacula-sd stop' twice, then restart the SD and restart bacula-director.

What I have is for 4 of my clients I run a MySQL backup hourly at 00:00, 01:00, 
etc. I then copy the MySQL backups to another storage