Re: [Bacula-users] SD crashes when working with S3 (Ceph)
Is this still loading the driver from /usr/lib64/bacula-sd-cloud-driver-9.6.3.so? It is a little strange that you have bacula in /opt/bacula/bin/bacula-sd but the plugins are in /usr/lib64. Please also post the output from: objdump -t /usr/lib64/bacula-sd-cloud-driver-9.6.3.so | grep _driver Do you also have /opt/bacula/plugins/bacula-sd-cloud-driver-9.6.3.so? __Martin > On Thu, 14 May 2020 08:49:14 +0200, Phillip Dale said: > > I could not get much information out of that traceback. Hopefully this helps, > so here is the traceback file I got: > > [New LWP 19474] > [New LWP 19470] > [New LWP 19315] > [Thread debugging using libthread_db enabled] > Using host libthread_db library "/lib64/libthread_db.so.1". > 0x7feb40a409a3 in select () from /usr/lib64/libc.so.6 > $1 = "12-May-2020 22:44:03\000\000\000\000\000\000\000\000\000" > $2 = '\000' > $3 = 0x231aeb8 "bacula-sd" > $4 = 0x231aef8 "/opt/bacula/bin/bacula-sd" > $5 = 0x0 > $6 = '\000' > $7 = 0x7feb420b443f "9.6.3 (09 March 2020)" > $8 = 0x7feb420b4463 "x86_64-pc-linux-gnu" > $9 = 0x7feb420b4477 "redhat" > $10 = 0x7feb420b447e "(Core)" > $11 = "backup.novalocal", '\000' > $12 = 0x7feb420b4455 "redhat (Core)" > Environment variable "TestName" not defined. 
> #0 0x7feb40a409a3 in select () from /usr/lib64/libc.so.6 > #1 0x7feb4204e6cc in bnet_thread_server (addrs=0x231f6f8, > max_clients=41, client_wq=0x630d80 _workq>, handle_client_request=0x40ebd8 ) > at bnet_server.c:166 > #2 0x0040a347 in main (argc=0, argv=0x7fffaaa99890) at stored.c:327 > > Thread 4 (Thread 0x7feb387e3700 (LWP 19315)): > #0 0x7feb41e1cde2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /usr/lib64/libpthread.so.0 > #1 0x7feb420a1ae1 in watchdog_thread (arg=0x0) at watchdog.c:299 > #2 0x7feb41e18ea5 in start_thread () from /usr/lib64/libpthread.so.0 > #3 0x7feb40a498dd in clone () from /usr/lib64/libc.so.6 > > Thread 3 (Thread 0x7feb38fe4700 (LWP 19470)): > #0 0x7feb41e201d9 in waitpid () from /usr/lib64/libpthread.so.0 > #1 0x7feb42093f88 in signal_handler (sig=11) at signal.c:233 > #2 > #3 0x7feb41e1ad00 in pthread_mutex_lock () from > /usr/lib64/libpthread.so.0 > #4 0x7feb420abf31 in lmgr_p (m=0x10) at lockmgr.c:106 > #5 0x7feb420ae7b8 in lock_guard::lock_guard (this=0x7feb38fe3510, > mutex=...) 
at ../lib/lockmgr.h:2 > 89 > #6 0x7feb37dd8296 in cloud_proxy::volume_lookup (this=0x0, > volume=0x7feb3000acc8 "Vol-0003") at cl > oud_parts.c:229 > #7 0x7feb37dd1658 in cloud_dev::probe_cloud_proxy (this=0x7feb3000a878, > dcr=0x7feb300146d8, VolNam > e=0x7feb3000acc8 "Vol-0003", force=false) at cloud_dev.c:1217 > #8 0x7feb37dd0593 in cloud_dev::open_device (this=0x7feb3000a878, > dcr=0x7feb300146d8, omode=2) at > cloud_dev.c:1025 > #9 0x7feb4251ce8a in DCR::mount_next_write_volume (this=0x7feb300146d8) > at mount.c:191 > #10 0x7feb424f6d32 in acquire_device_for_append (dcr=0x7feb300146d8) at > acquire.c:420 > #11 0x0040c325 in do_append_data (jcr=0x7feb38e8) at append.c:102 > #12 0x00416ed7 in append_data_cmd (jcr=0x7feb38e8) at > fd_cmds.c:263 > #13 0x00416b68 in do_client_commands (jcr=0x7feb38e8) at > fd_cmds.c:218 > #14 0x0041688a in run_job (jcr=0x7feb38e8) at fd_cmds.c:167 > #15 0x00418c58 in run_cmd (jcr=0x7feb38e8) at job.c:240 > #16 0x0040f196 in handle_connection_request (arg=0x2340ef8) at > dircmd.c:242 > #17 0x7feb420a2b54 in workq_server (arg=0x630d80 ) at > workq.c:372 > #18 0x7feb41e18ea5 in start_thread () from /usr/lib64/libpthread.so.0 > #19 0x7feb40a498dd in clone () from /usr/lib64/libc.so.6 > > Thread 2 (Thread 0x7feb368e2700 (LWP 19474)): > #0 0x7feb41e1cde2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from > /usr/lib64/libpthread.so.0 > #1 0x7feb420a294b in workq_server (arg=0x630d80 ) at > workq.c:349 > #2 0x7feb41e18ea5 in start_thread () from /usr/lib64/libpthread.so.0 > #3 0x7feb40a498dd in clone () from /usr/lib64/libc.so.6 > > Thread 1 (Thread 0x7feb42b90880 (LWP 19313)): > #0 0x7feb40a409a3 in select () from /usr/lib64/libc.so.6 > #1 0x7feb4204e6cc in bnet_thread_server (addrs=0x231f6f8, > max_clients=41, client_wq=0x630d80 , > handle_client_request=0x40ebd8 ) at > bnet_server.c:166 > #2 0x0040a347 in main (argc=0, argv=0x7fffaaa99890) at stored.c:327 > #0 0x7feb40a409a3 in select () from /usr/lib64/libc.so.6 > No symbol table info 
available. > #1 0x7feb4204e6cc in bnet_thread_server (addrs=0x231f6f8, > max_clients=41, client_wq=0x630d80 , > handle_client_request=0x40ebd8 ) at > bnet_server.c:166 > 166 if ((stat = select(maxfd + 1, , NULL, NULL, NULL)) < 0) > { > maxfd = 5 > sockset = {fds_bits = {32, 0 }} > clilen = 16 > turnon = 1 > buf = "188.95.226.225", '\000' > allbuf = "0.0.0.0:9103 \000\000\000\060\215\251\252\377\177\000\000 >
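Martin's two questions (which copies of the cloud driver exist, and what symbols the loaded one exports) can be checked in one pass. The following is a minimal sketch, assuming a POSIX shell and binutils `objdump`; the function names `report_drivers` and `driver_symbols` are made up for illustration and are not part of Bacula. The paths are the ones mentioned in this thread.

```shell
#!/bin/sh
# Report which of the given driver paths exist, so that a stray copy in
# /usr/lib64 versus /opt/bacula/plugins is easy to spot.
# report_drivers is a hypothetical helper name, not a Bacula tool.
report_drivers() {
    for path in "$@"; do
        if [ -e "$path" ]; then
            echo "found: $path"
        else
            echo "missing: $path"
        fi
    done
}

# Print the driver-related symbols of one shared object. This is the same
# objdump | grep pipeline that was requested in the message above.
driver_symbols() {
    objdump -t "$1" | grep _driver
}
```

For example: `report_drivers /usr/lib64/bacula-sd-cloud-driver-9.6.3.so /opt/bacula/plugins/bacula-sd-cloud-driver-9.6.3.so`, followed by `driver_symbols` on whichever file is reported as found.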
Re: [Bacula-users] SD crashes when working with S3 (Ceph)
Hello,

On Thu, 14 May 2020 at 08:50, Phillip Dale wrote:

> I could not get much information out of that traceback. Hopefully this
> helps, so here is the traceback file I got:

It is an almost perfect traceback. :)

> Thread 3 (Thread 0x7feb38fe4700 (LWP 19470)):
> #0 0x7feb41e201d9 in waitpid () from /usr/lib64/libpthread.so.0
> #1 0x7feb42093f88 in signal_handler (sig=11) at signal.c:233
> #2
> #3 0x7feb41e1ad00 in pthread_mutex_lock () from /usr/lib64/libpthread.so.0
> #4 0x7feb420abf31 in lmgr_p (m=0x10) at lockmgr.c:106
> #5 0x7feb420ae7b8 in lock_guard::lock_guard (this=0x7feb38fe3510, mutex=...) at ../lib/lockmgr.h:289
> #6 0x7feb37dd8296 in cloud_proxy::volume_lookup (this=0x0, volume=0x7feb3000acc8 "Vol-0003") at cloud_parts.c:229

The problem is here ^^^: the variable "this" should not be NULL (0x0), and that is what causes your SIGSEGV. The author of the plugin should check why it gets NULL and correct the code to avoid such problems.

Best regards
--
Radosław Korzeniewski
rados...@korzeniewski.net

___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
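The diagnosis above rests on spotting a frame whose "this" pointer is 0x0. In a long traceback file that is easy to miss, so here is a small sketch for finding such frames mechanically. It assumes a POSIX shell and a traceback saved with one frame per line; `null_this_frames` is a hypothetical helper name. The `m=0x10` argument in frame #4 is consistent with this reading (a mutex member at a small offset from a NULL object), but that is an inference from the trace, not something confirmed by the plugin source.

```shell
#!/bin/sh
# Print, with line numbers, every backtrace frame whose C++ "this"
# pointer is NULL. The argument is the path to a saved traceback file.
# null_this_frames is a hypothetical helper name, not a Bacula tool.
null_this_frames() {
    # Match "this=0x0" only when followed by "," or ")" so that
    # addresses such as this=0x7feb3000a878 are not reported.
    grep -n 'this=0x0[,)]' "$1"
}
```

Run against the file named in the LockDump line, for example `null_this_frames /opt/bacula/working/bacula.19313.traceback`.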
Re: [Bacula-users] SD crashes when working with S3 (Ceph)
I could not get much information out of that traceback. Hopefully this helps, so here is the traceback file I got: [New LWP 19474] [New LWP 19470] [New LWP 19315] [Thread debugging using libthread_db enabled] Using host libthread_db library "/lib64/libthread_db.so.1". 0x7feb40a409a3 in select () from /usr/lib64/libc.so.6 $1 = "12-May-2020 22:44:03\000\000\000\000\000\000\000\000\000" $2 = '\000' $3 = 0x231aeb8 "bacula-sd" $4 = 0x231aef8 "/opt/bacula/bin/bacula-sd" $5 = 0x0 $6 = '\000' $7 = 0x7feb420b443f "9.6.3 (09 March 2020)" $8 = 0x7feb420b4463 "x86_64-pc-linux-gnu" $9 = 0x7feb420b4477 "redhat" $10 = 0x7feb420b447e "(Core)" $11 = "backup.novalocal", '\000' $12 = 0x7feb420b4455 "redhat (Core)" Environment variable "TestName" not defined. #0 0x7feb40a409a3 in select () from /usr/lib64/libc.so.6 #1 0x7feb4204e6cc in bnet_thread_server (addrs=0x231f6f8, max_clients=41, client_wq=0x630d80 , handle_client_request=0x40ebd8 ) at bnet_server.c:166 #2 0x0040a347 in main (argc=0, argv=0x7fffaaa99890) at stored.c:327 Thread 4 (Thread 0x7feb387e3700 (LWP 19315)): #0 0x7feb41e1cde2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 #1 0x7feb420a1ae1 in watchdog_thread (arg=0x0) at watchdog.c:299 #2 0x7feb41e18ea5 in start_thread () from /usr/lib64/libpthread.so.0 #3 0x7feb40a498dd in clone () from /usr/lib64/libc.so.6 Thread 3 (Thread 0x7feb38fe4700 (LWP 19470)): #0 0x7feb41e201d9 in waitpid () from /usr/lib64/libpthread.so.0 #1 0x7feb42093f88 in signal_handler (sig=11) at signal.c:233 #2 #3 0x7feb41e1ad00 in pthread_mutex_lock () from /usr/lib64/libpthread.so.0 #4 0x7feb420abf31 in lmgr_p (m=0x10) at lockmgr.c:106 #5 0x7feb420ae7b8 in lock_guard::lock_guard (this=0x7feb38fe3510, mutex=...) 
at ../lib/lockmgr.h:2 89 #6 0x7feb37dd8296 in cloud_proxy::volume_lookup (this=0x0, volume=0x7feb3000acc8 "Vol-0003") at cl oud_parts.c:229 #7 0x7feb37dd1658 in cloud_dev::probe_cloud_proxy (this=0x7feb3000a878, dcr=0x7feb300146d8, VolNam e=0x7feb3000acc8 "Vol-0003", force=false) at cloud_dev.c:1217 #8 0x7feb37dd0593 in cloud_dev::open_device (this=0x7feb3000a878, dcr=0x7feb300146d8, omode=2) at cloud_dev.c:1025 #9 0x7feb4251ce8a in DCR::mount_next_write_volume (this=0x7feb300146d8) at mount.c:191 #10 0x7feb424f6d32 in acquire_device_for_append (dcr=0x7feb300146d8) at acquire.c:420 #11 0x0040c325 in do_append_data (jcr=0x7feb38e8) at append.c:102 #12 0x00416ed7 in append_data_cmd (jcr=0x7feb38e8) at fd_cmds.c:263 #13 0x00416b68 in do_client_commands (jcr=0x7feb38e8) at fd_cmds.c:218 #14 0x0041688a in run_job (jcr=0x7feb38e8) at fd_cmds.c:167 #15 0x00418c58 in run_cmd (jcr=0x7feb38e8) at job.c:240 #16 0x0040f196 in handle_connection_request (arg=0x2340ef8) at dircmd.c:242 #17 0x7feb420a2b54 in workq_server (arg=0x630d80 ) at workq.c:372 #18 0x7feb41e18ea5 in start_thread () from /usr/lib64/libpthread.so.0 #19 0x7feb40a498dd in clone () from /usr/lib64/libc.so.6 Thread 2 (Thread 0x7feb368e2700 (LWP 19474)): #0 0x7feb41e1cde2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /usr/lib64/libpthread.so.0 #1 0x7feb420a294b in workq_server (arg=0x630d80 ) at workq.c:349 #2 0x7feb41e18ea5 in start_thread () from /usr/lib64/libpthread.so.0 #3 0x7feb40a498dd in clone () from /usr/lib64/libc.so.6 Thread 1 (Thread 0x7feb42b90880 (LWP 19313)): #0 0x7feb40a409a3 in select () from /usr/lib64/libc.so.6 #1 0x7feb4204e6cc in bnet_thread_server (addrs=0x231f6f8, max_clients=41, client_wq=0x630d80 , handle_client_request=0x40ebd8 ) at bnet_server.c:166 #2 0x0040a347 in main (argc=0, argv=0x7fffaaa99890) at stored.c:327 #0 0x7feb40a409a3 in select () from /usr/lib64/libc.so.6 No symbol table info available. 
#1 0x7feb4204e6cc in bnet_thread_server (addrs=0x231f6f8, max_clients=41, client_wq=0x630d80 , handle_client_request=0x40ebd8 ) at bnet_server.c:166 166 if ((stat = select(maxfd + 1, , NULL, NULL, NULL)) < 0) { maxfd = 5 sockset = {fds_bits = {32, 0 }} clilen = 16 turnon = 1 buf = "188.95.226.225", '\000' allbuf = "0.0.0.0:9103 \000\000\000\060\215\251\252\377\177\000\000 \215\251\252\377\177\000\000!\000\000\000\000\000\000\000$\274UA\353\177\000\000\000\000\000\000\000\000\000\000hG\271B\353\177\000\000\000\200\271B\353\177\000\000U\301UA\353\177\000\000\320\277\225@\353\177\000\000\360\027TA\353\177\000\000\000\000\000\000\001\000\000\000t\004\000\000\001\000\000\000\334\367\063\002\000\000\000\000\350\215\251\252\377\177\000\000\300\215\251\252\377\177\000\000\001\000\000\000\000\000\000\000hG\271B\353\177\000\000\370\254\271B\353\177\000\000\230\251\271B\353\177\000\000\217`\230B\353\177\000\000\000\000\000\000\000\000\000\000hG\271B\353\177\000\000\001\000\000\000\377\177\000\000\000\000\000\000\000\000\000\000"... stat = 0 tlog = 0 fd_ptr = 0x0 sockfds = { = {},
Re: [Bacula-users] SD crashes when working with S3 (Ceph)
> On Wed, 13 May 2020 15:39:56 +0200, Phillip Dale said:
>
> Hi all,
>
> I just joined this list, so not sure if this should go here or in the
> development list. I have the same issue that Rick Tuk has from his post on
> May 07.
>
> I am running on CentOS 7 and everything works fine until I try to use Ceph S3
> or Amazon S3 storage. At this time, the bacula-sd crashes. My setup is very
> similar to the one in his post.
> Not sure about where to go from here. Hoping for some help.
>
> Here is the traceback from running bacula-sd with -q20:
>
> backup.novalocal-sd: init_dev.c:437-0 Open SD driver at /usr/lib64/bacula-sd-cloud-driver-9.6.3.so
> backup.novalocal-sd: init_dev.c:442-0 Lookup "BaculaSDdriver" in driver=cloud
> backup.novalocal-sd: init_dev.c:444-0 Driver=cloud entry point=7feb37dcc907
> backup.novalocal-sd: stored.c:615-0 SD init done CephStorage (0x7feb30008818)
> backup.novalocal-sd: init_dev.c:469-0 SD driver=cloud is already loaded.
> backup.novalocal-sd: stored.c:615-0 SD init done S3CloudStorage (0x7feb3000a878)
> backup.novalocal-sd: stored.c:615-0 SD init done TmpFileStorage (0x7feb3000c928)
> backup.novalocal-sd: bnet_server.c:86-0 Addresses 0.0.0.0:9103
> List plugins. Hook count=0
> Bacula interrupted by signal 11: Segmentation violation
> Kaboom! bacula-sd, backup.novalocal-sd got signal 11 - Segmentation violation at 12-May-2020 22:44:03. Attempting traceback.
> Kaboom! exepath=/opt/bacula/bin/
> Calling: /opt/bacula/bin/btraceback /opt/bacula/bin/bacula-sd 19313 /opt/bacula/working
> It looks like the traceback worked...
> LockDump: /opt/bacula/working/bacula.19313.traceback

Did it send you an email with the traceback? That might contain more information. If you can't find the email, then look in /opt/bacula/working/bacula.19313.traceback.

__Martin
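If the email cannot be found, the newest traceback file in the working directory is usually the one to read. A minimal sketch, assuming a POSIX shell; `latest_traceback` is a hypothetical helper name, and the `bacula.<pid>.traceback` naming pattern is inferred from the single path in the LockDump line above rather than taken from Bacula documentation.

```shell
#!/bin/sh
# Print the most recently modified traceback file in a working directory,
# or nothing if there is none.
# latest_traceback is a hypothetical helper name; the file name pattern
# is an assumption based on the one path shown in this thread.
latest_traceback() {
    ls -t "$1"/bacula.*.traceback 2>/dev/null | head -n 1
}
```

For the setup in this thread that would be `latest_traceback /opt/bacula/working`.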
[Bacula-users] SD crashes when working with S3 (Ceph)
Hi all, I just joined this list, so not sure if this should go here or in the development list. I have the same issue that Rick Tuk has from his post on May 07. I am running on CentOS 7 and everything works fine until I try to use Ceph S3 or Amazon S3 storage. At this time, the bacula-sd crashes. My setup is very similar to the one in his post. Not sure about where to go from here. Hoping for some help. Here is the traceback from running bacula-sd with -q20: backup.novalocal-sd: init_dev.c:437-0 Open SD driver at /usr/lib64/bacula-sd-cloud-driver-9.6.3.so backup.novalocal-sd: init_dev.c:442-0 Lookup "BaculaSDdriver" in driver=cloud backup.novalocal-sd: init_dev.c:444-0 Driver=cloud entry point=7feb37dcc907 backup.novalocal-sd: stored.c:615-0 SD init done CephStorage (0x7feb30008818) backup.novalocal-sd: init_dev.c:469-0 SD driver=cloud is already loaded. backup.novalocal-sd: stored.c:615-0 SD init done S3CloudStorage (0x7feb3000a878) backup.novalocal-sd: stored.c:615-0 SD init done TmpFileStorage (0x7feb3000c928) backup.novalocal-sd: bnet_server.c:86-0 Addresses 0.0.0.0:9103 List plugins. Hook count=0 Bacula interrupted by signal 11: Segmentation violation Kaboom! bacula-sd, backup.novalocal-sd got signal 11 - Segmentation violation at 12-May-2020 22:44:03. Attempting traceback. Kaboom! exepath=/opt/bacula/bin/ Calling: /opt/bacula/bin/btraceback /opt/bacula/bin/bacula-sd 19313 /opt/bacula/working It looks like the traceback worked... 
LockDump: /opt/bacula/working/bacula.19313.traceback backup.novalocal-sd: lockmgr.c:1221-8 lockmgr disabled backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 536 bytes at 2321f08 from bsockcore.c:157 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 280 bytes at 233fb68 from jcr.c:388 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 280 bytes at 233ff18 from jcr.c:390 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 536 bytes at 2340c28 from jcr.c:386 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 280 bytes at 7feb3f68 from jcr.c:384 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 280 bytes at 7feb30008068 from lib/mem_pool.h:85 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 4120 bytes at 2341c78 from bsockcore.c:156 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 4120 bytes at 2342cc8 from bsock.c:101 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 536 bytes at 7feb3000e2a8 from record_util.c:251 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 536 bytes at 7feb3000e5e8 from bsockcore.c:157 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 448 bytes at 2340ef8 from bsock.c:852 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 7 bytes at 2341138 from bsock.c:854 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 15 bytes at 23410f8 from bsock.c:855 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 16 bytes at 2341178 from workq.c:198 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 24 bytes at 7feb3f18 from jcr.c:372 backup.novalocal-sd: 
smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 32 bytes at 7feb30009d58 from dircmd.c:194 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 24 bytes at 7feb30009db8 from askdir.c:575 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 154 bytes at 7feb3000f198 from job.c:126 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 154 bytes at 7feb3000f268 from job.c:129 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 154 bytes at 7feb3000f338 from job.c:132 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 154 bytes at 7feb3000f408 from job.c:141 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 40 bytes at 7feb300141d8 from job.c:161 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 32 bytes at 7feb30014238 from reserve.c:283 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 80 bytes at 7feb30014518 from alist.c:55 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 32 bytes at 7feb300145a8 from reserve.c:306 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 15 bytes at 7feb30014608 from reserve.c:321 backup.novalocal-sd: smartall.c:411-2863311530 Orphaned buffer: backup.novalocal-sd 80 bytes at 7feb30014648 from alist.c:55
Re: [Bacula-users] SD crashes
On 16 Feb, 2012,at 05:25 PM, Martin Simmons mar...@lispworks.com wrote: On Wed, 15 Feb 2012 17:46:33 +, Joe Nyland said:On 15 Feb 2012, at 16:52, Martin Simmons wrote: On Wed, 15 Feb 2012 13:31:10 + (GMT), Joe Nyland said: On 14 Feb, 2012,at 03:47 PM, Martin Simmons mar...@lispworks.com wrote: On Tue, 14 Feb 2012 10:34:31 -0500, John Drescher said: I've been running the SD using the following command (I know the combination of options I have used may be excessive, but I wanted as much chance of catching the error as I could!) since yesterday afternoon: sudo bacula-sd -c /etc/bacula/bacula-sd.conf -d 100 -dt -f -u bacula -g tape -m -v | tee -a /mnt/array/bacula-sd.screen.logHowever, (as luck would have it) I've not seen the behaviour I originally reported whilst running with debug options.Is there any way in which running the SD with the combination of options I have used above, could cause any different behaviour of the SD? Or interfere in any way with it? I'm asking, becuase I have re-enabled all of the backups jobs I have on the server, and I have still not seen it crash again.Could be a timing issue that the delay in writing the log causes the bad behavior to not happen.. Those types of problems are hard to debug. Running it under gdb without the debug options is better approach in that case. http://www.bacula.org/5.2.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION0064 When it 'crashes' (though it sounds more like 'hangs' is a better word), interrupt gdb with Ctrl-c to get back to the gdb shell window (as in step 8). __Martin -- Keep Your Developer Skills Current with LearnDevNow! The most comprehensive online learning library for Microsoft developers is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3, Metro Style Apps, more. Free future releases when you subscribe now! 
http://p.sf.net/sfu/learndevnow-d2d ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-usersMartin and John, thank you for your replies. I since yesterday afternoon, bacula-sd has been running under gdb using the instructions in the manual for my Bacula version, however I've still not seen the issue originally reported. I agree with you both that by running it under a debugging process, it seems a delay is introduced which is suppressing the error in some way. Is my best bet just to leave bacula-sd running under gdb and hope that my full backups over the weekend may highlight the issue? Or is there another way I could debug this? Assuming the error causes the SD to hang (rather than exit), then you could run it without gdb and then attach gdb to it when it hangs (use gdb -p $pid). __Martin -- Virtualization Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computingalso focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-usersOk, that sounds reasonable.One question, does bacula-sd need to be running with "-s no signals (for debugging)" or will gdb be able to provide enough info without this option? (My default options on Ubuntu 10.04 are: -c config file -u user -g group)Would using the -s option introduce the sort of delay we mentioned earlier, and thus limit my changes of reproducing the issue? I don't think -s will have any effect on delays, but you shouldn't need it if you attach gdb to the hanging process. __Martin -- Virtualization Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. 
http://www.accelacomm.com/jaw/sfnl/114/51521223/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-usersMartin,Thanks for confirming. I though as much, but didn't want to waste the opportunity of catching the error, so though it best to check.As luck would have it, however, I've not seen the SD hang since re-running it without any of the additional debugging options and not running it under gdb (so back to how it has been running normally, leading up to this issue). It's annoying that it seemed to be every couple of hours it would hang, then I'd have to restart everything. Now it seems I can't even get it
Re: [Bacula-users] SD crashes
On Wed, 15 Feb 2012 17:46:33 +, Joe Nyland said:
On 15 Feb 2012, at 16:52, Martin Simmons wrote:
On Wed, 15 Feb 2012 13:31:10 + (GMT), Joe Nyland said:
On 14 Feb, 2012, at 03:47 PM, Martin Simmons mar...@lispworks.com wrote:
On Tue, 14 Feb 2012 10:34:31 -0500, John Drescher said:

I've been running the SD using the following command (I know the combination of options I have used may be excessive, but I wanted as much chance of catching the error as I could!) since yesterday afternoon:

sudo bacula-sd -c /etc/bacula/bacula-sd.conf -d 100 -dt -f -u bacula -g tape -m -v | tee -a /mnt/array/bacula-sd.screen.log

However, (as luck would have it) I've not seen the behaviour I originally reported whilst running with debug options. Is there any way in which running the SD with the combination of options I have used above could cause any different behaviour of the SD? Or interfere in any way with it? I'm asking because I have re-enabled all of the backup jobs I have on the server, and I have still not seen it crash again.

Could be a timing issue: the delay in writing the log may keep the bad behavior from happening. Those types of problems are hard to debug.

Running it under gdb without the debug options is a better approach in that case.

http://www.bacula.org/5.2.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION0064

When it 'crashes' (though it sounds more like 'hangs' is a better word), interrupt gdb with Ctrl-c to get back to the gdb shell window (as in step 8).

__Martin

Martin and John, thank you for your replies.

Since yesterday afternoon, bacula-sd has been running under gdb using the instructions in the manual for my Bacula version; however, I've still not seen the issue originally reported. I agree with you both that by running it under a debugging process, it seems a delay is introduced which is suppressing the error in some way. Is my best bet just to leave bacula-sd running under gdb and hope that my full backups over the weekend may highlight the issue? Or is there another way I could debug this?

Assuming the error causes the SD to hang (rather than exit), then you could run it without gdb and then attach gdb to it when it hangs (use gdb -p $pid).

__Martin

Ok, that sounds reasonable. One question: does bacula-sd need to be running with "-s no signals (for debugging)", or will gdb be able to provide enough info without this option? (My default options on Ubuntu 10.04 are: -c config file -u user -g group) Would using the -s option introduce the sort of delay we mentioned earlier, and thus limit my chances of reproducing the issue?

I don't think -s will have any effect on delays, but you shouldn't need it if you attach gdb to the hanging process.

__Martin
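The "attach when it hangs" advice can be scripted so that nothing slows the daemon down until the moment it is needed. This is a sketch only, assuming a POSIX shell, gdb, and procps `pgrep`; the function names are hypothetical. It uses gdb's `-p`, `-batch` and `-ex` options to attach, print a backtrace of every thread, and detach again.

```shell
#!/bin/sh
# Build (but do not run) the gdb command line that attaches to a process,
# dumps a backtrace of every thread, and then exits.
# gdb_attach_cmd is a hypothetical helper name.
gdb_attach_cmd() {
    echo "gdb -p $1 -batch -ex 'thread apply all bt'"
}

# Attach to the running bacula-sd, if there is one, and dump all threads.
# pgrep -o selects the oldest matching process; -x requires an exact name.
# dump_hung_sd is a hypothetical helper name.
dump_hung_sd() {
    pid=$(pgrep -o -x bacula-sd) || {
        echo "bacula-sd is not running" >&2
        return 1
    }
    gdb -p "$pid" -batch -ex 'thread apply all bt'
}
```

Calling `dump_hung_sd` as root (or as the user that owns the daemon) once the SD stops responding should print the stack of each thread; redirect the output to a file to post it to the list.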
Re: [Bacula-users] SD crashes
On 14 Feb, 2012,at 03:47 PM, Martin Simmons mar...@lispworks.com wrote: On Tue, 14 Feb 2012 10:34:31 -0500, John Drescher said: I've been running the SD using the following command (I know the combination of options I have used may be excessive, but I wanted as much chance of catching the error as I could!) since yesterday afternoon:sudo bacula-sd -c /etc/bacula/bacula-sd.conf -d 100 -dt -f -u bacula -g tape -m -v | tee -a /mnt/array/bacula-sd.screen.log However, (as luck would have it) I've not seen the behaviour I originally reported whilst running with debug options. Is there any way in which running the SD with the combination of options I have used above, could cause any different behaviour of the SD? Or interfere in any way with it? I'm asking, becuase I have re-enabled all of the backups jobs I have on the server, and I have still not seen it crash again. Could be a timing issue that the delay in writing the log causes the bad behavior to not happen.. Those types of problems are hard to debug. Running it under gdb without the debug options is better approach in that case. http://www.bacula.org/5.2.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION0064 When it 'crashes' (though it sounds more like 'hangs' is a better word), interrupt gdb with Ctrl-c to get back to the gdb shell window (as in step 8). __Martin -- Keep Your Developer Skills Current with LearnDevNow! The most comprehensive online learning library for Microsoft developers is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3, Metro Style Apps, more. Free future releases when you subscribe now! 
http://p.sf.net/sfu/learndevnow-d2d ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-usersMartin and John, thank you for your replies.I since yesterday afternoon, bacula-sd has been running under gdb using the instructions in the manual for my Bacula version, however I've still not seen the issue originally reported.I agree with you both that by running it under a debugging process, it seems a delay is introduced which is suppressing the error in some way. Is my best bet just to leave bacula-sd running under gdb and hope that my full backups over the weekend may highlight the issue? Or is there another way I could debug this?Thanks,Joe-- Virtualization Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] SD crashes
On Wed, 15 Feb 2012 13:31:10 + (GMT), Joe Nyland said: On 14 Feb, 2012,at 03:47 PM, Martin Simmons mar...@lispworks.com wrote: On Tue, 14 Feb 2012 10:34:31 -0500, John Drescher said: I've been running the SD using the following command (I know the combination of options I have used may be excessive, but I wanted as much chance of catching the error as I could!) since yesterday afternoon: sudo bacula-sd -c /etc/bacula/bacula-sd.conf -d 100 -dt -f -u bacula -g tape -m -v | tee -a /mnt/array/bacula-sd.screen.log However, (as luck would have it) I've not seen the behaviour I originally reported whilst running with debug options. Is there any way in which running the SD with the combination of options I have used above, could cause any different behaviour of the SD? Or interfere in any way with it? I'm asking, becuase I have re-enabled all of the backups jobs I have on the server, and I have still not seen it crash again. Could be a timing issue that the delay in writing the log causes the bad behavior to not happen.. Those types of problems are hard to debug. Running it under gdb without the debug options is better approach in that case. http://www.bacula.org/5.2.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION0064 When it 'crashes' (though it sounds more like 'hangs' is a better word), interrupt gdb with Ctrl-c to get back to the gdb shell window (as in step 8). __Martin -- Keep Your Developer Skills Current with LearnDevNow! The most comprehensive online learning library for Microsoft developers is just $99.99! Visual Studio, SharePoint, SQL - plus HTML5, CSS3, MVC3, Metro Style Apps, more. Free future releases when you subscribe now! http://p.sf.net/sfu/learndevnow-d2d ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users Martin and John, thank you for your replies. 
I since yesterday afternoon, bacula-sd has been running under gdb using the instructions in the manual for my Bacula version, however I've still not seen the issue originally reported. I agree with you both that by running it under a debugging process, it seems a delay is introduced which is suppressing the error in some way. Is my best bet just to leave bacula-sd running under gdb and hope that my full backups over the weekend may highlight the issue? Or is there another way I could debug this? Assuming the error causes the SD to hang (rather than exit), then you could run it without gdb and then attach gdb to it when it hangs (use gdb -p $pid). __Martin -- Virtualization Cloud Management Using Capacity Planning Cloud computing makes use of virtualization - but cloud computing also focuses on allowing computing to be delivered as a service. http://www.accelacomm.com/jaw/sfnl/114/51521223/ ___ Bacula-users mailing list Bacula-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] SD crashes
On 15 Feb 2012, at 16:52, Martin Simmons wrote:
> On Wed, 15 Feb 2012 13:31:10 + (GMT), Joe Nyland said:
>> On 14 Feb, 2012, at 03:47 PM, Martin Simmons mar...@lispworks.com wrote:
>>> On Tue, 14 Feb 2012 10:34:31 -0500, John Drescher said:
>>>>> I've been running the SD using the following command (I know the combination of options I have used may be excessive, but I wanted as much chance of catching the error as I could!) since yesterday afternoon:
>>>>>
>>>>> sudo bacula-sd -c /etc/bacula/bacula-sd.conf -d 100 -dt -f -u bacula -g tape -m -v | tee -a /mnt/array/bacula-sd.screen.log
>>>>>
>>>>> However (as luck would have it), I've not seen the behaviour I originally reported whilst running with debug options. Could running the SD with the combination of options above cause it to behave differently, or interfere with it in any way? I'm asking because I have re-enabled all of the backup jobs on the server, and I have still not seen it crash again.
>>>>
>>>> Could be a timing issue: the delay in writing the log may prevent the bad behaviour from happening. Those types of problems are hard to debug.
>>>
>>> Running it under gdb without the debug options is a better approach in that case.
>>> http://www.bacula.org/5.2.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION0064
>>>
>>> When it 'crashes' (though it sounds more like 'hangs' is a better word), interrupt gdb with Ctrl-c to get back to the gdb shell window (as in step 8).
>>>
>>> __Martin
>>
>> Martin and John, thank you for your replies.
>>
>> Since yesterday afternoon, bacula-sd has been running under gdb using the instructions in the manual for my Bacula version; however, I've still not seen the issue originally reported. I agree with you both that running it under a debugger seems to introduce a delay which is suppressing the error in some way.
>>
>> Is my best bet just to leave bacula-sd running under gdb and hope that my full backups over the weekend may highlight the issue? Or is there another way I could debug this?
>
> Assuming the error causes the SD to hang (rather than exit), then you could run it without gdb and then attach gdb to it when it hangs (use gdb -p $pid).
>
> __Martin

Ok, that sounds reasonable. One question: does bacula-sd need to be running with -s (no signals, for debugging), or will gdb be able to provide enough info without this option? (My default options on Ubuntu 10.04 are: -c config file -u user -g group)

Would using the -s option introduce the sort of delay we mentioned earlier, and thus limit my chances of reproducing the issue?

Thank you for your continued help.

Joe
Re: [Bacula-users] SD crashes
On 13 Feb, 2012, at 02:30 PM, "Joe Nyland" joenyl...@me.com wrote:
> On 13 Feb, 2012, at 02:11 PM, John Drescher dresche...@gmail.com wrote:
>> 2012/2/13 Joe Nyland joenyl...@me.com:
>>> Hello everyone,
>>>
>>> I hope someone would be able to offer any suggestions of why I am seeing the following behaviour in my current Bacula setup:
>>>
>>> Since the tail end of last week, I have been having issues with my MySQL backups in Bacula, where they would randomly appear to 'crash', normally when performing a copy of a backup to another pool - but I'm not sure yet if this is the trigger.
>>>
>>> Running 'status dir' after one of these 'crashes' gives the following output for the running jobs:
>>>
>>> Running Jobs:
>>> Console connected at 12-Feb-12 15:53
>>> Console connected at 13-Feb-12 06:58
>>> JobId  Level  Name                                             Status
>>> ======================================================================
>>> 2107   Full   WebServer1_MySQL_Copy.2012-02-13_04.30.00_28     is running  Crashed Job
>>> 2108   Full   WebServer1_MySQL.2012-02-13_04.30.00_29          is running  Crashed Job
>>> 2111   Full   MythTVServer1_MySQL.2012-02-13_05.00.00_32       is waiting for higher priority jobs to finish
>>> 2113   Full   TestServer_MySQL.2012-02-13_05.00.00_34          is waiting execution
>>> 2114   Full   MythTVServer1_MySQL_Copy.2012-02-13_05.30.00_35  is waiting execution
>>> 2115   Full   WebServer1_MySQL_Copy.2012-02-13_05.30.00_36     is waiting execution
>>> 2116   Full   WebServer1_MySQL.2012-02-13_05.30.00_37          has a fatal error
>>> 2117   Full   TestServer_MySQL_Copy.2012-02-13_05.30.00_38     is waiting execution
>>> 2121   Full   MythTVServer1_MySQL_Copy.2012-02-13_06.30.00_42  is waiting execution
>>> 2122   Full   WebServer1_MySQL_Copy.2012-02-13_06.30.00_43     is waiting execution
>>> 2123   Full   WebServer1_MySQL.2012-02-13_06.30.00_44          has a fatal error
>>> 2124   Full   TestServer_MySQL_Copy.2012-02-13_06.30.00_45     is waiting execution
>>> 2125   Full   MythTVServer1_MySQL.2012-02-13_07.00.00_47       has a fatal error
>>> 2126   Full   WebServer1_MySQL.2012-02-13_07.00.00_48          has a fatal error
>>>
>>> Once the above appears, I am unable to view the status of any storage resource on my SD:
>>>
>>> *status storage=FileServer1_Full
>>> Connecting to Storage daemon FileServer1_Full at FileServer1:9103
>>>
>>> FileServer1-sd Version: 5.0.1 (24 February 2010) x86_64-pc-linux-gnu ubuntu 10.04
>>> Daemon started 12-Feb-12 15:53, 92 Jobs run since started.
>>> Heap: heap=1,671,168 smbytes=1,188,608 max_bytes=1,388,208 bufs=577 max_bufs=994
>>> Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8
>>>
>>> Running Jobs:
>>> Reading: Full Copy job WebServer1_MySQL_Copy JobId=2107 Volume="WebServer1_MySQL_1325"
>>>     pool="WebServer1_MySQL" device="WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1)
>>>     Files=4 Bytes=164,924 Bytes/sec=17
>>>     FDSocket closed
>>>
>>> Jobs waiting to reserve a drive:
>>>
>>> Terminated Jobs:
>>> JobId  Level  Files  Bytes    Status  Finished         Name
>>> ======================================================================
>>> 2091   Full       2  92.45 K  OK      13-Feb-12 03:30  TestServer_MySQL_Copy
>>> 2096   Full       5  2.258 M  OK      13-Feb-12 03:30  MythTVServer1_MySQL_Copy
>>> 2098   Full       4  164.9 K  OK      13-Feb-12 03:30  WebServer1_MySQL_Copy
>>> 2100   Full       2  92.45 K  OK      13-Feb-12 03:30  TestServer_MySQL_Copy
>>> 2078   Full   1,145  2.942 G  OK      13-Feb-12 03:31  SVN_Copy
>>> 2102   Full       5  2.259 M  OK      13-Feb-12 04:01  MythTVServer1_MySQL
>>> 2103   Full       4  164.9 K  OK      13-Feb-12 04:01  WebServer1_MySQL
>>> 2104   Full       2  92.37 K  OK      13-Feb-12 04:01  TestServer_MySQL
>>> 2105   Full       5  2.259 M  OK      13-Feb-12 04:30  MythTVServer1_MySQL_Copy
>>> 2109   Full       2  92.37 K  OK      13-Feb-12 04:30  TestServer_MySQL_Copy
>>>
>>> Device status:
>>> Device "Default" (/mnt/backup/Bacula) is not open.
>>> snip
>>> Device "WebServer1_Inc" (/mnt/backup/Bacula/WebServer1/Incremental) is not open.
>>> Device "WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1) is mounted with:
>>>     Volume:      WebServer1_MySQL_1325
>>>     Pool:        WebServer1_MySQL
>>>     Media type:  File
>>>     Total Bytes Read=0 Blocks Read=0 Bytes/block=0
>>>     Positioned at File=0 Block=0
>>> Device "WebServer1_MySQL_Copy" (/mnt/mac_backup/Bacula/Databases/WebServer1) is not open.
>>> Device "WebServer1_Full_Copy" (/mnt/mac_backup/Bacula/WebServer1/Full) is not open.
>>> Device "WebServer1_Inc_Copy" (/mnt/mac_backup/Bacula/WebServer1/Incrementals) is not open.
>>> snip
>>> Device "SharedData_Diff" (/mnt/backup/Bacula/Shared/Differential) is not open.
>>>
>>> Used Volume status:
>>>
>>> NOTE: bconsole appears to crash here - no further output is produced, and bconsole does not respond to any key presses. I have to Ctrl + C to exit out from bconsole.
>>>
>>> Furthermore, the only way I can clear out the failed jobs from the 'Running jobs queue' is to exit from bconsole, issue 'sudo service bacula-sd stop' twice, then restart the SD and restart bacula-director.
>>>
>>> For 4 of my clients, I run a MySQL backup hourly at 00:00, 01:00, etc. I then copy the MySQL backups to another storage resource on my SD at 00:30, 01:30, etc. The MySQL databases which I am
Re: [Bacula-users] SD crashes
> I've been running the SD using the following command (I know the combination of options I have used may be excessive, but I wanted as much chance of catching the error as I could!) since yesterday afternoon:
>
> sudo bacula-sd -c /etc/bacula/bacula-sd.conf -d 100 -dt -f -u bacula -g tape -m -v | tee -a /mnt/array/bacula-sd.screen.log
>
> However (as luck would have it), I've not seen the behaviour I originally reported whilst running with debug options. Could running the SD with the combination of options above cause it to behave differently, or interfere with it in any way? I'm asking because I have re-enabled all of the backup jobs on the server, and I have still not seen it crash again.

Could be a timing issue: the delay in writing the log may prevent the bad behaviour from happening. Those types of problems are hard to debug.

John
Re: [Bacula-users] SD crashes
On Tue, 14 Feb 2012 10:34:31 -0500, John Drescher said:
>> I've been running the SD using the following command (I know the combination of options I have used may be excessive, but I wanted as much chance of catching the error as I could!) since yesterday afternoon:
>>
>> sudo bacula-sd -c /etc/bacula/bacula-sd.conf -d 100 -dt -f -u bacula -g tape -m -v | tee -a /mnt/array/bacula-sd.screen.log
>>
>> However (as luck would have it), I've not seen the behaviour I originally reported whilst running with debug options. Could running the SD with the combination of options above cause it to behave differently, or interfere with it in any way? I'm asking because I have re-enabled all of the backup jobs on the server, and I have still not seen it crash again.
>
> Could be a timing issue: the delay in writing the log may prevent the bad behaviour from happening. Those types of problems are hard to debug.

Running it under gdb without the debug options is a better approach in that case.

http://www.bacula.org/5.2.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION0064

When it 'crashes' (though it sounds more like 'hangs' is a better word), interrupt gdb with Ctrl-c to get back to the gdb shell window (as in step 8).

__Martin
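A rough sketch of running the SD under gdb without the -d debug options, as suggested above. It assumes gdb is installed and that bacula-sd accepts -f (foreground), -s (no signals) and -c (config file) as in its usage text; check the manual for your version. The helper name sd_gdb_cmd is hypothetical and only prints the command line.

```shell
#!/bin/sh
# Hypothetical helper: print the command that starts bacula-sd under gdb.
# --args is a standard gdb option that passes the remaining words to the
# program being debugged.
sd_gdb_cmd() {
    conf="$1"
    printf 'gdb --args bacula-sd -f -s -c %s\n' "$conf"
}

# Usage:
#   eval "$(sd_gdb_cmd /etc/bacula/bacula-sd.conf)"
#   (gdb) run
#   ... when the SD hangs, press Ctrl-c to return to the gdb prompt ...
#   (gdb) thread apply all bt
```

Leaving out the high debug level and the tee pipeline keeps log writing from slowing the daemon, which was the suspected reason the problem stopped appearing.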
Re: [Bacula-users] SD crashes
Hi Joe,

On Mon, Feb 13, 2012 at 07:21:03AM +, Joe Nyland wrote:
> I hope someone would be able to offer any suggestions of why I am seeing the following behaviour in my current Bacula setup:
>
> Since the tail end of last week, I have been having issues with my MySQL backups in Bacula, where they would randomly appear to 'crash', normally when performing a copy of a backup to another pool - but I'm not sure yet if this is the trigger.

With bacula 5.0.3 I had frequent crashes on Copy Jobs as I ran out of memory. The SD box has only 4GB RAM; I have now added 8GB of swap and it seems to run fine.

> NOTE: bconsole appears to crash here - no further output is produced, and bconsole does not respond to any key presses. I have to Ctrl + C to exit out from bconsole.
>
> Furthermore, the only way I can clear out the failed jobs from the 'Running jobs queue' is to exit from bconsole, issue 'sudo service bacula-sd stop' twice, then restart the SD and restart bacula-director.

Here the bacula-sd crashes and disappears from the process list.

I have another issue I have not been able to track down so far. Now and then the tape changer seems to claim it has 0 slots, and bacula-sd really dislikes that. It seems to happen mostly when tapes are moving and some 'mtx status'-like command is issued. If this happens, I need to stop bacula-sd; it takes some time to umount the tape (bacula-sd is in 'D' state in 'ps'), and only afterwards can it be started again, after which all is fine. 'update slots' without a restart won't help, even though 'mtx status' gives correct output again. Perhaps this is comparable to your "issue 'sudo service bacula-sd stop' twice".

Regards,
Adrian
--
LiHAS - Adrian Reyer - Hessenwiesenstraße 10 - D-70565 Stuttgart
Fon: +49 (7 11) 78 28 50 90 - Fax: +49 (7 11) 78 28 50 91
Mail: li...@lihas.de - Web: http://lihas.de
Linux, Netzwerke, Consulting Support - USt-ID: DE 227 816 626 Stuttgart
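A small sketch for spotting the 'D' state mentioned above, assuming the procps version of ps found on Linux. The helper name is_uninterruptible is hypothetical; it only inspects a STAT string, so it can be used on saved ps output as well.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a ps STAT value starts with 'D',
# i.e. the process is in uninterruptible sleep (usually waiting on I/O).
is_uninterruptible() {
    case "$1" in
        D*) return 0 ;;
        *)  return 1 ;;
    esac
}

# Usage (procps ps; rss is reported in kB):
#   ps -C bacula-sd -o pid=,stat=,rss= | while read -r pid stat rss; do
#       if is_uninterruptible "$stat"; then
#           echo "bacula-sd $pid is blocked (state $stat, RSS ${rss} kB)"
#       fi
#   done
```

The RSS column in the same output gives a quick view of the daemon's memory use, which is relevant to the out-of-memory crashes on Copy Jobs described above.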
Re: [Bacula-users] SD crashes
2012/2/13 Joe Nyland joenyl...@me.com:
> Hello everyone,
>
> I hope someone would be able to offer any suggestions of why I am seeing the following behaviour in my current Bacula setup:
>
> Since the tail end of last week, I have been having issues with my MySQL backups in Bacula, where they would randomly appear to 'crash', normally when performing a copy of a backup to another pool - but I'm not sure yet if this is the trigger.
>
> Running 'status dir' after one of these 'crashes' gives the following output for the running jobs:
>
> Running Jobs:
> Console connected at 12-Feb-12 15:53
> Console connected at 13-Feb-12 06:58
> JobId  Level  Name                                             Status
> ======================================================================
> 2107   Full   WebServer1_MySQL_Copy.2012-02-13_04.30.00_28     is running  Crashed Job
> 2108   Full   WebServer1_MySQL.2012-02-13_04.30.00_29          is running  Crashed Job
> 2111   Full   MythTVServer1_MySQL.2012-02-13_05.00.00_32       is waiting for higher priority jobs to finish
> 2113   Full   TestServer_MySQL.2012-02-13_05.00.00_34          is waiting execution
> 2114   Full   MythTVServer1_MySQL_Copy.2012-02-13_05.30.00_35  is waiting execution
> 2115   Full   WebServer1_MySQL_Copy.2012-02-13_05.30.00_36     is waiting execution
> 2116   Full   WebServer1_MySQL.2012-02-13_05.30.00_37          has a fatal error
> 2117   Full   TestServer_MySQL_Copy.2012-02-13_05.30.00_38     is waiting execution
> 2121   Full   MythTVServer1_MySQL_Copy.2012-02-13_06.30.00_42  is waiting execution
> 2122   Full   WebServer1_MySQL_Copy.2012-02-13_06.30.00_43     is waiting execution
> 2123   Full   WebServer1_MySQL.2012-02-13_06.30.00_44          has a fatal error
> 2124   Full   TestServer_MySQL_Copy.2012-02-13_06.30.00_45     is waiting execution
> 2125   Full   MythTVServer1_MySQL.2012-02-13_07.00.00_47       has a fatal error
> 2126   Full   WebServer1_MySQL.2012-02-13_07.00.00_48          has a fatal error
>
> Once the above appears, I am unable to view the status of any storage resource on my SD:
>
> *status storage=FileServer1_Full
> Connecting to Storage daemon FileServer1_Full at FileServer1:9103
>
> FileServer1-sd Version: 5.0.1 (24 February 2010) x86_64-pc-linux-gnu ubuntu 10.04
> Daemon started 12-Feb-12 15:53, 92 Jobs run since started.
> Heap: heap=1,671,168 smbytes=1,188,608 max_bytes=1,388,208 bufs=577 max_bufs=994
> Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8
>
> Running Jobs:
> Reading: Full Copy job WebServer1_MySQL_Copy JobId=2107 Volume=WebServer1_MySQL_1325
>     pool=WebServer1_MySQL device=WebServer1_MySQL (/mnt/backup/Bacula/Databases/WebServer1)
>     Files=4 Bytes=164,924 Bytes/sec=17
>     FDSocket closed
>
> Jobs waiting to reserve a drive:
>
> Terminated Jobs:
> JobId  Level  Files  Bytes    Status  Finished         Name
> ======================================================================
> 2091   Full       2  92.45 K  OK      13-Feb-12 03:30  TestServer_MySQL_Copy
> 2096   Full       5  2.258 M  OK      13-Feb-12 03:30  MythTVServer1_MySQL_Copy
> 2098   Full       4  164.9 K  OK      13-Feb-12 03:30  WebServer1_MySQL_Copy
> 2100   Full       2  92.45 K  OK      13-Feb-12 03:30  TestServer_MySQL_Copy
> 2078   Full   1,145  2.942 G  OK      13-Feb-12 03:31  SVN_Copy
> 2102   Full       5  2.259 M  OK      13-Feb-12 04:01  MythTVServer1_MySQL
> 2103   Full       4  164.9 K  OK      13-Feb-12 04:01  WebServer1_MySQL
> 2104   Full       2  92.37 K  OK      13-Feb-12 04:01  TestServer_MySQL
> 2105   Full       5  2.259 M  OK      13-Feb-12 04:30  MythTVServer1_MySQL_Copy
> 2109   Full       2  92.37 K  OK      13-Feb-12 04:30  TestServer_MySQL_Copy
>
> Device status:
> Device Default (/mnt/backup/Bacula) is not open.
> snip
> Device WebServer1_Inc (/mnt/backup/Bacula/WebServer1/Incremental) is not open.
> Device WebServer1_MySQL (/mnt/backup/Bacula/Databases/WebServer1) is mounted with:
>     Volume:      WebServer1_MySQL_1325
>     Pool:        WebServer1_MySQL
>     Media type:  File
>     Total Bytes Read=0 Blocks Read=0 Bytes/block=0
>     Positioned at File=0 Block=0
> Device WebServer1_MySQL_Copy (/mnt/mac_backup/Bacula/Databases/WebServer1) is not open.
> Device WebServer1_Full_Copy (/mnt/mac_backup/Bacula/WebServer1/Full) is not open.
> Device WebServer1_Inc_Copy (/mnt/mac_backup/Bacula/WebServer1/Incrementals) is not open.
> snip
> Device SharedData_Diff (/mnt/backup/Bacula/Shared/Differential) is not open.
>
> Used Volume status:
>
> NOTE: bconsole appears to crash here - no further output is produced, and bconsole does not respond to any key presses. I have to Ctrl + C to exit out from bconsole.
>
> Furthermore, the only way I can clear out the failed jobs from the 'Running jobs queue' is to exit from bconsole, issue 'sudo service bacula-sd stop' twice, then restart the SD and restart bacula-director.
>
> For 4 of my clients, I run a MySQL backup hourly at 00:00, 01:00, etc. I then copy the MySQL backups to another storage
Re: [Bacula-users] SD crashes
On 13 Feb, 2012, at 11:37 AM, Adrian Reyer bacula-li...@lihas.de wrote:
> Hi Joe,
>
> On Mon, Feb 13, 2012 at 07:21:03AM +, Joe Nyland wrote:
>> I hope someone would be able to offer any suggestions of why I am seeing the following behaviour in my current Bacula setup:
>>
>> Since the tail end of last week, I have been having issues with my MySQL backups in Bacula, where they would randomly appear to 'crash', normally when performing a copy of a backup to another pool - but I'm not sure yet if this is the trigger.
>
> With bacula 5.0.3 I had frequent crashes on Copy Jobs as I ran out of memory. The SD box has only 4GB RAM; I have now added 8GB of swap and it seems to run fine.
>
>> NOTE: bconsole appears to crash here - no further output is produced, and bconsole does not respond to any key presses. I have to Ctrl + C to exit out from bconsole.
>>
>> Furthermore, the only way I can clear out the failed jobs from the 'Running jobs queue' is to exit from bconsole, issue 'sudo service bacula-sd stop' twice, then restart the SD and restart bacula-director.
>
> Here the bacula-sd crashes and disappears from the process list.
>
> I have another issue I have not been able to track down so far. Now and then the tape changer seems to claim it has 0 slots, and bacula-sd really dislikes that. It seems to happen mostly when tapes are moving and some 'mtx status'-like command is issued. If this happens, I need to stop bacula-sd; it takes some time to umount the tape (bacula-sd is in 'D' state in 'ps'), and only afterwards can it be started again, after which all is fine. 'update slots' without a restart won't help, even though 'mtx status' gives correct output again. Perhaps this is comparable to your "issue 'sudo service bacula-sd stop' twice".
>
> Regards,
> Adrian
> --
> LiHAS - Adrian Reyer - Hessenwiesenstraße 10 - D-70565 Stuttgart
> Fon: +49 (7 11) 78 28 50 90 - Fax: +49 (7 11) 78 28 50 91
> Mail: li...@lihas.de - Web: http://lihas.de
> Linux, Netzwerke, Consulting Support - USt-ID: DE 227 816 626 Stuttgart

Hi Adrian,

Thanks for your reply.

I hadn't considered RAM as being the cause of the problem, mainly because other backup jobs back up far more (and far larger) files to this same SD without issue. It seems to be only when I introduced MySQL backups of different servers (including the Bacula catalog server) into the mix that I started to see this behaviour.

To test my current theory, I am disabling the MySQL backup and copy jobs for FileServer1 only, so that the Bacula database (which resides on FileServer1) is not backed up by Bacula. I'm starting to wonder whether backing up my catalog at the same time that several other backup jobs are running (and completing, in the case of the smaller DBs) is somehow causing this problem. However, this doesn't explain why the SD appears to be crashing :-(.

In the meantime, I have found this bug, which was forwarded on from Debian bugs: http://bugs.bacula.org/view.php?id=1098. However, it appears to be for Bacula 2.2.8 :-( There is another mention of the issue here: http://adsm.org/lists/html/Bacula-users/2009-12/msg00140.html but that's for Bacula 3.0.3.

Any other ideas?

Thank you.

Joe
Re: [Bacula-users] SD crashes
On 13 Feb, 2012, at 02:11 PM, John Drescher dresche...@gmail.com wrote:
> 2012/2/13 Joe Nyland joenyl...@me.com:
>> Hello everyone,
>>
>> I hope someone would be able to offer any suggestions of why I am seeing the following behaviour in my current Bacula setup:
>>
>> Since the tail end of last week, I have been having issues with my MySQL backups in Bacula, where they would randomly appear to 'crash', normally when performing a copy of a backup to another pool - but I'm not sure yet if this is the trigger.
>>
>> Running 'status dir' after one of these 'crashes' gives the following output for the running jobs:
>>
>> Running Jobs:
>> Console connected at 12-Feb-12 15:53
>> Console connected at 13-Feb-12 06:58
>> JobId  Level  Name                                             Status
>> ======================================================================
>> 2107   Full   WebServer1_MySQL_Copy.2012-02-13_04.30.00_28     is running  Crashed Job
>> 2108   Full   WebServer1_MySQL.2012-02-13_04.30.00_29          is running  Crashed Job
>> 2111   Full   MythTVServer1_MySQL.2012-02-13_05.00.00_32       is waiting for higher priority jobs to finish
>> 2113   Full   TestServer_MySQL.2012-02-13_05.00.00_34          is waiting execution
>> 2114   Full   MythTVServer1_MySQL_Copy.2012-02-13_05.30.00_35  is waiting execution
>> 2115   Full   WebServer1_MySQL_Copy.2012-02-13_05.30.00_36     is waiting execution
>> 2116   Full   WebServer1_MySQL.2012-02-13_05.30.00_37          has a fatal error
>> 2117   Full   TestServer_MySQL_Copy.2012-02-13_05.30.00_38     is waiting execution
>> 2121   Full   MythTVServer1_MySQL_Copy.2012-02-13_06.30.00_42  is waiting execution
>> 2122   Full   WebServer1_MySQL_Copy.2012-02-13_06.30.00_43     is waiting execution
>> 2123   Full   WebServer1_MySQL.2012-02-13_06.30.00_44          has a fatal error
>> 2124   Full   TestServer_MySQL_Copy.2012-02-13_06.30.00_45     is waiting execution
>> 2125   Full   MythTVServer1_MySQL.2012-02-13_07.00.00_47       has a fatal error
>> 2126   Full   WebServer1_MySQL.2012-02-13_07.00.00_48          has a fatal error
>>
>> Once the above appears, I am unable to view the status of any storage resource on my SD:
>>
>> *status storage=FileServer1_Full
>> Connecting to Storage daemon FileServer1_Full at FileServer1:9103
>>
>> FileServer1-sd Version: 5.0.1 (24 February 2010) x86_64-pc-linux-gnu ubuntu 10.04
>> Daemon started 12-Feb-12 15:53, 92 Jobs run since started.
>> Heap: heap=1,671,168 smbytes=1,188,608 max_bytes=1,388,208 bufs=577 max_bufs=994
>> Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8
>>
>> Running Jobs:
>> Reading: Full Copy job WebServer1_MySQL_Copy JobId=2107 Volume="WebServer1_MySQL_1325"
>>     pool="WebServer1_MySQL" device="WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1)
>>     Files=4 Bytes=164,924 Bytes/sec=17
>>     FDSocket closed
>>
>> Jobs waiting to reserve a drive:
>>
>> Terminated Jobs:
>> JobId  Level  Files  Bytes    Status  Finished         Name
>> ======================================================================
>> 2091   Full       2  92.45 K  OK      13-Feb-12 03:30  TestServer_MySQL_Copy
>> 2096   Full       5  2.258 M  OK      13-Feb-12 03:30  MythTVServer1_MySQL_Copy
>> 2098   Full       4  164.9 K  OK      13-Feb-12 03:30  WebServer1_MySQL_Copy
>> 2100   Full       2  92.45 K  OK      13-Feb-12 03:30  TestServer_MySQL_Copy
>> 2078   Full   1,145  2.942 G  OK      13-Feb-12 03:31  SVN_Copy
>> 2102   Full       5  2.259 M  OK      13-Feb-12 04:01  MythTVServer1_MySQL
>> 2103   Full       4  164.9 K  OK      13-Feb-12 04:01  WebServer1_MySQL
>> 2104   Full       2  92.37 K  OK      13-Feb-12 04:01  TestServer_MySQL
>> 2105   Full       5  2.259 M  OK      13-Feb-12 04:30  MythTVServer1_MySQL_Copy
>> 2109   Full       2  92.37 K  OK      13-Feb-12 04:30  TestServer_MySQL_Copy
>>
>> Device status:
>> Device "Default" (/mnt/backup/Bacula) is not open.
>> snip
>> Device "WebServer1_Inc" (/mnt/backup/Bacula/WebServer1/Incremental) is not open.
>> Device "WebServer1_MySQL" (/mnt/backup/Bacula/Databases/WebServer1) is mounted with:
>>     Volume:      WebServer1_MySQL_1325
>>     Pool:        WebServer1_MySQL
>>     Media type:  File
>>     Total Bytes Read=0 Blocks Read=0 Bytes/block=0
>>     Positioned at File=0 Block=0
>> Device "WebServer1_MySQL_Copy" (/mnt/mac_backup/Bacula/Databases/WebServer1) is not open.
>> Device "WebServer1_Full_Copy" (/mnt/mac_backup/Bacula/WebServer1/Full) is not open.
>> Device "WebServer1_Inc_Copy" (/mnt/mac_backup/Bacula/WebServer1/Incrementals) is not open.
>> snip
>> Device "SharedData_Diff" (/mnt/backup/Bacula/Shared/Differential) is not open.
>>
>> Used Volume status:
>>
>> NOTE: bconsole appears to crash here - no further output is produced, and bconsole does not respond to any key presses. I have to Ctrl + C to exit out from bconsole.
>>
>> Furthermore, the only way I can clear out the failed jobs from the 'Running jobs queue' is to exit from bconsole, issue 'sudo service bacula-sd stop' twice, then restart the SD and restart bacula-director.
>>
>> For 4 of my clients, I run a MySQL backup hourly at 00:00, 01:00, etc. I then copy the MySQL backups to another storage resource on my SD at 00:30, 01:30, etc. The MySQL databases which I am backing up are relatively small, the biggest of which is my Bacula
[Bacula-users] SD crashes
Hello everyone,

I hope someone would be able to offer any suggestions of why I am seeing the following behaviour in my current Bacula setup:

Since the tail end of last week, I have been having issues with my MySQL backups in Bacula, where they would randomly appear to 'crash', normally when performing a copy of a backup to another pool - but I'm not sure yet if this is the trigger.

Running 'status dir' after one of these 'crashes' gives the following output for the running jobs:

Running Jobs:
Console connected at 12-Feb-12 15:53
Console connected at 13-Feb-12 06:58
JobId  Level  Name                                             Status
======================================================================
2107   Full   WebServer1_MySQL_Copy.2012-02-13_04.30.00_28     is running  Crashed Job
2108   Full   WebServer1_MySQL.2012-02-13_04.30.00_29          is running  Crashed Job
2111   Full   MythTVServer1_MySQL.2012-02-13_05.00.00_32       is waiting for higher priority jobs to finish
2113   Full   TestServer_MySQL.2012-02-13_05.00.00_34          is waiting execution
2114   Full   MythTVServer1_MySQL_Copy.2012-02-13_05.30.00_35  is waiting execution
2115   Full   WebServer1_MySQL_Copy.2012-02-13_05.30.00_36     is waiting execution
2116   Full   WebServer1_MySQL.2012-02-13_05.30.00_37          has a fatal error
2117   Full   TestServer_MySQL_Copy.2012-02-13_05.30.00_38     is waiting execution
2121   Full   MythTVServer1_MySQL_Copy.2012-02-13_06.30.00_42  is waiting execution
2122   Full   WebServer1_MySQL_Copy.2012-02-13_06.30.00_43     is waiting execution
2123   Full   WebServer1_MySQL.2012-02-13_06.30.00_44          has a fatal error
2124   Full   TestServer_MySQL_Copy.2012-02-13_06.30.00_45     is waiting execution
2125   Full   MythTVServer1_MySQL.2012-02-13_07.00.00_47       has a fatal error
2126   Full   WebServer1_MySQL.2012-02-13_07.00.00_48          has a fatal error

Once the above appears, I am unable to view the status of any storage resource on my SD:

*status storage=FileServer1_Full
Connecting to Storage daemon FileServer1_Full at FileServer1:9103

FileServer1-sd Version: 5.0.1 (24 February 2010) x86_64-pc-linux-gnu ubuntu 10.04
Daemon started 12-Feb-12 15:53, 92 Jobs run since started.
Heap: heap=1,671,168 smbytes=1,188,608 max_bytes=1,388,208 bufs=577 max_bufs=994
Sizes: boffset_t=8 size_t=8 int32_t=4 int64_t=8

Running Jobs:
Reading: Full Copy job WebServer1_MySQL_Copy JobId=2107 Volume=WebServer1_MySQL_1325
    pool=WebServer1_MySQL device=WebServer1_MySQL (/mnt/backup/Bacula/Databases/WebServer1)
    Files=4 Bytes=164,924 Bytes/sec=17
    FDSocket closed

Jobs waiting to reserve a drive:

Terminated Jobs:
JobId  Level  Files  Bytes    Status  Finished         Name
======================================================================
2091   Full       2  92.45 K  OK      13-Feb-12 03:30  TestServer_MySQL_Copy
2096   Full       5  2.258 M  OK      13-Feb-12 03:30  MythTVServer1_MySQL_Copy
2098   Full       4  164.9 K  OK      13-Feb-12 03:30  WebServer1_MySQL_Copy
2100   Full       2  92.45 K  OK      13-Feb-12 03:30  TestServer_MySQL_Copy
2078   Full   1,145  2.942 G  OK      13-Feb-12 03:31  SVN_Copy
2102   Full       5  2.259 M  OK      13-Feb-12 04:01  MythTVServer1_MySQL
2103   Full       4  164.9 K  OK      13-Feb-12 04:01  WebServer1_MySQL
2104   Full       2  92.37 K  OK      13-Feb-12 04:01  TestServer_MySQL
2105   Full       5  2.259 M  OK      13-Feb-12 04:30  MythTVServer1_MySQL_Copy
2109   Full       2  92.37 K  OK      13-Feb-12 04:30  TestServer_MySQL_Copy

Device status:
Device Default (/mnt/backup/Bacula) is not open.
snip
Device WebServer1_Inc (/mnt/backup/Bacula/WebServer1/Incremental) is not open.
Device WebServer1_MySQL (/mnt/backup/Bacula/Databases/WebServer1) is mounted with:
    Volume:      WebServer1_MySQL_1325
    Pool:        WebServer1_MySQL
    Media type:  File
    Total Bytes Read=0 Blocks Read=0 Bytes/block=0
    Positioned at File=0 Block=0
Device WebServer1_MySQL_Copy (/mnt/mac_backup/Bacula/Databases/WebServer1) is not open.
Device WebServer1_Full_Copy (/mnt/mac_backup/Bacula/WebServer1/Full) is not open.
Device WebServer1_Inc_Copy (/mnt/mac_backup/Bacula/WebServer1/Incrementals) is not open.
snip
Device SharedData_Diff (/mnt/backup/Bacula/Shared/Differential) is not open.

Used Volume status:

NOTE: bconsole appears to crash here - no further output is produced, and bconsole does not respond to any key presses. I have to Ctrl + C to exit out from bconsole.

Furthermore, the only way I can clear out the failed jobs from the 'Running jobs queue' is to exit from bconsole, issue 'sudo service bacula-sd stop' twice, then restart the SD and restart bacula-director.

For 4 of my clients, I run a MySQL backup hourly at 00:00, 01:00, etc. I then copy the MySQL backups to another storage