Re: [Bacula-users] [Backup Fatal Error ] Deduplication Optimized Volumes testing

2018-01-31 Thread Kern Sibbald

  
  
Hello,
As the error message says, the Aligned-Disk device cannot be opened or
does not exist.  What it doesn't say very clearly is that you do
not have the aligned driver.  This driver has not yet been
released.  It will be released in binary form in the next couple
of months.  We are currently building, testing, and documenting
the binaries.
When it is ready, I will clearly announce it.

Best regards,
Kern
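
The SE0003 message in this thread reports that a symbol lookup in the driver file failed. A minimal sketch of the same kind of check, assuming a Linux host with Python available: it loads a shared object and tests whether it exports a given symbol. The helper name `exports_symbol` is hypothetical, and this is not how the storage daemon itself performs the lookup.

```python
import ctypes


def exports_symbol(library_path, symbol_name):
    """Return True if the shared object at library_path exports symbol_name.

    Raises OSError if the file cannot be loaded at all.
    """
    lib = ctypes.CDLL(library_path)
    try:
        # Attribute access on a CDLL object performs the symbol lookup and
        # raises AttributeError when the symbol is not defined.
        getattr(lib, symbol_name)
    except AttributeError:
        return False
    return True
```

With the path and symbol name taken from the log, a call such as `exports_symbol("/usr/lib64/bacula-sd-aligned-driver-9.0.6.so", "BaculaSDdriver")` would be expected to return False on a system showing this error, since the log reports the symbol as undefined.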


On 30.01.2018 08:18, Vinay Singh wrote:


  
Hi All,

I am getting the errors below while testing Aligned volumes.
Can you please help me resolve them?
I am testing Aligned volumes on Bacula version 9.0.6 on
CentOS 7 with a NAS system that supports deduplicated file
systems.

Regards
Vinay

  
  
  
  

[Bacula-users] [Backup Fatal Error ] Deduplication Optimized Volumes testing

2018-01-29 Thread Vinay Singh
Hi All,

I am getting the errors below while testing Aligned volumes. Can you please
help me resolve them?
I am testing Aligned volumes on Bacula version 9.0.6 on CentOS 7 with a NAS
system that supports deduplicated file systems.

25-Jan 02:10 localhost.localdomain-sd JobId 52: Error: [SE0003] Lookup of symbol "BaculaSDdriver" in driver Aligned-Disk for device /usr/lib64/bacula-sd-aligned-driver-9.0.6.so failed: ERR=/usr/lib64/bacula-sd-aligned-driver-9.0.6.so: undefined symbol: BaculaSDdriver
25-Jan 02:10 localhost.localdomain-sd JobId 52: Warning: Device "Aligned-Disk" requested by DIR could not be opened or does not exist.
25-Jan 02:10 localhost.localdomain-sd JobId 52: Error: [SE0003] Lookup of symbol "BaculaSDdriver" in driver Aligned-Disk for device /usr/lib64/bacula-sd-aligned-driver-9.0.6.so failed: ERR=/usr/lib64/bacula-sd-aligned-driver-9.0.6.so: undefined symbol: BaculaSDdriver
25-Jan 02:10 localhost.localdomain-sd JobId 52: Warning: Device "Aligned-Disk" requested by DIR could not be opened or does not exist.
25-Jan 02:10 localhost.localdomain-sd JobId 52: Error: [SE0003] Lookup of symbol "BaculaSDdriver" in driver Aligned-Disk for device /usr/lib64/bacula-sd-aligned-driver-9.0.6.so failed: ERR=/usr/lib64/bacula-sd-aligned-driver-9.0.6.so: undefined symbol: BaculaSDdriver
25-Jan 02:10 localhost.localdomain-sd JobId 52: Warning: Device "Aligned-Disk" requested by DIR could not be opened or does not exist.
.
.
.
25-Jan 02:20 localhost.localdomain-sd JobId 52: Error: [SE0003] Lookup of symbol "BaculaSDdriver" in driver Aligned-Disk for device /usr/lib64/bacula-sd-aligned-driver-9.0.6.so failed: ERR=/usr/lib64/bacula-sd-aligned-driver-9.0.6.so: undefined symbol: BaculaSDdriver
25-Jan 02:20 localhost.localdomain-sd JobId 52: Warning: Device "Aligned-Disk" requested by DIR could not be opened or does not exist.
25-Jan 02:20 localhost.localdomain-sd JobId 52: Fatal error: Device reservation failed for JobId=52: Jmsg JobId=52 type=5 level=1516864851 localhost.localdomain-sd JobId 52: Warning: Device "Aligned-Disk" requested by DIR could not be opened or does not exist.
25-Jan 02:20 localhost.localdomain-dir JobId 52: Fatal error: Storage daemon didn't accept Device "Aligned-Disk" because: 3924 Device "Aligned-Disk" not in SD Device resources or no matching Media Type or is disabled.
25-Jan 02:20 localhost.localdomain-dir JobId 52: Error: Bacula localhost.localdomain-dir 9.0.6 (20Nov17):
Build OS: x86_64-pc-linux-gnu redhat (Core)
JobId: 52
Job: test_job.2018-01-25_02.10.49_09
Backup Level: Full (upgraded from Incremental)
Client: "urmachine-fd" 9.0.6 (20Nov17) x86_64-pc-linux-gnu,redhat,(Core)
FileSet: "testfileNEW" 2018-01-19 02:39:55
Pool: "Default" (From Job resource)
Catalog: "MyCatalog" (From Client resource)
Storage: "testStorageDaemon" (From Job resource)
Scheduled time: 25-Jan-2018 02:10:48
Start time: 25-Jan-2018 02:10:51
End time: 25-Jan-2018 02:20:51
Elapsed time: 10 mins
Priority: 10
FD Files Written: 0
SD Files Written: 0
FD Bytes Written: 0 (0 B)
SD Bytes Written: 0 (0 B)
Rate: 0.0 KB/s
Software Compression: None
Comm Line Compression: None
Snapshot/VSS: no
Encryption: no
Accurate: no
Volume name(s):
Volume Session Id: 2
Volume Session Time: 1516863775
Last Volume Bytes: 0 (0 B)
Non-fatal FD errors: 1
SD Errors: 0
FD termination status:
SD termination status:
Termination: * Backup Error *

Regards

Vinay
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users


[Bacula-users] Backup Fatal Error

2006-11-08 Thread Ryan Novosielski
Help me out here -- does this appear to be a media error? I think it's
possible that I had a job failure on this tape before, but I don't have
the records. What explanation could there be other than a media
problem? (Incidentally, tonight's backups have run successfully on the
same tape.)
---BeginMessage---
07-Nov 23:00 helios-dir: Start Backup JobId 1586, Job=sopris-WEBMAIL.2006-11-07_23.00.01
07-Nov 22:52 sopris-fd: DIR and FD clocks differ by -479 seconds, FD automatically adjusting.
07-Nov 23:00 helios-sd: Spooling data ...
07-Nov 23:00 helios-sd: Writing spooled data to Volume. Despooling 64,524 bytes ...
07-Nov 23:00 helios-sd: Writing spooled data to Volume. Despooling 0 bytes ...
07-Nov 23:00 helios-sd: sopris-WEBMAIL.2006-11-07_23.00.01 Fatal error: Retrying after data spooling error failed.
07-Nov 23:00 helios-sd: sopris-WEBMAIL.2006-11-07_23.00.01 Fatal error: append.c:207 Fatal append error on device helios_DDS (/dev/rmt/0cbn): ERR=Job BigBrother-NET.2006-10-14_11.10.24 canceled while waiting for mount on Storage Device helios_DDS (/dev/rmt/0cbn).

07-Nov 23:00 helios-sd: Writing spooled data to Volume. Despooling 64,524 bytes ...
07-Nov 23:00 helios-sd: sopris-WEBMAIL.2006-11-07_23.00.01 Fatal error: Fatal despooling error.
07-Nov 23:00 helios-sd: sopris-WEBMAIL.2006-11-07_23.00.01 Fatal error: Error writing Session label to Delta_B: Error 0
07-Nov 23:00 helios-sd: sopris-WEBMAIL.2006-11-07_23.00.01 Fatal error: append.c:266 Error writting end session label. ERR=Job BigBrother-NET.2006-10-14_11.10.24 canceled while waiting for mount on Storage Device helios_DDS (/dev/rmt/0cbn).

07-Nov 23:00 helios-sd: Writing spooled data to Volume. Despooling 64,524 bytes ...
07-Nov 23:00 helios-sd: sopris-WEBMAIL.2006-11-07_23.00.01 Fatal error: Fatal despooling error.
07-Nov 23:00 helios-sd: sopris-WEBMAIL.2006-11-07_23.00.01 Fatal error: append.c:277 Fatal append error on device helios_DDS (/dev/rmt/0cbn): ERR=Job BigBrother-NET.2006-10-14_11.10.24 canceled while waiting for mount on Storage Device helios_DDS (/dev/rmt/0cbn).

07-Nov 23:04 sopris-fd: sopris-WEBMAIL.2006-11-07_23.00.01 Fatal error: backup.c:500 Network send error to SD. ERR=Broken pipe
07-Nov 23:12 helios-dir: sopris-WEBMAIL.2006-11-07_23.00.01 Error: Bacula 1.38.11 (28Jun06): 07-Nov-2006 23:12:55
  JobId:                  1586
  Job:                    sopris-WEBMAIL.2006-11-07_23.00.01
  Backup Level:           Incremental, since=2006-11-06 23:00:14
  Client:                 sopris-fd x86_64-unknown-linux-gnu,redhat,Enterprise release
  FileSet:                webmail-DATA 2006-03-10 14:05:06
  Pool:                   combined_DELTA
  Storage:                helios_DDS
  Scheduled time:         07-Nov-2006 23:00:00
  Start time:             07-Nov-2006 23:00:09
  End time:               07-Nov-2006 23:12:55
  Elapsed time:           12 mins 46 secs
  Priority:               10
  FD Files Written:       4
  SD Files Written:       4
  FD Bytes Written:       200,502 (200.5 KB)
  SD Bytes Written:       4,224 (4.224 KB)
  Rate:                   0.3 KB/s
  Software Compression:   24.6 %
  Volume name(s):
  Volume Session Id:      246
  Volume Session Time:    1159324894
  Last Volume Bytes:      132,345,036 (132.3 MB)
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Error
  Termination:            *** Backup Error ***

---End Message---


Re: [Bacula-users] backup : Fatal Error because: Bacula interrupted by signal 11: Segmentation violation ---- Max Run Time

2006-01-07 Thread Kern Sibbald
Hello,

I have now implemented what I hope to be a fix for the seg fault that the two 
of you encountered.  I have posted the following file on my website in case 
you would like to test it:

   www.sibbald.com/download/bacula-beta-1.38.4-06Jan06.tar.gz
and
www.sibbald.com/download/bacula-beta-1.38.4-06Jan06.tar.gz.sig

Concerning the question of exactly when the clock starts ticking for the
MaxRunTime, it is definitely the start time that is used.  The problem is in
the definition of start time.  In general, the start time is a few seconds
after the schedule time.  Perhaps in a later version of Bacula, I will
rethink what the start time really is, because in fact a job can be
held for lots of reasons before it actually starts backing up data.

On Friday 06 January 2006 14:50, Evelyne Cangini wrote:
 "it looks like this is due to a Maximum Run Time that you have set for the
 Job": Indeed, Max Run Time was set to 3 hours. The backup was run from
 a console command.

 I have another problem with that Job directive (unrelated to the previous
 one). Bacula's documentation says: "The time specifies maximum
 allowed time that a job may run, counted from the when the job starts (not
 necessarily the same as when the job was scheduled)."

 It seems to me that the time is counted from when the job was scheduled:
 Backup scheduled:
 Schedule: run at 1:05
 Max Start Delay = 10800
 Max Run Time = 3600
 The job is canceled at 2:05 and never starts.

 Kern Sibbald wrote:
 Hello,
 
 Apparently, you have fallen into a bug that has previously been reported.
 From what I can tell from the traceback (nice, thanks), it looks like this
  is due to a Maximum Run Time (I forget the exact directive name) that you
  have set for the Job, and the watchdog decided the time had passed so it
  canceled the job.  In doing so, it trips over itself and falls flat on
  its face :-(
 
 This is now my #1 priority.  However, in the meantime, as a workaround
  either increase the timeout significantly or remove it altogether ...
 
 On Friday 06 January 2006 12:35, Evelyne Cangini wrote:
 Hello,
 
 I have tried a backup several times and it fails the same way: each time,
 the job is canceled after running for 3 hours, with SD Bytes Written at
 around 30 GB. The last time, the log file reported: Fatal Error because:
 Bacula interrupted by signal 11: Segmentation violation.
 

[Bacula-users] backup : Fatal Error because: Bacula interrupted by signal 11: Segmentation violation

2006-01-06 Thread Evelyne Cangini




Hello,

I have tried a backup several times and it fails the same way: each time, the
job is canceled after running for 3 hours, with SD Bytes Written at around 30 GB.
The last time, the log file reported: Fatal Error because: Bacula interrupted
by signal 11: Segmentation violation.

I also received a mail, "Bacula GDB traceback of bacula-dir":

Using host libthread_db library "/lib/libthread_db.so.1".
[Thread debugging using libthread_db enabled]
[New Thread -1209116992 (LWP 11829)]
[New Thread -1221706832 (LWP 11831)]
[New Thread -1211216976 (LWP 11830)]
0x00980402 in ?? ()
$1 = "gaiaDir", '\0' repeats 22 times
$2 = 0x95d1868 "bacula-dir"
$3 = 0x95d1890 "/home/bacula/bin/bacula-dir"
$4 = "MySQL"
$5 = 0x80ca612 "1.38.2 (20 November 2005)"
$6 = 0x80ca600 "i686-pc-linux-gnu"
$7 = 0x80ca5f9 "redhat"
$8 = 0x80ca5f0 "(Stentz)"
#0  0x00980402 in ?? ()
#1  0x004808f6 in __nanosleep_nocancel () from /lib/libpthread.so.0
#2  0x080935db in bmicrosleep (sec=60, usec=0) at bsys.c:54
#3  0x0806759a in wait_for_next_job (one_shot_job_to_run=0x0) at scheduler.c:96
#4  0x0804d16f in main (argc=0, argv=0xbfb0cc14) at dird.c:244

Thread 3 (Thread -1211216976 (LWP 11830)):
#0  0x00980402 in ?? ()
#1  0x002f64b1 in ___newselect_nocancel () from /lib/libc.so.6
#2  0x0809752c in bnet_thread_server (addrs=0x95d2770, max_clients=10, 
client_wq=0x80dce80, 
handle_client_request=0x807dbde <handle_UA_client_request>)
at bnet_server.c:148
#3  0x0807d96e in connect_thread (arg=0x95d2770) at ua_server.c:73
#4  0x0047bb80 in start_thread () from /lib/libpthread.so.0
#5  0x002fddee in clone () from /lib/libc.so.6

Thread 2 (Thread -1221706832 (LWP 11831)):
#0  0x00980402 in ?? ()
#1  0x00480fbb in __waitpid_nocancel () from /lib/libpthread.so.0
#2  0x080a978d in signal_handler (sig=11) at signal.c:159
#3  signal handler called
#4  0x0047caa2 in pthread_mutex_lock () from /lib/libpthread.so.0
#5  0x080933d8 in _p (m=0xaaba) at bsys.c:370
#6  0x0809d541 in JCR::inc_use_count (this=0x) at ../jcr.h:99
#7  0x0809d1fd in get_next_jcr (prev_jcr=0xb43ce9c0) at jcr.c:581
#8  0x0805e170 in job_monitor_watchdog (self=0x95e2b70) at job.c:443
#9  0x080b18ac in watchdog_thread (arg=0x0) at watchdog.c:265
#10 0x0047bb80 in start_thread () from /lib/libpthread.so.0
#11 0x002fddee in clone () from /lib/libc.so.6

Thread 1 (Thread -1209116992 (LWP 11829)):
#0  0x00980402 in ?? ()
#1  0x004808f6 in __nanosleep_nocancel () from /lib/libpthread.so.0
#2  0x080935db in bmicrosleep (sec=60, usec=0) at bsys.c:54
#3  0x0806759a in wait_for_next_job (one_shot_job_to_run=0x0) at scheduler.c:96
#4  0x0804d16f in main (argc=0, argv=0xbfb0cc14) at dird.c:244
#0  0x00980402 in ?? ()
No symbol table info available.
#1  0x004808f6 in __nanosleep_nocancel () from /lib/libpthread.so.0
No symbol table info available.
#2  0x080935db in bmicrosleep (sec=60, usec=0) at bsys.c:54
54	   stat = nanosleep(timeout, NULL);
Current language:  auto; currently c++
timeout = {tv_sec = 60, tv_nsec = 0}
tv = {tv_sec = 1, tv_usec = 20}
tz = {tz_minuteswest = 0, tz_dsttime = 0}
stat = 0
#3  0x0806759a in wait_for_next_job (one_shot_job_to_run=0x0) at scheduler.c:96
96	  bmicrosleep(NEXT_CHECK_SECS, 0); /* recheck once per minute */
jcr = (JCR *) 0xbfb0cb08
job = (JOB *) 0x809cd17
run = (RUN *) 0x987d8e8
now = 0
first = false
next_job = (job_item *) 0x0
#4  0x0804d16f in main (argc=0, argv=0xbfb0cc14) at dird.c:244
244	   while ( (jcr = wait_for_next_job(runjob)) ) {
ch = -1
jcr = (JCR *) 0x987d8e8
no_signals = 0
test_config = 0
uid = 0x0
gid = 0x0
#0  0x in ?? ()
No symbol table info available.
#0  0x in ?? ()
No symbol table info available.
#0  0x in ?? ()
No symbol table info available.


Thanks for your help,
Evelyne





Re: [Bacula-users] backup : Fatal Error because: Bacula interrupted by signal 11: Segmentation violation

2006-01-06 Thread Kern Sibbald
Hello,

Apparently, you have fallen into a bug that has previously been reported.  
From what I can tell from the traceback (nice, thanks), it looks like this is 
due to a Maximum Run Time (I forget the exact directive name) that you have 
set for the Job, and the watchdog decided the time had passed so it canceled 
the job.  In doing so, it trips over itself and falls flat on its face :-(  

This is now my #1 priority.  However, in the meantime, as a workaround either
increase the timeout significantly or remove it altogether ...

On Friday 06 January 2006 12:35, Evelyne Cangini wrote:
 Hello,

 I have tried a backup several times and it fails the same way: each time,
 the job is canceled after running for 3 hours, with SD Bytes Written at
 around 30 GB. The last time, the log file reported: Fatal Error because:
 Bacula interrupted by signal 11: Segmentation violation.


 Thanks for your help,
 Evelyne

-- 
Best regards,

Kern

  (
  /\
  V_V




Re: [Bacula-users] backup : Fatal Error because: Bacula interrupted by signal 11: Segmentation violation ---- Max Run Time

2006-01-06 Thread Evelyne Cangini




"it looks like this is due to a Maximum Run Time that you have set for the Job":
Indeed, Max Run Time was set to 3 hours. The backup was run from a console command.

I have another problem with that Job directive (unrelated to the previous one).
Bacula's documentation says: "The time specifies maximum allowed time that a job may run, counted from the when the job starts (not necessarily
the same as when the job was scheduled)."

It seems to me that the time is counted from when the job was scheduled:
Backup scheduled:
    Schedule: run at 1:05
    Max Start Delay = 10800
    Max Run Time = 3600
The job is canceled at 2:05 and never starts.
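
The arithmetic in this example can be checked with a short sketch. The calendar date is a placeholder (only the 1:05 schedule time and the two directive values come from the message), and the helper name `cancel_deadline` is hypothetical. It shows that adding Max Run Time to the scheduled time gives 2:05, matching the observed cancellation, whereas counting from the latest start that Max Start Delay would permit gives 5:05.

```python
from datetime import datetime, timedelta


def cancel_deadline(reference_time, max_run_time_seconds):
    """Time at which a job would be canceled if the run-time clock starts at reference_time."""
    return reference_time + timedelta(seconds=max_run_time_seconds)


# Placeholder date; only the 1:05 schedule time is taken from the message.
scheduled = datetime(2006, 1, 6, 1, 5)
max_start_delay = 10800  # seconds (3 hours)
max_run_time = 3600      # seconds (1 hour)

# Clock counted from the scheduled time.
from_schedule = cancel_deadline(scheduled, max_run_time)

# Clock counted from the latest start that Max Start Delay would allow.
latest_start = scheduled + timedelta(seconds=max_start_delay)
from_latest_start = cancel_deadline(latest_start, max_run_time)

print(from_schedule.strftime("%H:%M"))      # 02:05
print(from_latest_start.strftime("%H:%M"))  # 05:05
```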




Kern Sibbald wrote:

  Hello,

Apparently, you have fallen into a bug that has previously been reported.
From what I can tell from the traceback (nice, thanks), it looks like this is
due to a Maximum Run Time (I forget the exact directive name) that you have
set for the Job, and the watchdog decided the time had passed so it canceled
the job.  In doing so, it trips over itself and falls flat on its face :-(

This is now my #1 priority.  However, in the meantime, as a workaround either
increase the timeout significantly or remove it altogether ...

On Friday 06 January 2006 12:35, Evelyne Cangini wrote:
  
  
Hello,

I have tried a backup several times and it fails the same way: each time,
the job is canceled after running for 3 hours, with SD Bytes Written at
around 30 GB. The last time, the log file reported: Fatal Error because:
Bacula interrupted by signal 11: Segmentation violation.


Thanks for your help,
Evelyne