Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-15 Thread Phil Stracchino
On 2020-07-15 09:19, Kern Sibbald wrote: > Hello, > > There are a few things to keep in mind concerning MariaDB: > > 1. The Bacula project currently does not support MariaDB.   One of the > reasons is when testing Bacula with MariaDB, running *exactly* the same > jobs as I ran with MySQL, Maria

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-15 Thread Phil Stracchino
On 2020-07-15 09:23, Kern Sibbald wrote: > Sorry, correction:  > > 2. I personally do **not** have the time/energy to support MariaDB even > if it is becoming the open source replacement for MySQL. > > Kern I figured that was the correct reading. :) My setup's not actually nearly as complex o

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-15 Thread Kern Sibbald
Sorry, correction:  2. I personally do *not* have the time/energy to support MariaDB even if it is becoming the open source replacement for MySQL. Kern On 7/15/20 3:19 PM, Kern Sibbald wrote: Hello, There are a few things to

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-15 Thread Kern Sibbald
Hello, There are a few things to keep in mind concerning MariaDB: 1. The Bacula project currently does not support MariaDB.   One of the reasons is when testing Bacula with MariaDB, running *exactly* the same jobs as I ran with MySQL, MariaDB fails with the deadlock messages you have seen.  T

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-15 Thread Phil Stracchino
On 2020-07-15 01:50, Sven Hartge wrote: > Looking at my logs, I have seen the same in my 9.6.5 installation on > Debian 10: > > Excerpts from Error/Canceled-Mails: > > - > > 24-Jun 00:02 back-dir JobId 585301: Error: bdb.h:140 bdb.h:140 update > UPDATE Job SET JobStatus='R',Level='I',Star

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-14 Thread Sven Hartge
On 15.07.20 00:05, Phil Stracchino wrote: 1. The triggering condition is when a DB record insertion fails for any reason, *including recoverable failures* such as InnoDB rollbacks. (MySQL uses rollbacks to notify the application of any of several types of transient error, including deadlocks or

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-14 Thread Phil Stracchino
On 2020-07-14 12:18, Phil Stracchino wrote: > At which I sheepishly notice that I have wsrep_auto_increment_control > OFF when I'd have sworn it was on. Which doesn't change the fact that > it works that way with 9.6.3. > > Nevertheless, I'll retest 9.6.5 with that change just to be certain. If

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-14 Thread Phil Stracchino
On 2020-07-14 11:30, Martin Simmons wrote: > Sorry if you've already mentioned it, but is the 9.6.3 Director the same old > binary as you used in the past? Or have you recompiled it recently? If it is > the old binary, maybe something else has changed that affects compilation, so > you could try

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-14 Thread Martin Simmons
> On Mon, 13 Jul 2020 15:26:17 -0400, Phil Stracchino said: > > On 2020-07-13 13:59, Martin Simmons wrote: > >> On Sun, 12 Jul 2020 14:32:44 -0400, Phil Stracchino said: > >> > >> On 2020-07-12 14:12, Phil Stracchino wrote: > >>> To test this theory I have built a 9.6.5 director with LZO s

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-13 Thread Phil Stracchino
On 2020-07-13 13:59, Martin Simmons wrote: >> On Sun, 12 Jul 2020 14:32:44 -0400, Phil Stracchino said: >> >> On 2020-07-12 14:12, Phil Stracchino wrote: >>> To test this theory I have built a 9.6.5 director with LZO support >>> disabled and am testing it now. >> >> Well, that didn't work. >> >

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-13 Thread Martin Simmons
> On Sun, 12 Jul 2020 14:32:44 -0400, Phil Stracchino said: > > On 2020-07-12 14:12, Phil Stracchino wrote: > > To test this theory I have built a 9.6.5 director with LZO support > > disabled and am testing it now. > > Well, that didn't work. > > But this does definitely now seem to be relat

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-13 Thread Martin Simmons
> On Mon, 13 Jul 2020 10:12:41 -0400, Phil Stracchino said: > > On 2020-07-13 10:04, Radosław Korzeniewski wrote: > > Hello, > > > > Sun, 12 Jul 2020 at 21:14 Phil Stracchino > > wrote: > > > > There appear to be two failures occurring, the SIGUSR2 at

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-13 Thread Phil Stracchino
On 2020-07-13 10:04, Radosław Korzeniewski wrote: > Hello, > > Sun, 12 Jul 2020 at 21:14 Phil Stracchino > wrote: > > There appear to be two failures occurring, the SIGUSR2 at > LZ4_decompress_generic+0x03a2 in the SD, > > > AFAIK, the SIGUSR2 is us

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-13 Thread Radosław Korzeniewski
Hello, Sun, 12 Jul 2020 at 21:14 Phil Stracchino wrote: > There appear to be two failures occurring, the SIGUSR2 at > LZ4_decompress_generic+0x03a2 in the SD, AFAIK, the SIGUSR2 is used internally by Bacula with a watchdog/timeout code and it is not a failure in any case especially when

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-12 Thread Phil Stracchino
On 2020-07-12 14:32, Phil Stracchino wrote: > But this does definitely now seem to be related to lzo/lz4 decompression > failures on the 9.6.3/9.6.5 SD that were not happening with a 9.6.3 > Director. So that's narrowed it down quite a bit. OK, let me revise that: There appear to be two failure

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-12 Thread Phil Stracchino
(And if it's an LZ4 problem, then what would it have to do with HAproxy? I haven't a freaking clue, unless there's more than one problem.) -- Phil Stracchino

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-12 Thread Phil Stracchino
On 2020-07-12 14:12, Phil Stracchino wrote: > To test this theory I have built a 9.6.5 director with LZO support > disabled and am testing it now. Well, that didn't work. But this does definitely now seem to be related to lzo/lz4 decompression failures on the 9.6.3/9.6.5 SD that were not happenin

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-12 Thread Phil Stracchino
Ha! I think I got something useful this time. From the SD, running on Solaris 11.3: (dbx) attach 16166 Reading bacula-sd Reading ld.so.1 Reading libbacsd-9.6.5.so Reading libbaccfg-9.6.5.so Reading libbac-9.6.5.so Reading libz.so.1 Reading libm.so.2 Reading libpthread.so.1 Reading libnsl.so.

Re: [Bacula-devel] Hung jobs: continued diagnosis

2020-07-12 Thread Sven Hartge
On 12.07.20 17:38, Phil Stracchino wrote: > This problem is very clearly tied to the 9.6.5 Director as a regression > from 9.6.3, and there are strong hints that it has to do with how the > Director is talking to the database. I can throw one datapoint at you: Before 9.6.5 the director never con

[Bacula-devel] Hung jobs: continued diagnosis

2020-07-12 Thread Phil Stracchino
For the last week or so I've been testing and observing when hung jobs occur. And the answer is pretty clear. With everything running 9.6.3, NO hung jobs, whether connecting to the DB cluster via HAproxy or direct to the local node. I can upgrade EVERYTHING BUT the Director to 9.6.5, including t