Bug#930931: [pkg-bacula-devel] Bug#930931: /usr/sbin/btape: btape crashes on "fill" test

2020-11-08 Thread Sebastian Suchanek
Hi!

For the avoidance of doubt: the bug I reported here and all the testing
I did in 2019 around this bug report took place on my Bacula
production system.

In the meantime, I have set up an independent test system in order to
have more flexibility in testing and also to exclude hardware problems
other than those I already tested. Here's a quick overview of the test
system compared to the production system:

- System main components (e.g. mainboard, CPU, ...): all different from
  production system.
- Fibre Channel controller: different
  (production system: Emulex LPe11002; test system: QLogic QLE2460)
- Tape Library: same type as production system
- LTO5 drive: exact same drive as previously tested in production system

- Basic operating system: same as production system (Debian Stretch)
- Actual kernel running: different
  (production system: 4.9.0-12-amd64; test system: 4.9.0-14-amd64)
- Bacula version: same as production system:
  - bacula: 7.4.4+dfsg-6+deb9u2
  - bacula-bscan: 9.4.2-1~bpo9+1
  - bacula-client: 7.4.4+dfsg-6+deb9u2
  - bacula-common: 9.4.2-1~bpo9+1
  - bacula-common-mysql: 9.4.2-1~bpo9+1
  - bacula-console: 9.4.2-1~bpo9+1
  - bacula-director: 9.4.2-1~bpo9+1
  - bacula-director-mysql: 9.4.2-1~bpo9+1
  - bacula-fd: 9.4.2-1~bpo9+1
  - bacula-sd: 9.4.2-1~bpo9+1
  - bacula-sd-dbgsym: 9.4.2-1~bpo9+1
  - bacula-server: 7.4.4+dfsg-6+deb9u2
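
Note that the list mixes two version strings (7.4.4+dfsg-6+deb9u2 and
9.4.2-1~bpo9+1). A minimal Python sketch for grouping the packages by
version, using the list from this mail as input; the helper name
group_by_version is purely illustrative and not part of any Bacula or
Debian tool:

```python
from collections import defaultdict

# Package list copied from this mail, one "name: version" pair per line.
PACKAGES = """\
bacula: 7.4.4+dfsg-6+deb9u2
bacula-bscan: 9.4.2-1~bpo9+1
bacula-client: 7.4.4+dfsg-6+deb9u2
bacula-common: 9.4.2-1~bpo9+1
bacula-common-mysql: 9.4.2-1~bpo9+1
bacula-console: 9.4.2-1~bpo9+1
bacula-director: 9.4.2-1~bpo9+1
bacula-director-mysql: 9.4.2-1~bpo9+1
bacula-fd: 9.4.2-1~bpo9+1
bacula-sd: 9.4.2-1~bpo9+1
bacula-sd-dbgsym: 9.4.2-1~bpo9+1
bacula-server: 7.4.4+dfsg-6+deb9u2
"""


def group_by_version(listing):
    """Map each version string to the sorted list of packages carrying it."""
    groups = defaultdict(list)
    for line in listing.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, version = line.split(":", 1)
        groups[version.strip()].append(name.strip())
    return {version: sorted(names) for version, names in groups.items()}


if __name__ == "__main__":
    for version, names in sorted(group_by_version(PACKAGES).items()):
        print(version, "->", ", ".join(names))
```

Grouping by version makes it easy to check that both systems carry the
same mix of packages.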

Now, when I run tests on the test system with btape, both the "test"
test and the "fill" test work without any problems, i.e. no crash occurs.


Regards

Sebastian



Bug#930931: /usr/sbin/btape: btape crashes on "fill" test with kernel panic

2019-12-08 Thread Sebastian Suchanek
Update 2019-12-08:

In the meantime, I've been able to source a second HP LTO5 drive.
Unfortunately, Bacula shows the exact same behaviour with the second
drive.
In my opinion, since the error also occurs with a second drive, a
hardware problem can be ruled out, which makes this issue a software
problem.


Best regards

Sebastian



Bug#930931: /usr/sbin/btape: btape crashes on "fill" test with kernel panic

2019-06-22 Thread Sebastian Suchanek

On 22.06.2019 at 22:20, Sven Hartge wrote:
> On 22.06.19 19:58, Sebastian Suchanek wrote:
>
>> - Put a cartridge in the drive
>> - Run btape from console: "btape -c /etc/bacula/bacula-sd.con /dev/nst1"
>> - Start "fill" test within btape
>> - btape writes the name of the volume to the tape, then crashes
>>   immediately with a kernel panic.
>
>> Bacula interrupted by signal 11: Segmentation violation
>
> You mean it aborts with a SIGSEGV. A kernel panic is a different
> error.
>
> If you really experience a kernel panic then we also need the
> complete panic output from dmesg.

Sorry, my bad. Yes, btape indeed shows a segmentation fault, not a
kernel panic.
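
To illustrate the difference: a segmentation fault ends only the
offending process, and its parent sees it as death by signal 11, while
the kernel itself keeps running. A minimal Python sketch (POSIX only)
follows. It uses a throwaway child process as a stand-in, not btape, and
btape may well report a different exit status because Bacula installs
its own signal handler. The helper name run_and_describe is illustrative.

```python
import signal
import subprocess
import sys


def run_and_describe(argv):
    """Run argv and describe how the process ended."""
    result = subprocess.run(argv)
    if result.returncode < 0:
        # subprocess reports "killed by signal N" as return code -N.
        sig = signal.Signals(-result.returncode)
        return f"killed by signal {sig.value} ({sig.name})"
    return f"exited with status {result.returncode}"


if __name__ == "__main__":
    # Stand-in child that sends SIGSEGV to itself; this is not btape.
    child = [
        sys.executable,
        "-c",
        "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)",
    ]
    print(run_and_describe(child))
```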


Best regards

Sebastian




Bug#930931: /usr/sbin/btape: btape crashes on "fill" test with kernel panic

2019-06-22 Thread Sebastian Suchanek
Package: bacula-sd
Version: 9.4.2-1~bpo9+1
Severity: normal
File: /usr/sbin/btape

Dear Maintainer,

running a "fill" test in btape on an HP LTO-5 (type: BRSLA-0901-DC)
results in a kernel panic.
What I did:

- Put a cartridge in the drive
- Run btape from console: "btape -c /etc/bacula/bacula-sd.con /dev/nst1"
- Start "fill" test within btape
- btape writes the name of the volume to the tape, then crashes
  immediately with a kernel panic.

Output from btape:


[...]
Mount blank Volume on device "LTO4-Drive-1" (/dev/nst0) and press return
when ready:
btape: btape.c:3073-0
Wrote Volume label for volume "TestVolume1".
Wrote Start of Session label.
15:29:11 Begin writing Bacula records to tape ...
Bacula interrupted by signal 11: Segmentation violation
Kaboom! btape, btape got signal 11 - Segmentation violation at
01-Jun-2019 15:29:11. Attempting traceback.
Kaboom! exepath=/usr/sbin/
Calling: /usr/sbin/btraceback /usr/sbin/btape 11716 /tmp
bsmtp: bsmtp.c:122-0 Fatal malformed reply from localhost: 501 :
sender address must contain a domain
The btraceback call returned 1
LockDump: /tmp/bacula.11716.traceback
btape: lockmgr.c:1221-0 lockmgr disabled
btape: smartall.c:400-0 Orphaned buffer: btape 100024 bytes at
561ca92294a8 from btape.c:2246
/usr/sbin#




So far, this behaviour has been 100% reproducible.

Here is a Bacula backtrace file from a crash:


[New LWP 13970]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x7f4654653b5a in __waitpid (pid=14087, stat_loc=0x7ffd8f0e1d6c,
options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
29  ../sysdeps/unix/sysv/linux/waitpid.c: No such file or directory.
$1 = 1244475954
$2 = 0x7f4654aceb00  "btape"
$3 = 1392275832
$4 = 1392275896
$5 = 0
$6 = 0
$7 = 1418438265
$8 = 1418438234
$9 = 1418438227
$10 = 1418438261
$11 = 1701276020
$12 = 1418438254
Environment variable "TestName" not defined.
#0  0x7f4654653b5a in __waitpid (pid=14087, stat_loc=0x7ffd8f0e1d6c,
options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
#1  0x7f46548a3c9f in signal_handler () from
/usr/lib/bacula/libbac-9.4.2.so
#2  <signal handler called>
#3  0x561a5264919e in fillcmd () at btape.c:2387
#4  0x561a526410cb in do_tape_cmds () at btape.c:2945
#5  main (margc=<optimized out>, margv=<optimized out>) at btape.c:309
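
The frame of interest in a trace like this is the first one below
signal_handler, here fillcmd () at btape.c:2387. A minimal Python sketch
for pulling that frame out of gdb "bt" output follows. The regular
expression only covers the frame shapes seen in this trace and is an
assumption, not a general gdb parser; the helper name faulting_frame is
illustrative.

```python
import re

# Frames from the trace in this mail, with the mail's line wrapping undone.
TRACE = """\
#0  0x7f4654653b5a in __waitpid (pid=14087, stat_loc=0x7ffd8f0e1d6c, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
#1  0x7f46548a3c9f in signal_handler () from /usr/lib/bacula/libbac-9.4.2.so
#2  <signal handler called>
#3  0x561a5264919e in fillcmd () at btape.c:2387
#4  0x561a526410cb in do_tape_cmds () at btape.c:2945
"""

# "#N  [ADDR in ]FUNC (ARGS)[ at FILE:LINE]"; lines without a function
# name, such as the signal-handler marker, do not match and are skipped.
FRAME_RE = re.compile(
    r"^#(\d+)\s+(?:0x[0-9a-f]+ in )?(\w+) \(.*?\)(?: at (\S+):(\d+))?"
)


def faulting_frame(trace, handler="signal_handler"):
    """Return (number, function, file, line) of the first parsed frame
    after the handler frame, or None if there is no such frame."""
    seen_handler = False
    for text in trace.splitlines():
        match = FRAME_RE.match(text)
        if not match:
            continue
        number, func, path, lineno = match.groups()
        if seen_handler:
            return int(number), func, path, int(lineno) if lineno else None
        if func == handler:
            seen_handler = True
    return None


if __name__ == "__main__":
    print(faulting_frame(TRACE))
```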

Thread 2 (Thread 0x7f46526b9700 (LWP 13970)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at
../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x7f46548adf01 in watchdog_thread () from
/usr/lib/bacula/libbac-9.4.2.so
#2  0x7f465464a4a4 in start_thread (arg=0x7f46526b9700) at
pthread_create.c:456
#3  0x7f465400ad0f in clone () at
../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Thread 1 (Thread 0x7f4655155740 (LWP 13967)):
#0  0x7f4654653b5a in __waitpid (pid=14087, stat_loc=0x7ffd8f0e1d6c,
options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
#1  0x7f46548a3c9f in signal_handler () from
/usr/lib/bacula/libbac-9.4.2.so
#2  <signal handler called>
#3  0x561a5264919e in fillcmd () at btape.c:2387
#4  0x561a526410cb in do_tape_cmds () at btape.c:2945
#5  main (margc=<optimized out>, margv=<optimized out>) at btape.c:309
#0  0x7f4654653b5a in __waitpid (pid=14087, stat_loc=0x7ffd8f0e1d6c,
options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
29  in ../sysdeps/unix/sysv/linux/waitpid.c
resultvar = 18446744073709551104
sc_cancel_oldtype = 0
#1  0x7f46548a3c9f in signal_handler () from
/usr/lib/bacula/libbac-9.4.2.so
No symbol table info available.
#2  <signal handler called>
No locals.
#3  0x561a5264919e in fillcmd () at btape.c:2387
2387    btape.c: No such file or directory.
rec = {link = {next = 0x0, prev = 0x0}, StreamLen = 0, FileOffset = 0,
StartAddr = 0, Addr = 0, VolSessionId = 0, VolSessionTime = 0, FileIndex
= 0, Stream = 0, last_FI = 0, last_Stream = 0, maskedStream = 0,
data_len = 32768, remainder = 0, adata_remainder = 0, remlen = 0,
data_bytes = 0, state_bits = 0, RecNum = 0, BlockNumber = 0, invalid =
false, wstate = st_none, rstate = st_none, bsr = 0x0, data =
0x561a52fd04c0
"\267\023\030\224\023^J$\276\020h\363>*\370\354\351!E}RNG>^n\320N6\037t\017_݆\354.\t-uJ\016\346\206\340i#\336\363w\356i\026\266?\034\232\353>\022\263y\t\342\061w\306m\266P\346\v\345k\271,\205, "]\000\000\000n", '\000' 
ec2 =
"\200+\016\217\375\177\000\000\000\273+TF\177\000\000\000\004\000\000\000\000\000\000
 
\205R\032V\000\000\330\060\016\217\375\177\000\000\377\377\377\377\377\377\377\377v8"
buf1 = "12:33:35\000\210", '\000' ,
"\231\331\377SF\177\000\000\000\000\000\000\000\000\000\000\027H\371SF\177\000\000\000\306+TF\177\000\000h\r\000\000\000\000\000\000\000y+TF\177\000\000@\204+TF\177\000\000\300u\374R\032V\000\000Q{\211TF\177\000\000\000\000\000"
buf2 =