Re: [Bacula-users] Mysterious Director console connection failures

2012-03-07 Thread Phil Stracchino
OK, this is getting more and more peculiar as I study it more.  Adding
bacula-devel list.

To briefly recap the initial statement of the problem: after a number of
successful connections, console-to-Director connection authentication
begins failing repeatedly.  Typically, after I manually start two or
three jobs using BAT, I can no longer connect to the Director with either
BAT or bconsole, but everything else continues to function normally and
the scheduled jobs run normally.  After the pending manually-scheduled
jobs complete, I can connect again.



On the theory that network bandwidth may be somehow involved, I tried
scheduling several jobs 15 minutes ahead of time, to see if I could get
more jobs running if I scheduled them all before any started.

Starting at about 0915, schedule job 1 for 0925.  No problem.
Schedule Job 2 for 0925.  No problem.
Schedule job 3 for 0925.  No problem.
At about 0918, try to schedule job 4 for 0925.  None of the new jobs has
yet started.  No go; neither bat nor bconsole can connect.


This is what the trace logged as I tried to connect with bconsole:

babylon4-dir: bnet.c:708-0 who=client host=10.24.32.10 port=36131
babylon4-dir: job.c:1331-0 wstorage=babylon5-sd
babylon4-dir: job.c:1340-0 wstore=babylon5-sd where=Pool resource
babylon4-dir: job.c:1031-0 JobId=0 created Job=-Console-.2012-03-07_09.19.16_37
babylon4-dir: cram-md5.c:72-0 send: auth cram-md5 1723850907.1331129956@babylon4-dir ssl=0
babylon4-dir: cram-md5.c:131-0 cram-get received: auth cram-md5 85736557.1331129966@bat ssl=0
babylon4-dir: cram-md5.c:150-0 sending resp to challenge: 25Q2B+IdJ/UKI/+p6++vkC
babylon4-dir: ua_dotcmds.c:164-0 Cmd: .api 1
babylon4-dir: ua_dotcmds.c:164-0 Cmd: .levels Backup
babylon4-dir: bnet.c:708-0 who=client host=10.24.32.10 port=36131
babylon4-dir: bnet.c:708-0 who=client host=10.24.32.14 port=36131


The console reported:

babylon4:root:/opt/bacula/etc:29 # bconsole
Connecting to Director babylon4:9101
Director authorization problem.
Most likely the passwords do not agree.
If you are using TLS, there may have been a certificate validation error
during the TLS handshake.


After restarting the Director, I re-enabled the trace (setdebug director
level=100 trace=1), then reconnected with bconsole:

babylon4-dir: bnet.c:708-0 who=client host=10.24.32.14 port=36131
babylon4-dir: job.c:1331-0 wstorage=babylon5-sd
babylon4-dir: job.c:1340-0 wstore=babylon5-sd where=Pool resource
babylon4-dir: job.c:1031-0 JobId=0 created Job=-Console-.2012-03-07_09.32.59_04
babylon4-dir: cram-md5.c:72-0 send: auth cram-md5 1031666935.1331130779@babylon4-dir ssl=0
babylon4-dir: cram-md5.c:131-0 cram-get received: auth cram-md5 41725829.1331130779@bconsole ssl=0
babylon4-dir: cram-md5.c:150-0 sending resp to challenge: 6Sgw8g8aLxgeAEx5CwsU1B

This looks no different to me than the failed connection attempt.  So I
tried starting up bconsole from the Linux machine I'm running bat on.
That worked fine, so I quit it and started another.  I did this about
five times.  Then I started six at once.  No problem.
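
A comparison like this can be made less dependent on eyeballing by
reducing each trace to the sequence of source locations that logged a
line, and then finding where two traces first diverge.  The following is
only a minimal Python sketch: the helper names trace_points and
first_divergence are hypothetical, and the regular expression assumes
the "daemon: file.c:line-N message" layout seen in the excerpts above.

```python
import re

# Assumed layout: "<daemon>: <file>.c:<line>-<n> <message>".
# Wrapped continuation lines do not match and are skipped.
TRACE_RE = re.compile(r"^\s*\S+: (\S+\.c:\d+)-\d+ ")


def trace_points(text):
    """Return the ordered list of 'file.c:line' locations in a trace."""
    points = []
    for line in text.splitlines():
        match = TRACE_RE.match(line)
        if match:
            points.append(match.group(1))
    return points


def first_divergence(a, b):
    """Index where two sequences first differ, or None if identical.

    If one sequence is a strict prefix of the other, the length of the
    shorter one is returned.
    """
    for index, (left, right) in enumerate(zip(a, b)):
        if left != right:
            return index
    if len(a) != len(b):
        return min(len(a), len(b))
    return None
```

Feeding the failed and the working trace text through trace_points and
then first_divergence gives the position at which one trace contains
lines the other lacks; it says nothing about why.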

It appears I can connect as many consoles as I want, up to the
Director's configured concurrency limit, with no problem ... until I
start scheduling jobs.

So, then I opened a bconsole and left it open, then scheduled two jobs
from BAT successfully.  Then I tried to schedule a third.  No go.

At this point, I tried to open an additional new bconsole.  No go, and
the trace *did not log anything* for the connection attempt.  I could
continue to schedule more manual jobs from the existing open bconsole,
but could start no new consoles, and BAT became completely unresponsive.
It appears that once two or three jobs were scheduled, the Director
*stopped listening* for new console connections, but continued to
service existing open consoles.
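
One way to check the "stopped listening" theory independently of
bconsole would be a bare TCP connect to the Director port.  The
following is only a minimal Python sketch: probe_tcp is a hypothetical
helper, not part of Bacula, and it makes no attempt to speak the console
protocol.

```python
import socket


def probe_tcp(host, port, timeout=5.0):
    """Classify the outcome of a bare TCP connect to host:port."""
    try:
        # Connect and immediately close; no Bacula protocol is spoken.
        with socket.create_connection((host, port), timeout=timeout):
            return "accepted"
    except ConnectionRefusedError:
        # Nothing is listening on the port.
        return "refused"
    except socket.timeout:
        # No answer within the timeout (for example, packets dropped).
        return "timeout"
    except OSError as exc:
        # Name resolution failures, unreachable hosts, and so on.
        return "error: %s" % exc
```

For the setup described here the call would be probe_tcp("babylon4",
9101).  Note that "accepted" only means the operating system completed
the TCP handshake; a daemon that has stopped calling accept() while its
socket is still open can still produce that result, so this separates
"port closed" from "port open" and nothing finer.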


All daemons are Bacula 5.2.5, all 64-bit builds.  The Director and the
disk-based SD are running on Solaris 10u9 amd64, built using Sun Studio
12.2.  The tape SD is running on Gentoo Linux amd64, built using
gcc-4.5.3.  BAT runs on the Linux box, and I used bconsoles from both
machines with no difference in behavior.



-- 
  Phil Stracchino, CDK#2 DoD#299792458 ICBM: 43.5607, -71.355
  ala...@caerllewys.net   ala...@metrocast.net   p...@co.ordinate.org
  Renaissance Man, Unix ronin, Perl hacker, SQL wrangler, Free Stater
 It's not the years, it's the mileage.

--
Virtualization  Cloud Management Using Capacity Planning
Cloud computing makes use of virtualization - but cloud computing 
also focuses on allowing computing to be delivered as a service.
http://www.accelacomm.com/jaw/sfnl/114/51521223/
___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net

Re: [Bacula-users] Mysterious Director console connection failures

2012-03-07 Thread Geert Stappers
On 2012-03-07 at 21:36, Kern Sibbald wrote:
 On 03/07/2012 06:54 PM, Martin Simmons wrote:
  On Wed, 07 Mar 2012 09:58:48 -0500, Phil Stracchino said:

 
 
  On the theory that network bandwidth may be somehow involved, I tried
  scheduling several jobs 15 minutes ahead of time, to see if I could get
  more jobs running if I scheduled them all before any started.
 
  Starting at about 0915, schedule job 1 for 0925.  No problem.
  Schedule Job 2 for 0925.  No problem.
  Schedule job 3 for 0925.  No problem.
  At about 0918, try to schedule job 4 for 0925.  None of the new jobs has
  yet started.  No go; neither bat nor bconsole can connect.
 
 
  This is what the trace logged as I tried to connect with bconsole:
 
  babylon4-dir: bnet.c:708-0 who=client host=10.24.32.10 port=36131
  babylon4-dir: job.c:1331-0 wstorage=babylon5-sd
  babylon4-dir: job.c:1340-0 wstore=babylon5-sd where=Pool resource
  babylon4-dir: job.c:1031-0 JobId=0 created Job=-Console-.2012-03-07_09.19.16_37
  babylon4-dir: cram-md5.c:72-0 send: auth cram-md5 1723850907.1331129956@babylon4-dir ssl=0
  babylon4-dir: cram-md5.c:131-0 cram-get received: auth cram-md5 85736557.1331129966@bat ssl=0
  babylon4-dir: cram-md5.c:150-0 sending resp to challenge: 25Q2B+IdJ/UKI/+p6++vkC
  babylon4-dir: ua_dotcmds.c:164-0 Cmd: .api 1
  babylon4-dir: ua_dotcmds.c:164-0 Cmd: .levels Backup
  babylon4-dir: bnet.c:708-0 who=client host=10.24.32.10 port=36131
  babylon4-dir: bnet.c:708-0 who=client host=10.24.32.14 port=36131
  
  That looks like bat, not bconsole, so I think you got the wrong output.
 
  Also, next time it starts failing, run bconsole -d 100 while the Director is
  running with setdebug, so the outputs of both sides can be compared.
 
  __Martin
 

 Hello,
 
 In looking at the output above, I have a comment similar to Martin's
 about bat and bconsole.  First, you should understand that currently
 the Director only has a concept of a *single* console.  When you
 run bat, it puts the console in gui mode, so if you are running
 any bconsoles at the same time and they happen to get some output,
 they will probably get output destined for bat, which will completely
 confuse bconsole.
 
 Moral of the story: don't run multiple bconsoles if possible, because you
 will probably get output mixed between them, and above all don't run
 bconsole at the same time as bat, or both will probably get confused,
 particularly bconsole.  bconsole won't have any idea how to handle signals
 destined for bat.


Oops. By default I have a bconsole open in my screen[1] session.
And now I know that it might grab data meant for another bconsole ...

If I recall correctly, I keep the bconsole session open to capture messages.
Otherwise I get loads of job messages on opening a bconsole.
That is with version 2.4.4.

From another posting:
} Someone (not me) added Maximum Console Connections, which defaults
} to 20, but I am not 100% sure how it interacts with Maximum Concurrent
} Jobs.  Before Maximum Console Connections, everything was lumped into
} Maximum Concurrent Jobs.

What does that mean?
Is it saying that multiple console connections are a work in progress?


Cheers
Geert Stappers

Footnote:
[1] Terminal multiplexer, http://en.wikipedia.org/wiki/GNU_Screen