Re: [Bacula-users] Bacula via NATed connection and Bacula docs - partly solved. Launchd!
The more I look into it, the weirder it gets.

Gavin McCullagh wrote:
> On Wed, 27 Jan 2010, Dirk H. Schulz wrote:
>
>> Telnetting from external-fd to server-sd using the above-mentioned FQDN and the port of the storage daemon (telnet storage.server.sd 9103) outputs exactly the same as telnetting internally to that port. AFAIK, that means bacula-fd on the external client should be able to connect to bacula-sd on the internal server. But it does not. When I run a backup job for this client, the director spends a long time "waiting for Client ... to connect to Storage ..." and eventually gives up.
>
> In this instance, I would be inclined to start a tcpdump like the one below on both the -fd and the -sd, start your backup, and see where exactly the -fd tries to connect.
>
>     tcpdump -ni ethX tcp port 9103
>
> The first question, I suppose, is what IP address the -fd is actually using to connect. The second is whether the TCP handshake happens correctly and, if so, what happens then. Perhaps the -fd is connecting to the wrong IP, or it could be a firewall issue, or something else...?

First: I ran the test with all firewalls along the path shut down (except the one doing NAT) to rule out any issues there. Then I ran a similar test with a different client-fd in the same public subnet, and it worked. I have thoroughly compared the configuration of these two clients (both bacula-fd.conf and bacula-dir.conf). Still nothing works.
And here is what tcpdump and bacula-dir output:

external-fd:~ root# tcpdump -ni en1 portrange 9101-9103
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on en1, link-type EN10MB (Ethernet), capture size 96 bytes
08:01:39.346580 IP 1.2.3.4.32930 > 40.50.60.70.9102: S 1415949915:1415949915(0) win 5840 <mss 1452,sackOK,timestamp 939681199 0,nop,wscale 7>
08:01:39.346647 IP 40.50.60.70.9102 > 1.2.3.4.32930: S 221080258:221080258(0) ack 1415949916 win 65535 <mss 1460,nop,wscale 3,nop,nop,timestamp 213499348 939681199,sackOK,eol>
08:01:39.387055 IP 1.2.3.4.32930 > 40.50.60.70.9102: . ack 1 win 46 <nop,nop,timestamp 939681241 213499348>
08:01:39.387073 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 1 win 65535 <nop,nop,timestamp 213499348 939681241>
08:01:39.391051 IP 1.2.3.4.32930 > 40.50.60.70.9102: P 1:51(50) ack 1 win 46 <nop,nop,timestamp 939681244 213499348>
08:01:39.391065 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 51 win 65535 <nop,nop,timestamp 213499348 939681244>
08:06:22.221818 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 51 win 0
08:06:22.221853 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 51 win 65535 <nop,nop,timestamp 213502176 939681244>
08:06:22.262232 IP 1.2.3.4.32930 > 40.50.60.70.9102: . ack 1 win 46 <nop,nop,timestamp 939964161 213499348>
08:11:07.236737 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 51 win 0
08:11:07.236780 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 51 win 65535 <nop,nop,timestamp 213505026 939964161>
08:11:07.279418 IP 1.2.3.4.32930 > 40.50.60.70.9102: . ack 1 win 46 <nop,nop,timestamp 940249226 213502176>
08:11:44.501513 IP 1.2.3.4.32930 > 40.50.60.70.9102: F 51:51(0) ack 1 win 46 <nop,nop,timestamp 940286454 213505026>
08:11:44.501542 IP 40.50.60.70.9102 > 1.2.3.4.32930: . ack 52 win 65535 <nop,nop,timestamp 213505399 940286454>

All the while bacula-dir claims to be "waiting for Client external-fd to connect to Storage LTO2", there is not one attempt at connecting to the SD from this client!
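The point made about the capture above (only port 9102 traffic, nothing to the SD on 9103) can be checked mechanically on a saved text capture. The following is a minimal sketch, assuming the classic one-packet-per-line tcpdump layout "TIME IP SRC.PORT > DST.PORT: ..."; the function name count_dst_ports and the two-line sample are illustrative only and not part of Bacula or tcpdump.

```shell
#!/bin/sh
# Count destination ports in textual tcpdump output (one packet per line).
# Assumed layout: TIME IP SRC.PORT > DST.PORT: FLAGS ...
# count_dst_ports is a hypothetical helper name, not a tcpdump feature.
count_dst_ports() {
    awk '$2 == "IP" && $4 == ">" {
        dst = $5
        sub(/:$/, "", dst)          # drop the trailing colon
        n = split(dst, part, ".")   # the port is the last dot-separated piece
        count[part[n]]++
    }
    END {
        for (port in count) print port, count[port]
    }' "$@" | sort -n
}

# Sample input: the two SYN lines from the capture in this thread
# (TCP options left out for brevity).
sample='08:01:39.346580 IP 1.2.3.4.32930 > 40.50.60.70.9102: S 1415949915:1415949915(0) win 5840
08:01:39.346647 IP 40.50.60.70.9102 > 1.2.3.4.32930: S 221080258:221080258(0) ack 1415949916 win 65535'

# Prints one "PORT COUNT" pair per line, sorted numerically by port.
printf '%s\n' "$sample" | count_dst_ports
```

If 9103 never shows up in the first column while a job is "waiting for Client ... to connect to Storage ...", the FD has not even tried to reach the SD, which is what the capture above suggests.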
And in the end the error message from bacula-dir is something different:

8-Jan 08:11 bacula-dir JobId 33: Fatal error: Unable to authenticate with File daemon at external-fd.domain.de:9102. Possible causes:
Passwords or names not the same or
Maximum Concurrent Jobs exceeded on the FD or
FD networking messed up (restart daemon).
Please see http://www.bacula.org/en/rel-manual/Bacula_Freque_Asked_Questi.html#SECTION00376 for help.
28-Jan 08:11 bacula-dir JobId 33: Fatal error: Network error with FD during Backup: ERR=Unterbrechung während des Betriebssystemaufrufs
28-Jan 08:11 bacula-dir JobId 33: Fatal error: No Job status returned from FD.
28-Jan 08:11 bacula-dir JobId 33: Error: Bacula bacula-dir 3.0.3 (18Oct09): 28-J

(The German ERR text means "Interrupted system call".)

I have even tried without any passwords, and I have copied and pasted the client name everywhere to make sure there is no typo.

And then - out of pure desperation - I started bacula-fd manually instead of via launchd (with the same parameters launchd is given) - and now it works! Somehow communication does not work correctly if bacula-fd is started via launchd (/sbin/bacula-fd -f -c /etc/bacula/bacula-fd.conf).

Has anyone seen that before? Is there a workaround?

The client is Mac OS X Client 10.5.5 on Intel (uname -a outputs "Darwin external-fd.domain.de 9.5.0 Darwin Kernel Version 9.5.0: Wed Sep 3 11:29:43 PDT 2008; root:xnu-1228.7.58~1/RELEASE_I386 i386").

Any help or hint would be greatly appreciated!

Dirk
Re: [Bacula-users] Bacula via NATed connection and Bacula docs - partly solved. Launchd!
Dirk H. Schulz wrote:
> All the while bacula-dir claims to be "waiting for Client external-fd to connect to Storage LTO2", there is not one attempt at connecting to the SD from this client!

If this is true, then something is wrong.
Either you're looking in the wrong place for the traffic, or the traffic is going somewhere else.

Consider the possibility that what you are expecting does not match what the Bacula components have been told to do. Verify and double-check all data related to that FD and SD. Go through the .conf files and see what hostnames are being used. Verify on both the SD and the FD that those hostnames resolve to the correct IP addresses. Verify that each can talk to the other (telnet IPaddress PORT).

___
Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
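The checklist above (collect the hostnames from the .conf files, then test each host and port) could be scripted roughly as follows. This is a sketch under assumptions: list_conf_addresses and check_endpoint are hypothetical helper names, the sed expression is a line-based match for "Address = ..." rather than a real Bacula config parser, and nc is assumed to support -z and -w as the BSD and macOS versions do.

```shell
#!/bin/sh
# List every "Address = ..." value found in Bacula configuration text.
# Line-based sketch only: it ignores resource boundaries and does not
# match prefixed directives such as FDAddress or SDAddress.
list_conf_addresses() {
    sed -n 's/^[[:space:]]*[Aa]ddress[[:space:]]*=[[:space:]]*"\{0,1\}\([^";#[:space:]]*\).*/\1/p' "$@" | sort -u
}

# Report whether a TCP connection to host:port can be opened.
# nc -z only tests the connection; -w 5 gives up after five seconds.
check_endpoint() {
    host=$1
    port=$2
    if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
        echo "$host:$port reachable"
    else
        echo "$host:$port NOT reachable"
    fi
}

# Example use (the bacula-dir.conf path is an assumption; run it on the
# FD host as well, since the FD has to reach the SD address it is given):
#   for h in $(list_conf_addresses /etc/bacula/bacula-dir.conf); do
#       check_endpoint "$h" 9103
#   done
```

Connecting by name also exercises name resolution on that machine, but it does not show which IP address was chosen; a capture on the FD remains the way to confirm that.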
Re: [Bacula-users] Bacula via NATed connection and Bacula docs - partly solved. Launchd!
Dirk,

On Thu, Jan 28, 2010 at 4:35 AM, Dirk H. Schulz <dirk.sch...@kinzesberg.de> wrote:
> And then - just from pure desperation - I started bacula-fd manually instead of via launchd (with the same parameters launchd is given) - and now it works! Somehow communication does not work correctly if bacula-fd is started via launchd (/sbin/bacula-fd -f -c /etc/bacula/bacula-fd.conf).
>
> Anyone seen that before? Any workaround for that?
>
> It is MacOS X Client 10.5.5 Intel (uname -a outputs Darwin external-fd.domain.de 9.5.0 Darwin Kernel Version 9.5.0: Wed Sep 3 11:29:43 PDT 2008; root:xnu-1228.7.58~1/RELEASE_I386 i386).
>
> Okay, after cooling down again I started searching. In the original bacula plist file for launchd from bacula sources there is this entry:
>
>     <key>Sockets</key>
>     <dict>
>         <key>Listeners</key>
>         <array>
>             <dict>
>                 <key>SockServiceName</key>
>                 <string>bacula-fd</string>
>             </dict>
>         </array>
>     </dict>
>
> That tells launchd to realize an on demand run like inetd does in historical unixes. I did not find anything in Apple's documentation on why this should prevent communication from bacula-fd outside to bacula-sd, but it does. Without this entry (and with added KeepAlive and RunAtLoad entries) it works fine.

I've been using a simpler plist file for bacula backups of a couple of OS X systems for a few years, without any issue. The plist file is simple:

sh-3.2# more /Library/LaunchDaemons/bacula-fd.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>org.bacula.bacula-fd</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/sbin/bacula-fd</string>
        <string>-f</string>
        <string>-c</string>
        <string>/usr/local/etc/bacula-fd.conf</string>
    </array>
    <key>RunAtLoad</key>
    <true/>
    <key>UserName</key>
    <string>root</string>
</dict>
</plist>

The bacula-fd stays running all the time, but that seems to be the standard setup, rather than having it started when needed via the launchd equivalent of xinetd.
chris
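For anyone who wants to reproduce the always-running setup described in this message, here is a minimal sketch that writes such a plist from a shell function. It combines the file quoted above with the KeepAlive entry Dirk mentioned adding; write_fd_plist is a hypothetical helper name, and the paths and label are taken from the quoted file and may differ on other installations.

```shell
#!/bin/sh
# Write a launchd job definition that keeps bacula-fd running permanently
# instead of starting it on demand. write_fd_plist is an illustrative name.
write_fd_plist() {
    target=$1
    cat > "$target" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
    <key>Label</key>
    <string>org.bacula.bacula-fd</string>
    <key>ProgramArguments</key>
    <array>
        <string>/usr/local/sbin/bacula-fd</string>
        <string>-f</string>
        <string>-c</string>
        <string>/usr/local/etc/bacula-fd.conf</string>
    </array>
    <!-- Start at load and restart on exit; launchd does not listen on the daemon's behalf. -->
    <key>RunAtLoad</key>
    <true/>
    <key>KeepAlive</key>
    <true/>
    <key>UserName</key>
    <string>root</string>
</dict>
</plist>
EOF
}
```

On the Mac itself, the file would typically be written to /Library/LaunchDaemons/bacula-fd.plist as root, checked with "plutil -lint", and activated with "launchctl load -w"; unload any previously loaded on-demand job first so the two definitions do not compete for port 9102.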
Re: [Bacula-users] Bacula via NATed connection and Bacula docs - partly solved. Launchd!
Chris Shelton wrote:
> The bacula-fd stays running all the time, but that seems to be the standard setup, rather than having it started when needed via the launchd equivalent of xinetd.
That is what I thought, and it is what I have configured now, too. But I wonder why the developers ship a configuration that cannot work without correction. I am just looking for the point of view from which this makes sense, in case there is something to learn. :-)

Thank you,
Dirk
Re: [Bacula-users] Bacula via NATed connection and Bacula docs
On Wed, 27 Jan 2010, Dirk H. Schulz wrote:
> Telnetting from external-fd to server-sd using the above-mentioned FQDN and the port of the storage daemon (telnet storage.server.sd 9103) outputs exactly the same as telnetting internally to that port. AFAIK, that means bacula-fd on the external client should be able to connect to bacula-sd on the internal server. But it does not. When I run a backup job for this client, the director spends a long time "waiting for Client ... to connect to Storage ..." and eventually gives up.

In this instance, I would be inclined to start a tcpdump like the one below on both the -fd and the -sd, start your backup, and see where exactly the -fd tries to connect.

    tcpdump -ni ethX tcp port 9103

The first question, I suppose, is what IP address the -fd is actually using to connect. The second is whether the TCP handshake happens correctly and, if so, what happens then. Perhaps the -fd is connecting to the wrong IP, or it could be a firewall issue, or something else...?

Gavin