Re: [Bacula-users] bacula-dir 3.0.3 dies on second job run or manual reload.
On Monday 14 December 2009 09:27:40 Bruno Friedmann wrote:
> What I suspect is: there's no bsmtp timeout. If it cannot connect, it returns; but if it connects and nothing goes right in postfix (the throttling case), it waits indefinitely, and so does the director.
>
> In the meantime, please check on your side whether you see trouble with bsmtp, to confirm or refute this.

True, I've verified this too. bsmtp goes zombie and bacula-dir waits on it. The solution is to use another application for sending email, or to drop email notifications altogether.

I wonder whether this is a candidate for a bug report?

Thanks,
JS

Bacula-users mailing list
Bacula-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/bacula-users
Re: [Bacula-users] bacula-dir 3.0.3 dies on second job run or manual reload.
On 12/18/2009 09:20 AM, Janusz Syrytczyk wrote:
> True, I've verified this too. bsmtp goes zombie and bacula-dir waits on it. The solution is to use another application for sending email, or to drop email notifications altogether.
>
> I wonder whether this is a candidate for a bug report?

I think you could file a bug report against it (post the number here so I can attach myself to it). Either the director or bsmtp needs a timeout somewhere to escape this kind of trap.

--
Bruno Friedmann
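To illustrate the kind of timeout meant here, a minimal sketch in Python of a wrapper that runs a mail program and gives up if it does not finish in time. This is not how bacula-dir or bsmtp is implemented. The function name `run_with_timeout`, the 30-second default, and the assumption that the mail program reads the message on standard input are all illustrative.

```python
import subprocess
from typing import Sequence


def run_with_timeout(cmd: Sequence[str], message: bytes, timeout: float = 30.0) -> bool:
    """Run a mail command with `message` on stdin; give up after `timeout` seconds.

    Hypothetical helper: returns True only if the command exits with status 0
    before the deadline.
    """
    try:
        result = subprocess.run(
            list(cmd),
            input=message,
            timeout=timeout,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child and waits for it before raising,
        # so a hung mailer does not linger as a zombie.
        return False
    except OSError:
        # The command could not be started at all (e.g. not found).
        return False
    return result.returncode == 0
```

The caller gets a plain True/False back and carries on either way, which is the behaviour one would want from whatever launches the mailer: a stuck mail server costs one notification, not the whole job queue.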
Re: [Bacula-users] bacula-dir 3.0.3 dies on second job run or manual reload.
On 12/09/2009 12:11 AM, Janusz Syrytczyk wrote:
> Just after it starts, Director is able to perform any request I make (backup, restore, reload etc.). But once the first task is done, Director stops listening to me: the second job does not start when requested.
> [...]
> What could this be, anyone?

I don't know if it's your case, but we have the same trouble here: the director hangs after running the first job. I've restarted it with -d100 just to check what happens.

In the meantime, on the Bacula server (which has been upgraded from openSUSE 11.1 to 11.2) I found that postfix was throttling: the relay.db file was missing in /etc/postfix (fix: run postmap relay and restart postfix). After that, all emails are working.

Since the bsmtp calls in the messages section of my director config connect to the internal postfix, bsmtp was hanging! And perhaps bacula-dir too. Three scheduled jobs have now run, and bacula-dir has done its job.

What I suspect is: there's no bsmtp timeout. If it cannot connect, it returns; but if it connects and nothing goes right in postfix (the throttling case), it waits indefinitely, and so does the director.

I will leave this configuration running for 2 to 3 days just to be sure that was it. In the meantime, please check on your side whether you see trouble with bsmtp, to confirm or refute this.

--
Bruno Friedmann
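For anyone who wants to check their own mail server for this condition, a rough probe in Python follows. It assumes the hang looks like the case described above: the TCP connection is accepted but no SMTP greeting ever arrives. The function name `smtp_banner_within`, the port and the timeout values are illustrative and not part of Bacula or postfix.

```python
import socket


def smtp_banner_within(host: str, port: int = 25, timeout: float = 5.0) -> bool:
    """Return True if the server sends a 220 greeting within `timeout` seconds.

    Hypothetical helper: a False result means the connection failed or the
    server stayed silent after accepting it.
    """
    try:
        # The timeout given here applies to the connect and to later reads.
        with socket.create_connection((host, port), timeout=timeout) as conn:
            banner = conn.recv(512)
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False
    return banner.startswith(b"220")
```

Calling `smtp_banner_within("localhost")` on the Bacula server before and after fixing postfix should show the difference, if the assumption about the missing greeting holds.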
[Bacula-users] bacula-dir 3.0.3 dies on second job run or manual reload.
Hi,

I upgraded from 3.0.2 to 3.0.3 a while ago and I'm facing serious problems with bacula-dir stability. Just after it starts, Director is able to perform any request I make (backup, restore, reload etc.). But once the first task is done, Director stops listening to me: the second job does not start when requested. Then bconsole stops responding and I have to exit with Ctrl+C; when I start bconsole again and type status dir, it reports that the backup is running. The problem is that the backup is not running. Director stays almost completely silent about it. When I try to reload through bconsole, Director goes zombie-like: I cannot connect. Debugging gives only this:

atom-dir: bnet.c:670-0 who=client host=192.168.1.150 port=36131

Interestingly, when I leave the Director alone it works OK: it schedules backups and performs them. I had previously suspected that something was wrong with the scheduler, because before this troubleshooting I couldn't even get the Director to schedule jobs, but for the last few days it has worked.

This is the same issue as the one reported here, but he hasn't found a clue:
http://www.mail-archive.com/bacula-users@lists.sourceforge.net/msg38279.html

I've just moved the backups and database, recompiled Bacula, recreated the database and started backups, but the same story repeats. What could this be, anyone?