Re: [bareos-users] Re: Backup of clients in different network with long run before scripts terminate with connection reset
On Wednesday, 19 July 2017 11:15:25 UTC+2, Cristian Mammoli wrote:
> Of course it is C:/Test.bat, not C:\Test.bat
>
> Run Script {
>   Command = "C:/Test.bat"
>   Runs When = after
>   Fail Job On Error = No
> }

OK, the test.bat script ran without a hitch for 2 weeks. Now I'll try to put my command inside the .bat script.

-- You received this message because you are subscribed to the Google Groups "bareos-users" group. To unsubscribe from this group and stop receiving emails from it, send an email to bareos-users+unsubscr...@googlegroups.com. To post to this group, send email to bareos-users@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: [bareos-users] Re: Backup of clients in different network with long run before scripts terminate with connection reset
Of course it is C:/Test.bat, not C:\Test.bat

Run Script {
  Command = "C:/Test.bat"
  Runs When = after
  Fail Job On Error = No
}

On 19/07/2017 10:33, Bruno Friedmann wrote:
> On Tuesday, 18 July 2017 15:34:13 CEST, Cristian Mammoli wrote:
>> It works *most of the time* the way it is. It's not easy to reproduce.
>> I can test with a simple batch file, but I don't think an escaping problem can cause a "connection reset" once every 10 backups (just saying) :-)
>
> Yeah, right, the root cause is still really annoying (especially how randomly it reproduces).
>
> Just to exclude another possible cause: I guess you're using fixed IPs, not DHCP, where the problem occurs? Then we can rule out a DHCP lease renewal as the trigger.
>
> I also saw this kind of trouble last weekend with one 15.2.4 Windows 2012 client (Hyper-V guest):
>
> 15-Jul 20:00 europe-fd JobId 10487: Generate VSS snapshots. Driver="Win64 VSS", Drive(s)="CD"
> 15-Jul 20:00 europe-fd JobId 10487: VolumeMountpoints are not processed as onefs = yes.
> 15-Jul 20:00 europe-fd JobId 10487: VolumeMountpoints are not processed as onefs = yes.
> 15-Jul 21:57 oceania-sd JobId 10487: User specified Job spool size reached: JobSpoolSize=68,719,477,111 MaxJobSpoolSize=68,719,476,736
> 15-Jul 21:57 oceania-sd JobId 10487: Writing spooled data to Volume. Despooling 68,719,477,111 bytes ...
> 15-Jul 21:59 oceania-dir JobId 10487: Fatal error: Network error with FD during Backup: ERR=Connection timed out
> 15-Jul 21:59 oceania-dir JobId 10487: Error: Director's comm line to SD dropped.
> 15-Jul 21:59 oceania-dir JobId 10487: Fatal error: No Job status returned from FD.
> 15-Jul 21:59 oceania-dir JobId 10487: Error: Bareos oceania-dir 15.2.4 (09Jun16):
>
> No idea why the despooling doesn't take place, or where the network error comes from. I suspect network-related trouble, as the Ethernet error counters are increasing more than expected (which could also point to a switch failure). The ports have to be monitored to check whether this is the case.
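As a side note on the quoted oceania-sd log: the two spool figures are easier to judge once converted. The sketch below only redoes the arithmetic on the numbers printed in that log line; the function name `spool_summary` is made up for illustration and is not part of Bareos.

```python
# Values copied from the quoted oceania-sd log line (15-Jul 21:57).
job_spool_size = 68_719_477_111      # JobSpoolSize
max_job_spool_size = 68_719_476_736  # MaxJobSpoolSize

GIB = 2 ** 30  # bytes per GiB


def spool_summary(used: int, limit: int) -> dict:
    """Return the limit in GiB and by how many bytes the spool exceeded it."""
    return {
        "limit_gib": limit / GIB,
        "overshoot_bytes": used - limit,
    }


summary = spool_summary(job_spool_size, max_job_spool_size)
print(summary)
```

The limit works out to exactly 64 GiB and the spool went past it by only a few hundred bytes, which is consistent with the spool simply filling up and despooling starting, rather than with a misconfigured size. That reading is an interpretation of the log, not something the thread confirms.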
Re: [bareos-users] Re: Backup of clients in different network with long run before scripts terminate with connection reset
On 19/07/2017 10:33, Bruno Friedmann wrote:
> Yeah, right, the root cause is still really annoying (especially how randomly it reproduces).
>
> Just to exclude another possible cause: I guess you're using fixed IPs, not DHCP, where the problem occurs? Then we can rule out a DHCP lease renewal as the trigger.

I configured a run-after job this way:

Run Script {
  Command = "C:\Test.bat"
  Runs When = after
  Fail Job On Error = No
}

test.bat contains only "echo hello world". Let's see how it goes; at the moment this is the only run-after job left on the problematic servers.

> I also saw this kind of trouble last weekend with one 15.2.4 Windows 2012 client (Hyper-V guest)
>
> No idea why the despooling doesn't take place, or where the network error comes from. I suspect network-related trouble, as the Ethernet error counters are increasing more than expected (which could also point to a switch failure). The ports have to be monitored to check whether this is the case.

Sadly, my environment is a remote "virtual datacenter" running on vSphere, and I don't have access to the underlying network. The Linux VMs are running fine, though, and I have already tried to rule out all the possible causes:

* Windows Firewall
* KeepAlive settings
* VMware Tools (and NIC driver) versions

I should try replacing the vmxnet3 vNIC with e1000, but I already know that E1000 is even more unstable on Windows 2012 and later.
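The thread writes Windows paths in two ways inside quoted Command strings: with forward slashes (C:/Test.bat) and with doubled backslashes (C:\\Program Files\\...). A small helper can produce either form from a plain Windows path. This is only a sketch: it assumes that a single backslash inside a quoted config string is treated as an escape character, which is what the examples in this thread suggest but which is not verified here. The function names are hypothetical.

```python
from pathlib import PureWindowsPath


def forward_slash_path(windows_path: str) -> str:
    """Return the path with forward slashes, e.g. C:/Test.bat."""
    return PureWindowsPath(windows_path).as_posix()


def doubled_backslash_path(windows_path: str) -> str:
    """Return the path with every backslash doubled."""
    return windows_path.replace("\\", "\\\\")


# Raw strings keep the single backslashes exactly as typed on Windows.
print(forward_slash_path(r"C:\Test.bat"))
print(doubled_backslash_path(r"C:\Program Files\MySQL"))
```

The forward-slash form is the one used for the test script in this thread, so it is probably the less error-prone of the two.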
Re: [bareos-users] Re: Backup of clients in different network with long run before scripts terminate with connection reset
It works *most of the time* the way it is, and the failure is not easy to reproduce. I can test with a simple batch file, but I don't think an escaping problem can cause a "connection reset" once every 10 backups (just saying) :-)

On 18/07/2017 14:37, Bruno Friedmann wrote:
> On Tuesday, 18 July 2017 09:18:28 CEST, Cristian Mammoli wrote:
>> That was just an example script; for the record, it does work (I create a system state backup before the backup and discard it afterwards). Anyway, I get "connection reset by peer" even with a simple script such as:
>>
>> Run Script {
>>   Command = "del /Q \"C:\\Program Files\\MySQL\\MySQL Server 5.7\\Backup\\MySQL.bak\""
>>   Runs When = after
>>   Fail Job On Error = No
>> }
>
> Considering the documentation (especially the Windows considerations)
> http://doc.bareos.org/master/html/bareos-manual-main-reference.html#directiveDirJobRun%20Script
> and, from experience, how fiddly it is to escape and pass a command line correctly: what happens if you wrap this in a simple .bat file?
>
> Command = "c:/testme.bat"
>
> Does it work?
Re: [bareos-users] Re: Backup of clients in different network with long run before scripts terminate with connection reset
That was just an example script; for the record, it does work (I create a system state backup before the backup and discard it afterwards). Anyway, I get "connection reset by peer" even with a simple script such as:

Run Script {
  Command = "del /Q \"C:\\Program Files\\MySQL\\MySQL Server 5.7\\Backup\\MySQL.bak\""
  Runs When = after
  Fail Job On Error = No
}

On 17/07/2017 18:18, Bruno Friedmann wrote:
> It seems you are asking Windows to delete the running VSS state. I don't know if this can work?
>
> On July 17, 2017 9:50:02 AM GMT+02:00, Cristian Mammoli wrote:
>> I can confirm that the issue is with Client Run After Job scripts such as:
>>
>> Run Script {
>>   Command = "wbadmin delete systemstatebackup -keepversions:0 -quiet"
>>   Runs When = before
>>   Fail Job On Error = No
>> }
Re: [bareos-users] Re: Backup of clients in different network with long run before scripts terminate with connection reset
> 05-Jul 21:48 srvbkp-dir JobId 54311: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
> 05-Jul 21:48 srvbkp-dir JobId 54311: Error: Bareos srvbkp-dir 16.2.4 (01Jul16):

I can confirm that the issue is with Client Run After Job scripts such as:

Run Script {
  Command = "wbadmin delete systemstatebackup -keepversions:0 -quiet"
  Runs When = before
  Fail Job On Error = No
}

I commented out all scripts like this and have had no issues so far. Obviously this is not a solution...

I'm pretty sure this has nothing to do with routers, since I noticed it happens even within the same network.

So, to recap, these are the conditions:

* It doesn't happen with Linux clients, only with Windows (2008R2 to 2012R2 tested)
* Turning the Windows firewall on or off doesn't matter
* Heartbeat Interval does not help
* It happens with "normal" mode, passive clients, and client-initiated connections
* It happens even if server and client are in the same network
* It only happens if there is a "Client Run After Job" script
* I tried updating VMware Tools and NIC drivers
Re: [bareos-users] Re: Backup of clients in different network with long run before scripts terminate with connection reset
I checked the logs, and the connection is not reset *before* the client starts sending data to the SD but *after* the run-after-job script! The backup actually "succeeds":

05-Jul 21:48 css-srvdc02-fd JobId 54311: ClientAfterJob: Deleting system state backup version 07/05/2017-18:42 (1 out of 1)...
05-Jul 21:48 srvbkp-sd JobId 54311: Sending spooled attrs to the Director. Despooling 63,976 bytes ...
05-Jul 21:48 css-srvdc02-fd JobId 54311: ClientAfterJob: The operation to delete system state backups completed,
05-Jul 21:48 css-srvdc02-fd JobId 54311: ClientAfterJob: 1 backups were deleted.
05-Jul 21:48 srvbkp-dir JobId 54311: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
05-Jul 21:48 srvbkp-dir JobId 54311: Error: Bareos srvbkp-dir 16.2.4 (01Jul16):
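Checking by eye whether the reset comes before or after the run-after-job output gets tedious over many failed jobs, so the check can be scripted. The sketch below is a hypothetical helper, not a Bareos tool: it assumes the job log has the line layout shown in this message (date, time, daemon name, JobId, message) and simply reports whether any ClientAfterJob output preceded the "Connection reset by peer" line.

```python
import re

# Sample copied from the job log in this message.
LOG = """\
05-Jul 21:48 css-srvdc02-fd JobId 54311: ClientAfterJob: Deleting system state backup version 07/05/2017-18:42 (1 out of 1)...
05-Jul 21:48 srvbkp-sd JobId 54311: Sending spooled attrs to the Director. Despooling 63,976 bytes ...
05-Jul 21:48 css-srvdc02-fd JobId 54311: ClientAfterJob: The operation to delete system state backups completed,
05-Jul 21:48 css-srvdc02-fd JobId 54311: ClientAfterJob: 1 backups were deleted.
05-Jul 21:48 srvbkp-dir JobId 54311: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
"""

# Assumed layout: "<DD-Mon HH:MM> <daemon> JobId <n>: <message>"
LINE_RE = re.compile(r"^(\d{2}-\w{3} \d{2}:\d{2}) (\S+) JobId (\d+): (.*)$")


def phase_before_reset(log_text: str) -> str:
    """Say whether ClientAfterJob output was seen before the connection reset."""
    seen_after_job = False
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # skip lines that do not follow the assumed layout
        message = match.group(4)
        if message.startswith("ClientAfterJob:"):
            seen_after_job = True
        if "Connection reset by peer" in message:
            return "after ClientAfterJob" if seen_after_job else "before ClientAfterJob"
    return "no reset found"


print(phase_before_reset(LOG))
```

Run over the logs of several failed jobs, this would show quickly whether the reset always follows the run-after-job script, as it does in the job above.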
Re: [bareos-users] Re: Backup of clients in different network with long run before scripts terminate with connection reset
On Thursday, 6 July 2017 13:05:02 UTC+2, Bruno Friedmann wrote:
> Did you try setting the Heartbeat Interval option in the dir, SD and client? You are certainly facing a firewall or router somewhere that drops what it considers an empty, dead connection.

Yes, I already added:

Heartbeat Interval = 60

in the director, client and sd.conf. I even manually configured

[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"KeepAliveInterval"=dword:03e8
"KeepAliveTime"=dword:ea60

in the Windows registry.

Actually, I am running a backup while sniffing the traffic with Wireshark, and the keepalives seem to be exchanged:

tcp        0      0 10.254.99.100:34562     10.254.96.1:9102        ESTABLISHED keepalive (54.00/0/0)
tcp        0      0 10.254.99.100:34560     10.254.96.1:9102        ESTABLISHED keepalive (43.76/0/0)

453 1020.122463 10.254.96.1 10.254.99.100 TCP 55 [TCP Keep-Alive] 9102 → 34562 [ACK] Seq=106 Ack=185 Win=131584 Len=1
454 1020.123393 10.254.99.100 10.254.96.1 TCP 78 [TCP Keep-Alive ACK] 34562 → 9102 [ACK] Seq=185 Ack=107 Win=29312 Len=0 TSval=186109820 TSecr=94118 SLE=106 SRE=107

But every now and then the backup fails. The connection is always reset at the end of the "run before job" script, when the client should start sending data to the SD (SD and dir run on the same server).
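For readers unsure what the two registry values amount to: the dword entries are hexadecimal, and Windows interprets both KeepAliveTime and KeepAliveInterval in milliseconds. The snippet below just decodes the values from this message; `dword_to_int` is an illustrative helper name, not an existing API.

```python
def dword_to_int(reg_value: str) -> int:
    """Parse a .reg-style 'dword:XXXX' string (hexadecimal) into an integer."""
    prefix = "dword:"
    if not reg_value.lower().startswith(prefix):
        raise ValueError(f"not a dword value: {reg_value!r}")
    return int(reg_value[len(prefix):], 16)


# Values as written in the registry snippet above.
settings = {
    "KeepAliveInterval": "dword:03e8",
    "KeepAliveTime": "dword:ea60",
}

for name, raw in settings.items():
    ms = dword_to_int(raw)
    print(f"{name}: {ms} ms ({ms / 1000:g} s)")
```

So the first keepalive probe goes out after 60 seconds of idle time, with 1 second between retransmissions, which lines up with the Heartbeat Interval = 60 setting on the Bareos side.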
[bareos-users] Re: Backup of clients in different network with long run before scripts terminate with connection reset
On Thursday, 26 January 2017 10:03:58 UTC+1, Cristian Mammoli wrote:
> I forgot to save the file before attaching

Anyone???
[bareos-users] Re: Backup of clients in different network with long run before scripts terminate with connection reset
I forgot to save the file before attaching.

25-Jan 20:40 srvbkp-dir JobId 24217: Start Backup JobId 24217, Job=css-srvdc02-backup.2017-01-25_20.30.01_53
25-Jan 20:40 srvbkp-dir JobId 24217: Using Device "File" to write.
25-Jan 20:40 css-srvdc02-fd JobId 24217: Created 32 wildcard excludes from FilesNotToBackup Registry key
25-Jan 20:40 css-srvdc02-fd JobId 24217: shell command: run ClientBeforeJob "REG ADD HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\wbengine\SystemStateBackup\ /v AllowSSBToAnyVolume /t REG_DWORD /d 1 /F"
25-Jan 20:40 css-srvdc02-fd JobId 24217: ClientBeforeJob: The operation completed successfully.
25-Jan 20:40 css-srvdc02-fd JobId 24217: ClientBeforeJob:
25-Jan 20:40 css-srvdc02-fd JobId 24217: shell command: run ClientBeforeJob "wbadmin start systemstatebackup -backuptarget:C: -quiet"
25-Jan 20:40 css-srvdc02-fd JobId 24217: ClientBeforeJob: wbadmin 1.0 - Backup command-line tool
25-Jan 20:40 css-srvdc02-fd JobId 24217: ClientBeforeJob: (C) Copyright 2013 Microsoft Corporation. All rights reserved.
25-Jan 20:40 css-srvdc02-fd JobId 24217: ClientBeforeJob:
25-Jan 20:40 css-srvdc02-fd JobId 24217: ClientBeforeJob: Starting to back up the system state [25/01/2017 20:40]...
25-Jan 20:40 css-srvdc02-fd JobId 24217: ClientBeforeJob: Retrieving volume information...
25-Jan 20:40 css-srvdc02-fd JobId 24217: ClientBeforeJob: This will back up the system state from volume(s) System Reserved (350.00 MB),(C:) to C:.
25-Jan 20:40 css-srvdc02-fd JobId 24217: ClientBeforeJob: Creating a shadow copy of the volumes specified for backup...
25-Jan 20:40 css-srvdc02-fd JobId 24217: ClientBeforeJob: Creating a shadow copy of the volumes specified for backup...
25-Jan 20:41 css-srvdc02-fd JobId 24217: ClientBeforeJob: Please wait while system state files to back up are identified.
25-Jan 20:41 css-srvdc02-fd JobId 24217: ClientBeforeJob: This might take several minutes...
25-Jan 20:41 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (800) files.
25-Jan 20:41 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (6302) files.
25-Jan 20:41 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (20986) files.
25-Jan 20:41 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (30911) files.
25-Jan 20:41 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (34322) files.
25-Jan 20:42 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (46552) files.
25-Jan 20:42 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (54284) files.
25-Jan 20:42 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (58286) files.
25-Jan 20:42 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (60655) files.
25-Jan 20:42 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (62403) files.
25-Jan 20:42 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (70164) files.
25-Jan 20:43 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (76541) files.
25-Jan 20:43 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (84063) files.
25-Jan 20:43 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (90136) files.
25-Jan 20:43 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (95040) files.
25-Jan 20:43 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (103972) files.
25-Jan 20:43 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (119617) files.
25-Jan 20:44 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (132625) files.
25-Jan 20:44 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (138834) files.
25-Jan 20:44 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (142878) files.
25-Jan 20:44 css-srvdc02-fd JobId 24217: ClientBeforeJob: Found (142878) files.
25-Jan 20:44 css-srvdc02-fd JobId 24217: ClientBeforeJob: The search for system state files is complete.
25-Jan 20:44 css-srvdc02-fd JobId 24217: ClientBeforeJob: Starting to back up files...
25-Jan 20:44 css-srvdc02-fd JobId 24217: ClientBeforeJob: The backup of files reported by 'Task Scheduler Writer' is complete.
25-Jan 20:44 css-srvdc02-fd JobId 24217: ClientBeforeJob: The backup of files reported by 'VSS Metadata Store Writer' is complete.
25-Jan 20:44 css-srvdc02-fd JobId 24217: ClientBeforeJob: The backup of files reported by 'Performance Counters Writer' is complete.
25-Jan 20:44 css-srvdc02-fd JobId 24217: ClientBeforeJob: Overall progress: 0%.
25-Jan 20:44 css-srvdc02-fd JobId 24217: ClientBeforeJob: Currently backing up files reported by 'Registry Writer'...
25-Jan 20:44 css-srvdc02-fd JobId 24217: ClientBeforeJob: The backup of files reported by 'Registry Writer' is complete.
25-Jan 20:44 css-srvdc02-fd JobId 24217: ClientBeforeJob: Overall progress: 2%.
25-Jan 20:44 css-srvdc02-fd JobId 24217: ClientBeforeJob: Cur
[bareos-users] Backup of clients in different network with long run before scripts terminate with connection reset
Hi,

we are having issues backing up a couple of Windows servers behind a firewall. There are lots of servers in the same network, but these 2 are the only ones with long (20-30 minute) run-before scripts.

Every now and then (roughly 1 in every 5 backups) the backup ends with "connection reset by peer":

25-Jan 21:19 srvbkp-sd JobId 24217: Sending spooled attrs to the Director. Despooling 79,898 bytes ...
25-Jan 21:19 css-srvdc02-fd JobId 24217: ClientAfterJob: The operation to delete system state backups completed,
25-Jan 21:19 css-srvdc02-fd JobId 24217: ClientAfterJob: 1 backups were deleted.
25-Jan 21:19 srvbkp-dir JobId 24217: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer
25-Jan 21:19 srvbkp-dir JobId 24217: Fatal error: No Job status returned from FD.
25-Jan 21:19 srvbkp-dir JobId 24217: Error: Bareos srvbkp-dir 16.2.4 (01Jul16):

The full log is attached.

I already tried adding "Heartbeat Interval = 60" to the server, client and storage configuration. Then I tried lowering the keepalive time on both the director and the Windows client, as described here: http://wiki.bacula.org/doku.php?id=faq

More info:

* Director and Storage daemon run on the same server
* Everything is version 16.2.4
* It doesn't happen with Linux clients
* Windows Firewall on the affected server is on, but there is an exception for Bareos
* It happens with "normal" mode, passive clients, and client-initiated connections as well
* I'm using SpoolAttributes = yes

Thanks
Cristian