On Tue, May 31, 2016 at 12:01 PM, Gert Doering <[email protected]> wrote:
> Do you foresee more different action types? My imagination ends at > "accept, ignore, refuse-with-error"... > > > Suppose we do make "reject" to not just remove the option but reject the > > server. What to do if no alternative remotes exist --- just SIGHUP and > hope > > that next try will succeed or something like management-hold will keep > the > > connection waiting for user intervention? I would avoid issuing or > causing > > a SIGTERM. > > I have a use case where I would like to silently *ignore* something one > of my servers is sending me (one route that is sent conflicts with local > stuff, but that is not malicious or neglicience on their end, but something > on my end). So v1 would do that for me, changing it to "you can only > have accept ore reject-the-server" would make this option not work for > me (right now I use route-nopull and manually set route statements). > > But anyway - I think SIGHUP is good enough. If there is no other server, > it will loop, like in any other uncorrectable error (like, server cert > invalid), but there is not much we can do about it. > I'm holding back on version 2 of patch due to an issue seen while testing the proposed change (accept, ignore or reject with error by triggering a restart) -- I'm using SIGUSR1 restart so that the next remote gest tried. Testing shows that during such a failure-restart cycle caused by the reject, a SIGTERM received could be lost sometimes depending on the timing of the signal. This is particularly bad for the windows-GUI which uses the exit-event for SIGTERM --- when missed it leaves openvpn.exe process running after the GUI exits causing all kinds of issues. The problem appears to be present only when explicit-exit-notify is used so it seems this is what happens: SIGTERM triggers exit-notify and initializes the struct c->c2.explicit_exit_notification_interval with params that controls the timer. Any SIGUSR1 or SIGHUP after this can break the loop in tunnel_point_to_point() by any of the P2P_CHECK_SIG()s. Then the instance gets restarted. At the re-entry to tunnel_point_to_point() we have context_clear_2 (c); which clears the previously set elements of the c->c2.explicit_exit_notification_interval struct and the SIGTERM is lost. Now, I recall a discussion in the past about SIGTERM/SIGINT being ignored during the resolve-failure -- how was that resolved? Selva
