On Tue, May 31, 2016 at 12:01 PM, Gert Doering <[email protected]> wrote:

> Do you foresee more different action types?  My imagination ends at
> "accept, ignore, refuse-with-error"...
>
> > Suppose we do make "reject" to not just remove the option but reject the
> > server. What to do if no alternative remotes exist --- just SIGHUP and
> hope
> > that next try will succeed or something like management-hold will keep
> the
> > connection waiting for user intervention? I would avoid issuing or
> causing
> > a SIGTERM.
>
> I have a use case where I would like to silently *ignore* something one
> of my servers is sending me (one route that is sent conflicts with local
> stuff, but that is not malicious or neglicience on their end, but something
> on my end).  So v1 would do that for me, changing it to "you can only
> have accept ore reject-the-server" would make this option not work for
> me (right now I use route-nopull and manually set route statements).
>
> But anyway - I think SIGHUP is good enough.  If there is no other server,
> it will loop, like in any other uncorrectable error (like, server cert
> invalid), but there is not much we can do about it.
>

I'm holding back on version 2 of  patch due to an issue seen while testing
the proposed change (accept, ignore or  reject with error by triggering a
restart) -- I'm using SIGUSR1 restart so that the next remote gest tried.

Testing shows that during such a failure-restart cycle caused by the
reject, a SIGTERM received could be lost sometimes depending on the timing
of the signal. This is particularly bad for the windows-GUI which uses the
exit-event for SIGTERM --- when missed it leaves openvpn.exe process
running after the GUI exits causing all kinds of issues.

The problem appears to be present only when explicit-exit-notify is used so
it seems this is what happens:

SIGTERM triggers exit-notify and initializes the struct
c->c2.explicit_exit_notification_interval with params that controls the
timer. Any SIGUSR1 or SIGHUP after this can break the loop in
 tunnel_point_to_point() by any of the P2P_CHECK_SIG()s. Then the instance
gets restarted. At the re-entry to tunnel_point_to_point() we have

context_clear_2 (c);

which clears the previously set elements of the
c->c2.explicit_exit_notification_interval struct and the SIGTERM is lost.

Now, I recall a discussion in the past about SIGTERM/SIGINT being ignored
during the resolve-failure -- how was that resolved?

Selva

Reply via email to