On 6/15/21 10:06 PM, Ilya Maximets wrote: > On 6/11/21 4:29 PM, Terry Wilson wrote: >> This is a partial revert of c1aa16d19, but keeps its test. >> >> Applications that use eventlet cannot use poll() without blocking. >> To fix an issue where the check_connection_completion() code would >> not detect connection errors when using select(), we switched to >> using the system poll() for checking the connection while leaving >> the select-based poller the default. This mixes monkeypatched and >> unpatched code and caused eventlet-based code to block every time >> we connect.
One more question: Why exactly this happens? I mean, we're calling poll(0) that should not block, why eventlet is blocked in this case? >> >> According to the connect(2) man page, you can detect connect() >> errors when using select(): >> >> After select(2) indicates writability, use getsockopt(2) to read >> the SO_ERROR option at level SOL_SOCKET to determine whether >> connect() completed successfully (SO_ERROR is zero) or >> unsuccessfully (SO_ERROR is one of the usual error codes listed >> here, explaining the reason for the failure) >> >> so this patch does that instead. >> >> Signed-off-by: Terry Wilson <[email protected]> >> --- >> python/ovs/poller.py | 35 ++--------------------------------- >> python/ovs/socket_util.py | 9 ++++++--- >> 2 files changed, 8 insertions(+), 36 deletions(-) >> >> diff --git a/python/ovs/poller.py b/python/ovs/poller.py >> index 3624ec865..7b3238d6d 100644 >> --- a/python/ovs/poller.py >> +++ b/python/ovs/poller.py >> @@ -31,22 +31,14 @@ except ImportError: >> SSL = None >> >> try: >> - from eventlet import patcher as eventlet_patcher >> + import eventlet.patcher >> >> def _using_eventlet_green_select(): >> - return eventlet_patcher.is_monkey_patched(select) >> + return eventlet.patcher.is_monkey_patched(select) >> except: >> - eventlet_patcher = None >> - >> def _using_eventlet_green_select(): >> return False >> >> -try: >> - from gevent import monkey as gevent_monkey >> -except: >> - gevent_monkey = None >> - >> - >> vlog = ovs.vlog.Vlog("poller") >> >> POLLIN = 0x001 >> @@ -265,26 +257,3 @@ class Poller(object): >> def __reset(self): >> self.poll = SelectPoll() >> self.timeout = -1 >> - >> - >> -def get_system_poll(): >> - """Returns the original select.poll() object. If select.poll is >> - monkey patched by eventlet or gevent library, it gets the original >> - select.poll and returns an object of it using the >> - eventlet.patcher.original/gevent.monkey.get_original functions. >> - >> - As a last resort, if there is any exception it returns the >> - SelectPoll() object. >> - """ >> - try: >> - if _using_eventlet_green_select(): >> - _system_poll = eventlet_patcher.original("select").poll >> - elif gevent_monkey and gevent_monkey.is_object_patched( >> - 'select', 'poll'): >> - _system_poll = gevent_monkey.get_original('select', 'poll') >> - else: >> - _system_poll = select.poll >> - except: >> - _system_poll = SelectPoll >> - >> - return _system_poll() >> diff --git a/python/ovs/socket_util.py b/python/ovs/socket_util.py >> index 3faa64e9d..7dd000916 100644 >> --- a/python/ovs/socket_util.py >> +++ b/python/ovs/socket_util.py >> @@ -159,8 +159,8 @@ def make_unix_socket(style, nonblock, bind_path, >> connect_path, short=False): >> >> >> def check_connection_completion(sock): >> + p = ovs.poller.SelectPoll() >> if sys.platform == "win32": >> - p = ovs.poller.SelectPoll() >> event = winutils.get_new_event(None, False, True, None) >> # Receive notification of readiness for writing, of completed >> # connection or multipoint join operation, and of socket closure. >> @@ -170,12 +170,15 @@ def check_connection_completion(sock): >> win32file.FD_CLOSE) >> p.register(event, ovs.poller.POLLOUT) >> else: >> - p = ovs.poller.get_system_poll() >> p.register(sock, ovs.poller.POLLOUT) >> pfds = p.poll(0) >> if len(pfds) == 1: >> revents = pfds[0][1] >> - if revents & ovs.poller.POLLERR or revents & ovs.poller.POLLHUP: >> + if revents & ovs.poller.POLLOUT: >> + # This is how select() indicates a connect() error > > This wording doesn't sound correct to me as it might be an error and might > be not, IIUC. POLLOUT here just means that something happened. It might > be an error or the actual POLLOUT. > (Ugh... select()/poll() are so inconsistent across different platforms) > >> + return sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR) > > Looking at the history, it seems that ESX doesn't support SO_ERROR (at least > some old versions), so we need to avoid use of it. > See commit d6cedfd9d29d ("socket-util: Avoid using SO_ERROR.") > > We, likely, need to treat this in the same path as error. i.e. invoke send() > and catch the exception. However, for the POLLOUT case we should not > log and return the error if send() succeeds. But this doesn't sound like > a good idea to send NULL bytes to everyone on success... > > So, some options here: > > 1. Keep as is, i.e. system poll() > --> python will be blocked by poll() > > 2. Fully revert c1aa16d19, i.e. just use select() > --> connections will be treated as established even if they are not > until the actual read/write. > > 3. Use select() + getsockopt(SO_ERRROR), effectively revert d6cedfd9d29d > --> give up on systems that doesn't support SO_ERRROR. (ESX?) > > 4. Use select() + getsockopt(SO_ERRROR), keep returning 0 (success) on > ENOPROTOOPT > --> systems that doesn't support might think that connections are > established while they are not. > > 5. Try to detect if SO_ERRROR supported or not and use system poll() or > select() + getsockopt(SO_ERRROR) based on that fact. > --> better performance on systems that supports and same as now on > systems that doesn't. > > Option 3 is, probably, OK, but I don't know. > Option 4 might be the best choice, but I'm not quiet sure how to detect. > Is checking for ENOPROTOOPT enough (same actually applies to option 3 > as I'm not sure if ESX actually reports ENOPROTOOPT)? > > Also, we're using select() in C code for windows. Do we have the same > issue there? Alin, do you know if check_connection_completion() correctly > detects connection failure on windows, e.g. if 'Check stream open block' > tests are working? > > Any thoughts? > > Ben, what do you think? > > Best regards, Ilya Maximets. > _______________________________________________ dev mailing list [email protected] https://mail.openvswitch.org/mailman/listinfo/ovs-dev
