wjszlachta-man commented on code in PR #53306:
URL: https://github.com/apache/spark/pull/53306#discussion_r2626837765


##########
python/pyspark/daemon.py:
##########
@@ -174,15 +175,20 @@ def handle_sigterm(*args):
 
         while True:
             if poller is not None:
-                ready_fds = [fd_reverse_map[fd] for fd, _ in poller.poll(1000)]
-            else:
-                try:
-                    ready_fds = select.select([0, listen_sock], [], [], 1)[0]
-                except select.error as ex:
-                    if ex[0] == EINTR:
-                        continue
+                ready_fds = []
+                # Unlike select, poll timeout is in millis.
+                for fd, event in poller.poll(1000):
+                    if event & (select.POLLIN | select.POLLHUP):
+                        # Data can be read (for POLLHUP peer hang up, so reads 
will return
+                        # 0 bytes, in which case we want to break out - this 
is consistent
+                        # with how select behaves).
+                        ready_fds.append(fd_reverse_map[fd])
                     else:
-                        raise
+                        # Could be POLLERR or POLLNVAL (select would raise in 
this case).
+                        raise PySparkRuntimeError(f"Polling error - event 
{event} on fd {fd}")

Review Comment:
   Personally I wouldn't make it `OSError` as this is not a system call error 
(why I went for `RuntimeError` instead).
   
   But honestly not too fussed and happy to change it to whatever you or Spark 
committers find appropriate as your feel for the codebase is better than mine 
(thanks for reviewing it!).
   
   My focus here is to merge the fix that removes dependency on 
`select.select()` as this is something we have been patching internally for 
quite some time now to allow our researchers to run with large number of 
executors. The problem is even worse if you load a lot of shared libs (for 
example via `ctypes`), which take fds <1024 so you can hit `FD_SETSIZE` with 
fewer than 1000 executors.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to