JoshRosen edited a comment on issue #25138: [SPARK-26175][PYSPARK] Closing 
stdin of the worker process right after fork
URL: https://github.com/apache/spark/pull/25138#issuecomment-511143818
 
 
   Good catch. This seems reasonable to me.
   
   One question, though: is it possible to add a regression test for this? 
Here's some brainstorming on how we might do that:
   
   - Prior to this patch's change, the example you gave would hang indefinitely 
upon cancellation, whereas afterwards it'll exit quickly, so maybe we can add a 
Python unit test alongside existing Python-initiated cancel tests (assuming 
that we have those tests in `.py` files or some other type of test case that 
lets us run PySpark code).
   - In our test, could we start an async job, wait until it's running, kill 
the job, then assert that the number of running tasks (as reported by executor 
status API) reaches 0 within some reasonable time period?
   - One risk in my proposal is race conditions in waiting for the tasks to 
kick off: if we cancel a task before it forks a child process then our test 
will spuriously pass (i.e. it wouldn't function as a reliable regression test). 
Fixing this is kinda tricky, though, because our usual trick of using 
semaphores / countdown latches doesn't work well in a cross-language world. 
Maybe our best option is to just hardcode some reasonable time constants there? 
Or have the Python code `touch` a file and then wait for that file to appear 
before killing the task (e.g. a subprocess of `touch 
/filename/picked/by/driver/in/advance && cat`).
   
   Just thinking aloud here; let me know if you can think of a cleaner way to 
test this (or whether we can regression-test this via some other means).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to