> Until you have this, have you considered turning on core dumps? Then you will
> be able to see the image id in the dump file.
> Alternatively, you could write the context to a file, then log the contents of
> the file in a wrapper script that handles Python exiting on SEGV.
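
For what it's worth, the context-file idea could be sketched roughly as below.
This is only an illustration under my own assumptions: the names
record_context, run_and_report and SLOT are made up, the 120-byte width is
arbitrary, and os.pwrite plus negative return codes for signals are Unix-only.
The worker overwrites one fixed-width slot before each input, so nothing
accumulates, and the wrapper reads the slot only if the worker was killed by a
signal.

```python
import os
import signal
import subprocess
import sys

SLOT = 120  # fixed record width in bytes (arbitrary, for illustration only)


def record_context(fd, context):
    """Worker side: overwrite the single fixed-width slot with the context."""
    # Non-ASCII characters become "?", long strings are truncated, short
    # ones are space-padded, so the file is always exactly SLOT bytes.
    data = context.encode("ascii", "replace")[:SLOT].ljust(SLOT)
    os.pwrite(fd, data, 0)


def run_and_report(argv, context_path):
    """Wrapper side: run the worker and report the last context on a crash.

    Returns the recorded context if a signal (e.g. SIGSEGV) killed the
    worker, otherwise None.
    """
    proc = subprocess.run(argv)
    if proc.returncode < 0:  # on POSIX, -N means "killed by signal N"
        with open(context_path, "rb") as f:
            last = f.read(SLOT).decode("ascii").rstrip()
        name = signal.Signals(-proc.returncode).name
        print(f"worker killed by {name} while processing: {last}",
              file=sys.stderr)
        return last
    return None
```

In the worker you would open the file once with
os.open(path, os.O_RDWR | os.O_CREAT) and call record_context(fd, ...) before
each input. The cost is one small write per input, which may still be too much
at some scales.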

> I understand what you are getting at about not wanting to log every attempt.
> But you could have another program that is sent a message (over UDS) saying
> "about to process {context}" and, after completion, a second message saying
> "processing of {context} completed". You would then log only the {context}
> values that never completed.
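
A minimal sketch of that monitor idea, again with invented names
(InFlightMonitor, notify) and an invented "start"/"done" message format, using
a Unix datagram socket. It assumes a context never contains information you
need beyond a single short string, and it ignores error handling entirely.

```python
import socket


class InFlightMonitor:
    """Tracks contexts announced as started but not yet reported as done."""

    def __init__(self, path):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
        self.sock.bind(path)
        self.sock.setblocking(False)  # drain() must never wait for input
        self.in_flight = set()

    def drain(self):
        """Consume every queued message and update the in-flight set."""
        while True:
            try:
                msg = self.sock.recv(4096).decode("utf-8")
            except BlockingIOError:
                return  # nothing left in the socket buffer
            verb, _, context = msg.partition(" ")
            if verb == "start":
                self.in_flight.add(context)
            elif verb == "done":
                self.in_flight.discard(context)


def notify(sock, path, verb, context):
    """Worker side: send "start <context>" or "done <context>"."""
    sock.sendto(f"{verb} {context}".encode("utf-8"), path)
```

Whatever is left in in_flight after a worker dies is the set of candidate
inputs. Only those would need to be logged, not every attempt.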

There are a number of ways to do this, but quite often the framework you're 
using is purely involved with processing batches of data, which makes the 
"poison pill" harder to find at that level.

My suggestion is merely a quality-of-life one: it would be very, very handy to
include _some_ information in the faulthandler traceback. Even a fixed
120-character ASCII-only string would be really useful. Imagine your job
finishes and you have 3 faulthandler tracebacks: being able to tell right away
whether they come from the same input (be that batch, task, file, etc.) or from
separate ones is invaluable. Right now you have 3 identical tracebacks with no
idea what actual input triggered them.

For reference, this request comes from running Dask[1] jobs. Dask handles
retrying and tracking tasks across machines, but if you're dealing with a batch
of inputs that reliably kills a worker, it is really hard to debug, more so if
it only happens ~12 hours into your job. At certain scales it's quite hard to
log every processing event reliably, and the overhead may not be worth it for a
1 in 10,000,000 failure.

1. https://dask.org/
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at 
https://mail.python.org/archives/list/python-ideas@python.org/message/RCKD5NK57LG3BFGWHTL7N2QM4AWLYA2I/
Code of Conduct: http://python.org/psf/codeofconduct/