> Until you have this, have you considered turning on core dumps? Then you will be able to see the image id in the dump file. Alternatively, you could write the context to a file, then log the contents of the file in a wrapper script that handles Python exiting on SEGV.
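
For what it's worth, here is roughly how I read the wrapper idea, as a minimal sketch rather than anything I'd call production-ready. It assumes a POSIX system (where a child killed by a signal reports a negative return code), and the image IDs, the `WORKER` script and `run_with_context` are all made-up names for illustration:

```python
import os
import signal
import subprocess
import sys
import tempfile

# Hypothetical worker: records the current input in a file before touching
# it, then simulates a crash on one "poison" input by sending itself SIGSEGV.
WORKER = r"""
import faulthandler, os, signal, sys

faulthandler.enable()
context_path = sys.argv[1]
for item in ["img-001", "img-002", "img-003"]:
    with open(context_path, "w") as f:
        f.write(item)
    if item == "img-002":
        os.kill(os.getpid(), signal.SIGSEGV)
"""


def run_with_context(worker_source):
    """Run the worker in a child process.

    Returns (returncode, last context the worker recorded).
    """
    with tempfile.TemporaryDirectory() as tmp:
        context_path = os.path.join(tmp, "context.txt")
        proc = subprocess.run(
            [sys.executable, "-c", worker_source, context_path],
            stderr=subprocess.PIPE,  # keep the faulthandler dump off the terminal
        )
        with open(context_path) as f:
            context = f.read()
    return proc.returncode, context


if __name__ == "__main__":
    code, context = run_with_context(WORKER)
    # On POSIX, subprocess reports death-by-signal as a negative return code.
    if code == -signal.SIGSEGV:
        print(f"worker died with SIGSEGV while processing {context!r}")
```

This works when you own the process launch, but it costs a file write per input, and the context ends up somewhere other than the traceback itself.
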
> I understand what you are getting at about not wanting to log every attempt. But you could have another program that was sent a message (over UDS) saying "about to process {context}" and, after completion, sent a message "processing of {context} completed". You would log only the {context} that are not completed.

There are a number of ways to do this, but quite often the framework you're using is purely concerned with processing batches of data, which makes the "poison pill" harder to find at that level.

My suggestion is merely a quality-of-life one: it would be very, very handy to include _some_ information in the faulthandler traceback. Even a fixed 120-character ASCII-only string would be really useful. Imagine your job finishes and you have 3 faulthandler tracebacks: being able to tell right away whether they were caused by the same input (be that a batch, task, file, etc.) or by separate ones is invaluable. Right now you have 3 identical tracebacks with no idea what input actually triggered them.

For reference, this request comes from running Dask[1] jobs. Dask handles retrying and tracking tasks across machines, but if you're dealing with a batch of inputs that reliably kills a worker it is really hard to debug, more so if it only happens ~12 hours into your job. At certain scales it's quite hard to log every processing event reliably, and the overhead may not be worth it for a 1 in 10,000,000 failure.

1. https://dask.org/

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-le...@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/RCKD5NK57LG3BFGWHTL7N2QM4AWLYA2I/
Code of Conduct: http://python.org/psf/codeofconduct/