[issue24904] Patch: add timeout to difflib SequenceMatcher ratio() and quick_ratio()
Robert Collins added the comment:

So - I'm with Victor and Raymond here. I think modifying difflib to provide external control over the components with poor big-O behavior would bring many more benefits than just controlling time: you could wrap them in a timer module to get what this patch does, or you could replace them with alternative implementations (e.g. parallel ones).

--
nosy: +rbcollins

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24904
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue24904] Patch: add timeout to difflib SequenceMatcher ratio() and quick_ratio()
Raymond Hettinger added the comment:

In general, it isn't good design to incorporate timeout logic in computation logic. What would be better is a general-purpose, reusable, decoupled tool: run_with_time_limit(some_computation, some_args, time_limit). Such a tool might be based on a separate process that can be timed or killed, it might use signals, or it might be based on threading.Timer.

I had a quick look around the net. Timeouts on diff APIs aren't common (e.g. GNU diff doesn't have one), but there are a couple of precedents (you aren't the first to have had concerns about the running time for unfavorable inputs).

--
nosy: +rhettinger
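As an illustration of the decoupled tool described above, here is a minimal sketch of the signal-based variant. It is not code from the tracker or from the patch. The name run_with_time_limit is taken from the comment, but the signature, the ComputationTimeout exception and the similarity helper are assumptions made for this example. It relies on SIGALRM, so it only works on Unix and only when called from the main thread.

```python
import signal
from difflib import SequenceMatcher


class ComputationTimeout(Exception):
    """Hypothetical exception raised when the wrapped call runs too long."""


def run_with_time_limit(func, args=(), time_limit=1.0):
    """Call func(*args); raise ComputationTimeout after time_limit seconds.

    Assumption: Unix, main thread. The timer is a wall-clock interval
    timer (ITIMER_REAL) that delivers SIGALRM when it expires.
    """
    def _on_alarm(signum, frame):
        raise ComputationTimeout(f"exceeded {time_limit} s")

    previous_handler = signal.signal(signal.SIGALRM, _on_alarm)
    signal.setitimer(signal.ITIMER_REAL, time_limit)
    try:
        return func(*args)
    finally:
        # Cancel any pending alarm and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, previous_handler)


def similarity(a, b):
    """The computation itself stays free of any timeout logic."""
    return SequenceMatcher(None, a, b).ratio()


if __name__ == "__main__":
    try:
        print(run_with_time_limit(similarity, ("abcd", "bcde"), time_limit=2.0))
    except ComputationTimeout:
        print("comparison took too long")
```

Because SequenceMatcher is written in pure Python, the handler gets a chance to run between bytecodes and the comparison is actually interrupted. A process-based variant would instead be needed to stop code that spends a long time inside a single C call.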
[issue24904] Patch: add timeout to difflib SequenceMatcher ratio() and quick_ratio()
Changes by Raymond Hettinger raymond.hettin...@gmail.com:

--
nosy: +tim.peters
[issue24904] Patch: add timeout to difflib SequenceMatcher ratio() and quick_ratio()
New submission from John Taylor:

SequenceMatcher in the difflib module contains ratio() and quick_ratio() methods, which can take a long time to run with certain input. One example is two slightly different versions of jquery.min.js. I have written a patch against python-350b4 that adds a timeout to these methods. The new functionality can also fall through to the next quickest comparison method should a timeout occur. If a timeout does occur and falling through is not desired, then -1 is returned for the ratio.

I'd like this to be incorporated into Python 3.5.0 if it is not too late.

--
components: Library (Lib)
files: difflib-diff.patch
keywords: patch
messages: 248919
nosy: jftuga
priority: normal
severity: normal
status: open
title: Patch: add timeout to difflib SequenceMatcher ratio() and quick_ratio()
type: enhancement
versions: Python 3.5

Added file: http://bugs.python.org/file40217/difflib-diff.patch
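For readers unfamiliar with the methods involved, the sketch below shows the three comparison tiers that SequenceMatcher already provides, ordered from cheapest to most expensive. It is not taken from the attached patch; the helper name similarity_bounds is an assumption for illustration. According to the difflib documentation, quick_ratio() and real_quick_ratio() each return an upper bound on ratio().

```python
from difflib import SequenceMatcher


def similarity_bounds(a, b):
    """Return the three similarity measures, cheapest first.

    similarity_bounds is a hypothetical helper, not part of difflib.
    """
    matcher = SequenceMatcher(None, a, b)
    return {
        # Based on the sequence lengths only.
        "real_quick_ratio": matcher.real_quick_ratio(),
        # Based on how many elements the sequences share, ignoring order.
        "quick_ratio": matcher.quick_ratio(),
        # Full matching-block computation; the slow one for bad inputs.
        "ratio": matcher.ratio(),
    }


if __name__ == "__main__":
    print(similarity_bounds("abcd", "bcde"))
```

For "abcd" and "bcde" this gives 1.0, 0.75 and 0.75 respectively, which matches the example in the difflib documentation. A caller who cannot afford ratio() can therefore use the cheaper values as optimistic estimates.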
[issue24904] Patch: add timeout to difflib SequenceMatcher ratio() and quick_ratio()
STINNER Victor added the comment:

I'm not sure that it's a good idea to add a timeout to such an algorithm. It can be very surprising to get a different result depending on the system load (CPU usage of _other_ applications) and on the CPU performance.

If you really want this result, I would prefer to design the feature outside the Python stdlib. You might modify the stdlib to allow incremental computation.

About the patch itself, which kind of timer should be used? Monotonic clock? System clock? Process time (CPU time)?

Maybe we can optimize the code?

--
nosy: +haypo
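To make the timer question concrete, the following sketch measures one ratio() call with the three kinds of clock mentioned above, using functions from the standard time module. The helper name time_ratio is an assumption for illustration and is not part of the patch.

```python
import time
from difflib import SequenceMatcher


def time_ratio(a, b):
    """Time one ratio() call with three different clocks.

    time_ratio is a hypothetical helper written for this example.
    """
    clocks = {
        "monotonic": time.monotonic,   # never goes backwards
        "system": time.time,           # wall clock, can be adjusted
        "process": time.process_time,  # CPU time of this process only
    }
    start = {name: clock() for name, clock in clocks.items()}
    ratio = SequenceMatcher(None, a, b).ratio()
    elapsed = {name: clock() - start[name] for name, clock in clocks.items()}
    return ratio, elapsed


if __name__ == "__main__":
    ratio, elapsed = time_ratio("abcd" * 200, "bcde" * 200)
    print(ratio, elapsed)
```

On a loaded machine the monotonic and system readings include time spent waiting for the CPU, while the process reading does not. A timeout based on the first two would therefore depend on what other applications are doing, which is the concern raised in the comment.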