[issue24904] Patch: add timeout to difflib SequenceMatcher ratio() and quick_ratio()

2015-08-20 Thread Robert Collins

Robert Collins added the comment:

So - I'm with Victor and Raymond here. I think modifying difflib to provide 
external control over the components with poor big-O behaviour would bring 
many more benefits than just controlling time: you could wrap them in a timer 
to get what this patch does, or you could replace them with alternative 
implementations (e.g. parallel ones).

--
nosy: +rbcollins

___
Python tracker rep...@bugs.python.org
http://bugs.python.org/issue24904
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue24904] Patch: add timeout to difflib SequenceMatcher ratio() and quick_ratio()

2015-08-20 Thread Raymond Hettinger

Raymond Hettinger added the comment:

In general, it isn't good design to incorporate timeout logic in computation 
logic.  What would be better is a general purpose, reusable, decoupled tool: 
run_with_time_limit(some_computation, some_args, time_limit).  Such a tool 
might be based on a separate process that can be timed or killed, it might use 
signals, or it may be based on threading.Timer.
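
A minimal sketch of such a decoupled tool, based on a plain worker thread. 
Only the name run_with_time_limit and its argument order come from the 
comment above; the body is an assumption, not anything proposed for the 
stdlib. Note the limitation: on timeout the caller stops waiting, but the 
thread keeps computing in the background. Actually reclaiming the CPU would 
need the separate-process or signal variants mentioned above.

```python
import threading
from difflib import SequenceMatcher


def run_with_time_limit(func, args, time_limit):
    """Return func(*args), or raise TimeoutError after time_limit seconds.

    Hypothetical helper: the worker thread is abandoned on timeout,
    not stopped.
    """
    outcome = {}

    def target():
        try:
            outcome["value"] = func(*args)
        except Exception as exc:  # hand the failure back to the caller
            outcome["error"] = exc

    # daemon=True so an abandoned worker does not block interpreter exit
    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(time_limit)  # wait at most time_limit seconds
    if worker.is_alive():
        raise TimeoutError("no result within %s seconds" % time_limit)
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]


# Usage with difflib left untouched: the timeout lives entirely outside it.
matcher = SequenceMatcher(None, "abcd", "bcde")
print(run_with_time_limit(matcher.ratio, (), 5.0))
```

The point of the design is that SequenceMatcher needs no changes, and the 
same helper works for any other slow computation.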

I did a quick look around the net.  Timeouts on diff APIs aren't common (e.g. 
GNU diff doesn't have a timeout) but there are a couple of precedents (you 
aren't the first to have had concerns about the running time for unfavorable 
inputs).

--
nosy: +rhettinger




[issue24904] Patch: add timeout to difflib SequenceMatcher ratio() and quick_ratio()

2015-08-20 Thread Raymond Hettinger

Changes by Raymond Hettinger raymond.hettin...@gmail.com:


--
nosy: +tim.peters




[issue24904] Patch: add timeout to difflib SequenceMatcher ratio() and quick_ratio()

2015-08-20 Thread John Taylor

New submission from John Taylor:

SequenceMatcher in the difflib module contains ratio() and quick_ratio() 
methods which can take a long time to run with certain inputs.  One example is 
two slightly different versions of jquery.min.js.

I have written a patch against python-350b4 that adds a timeout to these 
methods.  The new functionality also has the capability to fall through to 
the next quickest comparison method should a timeout occur. If a timeout does 
occur and using a fall through method is not desired, then -1 is returned for 
the ratio.
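
For context on what "the next quickest comparison method" means in the 
existing difflib API (this is not the patch, which is not reproduced here): 
quick_ratio() and real_quick_ratio() are documented as progressively cheaper 
upper bounds on ratio(), so a fall-through value can be higher than the true 
ratio. A small illustration with made-up strings:

```python
from difflib import SequenceMatcher

# Same characters in reverse order: very little lines up in sequence,
# but the cheaper estimates cannot see that.
matcher = SequenceMatcher(None, "abcd", "dcba")

exact = matcher.ratio()                  # full matching-blocks computation
quick = matcher.quick_ratio()            # compares character counts only
real_quick = matcher.real_quick_ratio()  # compares lengths only

# Each cheaper method is an upper bound on the more expensive one.
assert real_quick >= quick >= exact
print(exact, quick, real_quick)
```

A caller that falls through on timeout would therefore get an optimistic 
estimate, not a lower-quality copy of the same number.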

I'd like this to be incorporated into Python 3.5.0 if it is not too late.

--
components: Library (Lib)
files: difflib-diff.patch
keywords: patch
messages: 248919
nosy: jftuga
priority: normal
severity: normal
status: open
title: Patch: add timeout to difflib SequenceMatcher ratio() and quick_ratio()
type: enhancement
versions: Python 3.5
Added file: http://bugs.python.org/file40217/difflib-diff.patch




[issue24904] Patch: add timeout to difflib SequenceMatcher ratio() and quick_ratio()

2015-08-20 Thread STINNER Victor

STINNER Victor added the comment:

I'm not sure that it's a good idea to add a timeout to such an algorithm. It 
can be very surprising to get a different result depending on the system load 
(CPU usage of _other_ applications) and on the CPU performance.

If you really want this result, I would prefer to design the feature outside 
the Python stdlib. You might modify the stdlib to allow incremental computation.

About the patch itself, which kind of timer should be used? Monotonic clock? 
System clock? Process time (CPU time)?
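
To make the choice concrete, here is a small sketch that measures one 
ratio() call with all three clocks from the time module. The function name 
time_ratio and the inputs are made up for illustration. time.monotonic() 
cannot go backwards, time.time() follows the system clock and can jump if 
the clock is adjusted, and time.process_time() counts only CPU time used by 
this process, so it should be less sensitive to load from other 
applications.

```python
import time
from difflib import SequenceMatcher

CLOCKS = {
    "monotonic": time.monotonic,   # wall-clock style, never goes backwards
    "system": time.time,           # system clock, may be adjusted
    "process": time.process_time,  # CPU time of this process only
}


def time_ratio(a, b):
    """Return (ratio, elapsed) where elapsed maps clock name to seconds."""
    start = {name: clock() for name, clock in CLOCKS.items()}
    ratio = SequenceMatcher(None, a, b).ratio()
    elapsed = {name: clock() - start[name] for name, clock in CLOCKS.items()}
    return ratio, elapsed


ratio, elapsed = time_ratio("abcd" * 50, "bcde" * 50)
for name, seconds in elapsed.items():
    print("%-10s %.6f s" % (name, seconds))
```

On a busy machine the monotonic and process figures can differ noticeably, 
which is exactly the source of the load-dependent results described above.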

Maybe we can optimize the code?

--
nosy: +haypo
