Michael Ohlrogge added the comment:

This is my first time posting here, so apologies if I'm breaking rules.

I'd like to vote in favor of this patch, which exposes the matching scores.

I am a researcher at Stanford University using this tool to match up about
100,000 company/entity names across two datasets.  The names refer to the
same underlying entities, but because they come from different datasets,
the spellings, abbreviations, etc. differ.

It would help me to be able to run the get_scored_close_matches() function
and then sort the results by match score.  Based on spot checking / sampling
the results, I could then determine, for instance, that everything scoring
above a certain threshold is almost certainly correct, whereas anything below
a certain threshold needs to be reviewed by hand.
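
To illustrate the workflow, here is a rough sketch.  The patch's
get_scored_close_matches() is not part of the released difflib, so a
hypothetical stand-in, scored_close_matches(), is built on
difflib.SequenceMatcher instead.  The helper names, the sample company names,
and the 0.75 threshold are assumptions for illustration only, and the stand-in
may not score or order results exactly as the patch does.

```python
from difflib import SequenceMatcher


def scored_close_matches(word, possibilities, n=3, cutoff=0.6):
    """Hypothetical stand-in: return up to n (score, candidate) pairs, best first."""
    matcher = SequenceMatcher()
    matcher.set_seq2(word)
    scored = []
    for candidate in possibilities:
        matcher.set_seq1(candidate)
        score = matcher.ratio()  # similarity in [0.0, 1.0]
        if score >= cutoff:
            scored.append((score, candidate))
    scored.sort(reverse=True)  # highest score first
    return scored[:n]


def split_by_threshold(names, candidates, threshold):
    """Score every name once, then split into accepted / needs-review lists."""
    accepted, review = [], []
    for name in names:
        best = scored_close_matches(name, candidates, n=1, cutoff=0.0)
        if not best:
            # No candidates at all: always needs manual review.
            review.append((0.0, name, None))
            continue
        score, match = best[0]
        target = accepted if score >= threshold else review
        target.append((score, name, match))
    # Sort the review pile so the most promising matches come first.
    review.sort(reverse=True)
    return accepted, review


if __name__ == "__main__":
    # Made-up sample data, not taken from any real dataset.
    dataset_a = ["Stanford University", "zzzz"]
    dataset_b = ["Stanford Univ.", "Stamford Corp", "Acme Inc"]
    accepted, review = split_by_threshold(dataset_a, dataset_b, threshold=0.75)
    print("accepted:", accepted)
    print("review:  ", review)
```

Because the scores are kept, the threshold can be changed after the fact
without running the matching again.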

I suppose I could accomplish something similar by experimenting with different
matching thresholds.  However, with as many candidate matches as I have, the
algorithm takes a fair amount of time to run, and I have no good way of knowing
ex ante what a reasonable threshold would be.

Just in general, I think it can be useful information for users to know how 
much confidence to have in the matches produced by the algorithm.  Users could 
choose to formulate this confidence either as a direct function of the score or 
perhaps based on some other factors, such as a statistical analysis procedure 
that takes the score into account.  

Thanks to everyone who put this package together and who suggested the patch.

----------
nosy: +michaelohlrogge
versions: +Python 2.7 -Python 3.5

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue21344>
_______________________________________