Hi hackers,

I am interested in extending Postgres with a "generalized edit function" like SAS's "compged"[1], which is basically Levenshtein distance with transposes (ab <-> ba) and LOTS of different weights for certain ops (like insert a blank versus delete from the end versus insert a regular character).
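
To make the idea concrete, here is a minimal sketch in Python of a weighted edit distance with adjacent transposes. The names `Weights` and `weighted_edit_distance` and every cost value are made up for illustration; they are not COMPGED's actual cost table. The sketch assumes the restricted form (often called "optimal string alignment"), where a swapped pair is not edited any further, and it does not attempt position-dependent costs such as deleting from the end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Weights:
    # Hypothetical placeholder costs, NOT the values COMPGED uses.
    insert: int = 10        # insert a regular character
    delete: int = 10        # delete a regular character
    replace: int = 10       # replace one character with another
    swap: int = 2           # transpose two adjacent characters (ab <-> ba)
    insert_blank: int = 1   # insert a blank
    delete_blank: int = 1   # delete a blank


def weighted_edit_distance(src: str, dst: str, w: Weights = Weights()) -> int:
    """Cheapest cost of turning src into dst under the weights in w."""

    def ins_cost(ch: str) -> int:
        return w.insert_blank if ch == " " else w.insert

    def del_cost(ch: str) -> int:
        return w.delete_blank if ch == " " else w.delete

    n, m = len(src), len(dst)
    # d[i][j] holds the cheapest cost of turning src[:i] into dst[:j].
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + del_cost(src[i - 1])
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + ins_cost(dst[j - 1])

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = src[i - 1], dst[j - 1]
            best = min(
                d[i - 1][j] + del_cost(a),                      # delete a
                d[i][j - 1] + ins_cost(b),                      # insert b
                d[i - 1][j - 1] + (0 if a == b else w.replace), # match/replace
            )
            # Adjacent transpose: the last two characters are swapped.
            if i > 1 and j > 1 and a == dst[j - 2] and src[i - 2] == b:
                best = min(best, d[i - 2][j - 2] + w.swap)
            d[i][j] = best
    return d[n][m]
```

With these placeholder weights, `weighted_edit_distance("ab", "ba")` costs one swap and `weighted_edit_distance("main st", "mainst")` costs one blank deletion, while pure Levenshtein would charge both like any other edit.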

Compged seems to work really well for us when trying to match addresses (MUCH better than pure Levenshtein), and it would be a great tool for data miners.

I have a number of questions:

1. Does anybody else care? I would love to see this in contrib, but if the chances are slim, then I would like to know that too.

2. Has anybody else done something like this and can give ideas or source? It seems to me that the code will have to be a mess of pointers and indexes, but if there is some theory that simplifies it I haven't heard about it. (Levenshtein without transposes is theoretically clean, but I think the fact that we have transposes means we look ahead 2 chars and lose all the nice dynamic programming stuff.)

3. I will probably implement this for ASCII characters -- if anyone has any thoughts on other encodings, please share.

Thanks for everyone's time. I will try to implement a command line version and put that on pastebin for people to look at while I port it to the Postgres environment.

[1] http://support.sas.com/documentation/cdl/en/lrdict/64316/HTML/default/viewer.htm#a002206133.htm