We are happy to announce the release of a small but useful utility program known as nameconflate! You can download it here:
http://www.d.umn.edu/~tpederse/tools.html

This program takes as input text from the English GigaWord Corpus and lets you conflate any number of words or phrases into a single word (aka a pseudo-word). For example, you could conflate line, China, and "Tom Hanks" into a single word in the GigaWord corpus (or some portion of it). The output is in the lexical sample format from Senseval-2, with each occurrence of the individual words replaced by their conflated (ambiguous) form. The correct (unconflated/unambiguous) form is retained as well, so you can perform word sense disambiguation on the conflated text and then easily score your results.

We have used this program rather extensively to create data for SenseClusters (http://senseclusters.sourceforge.net) and also with our Duluth WSD systems (http://www.d.umn.edu/~tpederse/senseval3.html).

We have placed some sample data on the tools page above so you can see how it looks. If you don't have the GigaWord corpus, we would be happy to generate some samples for you based on particular words you might like to see conflated.

Cordially,
Ted and Anagha

--
Ted Pedersen
http://www.d.umn.edu/~tpederse

_______________________________________________
senseclusters-users mailing list
https://lists.sourceforge.net/lists/listinfo/senseclusters-users
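
For readers who have not worked with pseudo-words before, the conflation idea described in the announcement can be sketched in a few lines of Python. This is only an illustration of the general technique, not the nameconflate code itself: it does not read GigaWord or write the Senseval-2 lexical sample format, and the names conflate, accuracy, and the pseudo-word line_China_TomHanks are hypothetical choices made for this example.

```python
import re


def conflate(text, targets, pseudo_word):
    """Replace every occurrence of any target with pseudo_word.

    Returns the conflated text plus the original forms in order of
    occurrence, which act as the answer key for later scoring.
    """
    # Try longer targets first so a phrase wins over any shorter
    # target that happens to be contained in it.
    ordered = sorted(targets, key=len, reverse=True)
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(t) for t in ordered) + r")\b"
    )
    answers = []

    def substitute(match):
        # Remember the unambiguous form before hiding it.
        answers.append(match.group(0))
        return pseudo_word

    return pattern.sub(substitute, text), answers


def accuracy(predicted, answers):
    """Fraction of instances whose predicted form matches the answer key."""
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predicted, answers))
    return correct / len(answers)


text = "Tom Hanks flew to China. The line at customs was long."
conflated, key = conflate(
    text, ["line", "China", "Tom Hanks"], "line_China_TomHanks"
)
print(conflated)
print(key)
```

A disambiguation system would then see only the conflated text and guess which original form each instance of the pseudo-word stands for; passing those guesses and the key to accuracy gives the score. Matching here is case-sensitive and uses simple word boundaries, which is an assumption of this sketch rather than a statement about how nameconflate tokenizes its input.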
