Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
Hi,

David Spencer wrote:

>> Do you plan to add expansion on other WordNet relationships? Hypernyms and hyponyms would be a good starting point for thesaurus-like search, wouldn't they?
>
> Good point, I hadn't considered this - but how would it work - just consider these 2 relationships "synonyms" (thus easier to use) or make it separate (too academic?)

Well... the ideal case would be (easy) customization :-), from an external text (XML?) file. Depending on the kind of relationship, the boost factor could be adjusted when the query is expanded. The same goes for the depth of the relationship. For example, a "father" hypernym could have a boost factor of 0.8, a "grandfather" a boost factor of 0.4, and a "great-grandfather" a boost factor of 0.2. I wonder whether a logarithmic scale makes more sense than a linear one, but this should be customizable...

>> However, I'm afraid that this kind of feature would require refactoring, probably based on WordNet-dedicated libraries. JWNL (http://jwordnet.sourceforge.net/) may be a good candidate for this.
>
> Good point, should leverage existing code.

One thing you can also easily get from this library is WordNet's "exceptions", often irregular plurals (mouse/mice, addendum/addenda...). This is a very basic yet efficient kind of stemming, and these forms should be expanded with the same boost factor as the original term.

There are many other relationships in WordNet. Take a look at:
http://jws-champo.ac-toulouse.fr:8080/treebolic-wordnet/
The legends are here:
http://treebolic.sourceforge.net/en/browserwn.htm

Cheers,

--
Pierrick Brihaye, informaticien
Service régional de l'Inventaire
DRAC Bretagne
mailto:[EMAIL PROTECTED]
+33 (0)2 99 29 67 78

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
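The depth-dependent boost suggested in this message can be sketched as follows. This is only an illustration in plain Python, not Lucene or JWNL code: the names `hypernym_boost` and `expand_with_hypernyms`, the `base` and `decay` parameters, and the hypernym chain are all hypothetical. The defaults reproduce the 0.8 / 0.4 / 0.2 figures from the message, which halve at each level.

```python
def hypernym_boost(depth, base=0.8, decay=0.5):
    """Boost for a hypernym `depth` levels above the query term (1 = direct parent).

    Hypothetical helper: `base` and `decay` are illustrative defaults chosen to
    reproduce the 0.8 / 0.4 / 0.2 example; they are not WordNet or Lucene settings.
    """
    if depth < 1:
        raise ValueError("depth must be >= 1")
    return base * decay ** (depth - 1)


def expand_with_hypernyms(term, hypernym_chain, max_depth=3):
    """Build a Lucene-style query string: the term plus boosted hypernyms.

    `hypernym_chain` is an ordered list, nearest ancestor first.
    """
    parts = [term]
    for depth, hypernym in enumerate(hypernym_chain[:max_depth], start=1):
        parts.append(f"{hypernym}^{hypernym_boost(depth):g}")
    return " ".join(parts)


# Made-up hypernym chain, for illustration only (not read from WordNet).
print(expand_with_hypernyms("dog", ["canine", "carnivore", "mammal"]))
```

Keeping `base`, `decay`, and `max_depth` as parameters mirrors the wish for customization: a per-relationship table loaded from an external file could supply different values for hypernyms, hyponyms, and so on.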
Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
Pierrick Brihaye wrote:

> Hi,
>
> David Spencer wrote:
>
>> One example of expansion with the synonym boost set to 0.9 is the query "big dog" expands to:
>
> Interesting.
>
> Do you plan to add expansion on other WordNet relationships? Hypernyms and hyponyms would be a good starting point for thesaurus-like search, wouldn't they?

Good point, I hadn't considered this - but how would it work - just consider these 2 relationships "synonyms" (thus easier to use) or make it separate (too academic?)

> However, I'm afraid that this kind of feature would require refactoring, probably based on WordNet-dedicated libraries. JWNL (http://jwordnet.sourceforge.net/) may be a good candidate for this.

Good point, should leverage existing code.

> Thank you for your work.

thx,
Dave

> Cheers,
Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
Hi,

David Spencer wrote:

> One example of expansion with the synonym boost set to 0.9 is the query "big dog" expands to:

Interesting.

Do you plan to add expansion on other WordNet relationships? Hypernyms and hyponyms would be a good starting point for thesaurus-like search, wouldn't they?

However, I'm afraid that this kind of feature would require refactoring, probably based on WordNet-dedicated libraries. JWNL (http://jwordnet.sourceforge.net/) may be a good candidate for this.

Thank you for your work.

Cheers,

--
Pierrick Brihaye, informaticien
Service régional de l'Inventaire
DRAC Bretagne
mailto:[EMAIL PROTECTED]
+33 (0)2 99 29 67 78
Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
Daniel Naber <[EMAIL PROTECTED]> writes:

> On Wednesday 12 January 2005 01:47, David Spencer wrote:
>
>> Amusingly then, documents with the terms "liberal wienerwurst" match
>> "big dog"! :)
>
> There's something like frequency information in WordNet, it could probably
> be used to ignore the uncommon meanings.

If you just go search CiteSeer for "WordNet", you will find the output of every failed MS thesis experiment to improve retrieval performance by naive application of WordNet synsets.

But I like the query expansion code.

Ian
Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
On Wednesday 12 January 2005 01:47, David Spencer wrote:

> Amusingly then, documents with the terms "liberal wienerwurst" match
> "big dog"! :)

There's something like frequency information in WordNet, it could probably be used to ignore the uncommon meanings.

Regards
 Daniel

--
http://www.danielnaber.de
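One way to read the frequency idea above is to keep synonyms only from senses that account for a reasonable share of a word's usage. The sketch below is plain Python under loud assumptions: `common_synonyms` and `min_share` are hypothetical names, and both the usage counts and the grouping of synonyms into senses are invented for illustration, not taken from WordNet.

```python
def common_synonyms(senses, min_share=0.2):
    """Return synonyms from senses whose usage share is at least `min_share`.

    `senses` is a list of (usage_count, [synonyms]) pairs, one per meaning.
    Order is preserved and duplicates are dropped.
    """
    total = sum(count for count, _ in senses)
    if total == 0:
        return []
    kept = []
    for count, synonyms in senses:
        if count / total >= min_share:
            for synonym in synonyms:
                if synonym not in kept:
                    kept.append(synonym)
    return kept


# Invented counts and sense groupings, for illustration only.
dog_senses = [
    (42, ["hound"]),                                          # the animal
    (1, ["frank", "frankfurter", "wiener", "wienerwurst"]),   # the sausage
    (1, ["cad", "bounder", "blackguard"]),                    # the person
]
print(common_synonyms(dog_senses))
```

With a filter like this, a rare sense such as the sausage would no longer contribute "wienerwurst" to the expansion of "dog"; the right threshold would have to be tuned against real data.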
WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE
Erik Hatcher wrote:

> On Jan 10, 2005, at 6:54 PM, David Spencer wrote:
>
>> Hi...I wrote the WordNet sandbox code - but I'm not sure if I understand this thread. Are we saying that it does not work w/ the new WordNet data, or that code in Erik's book is better/more up to date etc?
>
> I have not tried the sandbox with any versions past WordNet 1.6. Karthik shows a Java API to it, which I have not used - only your code that parses the prolog files. So the book code explains exactly what is in the sandbox and describes WordNet 1.6 integration. Though WordNet has evolved.
>
>> If needed I can update the sandbox code..
>
> It'd be awesome to have current WordNet support - I haven't looked at what is involved in making it so.

I verified that the code works w/ the latest WordNet (2.0), and it does so, no problem. The relevant data from WordNet has not changed, so there's no need to upgrade WordNet for this package at least.

I added "query expansion", which takes in a simple query string and, for every term, adds its synonyms. There's an optional boost parameter that can be used to "penalize" synonyms if you want to use the heuristic that the user probably knows the right word.

One example of expansion with the synonym boost set to 0.9: the query "big dog" expands to:

big adult^0.9 bad^0.9 bighearted^0.9 boastful^0.9 boastfully^0.9 bounteous^0.9 bountiful^0.9 braggy^0.9 crowing^0.9 freehanded^0.9 giving^0.9 grown^0.9 grownup^0.9 handsome^0.9 large^0.9 liberal^0.9 magnanimous^0.9 momentous^0.9 openhanded^0.9 prominent^0.9 swelled^0.9 vainglorious^0.9 vauntingly^0.9
dog andiron^0.9 blackguard^0.9 bounder^0.9 cad^0.9 chase^0.9 click^0.9 detent^0.9 dogtooth^0.9 firedog^0.9 frank^0.9 frankfurter^0.9 frump^0.9 heel^0.9 hotdog^0.9 hound^0.9 pawl^0.9 tag^0.9 tail^0.9 track^0.9 trail^0.9 weenie^0.9 wiener^0.9 wienerwurst^0.9

Amusingly then, documents with the terms "liberal wienerwurst" match "big dog"! :)

Javadoc is here:
http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/WordNet/build/docs/api/org/apache/lucene/wordnet/package-summary.html

The new query expansion is here:
http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/WordNet/build/docs/api/org/apache/lucene/wordnet/SynExpand.html

Want to try it out? This page *expands* a query and prints out the result (but doesn't execute it yet).
http://www.searchmorph.com/kat/synonym.jsp?syn=big

CVS tree here:
http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/

If you just want to use a prebuilt index, it's here (1MB):
http://searchmorph.com/pub/syn_index.zip

The prebuilt jar file is here:
http://www.searchmorph.com/pub/lucene-wordnet-dev.jar

Redundant weblog entry here:
http://www.searchmorph.com/weblog/index.php?id=34

Hope y'all like it and someone finds it useful,
Dave

PS Oh - it may need the 1.5 dev branch of Lucene to work - I'm not positive, but I tried to remove deprecation warnings and doing so may have tied it to the latest code...
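The expansion behaviour described in this message can be approximated with a few lines of plain Python. This is a sketch under stated assumptions, not the sandbox `SynExpand` implementation: `expand_query` is a hypothetical function, and the synonym table is a tiny hand-made dictionary standing in for the WordNet-derived index.

```python
def expand_query(query, synonyms, boost=0.9):
    """Append each term's synonyms, marked with ^boost, right after the term.

    `synonyms` maps a lowercase term to an iterable of synonyms. Synonyms are
    emitted in alphabetical order; original terms are left unboosted.
    """
    parts = []
    for term in query.split():
        parts.append(term)
        for synonym in sorted(synonyms.get(term.lower(), [])):
            parts.append(f"{synonym}^{boost:g}")
    return " ".join(parts)


# Toy synonym table, a small subset picked by hand for illustration.
toy_synonyms = {
    "big": ["large", "grown", "adult"],
    "dog": ["hound", "wiener"],
}
print(expand_query("big dog", toy_synonyms))
```

A boost below 1.0 encodes the heuristic from the message: documents containing the user's own word should score higher than documents containing only a synonym.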
Re: SYNONYM + GOOGLE
On Jan 10, 2005, at 6:54 PM, David Spencer wrote:

> Hi...I wrote the WordNet sandbox code - but I'm not sure if I understand this thread. Are we saying that it does not work w/ the new WordNet data, or that code in Erik's book is better/more up to date etc?

I have not tried the sandbox with any versions past WordNet 1.6. Karthik shows a Java API to it, which I have not used - only your code that parses the prolog files. So the book code explains exactly what is in the sandbox and describes WordNet 1.6 integration. Though WordNet has evolved.

> If needed I can update the sandbox code..

It'd be awesome to have current WordNet support - I haven't looked at what is involved in making it so.

 Erik
Re: SYNONYM + GOOGLE
Erik Hatcher wrote:

> Karthik,
>
> Thanks for that info. I knew I was behind the times with WordNet using the sandbox code, but it was good enough for my purposes at the time. I will definitely try out the latest WordNet offerings in the future though.

Hi...I wrote the WordNet sandbox code - but I'm not sure if I understand this thread. Are we saying that it does not work w/ the new WordNet data, or that code in Erik's book is better/more up to date etc?

If needed I can update the sandbox code..

thx,
Dave
Re: SYNONYM + GOOGLE
Karthik,

Thanks for that info. I knew I was behind the times with WordNet using the sandbox code, but it was good enough for my purposes at the time. I will definitely try out the latest WordNet offerings in the future though.

 Erik
RE: SYNONYM + GOOGLE
Hi Erik,

Apologies... this may be a little off-topic for this forum, but it may help you with the next edition of Lucene in Action.

I was working with the Java WordNet Library (JWNL). While fiddling with the API, I found something interesting: the code attached to this message gets more synonyms than the WordNet indexed format available in the Lucene in Action zip file.

1) It needs the WordNet 2.0 dictionary installed
2) jwnl.jar from SourceForge [ http://sourceforge.net/project/showfiles.php?group_id=33824&package_id=33975&release_id=196864 ]

After successful compilation, the output for "watch" is:

ORIGINAL : "watch" OR "analog_watch" OR "digital_watch" OR "hunter" OR "hunting_watch" OR "pendulum_watch" OR "pocket_watch" OR "stem-winder" OR "wristwatch" OR "wrist_watch"

FORMATTED : "watch" OR "analog watch" OR "digital watch" OR "hunter" OR "hunting watch" OR "pendulum watch" OR "pocket watch"

Check this out; maybe you will come up with brilliant ideas.

With regards,
Karthik
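The ORIGINAL-to-FORMATTED step shown in Karthik's output (multi-word lemmas such as "analog_watch" turned into quoted phrases joined by OR) can be sketched in plain Python. The attached JWNL code is not part of the archive, so this is an assumption about the intent rather than a reconstruction of it: `format_or_query` is a hypothetical name, and the duplicate removal is an addition of this sketch.

```python
def format_or_query(lemmas):
    """Turn WordNet-style lemmas into a quoted OR query.

    Underscores become spaces so that multi-word lemmas read as phrases.
    Duplicates are dropped while keeping the first-seen order (an assumption
    of this sketch, not something stated in the message).
    """
    phrases = []
    for lemma in lemmas:
        phrase = lemma.replace("_", " ")
        if phrase not in phrases:
            phrases.append(phrase)
    return " OR ".join(f'"{phrase}"' for phrase in phrases)


# A few lemmas copied from the ORIGINAL line of the message.
lemmas = ["watch", "analog_watch", "digital_watch", "wristwatch", "wrist_watch"]
print(format_or_query(lemmas))
```

Quoting each entry matters once underscores are replaced: an unquoted `analog watch` would be parsed as two separate terms rather than one phrase.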
Re: SYNONYM + GOOGLE
On Jan 10, 2005, at 5:33 AM, Karthik N S wrote:

> If u search Google using '~shoes', It returns hits based on the
> Synonym's
>
> [ I know there is a Synonym Wordnet based Lucene Package in the
> sandbox
>
> http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/ ]
>
> Can this be achieved in Lucene ,If so How ???

Yes, it can be achieved.

Not quite synonyms, but various forms of the same word can be found in this example, like this search for similar (see the highlighted variations):

http://www.lucenebook.com/search?query=similar

This is accomplished using the Snowball stemmer filter found in the sandbox.

For synonyms, you have lots of options. In Lucene in Action I demonstrate custom analyzers that inject synonyms using the WordNet database (from the sandbox). From the source code distribution of LIA:

% ant SynonymAnalyzerViewer
Buildfile: build.xml

SynonymAnalyzerViewer:
     [echo]
     [echo] Using a custom SynonymAnalyzer, two fixed strings are
     [echo] analyzed with the results displayed. Synonyms, from the
     [echo] WordNet database, are injected into the same positions
     [echo] as the original words.
     [echo]
     [echo] See the "Analysis" chapter for more on synonym injection and
     [echo] position increments. The "Tools and extensions" chapter covers
     [echo] the WordNet feature found in the Lucene sandbox.
     [echo]
    [input] Press return to continue...
     [echo] Running lia.analysis.synonym.SynonymAnalyzerViewer...
     [java] 1: [quick] [warm] [straightaway] [spry] [speedy] [ready] [quickly] [promptly] [prompt] [nimble] [immediate] [flying] [fast] [agile]
     [java] 2: [brown] [brownness] [brownish]
     [java] 3: [fox] [trick] [throw] [slyboots] [fuddle] [fob] [dodger] [discombobulate] [confuse] [confound] [befuddle] [bedevil]
     [java] 4: [jumps]
     [java] 5: [over] [o] [across]
     [java] 6: [lazy] [faineant] [indolent] [otiose] [slothful]
     [java] 7: [dogs]
...

The phrase analyzed was "The quick brown fox jumps over the lazy dogs". Why no synonyms for "jumps" and "dogs"? WordNet has synonyms for "jump" and "dog", but not the plural forms. Stemming would be a necessary step in achieving full synonym look-up, though this would need to be done carefully as the stem of a word is not necessarily a real word itself - so you'd probably want to stem the synonym database also to ensure accurate lookup.

Also notice the semantically incorrect synonyms that appear for the animal fox ("confuse", for example). Be careful! :)

 Erik
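The gap Erik points out ("jumps" and "dogs" get no synonyms because only the base forms are in the table) can be illustrated with a small fallback lookup. This is a plain-Python sketch, not the SynonymAnalyzer from Lucene in Action: the function names and the toy table are hypothetical, and the suffix rules are a deliberately crude stand-in for a real stemmer such as Snowball.

```python
def candidate_base_forms(word):
    """Yield the word itself, then crude guesses at its base form.

    These suffix rules are illustrative only and are not a real stemmer.
    """
    yield word
    if word.endswith("ies") and len(word) > 4:
        yield word[:-3] + "y"
    if word.endswith("es") and len(word) > 3:
        yield word[:-2]
    if word.endswith("s") and len(word) > 2:
        yield word[:-1]


def lookup_synonyms(word, synonyms):
    """Return (matched_form, synonyms) for the first candidate found in the table."""
    for form in candidate_base_forms(word.lower()):
        if form in synonyms:
            return form, synonyms[form]
    return word, []


# Toy table with invented entries, for illustration only.
toy_table = {"jump": ["leap", "spring"], "dog": ["hound"]}
print(lookup_synonyms("jumps", toy_table))
print(lookup_synonyms("dogs", toy_table))
print(lookup_synonyms("over", toy_table))
```

Because a guess is accepted only when it actually exists in the table, this sketch sidesteps the problem of stems that are not real words; a proper stemmer would instead require stemming the synonym database as well, as the message suggests.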
SYNONYM + GOOGLE
Hi guys,

Apologies. Does Lucene have synonym functionality like Google's? If you search Google using '~shoes', it returns hits based on synonyms.

[ I know there is a WordNet-based synonym package for Lucene in the sandbox: http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/ ]

Can this be achieved in Lucene, and if so, how?

Thanks in advance,
Karthik

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]