Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

2005-01-25 Thread Pierrick Brihaye
Hi,

David Spencer wrote:
>> Do you plan to add expansion on other WordNet relationships?
>> Hypernyms and hyponyms would be a good starting point for thesaurus-like
>> search, wouldn't it?
> Good point, I hadn't considered this - but how would it work - just
> consider these 2 relationships "synonyms" (thus easier to use) or make
> it separate (too academic?)

Well... the ideal case would be (easy) customization :-), from an
external text (XML?) file. Depending on the kind of relationship, the
boost factor could be adjusted when the query is expanded. The same goes
for relationship depth.

For example, a "father" hypernym could have a boost factor of 0.8, a
"grandfather" a boost factor of 0.4, and a "great-grandfather" a boost
factor of 0.2. Well, I wonder whether a logarithmic scale makes more
sense than a linear scale, but this should/would be customizable...

>> However, I'm afraid that this kind of feature would require
>> refactoring, probably based on WordNet-dedicated libraries. JWNL
>> (http://jwordnet.sourceforge.net/) may be a good candidate for this.
> Good point, should leverage existing code.

One thing you can also easily get from this library is WordNet's
"exceptions", often irregular plurals (mouse/mice, addendum/addenda...).
This is a very basic yet efficient kind of stemming; these forms should be
expanded with the same boost factor as the original term.
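
A minimal sketch of that idea, assuming the exception list has already been loaded into a plain map. The map below is a hand-written stand-in holding only the two pairs mentioned above, and the class and method names are hypothetical; a real implementation would read WordNet's exception data, for example through JWNL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class IrregularFormSketch {

    // Stand-in for WordNet's exception lists (assumption: base form -> irregular forms).
    static final Map<String, List<String>> IRREGULAR_FORMS = new HashMap<>();
    static {
        IRREGULAR_FORMS.put("mouse", Arrays.asList("mice"));
        IRREGULAR_FORMS.put("addendum", Arrays.asList("addenda"));
    }

    /** Returns the term followed by its irregular forms; no boost suffix is added,
     *  so every form keeps the same (default) boost as the original term. */
    static List<String> withIrregularForms(String term) {
        List<String> terms = new ArrayList<>();
        terms.add(term);
        terms.addAll(IRREGULAR_FORMS.getOrDefault(term, Collections.emptyList()));
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", withIrregularForms("mouse")));
    }
}
```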

Well, there are many other relationships in WordNet. Take a look at:
http://jws-champo.ac-toulouse.fr:8080/treebolic-wordnet/
The legends are here:
http://treebolic.sourceforge.net/en/browserwn.htm

Cheers,
--
Pierrick Brihaye, informaticien
Service régional de l'Inventaire
DRAC Bretagne
mailto:[EMAIL PROTECTED]
+33 (0)2 99 29 67 78
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

2005-01-24 Thread David Spencer
Pierrick Brihaye wrote:
> Hi,
>
> David Spencer wrote:
>> One example of expansion with the synonym boost set to 0.9 is the
>> query "big dog" expands to:
>
> Interesting.
>
> Do you plan to add expansion on other WordNet relationships? Hypernyms
> and hyponyms would be a good starting point for thesaurus-like search,
> wouldn't it?

Good point, I hadn't considered this - but how would it work - just
consider these 2 relationships "synonyms" (thus easier to use) or make
it separate (too academic?)

> However, I'm afraid that this kind of feature would require refactoring,
> probably based on WordNet-dedicated libraries. JWNL
> (http://jwordnet.sourceforge.net/) may be a good candidate for this.

Good point, should leverage existing code.

> Thank you for your work.

thx,
 Dave

> Cheers,



Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

2005-01-19 Thread Pierrick Brihaye
Hi,

David Spencer wrote:
> One example of expansion with the synonym boost set to 0.9 is the query
> "big dog" expands to:

Interesting.

Do you plan to add expansion on other WordNet relationships? Hypernyms
and hyponyms would be a good starting point for thesaurus-like search,
wouldn't it?

However, I'm afraid that this kind of feature would require refactoring, 
probably based on WordNet-dedicated libraries. JWNL 
(http://jwordnet.sourceforge.net/) may be a good candidate for this.

Thank you for your work.
Cheers,
--
Pierrick Brihaye, informaticien
Service régional de l'Inventaire
DRAC Bretagne
mailto:[EMAIL PROTECTED]
+33 (0)2 99 29 67 78


Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

2005-01-14 Thread Ian Soboroff
Daniel Naber <[EMAIL PROTECTED]> writes:

> On Wednesday 12 January 2005 01:47, David Spencer wrote:
>
>> Amusingly then, documents with the terms "liberal wienerwurst" match
>> "big dog"! :)
>
> There's something like frequency information in WordNet, it could probably 
> be used to ignore the uncommon meanings.

If you just go search CiteSeer for "WordNet", you will find the output
of every failed MS thesis experiment to improve retrieval performance
by naive application of WordNet synsets.

But I like the query expansion code.

Ian






Re: WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

2005-01-12 Thread Daniel Naber
On Wednesday 12 January 2005 01:47, David Spencer wrote:

> Amusingly then, documents with the terms "liberal wienerwurst" match
> "big dog"! :)

There's something like frequency information in WordNet, it could probably 
be used to ignore the uncommon meanings.
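
A rough sketch of that filtering step, assuming a usage count is available for each candidate synonym. The counts in `main` are invented for illustration and are not WordNet data; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CommonSenseFilterSketch {

    /** Keeps only the candidates whose usage count reaches minCount, in input order. */
    static List<String> keepCommon(Map<String, Integer> usageCounts, int minCount) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : usageCounts.entrySet()) {
            if (entry.getValue() >= minCount) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Invented counts for a few of the "dog" expansions, for illustration only.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("hound", 40);
        counts.put("chase", 12);
        counts.put("andiron", 1);
        counts.put("wienerwurst", 0);
        System.out.println(keepCommon(counts, 5));
    }
}
```

The threshold is a tuning knob: a higher value drops more of the rare senses but also risks discarding legitimate synonyms.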

Regards
 Daniel

-- 
http://www.danielnaber.de




WordNet code updated, now with query expansion -- Re: SYNONYM + GOOGLE

2005-01-11 Thread David Spencer
Erik Hatcher wrote:
> On Jan 10, 2005, at 6:54 PM, David Spencer wrote:
>> Hi...I wrote the WordNet sandbox code - but I'm not sure if I
>> understand this thread. Are we saying that it does not work w/ the new
>> WordNet data, or that code in Erik's book is better/more up to date etc?
>
> I have not tried the sandbox with any versions past WordNet 1.6.
> Karthik shows a Java API to it, which I have not used - only your code
> that parses the prolog files.  So the book code explains exactly what is
> in the sandbox and describes WordNet 1.6 integration.  Though WordNet
> has evolved.
>
>> If needed I can update the sandbox code..
>
> It'd be awesome to have current WordNet support - I haven't looked at
> what is involved in making it so.

I verified that the code works with the latest WordNet (2.0) without any
problem. The relevant data from WordNet has not changed, so there's no
need to upgrade WordNet for this package at least.

I added "query expansion", which takes a simple query string and, for
every term, adds its synonyms. There's an optional boost parameter that
can be used to "penalize" synonyms if you want to use the heuristic that
the user probably knows the right word.

For example, with the synonym boost set to 0.9, the query "big dog"
expands to:

big adult^0.9 bad^0.9 bighearted^0.9 boastful^0.9 boastfully^0.9 
bounteous^0.9 bountiful^0.9 braggy^0.9 crowing^0.9 freehanded^0.9 
giving^0.9 grown^0.9 grownup^0.9 handsome^0.9 large^0.9 liberal^0.9 
magnanimous^0.9 momentous^0.9 openhanded^0.9 prominent^0.9 swelled^0.9 
vainglorious^0.9 vauntingly^0.9
 dog andiron^0.9 blackguard^0.9 bounder^0.9 cad^0.9 chase^0.9 click^0.9 
detent^0.9 dogtooth^0.9 firedog^0.9 frank^0.9 frankfurter^0.9 frump^0.9 
heel^0.9 hotdog^0.9 hound^0.9 pawl^0.9 tag^0.9 tail^0.9 track^0.9 
trail^0.9 weenie^0.9 wiener^0.9 wienerwurst^0.9

Amusingly then, documents with the terms "liberal wienerwurst" match 
"big dog"! :)
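
The expansion step described above can be sketched on plain strings as follows. This is not the SynExpand class from the sandbox: it uses no Lucene classes, the synonym map is a tiny hand-written stand-in for the WordNet-derived index, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class SynonymExpansionSketch {

    /** Appends each term's synonyms, suffixed with ^boost, right after the term itself. */
    static String expand(String query, Map<String, List<String>> synonyms, float boost) {
        StringBuilder out = new StringBuilder();
        for (String term : query.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
            if (term.isEmpty()) {
                continue; // happens only for an empty query
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(term); // the original term keeps the default boost
            for (String synonym : synonyms.getOrDefault(term, Collections.emptyList())) {
                out.append(' ').append(synonym).append('^').append(boost);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hand-written stand-in for the synonym index (assumption, not WordNet data).
        Map<String, List<String>> synonyms = new HashMap<>();
        synonyms.put("big", Arrays.asList("large", "adult"));
        synonyms.put("dog", Arrays.asList("hound"));
        System.out.println(expand("big dog", synonyms, 0.9f));
    }
}
```

Because the boost is applied to every synonym regardless of sense, a sketch like this has the same weakness as the example above: unrelated senses of a word are expanded with the same weight as the intended one.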

Javadoc is here:
http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/WordNet/build/docs/api/org/apache/lucene/wordnet/package-summary.html
The new query expansion is here:
http://www.searchmorph.com/pub/jakarta-lucene-sandbox/contributions/WordNet/build/docs/api/org/apache/lucene/wordnet/SynExpand.html
Want to try it out? This page *expands* a query and prints out the 
result (but doesn't execute it yet).
http://www.searchmorph.com/kat/synonym.jsp?syn=big

CVS tree here:
http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/
If you just want to use a prebuilt index it's here (1MB):
http://searchmorph.com/pub/syn_index.zip
The prebuilt jar file is here:
http://www.searchmorph.com/pub/lucene-wordnet-dev.jar
Redundant weblog entry here:
http://www.searchmorph.com/weblog/index.php?id=34
Hope y'all like it and someone finds it useful,
  Dave
PS
 Oh - it may need the 1.5 dev branch of Lucene to work. I'm not
positive, but I tried to remove deprecation warnings, and doing so may
have tied it to the latest code...



Re: SYNONYM + GOOGLE

2005-01-10 Thread Erik Hatcher
On Jan 10, 2005, at 6:54 PM, David Spencer wrote:
> Hi...I wrote the WordNet sandbox code - but I'm not sure if I
> understand this thread. Are we saying that it does not work w/ the new
> WordNet data, or that code in Erik's book is better/more up to date
> etc?

I have not tried the sandbox with any versions past WordNet 1.6.
Karthik shows a Java API to it, which I have not used - only your code
that parses the prolog files.  So the book code explains exactly what
is in the sandbox and describes WordNet 1.6 integration.  Though
WordNet has evolved.

> If needed I can update the sandbox code..

It'd be awesome to have current WordNet support - I haven't looked at
what is involved in making it so.

Erik


Re: SYNONYM + GOOGLE

2005-01-10 Thread David Spencer
Erik Hatcher wrote:
> Karthik,
> Thanks for that info.  I knew I was behind the times with WordNet using
> the sandbox code, but it was good enough for my purposes at the time.
> I will definitely try out the latest WordNet offerings in the future
> though.
>
> Erik

Hi... I wrote the WordNet sandbox code, but I'm not sure I understand
this thread. Are we saying that it does not work with the new WordNet
data, or that the code in Erik's book is better or more up to date?

If needed I can update the sandbox code.

thx,
 Dave


Re: SYNONYM + GOOGLE

2005-01-10 Thread Erik Hatcher
Karthik,
Thanks for that info.  I knew I was behind the times with WordNet using  
the sandbox code, but it was good enough for my purposes at the time.   
I will definitely try out the latest WordNet offerings in the future  
though.

Erik


RE: SYNONYM + GOOGLE

2005-01-10 Thread Karthik N S
Hi Erik,

Apologies...

I may be a little off-topic for this forum, but this may help you with the
next version of Lucene in Action.

I was working with the Java WordNet Library and, while fiddling with its
API, found something interesting:

the code attached to this message gets more synonyms than the WordNet
index format available in the Lucene in Action zip file.

1) It needs the WordNet 2.0 dictionary installed

2) jwnl.jar from SourceForge

[ http://sourceforge.net/project/showfiles.php?group_id=33824&package_id=33975&release_id=196864 ]

After successful compilation, try it with "watch":

ORIGINAL  : "watch" OR "analog_watch" OR "digital_watch" OR "hunter" OR
"hunting_watch" OR "pendulum_watch" OR
"pocket_watch" OR "stem-winder" OR "wristwatch" OR "wrist_watch"

FORMATTED : "watch" OR "analog watch" OR "digital watch" OR "hunter" OR
"hunting watch" OR "pendulum watch" OR "pocket watch"
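
The ORIGINAL to FORMATTED step shown above can be sketched as below: underscores in multi-word lemmas become spaces, each entry is quoted as a phrase, and the phrases are joined with OR. This is only an illustration of that formatting, not the attached code; the class and method names are hypothetical, and looking the lemmas up in WordNet is left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class OrQuerySketch {

    /** Turns WordNet-style lemmas ("analog_watch") into a quoted OR query. */
    static String toOrQuery(List<String> lemmas) {
        StringJoiner query = new StringJoiner(" OR ");
        for (String lemma : lemmas) {
            // WordNet joins the words of a multi-word lemma with underscores.
            String phrase = lemma.replace('_', ' ');
            query.add("\"" + phrase + "\"");
        }
        return query.toString();
    }

    public static void main(String[] args) {
        List<String> lemmas = Arrays.asList("watch", "analog_watch", "digital_watch");
        System.out.println(toOrQuery(lemmas));
    }
}
```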


Check this out; maybe you will come up with brilliant ideas.


With regards,
Karthik


Re: SYNONYM + GOOGLE

2005-01-10 Thread Erik Hatcher
On Jan 10, 2005, at 5:33 AM, Karthik N S wrote:
> If u search Google  using  '~shoes',  It returns  hits  based on the
> Synonym's
>
> [ I know there is a Synonym Wordnet  based Lucene Package in the sandbox
>
> http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/ ]
>
> Can this be achieved in Lucene ,If so How ???

Yes, it can be achieved.  Not quite synonyms, but various forms of the
same word can be found in this example, like this search for "similar"
(see the highlighted variations):

http://www.lucenebook.com/search?query=similar

This is accomplished using the Snowball stemmer filter found in the
sandbox.  For synonyms, you have lots of options.  In Lucene in Action
I demonstrate custom analyzers that inject synonyms using the WordNet
database (from the sandbox).  From the source code distribution of LIA:

% ant SynonymAnalyzerViewer
Buildfile: build.xml

SynonymAnalyzerViewer:
 [echo]
 [echo]   Using a custom SynonymAnalyzer, two fixed strings are
 [echo]   analyzed with the results displayed.  Synonyms, from the
 [echo]   WordNet database, are injected into the same positions
 [echo]   as the original words.
 [echo]
 [echo]   See the "Analysis" chapter for more on synonym injection and
 [echo]   position increments.  The "Tools and extensions" chapter covers
 [echo]   the WordNet feature found in the Lucene sandbox.
 [echo]
[input] Press return to continue...

 [echo] Running lia.analysis.synonym.SynonymAnalyzerViewer...
 [java] 1: [quick] [warm] [straightaway] [spry] [speedy] [ready] [quickly] [promptly] [prompt] [nimble] [immediate] [flying] [fast] [agile]
 [java] 2: [brown] [brownness] [brownish]
 [java] 3: [fox] [trick] [throw] [slyboots] [fuddle] [fob] [dodger] [discombobulate] [confuse] [confound] [befuddle] [bedevil]
 [java] 4: [jumps]
 [java] 5: [over] [o] [across]
 [java] 6: [lazy] [faineant] [indolent] [otiose] [slothful]
 [java] 7: [dogs]

...
The phrase analyzed was "The quick brown fox jumps over the lazy dogs".  
 Why no synonyms for "jumps" and "dogs"?  WordNet has synonyms for  
"jump" and "dog", but not the plural forms.  Stemming would be a  
necessary step in achieving full synonym look-up, though this would  
need to be done carefully as the stem of a word is not necessarily a  
real word itself - so you'd probably want to stem the synonym database  
also to ensure accurate lookup.
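
A minimal sketch of stemming both sides of the lookup, so that "dogs" and "dog" meet at the same key. The stemmer here is a deliberately naive placeholder that only strips a trailing "s"; it stands in for a real stemmer such as Snowball, and the synonym entries, class name and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class StemmedSynonymLookupSketch {

    /** Naive placeholder stemmer (assumption): lower-cases and strips one trailing "s". */
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        boolean plural = w.length() > 3 && w.endsWith("s") && !w.endsWith("ss");
        return plural ? w.substring(0, w.length() - 1) : w;
    }

    /** Re-keys the synonym database by stem, merging entries that share a stem. */
    static Map<String, Set<String>> stemKeys(Map<String, List<String>> synonyms) {
        Map<String, Set<String>> stemmed = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : synonyms.entrySet()) {
            stemmed.computeIfAbsent(stem(entry.getKey()), k -> new LinkedHashSet<>())
                   .addAll(entry.getValue());
        }
        return stemmed;
    }

    /** Stems the query term with the same stemmer before looking it up. */
    static Set<String> lookup(Map<String, Set<String>> stemmedIndex, String term) {
        return stemmedIndex.getOrDefault(stem(term), Collections.emptySet());
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = new HashMap<>();
        synonyms.put("dog", Arrays.asList("hound"));
        synonyms.put("jump", Arrays.asList("leap"));
        Map<String, Set<String>> index = stemKeys(synonyms);
        System.out.println(lookup(index, "dogs") + " " + lookup(index, "jumps"));
    }
}
```

The important design point is that the same stemmer is applied to the database keys and to the query terms; the stems never need to be real words as long as both sides agree.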

Also notice the semantically incorrect synonyms that appear for the  
animal fox ("confuse", for example).  Be careful!  :)

Erik


SYNONYM + GOOGLE

2005-01-10 Thread Karthik N S


Hi guys,

Apologies.

Does Lucene have synonym functionality like Google's?

If you search Google using '~shoes', it returns hits based on synonyms.

[ I know there is a WordNet-based synonym package for Lucene in the sandbox:

http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/ ]

Can this be achieved in Lucene? If so, how?

Thanks in advance,
Karthik

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]


