Hi Ben,

Just a small addition to Sid's note. Some of this I'm certain you
already know, but I'll go into a few of the details for anyone else
who might be wondering about the same things...

Indeed we do have a virtual root node. In previous versions of
WordNet (pre-3.0) there were in fact multiple noun hierarchies and
many verb hierarchies, so it was not uncommon for the path-based
measures to be unable to measure similarity between pairs of nouns or
pairs of verbs. We decided to allow the user to enable a virtual root
node that would sit on top of the nouns, and another one to sit on top
of the verbs, making sure that all nouns were connected to each other
and that all verbs were connected to each other as well.
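To make the virtual root idea concrete, here is a minimal Python
sketch over a made-up hypernym table. The synset names, the `*ROOT*`
label and the helper functions are hypothetical illustrations, not
WordNet::Similarity's internals. Two small "verb" trees share no
common subsumer, so no is-a path joins them until every top-level node
is hung off a single virtual root.

```python
from collections import deque

# Hypothetical hypernym table (child -> list of parents). The two trees
# stand in for separate verb hierarchies with no shared top node.
HYPERNYMS = {
    "run": ["travel"],
    "walk": ["travel"],
    "travel": [],
    "think": ["cogitate"],
    "cogitate": [],
}

VIRTUAL_ROOT = "*ROOT*"  # hypothetical label for the virtual root


def add_virtual_root(hypernyms):
    """Return a copy in which every top-level node gets the virtual root as parent."""
    graph = {node: list(parents) for node, parents in hypernyms.items()}
    for parents in graph.values():
        if not parents:
            parents.append(VIRTUAL_ROOT)
    graph[VIRTUAL_ROOT] = []
    return graph


def ancestor_distances(hypernyms, node):
    """Map node and each of its ancestors to the fewest is-a steps needed to reach it."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for parent in hypernyms[current]:
            if parent not in dist:
                dist[parent] = dist[current] + 1
                queue.append(parent)
    return dist


def path_length(hypernyms, a, b):
    """Edges on the shortest path through a common subsumer, or None if there is none."""
    da = ancestor_distances(hypernyms, a)
    db = ancestor_distances(hypernyms, b)
    common = da.keys() & db.keys()
    if not common:
        return None
    return min(da[c] + db[c] for c in common)


print(path_length(HYPERNYMS, "run", "think"))                    # no common subsumer
print(path_length(add_virtual_root(HYPERNYMS), "run", "think"))  # joined via *ROOT*
```

Leaving out `add_virtual_root` plays the role of turning the option
off: the cross-tree pair then simply has no path to measure.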

Now, in WordNet 3.0 all the nouns were in fact brought together into a
single hierarchy, more or less rendering our virtual root for nouns
moot. However, the verbs remain separated (into hundreds of
hierarchies), so our virtual root node remains relevant there, and it
is something that a user can turn on or off at their discretion.

So, if you are using WordNet 2.1, you are getting a virtual root node
for your nouns and your verbs. If you are using WordNet 3.0, you are
getting a virtual root node for your verbs. The goal of the virtual
root node was exactly as you surmise: to make the nouns completely
connected and to make the verbs completely connected. Do note that we
do not add any connections between nouns and verbs, so if you try to
compute a path-based (path, wup, lch) or information-content-based
(res, lin, jcn) measure between a noun and a verb, it will fail
because they aren't connected.
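The failure modes described here (and in Sid's note quoted below) can
be pictured as a guard in front of a hierarchical measure. This is a
minimal sketch, not WordNet::Similarity's code: `guarded_score` and
`compute` are hypothetical names, and the one-letter part-of-speech
codes follow WordNet's n/v/a/r convention. Adjectives and adverbs
raise an error, and a noun-verb pair produces a warning plus the
-1000000 "minus infinity" score Sid mentions.

```python
import warnings

HIERARCHICAL_POS = {"n", "v"}  # only nouns and verbs have is-a hierarchies
MINUS_INFINITY = -1000000      # sentinel score for noun-verb pairs, per Sid's note


def guarded_score(pos1, pos2, compute):
    """Call compute() only when both words are nouns or both are verbs (hypothetical helper)."""
    for pos in (pos1, pos2):
        if pos not in HIERARCHICAL_POS:
            # adjectives ('a') and adverbs ('r') have no is-a hierarchy to walk
            raise ValueError(f"part of speech {pos!r} has no is-a hierarchy")
    if pos1 != pos2:
        # nouns and verbs live in separate hierarchies with no links between them
        warnings.warn(f"no is-a path between {pos1!r} and {pos2!r}")
        return MINUS_INFINITY
    return compute()
```

Here `compute` would be whichever path-based or
information-content-based measure is being called.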

Adjectives and adverbs... yikes. :) The only measures that will work
for those are our gloss-based methods (lesk, vector, vector-pairs),
because they don't rely on a hierarchy. hso will probably work too,
although I can't say I've ever tried that.
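The core of a gloss-based measure is an overlap score between two
definitions, which doesn't care about part of speech at all. Below is
a simplified sketch in the spirit of lesk's phrasal overlaps: each
shared run of n consecutive words scores n squared. It compares only
the two glosses themselves and skips the related-synset glosses and
stop-word handling of the real measure; `phrasal_overlap` and the
example glosses are made up for illustration.

```python
import re
from difflib import SequenceMatcher


def phrasal_overlap(gloss1, gloss2):
    """Sum of squared lengths of the shared word sequences between two glosses."""
    a = re.findall(r"[a-z]+", gloss1.lower())
    b = re.findall(r"[a-z]+", gloss2.lower())
    score = 0
    while True:
        match = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
            0, len(a), 0, len(b))
        if match.size == 0:
            return score
        score += match.size ** 2
        # Replace matched words with unique placeholders so they cannot match again.
        a[match.a:match.a + match.size] = [object() for _ in range(match.size)]
        b[match.b:match.b + match.size] = [object() for _ in range(match.size)]


# Made-up glosses for an adjective and an adverb sharing a three-word phrase.
print(phrasal_overlap("moving with great speed", "with great speed"))
```

Squaring the phrase length rewards one three-word overlap (9) over
three scattered single-word overlaps (3).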

In fact, one of the big reasons for starting development of the lesk
and vector measures in the first place was realizing that there were
many pairs of concepts that could not be measured via the similarity
measures. The main limitation was that you could not measure "mixed"
pairs of words (like nouns and verbs, adjectives and nouns, etc.). I
think those kinds of measurements are very natural and often what
people really want to do, so lesk and vector hopefully fill that
need.
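As an illustration of how a vector-style measure can score a mixed
noun-verb pair, here is a simplified sketch: each gloss becomes the
sum of the co-occurrence vectors of its words, and two glosses are
compared by cosine. The co-occurrence counts, glosses and function
names are hypothetical, and the real vector measure also draws on the
glosses of related synsets, which this sketch leaves out.

```python
import math
from collections import Counter

# Hypothetical word co-occurrence counts: word -> {context word: count}.
COOC = {
    "dog": {"bark": 4, "pet": 3},
    "bark": {"dog": 4, "loud": 2},
    "loud": {"bark": 2, "noise": 3},
    "noise": {"loud": 3},
    "pet": {"dog": 3},
}


def gloss_vector(gloss):
    """Add up the co-occurrence vectors of the words in a gloss (unknown words add nothing)."""
    vec = Counter()
    for word in gloss.lower().split():
        vec.update(COOC.get(word, {}))
    return vec


def cosine(u, v):
    """Cosine of two sparse vectors; 0.0 if either is empty."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


noun_gloss = gloss_vector("a pet dog")          # made-up noun gloss
verb_gloss = gloss_vector("make a loud noise")  # made-up verb gloss
print(cosine(noun_gloss, verb_gloss))
```

The two glosses share no words, yet they still get a nonzero score
because "dog" and "loud" both co-occur with "bark".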

One of our great unfulfilled ambitions, btw, is to extend the hso
measure. This is a path-finding measure that isn't hierarchical; that
is, it goes weaving around WordNet hopping from nouns to verbs, etc.,
and it has a lot of really neat properties that would be interesting
to explore in more depth. This measure also allows you to cross
part-of-speech boundaries, so it also seems to get at the gist of your
question.
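A rough sketch of the idea behind hso: search a graph whose links are
tagged "up", "down" or "horizontal" (some of which cross
part-of-speech boundaries) and score a path as C - length - k * turns,
where a turn is a change of direction. C = 8, k = 1 and the five-link
limit are assumed constants here, the graph and `hso_like` are
hypothetical, and the real measure's rules about which direction
patterns are allowed, as well as its "strong" relations, are omitted.

```python
C, K, MAX_LEN = 8, 1, 5  # assumed constants for this sketch

# Hypothetical typed links; the horizontal link crosses from a verb to a noun.
EDGES = {
    "bark.v": [("bark.n", "horizontal")],
    "bark.n": [("bark.v", "horizontal"), ("noise.n", "up")],
    "noise.n": [("bark.n", "down"), ("sound.n", "up")],
    "sound.n": [("noise.n", "down")],
}


def hso_like(a, b):
    """Best C - length - K*turns over simple paths of at most MAX_LEN links; 0 if none."""
    best = 0

    def dfs(node, visited, length, turns, last_dir):
        nonlocal best
        if node == b and length > 0:
            best = max(best, C - length - K * turns)
            return
        if length == MAX_LEN:
            return
        for nxt, direction in EDGES.get(node, []):
            if nxt in visited:
                continue
            # count a turn whenever the link direction differs from the previous one
            turn = 1 if last_dir is not None and direction != last_dir else 0
            visited.add(nxt)
            dfs(nxt, visited, length + 1, turns + turn, direction)
            visited.remove(nxt)

    dfs(a, {a}, 0, 0, None)
    return best


print(hso_like("bark.v", "sound.n"))  # horizontal, up, up: 3 links, 1 turn
```

Because the horizontal link is just another edge, the search has no
trouble starting at a verb and ending at a noun.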

So those are a few more random thoughts - these are very interesting
issues to me, so additional discussion, ideas, etc. are most welcome
from you or anyone else reading this...

Cordially,
Ted


On Wed, Mar 5, 2008 at 12:21 AM, Siddharth Patwardhan <[EMAIL PROTECTED]> wrote:
> On Tue, 2008-03-04 at 20:18 -0500, Ben Haskell wrote:
>  > On Tue, 4 Mar 2008, Quang Do Xuan wrote:
>  >
>  > > Hi all,
>  > >
>  > > I'm trying to get the shortest distance for all word pairs in WordNet
>  > > (say, all word pairs in the noun database index.noun).
>  >
>  > Question for Ted and/or Sid:
>  >
>  > Quang's query made me wonder: how does WN::Similarity deal with the fact
>  > that WordNet's not fully connected? The nouns are, but the other parts of
>  > speech are not. With verbs, (and formerly, with nouns), I recall some
>  > concept of a 'virtual' root node. How about adjectives and adverbs, which
>  > aren't hierarchical?
>  >
>  > Best,
>  > Ben
>
>  Hi Ben,
>
>  Most of the measures that rely on WordNet's is-a hierarchy are limited
>  in that respect. They do not cross part-of-speech boundaries, and
>  support only noun-noun or verb-verb similarity/relatedness.
>
>  Specifically, the infocontent-based and path-based measures -- res,
>  lin, jcn, lch, path and wup -- can only compute noun-noun or verb-verb
>  scores. For adjectives and adverbs an error is returned, and for
>  noun-verb, it returns a warning and a low negative score (-1000000)
>  representing minus-infinity.
>
>  The remaining measures do not rely on the is-a hierarchy, so they are
>  not affected. Interestingly, even though hso relies on finding paths in
>  WordNet, these paths use some of the inter-part-of-speech relations in
>  WordNet to traverse part-of-speech boundaries.
>
>  Hope this answers your questions.
>
>  -- Sid.

-- 
Ted Pedersen
http://www.d.umn.edu/~tpederse
