Dear all,

Thank you for your answers to my previous email, which were all very 
interesting. 

I suspect that the example I used did not reveal the whole picture, so
I would like to give it another try, hoping to receive feedback as
useful as the first time.

The question concerns the conditional probability assigned to a rule,
estimated from the training data, and can be illustrated by the
following very simple example. Suppose my grammar contains many rules
of the form "NOUN -> X", where X ranges over various nouns, among them
the noun "play"; the rule "VERB -> play"; and many other rules. Suppose
also that the training set contains 1 instance of the verb "play" and
many more (say 10) instances of the noun "play". If "play" is the only
verb observed, the probability attached to the rule "VERB -> play" in a
"generative" manner (i.e., conditioned on the head) is 1.0, while the
one attached to the rule "NOUN -> play" is less than 1.0. Thus, when
parsing a sentence containing this token, if there is no other way of
choosing between the two rules (i.e., using the rest of the grammar),
we would conclude that the word is a verb, despite having seen many
more noun instances. Doesn't this seem odd? Would it be better to
condition on the body, rather than the head, of the rule?
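
To make the contrast concrete, here is a minimal sketch of the two
estimates for the "play" example. The counts for "play" (10 noun, 1 verb)
come from the example; the other noun counts and the function names are
invented purely for illustration.

```python
from collections import Counter
from fractions import Fraction

# (head, body) -> training count. Only the two "play" counts come from the
# example; the other NOUN rules are assumed, to stand in for "many nouns".
rule_counts = Counter({
    ("NOUN", "play"): 10,
    ("NOUN", "game"): 20,   # assumed
    ("NOUN", "ball"): 10,   # assumed
    ("VERB", "play"): 1,    # the only VERB rule seen in training
})

def head_conditioned(head, body):
    """P(body | head): rule count over the count of all rules with this head."""
    total = sum(c for (h, _), c in rule_counts.items() if h == head)
    return Fraction(rule_counts[(head, body)], total)

def body_conditioned(head, body):
    """P(head | body): rule count over the count of all rules with this body."""
    total = sum(c for (_, b), c in rule_counts.items() if b == body)
    return Fraction(rule_counts[(head, body)], total)

for head in ("NOUN", "VERB"):
    print(head, head_conditioned(head, "play"), body_conditioned(head, "play"))
```

With these assumed counts, conditioning on the head gives 1/4 for
"NOUN -> play" and 1 for "VERB -> play", whereas conditioning on the body
gives 10/11 and 1/11 respectively.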

A slightly different and more complex, though still artificial, example
is the following:

Given a set of training sentences:
Fat people eat accumulates.
Fat people eat often.
Animal fat is harmful.
Eat!
Leave!

and the induced grammar:
S -> NP VB often      freq=1   p=0.16
S -> NP VB harmful    freq=1   p=0.16
S -> NP VB            freq=1   p=0.16
S -> NP S VB          freq=1   p=0.16
S -> VB               freq=2   p=0.33

NP -> fat             freq=2   p=0.33
NP -> people          freq=2   p=0.33
NP -> ADJ NP          freq=2   p=0.33
ADJ -> fat            freq=1   p=0.5
ADJ -> animal         freq=1   p=0.5

VB -> eat             freq=3   p=0.5
VB -> accumulates     freq=1   p=0.16
VB -> is              freq=1   p=0.16
VB -> leave           freq=1   p=0.16

Parsing the sentence "fat people eat accumulates" with this grammar
produces two parses:
 
(S 0.16 (NP 0.33 (ADJ 0.5 fat) (NP 0.33 people))
        (S  0.33 (VB  0.5 eat))
        (VB 0.16 accumulates))
p=0.16*0.33*0.5*0.33*0.33*0.5*0.16=0.00023
 
(S 0.16 (NP 0.33 fat)
        (S  0.16 (NP 0.33 people) (VB 0.5 eat))
        (VB 0.16 accumulates))
p=0.16*0.33*0.16*0.33*0.5*0.16=0.00022
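
As a check on the arithmetic, the two products can be recomputed with
exact fractions. This sketch assumes that each p value in the grammar is
the rule's freq divided by the total freq for its head, written to two
decimals (so 0.16 stands for 1/6 and 0.33 for 2/6); the variable names are
made up for illustration.

```python
from fractions import Fraction as F
from math import prod

# First parse: "fat people" as ADJ NP, "eat" as S -> VB.
parse_adj_fat = [
    F(1, 6),  # S -> NP S VB
    F(2, 6),  # NP -> ADJ NP
    F(1, 2),  # ADJ -> fat
    F(2, 6),  # NP -> people
    F(2, 6),  # S -> VB
    F(3, 6),  # VB -> eat
    F(1, 6),  # VB -> accumulates
]

# Second parse: "fat" as NP, "people eat" as S -> NP VB.
parse_np_fat = [
    F(1, 6),  # S -> NP S VB
    F(2, 6),  # NP -> fat
    F(1, 6),  # S -> NP VB
    F(2, 6),  # NP -> people
    F(3, 6),  # VB -> eat
    F(1, 6),  # VB -> accumulates
]

p1 = prod(parse_adj_fat)
p2 = prod(parse_np_fat)
print(p1, p2)  # 1/3888 1/3888
```

Under that assumption the two parses tie exactly at 1/3888 (about
0.00026), so the small gap between 0.00023 and 0.00022 comes from writing
1/6 as 0.16. Either way, the head-conditioned probabilities do not favour
the second parse.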

Calculating rule probabilities conditioned on the appearance of the body,
rather than the head, would trivially solve that problem.
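
A minimal sketch of that alternative, using the frequencies of the
induced grammar: each rule is weighted by its freq divided by the total
freq of all rules sharing the same body, and a parse is scored by the
product of these weights. The function names are hypothetical, and the
product is treated here simply as a score for ranking parses.

```python
from collections import defaultdict
from fractions import Fraction
from math import prod

# (head, body) -> freq, copied from the induced grammar.
freq = {
    ("S", "NP VB often"): 1, ("S", "NP VB harmful"): 1, ("S", "NP VB"): 1,
    ("S", "NP S VB"): 1, ("S", "VB"): 2,
    ("NP", "fat"): 2, ("NP", "people"): 2, ("NP", "ADJ NP"): 2,
    ("ADJ", "fat"): 1, ("ADJ", "animal"): 1,
    ("VB", "eat"): 3, ("VB", "accumulates"): 1,
    ("VB", "is"): 1, ("VB", "leave"): 1,
}

# Total frequency of each body, summed over every head that produces it.
body_total = defaultdict(int)
for (_, body), n in freq.items():
    body_total[body] += n

def p_head_given_body(head, body):
    """Relative frequency of the head among all rules with this body."""
    return Fraction(freq[(head, body)], body_total[body])

def score(rules):
    """Product of body-conditioned weights over the rules used in a parse."""
    return prod(p_head_given_body(h, b) for h, b in rules)

# The two parses of "fat people eat accumulates", as lists of rules.
parse_adj_fat = [("S", "NP S VB"), ("NP", "ADJ NP"), ("ADJ", "fat"),
                 ("NP", "people"), ("S", "VB"), ("VB", "eat"),
                 ("VB", "accumulates")]
parse_np_fat = [("S", "NP S VB"), ("NP", "fat"), ("S", "NP VB"),
                ("NP", "people"), ("VB", "eat"), ("VB", "accumulates")]

print(score(parse_adj_fat), score(parse_np_fat))  # 1/3 2/3
```

The only ambiguous body in this grammar is "fat" (twice an NP, once an
ADJ), so the parse with "NP -> fat" scores 2/3 against 1/3 for the parse
with "ADJ -> fat".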

Despite the artificiality of this example, the main issue remains very real.

Any comments, suggestions, criticism, or pointers to related work are
very welcome!

George


