Dear all,
Thank you for your answers to my previous email, which were all very
interesting.
I suspect that the example I used did not reveal the whole picture, so
I would like to give it another try, hoping to receive feedback as
useful as the first time.
The question relates to the conditional probability assigned to a rule,
based on the training data, and it can be shown by the following very
simple example. Suppose that my grammar contains many rules of the
form "NOUN -> X", where X ranges over various nouns, among them the noun
"play", as well as the rule "VERB -> play" and many other rules. Assume
also that the training set contains 1 instance of the verb "play" and
many more (say 10) instances of the noun "play". If "VERB -> play" is the
only VERB rule seen in training, the probability attached to it in a
"generative" manner (i.e., conditioned on the head) is 1.0, while the one
attached to the rule "NOUN -> play" is less than 1.0. Thus, when it
comes to parsing a sentence containing this token, if there is no
other way of selecting between the two rules (i.e., using the rest of
the grammar), we would say that the word is a verb, despite the fact
that we have seen many more noun instances. Doesn't this seem odd?
Would it be better to condition on the appearance of the body, rather
than the head of the rule?
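To make the two estimates concrete, here is a minimal Python sketch.
Only the two counts for "play" come from the example above; the other
nouns, their counts, and the function names are made up purely for
illustration, and I assume "play" is the only verb in the training set.

```python
from collections import Counter

# (head, body) -> count. Only the two "play" counts come from the example;
# the other nouns and their counts are hypothetical filler.
rule_counts = Counter({
    ("VERB", "play"): 1,
    ("NOUN", "play"): 10,
    ("NOUN", "game"): 25,  # hypothetical
    ("NOUN", "book"): 15,  # hypothetical
})

def p_given_head(head, body):
    """Relative frequency of the rule among all rules with the same head."""
    total = sum(c for (h, _), c in rule_counts.items() if h == head)
    return rule_counts[(head, body)] / total

def p_given_body(head, body):
    """Relative frequency of the rule among all rules with the same body."""
    total = sum(c for (_, b), c in rule_counts.items() if b == body)
    return rule_counts[(head, body)] / total

for head in ("VERB", "NOUN"):
    print(head,
          round(p_given_head(head, "play"), 3),
          round(p_given_body(head, "play"), 3))
```

With these counts, conditioning on the head gives 1.0 for VERB and 0.2
for NOUN, while conditioning on the body gives about 0.09 for VERB and
0.91 for NOUN.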
A slightly different and more complex, though still artificial, example
is the following:
Given a set of training sentences:
Fat people eat accumulates.
Fat people eat often.
Animal fat is harmful.
Eat!
Leave!
and the induced grammar:
S -> NP VB often    freq=1  p=0.16
S -> NP VB harmful  freq=1  p=0.16
S -> NP VB          freq=1  p=0.16
S -> NP S VB        freq=1  p=0.16
S -> VB             freq=2  p=0.33
NP -> fat           freq=2  p=0.33
NP -> people        freq=2  p=0.33
NP -> ADJ NP        freq=2  p=0.33
ADJ -> fat          freq=1  p=0.5
ADJ -> animal       freq=1  p=0.5
VB -> eat           freq=3  p=0.5
VB -> accumulates   freq=1  p=0.16
VB -> is            freq=1  p=0.16
VB -> leave         freq=1  p=0.16
Parsing the sentence "fat people eat accumulates" produces two parses:
(S 0.16 (NP 0.33 (ADJ 0.5 fat) (NP 0.33 people))
        (S 0.33 (VB 0.5 eat))
        (VB 0.16 accumulates))
p = 0.16*0.33*0.5*0.33*0.33*0.5*0.16 = 0.00023
(S 0.16 (NP 0.33 fat)
        (S 0.16 (NP 0.33 people) (VB 0.5 eat))
        (VB 0.16 accumulates))
p = 0.16*0.33*0.16*0.33*0.5*0.16 = 0.00022
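As a check on the arithmetic, the two products can be recomputed with a
short Python sketch. The rule lists are read off the two trees; the
dictionary and function names are only for illustration. Each score is
computed twice: once with the rounded values listed in the grammar, and
once with the unrounded relative frequencies (freq divided by the total
for that head).

```python
from fractions import Fraction
from math import prod

# Rules used by each parse, read off the two trees.
parse_1 = ["S -> NP S VB", "NP -> ADJ NP", "ADJ -> fat", "NP -> people",
           "S -> VB", "VB -> eat", "VB -> accumulates"]
parse_2 = ["S -> NP S VB", "NP -> fat", "S -> NP VB", "NP -> people",
           "VB -> eat", "VB -> accumulates"]

# Probabilities exactly as listed (rounded) in the grammar.
rounded = {
    "S -> NP S VB": 0.16, "S -> NP VB": 0.16, "S -> VB": 0.33,
    "NP -> fat": 0.33, "NP -> people": 0.33, "NP -> ADJ NP": 0.33,
    "ADJ -> fat": 0.5, "VB -> eat": 0.5, "VB -> accumulates": 0.16,
}

# The same probabilities as exact relative frequencies.
exact = {
    "S -> NP S VB": Fraction(1, 6), "S -> NP VB": Fraction(1, 6),
    "S -> VB": Fraction(2, 6),
    "NP -> fat": Fraction(2, 6), "NP -> people": Fraction(2, 6),
    "NP -> ADJ NP": Fraction(2, 6),
    "ADJ -> fat": Fraction(1, 2), "VB -> eat": Fraction(3, 6),
    "VB -> accumulates": Fraction(1, 6),
}

def score(parse, table):
    """Product of the probabilities of all rules used in the parse."""
    return prod(table[rule] for rule in parse)

for name, parse in (("parse 1", parse_1), ("parse 2", parse_2)):
    print(name, round(score(parse, rounded), 5), score(parse, exact))
```

With the rounded values this reproduces 0.00023 and 0.00022. With the
exact fractions both products come out as 1/3888, so the small gap
between them is a rounding effect; either way, the head-conditioned
scores do not separate the two parses in any meaningful way.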
Calculating rule probabilities conditioned on the appearance of the body,
rather than the head, would trivially solve that problem.
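A rough Python sketch of what I mean, using the frequencies from the
induced grammar above. The variable and function names are only for
illustration, and I treat the product of the body-conditioned values
simply as a ranking score for a parse.

```python
from collections import defaultdict
from math import prod

# (head, body, freq) copied from the induced grammar.
rules = [
    ("S", "NP VB often", 1), ("S", "NP VB harmful", 1), ("S", "NP VB", 1),
    ("S", "NP S VB", 1), ("S", "VB", 2),
    ("NP", "fat", 2), ("NP", "people", 2), ("NP", "ADJ NP", 2),
    ("ADJ", "fat", 1), ("ADJ", "animal", 1),
    ("VB", "eat", 3), ("VB", "accumulates", 1), ("VB", "is", 1),
    ("VB", "leave", 1),
]

# Total frequency of each body, summed over all heads that produce it.
body_total = defaultdict(int)
for _, body, freq in rules:
    body_total[body] += freq

# Rule score conditioned on the body: freq(head -> body) / freq(body).
p_body = {(head, body): freq / body_total[body] for head, body, freq in rules}

# Rules used by the two parses of "fat people eat accumulates".
parse_1 = [("S", "NP S VB"), ("NP", "ADJ NP"), ("ADJ", "fat"),
           ("NP", "people"), ("S", "VB"), ("VB", "eat"),
           ("VB", "accumulates")]
parse_2 = [("S", "NP S VB"), ("NP", "fat"), ("S", "NP VB"),
           ("NP", "people"), ("VB", "eat"), ("VB", "accumulates")]

def score(parse):
    """Product of the body-conditioned scores of the rules in the parse."""
    return prod(p_body[rule] for rule in parse)

print("parse 1:", round(score(parse_1), 3))
print("parse 2:", round(score(parse_2), 3))
```

The only body shared by two heads is "fat" (NP twice, ADJ once), so the
first parse scores 1/3 and the second 2/3, and the two are clearly
separated.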
Despite the artificiality of this example, the main issue remains very real.
Any comments, suggestions, criticism, or pointers to related work are
very welcome!
George