Dear Javier,

Thank you for your reply to my second email.
Your answer to the simple example that I gave clarifies the point I am
trying to make, namely the use of priors over non-terminals (P(NOUN)
and P(VERB) in the example). Do probabilistic parsers take these priors
into account? If not, is there a reason why they do not need to? Are
there any relevant references?

Thanks again for your help!

George

PS. I apologise for the confusion that I caused regarding the use of
    "sentences" instead of trees, which was due to my effort to shorten
    my email. Indeed, I am interested in trees, and the probabilities are
    calculated on the basis of the parse trees of the training sentences,
    which I omitted for brevity.

"Francisco J. Diez" wrote:
> 
> Dear George:
> 
> As explained in one of the previous messages, you must distinguish the
> prior probability of a sentence, P(sentence), from the probability of a
> certain parse tree given a sentence P(parse-tree|sentence).
> 
> Given that each tree generates only one sentence,
> P(sentence|parse-tree-i)=1. If a sentence can be generated by only one
> parse tree, then P(parse-tree|sentence)=1.
> 
> If a sentence can be generated by two parse trees, the probability of
> each tree given that sentence is, according to Bayes' theorem,
> 
> P(parse-tree-1|sentence) = P(sentence|parse-tree-1) x P(parse-tree-1) /
>    [P(sentence|parse-tree-1) x P(parse-tree-1)
>         + P(sentence|parse-tree-2) x P(parse-tree-2)]
>    = P(parse-tree-1) / [P(parse-tree-1) + P(parse-tree-2)]
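> 
> A minimal numeric sketch of this formula in Python (the two priors
> below are made up for illustration; since each tree yields exactly one
> sentence, P(sentence|tree)=1 and the posterior is just the normalised
> prior of each tree):
> 
>     p_tree1 = 0.03   # hypothetical prior P(parse-tree-1)
>     p_tree2 = 0.01   # hypothetical prior P(parse-tree-2)
> 
>     # Bayes' theorem with P(sentence|tree) = 1 for both trees
>     p_tree1_given_s = p_tree1 / (p_tree1 + p_tree2)   # 0.75
>     p_tree2_given_s = p_tree2 / (p_tree1 + p_tree2)   # 0.25
>     print(p_tree1_given_s, p_tree2_given_s)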
> 
> > The question relates to the conditional probability assigned to a rule,
> > based on the training data, and it can be illustrated by the following
> > very simple example: Suppose my grammar contains many rules of the
> > form "NOUN -> X", where X ranges over various nouns (among them the
> > noun "play"), as well as the rule "VERB -> play" and many other
> > rules. Assume also that
> > the training set contains 1 instance of the verb "play" and many more
> > (say 10) instances of the noun "play". The probability attached to the
> > rule "VERB -> play" in a "generative" manner is 1.0,
> 
> This is true only if there is no other rule of the type "VERB -> some-verb".
> 
> > while the one
> > attached to the rule "NOUN -> play" is less than 1.0. When it
> > comes to parsing a sentence containing this token, if there is no
> > other way of selecting between the two rules (i.e., using the rest of
> > the grammar), we would say that the word is a verb, despite the fact
> > that we have seen many more noun instances. Doesn't this seem odd?
> 
> Again you must distinguish between P("play"|NOUN) and P(NOUN|"play"). In
> this example, in which the context does not indicate whether the
> word "play" is a noun or a verb,
> 
> P(NOUN|"play") = P("play"|NOUN) x P(NOUN) /
>      [P("play"|NOUN) x P(NOUN) + P("play"|VERB) x P(VERB)] .
> 
> According to your data set, P(NOUN) is more than ten times larger than
> P(VERB), and for this reason P(NOUN|"play") > P(VERB|"play"), despite
> the fact that P("play"|NOUN) < P("play"|VERB).
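> 
> A small Python sketch of this calculation, with counts in the spirit of
> your example (1 occurrence of "play" as a verb, 10 as a noun; the total
> of 50 noun tokens is an assumed figure, since you only say "many"):
> 
>     noun_tokens, verb_tokens = 50, 1       # assumed token counts per tag
>     play_as_noun, play_as_verb = 10, 1     # counts from the example
> 
>     p_play_given_noun = play_as_noun / noun_tokens   # 0.2  (< 1.0)
>     p_play_given_verb = play_as_verb / verb_tokens   # 1.0
>     p_noun = noun_tokens / (noun_tokens + verb_tokens)   # prior P(NOUN)
>     p_verb = verb_tokens / (noun_tokens + verb_tokens)   # prior P(VERB)
> 
>     # Bayes' theorem: the prior P(NOUN) outweighs the higher likelihood
>     evidence = p_play_given_noun * p_noun + p_play_given_verb * p_verb
>     p_noun_given_play = p_play_given_noun * p_noun / evidence   # ~0.909
>     p_verb_given_play = p_play_given_verb * p_verb / evidence   # ~0.091
>     print(p_noun_given_play, p_verb_given_play)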
> 
> > Would it be better to condition on the appearance of the body, rather
> > than the head of the rule?
> 
> I don't know exactly what you mean. Each of the probabilities in your
> grammar is the probability that a non-terminal (the left-hand side,
> which I assume is what you call the head) is replaced by a certain
> combination of terminals and non-terminals (the right-hand side, the
> body). In other words, you have the probability of the body given the
> head.
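> 
> For example, estimating these probabilities from rule counts read off a
> set of parse trees might look as follows (the counts are hypothetical):
> 
>     from collections import Counter
> 
>     # hypothetical (head, body) rule occurrences from a small treebank
>     rules = ([("NOUN", "play")] * 10 + [("VERB", "play")] * 1
>              + [("NOUN", "fat")] * 3)
> 
>     head_counts = Counter(head for head, body in rules)
>     rule_counts = Counter(rules)
> 
>     # P(body | head) = count(head -> body) / count(head)
>     p_body_given_head = {(h, b): c / head_counts[h]
>                          for (h, b), c in rule_counts.items()}
>     print(p_body_given_head[("NOUN", "play")])   # 10/13, about 0.77
>     print(p_body_given_head[("VERB", "play")])   # 1.0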
> 
> > A slightly different and more complex, though still artificial, example
> > is the following:
> >
> > Given a set of training sentences:
> > Fat people eat accumulates.
> > Fat people eat often.
> > Animal fat is harmful.
> > Eat!
> > Leave!
> >
> > and the induced grammar:
> > S -> NP VB often      freq=1   p=0.16
> > S -> NP VB harmful    freq=1   p=0.16
> > S -> NP VB            freq=1   p=0.16
> > S -> NP S VB          freq=1   p=0.16
> > S -> VB               freq=2   p=0.33
> 
> The sentence "Fat people eat accumulates" can be generated by two
> different parse trees. For ambiguous grammars, the probability of a tree
> is not the same as the probability of a sentence. Apparently you are
> interested in the probability of trees, but you are computing the
> probability of sentences. If you want to obtain the probability of a rule,
> you should use a set of trees, not a set of sentences.
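> 
> To make the distinction concrete, a toy sketch (the two tree
> probabilities are assumed numbers; in a PCFG each would be the product
> of the probabilities of the rules used in the tree):
> 
>     # two distinct parse trees that both yield the same sentence
>     p_tree = {"tree-1": 0.04, "tree-2": 0.02}   # assumed P(tree) values
> 
>     # the probability of the sentence is the sum over its parse trees
>     p_sentence = sum(p_tree.values())                        # 0.06
> 
>     # the probability of each tree given the sentence is its share
>     posterior = {t: p / p_sentence for t, p in p_tree.items()}
>     print(p_sentence)   # 0.06
>     print(posterior)    # {'tree-1': 0.666..., 'tree-2': 0.333...}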
> 
> You must also reconsider the assumption that the sentences in your
> dataset have been randomly generated from a grammar by selecting at each
> step one of the available replacements for a non-terminal.
> 
> In my opinion, you must have a careful definition of (the meaning of)
> your probabilities, and then build a coherent model that leads to those
> probabilities. Otherwise you will encounter lots of paradoxes and
> inconsistencies.
> 
> Regards
>   Javier Díez
> 
> --------------------------------------------------------------------
> F. J. Diez                       Phone: +34-91-3987161
> Dpto. Inteligencia Artificial    Fax:   +34-91-3986697
> UNED. Senda del Rey, 9           E-mail: [EMAIL PROTECTED]
> 28040 Madrid. Spain              WWW: http://www.ia.uned.es/~fjdiez
