On 17/04/2009 7:32 PM, Clarendon wrote:
Dear John Machin

I presume that you replied to me instead of the list accidentally.


So sorry about the typo. It should be: "the program should *see* that
the designated *words* are..."

"a long way" has two parentheses to the left -- (VP (DT -- before it
hits a separate group -- VBD came).

Like I said, the parentheses are an artifact of one particular visual representation of the parse tree. Your effort at clarification has introduced new unexplained terminology ("separate group"). BTW if you plan to persist with parentheses, you might at least display the tree in a somewhat more consistent fashion, discovering in the process that you are two parentheses short:
(ROOT
    (S
        (NP
            (PRP I)
        )
        (VP
            (VBD came)
            (NP
                (DT a)
                (JJ long)
                (NN way)
            )
            (PP
                (IN in)
                (S
                    (VP
                        (VBG changing)
                        (NP
                            (PRP$ my)
                            (NN habit)
                        )
                    )
                )
            )
        )
Now look at this:
ROOT
    S
        NP
            PRP I
        VP
            VBD came
            NP
                DT a
                JJ long
                NN way
            PP
                IN in
                etc etc
No parentheses, and no loss of information.
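
To make the point concrete, here is a minimal sketch that reads the parenthesised form and prints the indented form. It assumes a balanced input string and uses only the standard library; `parse` and `outline` are names made up for this illustration, not NLTK functions.

```python
import re

def parse(text):
    """Turn a balanced bracketed tree into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = tokens[pos]              # node type, e.g. NP
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())  # nested parse node
            else:
                children.append(tokens[pos])  # terminal word
                pos += 1
        pos += 1                         # consume the closing ")"
        return (label, children)

    return node()

def outline(tree, depth=0):
    """Return the tree as indented lines, with no parentheses at all."""
    label, children = tree
    words = [c for c in children if isinstance(c, str)]
    lines = ["    " * depth + " ".join([label] + words)]
    for child in children:
        if not isinstance(child, str):
            lines.extend(outline(child, depth + 1))
    return lines

text = "(VP (VBD came) (NP (DT a) (JJ long) (NN way)))"
print("\n".join(outline(parse(text))))
```

Indentation depth carries exactly what the nesting of parentheses carried, which is why nothing is lost.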

In fact, if you keep the parentheses and lose all whitespace except the space between each node-type and its terminal word, you'll see that the parenthesis notation is just one way of serialising the tree.
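
A rough sketch of that squeeze, assuming the tree is held in a plain string. The helper names `compact` and `unclosed` are hypothetical, invented for this example.

```python
import re

pretty = """
(VP
    (VBD came)
    (NP
        (DT a)
        (JJ long)
        (NN way)
    )
)
"""

def compact(text):
    """Drop all whitespace except the space between a node type and its word."""
    text = " ".join(text.split())                 # collapse runs of whitespace
    return re.sub(r"\s*([()])\s*", r"\1", text)   # no spaces next to parentheses

def unclosed(text):
    """Count opening parentheses that have no matching close."""
    return text.count("(") - text.count(")")

print(compact(pretty))
print(unclosed("(S (NP (PRP I))"))
```

The compact string and the indented display describe the same tree, and a simple count like `unclosed` is enough to notice when a display is short of closing parentheses.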

You have a tree structure, with the parsed information built on top of the words (terminals). A very quick flip through the NLTK tutorial gave me the impression that it would be highly unlikely not to have all you need -- and a bazillion other things, which is probably why you can't find what you want :-) I certainly saw having parents mentioned as an option.

Suggestions:
1. Get a pencil and a piece of paper, write "ROOT" at the top in the centre, and write "I came a long way in ......" spaced across the bottom. Fill in the parse tree.
2. Express your requirement in terms of moving around the tree, following pointers to parent, left/elder sibling (if any), right/younger sibling (if any), and children. E.g. the 3 parse nodes for "a long way" are "DT JJ NN" and their parent is "NP". NP's left sibling is a VBD node ("came") and its right sibling is a PP ("in .....").
3. Then have another look at the NLTK docs
4. Ask questions on the NLTK mailing list.
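
For suggestion 2, here is a minimal sketch of what "moving around the tree" can look like in code. The `Node` class is hypothetical and written from scratch for illustration; it is not NLTK's API, so check the NLTK docs for whatever equivalent it provides. The PP is cut off after "in", as in the display above.

```python
class Node:
    """A parse node with pointers to its parent and children."""

    def __init__(self, label, children=(), word=None):
        self.label = label        # node type, e.g. "NP"
        self.word = word          # terminal word, if this is a leaf
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self

    def _sibling(self, offset):
        if self.parent is None:
            return None
        siblings = self.parent.children
        i = next(k for k, s in enumerate(siblings) if s is self) + offset
        return siblings[i] if 0 <= i < len(siblings) else None

    def left_sibling(self):
        return self._sibling(-1)

    def right_sibling(self):
        return self._sibling(+1)

    def words(self):
        """Collect the terminal words under this node, left to right."""
        if self.word is not None:
            return [self.word]
        return [w for c in self.children for w in c.words()]

noun_phrase = Node("NP", [Node("DT", word="a"),
                          Node("JJ", word="long"),
                          Node("NN", word="way")])
prep_phrase = Node("PP", [Node("IN", word="in")])  # rest of the PP omitted
verb_phrase = Node("VP", [Node("VBD", word="came"), noun_phrase, prep_phrase])

dt = noun_phrase.children[0]
print(dt.parent.label)                     # parent of "a"
print(noun_phrase.left_sibling().words())  # words under NP's left sibling
print(noun_phrase.right_sibling().label)   # label of NP's right sibling
```

Once the requirement is phrased as "go to the parent, look at the left sibling", it no longer depends on counting parentheses.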

HTH,
John

--
http://mail.python.org/mailman/listinfo/python-list
