Hello,

We can assign a head word to every syntactic constituent (noun phrase, verb phrase, etc.). The head rules state which word is the head when the syntactic tree is traversed (from left to right or from right to left), as is done in the Java implementation of HeadRules. Appendix A of Collins' thesis explains this quite well, including an example. Perhaps you can check the Java code that traverses the trees and applies the rules and see if you can make sense of it?
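
As a rough illustration of the idea, here is a minimal Java sketch of head finding. It is not the OpenNLP implementation: the class HeadFinderSketch, its Node and Rule types, the fallback behaviour and the three example rules are all made up for this example, and the rules are not copied from any real head rules file. Each rule gives a scan direction and a priority list of child labels. The head child is the first child, in scan direction, that matches the highest-priority label, and the head word is found by following head children down to a leaf.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative head finder; NOT the OpenNLP HeadRules class. All names are hypothetical. */
public class HeadFinderSketch {

    /** A tree node: a leaf carries a POS tag and a word, an inner node a label and children. */
    static final class Node {
        final String label;
        final String word;          // null for inner nodes
        final List<Node> children;  // empty for leaves

        Node(String posTag, String word) {
            this.label = posTag;
            this.word = word;
            this.children = Collections.emptyList();
        }

        Node(String label, Node... children) {
            this.label = label;
            this.word = null;
            this.children = Arrays.asList(children);
        }
    }

    /** One head rule: scan direction plus child labels in priority order. */
    static final class Rule {
        final boolean leftToRight;
        final List<String> priority;

        Rule(boolean leftToRight, String... priority) {
            this.leftToRight = leftToRight;
            this.priority = Arrays.asList(priority);
        }
    }

    /** Picks the head child of an inner node according to the rule for its label. */
    static Node headChild(Node node, Map<String, Rule> rules) {
        List<Node> kids = node.children;
        int n = kids.size();
        Rule rule = rules.get(node.label);
        if (rule == null) {
            return kids.get(0); // assumption of this sketch: no rule means first child
        }
        for (String tag : rule.priority) {
            for (int i = 0; i < n; i++) {
                Node child = kids.get(rule.leftToRight ? i : n - 1 - i);
                if (child.label.equals(tag)) {
                    return child;
                }
            }
        }
        // Nothing matched: take the first child in scan direction (also an assumption).
        return kids.get(rule.leftToRight ? 0 : n - 1);
    }

    /** Follows head children down to a leaf and returns that leaf's word. */
    static String headWord(Node node, Map<String, Rule> rules) {
        if (node.children.isEmpty()) {
            return node.word;
        }
        return headWord(headChild(node, rules), rules);
    }

    /** Made-up rules for three constituent types. */
    static Map<String, Rule> exampleRules() {
        Map<String, Rule> rules = new HashMap<>();
        rules.put("S", new Rule(true, "VP"));
        rules.put("VP", new Rule(true, "VBD", "VBZ", "VB", "VP"));
        rules.put("NP", new Rule(false, "NN", "NNS", "NNP"));
        return rules;
    }

    /** (S (NP (DT the) (NN dog)) (VP (VBD chased) (NP (DT a) (NN cat)))) */
    static Node exampleTree() {
        Node subject = new Node("NP", new Node("DT", "the"), new Node("NN", "dog"));
        Node object = new Node("NP", new Node("DT", "a"), new Node("NN", "cat"));
        Node vp = new Node("VP", new Node("VBD", "chased"), object);
        return new Node("S", subject, vp);
    }

    public static void main(String[] args) {
        Map<String, Rule> rules = exampleRules();
        Node tree = exampleTree();
        System.out.println(headWord(tree, rules));                 // chased
        System.out.println(headWord(tree.children.get(0), rules)); // dog
    }
}
```

With your own treebank, the labels in such rules would be your Dutch constituent and POS tags instead of the Penn Treebank ones used here.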

Best regards,

Rodrigo

On Tue, 28 Jul 2020 at 12:18, Markus Jelsma <markus.jel...@openindex.io> wrote:
>
> Hello Rodrigo,
>
> Thanks for your response. It seems it might be going to be more difficult
> than i thought. I am not from the linguistics department but from IR instead,
> and have no idea what a "head for every type of constituent except for Noun
> Phrases" exactly is.
>
> This Dutch treebank i got from the university is very similar to one with
> regular unified tags, but just with some Dutch flavour and naming. Regardless
> of treebank type, i just do not understand it, the thesis' appendix is not
> helping either, too much jargon i am unfamiliar with.
>
> Is there some basic explanation of what a head rule must do, or how i can
> include all the rules with the tags i use in the head rules
> files directly? Or is there a draft of the manual somewhere describing this
> topic?
>
> Many thanks,
> Markus
>
> -----Original message-----
> > From:Rodrigo Agerri <rage...@apache.org>
> > Sent: Tuesday 21st July 2020 15:52
> > To: users@opennlp.apache.org
> > Subject: Re: Parser, head rules
> >
> > Hello,
> >
> > The header files are also included in the official repo (for Spanish
> > and English):
> >
> > https://github.com/apache/opennlp/tree/master/opennlp-tools/lang/en/parser
> >
> > https://github.com/apache/opennlp/tree/master/opennlp-tools/lang/es/parser
> >
> > The format and content of the header rules comes from Collins thesis,
> > Appendix A:
> >
> > http://www.cs.columbia.edu/~mcollins/papers/thesis.ps
> >
> > On Tue, 21 Jul 2020 at 14:24, Markus Jelsma <markus.jel...@openindex.io>
> > wrote:
> > >
> > > Also, our source data does not use UD or Penn-treebank type POS-tags. I
> > > do not assume this is a problem for the trainer tool, but is it?
> >
> > The head rules files try to find the head for every type of
> > constituent except for Noun Phrases, which directly included in the
> > head rules parser class, here:
> >
> > https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/parser/lang/en/HeadRules.java
> >
> > https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/parser/lang/es/AncoraSpanishHeadRules.java
> >
> > Thus, if you do not use Penn Treebank tags or constituents then you
> > can include all the rules with the tags you use in the head rules
> > files directly.
> >
> > HTH,
> >
> > R
> >