Hello Rodrigo,

Thanks for your response. This looks like it will be more difficult than I thought. I am not from the linguistics department but from IR, and I have no idea what a "head for every type of constituent except for Noun Phrases" exactly is. The Dutch treebank I got from the university is very similar to one with regular unified tags, just with some Dutch flavour and naming. Regardless of the treebank type, I simply do not understand it, and the thesis' appendix is not helping either: it has too much jargon I am unfamiliar with.
Is there some basic explanation of what a head rule must do, or of how I can include all the rules with the tags I use in the head rules files directly? Or is there a draft of the manual somewhere describing this topic?

Many thanks,
Markus

-----Original message-----
> From: Rodrigo Agerri <rage...@apache.org>
> Sent: Tuesday 21st July 2020 15:52
> To: users@opennlp.apache.org
> Subject: Re: Parser, head rules
>
> Hello,
>
> The head rules files are also included in the official repo (for Spanish
> and English):
>
> https://github.com/apache/opennlp/tree/master/opennlp-tools/lang/en/parser
>
> https://github.com/apache/opennlp/tree/master/opennlp-tools/lang/es/parser
>
> The format and content of the head rules come from Collins' thesis,
> Appendix A:
>
> http://www.cs.columbia.edu/~mcollins/papers/thesis.ps
>
> On Tue, 21 Jul 2020 at 14:24, Markus Jelsma <markus.jel...@openindex.io>
> wrote:
> >
> > Also, our source data does not use UD or Penn-treebank type POS-tags. I do
> > not assume this is a problem for the trainer tool, but is it?
>
> The head rules files try to find the head for every type of
> constituent except for Noun Phrases, which are directly included in the
> head rules parser class, here:
>
> https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/parser/lang/en/HeadRules.java
>
> https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/parser/lang/es/AncoraSpanishHeadRules.java
>
> Thus, if you do not use Penn Treebank tags or constituents then you
> can include all the rules with the tags you use in the head rules
> files directly.
>
> HTH,
>
> R
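
For readers who, like the poster, are new to the term: the quoted reply says a head rule picks the "head" child of each constituent type. The following is only a minimal sketch of that general idea in Python, assuming a Collins-style scheme (a scan direction plus a priority list of child tags). The `HeadRule` class, the `RULES` table, and the `find_head` function are hypothetical names made up for illustration; this is not OpenNLP's API or its head rules file format, and the tags are Penn-style examples rather than tags from the Dutch treebank.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HeadRule:
    """One rule per constituent label (hypothetical structure)."""
    left_to_right: bool   # direction in which children are scanned
    priority: List[str]   # child tags, most preferred head first


# Illustrative rule table; the labels and tags are assumptions.
RULES: Dict[str, HeadRule] = {
    "VP": HeadRule(True, ["VBD", "VBZ", "VB", "VP"]),
    "PP": HeadRule(False, ["IN", "TO"]),
}


def find_head(label: str, children: List[str], rules: Dict[str, HeadRule]) -> int:
    """Return the index of the child chosen as head of `label`."""
    if not children:
        raise ValueError("a constituent needs at least one child")
    rule = rules.get(label)
    if rule is None:
        # No rule for this label: fall back to the first child.
        return 0
    if rule.left_to_right:
        indices = list(range(len(children)))
    else:
        indices = list(range(len(children) - 1, -1, -1))
    # Try each preferred tag in turn, scanning in the rule's direction.
    for tag in rule.priority:
        for i in indices:
            if children[i] == tag:
                return i
    # Nothing matched: take the first child in scan order.
    return indices[0]


if __name__ == "__main__":
    print(find_head("VP", ["RB", "VBD", "NP"], RULES))
    print(find_head("PP", ["IN", "NP"], RULES))
```

Under this reading, supporting a treebank with its own tag set would mean writing one such rule per constituent label used in that treebank, with the priority lists expressed in that treebank's own tags.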