Hello Rodrigo,

Thanks for your response. This looks like it will be more difficult than 
I thought. I am not from the linguistics department but from IR, and I 
have no idea what a "head for every type of constituent except for Noun 
Phrases" exactly is.
 
The Dutch treebank I got from the university is very similar to one with 
regular unified tags, just with some Dutch flavour and naming. Regardless 
of the treebank type, I just do not understand it, and the thesis' appendix 
is not helping either: too much jargon I am unfamiliar with.

Is there some basic explanation of what a head rule must do, or of how I can 
include all the rules with the tags I use in the head rules files directly? 
Or is there a draft of the manual somewhere describing this topic?

Many thanks,
Markus

-----Original message-----
> From:Rodrigo Agerri <rage...@apache.org>
> Sent: Tuesday 21st July 2020 15:52
> To: users@opennlp.apache.org
> Subject: Re: Parser, head rules
> 
> Hello,
> 
> The head rules files are also included in the official repo (for Spanish
> and English):
> 
> https://github.com/apache/opennlp/tree/master/opennlp-tools/lang/en/parser
> 
> https://github.com/apache/opennlp/tree/master/opennlp-tools/lang/es/parser
> 
> The format and content of the head rules come from Collins' thesis,
> Appendix A:
> 
> http://www.cs.columbia.edu/~mcollins/papers/thesis.ps
> 
> On Tue, 21 Jul 2020 at 14:24, Markus Jelsma <markus.jel...@openindex.io> 
> wrote:
> >
> > Also, our source data does not use UD or Penn-treebank type POS-tags. I do 
> > not assume this is a problem for the trainer tool, but is it?
> 
> The head rules files try to find the head for every type of
> constituent except for Noun Phrases, which are directly included in the
> head rules parser class, here:
> 
> https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/parser/lang/en/HeadRules.java
> 
> https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/parser/lang/es/AncoraSpanishHeadRules.java
> 
> Thus, if you do not use Penn Treebank tags or constituents then you
> can include all the rules with the tags you use in the head rules
> files directly.
> 
> HTH,
> 
> R
> 
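
For readers new to the term: in the Collins-style scheme referenced in the quoted reply, a head rule names, for one constituent label, a scan direction and a priority list of child labels, and the "head" is the first child that matches. The Java sketch below illustrates only that idea. It is not OpenNLP's implementation, and the class name, rule table and Dutch-looking tags (VZ, WW, BW) are made up for illustration. As far as I can tell, each line of the linked head_rules files encodes the same three things (constituent label, direction, tag list), but check those files for the exact layout.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch of Collins-style head finding; not OpenNLP code.
final class HeadRuleSketch {

    /** One rule: scan direction plus child labels in priority order. */
    static final class Rule {
        final boolean leftToRight;
        final List<String> priority;

        Rule(boolean leftToRight, List<String> priority) {
            this.leftToRight = leftToRight;
            this.priority = priority;
        }
    }

    // Hypothetical rules with made-up labels; replace with the labels of your treebank.
    static final Map<String, Rule> RULES = Map.of(
        "PP", new Rule(true, List.of("VZ", "PP")),
        "SMAIN", new Rule(true, List.of("WW", "SMAIN")),
        "ADVP", new Rule(false, List.of("BW", "ADVP")));

    /** Returns the index of the head child of a constituent labelled parent. */
    static int headIndex(String parent, List<String> childLabels) {
        Rule rule = RULES.get(parent);
        if (rule == null) {
            return 0; // arbitrary choice for this sketch: no rule, take the first child
        }
        int n = childLabels.size();
        // Try each label in priority order; the first label found anywhere wins.
        for (String wanted : rule.priority) {
            for (int step = 0; step < n; step++) {
                int i = rule.leftToRight ? step : n - 1 - step;
                if (childLabels.get(i).equals(wanted)) {
                    return i;
                }
            }
        }
        // Nothing matched: fall back to the first child in scan direction.
        return rule.leftToRight ? 0 : n - 1;
    }
}
```

With these made-up rules, HeadRuleSketch.headIndex("PP", List.of("VZ", "NP")) returns 0, meaning the preposition is the head of the prepositional phrase. Covering a treebank with its own tag set then amounts to writing one such rule per constituent label.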
