Hello,

The head rules files are also included in the official repo (for Spanish and English):

https://github.com/apache/opennlp/tree/master/opennlp-tools/lang/en/parser
https://github.com/apache/opennlp/tree/master/opennlp-tools/lang/es/parser

The format and content of the head rules come from Collins' thesis, Appendix A:

http://www.cs.columbia.edu/~mcollins/papers/thesis.ps

On Tue, 21 Jul 2020 at 14:24, Markus Jelsma <markus.jel...@openindex.io> wrote:
>
> Also, our source data does not use UD or Penn-treebank type POS-tags. I do
> not assume this is a problem for the trainer tool, but is it?

The head rules files specify how to find the head for every type of constituent except noun phrases, which are handled directly in the head rules parser classes, here:

https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/parser/lang/en/HeadRules.java
https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/parser/lang/es/AncoraSpanishHeadRules.java

Thus, if you do not use Penn Treebank tags or constituents, you can put all the rules, written with the tags you use, directly in the head rules files.

HTH,

R