Hello,

The head rules files are also included in the official repo (for Spanish
and English):

https://github.com/apache/opennlp/tree/master/opennlp-tools/lang/en/parser

https://github.com/apache/opennlp/tree/master/opennlp-tools/lang/es/parser

The format and content of the head rules come from Collins' thesis,
Appendix A:

http://www.cs.columbia.edu/~mcollins/papers/thesis.ps

On Tue, 21 Jul 2020 at 14:24, Markus Jelsma <markus.jel...@openindex.io> wrote:
>
> Also, our source data does not use UD or Penn-treebank type POS-tags. I do 
> not assume this is a problem for the trainer tool, but is it?

The head rules files specify how to find the head of every type of
constituent except noun phrases (NPs), whose rules are implemented
directly in the head rules classes here:

https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/parser/lang/en/HeadRules.java

https://github.com/apache/opennlp/blob/master/opennlp-tools/src/main/java/opennlp/tools/parser/lang/es/AncoraSpanishHeadRules.java

Thus, if you do not use Penn Treebank tags or constituents, you can
write all the rules, using the tags you do use, directly in the head
rules files.
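To illustrate the idea behind Collins-style head rules (this is not
OpenNLP's code, and the actual head rules file format is not shown
here), here is a minimal sketch. Each rule gives a search direction and
a priority list of child labels. For each label in priority order, the
children are scanned in that direction and the first match is the head;
if nothing matches, the first or last child is used. The class name,
the method names and the labels (SV, SN, VERB, NOUN, ...) are all
hypothetical, standing in for a non-Penn tag set:

```java
import java.util.List;
import java.util.Map;

public class HeadFinderSketch {

    /** Hypothetical rule: search direction plus a priority list of child labels. */
    record HeadRule(boolean leftToRight, List<String> priorities) {}

    /** Hypothetical rules written with a non-Penn tag set. */
    static Map<String, HeadRule> exampleRules() {
        return Map.of(
            "SV", new HeadRule(true, List.of("VERB", "AUX", "SV")),
            "SN", new HeadRule(false, List.of("NOUN", "PROPN", "PRON")));
    }

    /**
     * Returns the index of the head child. For each label in priority order,
     * scan the children in the rule's direction and return the first match;
     * if none matches, fall back to the first (left-to-right) or last
     * (right-to-left) child. Labels without a rule default to the first child.
     */
    static int findHead(String parentLabel, List<String> children, Map<String, HeadRule> rules) {
        HeadRule rule = rules.get(parentLabel);
        if (rule == null) {
            return 0;
        }
        int n = children.size();
        for (String wanted : rule.priorities()) {
            for (int k = 0; k < n; k++) {
                int i = rule.leftToRight() ? k : n - 1 - k;
                if (children.get(i).equals(wanted)) {
                    return i;
                }
            }
        }
        return rule.leftToRight() ? 0 : n - 1;
    }

    public static void main(String[] args) {
        Map<String, HeadRule> rules = exampleRules();
        System.out.println(findHead("SV", List.of("ADV", "AUX", "VERB", "SN"), rules)); // 2
        System.out.println(findHead("SN", List.of("DET", "ADJ", "NOUN", "NOUN"), rules)); // 3
        System.out.println(findHead("SN", List.of("DET", "ADJ"), rules)); // 1 (fallback)
    }
}
```

Because only the rule table mentions tags, switching to a different tag
set means changing the rules, not the search procedure.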

HTH,

R
