[ https://issues.apache.org/jira/browse/OPENNLP-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Zowalla resolved OPENNLP-1363.
--------------------------------------
    Resolution: Fixed

> Verify the documentation of the lemmatizer input format
> -------------------------------------------------------
>
>                 Key: OPENNLP-1363
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1363
>             Project: OpenNLP
>          Issue Type: Documentation
>          Components: Documentation
>    Affects Versions: 2.1.0
>            Reporter: Jeff Zemerick
>            Assignee: Atita Arora
>            Priority: Minor
>             Fix For: 2.1.1
>
>
> In OPENNLP-1257, a change was proposed to update the code to split the
> lemmatizer input by spaces instead of by tab. I believe tab is the desired
> delimiter but we need to make sure the documentation is consistent.
> Refer to
> <https://opennlp.apache.org/docs/1.9.4/manual/opennlp.html#tools.lemmatizer>,
> in particular the following sentences:
> "The training data consist of three columns separated by spaces. Each word
> has been put on a separate line and there is an empty line after each
> sentence. The first column contains the current word, the second its
> part-of-speech tag and the third its lemma. Here is an example of the file
> format:"
> Determine if that first line should read "separated by tabs" instead.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
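
For reference, the three-column layout quoted in the issue (word, part-of-speech tag, lemma, one token per line, an empty line after each sentence) can be read with a few lines of Java. This is only a minimal sketch that assumes tab is the delimiter, as the reporter suggests; it is not OpenNLP's own reader. The class name `LemmatizerFormatSketch`, the `parse` method, and the sample tokens are hypothetical and chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of OpenNLP: reads the three-column format
// described in the issue, assuming the columns are separated by a tab.
class LemmatizerFormatSketch {

    /**
     * Splits the text into sentences. Each token is a String[3] holding
     * {word, posTag, lemma}. An empty line ends the current sentence.
     */
    static List<List<String[]>> parse(String text) {
        List<List<String[]>> sentences = new ArrayList<>();
        List<String[]> current = new ArrayList<>();
        for (String line : text.split("\\R", -1)) {
            if (line.trim().isEmpty()) {
                // Sentence boundary: store the tokens collected so far.
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            // Assumption: tab-delimited columns. A space-delimited line
            // yields a single column and is rejected below.
            String[] cols = line.split("\t", -1);
            if (cols.length != 3) {
                throw new IllegalArgumentException(
                    "Expected 3 tab-separated columns but found "
                        + cols.length + ": " + line);
            }
            current.add(cols);
        }
        // Keep a final sentence that is not followed by an empty line.
        if (!current.isEmpty()) {
            sentences.add(current);
        }
        return sentences;
    }

    public static void main(String[] args) {
        String sample = "He\tPRP\the\nreckons\tVBZ\treckon\n\nThe\tDT\tthe\n";
        List<List<String[]>> sentences = parse(sample);
        System.out.println("sentences: " + sentences.size());
    }
}
```

With this sketch, a line such as `He PRP he` (space-separated) is rejected instead of being silently misread, which makes any mismatch between the documented delimiter and the data visible.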