[ 
https://issues.apache.org/jira/browse/OPENNLP-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Zowalla resolved OPENNLP-1363.
--------------------------------------
    Resolution: Fixed

> Verify the documentation of the lemmatizer input format
> -------------------------------------------------------
>
>                 Key: OPENNLP-1363
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1363
>             Project: OpenNLP
>          Issue Type: Documentation
>          Components: Documentation
>    Affects Versions: 2.1.0
>            Reporter: Jeff Zemerick
>            Assignee: Atita Arora
>            Priority: Minor
>             Fix For: 2.1.1
>
>
> In OPENNLP-1257, a change was proposed to update the code to split the 
> lemmatizer input by spaces instead of by tabs. I believe tab is the desired 
> delimiter, but we need to make sure the documentation is consistent.
> Refer to 
> https://opennlp.apache.org/docs/1.9.4/manual/opennlp.html#tools.lemmatizer
> , in particular the following sentences:
> "The training data consist of three columns separated by spaces. Each word 
> has been put on a separate line and there is an empty line after each 
> sentence. The first column contains the current word, the second its 
> part-of-speech tag and the third its lemma. Here is an example of the file 
> format:"
> Determine whether that first sentence should read "separated by tabs" instead.
>  
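
The format described in the quoted passage (one token per line, three columns for word, part-of-speech tag and lemma, and an empty line after each sentence) can be illustrated with a small reader. This is only a sketch and not OpenNLP code: the function name `parse_lemmatizer_data`, the `delimiter` parameter and the sample tokens are assumptions made for illustration. Because the delimiter is exactly what this issue asks to verify, it is left as a parameter, with tab as an assumed default.

```python
from typing import List, Tuple

# (word, part-of-speech tag, lemma), in the column order the manual describes.
Token = Tuple[str, str, str]


def parse_lemmatizer_data(text: str, delimiter: str = "\t") -> List[List[Token]]:
    """Hypothetical reader for lemmatizer training data.

    The default delimiter (tab) is an assumption; pass " " to read
    space-separated data instead.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # An empty line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split(delimiter)
        if len(columns) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 columns, found {len(columns)}"
            )
        word, pos_tag, lemma = columns
        current.append((word, pos_tag, lemma))
    if current:
        # The last sentence may not be followed by an empty line.
        sentences.append(current)
    return sentences


# Illustrative sample only; not taken from the OpenNLP manual.
SAMPLE = "He\tPRP\the\nreckons\tVBZ\treckon\n\nThe\tDT\tthe\n"
```

Reading the same tab-separated sample with `delimiter=" "` raises `ValueError`, which is the kind of mismatch between documentation and code that this issue is meant to rule out.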



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
