Yes that would give ultimate control.

C

On May 5, 2011, at 9:42 AM, Jörn Kottmann wrote:

> On 5/5/11 6:12 PM, Chris Collins wrote:
>> Right, I guess so:
>>
>> #Mon Mar 28 12:17:52 PDT 2011
>> Training-Eventhash=d61e8fc9af7e230ff91060f27e0d2959
>> Manifest-Version=1.0
>> Language=de
>> useTokenEnd=true
>> Training-Cutoff=5
>> Training-Iterations=100
>> OpenNLP-Version=1.5.0
>> Timestamp=1301339872213
>> Component-Name=SentenceDetectorME
>>
>> though I meant also major minor version that the person doing the build can
>> provide for the version of the data not the OpenNLP software (don't forget
>> data location e.g.
>> /Users/chris/model_training/en/me_playing_around_dont_use_in_production :-})
>
> Maybe we should give the user the freedom to write custom properties into the
> earlier proposed training file and extend the above manifest with
> automatically generated properties as far as it makes sense.
>
> I guess that would suit your needs?
>
> The training data location might not always be available. I for example
> retrieve my training data from a database which contains my corpus. The data
> is then directly streamed into OpenNLP without ever hitting the disk.
>
> Jörn
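
For illustration, the proposal quoted above (user-written custom properties plus automatically generated ones in a single manifest) could look roughly like the sketch below. It uses only plain java.util.Properties and no OpenNLP API. The class name ManifestSketch, the merge helper, and the Data-Version and Data-Source keys are made up for this example, and letting generated values win on a key clash is an assumption, not something decided in this thread.

```java
import java.util.Properties;

public class ManifestSketch {

    // Hypothetical helper: combines user-supplied properties with the ones
    // the trainer generates. Generated values are applied last, so they win
    // on a key clash (an assumed policy, not an agreed one).
    static Properties merge(Properties custom, Properties generated) {
        Properties manifest = new Properties();
        manifest.putAll(custom);
        manifest.putAll(generated);
        return manifest;
    }

    public static void main(String[] args) throws Exception {
        // Custom entries the person doing the build might provide
        // (both keys are hypothetical).
        Properties custom = new Properties();
        custom.setProperty("Data-Version", "2.1");
        custom.setProperty("Data-Source", "corpus database");

        // A few of the generated entries from the manifest quoted above.
        Properties generated = new Properties();
        generated.setProperty("Manifest-Version", "1.0");
        generated.setProperty("Language", "de");
        generated.setProperty("OpenNLP-Version", "1.5.0");
        generated.setProperty("Component-Name", "SentenceDetectorME");

        // Writes the merged manifest in the usual key=value form.
        merge(custom, generated).store(System.out, null);
    }
}
```

Since Data-Source is just free text here, it would also cover the case where the training data is streamed from a database and there is no file location to record.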
