On 5/5/11 6:12 PM, Chris Collins wrote:
Right, I guess so:

#Mon Mar 28 12:17:52 PDT 2011
Training-Eventhash=d61e8fc9af7e230ff91060f27e0d2959
Manifest-Version=1.0
Language=de
useTokenEnd=true
Training-Cutoff=5
Training-Iterations=100
OpenNLP-Version=1.5.0
Timestamp=1301339872213
Component-Name=SentenceDetectorME

Though I also meant a major/minor version, supplied by the person doing the build, for the version of the data rather than the OpenNLP software (and don't forget the data location, e.g.
/Users/chris/model_training/en/me_playing_around_dont_use_in_production :-})

Maybe we should give the user the freedom to write custom properties into the previously proposed training file, and extend the manifest above with automatically generated properties where that makes sense.

I guess that would suit your needs?
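A rough sketch of the merge idea, using only java.util.Properties. This is not OpenNLP code. The class name ManifestMerge, the method buildManifest and the keys Data-Version and Data-Location are hypothetical names chosen for illustration. User-defined keys are loaded first, and the automatically generated keys are then written over them, so a user cannot accidentally replace something like OpenNLP-Version. Because the training data may not come from a file at all, the data location is shown only as an optional custom property rather than as a generated one.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

public class ManifestMerge {

    // Merges user-supplied custom properties with properties generated by the
    // trainer. Generated keys are applied last, so they win on conflict.
    static Properties buildManifest(Properties custom, Map<String, String> generated) {
        Properties manifest = new Properties();
        manifest.putAll(custom);
        for (Map.Entry<String, String> e : generated.entrySet()) {
            manifest.setProperty(e.getKey(), e.getValue());
        }
        return manifest;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical custom keys a user might put into the training file.
        // Data-Location is optional: omit it when data is streamed, e.g. from a database.
        Properties custom = new Properties();
        custom.load(new StringReader(
            "Data-Version=1.2\n"
            + "Data-Location=/Users/chris/model_training/en/me_playing_around_dont_use_in_production\n"));

        // Values the trainer could fill in automatically (taken from the manifest above).
        Map<String, String> generated = new TreeMap<>();
        generated.put("Manifest-Version", "1.0");
        generated.put("OpenNLP-Version", "1.5.0");
        generated.put("Component-Name", "SentenceDetectorME");
        generated.put("Language", "de");
        generated.put("Training-Cutoff", "5");
        generated.put("Training-Iterations", "100");

        Properties manifest = buildManifest(custom, generated);

        // Properties.store also writes a timestamp comment line, like the manifest above.
        StringWriter out = new StringWriter();
        manifest.store(out, null);
        System.out.print(out);
    }
}
```

Letting generated keys silently override custom ones is just one option. Rejecting conflicting keys with an error would be stricter, and it would make a typo in the training file easier to notice.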

The training data location might not always be available. For example, I retrieve my training data from a database that contains my corpus. The data is then streamed directly into OpenNLP without ever touching the disk.

Jörn
