On 5/5/11 6:12 PM, Chris Collins wrote:
Right, I guess so:
#Mon Mar 28 12:17:52 PDT 2011
Training-Eventhash=d61e8fc9af7e230ff91060f27e0d2959
Manifest-Version=1.0
Language=de
useTokenEnd=true
Training-Cutoff=5
Training-Iterations=100
OpenNLP-Version=1.5.0
Timestamp=1301339872213
Component-Name=SentenceDetectorME
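The listing above looks like the output of java.util.Properties.store (the leading "#..." line matches the date comment that store writes), so it can presumably be read back with Properties.load. A minimal sketch under that assumption; the class name ManifestInfo is hypothetical and not part of OpenNLP:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.time.Instant;
import java.util.Properties;

/** Hypothetical helper that reads a model manifest in java.util.Properties format. */
public class ManifestInfo {

    /** Parses manifest text; lines starting with '#' are treated as comments. */
    public static Properties parse(String manifestText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(manifestText));
        } catch (IOException e) {
            // A StringReader does not do real I/O, but load() declares the exception.
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String text = String.join("\n",
            "#Mon Mar 28 12:17:52 PDT 2011",
            "Language=de",
            "Training-Cutoff=5",
            "Training-Iterations=100",
            "OpenNLP-Version=1.5.0",
            "Timestamp=1301339872213",
            "Component-Name=SentenceDetectorME");
        Properties p = parse(text);
        // All values are strings; numeric fields must be converted explicitly.
        int cutoff = Integer.parseInt(p.getProperty("Training-Cutoff"));
        long trainedAt = Long.parseLong(p.getProperty("Timestamp"));
        System.out.println(p.getProperty("Component-Name") + " for "
            + p.getProperty("Language") + ", cutoff=" + cutoff
            + ", trained at " + Instant.ofEpochMilli(trainedAt));
    }
}
```

Note that the Timestamp value is milliseconds since the epoch, so it carries the same instant as the "#..." comment line but in a machine-readable form.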
Though I also meant a major/minor version that the person doing the build can
provide for the version of the data, not of the OpenNLP software (and don't forget the data
location, e.g.
/Users/chris/model_training/en/me_playing_around_dont_use_in_production :-})
Maybe we should give the user the freedom to write custom properties
into the earlier proposed training file, and
extend the above manifest with automatically generated properties as far
as it makes sense.
Would that suit your needs?
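A minimal sketch of the proposed merge, assuming the user's custom entries (such as a data version) have already been read from the training file into a map. The class ManifestBuilder and the key Data-Version are hypothetical, and letting generated entries win on a key collision is an assumed design choice, not something decided in this thread:

```java
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical sketch: combine user-supplied custom entries with generated ones. */
public class ManifestBuilder {

    /**
     * Returns a manifest containing all user entries plus the generated entries.
     * Assumption: generated entries overwrite user entries with the same key,
     * so values such as OpenNLP-Version cannot be replaced by accident.
     */
    public static Properties build(Map<String, String> userEntries,
                                   Map<String, String> generatedEntries) {
        Properties manifest = new Properties();
        userEntries.forEach(manifest::setProperty);
        generatedEntries.forEach(manifest::setProperty);
        return manifest;
    }

    public static void main(String[] args) {
        Map<String, String> user = new TreeMap<>();
        user.put("Data-Version", "1.2");      // hypothetical key set by whoever builds the model
        user.put("OpenNLP-Version", "9.9");   // overwritten by the generated value below

        Map<String, String> generated = new TreeMap<>();
        generated.put("OpenNLP-Version", "1.5.0");
        generated.put("Component-Name", "SentenceDetectorME");
        generated.put("Training-Cutoff", "5");

        Properties manifest = build(user, generated);
        // Print in key order for readability; Properties itself is unordered.
        new TreeMap<>(manifest).forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

An alternative would be to reject colliding keys with an exception, which makes a misconfigured training file fail loudly instead of being silently corrected.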
The training data location might not always be available. I, for example,
retrieve my training data from a
database that contains my corpus. The data is then streamed directly
into OpenNLP without ever hitting the disk.
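One way to handle this, sketched under assumptions: model the data source so that its location is optional, and write a location entry into the manifest only when one exists. The TrainingSource interface and the Training-Data-Location key are hypothetical illustrations, not existing OpenNLP API:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical sketch: record the training-data location only when one exists. */
public class DataLocation {

    /** Hypothetical abstraction over where training samples come from. */
    public interface TrainingSource {
        /** A human-readable location, or empty if the data has none. */
        Optional<String> location();
    }

    /** Data read from a file on disk has a meaningful location. */
    public static TrainingSource fromFile(Path path) {
        return () -> Optional.of(path.toAbsolutePath().toString());
    }

    /** Data streamed from a database straight into training has no file location. */
    public static TrainingSource fromDatabaseStream() {
        return Optional::empty;
    }

    /** Adds the hypothetical Training-Data-Location key only if the source has a location. */
    public static void describe(TrainingSource source, Properties manifest) {
        source.location().ifPresent(loc -> manifest.setProperty("Training-Data-Location", loc));
    }

    public static void main(String[] args) {
        Properties fromDisk = new Properties();
        describe(fromFile(Paths.get("de-sent.train")), fromDisk);

        Properties fromDb = new Properties();
        describe(fromDatabaseStream(), fromDb);

        System.out.println(fromDisk.containsKey("Training-Data-Location")); // true
        System.out.println(fromDb.containsKey("Training-Data-Location"));   // false
    }
}
```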
Jörn