Right, I guess so:

#Mon Mar 28 12:17:52 PDT 2011
Training-Eventhash=d61e8fc9af7e230ff91060f27e0d2959
Manifest-Version=1.0
Language=de
useTokenEnd=true
Training-Cutoff=5
Training-Iterations=100
OpenNLP-Version=1.5.0
Timestamp=1301339872213
Component-Name=SentenceDetectorME

though I also meant a major/minor version that the person doing the build can provide for the version of the data, not the OpenNLP software (and don't forget the data location, e.g. /Users/chris/model_training/en/me_playing_around_dont_use_in_production :-})

C

On May 5, 2011, at 9:01 AM, Jörn Kottmann wrote:

> On 5/5/11 5:57 PM, Chris Collins wrote:
>> That is a good idea, I would also consider including a few other optional
>> fields and making it human readable. In the system I work on all our data
>> gets this type of "body tag", we include other things like:
>>
>> - machine it was built on and perhaps the OS user that did the run.
>> - build date
>> - source path to the input data (in this case the training set)
>> - maybe a hash of the training set.
>> - major/minor version number
>> - maybe the training tool allows you to pass a set of arbitrary key/value
>> pairs; this way the above could be defined in an ant script or what have you.
>>
>> This way when you find this model sitting on a disk some day you can actually
>> figure out if you trust it. Nothing like going into production with
>> something like this to find it was something built on your intern's laptop
>> just as a test that everyone forgot about.
>>
>
> That just sounds like what we already write into the model, except the
> machine name, OS and user.
> The model itself is a zip package, and includes a manifest which includes
> these values.
>
> Maybe we should extend the cmd line tooling to display it, then you do not
> need to unpack
> the zip package.
>
> Jörn
>
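
On the point about displaying the manifest without unpacking the zip package: below is a minimal sketch in plain Java of how that could look. It assumes the manifest is stored in the model zip as an entry named manifest.properties; that entry name is an assumption, so check it against your own model file. The class and method names (ModelManifestDump, readManifest) are made up for illustration and are not part of the OpenNLP command line tooling.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.Properties;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ModelManifestDump {

    // Assumed name of the manifest entry inside the model zip package.
    static final String MANIFEST_ENTRY = "manifest.properties";

    // Reads one properties-format entry from a zip file without extracting it to disk.
    static Properties readManifest(Path modelFile, String entryName) throws IOException {
        try (ZipFile zip = new ZipFile(modelFile.toFile())) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("No entry '" + entryName + "' in " + modelFile);
            }
            Properties props = new Properties();
            try (InputStream in = zip.getInputStream(entry)) {
                props.load(in);
            }
            return props;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ModelManifestDump <model file>");
            System.exit(1);
        }
        Properties props = readManifest(Path.of(args[0]), MANIFEST_ENTRY);
        // Print keys in sorted order so the output is stable and easy to scan.
        for (String key : new TreeSet<>(props.stringPropertyNames())) {
            System.out.println(key + "=" + props.getProperty(key));
        }
    }
}
```

Run against a model file, this would print the same key=value pairs as the manifest above, one per line. Extra fields such as build machine, OS user or a data version would show up the same way if the training tool wrote them into the manifest.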
