Daniel,

The cutoff value filters out one-off or special cases that may hinder training. With the cutoff set to 5, anything that does not occur at least 5 times in the same context in the training data is ignored. This filtering happens while the data is being prepared, which is also when the trainer determines the number of outcomes the model will be trained to predict.
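To make the cutoff idea concrete, here is a rough Python sketch. It is only a simplified illustration, not the trainer's actual code: the function name apply_cutoff, the (outcome, features) event layout, and the toy feature strings are all made up for the example. It counts how often each contextual feature appears and drops the ones seen fewer than cutoff times; events left with no features are dropped too.

```python
from collections import Counter

def apply_cutoff(events, cutoff=5):
    """Drop features seen fewer than `cutoff` times (hypothetical helper).

    events: list of (outcome, [feature, ...]) pairs.
    Returns the events with rare features removed; events that end up
    with no features at all are discarded.
    """
    # Count every feature across the whole training set.
    counts = Counter(f for _, feats in events for f in feats)
    kept = []
    for outcome, feats in events:
        frequent = [f for f in feats if counts[f] >= cutoff]
        if frequent:
            kept.append((outcome, frequent))
    return kept

# Toy data: one context seen 5 times, one seen only once.
events = [("person", ["w=john", "cap"])] * 5 + [("other", ["w=zyx"])]
filtered = apply_cutoff(events, cutoff=5)
```

With a cutoff of 5, the single "w=zyx" event falls below the threshold and is removed, while the five repeated events survive.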
The iterations value is how many passes through the training set the trainer will attempt before stopping. With the Maxent models, keeping the number of iterations small makes training faster. The right value really depends on the model being generated and its type. Most of the testing is done with 100 iterations just to get the training done quickly, because the training data sets are sometimes large.

The key things to watch are the two numbers printed for each run (iteration). They indicate how close the model is to being trained on the data set. Be careful: the trick is to train the model without getting it to memorize the training set. The training set is only a snapshot in time (most of the training data currently comes from news articles, which is a limitation). Too few iterations is also bad, though.

There is also another stopping point: when the model is completely trained and knows the training set. This happens when the numbers reach their optimal values. My guess is that this is when the model predicts the training set perfectly.

James

On 7/5/2012 6:04 AM, Daniel wrote:
> when I am training nameFinders, I usually use 500 iterations and 5
> cutoff, but I really don't know why I should use these values or
> others, can anybody tell me something about these parameters?
>
> Thanks!
