*Results of Chunker evaluation with public data*

*Component:* Chunker
*Data:* CONLL 2000
*Tester:* colen
*Tagging Perf 1.5.0:*
*Tagging Perf 1.5.1:*
Precision: 0.9255923572240226
Recall: 0.9220610430991112
F-Measure: 0.9238233255623465
*Comment:* The ChunkerEvaluator tool was not available in 1.5.0. To check whether anything changed, I compared the output of 1.5.0 and 1.5.1 in a way similar to "Compatibility Test with OpenNLP 1.5.0 SourceForge Models". The output changed slightly because of a bug fixed in 1.5.1 (missing trailing closing bracket).

------------------------

*Component:* Chunker
*Data:* Arvores Deitadas
*Tester:* colen
*Tagging Perf 1.5.0:*
*Tagging Perf 1.5.1:*
Precision: 0.9406086044071353
Recall: 0.9364814040952779
F-Measure: 0.9385404669668097
*Comment:* The AD format for the Chunker was not available in 1.5.0.

========= Test details =========

Conll 2000
================================================================================

1.5.1
--------------------------------------------------------------------------------

$ time ./bin/opennlp ChunkerTrainerME -lang en -encoding UTF8 -iterations 100 -cutoff 5 -data train.txt -model en-chunker.bin

real 4m39.469s

--------

$ time ./bin/opennlp ChunkerEvaluator -encoding UTF8 -data test.txt -model en-chunker.bin

Average: 161,7 sent/s
Total: 2013 sent
Runtime: 12.446s

Precision: 0.9255923572240226
Recall: 0.9220610430991112
F-Measure: 0.9238233255623465

real 0m13.356s

--------

$ time ./bin/opennlp ChunkerME en-chunker.bin < test_pos.txt > output.txt

Loading Chunker model ... done (0,650s)

Average: 167,3 sent/s
Total: 2012 sent
Runtime: 12.024s

real 0m12.906s

1.5.0
--------------------------------------------------------------------------------

$ time ./bin/opennlp ChunkerTrainerME -lang en -encoding UTF8 -iterations 100 -cutoff 5 -data ../apache-opennlp/train.txt -model en-chunker.bin

real 5m12.107s

--------

$ time ./bin/opennlp ChunkerME en-chunker.bin < ../apache-opennlp/test_pos.txt > output.txt

Loading Chunker model ... done (0,649s)

Average: 169,5 sent/s
Total: 2012 sent
Runtime: 11.869s

real 0m12.752s

Arvores Deitadas
================================================================================

1.5.1
--------------------------------------------------------------------------------

$ bin/opennlp ChunkerConverter ad -encoding ISO-8859-1 -data ../wrk/corpus/Bosque_CF_8.0.ad.txt > bosque-chunk

$ time ./bin/opennlp ChunkerTrainerME -lang pt -encoding UTF8 -iterations 100 -cutoff 5 -data bosque-chunk_train.txt -model pt-chunker.bin

real 0m56.778s

--------

$ time ./bin/opennlp ChunkerEvaluator -encoding UTF8 -data bosque-chunk_test.txt -model pt-chunker.bin

Loading Chunker model ... done (0,245s)

Average: 145,5 sent/s
Total: 411 sent
Runtime: 2.825s

Precision: 0.9406086044071353
Recall: 0.9364814040952779
F-Measure: 0.9385404669668097

real 0m3.332s
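
As a sanity check on the figures reported above, the F-Measure can be recomputed from precision and recall. The sketch below assumes the evaluator reports the balanced F-measure (the harmonic mean of precision and recall). The function names `f_measure` and `check_reported` are illustrative and not part of OpenNLP.

```python
# Recompute the balanced F-measure from the precision/recall pairs in this report.
# Assumption: F = 2 * P * R / (P + R), the harmonic mean of precision and recall.

def f_measure(precision: float, recall: float) -> float:
    """Return the balanced F-measure, or 0.0 when both inputs are zero."""
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) copied from the 1.5.1 results above.
REPORTED = {
    "CONLL 2000": (0.9255923572240226, 0.9220610430991112, 0.9238233255623465),
    "Arvores Deitadas": (0.9406086044071353, 0.9364814040952779, 0.9385404669668097),
}


def check_reported(tolerance: float = 1e-6) -> dict:
    """Map each data set to True if the recomputed value matches the reported one."""
    return {
        name: abs(f_measure(p, r) - reported) <= tolerance
        for name, (p, r, reported) in REPORTED.items()
    }


if __name__ == "__main__":
    for name, ok in check_reported().items():
        print(f"{name}: {'consistent' if ok else 'MISMATCH'}")
```

A mismatch here would suggest a transcription error in the summary rather than a problem with the model itself.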

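
The 1.5.0 versus 1.5.1 comparison described in the CONLL 2000 comment can be approximated by diffing the two `output.txt` files line by line. This is a minimal sketch and not the procedure actually used for the test. The file names in the usage note are hypothetical, since both runs above write to a file called `output.txt` in their own working directory.

```python
# Line-by-line comparison of two chunker output files.
from itertools import zip_longest


def differing_lines(old_lines, new_lines):
    """Return (line_number, old_text, new_text) for every position that differs.

    A missing line (one input is shorter than the other) is reported as None.
    """
    diffs = []
    pairs = zip_longest(old_lines, new_lines, fillvalue=None)
    for number, (old, new) in enumerate(pairs, start=1):
        # Ignore line-ending differences, keep everything else significant.
        old_text = None if old is None else old.rstrip("\r\n")
        new_text = None if new is None else new.rstrip("\r\n")
        if old_text != new_text:
            diffs.append((number, old_text, new_text))
    return diffs


def compare_files(old_path, new_path, encoding="utf-8"):
    """Open both files and return their differing lines."""
    with open(old_path, encoding=encoding) as old_file, \
            open(new_path, encoding=encoding) as new_file:
        return differing_lines(old_file, new_file)
```

For example, `compare_files("output-1.5.0.txt", "output-1.5.1.txt")` would list each sentence whose chunked output changed between the two versions, which is where a difference such as a missing trailing closing bracket would show up.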