There is a great deal of formal activity in this area; see TREC
(http://en.wikipedia.org/wiki/Text_Retrieval_Conference), which runs
competitions and provides metrics.
Done formally, it takes a lot of effort to produce a precise, reproducible
number. In simple terms you need a corpus which has alrea
Hi, I am using PDFBox to extract text from PDF files.
As you know, PDFBox can, for various reasons, produce errors when
extracting text from some PDF files. My question is:
is there a way to automatically evaluate the quality of text
extraction result? Or can PDFBox offer a conf