On 10/3/11 10:30 AM, Em wrote:
What about document length?
Just as an example: The production-data will contain documents with a
length of several pages as well as very short texts containing only a
few sentences.
I am thinking about chunking the long documents into smaller ones (i.e.
each page of a longer document is split into an individual doc). Does
this make sense?
I would first try to process a long document at once. If you encounter
any issues, you could just call clearAdaptiveData before the end of the
document. But as Olivier said, you might just want to include a couple
of these long documents in your training data.
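
A minimal sketch of that idea, in case it helps. The AdaptiveFinder
interface below is a hypothetical stand-in that models only a find call
and clearAdaptiveData; it is not the real OpenNLP type. The class name
LongDocumentProcessor and the resetEvery parameter are likewise
assumptions made up for illustration. The loop processes a document
sentence by sentence and clears the adaptive data after every
resetEvery sentences, plus once at the end of the document.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an adaptive finder; only the two operations
// used below are modeled. This is NOT the actual OpenNLP interface.
interface AdaptiveFinder {
    List<String> find(String[] sentenceTokens);
    void clearAdaptiveData();
}

final class LongDocumentProcessor {
    // Runs the finder over every sentence of one document. Adaptive data is
    // cleared after each block of resetEvery sentences, and once more at the
    // end of the document if any sentence was processed since the last reset.
    // A resetEvery of 0 or less means "clear only at the end of the document".
    static List<String> process(AdaptiveFinder finder,
                                List<String[]> sentences,
                                int resetEvery) {
        List<String> results = new ArrayList<>();
        int sinceReset = 0;
        for (String[] sentence : sentences) {
            results.addAll(finder.find(sentence));
            sinceReset++;
            if (resetEvery > 0 && sinceReset == resetEvery) {
                finder.clearAdaptiveData();
                sinceReset = 0;
            }
        }
        if (sinceReset > 0) {
            finder.clearAdaptiveData();
        }
        return results;
    }
}
```

Starting with resetEvery set to 0 corresponds to processing the whole
document at once; a positive value is only worth trying if the long
documents actually cause problems.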
Jörn