Re: How does good training data look like?

Jörn Kottmann Wed, 05 Oct 2011 10:47:50 -0700

On 10/3/11 10:30 AM, Em wrote:

What about document length?
Just as an example: The production-data will contain documents with a
length of several pages as well as very short texts containing only a
few sentences.


I am thinking about chunking the long documents into smaller ones (i.e. a page
of a longer document is split into an individual doc). Does this
make sense?


I would first try to process a long document at once. If you encounter any
issues you could just call clearAdaptiveData before the end of the document.
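
A minimal sketch of that idea, not tested against a real model. To keep it
free of dependencies, AdaptiveFinder is a hypothetical stand-in for the two
methods that OpenNLP's TokenNameFinder provides (find and clearAdaptiveData);
the int pairs replace OpenNLP's Span[] return type, and resetEvery and
CountingFinder are names made up for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class LongDocumentSketch {

    /** Hypothetical stand-in for the two TokenNameFinder methods used here. */
    public interface AdaptiveFinder {
        int[][] find(String[] tokens);   // OpenNLP itself returns Span[]
        void clearAdaptiveData();
    }

    /**
     * Runs the finder over every sentence of one document.
     * If resetEvery > 0, adaptive data is also cleared after every
     * resetEvery sentences, i.e. before the end of the document.
     * Adaptive data is always cleared once the document is finished.
     */
    public static List<int[][]> process(AdaptiveFinder finder,
                                        List<String[]> sentences,
                                        int resetEvery) {
        List<int[][]> results = new ArrayList<>();
        int sinceReset = 0;
        for (String[] tokens : sentences) {
            results.add(finder.find(tokens));
            sinceReset++;
            if (resetEvery > 0 && sinceReset == resetEvery) {
                finder.clearAdaptiveData();   // early reset inside a long document
                sinceReset = 0;
            }
        }
        if (sinceReset > 0) {
            finder.clearAdaptiveData();       // normal reset at the document boundary
        }
        return results;
    }

    /** Dummy finder that only counts calls, so the sketch runs without a model. */
    public static class CountingFinder implements AdaptiveFinder {
        public int finds = 0;
        public int clears = 0;

        @Override
        public int[][] find(String[] tokens) {
            finds++;
            return new int[0][];
        }

        @Override
        public void clearAdaptiveData() {
            clears++;
        }
    }

    public static void main(String[] args) {
        List<String[]> doc = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            doc.add(new String[] {"Sentence", "number", String.valueOf(i), "."});
        }
        CountingFinder finder = new CountingFinder();
        process(finder, doc, 2);
        System.out.println("find calls: " + finder.finds + ", resets: " + finder.clears);
    }
}
```

Passing 0 for resetEvery gives the "whole document at once" behaviour; a
positive value is the fallback in case the long documents cause problems.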

But as Olivier said, you might just want to include a couple of these in your
training data.

Jörn
