Hi,

On Tue, Aug 24, 2010 at 4:50 PM, Jan Høydahl / Cominvent
<[email protected]> wrote:
> Does anyone have an answer to this question that I posted last week?
> I know how to generate profiles for Nutch, but not for Tika.

It's the same process: you just need to postprocess the Nutch profile
files so that they contain only three-letter ngrams, since that's what
Tika currently uses as its standard ngram size.
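
As a rough sketch of that postprocessing step, something like the
following could work. It assumes each non-comment line of a profile
file has the form "ngram count" separated by a single space, and that
comment lines start with "#"; please check that against your actual
files first. The function name keep_trigrams is just made up for
illustration.

```python
def keep_trigrams(lines, n=3):
    """Yield comment lines plus the entries whose ngram is exactly n characters.

    Assumes "ngram count" entries and "#" comment lines (an assumption
    about the profile format, not a documented guarantee).
    """
    for line in lines:
        stripped = line.rstrip("\n")
        if not stripped:
            continue  # drop blank lines
        if stripped.startswith("#"):
            yield stripped  # keep header comments as they are
            continue
        # Split on the last space so the count is always the final field.
        ngram, _, count = stripped.rpartition(" ")
        if len(ngram) == n and count.isdigit():
            yield stripped
```

To use it, open the Nutch profile with an explicit encoding, pass the
file object to keep_trigrams, and write the yielded lines out to a new
file.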

Any sufficiently representative corpus of text should be good enough
for the language profiles. It would also be good to include some
simple test cases that we can use to verify that future updates to the
language profiles won't break things.
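
Such test cases could be as simple as a table of sample texts and the
language each one should be identified as. A minimal sketch, written
independently of any particular detector: the sample sentences, the
language codes and the detect callable are all placeholders for
illustration, not part of Tika or Nutch.

```python
# Hypothetical regression cases: (sample text, expected language code).
CASES = [
    ("This is a short sentence written in English.", "en"),
    ("Dies ist ein kurzer Satz in deutscher Sprache.", "de"),
]


def failed_cases(detect, cases=CASES):
    """Return (text, expected, actual) for every case the detector gets wrong.

    `detect` is any callable that maps a text to a language code.
    """
    failures = []
    for text, expected in cases:
        actual = detect(text)
        if actual != expected:
            failures.append((text, expected, actual))
    return failures
```

An empty result after regenerating the profiles would mean that none
of the known cases regressed.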

BR,

Jukka Zitting
