Nick,

I've tracked down the issue, but I'm afraid it does not help much:
https://issues.apache.org/jira/browse/TIKA-546
Converting the 4-grams to 3-grams and dropping the 1- and 2-grams crossed
my mind, but I'm probably better off creating a new profile from a fresh,
large corpus anyway.

The best solution would be for Tika to read the Nutch profile format :-) But
I don't understand the code well enough to judge whether this would be
easy to do.

Best
Cedric

On 18 January 2013 16:14, Nick Burch <[email protected]> wrote:

> gram profiler
