Processing time is now about one sixth of what it was with the original code. However, these results still use the old n-gram profiles (which gather n-grams of different sizes). I must rebuild the profiles with 3-grams only in order to benchmark the code correctly.

Nice improvement!
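To show what "3-grams only" changes compared with profiles that mix n-gram sizes, here is a rough sketch. It assumes simple character n-grams with each word padded by `_` to mark word boundaries; `ngram_profile` and the padding convention are hypothetical and are not Nutch's actual profile code.

```python
from collections import Counter

def ngram_profile(text, sizes=(3,)):
    """Count character n-grams of the given sizes. Each word is padded with '_'
    so that word boundaries appear in the n-grams (a hypothetical convention)."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for n in sizes:
            for i in range(len(padded) - n + 1):
                counts[padded[i:i + n]] += 1
    return counts

sample = "the quick brown fox"
mixed = ngram_profile(sample, sizes=(1, 2, 3, 4))  # old style: several sizes
trigrams = ngram_profile(sample)                  # new style: 3-grams only

# Total n-gram occurrences extracted from the same text.
print(sum(mixed.values()), sum(trigrams.values()))  # → 72 16
```

On this four-word sample the mixed-size profile extracts 72 n-gram occurrences against 16 for 3-grams only, so there is less to count and compare per text. How much of that translates into real speed depends on the implementation, which is why the profiles have to be rebuilt before benchmarking.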


Sami, do you use the whole corpus available at http://people.csail.mit.edu/people/koehn/publications/europarl/, or just some parts of the text to build the profiles? (If I remember my previous work on n-grams correctly, just a few MB are enough to get a representative set of 3-grams.)

I used a relatively small subset, just a few MB, to build the profiles.

--
 Sami Siren
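To illustrate how profiles built from a small text sample can be used to identify a language, here is a minimal sketch of Cavnar-Trenkle-style identification with 3-gram profiles. The out-of-place rank distance, the cut-off of 300 n-grams, the helper names and the two tiny training strings are all assumptions for illustration; the strings merely stand in for a few MB of Europarl text per language, and this is not necessarily what Nutch's language identifier does.

```python
from collections import Counter

def trigram_profile(text, top_n=300):
    """Return the top_n most frequent word-padded character 3-grams,
    most frequent first (top_n=300 is an assumed cut-off)."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"_{word}_"
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return [gram for gram, _ in counts.most_common(top_n)]

def out_of_place(doc_profile, lang_profile):
    """Sum of rank differences between the two profiles; an n-gram missing
    from the language profile costs the maximum penalty."""
    ranks = {gram: r for r, gram in enumerate(lang_profile)}
    penalty = len(lang_profile)
    return sum(abs(r - ranks[g]) if g in ranks else penalty
               for r, g in enumerate(doc_profile))

def identify(text, profiles):
    """Pick the language whose profile is closest to the text's profile."""
    doc = trigram_profile(text)
    return min(profiles, key=lambda lang: out_of_place(doc, profiles[lang]))

# Hypothetical tiny training samples standing in for real corpus extracts.
training = {
    "en": "the house is on the hill and the river runs through the valley",
    "de": "das haus ist auf dem hügel und der fluss fließt durch das tal",
}
profiles = {lang: trigram_profile(txt) for lang, txt in training.items()}
print(identify("the river and the hill", profiles))  # → en
```

Truncating each profile to its most frequent n-grams is what makes a few MB of training text sufficient in this scheme: beyond a certain sample size, the top-ranked 3-grams of a language change very little.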


_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
