Re: protwords.txt support in stemmers

Yonik Seeley Tue, 30 Mar 2010 07:32:30 -0700

On Tue, Mar 30, 2010 at 10:07 AM, Robert Muir <rcm...@gmail.com> wrote:
> Sorta unrelated too, but on the same topic of performance, I'd really like
> to improve the indexing speed with the example schema, and that's my hidden
> motivation here.
>
> I think we've already significantly improved WDF and SnowballPorter
> performance in trunk, but if we add this support we could at least consider
> switching to the much much faster PorterStemmer in the Lucene core for the
> example schema, as it would then support protected words via this mechanism.
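The email doesn't spell out "this mechanism" in detail. The general idea behind protected-word support that is independent of any particular stemmer works in two steps. First, tokens listed in protwords.txt are marked. Then any stemmer skips the marked tokens. The Java sketch below shows only that idea, under stated assumptions. All class and method names (`ProtectedStemDemo`, `markProtected`, `toyStem`, and so on) are hypothetical and are not the Lucene or Solr API. `toyStem` is a deliberately trivial suffix stripper standing in for a real stemmer. It is not the Porter algorithm.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only: not the Lucene/Solr API.
public class ProtectedStemDemo {

    // A token plus a "keyword" flag; flagged tokens must not be stemmed.
    record Token(String text, boolean keyword) {}

    // Parse protwords.txt-style content: one word per line,
    // blank lines ignored, lines starting with '#' treated as comments.
    static Set<String> parseProtectedWords(List<String> lines) {
        Set<String> words = new HashSet<>();
        for (String line : lines) {
            String w = line.trim();
            if (!w.isEmpty() && !w.startsWith("#")) {
                words.add(w);
            }
        }
        return words;
    }

    // Step 1: flag every term that appears in the protected set.
    static List<Token> markProtected(List<String> terms, Set<String> protectedWords) {
        List<Token> out = new ArrayList<>();
        for (String t : terms) {
            out.add(new Token(t, protectedWords.contains(t)));
        }
        return out;
    }

    // Toy stand-in for a real stemmer (NOT Porter): strips "ing" or a trailing "s".
    static String toyStem(String s) {
        if (s.endsWith("ing") && s.length() > 5) return s.substring(0, s.length() - 3);
        if (s.endsWith("s") && s.length() > 3) return s.substring(0, s.length() - 1);
        return s;
    }

    // Step 2: stem every token except those flagged as keywords.
    static List<String> stem(List<Token> tokens) {
        List<String> out = new ArrayList<>();
        for (Token tok : tokens) {
            out.add(tok.keyword() ? tok.text() : toyStem(tok.text()));
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> prot = parseProtectedWords(List.of("# protected words", "news", ""));
        List<String> terms = List.of("running", "cats", "news");
        // "news" is protected, so it is not reduced to "new".
        System.out.println(stem(markProtected(terms, prot))); // prints [runn, cat, news]
    }
}
```

The flag is stored on the token rather than inside the stemmer, so any stemmer that checks it gets protected-word support without its own protwords handling.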


Cool!

> Do you have a preferred way to benchmark type "text" for example?

Unfortunately not... it's normally something ad hoc like uploading a
big CSV file, etc.

There's also the very simplistic TestIndexingPerformance:

  ant test -Dtestcase=TestIndexingPerformance -Dargs="-server -Diter=100000"
  grep throughput build/test-results/*TestIndexingPerformance*


-Yonik
http://www.lucidimagination.com
