It looks like implementing the new attribute stuff will not be enough - the token architecture has changed enough that we will probably have to cache TokenStreams to get back to good performance.
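
A rough sketch of what "caching" could mean here, assuming the cost comes from building a new analysis chain for every field of every document. The classes below (ReusableTokenizer, TokenStreamCacheSketch) are hypothetical stand-ins, not Lucene or Solr classes; the idea is just to keep one instance per indexing thread and reset it with new input instead of allocating a fresh one each time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an analysis chain; NOT a Lucene class.
final class ReusableTokenizer {
    private String input = "";
    private int pos = 0;

    // Point the existing instance at new text instead of allocating a new one.
    void reset(String newInput) {
        input = newInput;
        pos = 0;
    }

    // Returns the next whitespace-delimited token, or null when exhausted.
    String next() {
        while (pos < input.length() && Character.isWhitespace(input.charAt(pos))) pos++;
        if (pos >= input.length()) return null;
        int start = pos;
        while (pos < input.length() && !Character.isWhitespace(input.charAt(pos))) pos++;
        return input.substring(start, pos);
    }
}

final class TokenStreamCacheSketch {
    // One cached tokenizer per indexing thread, created lazily on first use.
    private static final ThreadLocal<ReusableTokenizer> CACHE =
        ThreadLocal.withInitial(ReusableTokenizer::new);

    // Returns this thread's cached instance, reset to read the given text.
    static ReusableTokenizer tokenizerFor(String text) {
        ReusableTokenizer t = CACHE.get();
        t.reset(text);
        return t;
    }

    // Drains the cached tokenizer into a list (for demonstration only).
    static List<String> tokenize(String text) {
        ReusableTokenizer t = tokenizerFor(text);
        List<String> out = new ArrayList<>();
        for (String tok = t.next(); tok != null; tok = t.next()) out.add(tok);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("first document body"));
        System.out.println(tokenize("second doc"));
    }
}
```

The per-thread cache avoids locking, at the price that a caller must finish consuming one stream before asking for the next on the same thread. Whether this recovers the lost throughput would have to be checked with the indexing test.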

-Yonik
http://www.lucidimagination.com

On Sun, Aug 9, 2009 at 12:57 PM, Yonik Seeley<[email protected]> wrote:
> OK, I've isolated (magnified) the effect with a test I just checked in.
> Indexing documents directly at the UpdateHandler was 85% faster before
> the latest lucene update.
>
> Run the test like this:
>
> ant test -Dtestcase=TestIndexingPerformance -Dargs="-server -Diter=100000"; grep throughput build/test-results/*TestIndexingPerformance.xml
>
> To run on an older trunk version, just copy over
> src/test/org/apache/solr/update/TestIndexingPerformance.java
> src/test/test-files/solr/conf/solrconfig_perf.xml
>
> I had a throughput of 10946 docs/sec before the lucene update, and 5849 after.
>
> -Yonik
> http://www.lucidimagination.com
>
>
> On Sun, Aug 9, 2009 at 12:10 PM, Yonik Seeley<[email protected]> wrote:
>> On Sun, Aug 9, 2009 at 12:01 PM, Grant Ingersoll<[email protected]> wrote:
>>> Or bite the bullet and upgrade to the incrementToken() method.
>>
>> Right - I'm not sure if that would fix it or not - I haven't been
>> involved in the new Token attribute stuff...
>> I'm currently writing a basic indexing unit test that we can use to
>> measure this (the standard solrconfig does stuff that slows down
>> indexing a lot, but helps in catching bugs on edge cases by creating
>> many segments).
>>
>> -Yonik
>> http://www.lucidimagination.com
