Re: indexing slowdown with latest lucene update

Yonik Seeley Sun, 09 Aug 2009 11:13:53 -0700

FYI
https://issues.apache.org/jira/browse/SOLR-1353


On Sun, Aug 9, 2009 at 2:02 PM, Yonik Seeley<[email protected]> wrote:
> It looks like implementing the new attribute stuff will not be enough
> - the token architecture has changed enough that it looks like we must
> cache tokenstreams to get back to good performance.
>
> -Yonik
> http://www.lucidimagination.com
>
>
> On Sun, Aug 9, 2009 at 12:57 PM, Yonik Seeley<[email protected]> wrote:
>> OK, I've isolated (magnified) the effect with a test I just checked in.
>> Indexing documents directly at the UpdateHandler was 85% faster before
>> the latest lucene update.
>>
>> Run the test like this:
>>
>> ant test -Dtestcase=TestIndexingPerformance -Dargs="-server -Diter=100000"; grep throughput build/test-results/*TestIndexingPerformance.xml
>>
>> To run on an older trunk version, just copy over these two files:
>> src/test/org/apache/solr/update/TestIndexingPerformance.java
>> src/test/test-files/solr/conf/solrconfig_perf.xml
>>
>> I had a throughput of 10946 docs/sec before the lucene update, and 5849 after.
>>
>> -Yonik
>> http://www.lucidimagination.com
>>
>>
>> On Sun, Aug 9, 2009 at 12:10 PM, Yonik Seeley<[email protected]> wrote:
>>> On Sun, Aug 9, 2009 at 12:01 PM, Grant Ingersoll<[email protected]> wrote:
>>>> Or bite the bullet and upgrade to the incrementToken() method.
>>>
>>> Right - I'm not sure if that would fix it or not - I haven't been
>>> involved in the new Token attribute stuff...
>>> I'm currently writing a basic indexing unit test that we can use to
>>> measure this (the standard solrconfig does stuff that slows down
>>> indexing a lot, but helps in catching bugs on edge cases by creating
>>> many segments).
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>
>
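
For anyone comparing their own runs against the numbers quoted above, here is a minimal sketch of the arithmetic relating the two throughput figures. The two values are taken from the thread; the variable names are made up for illustration and are not part of the test itself.

```python
# Throughput figures quoted in the thread (docs/sec).
before_update = 10946.0  # before the lucene update
after_update = 5849.0    # after the lucene update

# How much faster the old build was, relative to the new one.
pct_faster_before = (before_update / after_update - 1.0) * 100.0

# How much throughput was lost, relative to the old build.
pct_drop_after = (1.0 - after_update / before_update) * 100.0

print("old build faster by %.1f%%" % pct_faster_before)
print("throughput dropped by %.1f%%" % pct_drop_after)
```

Note that "X% faster before" and "X% slower after" are different ratios of the same two numbers, so it is worth stating which baseline is used when reporting results from this test.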
