I am concerned about this one as well, especially since the majority of the language analyzers in lucene-contrib do not implement reusableTokenStream.
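For context on the reuse being discussed: in Lucene 2.9 an Analyzer can override reusableTokenStream() so that it returns a per-thread cached stream reset onto the new Reader, rather than building a fresh TokenStream (and its attributes) for every field. The sketch below shows only that caching idea in plain Java. SimpleWhitespaceTokenizer, ReusingAnalyzer and ReuseDemo are hypothetical stand-ins and are not Lucene's API.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a token stream that is costly to construct
// (e.g. attribute setup via reflection). This is NOT Lucene's API.
class SimpleWhitespaceTokenizer {
    static int constructions = 0; // counts how many instances were built
    private Reader input;

    SimpleWhitespaceTokenizer(Reader input) {
        constructions++;
        this.input = input;
    }

    // Re-point an existing instance at new input instead of allocating a new one.
    void reset(Reader input) {
        this.input = input;
    }

    // Splits the current input on whitespace.
    List<String> tokens() {
        List<String> out = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        try {
            int c;
            while ((c = input.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (sb.length() > 0) {
                        out.add(sb.toString());
                        sb.setLength(0);
                    }
                } else {
                    sb.append((char) c);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (sb.length() > 0) {
            out.add(sb.toString());
        }
        return out;
    }
}

// Hypothetical analyzer that caches one stream per thread and resets it,
// mirroring the idea behind overriding reusableTokenStream().
class ReusingAnalyzer {
    private final ThreadLocal<SimpleWhitespaceTokenizer> cached = new ThreadLocal<>();

    SimpleWhitespaceTokenizer tokenStream(Reader reader) {
        SimpleWhitespaceTokenizer ts = cached.get();
        if (ts == null) {
            ts = new SimpleWhitespaceTokenizer(reader); // first use on this thread
            cached.set(ts);
        } else {
            ts.reset(reader); // later uses: reuse the cached instance
        }
        return ts;
    }
}

public class ReuseDemo {
    public static void main(String[] args) {
        ReusingAnalyzer analyzer = new ReusingAnalyzer();
        for (int i = 0; i < 1000; i++) {
            analyzer.tokenStream(new StringReader("doc " + i)).tokens();
        }
        // Only one stream was constructed for 1000 "documents" on this thread.
        System.out.println(SimpleWhitespaceTokenizer.constructions); // prints 1
    }
}
```

The trade-off is a small amount of per-thread state in exchange for skipping repeated construction. Whether reuse alone recovers the throughput drop measured below depends on where the cost actually lies, which is the open question in the thread.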
On Sun, Aug 9, 2009 at 5:06 PM, Michael Busch <[email protected]> wrote:
> Are you sure that the initialization costs of the
> TokenStream/AttributeSource cause the slowdown? With the bw-comp. code now
> every call of a Token method goes through a delegation layer. I'm afraid
> that might cause a slowdown?
>
> The code that figures out what Attributes to put into the map uses
> reflection, but only if the impl wasn't seen before; otherwise the
> attributes are looked up in a cache.
>
> The culprit could also be the reflection code that checks which TokenStream
> methods are implemented.
>
> I can't look at the code right now (writing on my cell).
> Even if this is "fixable", I don't really like the fact that users who
> upgrade to 2.9 will potentially see such a performance hit unless they
> implement incrementToken() and reusableTokenStream.
>
> Michael
>
> On Aug 9, 2009, at 11:13 AM, Yonik Seeley <[email protected]> wrote:
>
>> FYI
>> https://issues.apache.org/jira/browse/SOLR-1353
>>
>> On Sun, Aug 9, 2009 at 2:02 PM, Yonik Seeley <[email protected]> wrote:
>>>
>>> It looks like implementing the new attribute stuff will not be enough
>>> - the token architecture has changed enough that it looks like we must
>>> cache tokenstreams to get back to good performance.
>>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>> On Sun, Aug 9, 2009 at 12:57 PM, Yonik Seeley <[email protected]> wrote:
>>>>
>>>> OK, I've isolated (magnified) the effect with a test I just checked in.
>>>> Indexing documents directly at the UpdateHandler was 85% faster before
>>>> the latest lucene update.
>>>>
>>>> Run the test like this:
>>>>
>>>>   ant test -Dtestcase=TestIndexingPerformance -Dargs="-server -Diter=100000"
>>>>   grep throughput build/test-results/*TestIndexingPerformance.xml
>>>>
>>>> To run on an older trunk version, just copy over
>>>> src/test/org/apache/solr/update/TestIndexingPerformance.java
>>>> src/test/test-files/solr/conf/solrconfig_perf.xml
>>>>
>>>> I had a throughput of 10946 docs/sec before the lucene update, and 5849
>>>> after.
>>>>
>>>> -Yonik
>>>> http://www.lucidimagination.com
>>>>
>>>> On Sun, Aug 9, 2009 at 12:10 PM, Yonik Seeley <[email protected]> wrote:
>>>>>
>>>>> On Sun, Aug 9, 2009 at 12:01 PM, Grant Ingersoll <[email protected]> wrote:
>>>>>>
>>>>>> Or bite the bullet and upgrade to the incrementToken() method.
>>>>>
>>>>> Right - I'm not sure if that would fix it or not - I haven't been
>>>>> involved in the new Token attribute stuff...
>>>>> I'm currently writing a basic indexing unit test that we can use to
>>>>> measure this (the standard solrconfig does stuff that slows down
>>>>> indexing a lot, but helps in catching bugs on edge cases by creating
>>>>> many segments).
>>>>>
>>>>> -Yonik
>>>>> http://www.lucidimagination.com

-- 
Robert Muir
[email protected]
