Are you sure that the initialization costs of the TokenStream/
AttributeSource cause the slowdown? With the backwards-compatibility
code, every call to a Token method now goes through a delegation
layer. I'm afraid that might be what causes the slowdown.
The code that figures out what Attributes to put into the map uses
reflection, but only if the impl wasn't seen before; otherwise the
attributes are looked up in a cache.
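
A rough sketch of that "reflect once, then hit the cache" pattern, in case it helps the discussion. This is not the actual Lucene code: Attribute, TermAttr, OffsetAttr, TokenImpl and AttributeInterfaceCache are made-up names for illustration, and the scan only looks at directly declared interfaces.

```java
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for attribute interfaces and an implementation class.
interface Attribute {}
interface TermAttr extends Attribute {}
interface OffsetAttr extends Attribute {}
class TokenImpl implements TermAttr, OffsetAttr {}

final class AttributeInterfaceCache {
    // Maps an implementation class to the attribute interfaces it declares.
    private static final Map<Class<?>, Set<Class<?>>> CACHE = new ConcurrentHashMap<>();

    // Counts how many times the reflective scan actually ran.
    static final AtomicInteger SCANS = new AtomicInteger();

    // Reflection runs only the first time a class is seen; later calls are map lookups.
    static Set<Class<?>> interfacesOf(Class<?> implClass) {
        return CACHE.computeIfAbsent(implClass, AttributeInterfaceCache::scan);
    }

    // Walks the class hierarchy and collects directly declared sub-interfaces of Attribute.
    private static Set<Class<?>> scan(Class<?> implClass) {
        SCANS.incrementAndGet();
        Set<Class<?>> found = new LinkedHashSet<>();
        for (Class<?> c = implClass; c != null; c = c.getSuperclass()) {
            for (Class<?> iface : c.getInterfaces()) {
                if (iface != Attribute.class && Attribute.class.isAssignableFrom(iface)) {
                    found.add(iface);
                }
            }
        }
        return found;
    }
}
```

With a cache like this the reflective cost is paid once per implementation class, so it should only show up in a benchmark if new classes keep appearing or the lookup itself is on a very hot path.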
The culprit could also be the reflection code that checks which
TokenStream methods are implemented.
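
For illustration, a check of this kind can be written by asking reflection which class declares a method. The sketch below is an assumption about the general technique, not a copy of what TokenStream does; BaseStream, OldStyleStream, NewStyleStream and OverrideCheck are hypothetical names.

```java
import java.lang.reflect.Method;

// Hypothetical base class offering both an old-style and a new-style API.
abstract class BaseStream {
    public boolean incrementToken() { return false; }
    public Object next() { return null; }
}

// Overrides only the old-style method.
class OldStyleStream extends BaseStream {
    @Override public Object next() { return "tok"; }
}

// Overrides only the new-style method.
class NewStyleStream extends BaseStream {
    @Override public boolean incrementToken() { return true; }
}

final class OverrideCheck {
    // Returns true if cls (or a superclass below BaseStream) overrides the
    // named public no-argument method declared in BaseStream.
    static boolean overrides(Class<? extends BaseStream> cls, String methodName) {
        try {
            Method m = cls.getMethod(methodName);
            return m.getDeclaringClass() != BaseStream.class;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

If a lookup like this runs for every new stream instance rather than once per class, it would be a plausible source of per-document overhead, which is why it seems worth ruling out.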
I can't look at the code right now (writing on my cell).
Even if this is "fixable", I don't really like the fact that users who
upgrade to 2.9 will potentially see such a performance hit unless they
implement incrementToken() and reusableTokenStream().
Michael
On Aug 9, 2009, at 11:13 AM, Yonik Seeley <[email protected]> wrote:
FYI
https://issues.apache.org/jira/browse/SOLR-1353
On Sun, Aug 9, 2009 at 2:02 PM, Yonik Seeley <[email protected]> wrote:
It looks like implementing the new attribute stuff will not be enough -
the token architecture has changed enough that it looks like we must
cache tokenstreams to get back to good performance.
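
A minimal sketch of what per-thread stream caching could look like, assuming the stream can be reset onto a new input. ExpensiveStream and ReusingAnalyzer are hypothetical names for illustration, not Lucene or Solr classes.

```java
import java.io.Reader;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stream whose construction is assumed to be the expensive part.
final class ExpensiveStream {
    static final AtomicInteger CREATED = new AtomicInteger();
    private Reader input;

    ExpensiveStream(Reader input) {
        CREATED.incrementAndGet();
        this.input = input;
    }

    // Points the existing stream at new input instead of building a new one.
    void reset(Reader newInput) {
        this.input = newInput;
    }

    Reader input() {
        return input;
    }
}

// Hypothetical analyzer that keeps one stream per thread and reuses it.
final class ReusingAnalyzer {
    private final ThreadLocal<ExpensiveStream> perThread = new ThreadLocal<>();

    ExpensiveStream stream(Reader reader) {
        ExpensiveStream s = perThread.get();
        if (s == null) {
            // First use on this thread: pay the construction cost once.
            s = new ExpensiveStream(reader);
            perThread.set(s);
        } else {
            // Later uses: reuse the cached instance with the new input.
            s.reset(reader);
        }
        return s;
    }
}
```

The trade-off is that a caller must finish with one stream before asking the same thread for the next, since both calls return the same object.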
-Yonik
http://www.lucidimagination.com
On Sun, Aug 9, 2009 at 12:57 PM, Yonik Seeley <[email protected]> wrote:
OK, I've isolated (magnified) the effect with a test I just checked in.
Indexing documents directly at the UpdateHandler was 85% faster before
the latest Lucene update.
Run the test like this:
ant test -Dtestcase=TestIndexingPerformance -Dargs="-server -Diter=100000";
grep throughput build/test-results/*TestIndexingPerformance.xml
To run on an older trunk version, just copy over
src/test/org/apache/solr/update/TestIndexingPerformance.java
src/test/test-files/solr/conf/solrconfig_perf.xml
I had a throughput of 10946 docs/sec before the Lucene update, and
5849 docs/sec after.
-Yonik
http://www.lucidimagination.com
On Sun, Aug 9, 2009 at 12:10 PM, Yonik Seeley <[email protected]> wrote:
On Sun, Aug 9, 2009 at 12:01 PM, Grant Ingersoll <[email protected]> wrote:
Or bite the bullet and upgrade to the incrementToken() method.
Right - I'm not sure if that would fix it or not - I haven't been
involved in the new Token attribute stuff...
I'm currently writing a basic indexing unit test that we can use to
measure this (the standard solrconfig does stuff that slows down
indexing a lot, but helps in catching bugs on edge cases by creating
many segments).
-Yonik
http://www.lucidimagination.com