Re: indexing slowdown with latest lucene udpate

Mark Miller Sun, 09 Aug 2009 20:48:07 -0700

isMethodOverriden is just nasty - copying Methods, security checks, walking the type hierarchy, this, that, some more. I bet cglib has a really fast version - too bad there is no built-in equivalent.

It's not nearly as clean, but what if a new TokenStream simply identified itself as supporting increment, and the default impl returns false? The developer knows at compile time, right? Almost no reason to keep asking the code over and over again, especially since it's so expensive. Then reusable doubles the cost.
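
A rough sketch of that idea, for illustration only. Everything here is hypothetical: SketchStream, supportsIncrement() and the two subclasses are made-up names, not Lucene's TokenStream API. The base class answers false by default and a new-style stream overrides the flag, so the question is answered by a plain virtual call. A reflective check is included only for comparison.

```java
public class IncrementFlagSketch {

    // Hypothetical stand-in for a token stream base class (not Lucene's TokenStream).
    static abstract class SketchStream {
        // Default impl returns false; new-style subclasses override this to return true.
        boolean supportsIncrement() { return false; }

        // Old-style API; the default reports end of stream.
        String next() { return null; }

        // New-style API; the default reports end of stream.
        boolean incrementToken() { return false; }
    }

    // Implements only the old-style method, so it keeps the default flag.
    static class OldStyle extends SketchStream {
        @Override String next() { return null; }
    }

    // Implements the new-style method and says so explicitly.
    static class NewStyle extends SketchStream {
        @Override boolean supportsIncrement() { return true; }
        @Override boolean incrementToken() { return false; }
    }

    // Reflective check for comparison: walks the type hierarchy up to the base class.
    static boolean overridesIncrementReflectively(Class<?> clazz) {
        for (Class<?> c = clazz; c != null && c != SketchStream.class; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod("incrementToken");
                return true;
            } catch (NoSuchMethodException e) {
                // Not declared at this level; keep walking up.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        SketchStream[] streams = { new OldStyle(), new NewStyle() };
        for (SketchStream s : streams) {
            System.out.println(s.getClass().getSimpleName()
                    + " flag=" + s.supportsIncrement()
                    + " reflection=" + overridesIncrementReflectively(s.getClass()));
        }
    }
}
```

The trade-off is the one mentioned above: the flag is cheap to ask, but it relies on the developer keeping it in sync with what the class actually implements.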


Mark Miller wrote:

Michael Busch wrote:
Are you sure that the initialization costs of the TokenStream/AttributeSource cause the slowdown? With the bw-comp. code now every call of a Token method goes through a delegation layer. I'm afraid that might cause a slowdown?
It's isMethodOverriden and TokenStream<init>(AttributeSource).
The code that figures out what Attributes to put into the map uses reflection, but only if the impl wasn't seen before; otherwise the attributes are looked up in a cache.
The culprit could also be the reflection code that checks which TokenStream methods are implemented.
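
To illustrate the "only if the impl wasn't seen before" pattern applied to the method check, here is a minimal sketch under stated assumptions. It is not Lucene's code: Base, Impl, Plain and the cache are hypothetical names, and it uses Java 8's ConcurrentHashMap.computeIfAbsent for brevity. The reflective walk runs once per class, and later instances of the same class hit the cache.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class OverrideCacheSketch {

    // Hypothetical class hierarchy standing in for token stream implementations.
    static class Base { boolean incrementToken() { return false; } }
    static class Impl extends Base { @Override boolean incrementToken() { return true; } }
    static class Plain extends Base { }

    // Counts how often the expensive reflective path actually runs.
    static final AtomicInteger reflectiveLookups = new AtomicInteger();

    // Per-class cache of the reflective answer.
    static final ConcurrentHashMap<Class<?>, Boolean> CACHE = new ConcurrentHashMap<>();

    // Expensive path: walk the hierarchy looking for a declared incrementToken().
    static boolean declaresIncrementToken(Class<?> clazz) {
        reflectiveLookups.incrementAndGet();
        for (Class<?> c = clazz; c != null && c != Base.class; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod("incrementToken");
                return true;
            } catch (NoSuchMethodException e) {
                // Not declared at this level; keep walking up.
            }
        }
        return false;
    }

    // Cheap path: reflection only the first time a class is seen.
    static boolean overridesIncrementToken(Class<?> clazz) {
        return CACHE.computeIfAbsent(clazz, OverrideCacheSketch::declaresIncrementToken);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            overridesIncrementToken(new Impl().getClass());
            overridesIncrementToken(new Plain().getClass());
        }
        System.out.println("Impl overrides: " + overridesIncrementToken(Impl.class));
        System.out.println("Plain overrides: " + overridesIncrementToken(Plain.class));
        System.out.println("reflective lookups: " + reflectiveLookups.get());
    }
}
```

With a cache like this the per-instance cost drops to a map lookup, which matters most in the short-document case where stream construction dominates.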
I can't look at the code right now (writing on my cell).
Even if this is "fixable", I don't really like the fact that users who upgrade to 2.9 will potentially see such a performance hit unless they implement incrementToken() and reusableTokenStream.
Looks like you take a good hit, but keep in mind that test is almost a worst-case scenario as well - the Document text is extremely short.
 Michael
On Aug 9, 2009, at 11:13 AM, Yonik Seeley <[email protected]> wrote:
FYI
https://issues.apache.org/jira/browse/SOLR-1353
On Sun, Aug 9, 2009 at 2:02 PM, Yonik Seeley <[email protected]> wrote:
It looks like implementing the new attribute stuff will not be enough
- the token architecture has changed enough that it looks like we must
cache tokenstreams to get back to good performance.

-Yonik
http://www.lucidimagination.com
On Sun, Aug 9, 2009 at 12:57 PM, Yonik Seeley <[email protected]> wrote:
OK, I've isolated (magnified) the effect with a test I just checked in.
Indexing documents directly at the UpdateHandler was 85% faster before the latest lucene update.

Run the test like this:

ant test -Dtestcase=TestIndexingPerformance -Dargs="-server -Diter=100000"; grep throughput build/test-results/*TestIndexingPerformance.xml

To run on an older trunk version, just copy over
src/test/org/apache/solr/update/TestIndexingPerformance.java
src/test/test-files/solr/conf/solrconfig_perf.xml
I had a throughput of 10946 docs/sec before the lucene update, and 5849 after.
-Yonik
http://www.lucidimagination.com
On Sun, Aug 9, 2009 at 12:10 PM, Yonik Seeley <[email protected]> wrote:
On Sun, Aug 9, 2009 at 12:01 PM, Grant Ingersoll <[email protected]> wrote:
Or bite the bullet and upgrade to the incrementToken() method.
Right - I'm not sure if that would fix it or not - I haven't been
involved in the new Token attribute stuff...
I'm currently writing a basic indexing unit test that we can use to
measure this (the standard solrconfig does stuff that slows down
indexing a lot, but helps in catching bugs on edge cases by creating
many segments).

-Yonik
http://www.lucidimagination.com



--
- Mark

http://www.lucidimagination.com
