isMethodOverriden is just nasty - copying Methods, security checks,
walking the type hierarchy, this, that, some more. I bet cglib has a
really fast version - too bad there is no built-in equivalent.
It's not nearly as clean, but what if a new TokenStream simply identified
itself as supporting increment, and the default impl returns false? The
developer knows at compile time, right? There is almost no reason to keep
asking the code over and over again, especially since it's so expensive.
Then the same check for reusable doubles the cost.
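
To make the idea concrete, here is a rough sketch of what I mean. None of these are real Lucene classes or signatures: SketchTokenStream, supportsIncrement(), and the two subclasses are made-up names for illustration only. The point is just that the consumer pays one virtual call instead of a reflective walk of the type hierarchy.

```java
// Hypothetical sketch only: these are NOT the real Lucene classes or method signatures.
abstract class SketchTokenStream {
    /** Default impl returns false: the stream only implements the old next() API. */
    boolean supportsIncrement() {
        return false;
    }

    /** New-style API; a stream that does not support it should never have this called. */
    boolean incrementToken() {
        throw new UnsupportedOperationException("old-style stream");
    }

    /** Old-style API; returns null when the stream is exhausted. */
    String next() {
        return null;
    }
}

/** A stream written against the new API; it declares that fact itself. */
final class NewStyleStream extends SketchTokenStream {
    private int remaining = 2;

    @Override boolean supportsIncrement() { return true; }

    @Override boolean incrementToken() { return remaining-- > 0; }
}

/** A stream written against the old API; it inherits the default "false". */
final class OldStyleStream extends SketchTokenStream {
    private int remaining = 2;

    @Override String next() { return remaining-- > 0 ? "tok" : null; }
}

final class IncrementFlagDemo {
    /** Counts tokens, choosing the API by asking the stream rather than by reflection. */
    static int countTokens(SketchTokenStream ts) {
        int n = 0;
        if (ts.supportsIncrement()) {
            while (ts.incrementToken()) n++;
        } else {
            while (ts.next() != null) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(countTokens(new NewStyleStream()));
        System.out.println(countTokens(new OldStyleStream()));
    }
}
```

The obvious downside is that an implementor can forget to flip the flag, in which case the stream silently falls back to the old path.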
Mark Miller wrote:
Michael Busch wrote:
Are you sure that the initialization costs of the
TokenStream/AttributeSource cause the slowdown? With the
backwards-compatibility code, every call of a Token method now goes
through a delegation layer. I'm afraid that might cause a slowdown?
It's isMethodOverriden and TokenStream<init>(AttributeSource).
The code that figures out what Attributes to put into the map uses
reflection, but only if the impl wasn't seen before; otherwise the
attributes are looked up in a cache.
The culprit could also be the reflection code that checks which
TokenStream methods are implemented.
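
If the method check is the culprit, one option would be to cache its answer per class, the same way the attribute lookup is cached. A minimal sketch of that idea follows, assuming hypothetical classes (BaseStream, OverridingStream, PlainStream, OverrideCache); it is not the actual Lucene code, and it only uses standard java.lang.reflect and java.util.concurrent calls.

```java
import java.lang.reflect.Method;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-ins for a stream base class and two impls; not Lucene classes.
abstract class BaseStream {
    public boolean incrementToken() { return false; }
}

class OverridingStream extends BaseStream {
    @Override public boolean incrementToken() { return true; }
}

class PlainStream extends BaseStream { }

final class OverrideCache {
    // One cached answer per impl class, so reflection runs once per class, not per instance.
    private static final ConcurrentMap<Class<?>, Boolean> CACHE = new ConcurrentHashMap<>();

    // Counts how often the reflective lookup actually runs (for the demo only).
    static final AtomicInteger reflectiveLookups = new AtomicInteger();

    static boolean overridesIncrementToken(Class<? extends BaseStream> clazz) {
        return CACHE.computeIfAbsent(clazz, c -> {
            reflectiveLookups.incrementAndGet();
            try {
                // getMethod resolves to the most specific public declaration.
                Method m = c.getMethod("incrementToken");
                return m.getDeclaringClass() != BaseStream.class;
            } catch (NoSuchMethodException e) {
                throw new IllegalStateException(e);
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(overridesIncrementToken(OverridingStream.class));
        System.out.println(overridesIncrementToken(PlainStream.class));
        overridesIncrementToken(OverridingStream.class); // served from the cache
        System.out.println("reflective lookups: " + reflectiveLookups.get());
    }
}
```

With a cache like this the per-instance cost drops to a map lookup, though a short-document test would still pay for that lookup on every new stream.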
I can't look at the code right now (writing on my cell).
Even if this is "fixable", I don't really like the fact that users
who upgrade to 2.9 will potentially see such a performance hit unless
they implement incrementToken() and reusableTokenStream.
Looks like you take a good hit, but keep in mind that test is almost a
worst-case scenario as well - the Document text is extremely short.
Michael
On Aug 9, 2009, at 11:13 AM, Yonik Seeley <[email protected]> wrote:
FYI
https://issues.apache.org/jira/browse/SOLR-1353
On Sun, Aug 9, 2009 at 2:02 PM, Yonik Seeley <[email protected]> wrote:
It looks like implementing the new attribute stuff will not be enough
- the token architecture has changed enough that it looks like we must
cache tokenstreams to get back to good performance.
-Yonik
http://www.lucidimagination.com
On Sun, Aug 9, 2009 at 12:57 PM, Yonik Seeley <[email protected]> wrote:
OK, I've isolated (magnified) the effect with a test I just
checked in.
Indexing documents directly at the UpdateHandler was 85% faster before
the latest lucene update.
Run the test like this:
ant test -Dtestcase=TestIndexingPerformance -Dargs="-server -Diter=100000"; grep throughput build/test-results/*TestIndexingPerformance.xml
To run on an older trunk version, just copy over
src/test/org/apache/solr/update/TestIndexingPerformance.java
src/test/test-files/solr/conf/solrconfig_perf.xml
I had a throughput of 10946 docs/sec before the lucene update, and
5849 after.
-Yonik
http://www.lucidimagination.com
On Sun, Aug 9, 2009 at 12:10 PM, Yonik Seeley <[email protected]> wrote:
On Sun, Aug 9, 2009 at 12:01 PM, Grant Ingersoll <[email protected]> wrote:
Or bite the bullet and upgrade to the incrementToken() method.
Right - I'm not sure if that would fix it or not - I haven't been
involved in the new Token attribute stuff...
I'm currently writing a basic indexing unit test that we can use to
measure this (the standard solrconfig does stuff that slows down
indexing a lot, but helps in catching bugs on edge cases by creating
many segments).
-Yonik
http://www.lucidimagination.com
--
- Mark
http://www.lucidimagination.com