Quoting Rickard Öberg <[email protected]>:
Hi,
I did some tests, and found a number of ways to fix this issue.
First of all, whenever you do benchmarking, make sure you run the
test repeatedly. Running it once will not allow the JIT to kick in.
Run the test about 10 times in the same go instead, to get more
stable numbers.
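The warm-up advice above can be sketched as a tiny harness; this is a
minimal illustration (the class and method names are mine, not from
Qi4j), assuming you discard the first few runs as JIT warm-up:

```java
// Minimal micro-benchmark sketch: run the workload several times so the
// JIT can compile hot paths, then report only the later, stable runs.
public class WarmupBenchmark {
    // Stand-in workload; replace with the code under test.
    static long workload() {
        long sum = 0;
        for (int i = 0; i < 100_000; i++) {
            sum += i * 31L;
        }
        return sum;
    }

    public static void main(String[] args) {
        int runs = 10;
        long[] timesNs = new long[runs];
        for (int r = 0; r < runs; r++) {
            long start = System.nanoTime();
            workload();
            timesNs[r] = System.nanoTime() - start;
        }
        // Discard the first three runs (warm-up) and average the rest.
        long total = 0;
        for (int r = 3; r < runs; r++) {
            total += timesNs[r];
        }
        System.out.println("avg of warm runs (ns): " + (total / (runs - 3)));
    }
}
```

For serious measurements a dedicated harness such as JMH is the usual
choice, but the pattern is the same: warm up first, then measure.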
Second, I changed the notifyChanges() indexing in
RdfEntityIndexerMixin to do the removes in one call to Sesame,
instead of one call per object.
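The batching change described above can be illustrated with a small
stub; the real Sesame API differs (its RepositoryConnection exposes
remove(...) overloads), so `TripleStore` and `removeAll` here are
hypothetical names used only to show the pattern:

```java
import java.util.ArrayList;
import java.util.List;

// Each call to the store carries fixed overhead (transaction setup,
// I/O round-trip), so one bulk remove beats N single removes.
public class BatchedRemoval {
    // Hypothetical stand-in for a store connection.
    static class TripleStore {
        int calls = 0;
        void removeAll(List<String> ids) {
            calls++; // one round-trip regardless of how many ids
            // ... actual deletion would happen here ...
        }
    }

    public static void main(String[] args) {
        TripleStore store = new TripleStore();
        List<String> toRemove = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            toRemove.add("entity-" + i);
        }
        // Instead of one removeAll(...) call per entity (1000 calls),
        // collect the ids first and issue a single call:
        store.removeAll(toRemove);
        System.out.println("store calls: " + store.calls);
    }
}
```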
Third (and most important), you had forgotten to add indexes. I added this:
prefModule.forMixin( NativeConfiguration.class )
    .declareDefaults().tripleIndexes().set( "cspo,spoc" );
And the performance became muuuuuch better. Along with the other
fixes, my results (on a two-year-old MacBook Pro, 2.4 GHz) are 286ms,
compared to the original 97secs, which is a big difference. My guess
is that the
lack of indexing was the biggest problem. In any case, I've checked
in the updated RDF indexer version which removes all entities in one
call, which should help regardless.
Thanks for the fixes and advice, Rickard. Adding those indexes seems
to have made a good improvement in our project. However, I'm still
not convinced by the OpenRDF code related to this problem at all; in
some places it looks like someone's first programming assignment. I
need to find time to work out a proper solution to this, though.
Btw, those 97secs were measured during performance instrumentation,
which slows the program down significantly. Without instrumentation,
I got around 1-2secs (without indexes) for the full test.
Oh, a "funny" bit: when removing an entity which contained around
12000 aggregated entities (as a test), I got an out-of-memory error
somewhere in Qi4j, MapEntityStore or somewhere like that (I need to
check the log at my workplace if you want more info about that).
_______________________________________________
qi4j-dev mailing list
[email protected]
http://lists.ops4j.org/mailman/listinfo/qi4j-dev