Hi,
On Wed, Feb 27, 2013 at 12:24 PM, Jukka Zitting <[email protected]> wrote:
> Added 167000 pages in 467 seconds (2.80ms/page)
> Imported 167404 pages in 1799 seconds (10.75ms/page)
Here's an update on the current status of the Wikipedia import benchmark:
$ java -Xmx1500m -jar oak-run/target/oak-run-0.7-SNAPSHOT.jar \
benchmark --wikipedia=simplewiki-20130414-pages-articles.xml \
--cache=200 WikipediaImport Oak-Segment
[...]
Added 171000 pages in 166 seconds (0.97ms/page)
Imported 171382 pages in 355 seconds (2.07ms/page)
[...]
Traversed 171382 pages in 27 seconds (0.16ms/page)
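Compared to the February numbers quoted above, that's roughly a 3x
improvement in add throughput (2.80ms/page down to 0.97ms/page) and
about 5x for the full import (10.75ms/page down to 2.07ms/page).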
Pretty good progress here.
> There are still a few problems, most notably the fact that the index
> update hook operates directly on the plain MemoryNodeBuilder used by
> the current SegmentMK, so it won't benefit from the automatic purging
> of large change-sets and thus ends up requiring lots of memory during
> the massive final save() call. Something like a SegmentNodeBuilder
> with internal purge logic similar to what we already prototyped in
> KernelNodeState should solve that issue.
This is still an issue; see the -Xmx1500m heap setting I needed for the import.
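To make the idea concrete, the purge logic would boil down to something
like the sketch below; the class and method names are only illustrative,
not actual SegmentMK code:

// Illustrative sketch only; the types below are stand-ins, not the
// actual SegmentMK classes.
public class PurgingBuilderSketch {

    /** Hypothetical facade for writing transient content into segments. */
    interface SegmentWriter {
        Object flush(Object transientContent);
    }

    private static final int PURGE_THRESHOLD = 10000; // pending updates

    private final SegmentWriter writer;
    private Object content;        // the change-set being accumulated
    private int pendingUpdates = 0;

    PurgingBuilderSketch(SegmentWriter writer, Object initialContent) {
        this.writer = writer;
        this.content = initialContent;
    }

    /** Every write operation (setProperty, addChild, ...) ends up here. */
    void updated() {
        if (++pendingUpdates >= PURGE_THRESHOLD) {
            // Purge: write the accumulated changes into new segments and
            // keep only a compact handle in memory, like the logic we
            // prototyped in KernelNodeState.
            content = writer.flush(content);
            pendingUpdates = 0;
        }
    }
}

With something like this in place the import should no longer need the
whole change-set to fit in memory until the final save().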
> The other big issue is the large amount of time spent processing the
> commit hooks. The one hook approach I outlined earlier should help us
> there.
The work we've done here on the Editor mechanism is clearly paying
off: the commit hooks now take some 53% of the import time, down
from 74% two months ago, even though we've been adding more
functionality there.
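For anyone not following the Editor work: the hooks now see the content
diff of a commit through callbacks, roughly of the shape below
(simplified, not the exact interface), which lets an editor skip
subtrees it doesn't care about instead of re-reading the whole tree:

// Simplified sketch of the callback shape, not the exact interface.
interface EditorSketch {

    /** A property was added under the node currently being visited. */
    void propertyAdded(String name, Object value);

    /** A property was changed; both old and new values are available. */
    void propertyChanged(String name, Object before, Object after);

    /**
     * A child node was added. Returning null tells the framework that
     * this editor doesn't care about the subtree, so it is skipped
     * entirely, keeping the cost proportional to the change-set
     * rather than to the whole repository.
     */
    EditorSketch childNodeAdded(String name);
}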
BR,
Jukka Zitting