Hi,

On Wed, Feb 27, 2013 at 12:24 PM, Jukka Zitting <[email protected]> wrote:
>     Added 167000 pages in 467 seconds (2.80ms/page)
>     Imported 167404 pages in 1799 seconds (10.75ms/page)

Here's an update on the latest status with the Wikipedia import benchmark:

    $ java -Xmx1500m -jar oak-run/target/oak-run-0.7-SNAPSHOT.jar \
          benchmark --wikipedia=simplewiki-20130414-pages-articles.xml \
          --cache=200  WikipediaImport Oak-Segment
    [...]
    Added 171000 pages in 166 seconds (0.97ms/page)
    Imported 171382 pages in 355 seconds (2.07ms/page)
    [...]
    Traversed 171382 pages in 27 seconds (0.16ms/page)

Pretty good progress here.

> There are still a few problems, most notably the fact that the index
> update hook operates directly on the plain MemoryNodeBuilder used by
> the current SegmentMK, so it won't benefit from the automatic purging
> of large change-sets and thus ends up requiring lots of memory during
> the massive final save() call. Something like a SegmentNodeBuilder
> with internal purge logic similar to what we already prototyped in
> KernelNodeState should solve that issue.

This is still an issue, as you can see from the -Xmx1500m setting I
used for the import.
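
To make the purge idea a bit more concrete, here's a rough,
self-contained sketch: buffer changes in memory as before, but flush
them to the store whenever a threshold is crossed, so the final save()
no longer has to hold the entire change-set. The names below
(PurgingBuilder, ChangeStore, UPDATE_LIMIT) are just placeholders for
illustration, not the actual SegmentMK code:

    import java.util.HashMap;
    import java.util.Map;

    // Simplified sketch only: PurgingBuilder, ChangeStore and
    // UPDATE_LIMIT are placeholder names, not the actual SegmentMK code.
    public class PurgingBuilder {

        // Stand-in for the backing store that buffered changes get
        // written to when the builder purges itself.
        public interface ChangeStore {
            void persist(Map<String, String> changes);
        }

        private static final int UPDATE_LIMIT = 10000; // assumed threshold

        private final ChangeStore store;
        private final Map<String, String> pending =
                new HashMap<String, String>();

        public PurgingBuilder(ChangeStore store) {
            this.store = store;
        }

        public void setProperty(String path, String value) {
            pending.put(path, value);
            purgeIfNeeded();
        }

        // Flush buffered changes early instead of letting them pile up
        // until the massive final save() call.
        private void purgeIfNeeded() {
            if (pending.size() >= UPDATE_LIMIT) {
                store.persist(new HashMap<String, String>(pending));
                pending.clear();
            }
        }

        // The final save() only has to deal with whatever is left over.
        public void save() {
            store.persist(new HashMap<String, String>(pending));
            pending.clear();
        }
    }

A SegmentNodeBuilder that applies this kind of threshold-based purging
internally should remove the need for such a large heap.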

> The other big issue is the large amount of time spent processing the
> commit hooks. The one hook approach I outlined earlier should help us
> there.

The work we've done here with the Editor mechanism is clearly paying
off: the commit hooks now take some 53% of the import time, down from
74% two months ago, even though we've been adding more functionality
there.
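
To recap what the "one hook" approach buys us, here's a simplified,
self-contained sketch of the idea: a single pass over the change-set
drives all registered editors, so each commit pays the traversal cost
only once instead of once per hook. The Editor and NodeDelta types
below are illustrative stand-ins, not the actual Oak SPI:

    import java.util.List;

    // Simplified sketch only: Editor and NodeDelta are illustrative
    // stand-ins, not the actual Oak SPI interfaces.
    public class SingleTraversalHook {

        // One unit of change in a commit, reduced to just a path.
        public static class NodeDelta {
            public final String path;

            public NodeDelta(String path) {
                this.path = path;
            }
        }

        // Callback invoked for each change as the diff is traversed.
        public interface Editor {
            void nodeChanged(NodeDelta delta);
        }

        private final List<Editor> editors;

        public SingleTraversalHook(List<Editor> editors) {
            this.editors = editors;
        }

        // A single pass over the change-set drives all editors, so the
        // traversal cost is paid once per commit, not once per hook.
        public void processCommit(List<NodeDelta> changes) {
            for (NodeDelta delta : changes) {
                for (Editor editor : editors) {
                    editor.nodeChanged(delta);
                }
            }
        }
    }

With the index updater, validators and other hooks all registered as
editors against the same traversal, adding functionality doesn't
multiply the diff cost, which is why the hook share of the import time
keeps dropping even as we add features.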

BR,

Jukka Zitting
