Hi,

On 4/26/13 2:15 PM, "Jukka Zitting" <[email protected]> wrote:

>Hi,
>
>On Wed, Feb 27, 2013 at 12:24 PM, Jukka Zitting <[email protected]>
>wrote:
>> Added 167000 pages in 467 seconds (2.80ms/page)
>> Imported 167404 pages in 1799 seconds (10.75ms/page)
>
>Here's an update on the latest status with the Wikipedia import benchmark:
>
>    $ java -Xmx1500m -jar oak-run/target/oak-run-0.7-SNAPSHOT.jar \
>          benchmark --wikipedia=simplewiki-20130414-pages-articles.xml \
>          --cache=200 WikipediaImport Oak-Segment
>    [...]
>    Added 171000 pages in 166 seconds (0.97ms/page)
>    Imported 171382 pages in 355 seconds (2.07ms/page)
>    [...]
>    Traversed 171382 pages in 27 seconds (0.16ms/page)
>
>Pretty good progress here.

Those are impressive numbers. Are there any comparisons with Jackrabbit?

Cheers
Lukas

>> There are still a few problems, most notably the fact that the index
>> update hook operates directly on the plain MemoryNodeBuilder used by
>> the current SegmentMK, so it won't benefit from the automatic purging
>> of large change-sets and thus ends up requiring lots of memory during
>> the massive final save() call. Something like a SegmentNodeBuilder
>> with internal purge logic similar to what we already prototyped in
>> KernelNodeState should solve that issue.
>
>This is still an issue, see the -Xmx1500m I used for the import.
>
>> The other big issue is the large amount of time spent processing the
>> commit hooks. The one hook approach I outlined earlier should help us
>> there.
>
>The work we've done here with the Editor mechanism is clearly paying
>off as the commit hooks are now taking some 53% of the import time,
>down from 74% two months ago, even when we've been adding more
>functionality there.
>
>BR,
>
>Jukka Zitting
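To make the purge idea quoted above a bit more concrete, here is a rough,
self-contained sketch of the kind of threshold-based purging being described.
All names here (PurgingNodeBuilder, BackingStore, PURGE_THRESHOLD) are made up
for illustration; this is not the actual SegmentNodeBuilder or KernelNodeState
code, just the general pattern of flushing buffered changes early so a large
import never has to hold the whole change-set in memory until the final save():

    // Sketch only: hypothetical names, not actual Oak code.
    public class PurgingNodeBuilder {

        private static final int PURGE_THRESHOLD = 1024; // assumed limit on buffered updates

        private final BackingStore store; // hypothetical persistence layer
        private int pendingUpdates = 0;

        public PurgingNodeBuilder(BackingStore store) {
            this.store = store;
        }

        public void setProperty(String name, String value) {
            store.buffer(name, value); // record the change in memory
            if (++pendingUpdates >= PURGE_THRESHOLD) {
                purge(); // flush early instead of waiting for save()
            }
        }

        private void purge() {
            store.flush(); // persist and drop the in-memory diff
            pendingUpdates = 0;
        }

        // minimal store interface, just enough for the sketch
        public interface BackingStore {
            void buffer(String name, String value);
            void flush();
        }
    }

A real builder would presumably purge based on the size of the buffered diff
rather than a simple update count, but the shape of the fix is the same.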
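As for why the Editor-based hooks are so much cheaper: as I understand it, an
Editor is driven by the content diff of a commit, so it only gets callbacks
for the nodes and properties that actually changed, instead of re-traversing
the whole tree on every save(). A minimal sketch follows; it is based on my
reading of the current Oak API, so treat the exact package names and
signatures as an assumption, and the CountingEditor itself is purely
illustrative:

    import org.apache.jackrabbit.oak.api.PropertyState;
    import org.apache.jackrabbit.oak.spi.commit.DefaultEditor;
    import org.apache.jackrabbit.oak.spi.commit.Editor;
    import org.apache.jackrabbit.oak.spi.state.NodeState;

    // Illustrative diff-driven hook: the cost scales with the size of
    // the commit, not with the size of the repository.
    public class CountingEditor extends DefaultEditor {

        private final int[] changes; // shared counter for the whole diff

        public CountingEditor(int[] changes) {
            this.changes = changes;
        }

        @Override
        public void propertyAdded(PropertyState after) {
            changes[0]++; // called only for properties added in this commit
        }

        @Override
        public Editor childNodeAdded(String name, NodeState after) {
            changes[0]++;
            return this; // descend into the newly added subtree
        }
    }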
