Tom Mueller (pkg-discuss) wrote:
> Brock,
>
> PyLucene is a front end for Lucene, which is written in Java. So would
> using this mean that a JRE would be required to run pkg(5)?
>
I imagine so, yes.

Brock
> Tom
>
> Brock Pytlik wrote:
>> I hit send before I meant to, so here's one more piece of performance data.
>>
>> Indexing a repo on ipkg takes about 40min with existing methods and has
>> a size of 951M and an RSS of 947M. Indexing with pylucene takes roughly 2
>> hours, whether it has a size of 155M or a size of 1100M.
>> (Specifically, the 155M run took 2 hours and 1 minute while the 1100M
>> run took 1 hour and 49 minutes.)
>>
>> Brock
>>
>> Brock Pytlik wrote:
>>
>>> Over the past couple of weeks or so I've been looking into switching our
>>> search back end to use PyLucene. I've now got a working prototype which
>>> passes the test suite and I've been experimenting with it recently to
>>> check out its performance. After all that, I'm not sure which direction
>>> makes sense going forward, whether to make the switch or instead try to
>>> improve our existing back end.
>>>
>>> The one sentence summary is that PyLucene is more flexible and offers
>>> functionality that would take substantial effort for us to engineer, but
>>> has RAM and disk footprints that are heavier than the current
>>> implementation's and doesn't offer overwhelming speed improvements. If
>>> we went with PyLucene I could work on making search return the entire
>>> action and updating the APIs to use that ability as best they could.
>>> If we stay with the current approach, then I would work on
>>> speeding up update and laying the groundwork to handle the critical
>>> features like boolean queries and structured search (which would give us
>>> the ability to search against versions, and with a bit more extension,
>>> against incorporations).
>>>
>>> What I'm looking for from everyone is some views on whether the
>>> footprints I'm seeing from PyLucene are just too heavy or not. I have
>>> some ideas about how to reduce the footprint of PyLucene, at least a
>>> small amount, but I don't expect substantial changes, especially not for
>>> the memory growth during search.
>>>
>>> In detail, here's what I've found.
>>>
>>> Reasons for switching to PyLucene:
>>> Large variety of desired queries already available, including boolean
>>> and structured queries, which would need to be implemented in the other
>>> engine in the near future and which are not trivial to do.
>>>
>>> Somewhat faster searching locally (roughly 1.0 secs vs 1.4).
>>>
>>> It already correctly handles locking indexes and having readers update
>>> on the fly. Multiple readers can have the same index open at the same time.
>>>
>>> Easier control of RAM/time tradeoffs.
>>>
>>> Depot RAM usage not dependent on size of index.
>>>
>>> It's likely to scale better in terms of speed for local search, and
>>> possibly for remote search as well.
>>>
>>>
>>> Reasons for sticking with existing approach:
>>> Smaller indexes, at least so far. (40M vs 240M on my local system, 272M
>>> vs 4.2G on ipkg as reported by du)
>>>
>>> Constant depot memory usage for all searches. Using pylucene makes the
>>> depot grow when searches are done for things like p* (up to 710 size,
>>> 650M rss).
>>>
>>> Faster search for things like p* (30 seconds vs 2 minutes), though on
>>> normal queries, times seem comparable.
>>>
>>> More predictable behavior for queries. PyLucene preexpands wildcard
>>> queries and requires a max clause count number to be set. Even at 100000
>>> a search against ipkg for '(1.6.0_06)*' broke this limit. Turning this
>>> number higher had negative effects on performance from what I observed.
>>>
>>>
>>> On the subject of faster index update, I think the jury is out.
>>> If pylucene doesn't optimize the index after each install, then it's
>>> substantially faster than the current implementation, but not faster
>>> than I think the current implementation would be after a fairly simple
>>> adjustment so that it also didn't optimize the index after each
>>> installation.
>>>
>>> Thanks for your time, I'm looking forward to hearing what everyone thinks.
>>>
>>> Brock
>>> _______________________________________________
>>> pkg-discuss mailing list
>>> [email protected]
>>> http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
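
For anyone unfamiliar with why a wildcard search can trip a max clause count, here is a minimal sketch in plain Python. It does not use PyLucene; the names `TooManyClauses`, `expand_prefix`, `search_prefix`, and the tiny `postings` index are all hypothetical and only illustrate the general idea of pre-expanding `prefix*` into one clause per matching indexed term before the search runs.

```python
import bisect


class TooManyClauses(Exception):
    """Hypothetical error: the wildcard expanded past the clause limit."""


def expand_prefix(sorted_terms, prefix, max_clauses):
    """Expand 'prefix*' into one clause per indexed term with that prefix.

    sorted_terms must be sorted, so all matching terms are contiguous
    and start at the first term >= prefix.
    """
    start = bisect.bisect_left(sorted_terms, prefix)
    clauses = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # past the contiguous run of matching terms
        if len(clauses) >= max_clauses:
            raise TooManyClauses(
                "'%s*' expands to more than %d clauses" % (prefix, max_clauses))
        clauses.append(term)
    return clauses


def search_prefix(postings, prefix, max_clauses):
    """Union the postings of every expanded clause (an OR of term queries)."""
    hits = set()
    for term in expand_prefix(sorted(postings), prefix, max_clauses):
        hits |= postings[term]
    return hits


# Toy index (assumed data): term -> set of ids containing that term.
postings = {"perl": {3}, "pkg": {1}, "python": {2, 3}, "zlib": {4}}
```

In this sketch, the cost of a query like `p*` grows with the number of distinct indexed terms that match, not with the number of results the user wants, so a short prefix against a large index either hits the limit or forces the limit (and the memory used to hold the expanded clauses) to be raised.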
