Tom Mueller (pkg-discuss) wrote:
> Brock,
>
> PyLucene is a front end for Lucene, which is written in Java. So would 
> using this mean that a JRE would be required to run pkg(5)?
>
I imagine so, yes.
Brock
> Tom
>
> Brock Pytlik wrote:
>> I hit send before I meant to, so here's one more piece of performance data.
>>
>> Indexing a repo on ipkg takes about 40min with existing methods and has 
>> a size of 951M and an RSS of 947M. Indexing with pylucene takes roughly 2 
>> hours, whether it has a size of 155M or a size of 1100M. 
>> (Specifically, the 155M run took 2 hours and 1 minute while the 1100M 
>> run took 1 hour and 49 minutes.)
>>
>> Brock
>>
>> Brock Pytlik wrote:
>>   
>>> Over the past couple of weeks or so I've been looking into switching our 
>>> search back end to use PyLucene. I've now got a working prototype which 
>>> passes the test suite and I've been experimenting with it recently to 
>>> check out its performance. After all that, I'm not sure which direction 
>>> makes sense going forward, whether to make the switch or instead try to 
>>> improve our existing back end.
>>>
>>> The one sentence summary is that PyLucene is more flexible and offers 
>>> functionality that would take substantial effort for us to engineer but 
>>> has RAM and disk footprints that are heavier than the current 
>>> implementations and doesn't offer overwhelming speed improvements. If 
>>> we went with PyLucene I could work on making search so that it returns 
>>> the entire action and updating the APIs to use that ability as best 
>>> they could. If we stay with the current approach, then I would work on 
>>> speeding up update and laying the groundwork to handle the critical 
>>> features like boolean queries and structured search (which would give us 
>>> the ability to search against versions, and with a bit more extension, 
>>> against incorporations).
>>>
>>> What I'm looking for from everyone is some views on whether the 
>>> footprints I'm seeing from PyLucene are just too heavy or not. I have 
>>> some ideas about how to reduce the footprint of PyLucene, at least a 
>>> small amount, but I don't expect substantial changes, especially not for 
>>> the memory growth during search.
>>>
>>> In detail, here's what I've found.
>>>
>>> Reasons for switching to PyLucene:
>>> A large variety of the queries we want already exist, including boolean 
>>> and structured queries, which would need to be implemented in the other 
>>> engine in the near future and which are not trivial to do.
>>>
>>> Somewhat faster searching locally (1.0 secs vs 1.4 roughly).
>>>
>>> It already correctly handles locking indexes and having readers update 
>>> on the fly. Multiple readers can have the same index open at the same time.
>>>
>>> Easier control of RAM/time tradeoffs.
>>>
>>> Depot RAM usage not dependent on size of index.
>>>
>>> It's likely to scale better in terms of speed for local search, and 
>>> possibly for remote search as well.
>>>
>>>
>>>
>>>
>>> Reasons for sticking with existing approach:
>>> Smaller indexes, at least so far. (40M vs 240M on my local system, 272M 
>>> vs 4.2G on ipkg as reported by du)
>>>
>>> Constant depot memory usage for all searches. Using pylucene makes the 
>>> depot grow when searches are done for things like p* (up to 710M size, 
>>> 650M rss).
>>>
>>> Faster search for things like p* (30 seconds vs 2 minutes), though on 
>>> normal queries, times seem comparable.
>>>
>>> More predictable behavior for queries. PyLucene preexpands wildcard 
>>> queries and requires a max clause count number to be set. Even at 100000 
>>> a search against ipkg for '(1.6.0_06)*' broke this limit. Turning this 
>>> number higher had negative effects on performance from what I observed.
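
The pre-expansion behaviour described above can be illustrated with a small
toy model. This is only a sketch, not PyLucene code: expand_wildcard,
TooManyClauses and the in-memory term list are hypothetical names made up
for the example, and Python's fnmatch stands in for the real wildcard
matching. It shows why a broad pattern such as p* can exceed a clause limit
before any search work is done.

```python
import fnmatch


class TooManyClauses(Exception):
    """Raised when a wildcard expands to more terms than the limit allows."""


def expand_wildcard(pattern, index_terms, max_clause_count):
    """Return every indexed term matching pattern, one clause per term.

    Hypothetical helper: it mimics pre-expansion by scanning the term
    list up front, before any documents are looked up.
    """
    clauses = []
    for term in index_terms:
        if fnmatch.fnmatchcase(term, pattern):
            clauses.append(term)
            # Fail as soon as the expansion grows past the configured limit.
            if len(clauses) > max_clause_count:
                raise TooManyClauses(
                    "%r expands past %d clauses" % (pattern, max_clause_count))
    return clauses


if __name__ == "__main__":
    terms = ["perl", "pkg", "python", "zlib"]
    # Three terms match p*, so a limit of 3 is just enough.
    print(expand_wildcard("p*", terms, max_clause_count=3))
    try:
        # The same query is rejected outright once the limit drops to 2.
        expand_wildcard("p*", terms, max_clause_count=2)
    except TooManyClauses as err:
        print("query rejected:", err)
```

In this model the cost of a wildcard query grows with the number of matching
terms in the index rather than with the number of results wanted, which is
consistent with the unpredictability noted above.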
>>>
>>>
>>>
>>> On the subject of faster index update, I think the jury is out. If 
>>> pylucene doesn't optimize the index after each install, then it's 
>>> substantially faster than the current implementation, but no faster 
>>> than I think the current implementation would be after a fairly simple 
>>> adjustment so that it, too, skips optimizing the index after each 
>>> installation.
>>>
>>> Thanks for your time, I'm looking forward to hearing what everyone thinks.
>>>
>>> Brock
>>> _______________________________________________
>>> pkg-discuss mailing list
>>> [email protected]
>>> http://mail.opensolaris.org/mailman/listinfo/pkg-discuss
>>>   
>>>     
>>
>
