Re: lucene - speed and scalability

Brooks, Travis C. Tue, 22 Jun 2010 23:26:59 +0200

On Jun 22, 2010, at 3:35 PM, Roman Chyla wrote:
> 
>> 
>> A concrete use case for DB-driven suggestions is the inputters' needs, where
>> the query can be as convoluted as: the cataloger starts typing Geneva in
>> the address field, the system looks up in the Institutes database how
>> many different institutions we have with the address town Geneva, then
>> looks up how many papers we have in the system coming from these
>> institutions, and then ranks the final institution name propositions
> 
> To my mind, the lookups should be fast or super-fast. I am not sure
> that it is a good thing if suddenly all values become autosuggestable;
> besides, that can really increase the load on the database and degrade
> performance for everybody. Some of you might know the glocals.ch site:
> once they switched to the system with a lot of lookups, it suddenly
> got so bad (on the live site) that people started leaving the site.
> For the INSPIRE inputters' lookup, it is probably not a big problem,
> nevertheless, I still believe even for them one wants to configure
> carefully what can be autocompleted and what not.



I understand. However, recall that the inputters are handling hundreds of
authors a day, and this sort of autosuggestion saves them a tremendous amount
of time. Making things efficient for them is therefore worth a fair bit of
effort on our part, including going to some lengths to make as many
autosuggestions as possible available to them. The use case Tibor mentions
above is already implemented in SPIRES, and inputters save a lot of time with
it. Having this sort of functionality is not really optional, at least on the
inputter side. That said, if it is useful for inputters, why not make it
available to others who might like it? This is in fact similar to Google's
autosuggest, where the top completions are the most frequent searches; Tibor's
use case is a special case of ranking suggestions by frequency of occurrence
in the DB.
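
To make the use case concrete, here is a minimal Python sketch of
frequency-ranked suggestions. Everything in it is an assumption for
illustration: the records, the toy paper counts, and the function name
suggest_institutions are made up, and this is not how SPIRES or Invenio
actually implement it.

```python
# Hypothetical toy data: (institution name, address town, paper count).
# The counts are invented purely to demonstrate the ordering.
INSTITUTIONS = [
    ("University of Geneva", "Geneva", 20),
    ("CERN", "Geneva", 50),
    ("Geneva Observatory", "Geneva", 5),
    ("DESY", "Hamburg", 30),
]


def suggest_institutions(town_prefix, limit=5):
    """Return names of institutions whose town starts with town_prefix,
    ordered by paper count, most papers first."""
    prefix = town_prefix.lower()
    matches = [
        (papers, name)
        for name, town, papers in INSTITUTIONS
        if town.lower().startswith(prefix)
    ]
    # Descending paper count; ties fall back to alphabetical name order.
    matches.sort(key=lambda m: (-m[0], m[1]))
    return [name for _, name in matches[:limit]]
```

With this toy data, typing "Gen" would propose CERN ahead of University of
Geneva, because CERN has the larger paper count.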

So before committing to too much Lucene-ity, it seems we need to know that
such uses are not excluded by Lucene. Certainly any heavy autosuggest use
might load the system, whether in Lucene or native Invenio, and pre-caching
the frequencies, offloading completions to another server, or similar
strategies might help here. But first we would like to know that Lucene is
capable of handling Tibor's use case in a speedy fashion.
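
As a rough idea of what pre-caching the frequencies could look like, here is
a small Python sketch that computes the ranked completions for every town
prefix once, ahead of time, so that each keystroke costs only a dictionary
lookup. The names build_prefix_cache and lookup, the record layout, and the
toy counts are all hypothetical; this is independent of Lucene and Invenio
and says nothing about how either would do it.

```python
from collections import defaultdict

# Hypothetical toy records: (institution name, address town, paper count).
RECORDS = [
    ("University of Geneva", "Geneva", 20),
    ("CERN", "Geneva", 50),
    ("DESY", "Hamburg", 30),
]


def build_prefix_cache(records, top_k=3):
    """Map every lowercase town prefix to its top_k institution names,
    ranked by paper count (most papers first)."""
    buckets = defaultdict(list)
    for name, town, papers in records:
        key = town.lower()
        for n in range(1, len(key) + 1):
            buckets[key[:n]].append((papers, name))
    cache = {}
    for prefix, entries in buckets.items():
        # Rank once at build time, not on every keystroke.
        entries.sort(key=lambda e: (-e[0], e[1]))
        cache[prefix] = [name for _, name in entries[:top_k]]
    return cache


def lookup(cache, typed):
    """Return the cached completions for what the user has typed so far."""
    return cache.get(typed.lower(), [])


CACHE = build_prefix_cache(RECORDS)
```

The trade-off is memory and staleness: the cache grows with the number of
distinct prefixes and has to be rebuilt when the paper counts change.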

Best
Travis


> 
> Best,
> 
> roman
> 
>> accordingly (e.g. CERN would come before University of Geneva).  This
>> example is what Marko was working on via KBDs.
>> 
>> Best regards
>> --
>> Tibor Simko
>> 

Travis C. Brooks
Manager of Information Systems & SPIRES/INSPIRE
SLAC National Accelerator Laboratory Library
http://www.slac.stanford.edu/spires/
