Re: lucene - speed and scalability

Roman Chyla Wed, 23 Jun 2010 00:33:51 +0200

>> To my mind, the lookups should be fast or super-fast. I am not sure
>> that suddenly making all values autosuggestable is a good thing;
>> besides, it can really increase the load on the database and degrade
>> performance for everybody. Some of you might know the glocals.ch site:
>> once they switched to a system with a lot of lookups, it suddenly
>> got so bad (on the live site) that people started leaving the site.
>> For the INSPIRE inputters' lookup it is probably not a big problem;
>> nevertheless, I still believe that even for them one wants to configure
>> carefully what can be autocompleted and what not.
>
>
> I understand, however, recall that the inputters are doing hundreds of 
> authors a day, and this sort of autosuggestion makes a tremendous time 
> savings to them.   Thus to make things efficient for them is worth a fair bit 
> of effort on our part, including going to lengths to make as many 
> autosuggestions as possible available to them.   The use case Tibor


I agree, though certain fields may have a 'higher return on investment'
and be 'more important'. Interestingly, search engines optimize for a
small set of queries that covers >80% of cases. I don't remember the
exact numbers, but I could find them. I believe the logic is the same
here.

> mentions above is already implemented in SPIRES and inputters save a
> lot of time with it. Having this sort of functionality is not really
> optional, at least on the inputter side.
> That said, if it is useful for inputters, why not make it available
> for others, who might like it.

I am going to write the documentation soon and commit the plugin.

> It is clear actually that this is similar to the autosuggest of
> Google, in that your top completions are the most frequent searches,
> and Tibor's use case is a special case of

In my case it is much more limited, so I would caution against thinking
of it as Google-like (managing expectations? :). No, seriously: I don't
have information about searches, and that would have to be added somehow.

> ranking suggestions based on frequency of occurrence in the DB.
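To make the frequency-ranking idea concrete, here is a minimal sketch, assuming the candidate values (e.g. affiliation strings) can be pulled out of the records and counted once up front, which is one form of the "pre-caching the frequencies" mentioned in this thread. The class `FrequencyCompleter`, its `suggest` method and the sample data are hypothetical; this is not the plugin, nor how SPIRES or Invenio actually do it.

```python
from bisect import bisect_left
from collections import Counter


class FrequencyCompleter:
    """Hypothetical prefix completer ranked by frequency of occurrence.

    Counts are computed once up front (pre-cached), so a lookup is a
    binary search over sorted keys plus a sort of the matching values.
    """

    def __init__(self, values):
        # How many times each distinct value occurs in the records.
        self.counts = Counter(values)
        # (case-folded key, original value) pairs, sorted so that every
        # key sharing a given prefix sits in one contiguous run.
        self.keys = sorted((v.casefold(), v) for v in self.counts)

    def suggest(self, prefix, limit=5):
        p = prefix.casefold()
        # First position whose key is >= the prefix.
        start = bisect_left(self.keys, (p, ""))
        matches = []
        for key, value in self.keys[start:]:
            if not key.startswith(p):
                break  # left the contiguous run of matching keys
            matches.append(value)
        # Most frequent first; ties broken alphabetically.
        matches.sort(key=lambda v: (-self.counts[v], v))
        return matches[:limit]


# Made-up sample data, for illustration only.
completer = FrequencyCompleter(
    ["CERN", "CERN", "CERN", "Caltech", "Caltech", "Cornell U."]
)
print(completer.suggest("c"))  # ['CERN', 'Caltech', 'Cornell U.']
```

Because the counts are computed once, they go stale as new records arrive, so a real deployment would presumably need to rebuild them periodically.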

>
> So before committing to too much Lucene-ity, it seems like we need to know 
> that such uses are not excluded by lucene.   Certainly any heavy auto-suggest 
> use might load the

I may speak about Lucene because that is what I used, but in fact it is
not exactly Lucene-ity vs DB-ity; it is more IR-ity vs SQL-ity. SQL-ity
may rule out IR-ity if you believe that SQL can handle all the cases,
and that would be a pity for INSPIRE: it can already be shown that
IR-ity does very well in certain cases where the DB fared poorly (at
least according to some tests that were reportedly done in the past).

> system, whether in Lucene or native Invenio, and pre-caching the
> frequencies, or offloading completions to another server, or similar
> strategies might help here.   But first we might like to know that
> Lucene is capable of handling the use case of Tibor in a speedy
> fashion.

Tibor's use case is made for a DB query. Lucene is good for some
tasks and MySQL is better for others. If I somehow suggested that
everything should be handled by Lucene, it must have been an error
on my side, and I apologise. I never meant anything like that.
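For comparison, a minimal sketch of the same lookup done as a plain DB query, assuming a hypothetical table `affiliation(name TEXT)` with one row per occurrence in the records. It uses Python's built-in `sqlite3` only so that it runs standalone; Invenio runs on MySQL, and the table, column and function names here are illustrative, not Invenio's schema.

```python
import sqlite3


def suggest_from_db(conn, prefix, limit=5):
    """Distinct names starting with `prefix`, most frequent first.

    Assumes a hypothetical table affiliation(name TEXT) holding one row
    per occurrence of an affiliation in the records.
    """
    # Escape LIKE wildcards so user-typed '%' and '_' match literally.
    escaped = (
        prefix.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    rows = conn.execute(
        "SELECT name, COUNT(*) AS n FROM affiliation "
        "WHERE name LIKE ? ESCAPE '\\' "
        "GROUP BY name ORDER BY n DESC, name LIMIT ?",
        (escaped + "%", limit),
    ).fetchall()
    return [name for name, _ in rows]


# In-memory demo with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE affiliation (name TEXT)")
conn.executemany(
    "INSERT INTO affiliation VALUES (?)",
    [("CERN",), ("CERN",), ("CERN",), ("Caltech",), ("Caltech",), ("Cornell U.",)],
)
print(suggest_from_db(conn, "c"))  # ['CERN', 'Caltech', 'Cornell U.']
```

Whether such a GROUP BY stays fast on the full database depends on table size and indexing, which is exactly the kind of thing that would need measuring before choosing between this and an IR engine.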

Cheers,

  roman

>
> Best
> Travis
>
>
>>
>> Best,
>>
>> roman
>>
>>> accordingly (e.g. CERN would come before University of Geneva).  This
>>> example is what Marko was working on via KBDs.
>>>
>>> Best regards
>>> --
>>> Tibor Simko
>>>
>
> Travis C. Brooks
> Manager of Information Systems & SPIRES/INSPIRE
> SLAC National Accelerator Laboratory Library
> http://www.slac.stanford.edu/spires/
>
