Re: big perf-difference between solr-server vs. SOlrJ req.process(solrserver)

Otis Gospodnetic Tue, 01 Jan 2008 20:54:20 -0800

Maybe I'm not following your situation 100%, but it sounded like pulling the 
values of purely stored fields is the slow part. *Perhaps* using a non-Lucene 
data store just for the saved fields would be faster.
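
As a rough sketch of that idea, assuming an RDBMS as the external store:
the index hands back only the unique IDs of the top N hits, and the display
content is fetched by ID from the store. Everything named here is made up
for illustration (the `variants` table, the dict standing in for the search
index, the helper names); SQLite is used only so the sketch is self-contained.

```python
import sqlite3

def search_ids(index, term, top_n=10):
    """Hypothetical stand-in for the search step: return only the unique
    IDs of the top N hits, never the display content."""
    return index.get(term, [])[:top_n]

def fetch_content(conn, ids):
    """Pull the display content for the given IDs from the external store,
    keeping the ranking order produced by the search."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, characteristics FROM variants WHERE id IN ({placeholders})",
        ids,
    ).fetchall()
    by_id = dict(rows)
    return [(i, by_id[i]) for i in ids if i in by_id]

# External store (assumption: any RDBMS, BDB, etc. would play the same role).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variants (id TEXT PRIMARY KEY, characteristics TEXT)")
conn.executemany(
    "INSERT INTO variants VALUES (?, ?)",
    [("v1", "red, size M"), ("v2", "blue, size L"), ("v3", "green, size S")],
)

# Toy "index": maps a search term to hit IDs in ranked order.
index = {"shirt": ["v3", "v1"]}

hits = fetch_content(conn, search_ids(index, "shirt"))
print(hits)
```

The index side then only needs one small stored field per document (the ID),
whatever the size of the content kept in the store.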


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Geert-Jan Brits <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, December 31, 2007 8:49:43 AM
Subject: Re: big perf-difference between solr-server vs. SOlrJ req.process(solrserver)

Hi Otis,

I don't really see how this would minimize my number of fields.
At the moment I have 1 price field (stored / indexed) and 1 multivalued field
(stored) per product-variant. I have about 2000 product variants.

I could indeed replace each multivalued field with a single-valued field holding
an id pointing to an external store, where I get the needed fields. However, this
would not change the number of fields in my index (correct?) and thus
wouldn't matter for the long scanning time I'm seeing. Moreover, I guess it
wouldn't matter for the query time either.

Thanks,
Geert-Jan





2007/12/29, Otis Gospodnetic <[EMAIL PROTECTED]>:
>
> Hi Geert-Jan,
>
> Have you considered storing this data in an external data store and not in
> the Lucene index?  In other words, use the Lucene index only to index the
> content you need to search.  Then, when you search this index, just pull out
> the single stored field, the unique ID for each of the top N hits, and use
> those IDs to pull the actual content for display purposes from the external
> store.  This external store could be an RDBMS, an ODBMS, a BDB, etc.  I've
> worked with very large indices where we successfully used BDBs for this
> purpose.
>
> Otis
>
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Geert-Jan Brits <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, December 27, 2007 11:44:13 AM
> Subject: Re: big perf-difference between solr-server vs. SOlrJ req.process(solrserver)
>
> Yeah, that makes sense.
> So, all in all, could scanning all the fields and loading the 10 fields
> add up to cost about the same as, or even more than, performing the initial
> query? (Just making sure.)
>
> I am wondering if the following change to the schema would help in this
> case:
>
> current setup:
> It's possible to have up to 2000 product-variants.
> each product-variant has:
> - 1 price field (stored / indexed)
> - 1 multivalued field which contains product-variant characteristics
> (stored / not indexed).
>
> This adds up to the 4000 fields described. Moreover, there are some fields
> on the product level, but these would contribute just a tiny bit to the
> overall scanning / loading costs (about 50 stored and indexed fields in
> total).
>
> possible new setup (only the changes):
> - index but not store the price field.
> - store the price as just another one of the product-variant
> characteristics in the multivalued product-variant field.
>
> As a result, this would bring the maximum number of stored fields down
> to about 2050 from 4050, thereby about halving scanning / loading costs
> while leaving the current querying costs intact.
> Indexing costs would increase a bit.
>
> Would you expect the same performance gain?
>
> Thanks,
> Geert-Jan
>
> 2007/12/27, Yonik Seeley <[EMAIL PROTECTED]>:
> >
> > On Dec 27, 2007 11:01 AM, Britske <[EMAIL PROTECTED]> wrote:
> > > After inspecting solrconfig.xml I see that I have already enabled lazy
> > > field loading by:
> > > <enableLazyFieldLoading>true</enableLazyFieldLoading> (I guess it was
> > > enabled by default)
> > >
> > > Since any query returns about 10 fields (which differ from query to
> > > query), would this mean that only these 10 of about 2000-4000 fields are
> > > retrieved / loaded?
> >
> > Yes, but that's not the whole story.
> > Lucene stores all of the fields back-to-back with no index (there is
> > no random access to particular stored fields)... so all of the fields
> > must be at least scanned.
> >
> > -Yonik
> >
>
>
>
>
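
To illustrate Yonik's point about stored fields sitting back-to-back with no
index, here is a toy model. It is not Lucene's actual file format: the
length-prefixed layout, the field names and the helper functions are all
assumptions made for illustration. It shows that loading 10 fields from a
document still means walking past every field header, and it checks the
4050 vs. 2050 field counts from the proposed schema change.

```python
import struct

# Toy header per field: name length (2 bytes) + value length (4 bytes).
HEADER = struct.Struct(">HI")

def encode_doc(fields):
    """Serialize (name, value) pairs back-to-back, with no offset table."""
    out = bytearray()
    for name, value in fields:
        n, v = name.encode("utf-8"), value.encode("utf-8")
        out += HEADER.pack(len(n), len(v)) + n + v
    return bytes(out)

def load_fields(blob, wanted):
    """Return (loaded values, number of field headers scanned)."""
    loaded, scanned, pos = {}, 0, 0
    while pos < len(blob):
        name_len, value_len = HEADER.unpack_from(blob, pos)
        pos += HEADER.size
        name = blob[pos:pos + name_len].decode("utf-8")
        pos += name_len
        scanned += 1
        if name in wanted:
            # Only wanted values are materialized ("lazy" loading).
            loaded[name] = blob[pos:pos + value_len].decode("utf-8")
        pos += value_len  # the value is skipped over either way
    return loaded, scanned

def make_doc(variants, product_fields, store_price):
    """Build one document: product-level fields plus per-variant fields."""
    fields = [(f"product_{i}", "x") for i in range(product_fields)]
    for v in range(variants):
        if store_price:
            fields.append((f"price_{v}", "9.99"))
        fields.append((f"traits_{v}", "color=red;size=M"))
    return encode_doc(fields)

# A query that returns 10 fields located near the end of the document.
wanted = {f"traits_{v}" for v in range(1990, 2000)}

_, scanned_current = load_fields(make_doc(2000, 50, store_price=True), wanted)
loaded, scanned_new = load_fields(make_doc(2000, 50, store_price=False), wanted)

print(scanned_current, scanned_new, len(loaded))  # 4050 2050 10
```

In this model the scan count tracks the number of stored fields rather than
the number of fields returned, which is why dropping the stored price fields
roughly halves it.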
