Maybe I'm not following your situation 100%, but it sounds like pulling the values of the purely stored fields is the slow part. *Perhaps* using a non-Lucene data store just for those stored fields would be faster.
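To make that concrete, here is a rough SolrJ sketch of the kind of thing I mean. It is only an illustration -- the query string, the "id" field name, and the externalStore map (standing in for whatever BDB / RDBMS / etc. you would actually use) are all made up:

  import java.util.HashMap;
  import java.util.Map;

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocument;

  public class ExternalStoreSketch {
      public static void main(String[] args) throws Exception {
          CommonsHttpSolrServer server =
              new CommonsHttpSolrServer("http://localhost:8983/solr");

          // Hypothetical external store keyed by the unique id;
          // in practice this would be a BDB, an RDBMS, etc.
          Map<String, String> externalStore = new HashMap<String, String>();

          // Ask Solr only for the unique key of the top N hits --
          // no large stored fields are requested from the index.
          SolrQuery q = new SolrQuery("some query");
          q.setFields("id");
          q.setRows(10);

          QueryResponse rsp = server.query(q);
          for (SolrDocument doc : rsp.getResults()) {
              String id = (String) doc.getFieldValue("id");
              // Pull the display data (product-variant characteristics,
              // prices, ...) from the external store instead of Lucene.
              String displayData = externalStore.get(id);
              System.out.println(id + " -> " + displayData);
          }
      }
  }

The idea is that the display data never lives in the Lucene index at all, so there are no large stored fields to scan at query time.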
Otis

--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Geert-Jan Brits <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Monday, December 31, 2007 8:49:43 AM
Subject: Re: big perf-difference between solr-server vs. SOlrJ req.process(solrserver)

Hi Otis,

I don't really see how this would minimize my number of fields. At the
moment I have 1 price field (stored / indexed) and 1 multivalued field
(stored) per product-variant. I have about 2000 product variants. I could
indeed replace each multivalued field by a single-valued field with an id
pointing to an external store, where I get the needed fields. However,
this would not change the number of fields in my index (correct?) and
thus wouldn't matter for the big scanning time I'm seeing. Moreover, it
wouldn't matter for the query time either, I guess.

Thanks,
Geert-Jan

2007/12/29, Otis Gospodnetic <[EMAIL PROTECTED]>:
>
> Hi Geert-Jan,
>
> Have you considered storing this data in an external data store and not
> the Lucene index? In other words, use the Lucene index only to index the
> content you need to search. Then, when you search this index, just pull
> out a single stored field, the unique ID, for each of the top N hits, and
> use those IDs to pull the actual content for display purposes from the
> external store. This external store could be an RDBMS, an ODBMS, a BDB,
> etc. I've worked with very large indices where we successfully used BDBs
> for this purpose.
>
> Otis
>
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> From: Geert-Jan Brits <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Thursday, December 27, 2007 11:44:13 AM
> Subject: Re: big perf-difference between solr-server vs. SOlrJ req.process(solrserver)
>
> yeah, that makes sense.
> So, all in all, could scanning all the fields and loading the 10 fields
> add up to cost about the same as, or even more than, performing the
> initial query? (Just making sure.)
>
> I am wondering if the following change to the schema would help in this
> case:
>
> Current setup:
> It's possible to have up to 2000 product-variants.
> Each product-variant has:
> - 1 price field (stored / indexed)
> - 1 multivalued field which contains product-variant characteristics
>   (stored / not indexed).
>
> This adds up to the 4000 fields described. Moreover, there are some
> fields on the product level, but these would contribute just a tiny bit
> to the overall scanning / loading costs (about 50 -stored and indexed-
> fields in total).
>
> Possible new setup (only the changes):
> - index but do not store the price field.
> - store the price as just another one of the product-variant
>   characteristics in the multivalued product-variant field.
>
> As a result this would bring the maximum number of stored fields back to
> about 2050 from 4050, thereby roughly halving scanning / loading costs
> while leaving the current querying costs intact.
> Indexing costs would increase a bit.
>
> Would you expect the same performance gain?
>
> Thanks,
> Geert-Jan
>
> 2007/12/27, Yonik Seeley <[EMAIL PROTECTED]>:
> >
> > On Dec 27, 2007 11:01 AM, Britske <[EMAIL PROTECTED]> wrote:
> > > After inspecting solrconfig.xml I see that I already have lazy field
> > > loading enabled by:
> > > <enableLazyFieldLoading>true</enableLazyFieldLoading> (I guess it was
> > > enabled by default.)
> > >
> > > Since any query returns about 10 fields (which differ from query to
> > > query), would this mean that only these 10 of about 2000-4000 fields
> > > are retrieved / loaded?
> >
> > Yes, but that's not the whole story.
> > Lucene stores all of the fields back-to-back with no index (there is
> > no random access to particular stored fields)... so all of the fields
> > must be at least scanned.
> >
> > -Yonik
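P.S. To spell out Yonik's point above with a small (made-up) SolrJ example: requesting only the ~10 fields you need via fl, together with enableLazyFieldLoading, avoids loading the other stored fields, but since there is no random access to individual stored fields, their bytes still have to be read past for every returned document. Field names below are placeholders:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class FieldListSketch {
      public static void main(String[] args) throws Exception {
          CommonsHttpSolrServer server =
              new CommonsHttpSolrServer("http://localhost:8983/solr");

          SolrQuery q = new SolrQuery("some query");
          // Request only ~10 of the stored fields (placeholder names).
          // With enableLazyFieldLoading=true the remaining fields are not
          // loaded, but their stored bytes are still scanned past for each
          // document that is returned.
          q.setFields("id,price_variant1,characteristics_variant1");
          q.setRows(10);

          QueryResponse rsp = server.query(q);
          System.out.println("hits: " + rsp.getResults().getNumFound());
      }
  }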