Warning: I'm waaaay out of my competency range when I comment on Solr, but I've seen the statement that string fields are NOT tokenized while text fields are, and I notice that almost all of your fields are string type.
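
To make the string-versus-text distinction concrete, here is a toy Python sketch. It is not Solr code: both function names are made up for illustration, and the second one only imitates a whitespace-splitting analyzer such as the text_ws type used for the "all" field in the schema quoted below.

```python
def string_field_terms(value):
    """Hypothetical helper: an untokenized field indexes the whole value as one term."""
    return [value]


def text_ws_field_terms(value):
    """Hypothetical helper: a whitespace-tokenized field indexes one term per chunk."""
    return value.split()


if __name__ == "__main__":
    sample = "delivery failed: mailbox full"
    # One term for the string-style field, four for the whitespace-tokenized one.
    print(string_field_terms(sample))
    print(text_ws_field_terms(sample))
```

The practical consequence is that a query against a string-style field has to match the entire stored value, while a tokenized field can match on any individual word.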
Would someone more knowledgeable than me care to comment on whether this is at all relevant? Offered in the spirit that sometimes there are things so basic that only an amateur can see them <G>....

Best
Erick

On Wed, May 13, 2009 at 4:42 PM, vivek sar <vivex...@gmail.com> wrote:

> Thanks Otis.
>
> Our use case doesn't require any sorting or faceting. I'm wondering if
> I've configured anything wrong.
>
> I got total of 25 fields (15 are indexed and stored, other 10 are just
> stored). All my fields are basic data type - which I thought are not
> sorted. My id field is unique key.
>
> Is there any field here that might be getting sorted?
>
> <field name="id" type="long" indexed="true" stored="true" required="true" omitNorms="true" compressed="false"/>
>
> <field name="atmps" type="integer" indexed="false" stored="true" compressed="false"/>
> <field name="bcid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
> <field name="cmpcd" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
> <field name="ctry" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
> <field name="dlt" type="date" indexed="false" stored="true" default="NOW/HOUR" compressed="false"/>
> <field name="dmn" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
> <field name="eaddr" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
> <field name="emsg" type="string" indexed="false" stored="true" compressed="false"/>
> <field name="erc" type="string" indexed="false" stored="true" compressed="false"/>
> <field name="evt" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
> <field name="from" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
> <field name="lfid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
> <field name="lsid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
> <field name="prsid" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
> <field name="rc" type="string" indexed="false" stored="true" compressed="false"/>
> <field name="rmcd" type="string" indexed="false" stored="true" compressed="false"/>
> <field name="rmscd" type="string" indexed="false" stored="true" compressed="false"/>
> <field name="scd" type="string" indexed="true" stored="true" omitNorms="true" compressed="false"/>
> <field name="sip" type="string" indexed="false" stored="true" compressed="false"/>
> <field name="ts" type="date" indexed="true" stored="false" default="NOW/HOUR" omitNorms="true"/>
>
> <!-- catchall field, containing all other searchable text fields (implemented
> via copyField further on in this schema -->
> <field name="all" type="text_ws" indexed="true" stored="false" omitNorms="true" multiValued="true"/>
>
> Thanks,
> -vivek
>
> On Wed, May 13, 2009 at 1:10 PM, Otis Gospodnetic
> <otis_gospodne...@yahoo.com> wrote:
> >
> > Hi,
> > Some answers:
> > 1) .tii files in the Lucene index. When you sort, all distinct values
> > for the field(s) used for sorting. Similarly for facet fields. Solr
> > caches.
> > 2) ramBufferSizeMB dictates, more or less, how much Lucene/Solr will
> > consume during indexing. There is no need to commit every 50K docs
> > unless you want to trigger snapshot creation.
> > 3) see 1) above
> >
> > 1.5 billion docs per instance where each doc is cca 1KB? I doubt
> > that's going to fly. :)
> >
> > Otis
> > --
> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >
> > ----- Original Message ----
> >> From: vivek sar <vivex...@gmail.com>
> >> To: solr-user@lucene.apache.org
> >> Sent: Wednesday, May 13, 2009 3:04:46 PM
> >> Subject: Solr memory requirements?
> >>
> >> Hi,
> >>
> >> I'm pretty sure this has been asked before, but I couldn't find a
> >> complete answer in the forum archive. Here are my questions,
> >>
> >> 1) When solr starts up what does it loads up in the memory? Let's say
> >> I've 4 cores with each core 50G in size. When Solr comes up how much
> >> of it would be loaded in memory?
> >>
> >> 2) How much memory is required during index time? If I'm committing
> >> 50K records at a time (1 record = 1KB) using solrj, how much memory do
> >> I need to give to Solr.
> >>
> >> 3) Is there a minimum memory requirement by Solr to maintain a certain
> >> size index? Is there any benchmark on this?
> >>
> >> Here are some of my configuration from solrconfig.xml,
> >>
> >> 1) 64
> >> 2) All the caches (under query tag) are commented out
> >> 3) Few others,
> >> a) true ==>
> >> would this require memory?
> >> b) 50
> >> c) 200
> >> d)
> >> e) false
> >> f) 2
> >>
> >> The problem we are having is following,
> >>
> >> I've given Solr RAM of 6G. As the total index size (all cores
> >> combined) start growing the Solr memory consumption goes up. With 800
> >> million documents, I see Solr already taking up all the memory at
> >> startup. After that the commits, searches everything become slow. We
> >> will be having distributed setup with multiple Solr instances (around
> >> 8) on four boxes, but our requirement is to have each Solr instance at
> >> least maintain around 1.5 billion documents.
> >>
> >> We are trying to see if we can somehow reduce the Solr memory
> >> footprint. If someone can provide a pointer on what parameters affect
> >> memory and what effects it has we can then decide whether we want that
> >> parameter or not. I'm not sure if there is any minimum Solr
> >> requirement for it to be able maintain large indexes. I've used Lucene
> >> before and that didn't require anything by default - it used up memory
> >> only during index and search times - not otherwise.
> >>
> >> Any help is very much appreciated.
> >>
> >> Thanks,
> >> -vivek
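
For a sense of scale, the numbers quoted in this thread can be turned into raw data sizes. The Python sketch below is back-of-the-envelope arithmetic only. It assumes exactly 1 KB (1024 bytes) per document, as stated in the original question, and it measures the raw input data, not what Solr or Lucene would actually hold on disk or in the heap. The helper names are made up for illustration.

```python
BYTES_PER_DOC = 1024  # assumption from the thread: 1 record = 1KB


def raw_size_bytes(num_docs, bytes_per_doc=BYTES_PER_DOC):
    """Raw size of num_docs documents, ignoring all index overhead."""
    return num_docs * bytes_per_doc


def human(num_bytes):
    """Format a byte count using binary units, one decimal place."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024


if __name__ == "__main__":
    for label, docs in [
        ("one 50K-document commit batch", 50_000),
        ("800 million documents", 800_000_000),
        ("1.5 billion documents", 1_500_000_000),
    ]:
        print(f"{label}: {human(raw_size_bytes(docs))}")
```

Under these assumptions a single 50K-document batch is under 50 MiB of raw data, while 1.5 billion documents come to well over a terabyte per instance, against the 6G of RAM mentioned in the original question.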