I think that if your index already contains documents written with
norms, those fields will keep using norms even if the schema is changed
later.  Did you wipe and re-index after all your schema changes?

-Peter

On Fri, May 15, 2009 at 9:14 PM, vivek sar <vivex...@gmail.com> wrote:
> Some more info,
>
>  Profiling the heap dump shows
> "org.apache.lucene.index.ReadOnlySegmentReader" as the biggest object
> - taking up almost 80% of total memory (6G) - see the attached screen
> shot for a smaller dump. There are some norms objects - I'm not sure
> where they are coming from, as I've set omitNorms=true for all indexed
> fields.
>
> I also noticed that if I run a generic query that hits 100 million
> records and then follow up with a specific query that hits only 1
> record, the second query still causes the heap usage to increase.
>
> It looks like a few bytes are being loaded into memory for each
> document. I've checked the schema - all indexed fields have
> omitNorms=true and all caches are commented out - so I'm still looking
> for whatever else might put per-document data in memory that doesn't
> get collected by the GC.
>
> I also saw https://issues.apache.org/jira/browse/SOLR-1111 for Solr
> 1.4 (which I'm using). I'm not sure if that could cause any problem. I
> do use range queries on dates - would that have any effect?
>
> Any other ideas?
>
> Thanks,
> -vivek
>
> On Thu, May 14, 2009 at 8:38 PM, vivek sar <vivex...@gmail.com> wrote:
>> Thanks Mark.
>>
>> I checked all the items you mentioned,
>>
>> 1) I've set omitNorms=true for all my indexed fields (I assume
>> stored-only fields don't matter)
>> 2) I've tried commenting out all the caches in solrconfig.xml, but
>> that doesn't help much
>> 3) I've tried commenting out the firstSearcher and newSearcher
>> listener settings in solrconfig.xml (roughly the block shown after
>> this list) - the only thing that helps is that memory usage no longer
>> spikes at startup, simply because no auto-warming queries run. But I
>> noticed that commenting out the listeners slows down other queries to
>> Solr.
>> 4) I don't have any sort or facet in my queries
>> 5) I'm not sure how to change the Lucene term index interval from
>> Solr - is there a way to do that?
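>>
>> For reference, here is roughly the firstSearcher/newSearcher block I
>> commented out (sketched from memory - the warm-up queries below are
>> just placeholders, not my real ones):
>>
>>  <!-- warm-up queries run against each new/first searcher -->
>>  <listener event="newSearcher" class="solr.QuerySenderListener">
>>    <arr name="queries">
>>      <lst>
>>        <str name="q">*:*</str>
>>        <str name="start">0</str>
>>        <str name="rows">10</str>
>>      </lst>
>>    </arr>
>>  </listener>
>>  <listener event="firstSearcher" class="solr.QuerySenderListener">
>>    <arr name="queries">
>>      <lst>
>>        <str name="q">*:*</str>
>>        <str name="start">0</str>
>>        <str name="rows">10</str>
>>      </lst>
>>    </arr>
>>  </listener>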
>>
>> I've been playing around with this memory issue the whole day and
>> have found that it's the search that's hogging the memory. Any time
>> there is a search across all the records (800 million), the heap
>> consumption jumps by 5G. This makes me think there has to be some
>> configuration in Solr that's causing some per-document term data to
>> be loaded into memory.
>>
>> I've posted my settings several times on this forum, but no one has
>> been able to pinpoint what configuration might be causing this. If
>> someone is interested I can attach the solrconfig and schema files as
>> well. Here are the settings again, under the query tag:
>>
>> <query>
>>   <maxBooleanClauses>1024</maxBooleanClauses>
>>   <enableLazyFieldLoading>true</enableLazyFieldLoading>
>>   <queryResultWindowSize>50</queryResultWindowSize>
>>   <queryResultMaxDocsCached>200</queryResultMaxDocsCached>
>>   <HashDocSet maxSize="3000" loadFactor="0.75"/>
>>   <useColdSearcher>false</useColdSearcher>
>>   <maxWarmingSearchers>2</maxWarmingSearchers>
>> </query>
>>
>> and schema,
>>
>>  <field name="id" type="long" indexed="true" stored="true"
>> required="true" omitNorms="true" compressed="false"/>
>>
>>  <field name="atmps" type="integer" indexed="false" stored="true"
>> compressed="false"/>
>>  <field name="bcid" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>  <field name="cmpcd" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>  <field name="ctry" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>  <field name="dlt" type="date" indexed="false" stored="true"
>> default="NOW/HOUR"  compressed="false"/>
>>  <field name="dmn" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>  <field name="eaddr" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>  <field name="emsg" type="string" indexed="false" stored="true"
>> compressed="false"/>
>>  <field name="erc" type="string" indexed="false" stored="true"
>> compressed="false"/>
>>  <field name="evt" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>  <field name="from" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>  <field name="lfid" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>  <field name="lsid" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>  <field name="prsid" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>  <field name="rc" type="string" indexed="false" stored="true"
>> compressed="false"/>
>>  <field name="rmcd" type="string" indexed="false" stored="true"
>> compressed="false"/>
>>  <field name="rmscd" type="string" indexed="false" stored="true"
>> compressed="false"/>
>>  <field name="scd" type="string" indexed="true" stored="true"
>> omitNorms="true" compressed="false"/>
>>  <field name="sip" type="string" indexed="false" stored="true"
>> compressed="false"/>
>>  <field name="ts" type="date" indexed="true" stored="false"
>> default="NOW/HOUR" omitNorms="true"/>
>>
>>  <!-- catchall field, containing all other searchable text fields
>>       (implemented via copyField further on in this schema) -->
>>  <field name="all" type="text_ws" indexed="true" stored="false"
>> omitNorms="true" multiValued="true"/>
>>
>> Any help is greatly appreciated.
>>
>> Thanks,
>> -vivek
>>
>> On Thu, May 14, 2009 at 6:22 PM, Mark Miller <markrmil...@gmail.com> wrote:
>>> 800 million docs is on the high side for modern hardware.
>>>
>>> If even one field has norms on, that's one byte per document per
>>> field, so with ~800 million docs you're talking almost 800 MB right
>>> there. And if another Searcher is brought up while the old one is
>>> still serving (which happens when you update)? Doubled.
>>>
>>> Your best bet is to distribute across a couple machines.
>>>
>>> To minimize memory use, you would want to turn caching off or down,
>>> not facet, not sort, turn off all norms, and possibly get at the
>>> Lucene term index interval and raise it. Drop the on-deck searchers
>>> setting as well. Even then, 800 million...time to distribute I'd
>>> think.
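>>>
>>> Something along these lines in solrconfig.xml is the kind of tuning
>>> I mean - treat it as a sketch, and double-check whether your Solr
>>> build actually reads termIndexInterval from the index section before
>>> relying on it:
>>>
>>>  <indexDefaults>
>>>    <!-- Lucene's default term index interval is 128; a larger value
>>>         keeps fewer terms in the in-memory term index, trading some
>>>         lookup speed for heap -->
>>>    <termIndexInterval>1024</termIndexInterval>
>>>  </indexDefaults>
>>>
>>>  <query>
>>>    <!-- allow only one warming (on-deck) searcher at a time -->
>>>    <maxWarmingSearchers>1</maxWarmingSearchers>
>>>  </query>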
>>>
>>> vivek sar wrote:
>>>>
>>>> Some update on this issue,
>>>>
>>>> 1) I attached jconsole to my app and monitored the memory usage.
>>>> During indexing the memory usage goes up and down, which I think is
>>>> normal. The memory stays around the min heap size (4 G) while
>>>> indexing, but as soon as I run a search the tenured heap usage
>>>> jumps up to 6G and stays there. Subsequent searches increase the
>>>> heap usage even more until it reaches the max (8G), after which
>>>> everything (indexing and searching) becomes slow.
>>>>
>>>> The search query in this case is a very generic one that goes
>>>> through all the cores (4 of them - 800 million records), finds 400
>>>> million matches and returns 100 rows.
>>>>
>>>> Does the Solr searcher hold on to references to objects in memory?
>>>> I couldn't find any setting that suggests it does, but every search
>>>> causing the heap to go up is definitely suspicious.
>>>>
>>>> 2) I ran jmap -histo to get the top objects (this is on a smaller
>>>> instance with 2 G memory, before running a search - after running a
>>>> search I wasn't able to run jmap):
>>>>
>>>>  num     #instances         #bytes  class name
>>>> ----------------------------------------------
>>>>   1:       3890855      222608992  [C
>>>>   2:       3891673      155666920  java.lang.String
>>>>   3:       3284341      131373640  org.apache.lucene.index.TermInfo
>>>>   4:       3334198      106694336  org.apache.lucene.index.Term
>>>>   5:           271       26286496  [J
>>>>   6:            16       26273936  [Lorg.apache.lucene.index.Term;
>>>>   7:            16       26273936  [Lorg.apache.lucene.index.TermInfo;
>>>>   8:        320512       15384576
>>>> org.apache.lucene.index.FreqProxTermsWriter$PostingList
>>>>   9:         10335       11554136  [I
>>>>
>>>> I'm not sure what the first one ([C) is. I couldn't profile it to
>>>> find out what is allocating all those Strings - any ideas?
>>>>
>>>> Any ideas on what the Searcher might be holding on to, and how we
>>>> can change that behavior?
>>>>
>>>> Thanks,
>>>> -vivek
>>>>
>



-- 
Peter M. Wolanin, Ph.D.
Momentum Specialist,  Acquia. Inc.
peter.wola...@acquia.com
