Yes, it is a cache: it stores an array of Document IDs ordered by the sort field, together with the sorted field values; query results can be intersected with it and reordered accordingly.

But memory requirements should be well documented.

Internally it uses a WeakHashMap, which is not good(!!!): entries can be dropped by the garbage collector at any time, leading to a lot of "underground" cache warm-ups which SOLR is not aware of... That could be what is happening here.
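For illustration, a minimal sketch (hypothetical names; the real logic lives in org.apache.lucene.search.FieldCacheImpl, excerpted below) of why a weakly keyed cache warms up "underground": once GC drops an entry, the next sorted query silently rebuilds it, invisible to SOLR's own cache statistics.

import java.util.Map;
import java.util.WeakHashMap;

// Sketch of a WeakHashMap-based per-reader cache (hypothetical names).
class PerReaderCache {
  // Keys are held weakly: once a reader is no longer strongly referenced,
  // GC may drop its entry -- SOLR never sees this happen.
  private final Map<Object, Object> readerCache = new WeakHashMap<Object, Object>();

  synchronized Object get(Object reader, String field) {
    Object value = readerCache.get(reader);
    if (value == null) {
      value = expensiveWarmUp(reader, field); // "underground" warm-up
      readerCache.put(reader, value);
    }
    return value;
  }

  private Object expensiveWarmUp(Object reader, String field) {
    // In the real FieldCacheImpl this walks every term of the field
    // and allocates arrays sized to reader.maxDoc().
    return new Object();
  }
}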

I think Lucene-SOLR developers should join this discussion:


/**
 * Expert: The default cache implementation, storing all values in memory.
 * A WeakHashMap is used for storage.
 *
..............

  // inherit javadocs
  public StringIndex getStringIndex(IndexReader reader, String field)
      throws IOException {
    return (StringIndex) stringsIndexCache.get(reader, field);
  }

  Cache stringsIndexCache = new Cache() {

    protected Object createValue(IndexReader reader, Object fieldKey)
        throws IOException {
      String field = ((String) fieldKey).intern();
      final int[] retArray = new int[reader.maxDoc()];  // one slot per document
      String[] mterms = new String[reader.maxDoc()+1];  // sized to maxDoc+1, not to the unique-term count
      TermDocs termDocs = reader.termDocs();
      TermEnum termEnum = reader.terms (new Term (field, ""));
....................
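A back-of-the-envelope estimate of what the excerpt above allocates per sort field, per reader (a sketch under the thread's assumptions: 4-byte ordinals and references, every term unique; per-object String overhead ignored):

// Rough StringIndex footprint per sort field, per IndexReader. A sketch only.
class StringIndexEstimate {
  static long bytes(long maxDoc, long uniqueTerms, long avgTermBytes) {
    long ords = 4 * maxDoc;                   // int[maxDoc] of term ordinals
    long refs = 4 * (maxDoc + 1);             // String[maxDoc+1] references
    long terms = uniqueTerms * avgTermBytes;  // the term text itself
    return ords + refs + terms;
  }

  public static void main(String[] args) {
    // ~528 MB for 2M docs with unique 256-byte titles -- consistent with
    // the 512,000,000-byte figure quoted below.
    System.out.println(bytes(2000000, 2000000, 256));
  }
}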





Quoting Fuad Efendi <[EMAIL PROTECTED]>:

I am hoping [new StringIndex (retArray, mterms)] is called only once
per sort field and cached somewhere in Lucene;

theoretically you need to multiply the number of documents by the size of
the field (supposing the field contains unique text); you need not tokenize
this field, and you need not store a TermVector.

for 2,000,000 documents with a simple untokenized text field such as a
book title (256 bytes), you probably need 512,000,000 bytes per Searcher,
and as Mark mentioned you should limit the number of searchers in SOLR.

So -Xmx512M is definitely not enough even for simple cases...


Quoting sundar shankar <[EMAIL PROTECTED]>:

I haven't seen the source code before, but I don't know why the sorting isn't done after the fetch. Wouldn't that be faster, at least in the case of field-level sorting? I could be wrong, and the implementation is probably better than I imagine, but I don't know why all of the field values have to be loaded.
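What Sundar describes, sorting only the fetched page, would look roughly like the sketch below (hypothetical Doc class). The catch, explained by Mark further down the thread: to know which 10 documents to return, Lucene must rank all matching documents, so the comparator needs the sort field's value for every hit, not just the fetched page.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sorting after the fetch only works if the result set is already small.
// Lucene cannot do this in general: picking the top 10 of N matches by a
// field needs that field's value for all N matches, which is why the
// whole-index FieldCache exists.
class PostFetchSort {
  static class Doc {
    final int id;
    final String title;
    Doc(int id, String title) { this.id = id; this.title = title; }
  }

  static List<Doc> sortPage(List<Doc> fetchedPage) {
    List<Doc> sorted = new ArrayList<Doc>(fetchedPage);
    sorted.sort(Comparator.comparing(d -> d.title)); // fine for 10 docs
    return sorted;
  }

  public static void main(String[] args) {
    List<Doc> page = new ArrayList<Doc>();
    page.add(new Doc(2, "beta"));
    page.add(new Doc(1, "alpha"));
    for (Doc d : sortPage(page)) System.out.println(d.id + " " + d.title);
  }
}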





Date: Tue, 22 Jul 2008 14:26:26 -0700
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: Re: Out of memory on Solr sorting

Ok, after some analysis of FieldCacheImpl:

- it is supposed that the (sorted) Enumeration of "terms" is smaller than
  the total number of documents
  (so SOLR uses a specific field type for sorted searches:
  solr.StrField with omitNorms="true")

  It creates an int[reader.maxDoc()] array, checks the (sorted) Enumeration
  of "terms" (untokenized solr.StrField), and populates the array with
  document IDs.

- it also creates an array of String:
      String[] mterms = new String[reader.maxDoc()+1];

  Why do we need that? For 1G documents with an average term/StrField size
  of 100 bytes (which could be unique text!!!) it will create a huge 100Gb
  cache which is not really needed...
      StringIndex value = new StringIndex (retArray, mterms);

If I understand correctly... StringIndex _must_ be a file in a filesystem
for such a case... We create the StringIndex and retrieve the top 10
documents; huge overhead.
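For what it's worth, the String[] is what makes per-document comparisons cheap: each document gets an ordinal into the sorted term array, so the sort comparator reduces to an int comparison. A simplified sketch of the StringIndex idea (not the actual Lucene class):

// order[docId] holds an ordinal into lookup[], which holds the terms in
// sorted order. Comparing two docs is then one int comparison.
class StringIndexSketch {
  final int[] order;     // order[docId] = term ordinal (int[maxDoc])
  final String[] lookup; // lookup[ordinal] = term value

  StringIndexSketch(int[] order, String[] lookup) {
    this.order = order;
    this.lookup = lookup;
  }

  // compare docs a and b by the sort field without touching the strings
  int compare(int a, int b) {
    return order[a] - order[b]; // ordinals preserve term order
  }

  public static void main(String[] args) {
    // three docs whose sort field values are "b", "a", "c"
    StringIndexSketch idx = new StringIndexSketch(
        new int[] {1, 0, 2}, new String[] {"a", "b", "c"});
    System.out.println(idx.compare(0, 1) > 0); // true: doc 0 ("b") > doc 1 ("a")
  }
}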
Quoting Fuad Efendi <[EMAIL PROTECTED]>:

> Ok, what is confusing me is the implicit guess that FieldCache contains
> the "field" and Lucene uses an in-memory sort instead of using the
> file-system "index"...
>
> Array size: 100Mb (25M x 4 bytes), and it is just pointers (4-byte
> integers) to documents in the index.
>
> org.apache.lucene.search.FieldCacheImpl$10.createValue
> ...
> 357:  protected Object createValue(IndexReader reader, Object fieldKey)
> 358:      throws IOException {
> 359:    String field = ((String) fieldKey).intern();
> 360:    final int[] retArray = new int[reader.maxDoc()]; // OutOfMemoryError!!!
> ...
> 408:    StringIndex value = new StringIndex (retArray, mterms);
> 409:    return value;
> 410:  }
> ...
>
> It's very confusing, I don't know such internals...
>
>>>> <field name="XXX" type="string" indexed="true" stored="true"
>>>>        termVectors="true"/>
>>>> The sorting is done based on the string field.
>
> I think Sundar should not use [termVectors="true"]...

Quoting Mark Miller <[EMAIL PROTECTED]>:

> Hmmm... I think it's 32 bits an integer with an index entry for each doc, so
>
> 25 000 000 x 32 bits = 95.3674316 megabytes
>
> Then you have the string array that contains each unique term from your
> index... you can guess that based on the number of terms in your index
> and an average length guess.
>
> There is some other overhead beyond the sort cache as well, but that's
> the bulk of what it will add. I think my memory may be bad with my
> original estimate :)

Fuad Efendi wrote:

> Thank you very much Mark, it explains a lot.
>
> I am guessing: for 1,000,000 documents with a [string] field of average
> size 1024 bytes I need 1Gb for a single IndexSearcher instance; the
> field-level cache is used internally by Lucene (can Lucene manage its
> size?); we can't have 1G of such documents without having 1Tb RAM...

Quoting Mark Miller <[EMAIL PROTECTED]>:

> Fuad Efendi wrote:
>> SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
>> size: 100767936, Num elements: 25191979
>> I just noticed, this is the exact number of documents in the index:
>> 25191979
>> (http://www.tokenizer.org/ -- you can sort by clicking the headers Id,
>> Country, Site, Price in the table; experimental)
>>
>> If the array is allocated ONLY on new-searcher warm-up I am _extremely_
>> happy... I had constant OOMs during the past month (SUN Java 5).
>
> It is only on warmup - I believe it's lazy loaded, so the first time a
> search is done (solr does the search as part of warmup I believe) the
> fieldcache is loaded. The underlying IndexReader is the key to the
> fieldcache, so until you get a new IndexReader (SolrSearcher in solr
> world?) the field cache will be good. Keep in mind that as a searcher is
> warming, the other searcher is still serving, so that will up the RAM
> requirements... and since I think you can have >1 searchers on deck...
> you get the idea.
>
> As far as the number I gave, that's from a memory made months and months
> ago, so go with what you see.
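Mark's point about overlapping searchers translates into a simple peak-memory rule of thumb; a sketch, assuming one field cache copy per live searcher and that maxWarmingSearchers (a solrconfig.xml setting) bounds the warming ones:

// Peak FieldCache memory is roughly one copy per live searcher: the one
// serving plus any that are warming. A sketch, not an exact model.
class PeakCacheEstimate {
  static long peakBytes(long perSearcherCacheBytes, int maxWarmingSearchers) {
    int liveSearchers = 1 + maxWarmingSearchers; // serving + warming
    return perSearcherCacheBytes * liveSearchers;
  }

  public static void main(String[] args) {
    // 512 MB per searcher with one warming searcher -> ~1 GB peak.
    System.out.println(peakBytes(512L * 1024 * 1024, 1));
  }
}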
Quoting Fuad Efendi <[EMAIL PROTECTED]>:

> I've even seen exceptions (posted here) when "sort"-type queries caused
> Lucene to allocate 100Mb arrays; here is what happened to me:
>
> SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
> size: 100767936, Num elements: 25191979
>   at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:360)
>   at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>
> It does not happen after I increased from 4096M to 8192M (JRockit R27;
> a more intelligent stack trace, isn't it?)
>
> Thanks Mark; I didn't know that it happens only once (on warming up a
> searcher).

Quoting Mark Miller <[EMAIL PROTECTED]>:

> Because to sort efficiently, Solr loads the term to sort on for each doc
> in the index into an array. For ints, longs, etc it's just an array the
> size of the number of docs in your index (deleted or not, I believe).
> For a String it's an array to hold each unique string and an array of
> ints indexing into the String array.
>
> So if you do a sort, and search for something that only gets 1 doc as a
> hit... you're still loading up that field cache for every single doc in
> your index on the first search. With solr, this happens in the background
> as it warms up the searcher. The end story is, you most likely need more
> RAM to accommodate the sort... have you upped your -Xmx setting? I think
> you can roughly say a 2 million doc index would need 40-50 MB (rough, but
> to give an idea) per field you're sorting on.
>
> - Mark

sundar shankar wrote:

> Thanks Fuad. But why does just sorting produce an OOM? I executed the
> query without the sort clause and it executed perfectly. In fact I even
> removed maxrows=10 and executed; it came out fine. Queries with bigger
> results seem to come out fine too. So why does just sorting, and that
> too of just 10 rows, fail?
>
> -Sundar

Date: Tue, 22 Jul 2008 12:24:35 -0700
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: RE: Out of memory on Solr sorting

> org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)

- this piece of code does not request an Array[100M] (as I have seen with
Lucene); it asks for only a few bytes / Kb for a field...

Probably 128 - 512 is not enough; it is also advisable to use equal sizes:
-Xms1024M -Xmx1024M
(it minimizes GC frequency, and it ensures that 1024M is available at
startup)

OOM also happens with fragmented memory, when the application requests a
big contiguous fragment and GC is unable to optimize; it looks like your
application requests a little and the memory is not available...

Quoting sundar shankar <[EMAIL PROTECTED]>:

> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Out of memory on Solr sorting
> Date: Tue, 22 Jul 2008 19:11:02 +0000
>
> Hi,
> Sorry again, fellows. I am not sure what's happening. The day with solr
> is a bad one for me, I guess. EZMLM didn't let me send any mails this
> morning; it asked me to confirm my subscription, and when I did, it said
> I was already a member. Now my mails are all coming out bad. Sorry for
> troubling y'all this badly.
> I hope this mail comes out right.
>
> Hi,
> We are developing a product in an agile manner, and the current
> implementation has data of size just about 800 megs in dev. The memory
> allocated to solr on dev (a dual-core Linux box) is 128-512.
>
> My config
> =========
>
> <!-- autocommit pending docs if certain criteria are met
> <autoCommit>
>   <maxDocs>10000</maxDocs>
>   <maxTime>1000</maxTime>
> </autoCommit>
> -->
>
> <filterCache
>   class="solr.LRUCache"
>   size="512"
>   initialSize="512"
>   autowarmCount="256"/>
>
> <queryResultCache
>   class="solr.LRUCache"
>   size="512"
>   initialSize="512"
>   autowarmCount="256"/>
>
> <documentCache
>   class="solr.LRUCache"
>   size="512"
>   initialSize="512"
>   autowarmCount="0"/>
>
> <enableLazyFieldLoading>true</enableLazyFieldLoading>
>
> My Field
> ========
>
> <fieldType name="autocomplete" class="solr.TextField">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="([^a-z0-9])" replacement="" replace="all" />
>     <filter class="solr.EdgeNGramFilterFactory"
>             maxGramSize="100" minGramSize="1" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="([^a-z0-9])" replacement="" replace="all" />
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="^(.{20})(.*)?" replacement="$1" replace="all" />
>   </analyzer>
> </fieldType>
>
> Problem
> =======
>
> I execute a query that returns 24 rows of results, and I pick 10 out of
> it. I have no problem when I execute this. But when I sort it by a string
> field fetched in this result, I get an OOM. I am able to execute several
> other queries with no problem; just having a "sort asc" clause added to
> the query throws an OOM. Why is that? What should I ideally have done?
> My config on QA is pretty similar to the dev box and probably has more
> data than dev. It didn't throw any OOM during the integration tests. The
> autocomplete is a new field we added recently.
>
> Another point is that the indexing is done with a field of type string:
>
> <field name="XXX" type="string" indexed="true" stored="true"
>        termVectors="true"/>
>
> and the autocomplete field is a copy field. The sorting is done based on
> the string field.
>
> Please do let me know what mistake I am making.
>
> Regards
> Sundar
>
> P.S: The stack trace of the exception is:
>
> Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
>   at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:86)
>   at org.apache.solr.client.solrj.impl.BaseSolrServer.query(BaseSolrServer.java:101)
>   at com.apollo.sisaw.solr.service.AbstractSolrSearchService.makeSolrQuery(AbstractSolrSearchService.java:193)
>   ... 105 more
> Caused by: org.apache.solr.common.SolrException: Java heap space
> java.lang.OutOfMemoryError: Java heap space
>   at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)
>   at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>   at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352)
>   at org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:416)
>   at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:207)
>   at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>   at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
>   at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
>   at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:907)
>   at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
>   at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
>   at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
>   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:156)
>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1025)
>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
>   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
>   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
>   at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
>   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
>   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
>   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
>   at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
>   at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)
>   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
>   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
>   at org.jboss.web.tomcat.tc5.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:156)
>   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
>   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
>   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
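The trace surfaces through SolrJ; the failing request is equivalent to something like the sketch below (SolrJ of roughly that era; exact client class names varied between releases, and "XXX" is the string sort field from the schema above). Note it is the sort, not the row count, that forces the FieldCache load:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Even with rows=10, the sort forces the FieldCache to load the field for
// every document on the first sorted search against a new searcher.
public class SortedQueryExample {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery("*:*");
    query.addSortField("XXX", SolrQuery.ORDER.asc); // triggers the cache load
    query.setRows(10);
    QueryResponse rsp = server.query(query); // the OOM happened inside here
    System.out.println(rsp.getResults().getNumFound());
  }
}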



