Yes, it is a cache: it stores an array of Document IDs ordered by the sort field, together with the sorted field values; query results can be intersected with it and reordered accordingly.

But memory requirements should be well documented.

Internally it uses a WeakHashMap, which is not good(!!!): entries can be dropped by the garbage collector at any time, leading to a lot of "underground" cache warm-ups which SOLR is not aware of... That could be what is happening here.
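For illustration, a minimal sketch (hypothetical names; the real logic lives in org.apache.lucene.search.FieldCacheImpl, excerpted below) of why a weakly keyed cache warms up "underground": once GC drops an entry, the next sorted query silently rebuilds it, invisible to SOLR's own cache statistics.

import java.util.Map;
import java.util.WeakHashMap;

// Sketch of a WeakHashMap-based per-reader cache (hypothetical names).
class PerReaderCache {
  // Keys are held weakly: once a reader is no longer strongly referenced,
  // GC may drop its entry -- SOLR never sees this happen.
  private final Map<Object, Object> readerCache = new WeakHashMap<Object, Object>();

  synchronized Object get(Object reader, String field) {
    Object value = readerCache.get(reader);
    if (value == null) {
      value = expensiveWarmUp(reader, field); // "underground" warm-up
      readerCache.put(reader, value);
    }
    return value;
  }

  private Object expensiveWarmUp(Object reader, String field) {
    // In the real FieldCacheImpl this walks every term of the field
    // and allocates arrays sized to reader.maxDoc().
    return new Object();
  }
}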

I think Lucene-SOLR developers should join this discussion:


/**
 * Expert: The default cache implementation, storing all values in memory.
 * A WeakHashMap is used for storage.
 *
..............

  // inherit javadocs
  public StringIndex getStringIndex(IndexReader reader, String field)
      throws IOException {
    return (StringIndex) stringsIndexCache.get(reader, field);
  }

  Cache stringsIndexCache = new Cache() {

    protected Object createValue(IndexReader reader, Object fieldKey)
        throws IOException {
      String field = ((String) fieldKey).intern();
      final int[] retArray = new int[reader.maxDoc()];  // one slot per document
      String[] mterms = new String[reader.maxDoc()+1];  // sized to maxDoc+1, not to the unique-term count
      TermDocs termDocs = reader.termDocs();
      TermEnum termEnum = reader.terms (new Term (field, ""));
....................
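A back-of-the-envelope estimate of what the excerpt above allocates per sort field, per reader (a sketch under the thread's assumptions: 4-byte ordinals and references, every term unique; per-object String overhead ignored):

// Rough StringIndex footprint per sort field, per IndexReader. A sketch only.
class StringIndexEstimate {
  static long bytes(long maxDoc, long uniqueTerms, long avgTermBytes) {
    long ords = 4 * maxDoc;                   // int[maxDoc] of term ordinals
    long refs = 4 * (maxDoc + 1);             // String[maxDoc+1] references
    long terms = uniqueTerms * avgTermBytes;  // the term text itself
    return ords + refs + terms;
  }

  public static void main(String[] args) {
    // ~528 MB for 2M docs with unique 256-byte titles -- consistent with
    // the 512,000,000-byte figure quoted below.
    System.out.println(bytes(2000000, 2000000, 256));
  }
}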





Quoting Fuad Efendi <[EMAIL PROTECTED]>:

I am hoping [new StringIndex (retArray, mterms)] is called only once
per sort field and cached somewhere in Lucene;

theoretically you need to multiply the number of documents by the size of
the field (supposing the field contains unique text); you need not tokenize
this field, and you need not store a TermVector.

for 2,000,000 documents with a simple untokenized text field such as a
book title (256 bytes), you probably need 512,000,000 bytes per Searcher,
and as Mark mentioned you should limit the number of searchers in SOLR.

So -Xmx512M is definitely not enough even for simple cases...


Quoting sundar shankar <[EMAIL PROTECTED]>:

I haven't seen the source code before, but I don't know why the sorting isn't done after the fetch. Wouldn't that be faster, at least in the case of field-level sorting? I could be wrong, and the implementation is probably better than I imagine, but I don't know why all of the field values have to be loaded.
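What Sundar describes, sorting only the fetched page, would look roughly like the sketch below (hypothetical Doc class). The catch, explained by Mark further down the thread: to know which 10 documents to return, Lucene must rank all matching documents, so the comparator needs the sort field's value for every hit, not just the fetched page.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sorting after the fetch only works if the result set is already small.
// Lucene cannot do this in general: picking the top 10 of N matches by a
// field needs that field's value for all N matches, which is why the
// whole-index FieldCache exists.
class PostFetchSort {
  static class Doc {
    final int id;
    final String title;
    Doc(int id, String title) { this.id = id; this.title = title; }
  }

  static List<Doc> sortPage(List<Doc> fetchedPage) {
    List<Doc> sorted = new ArrayList<Doc>(fetchedPage);
    sorted.sort(Comparator.comparing(d -> d.title)); // fine for 10 docs
    return sorted;
  }

  public static void main(String[] args) {
    List<Doc> page = new ArrayList<Doc>();
    page.add(new Doc(2, "beta"));
    page.add(new Doc(1, "alpha"));
    for (Doc d : sortPage(page)) System.out.println(d.id + " " + d.title);
  }
}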





Date: Tue, 22 Jul 2008 14:26:26 -0700
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: Re: Out of memory on Solr sorting

Ok, after some analysis of FieldCacheImpl:

- it is supposed that the (sorted) Enumeration of "terms" is smaller than
  the total number of documents
  (so SOLR uses a specific field type for sorted searches:
  solr.StrField with omitNorms="true")

  It creates an int[reader.maxDoc()] array, checks the (sorted) Enumeration
  of "terms" (untokenized solr.StrField), and populates the array with
  document IDs.

- it also creates an array of String:
      String[] mterms = new String[reader.maxDoc()+1];

  Why do we need that? For 1G documents with an average term/StrField size
  of 100 bytes (which could be unique text!!!) it will create a huge 100Gb
  cache which is not really needed...
      StringIndex value = new StringIndex (retArray, mterms);

If I understand correctly... StringIndex _must_ be a file in a filesystem
for such a case... We create the StringIndex and retrieve the top 10
documents; huge overhead.
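For what it's worth, the String[] is what makes per-document comparisons cheap: each document gets an ordinal into the sorted term array, so the sort comparator reduces to an int comparison. A simplified sketch of the StringIndex idea (not the actual Lucene class):

// order[docId] holds an ordinal into lookup[], which holds the terms in
// sorted order. Comparing two docs is then one int comparison.
class StringIndexSketch {
  final int[] order;     // order[docId] = term ordinal (int[maxDoc])
  final String[] lookup; // lookup[ordinal] = term value

  StringIndexSketch(int[] order, String[] lookup) {
    this.order = order;
    this.lookup = lookup;
  }

  // compare docs a and b by the sort field without touching the strings
  int compare(int a, int b) {
    return order[a] - order[b]; // ordinals preserve term order
  }

  public static void main(String[] args) {
    // three docs whose sort field values are "b", "a", "c"
    StringIndexSketch idx = new StringIndexSketch(
        new int[] {1, 0, 2}, new String[] {"a", "b", "c"});
    System.out.println(idx.compare(0, 1) > 0); // true: doc 0 ("b") > doc 1 ("a")
  }
}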
Quoting Fuad Efendi <[EMAIL PROTECTED]>:

> Ok, what is confusing me is the implicit guess that FieldCache contains
> the "field" and Lucene uses an in-memory sort instead of using the
> file-system "index"...
>
> Array size: 100Mb (25M x 4 bytes), and it is just pointers (4-byte
> integers) to documents in the index.
>
> org.apache.lucene.search.FieldCacheImpl$10.createValue
> ...
> 357:  protected Object createValue(IndexReader reader, Object fieldKey)
> 358:      throws IOException {
> 359:    String field = ((String) fieldKey).intern();
> 360:    final int[] retArray = new int[reader.maxDoc()]; // OutOfMemoryError!!!
> ...
> 408:    StringIndex value = new StringIndex (retArray, mterms);
> 409:    return value;
> 410:  }
> ...
>
> It's very confusing, I don't know such internals...
>
>>>> <field name="XXX" type="string" indexed="true" stored="true"
>>>>        termVectors="true"/>
>>>> The sorting is done based on the string field.
>
> I think Sundar should not use [termVectors="true"]...

Quoting Mark Miller <[EMAIL PROTECTED]>:

> Hmmm... I think it's 32 bits an integer with an index entry for each doc, so
>
> 25 000 000 x 32 bits = 95.3674316 megabytes
>
> Then you have the string array that contains each unique term from your
> index... you can guess that based on the number of terms in your index
> and an average length guess.
>
> There is some other overhead beyond the sort cache as well, but that's
> the bulk of what it will add. I think my memory may be bad with my
> original estimate :)

Fuad Efendi wrote:

> Thank you very much Mark, it explains a lot.
>
> I am guessing: for 1,000,000 documents with a [string] field of average
> size 1024 bytes I need 1Gb for a single IndexSearcher instance; the
> field-level cache is used internally by Lucene (can Lucene manage its
> size?); we can't have 1G of such documents without having 1Tb RAM...

Quoting Mark Miller <[EMAIL PROTECTED]>:

> Fuad Efendi wrote:
>> SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
>> size: 100767936, Num elements: 25191979
>> I just noticed, this is the exact number of documents in the index:
>> 25191979
>> (http://www.tokenizer.org/ -- you can sort by clicking the headers Id,
>> Country, Site, Price in the table; experimental)
>>
>> If the array is allocated ONLY on new-searcher warm-up I am _extremely_
>> happy... I had constant OOMs during the past month (SUN Java 5).
>
> It is only on warmup - I believe it's lazy loaded, so the first time a
> search is done (solr does the search as part of warmup I believe) the
> fieldcache is loaded. The underlying IndexReader is the key to the
> fieldcache, so until you get a new IndexReader (SolrSearcher in solr
> world?) the field cache will be good. Keep in mind that as a searcher is
> warming, the other searcher is still serving, so that will up the RAM
> requirements... and since I think you can have >1 searchers on deck...
> you get the idea.
>
> As far as the number I gave, that's from a memory made months and months
> ago, so go with what you see.
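Mark's point about overlapping searchers translates into a simple peak-memory rule of thumb; a sketch, assuming one field cache copy per live searcher and that maxWarmingSearchers (a solrconfig.xml setting) bounds the warming ones:

// Peak FieldCache memory is roughly one copy per live searcher: the one
// serving plus any that are warming. A sketch, not an exact model.
class PeakCacheEstimate {
  static long peakBytes(long perSearcherCacheBytes, int maxWarmingSearchers) {
    int liveSearchers = 1 + maxWarmingSearchers; // serving + warming
    return perSearcherCacheBytes * liveSearchers;
  }

  public static void main(String[] args) {
    // 512 MB per searcher with one warming searcher -> ~1 GB peak.
    System.out.println(peakBytes(512L * 1024 * 1024, 1));
  }
}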
Quoting Fuad Efendi <[EMAIL PROTECTED]>:

> I've even seen exceptions (posted here) when "sort"-type queries caused
> Lucene to allocate 100Mb arrays; here is what happened to me:
>
> SEVERE: java.lang.OutOfMemoryError: allocLargeObjectOrArray - Object
> size: 100767936, Num elements: 25191979
>   at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:360)
>   at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>
> It does not happen after I increased from 4096M to 8192M (JRockit R27;
> a more intelligent stack trace, isn't it?)
>
> Thanks Mark; I didn't know that it happens only once (on warming up a
> searcher).

Quoting Mark Miller <[EMAIL PROTECTED]>:

> Because to sort efficiently, Solr loads the term to sort on for each doc
> in the index into an array. For ints, longs, etc it's just an array the
> size of the number of docs in your index (deleted or not, I believe).
> For a String it's an array to hold each unique string and an array of
> ints indexing into the String array.
>
> So if you do a sort, and search for something that only gets 1 doc as a
> hit... you're still loading up that field cache for every single doc in
> your index on the first search. With solr, this happens in the background
> as it warms up the searcher. The end story is, you most likely need more
> RAM to accommodate the sort... have you upped your -Xmx setting? I think
> you can roughly say a 2 million doc index would need 40-50 MB (rough, but
> to give an idea) per field you're sorting on.
>
> - Mark

sundar shankar wrote:

> Thanks Fuad. But why does just sorting produce an OOM? I executed the
> query without the sort clause and it executed perfectly. In fact I even
> removed maxrows=10 and executed; it came out fine. Queries with bigger
> results seem to come out fine too. So why does just sorting, and that
> too of just 10 rows, fail?
>
> -Sundar

Date: Tue, 22 Jul 2008 12:24:35 -0700
From: [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Subject: RE: Out of memory on Solr sorting

> org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)

- this piece of code does not request an Array[100M] (as I have seen with
Lucene); it asks for only a few bytes / Kb for a field...

Probably 128 - 512 is not enough; it is also advisable to use equal sizes:
-Xms1024M -Xmx1024M
(it minimizes GC frequency, and it ensures that 1024M is available at
startup)

OOM also happens with fragmented memory, when the application requests a
big contiguous fragment and GC is unable to optimize; it looks like your
application requests a little and the memory is not available...

Quoting sundar shankar <[EMAIL PROTECTED]>:

> From: [EMAIL PROTECTED]
> To: solr-user@lucene.apache.org
> Subject: Out of memory on Solr sorting
> Date: Tue, 22 Jul 2008 19:11:02 +0000
>
> Hi,
> Sorry again, fellows. I am not sure what's happening. The day with solr
> is a bad one for me, I guess. EZMLM didn't let me send any mails this
> morning; it asked me to confirm my subscription, and when I did, it said
> I was already a member. Now my mails are all coming out bad. Sorry for
> troubling y'all this badly.
> I hope this mail comes out right.
>
> Hi,
> We are developing a product in an agile manner, and the current
> implementation has data of size just about 800 megs in dev. The memory
> allocated to solr on dev (a dual-core Linux box) is 128-512.
>
> My config
> =========
>
> <!-- autocommit pending docs if certain criteria are met
> <autoCommit>
>   <maxDocs>10000</maxDocs>
>   <maxTime>1000</maxTime>
> </autoCommit>
> -->
>
> <filterCache
>   class="solr.LRUCache"
>   size="512"
>   initialSize="512"
>   autowarmCount="256"/>
>
> <queryResultCache
>   class="solr.LRUCache"
>   size="512"
>   initialSize="512"
>   autowarmCount="256"/>
>
> <documentCache
>   class="solr.LRUCache"
>   size="512"
>   initialSize="512"
>   autowarmCount="0"/>
>
> <enableLazyFieldLoading>true</enableLazyFieldLoading>
>
> My Field
> ========
>
> <fieldType name="autocomplete" class="solr.TextField">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="([^a-z0-9])" replacement="" replace="all" />
>     <filter class="solr.EdgeNGramFilterFactory"
>             maxGramSize="100" minGramSize="1" />
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory" />
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="([^a-z0-9])" replacement="" replace="all" />
>     <filter class="solr.PatternReplaceFilterFactory"
>             pattern="^(.{20})(.*)?" replacement="$1" replace="all" />
>   </analyzer>
> </fieldType>
>
> Problem
> =======
>
> I execute a query that returns 24 rows of results, and I pick 10 out of
> it. I have no problem when I execute this. But when I sort it by a string
> field fetched in this result, I get an OOM. I am able to execute several
> other queries with no problem; just having a "sort asc" clause added to
> the query throws an OOM. Why is that? What should I ideally have done?
> My config on QA is pretty similar to the dev box and probably has more
> data than dev. It didn't throw any OOM during the integration tests. The
> autocomplete is a new field we added recently.
>
> Another point is that the indexing is done with a field of type string:
>
> <field name="XXX" type="string" indexed="true" stored="true"
>        termVectors="true"/>
>
> and the autocomplete field is a copy field. The sorting is done based on
> the string field.
>
> Please do let me know what mistake I am making.
>
> Regards
> Sundar
>
> P.S: The stack trace of the exception is:
>
> Caused by: org.apache.solr.client.solrj.SolrServerException: Error executing query
>   at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:86)
>   at org.apache.solr.client.solrj.impl.BaseSolrServer.query(BaseSolrServer.java:101)
>   at com.apollo.sisaw.solr.service.AbstractSolrSearchService.makeSolrQuery(AbstractSolrSearchService.java:193)
>   ... 105 more
> Caused by: org.apache.solr.common.SolrException: Java heap space
> java.lang.OutOfMemoryError: Java heap space
>   at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:403)
>   at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>   at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:352)
>   at org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:416)
>   at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:207)
>   at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
>   at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:168)
>   at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
>   at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:907)
>   at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:838)
>   at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:269)
>   at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:160)
>   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:156)
>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:128)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1025)
>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:272)
>   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
>   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
>   at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
>   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:202)
>   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:173)
>   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:178)
>   at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:175)
>   at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:74)
>   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:126)
>   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:105)
>   at org.jboss.web.tomcat.tc5.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:156)
>   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:107)
>   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:148)
>   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:869)
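The trace surfaces through SolrJ; the failing request is equivalent to something like the sketch below (SolrJ of roughly that era; exact client class names varied between releases, and "XXX" is the string sort field from the schema above). Note it is the sort, not the row count, that forces the FieldCache load:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Even with rows=10, the sort forces the FieldCache to load the field for
// every document on the first sorted search against a new searcher.
public class SortedQueryExample {
  public static void main(String[] args) throws Exception {
    CommonsHttpSolrServer server =
        new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery("*:*");
    query.addSortField("XXX", SolrQuery.ORDER.asc); // triggers the cache load
    query.setRows(10);
    QueryResponse rsp = server.query(query); // the OOM happened inside here
    System.out.println(rsp.getResults().getNumFound());
  }
}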



