Re: facet method=enum and uninvertedfield limitations
What is the actual target speed you are pursuing? Is this for user suggestions or something of that sort? Content-based suggestions with faceting, especially on Solr 1.4, won't be lightning fast.

Have you looked at TermsComponent?
http://wiki.apache.org/solr/TermsComponent

By shingles, which in the rest of the world are more commonly called ngrams, I meant a way of reducing the number of entities to iterate through. Say you store only bigrams or trigrams and facet on those (fewer in number).

Dmitry

On Wed, Nov 20, 2013 at 6:10 PM, Lemke, Michael SZ/HZA-ZSW lemke...@schaeffler.com wrote:
> On Wednesday, November 20, 2013 7:37 AM, Dmitry Kan wrote:
>
> Thanks for your reply.
>
>> Since you are faceting on a text field (is this correct?) you deal with a lot of unique values in it.
>
> Yes, this is a text field and we experimented with reducing the index. As I said in my original question the stripped down index had 178,000 terms and it (fc) still didn't work. Is number of terms the relevant quantity?
>
>> So your best bet is enum method.
>
> Hm, yes, that works but I have to wait 4 minutes for the answer (with the original data). Not good.
>
>> Also if you are on solr 4x try building doc values in the index: this suits faceting well.
>
> We are on Solr 1.4, so, no.
>
>> Otherwise start from your spec once again. Can you use shingles instead?
>
> Possibly but I don't know shingles. Although I'd prefer to use our original index we are trying to build a specialized index just for this sort of query but still don't know what to look for.
>
> A query like
>
> q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=fc&facet.prefix=a&rows=0
>
> would give me the top ten results containing 'word' and something starting with 'a'. That's what I want. An empty facet.prefix should also work. Eventually, the query will be more complex containing other fields and filter queries but the basic function should be exactly like this. How can we achieve this?
>
> Thanks,
> Michael

--
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: twitter.com/dmitrykan
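To make the shingle suggestion concrete, here is a minimal sketch of word-level shingling in plain Python. It only illustrates what a bigram or trigram is; the function name word_shingles and the sample sentence are made up for illustration, and in Solr itself the shingles would be produced at index time by an analysis filter such as ShingleFilterFactory rather than by client code.

```python
def word_shingles(text, size=2):
    """Return the word-level n-grams (shingles) of `text`, in order.

    Hypothetical helper for illustration only; tokenization here is a plain
    whitespace split, which is much cruder than a Solr analyzer chain.
    """
    tokens = text.lower().split()
    # A text with fewer than `size` tokens yields no shingles at all.
    return [" ".join(tokens[i:i + size]) for i in range(len(tokens) - size + 1)]


if __name__ == "__main__":
    sample = "needle roller bearing for wind turbines"
    print(word_shingles(sample, 2))  # bigrams
    print(word_shingles(sample, 3))  # trigrams
```

Whether faceting on such a field is actually cheaper depends on how many distinct shingles the corpus produces, so this would need to be measured on the real index.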
RE: facet method=enum and uninvertedfield limitations
On Wednesday, November 20, 2013 7:37 AM, Dmitry Kan wrote:

Thanks for your reply.

> Since you are faceting on a text field (is this correct?) you deal with a lot of unique values in it.

Yes, this is a text field and we experimented with reducing the index. As I said in my original question the stripped down index had 178,000 terms and it (fc) still didn't work. Is the number of terms the relevant quantity?

> So your best bet is enum method.

Hm, yes, that works but I have to wait 4 minutes for the answer (with the original data). Not good.

> Also if you are on solr 4x try building doc values in the index: this suits faceting well.

We are on Solr 1.4, so, no.

> Otherwise start from your spec once again. Can you use shingles instead?

Possibly, but I don't know shingles. Although I'd prefer to use our original index, we are trying to build a specialized index just for this sort of query but still don't know what to look for.

A query like

q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=fc&facet.prefix=a&rows=0

would give me the top ten results containing 'word' and something starting with 'a'. That's what I want. An empty facet.prefix should also work. Eventually, the query will be more complex, containing other fields and filter queries, but the basic function should be exactly like this. How can we achieve this?

Thanks,
Michael
RE: facet method=enum and uninvertedfield limitations
Judging from numerous replies this seems to be a tough question. Nevertheless, I'd really appreciate any help as we are stuck. We'd really like to know what in our index causes the facet.method=fc query to fail.

Thanks,
Michael

On Friday, November 15, 2013 11:22 AM, Lemke, Michael SZ/HZA-ZSW wrote:
> On Thu, November 14, 2013 7:26 PM, Yonik Seeley wrote:
>> You shouldn't need to do anything differently to try facet.method=enum (just replace facet.method=fc with facet.method=enum)
>
> This is true and facet.method=enum does work indeed. The problem is runtime. In particular queries with an empty facet.prefix= run many seconds if not minutes. I initially asked about this here:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201310.mbox/%3c33ec3398272fbe47b64ee3b3e98f69a761427...@de011521.schaeffler.com%3E
>
> It was suggested that fc is much faster than enum and I'd like to test that. We are still fairly free to design the index such that it performs well. But to do that we need to understand what is killing it.
>
>> You may also want to add the parameter facet.enum.cache.minDf=100000 to lower memory usage by only using the filter cache for terms that match more than 100K docs.
>
> That helped a little, cut down my particular test from 10 sec to 5 sec. But still too slow. Mind you this is for an autosuggest feature.
>
> Thanks for your reply.
> Michael
RE: facet method=enum and uninvertedfield limitations
Since you are faceting on a text field (is this correct?) you deal with a lot of unique values in it. So your best bet is the enum method. Also, if you are on Solr 4.x, try building doc values in the index: this suits faceting well.

Otherwise start from your spec once again. Can you use shingles instead?

On 19 Nov 2013 17:44, Lemke, Michael SZ/HZA-ZSW lemke...@schaeffler.com wrote:
> Judging from numerous replies this seems to be a tough question. Nevertheless, I'd really appreciate any help as we are stuck. We'd really like to know what in our index causes the facet.method=fc query to fail.
>
> Thanks,
> Michael
RE: facet method=enum and uninvertedfield limitations
On Thu, November 14, 2013 7:26 PM, Yonik Seeley wrote:

> You shouldn't need to do anything differently to try facet.method=enum (just replace facet.method=fc with facet.method=enum)

This is true and facet.method=enum does work indeed. The problem is runtime. In particular, queries with an empty facet.prefix= run many seconds if not minutes. I initially asked about this here:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201310.mbox/%3c33ec3398272fbe47b64ee3b3e98f69a761427...@de011521.schaeffler.com%3E

It was suggested that fc is much faster than enum and I'd like to test that. We are still fairly free to design the index such that it performs well. But to do that we need to understand what is killing it.

> You may also want to add the parameter facet.enum.cache.minDf=100000 to lower memory usage by only using the filter cache for terms that match more than 100K docs.

That helped a little, cut down my particular test from 10 sec to 5 sec. But still too slow. Mind you, this is for an autosuggest feature.

Thanks for your reply.
Michael
facet method=enum and uninvertedfield limitations
I am running into performance problems with faceted queries. If I do a

q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=fc&facet.prefix=a&rows=0

I am getting an exception:

org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field CONTENT
    at org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:384)
    at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)
    at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839)
    ...

I understand it's got something to do with a 24bit limit somewhere in the code but I don't understand enough of it to be able to construct a specialized index that can be queried with facet.method=enum.

A stripped down index still doesn't work. It has exactly one field CONTENT with 178,000 terms and ~1 million documents. The top ranking terms according to Luke are

1  413950  CONTENT  word1
2  321223  CONTENT  word2
3  299036  CONTENT  word3
4  276757  CONTENT  word4
...

How would we have to strip the index?

Thanks,
Michael
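For readers reproducing the request, the query in this message is a list of separate parameters joined with "&". The following is a small sketch, assuming Python on the client side, that builds the same query string; the helper name build_facet_query is hypothetical, and the host and request handler path are deliberately left out because the thread does not state them.

```python
from urllib.parse import urlencode


def build_facet_query(term, prefix, method="fc", limit=10):
    """Return the query string for a prefix-restricted facet request on CONTENT.

    Hypothetical helper; the parameter names are the ones used in the thread.
    """
    params = [
        ("q", term),
        ("facet.field", "CONTENT"),
        ("facet", "true"),
        ("facet.limit", limit),
        ("facet.mincount", 1),
        ("facet.method", method),
        ("facet.prefix", prefix),  # may be empty, as in the slow case described
        ("rows", 0),               # only facet counts are wanted, no documents
    ]
    return urlencode(params)


if __name__ == "__main__":
    print(build_facet_query("word", "a"))
```

Passing an empty prefix produces facet.prefix= with no value, which is the case reported as slowest later in the thread.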
Re: facet method=enum and uninvertedfield limitations
On Thu, Nov 14, 2013 at 12:03 PM, Lemke, Michael SZ/HZA-ZSW lemke...@schaeffler.com wrote:
> I am running into performance problems with faceted queries. If I do a
>
> q=word&facet.field=CONTENT&facet=true&facet.limit=10&facet.mincount=1&facet.method=fc&facet.prefix=a&rows=0
>
> I am getting an exception:
>
> org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field CONTENT
>     at org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:384)
>     at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:178)
>     at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839)
>     ...
>
> I understand it's got something to do with a 24bit limit somewhere in the code but I don't understand enough of it to be able to construct a specialized index that can be queried with facet.method=enum.

You shouldn't need to do anything differently to try facet.method=enum (just replace facet.method=fc with facet.method=enum).

You may also want to add the parameter facet.enum.cache.minDf=100000 to lower memory usage by only using the filter cache for terms that match more than 100K docs.

-Yonik
http://heliosearch.com -- making solr shine
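The switch suggested here touches only request parameters. As a rough sketch (client-side Python; the function name use_enum is hypothetical), an existing query string can be rewritten as follows. The value 100000 is taken from the "100K docs" threshold in the message, not from any measurement.

```python
from urllib.parse import parse_qsl, urlencode


def use_enum(query_string, min_df=None):
    """Rewrite a facet query string to use facet.method=enum.

    Any existing facet.method / facet.enum.cache.minDf values are dropped and
    the new ones appended; all other parameters keep their order, and empty
    values such as facet.prefix= are preserved.
    """
    replaced = ("facet.method", "facet.enum.cache.minDf")
    pairs = [(k, v)
             for k, v in parse_qsl(query_string, keep_blank_values=True)
             if k not in replaced]
    pairs.append(("facet.method", "enum"))
    if min_df is not None:
        pairs.append(("facet.enum.cache.minDf", str(min_df)))
    return urlencode(pairs)


if __name__ == "__main__":
    original = "q=word&facet.field=CONTENT&facet=true&facet.method=fc&facet.prefix=a&rows=0"
    print(use_enum(original, min_df=100000))
```

A sensible minDf depends on index size and filter cache capacity, so treat the number as a starting point to tune.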
Re: UnInvertedField limitations
Hi Jack,

24 bits = 16M possibilities, that's clear; just to confirm... The rest is unclear: why can 4 bytes have 4 million cardinality? I thought it was 4 billion...

And, just to confirm: UnInvertedField allows 16M cardinality, correct?

On 12-08-20 6:51 PM, Jack Krupansky j...@basetechnology.com wrote:
> It appears that there is a hard limit of 24 bits, or 16M, for the number of bytes to reference the terms in a single field of a single document. It takes 1, 2, 3, 4, or 5 bytes to reference a term. If it took 4 bytes, that would allow 16/4 or 4 million unique terms per document. Do you have such large documents?
>
> This appears to be a hard limit based on 24 bits in a Java int.
>
> You can try facet.method=enum, but that may be too slow.
>
> What release of Solr are you running?
>
> -- Jack Krupansky
Re: UnInvertedField limitations
Hi Lance,

The use case is keyword extraction, and it could be 2- and 3-grams (2- and 3-word phrases), so theoretically we can have 10,000^3 = 1,000,000,000,000 3-grams for English only. Of course my suggestion is to use statistics and to build a dictionary of such 3-word combinations (remove the top, remove the tail, using frequencies), and to hard-limit this dictionary to 1,000,000 entries.

That was a business requirement which is technically impossible to implement (as real-time query results); we don't even use word stemming etc.

-Fuad

On 12-08-20 7:22 PM, Lance Norskog goks...@gmail.com wrote:
> Is this required by your application? Is there any way to reduce the number of terms?
>
> A workaround is to use shards. If your terms follow Zipf's Law each shard will have fewer than the complete number of terms. For N shards, each shard will have ~1/N of the singleton terms. For 2-count terms, 1/N or 2/N will have that term.
>
> Now I'm interested but not mathematically capable: what is the general probabilistic formula for splitting Zipf's Law across shards?
Re: UnInvertedField limitations
It's actually limited to 24 bits to point to the term list in a byte[], but there are 256 different arrays, so the maximum capacity is 4B bytes of un-inverted terms. Each bucket is limited to 4B/256, though, so the real limit can come in at a little less due to luck.

From the comments:

 * There is a single int[maxDoc()] which either contains a pointer into a byte[] for
 * the termNumber lists, or directly contains the termNumber list if it fits in the 4
 * bytes of an integer. If the first byte in the integer is 1, the next 3 bytes
 * are a pointer into a byte[] where the termNumber list starts.
 *
 * There are actually 256 byte arrays, to compensate for the fact that the pointers
 * into the byte arrays are only 3 bytes long. The correct byte array for a document
 * is a function of it's id.

-Yonik
http://lucidworks.com

On Thu, Sep 6, 2012 at 6:33 PM, Fuad Efendi f...@efendi.ca wrote:
> Hi Jack,
>
> 24bit = 16M possibilities, it's clear; just to confirm... the rest is unclear, why 4-byte can have 4 million cardinality? I thought it is 4 billions...
>
> And, just to confirm: UnInvertedField allows 16M cardinality, correct?
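The layout described in the quoted source comment can be illustrated with a few lines of arithmetic. The sketch below is not Solr code: the names are hypothetical, and two details are assumptions that the comment does not spell out, namely that the "first byte" is the low-order byte of the int and that the byte array is selected from bits 16..23 of the document id.

```python
POINTER_BITS = 24   # "the next 3 bytes are a pointer into a byte[]"
NUM_ARRAYS = 256    # "there are actually 256 byte arrays"

BYTES_PER_ARRAY = 1 << POINTER_BITS           # 16,777,216 addressable bytes
TOTAL_BYTES = NUM_ARRAYS * BYTES_PER_ARRAY    # 4,294,967,296, the "4B bytes"


def decode_entry(entry):
    """Classify one 32-bit slot of the per-document int array.

    Assumption: the marker byte is the low-order byte and the pointer sits in
    the upper 24 bits. Otherwise the slot holds the term number list inline.
    """
    if (entry & 0xFF) == 1:
        return ("pointer", (entry >> 8) & 0xFFFFFF)
    return ("inline", entry)


def array_for_doc(doc_id):
    """Hypothetical selection of one of the 256 arrays from the document id."""
    return (doc_id >> 16) & 0xFF


if __name__ == "__main__":
    print(BYTES_PER_ARRAY, TOTAL_BYTES)
    print(decode_entry((0x00ABCD << 8) | 1))
    print(array_for_doc(70000))
```

Because each document is tied to one array, a single array can fill up before the overall 4B total is reached, which matches the "a little less due to luck" remark.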
UnInvertedField limitations
Hi All,

I have a problem... (Yonik, please!) help me, what are the term count limits? I possibly have 256,000,000 different terms in a field... or 16,000,000?

Can I temporarily disable the feature?

Thanks!

2012-08-20 16:20:19,262 ERROR [solr.core.SolrCore] - [pool-1-thread-1] - : org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field enrich_keywords_string_mv
    at org.apache.solr.request.UnInvertedField.<init>(UnInvertedField.java:179)
    at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:668)
    at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:326)
    at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:423)
    at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:206)
    at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:85)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1561)

--
Fuad Efendi
http://www.tokenizer.ca
Re: UnInvertedField limitations
It appears that there is a hard limit of 24 bits, or 16M, for the number of bytes used to reference the terms in a single field of a single document. It takes 1, 2, 3, 4, or 5 bytes to reference a term. If it took 4 bytes, that would allow 16/4 or 4 million unique terms per document. Do you have such large documents?

This appears to be a hard limit based on 24 bits in a Java int.

You can try facet.method=enum, but that may be too slow.

What release of Solr are you running?

-- Jack Krupansky

-----Original Message-----
From: Fuad Efendi
Sent: Monday, August 20, 2012 4:34 PM
To: Solr-User@lucene.apache.org
Subject: UnInvertedField limitations

Hi All,

I have a problem... (Yonik, please!) help me, what is Term count limits? I possibly have 256,000,000 different terms in a field... or 16,000,000?

Thanks!

2012-08-20 16:20:19,262 ERROR [solr.core.SolrCore] - [pool-1-thread-1] - : org.apache.solr.common.SolrException: Too many values for UnInvertedField faceting on field enrich_keywords_string_mv
    ...
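The "16/4 or 4 million" figure in this message is a division of bytes by bytes per reference, which is easy to misread as a 4-byte integer range. A minimal worked version follows, assuming only what the message states (a 24-bit byte offset and 1 to 5 bytes per term reference); the names are made up for illustration.

```python
ADDRESSABLE_BYTES = 2 ** 24  # a 24-bit offset can address 16,777,216 bytes


def max_term_refs(bytes_per_ref):
    """How many term references fit if every reference takes `bytes_per_ref` bytes."""
    return ADDRESSABLE_BYTES // bytes_per_ref


if __name__ == "__main__":
    for width in range(1, 6):
        print(width, "byte(s) per reference:", max_term_refs(width))
```

So the 4 million is 16M bytes divided by 4 bytes per reference, not the 4 billion values a 4-byte integer could represent.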
Re: UnInvertedField limitations
Is this required by your application? Is there any way to reduce the number of terms?

A workaround is to use shards. If your terms follow Zipf's Law, each shard will have fewer than the complete number of terms. For N shards, each shard will have ~1/N of the singleton terms. For 2-count terms, 1/N or 2/N of the shards will have that term.

Now I'm interested but not mathematically capable: what is the general probabilistic formula for splitting Zipf's Law across shards?

On Mon, Aug 20, 2012 at 3:51 PM, Jack Krupansky j...@basetechnology.com wrote:
> It appears that there is a hard limit of 24 bits, or 16M, for the number of bytes to reference the terms in a single field of a single document. It takes 1, 2, 3, 4, or 5 bytes to reference a term. If it took 4 bytes, that would allow 16/4 or 4 million unique terms per document. Do you have such large documents?
>
> This appears to be a hard limit based on 24 bits in a Java int.
>
> You can try facet.method=enum, but that may be too slow.
>
> What release of Solr are you running?
>
> -- Jack Krupansky

--
Lance Norskog
goks...@gmail.com
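One possible way to approach the question about splitting terms across shards, offered as a sketch rather than an answer from the thread: assume every document is assigned to one of N shards uniformly and independently. A term that occurs in k documents is then present in a given shard with probability 1 - (1 - 1/N)^k, and the expected number of distinct terms per shard is the sum of that probability over all terms. The function names below are hypothetical.

```python
def p_term_in_shard(doc_freq, n_shards):
    """Probability that a term occurring in `doc_freq` documents appears in one
    particular shard, under uniform, independent document placement."""
    return 1.0 - (1.0 - 1.0 / n_shards) ** doc_freq


def expected_terms_per_shard(doc_freqs, n_shards):
    """Expected number of distinct terms in one shard, given the document
    frequency of every term in the whole collection."""
    return sum(p_term_in_shard(k, n_shards) for k in doc_freqs)


if __name__ == "__main__":
    n = 4
    print(p_term_in_shard(1, n))   # singleton terms: 1/N
    print(p_term_in_shard(2, n))   # 2-count terms: 2/N - 1/N^2
    # A toy, roughly Zipf-like profile: many rare terms, a few frequent ones.
    toy_doc_freqs = [1] * 1000 + [2] * 300 + [10] * 50 + [1000] * 5
    print(expected_terms_per_shard(toy_doc_freqs, n))
```

Under this assumption singleton terms shrink by the full factor of N while frequent terms appear in almost every shard, so sharding helps most when the term list is dominated by rare terms.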