Re: Excessive Heap Usage from docValues?
On Wed, 2014-03-19 at 22:01 +0100, tradergene wrote:
> I have a Solr index with about 32 million docs. Each doc is relatively small but has multiple dynamic fields that are storing ints. The initial problem that I had to resolve is that we were running into OOMs (on a 48GB heap, 130GB on-disk index). I narrowed that issue down to the Lucene FieldCache filling up the heap due to all the dynamic fields.

48GB heap for a 130GB, 32M-doc index sounds excessive. Could you tell us how many unique fields your searcher uses in total for faceting, and maybe the overall layout of your index? Is this perhaps a case of many distinct groups of data put in the same index, where the searches are always within a single group and each group has its own fields for faceting? Are the fields single- or multi-valued?

- Toke Eskildsen, State and University Library, Denmark
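Since Solr 4.2, fields used for faceting and sorting can be backed by docValues, which are read from disk through the OS cache instead of being un-inverted into the heap-resident FieldCache. A minimal sketch of what that might look like in schema.xml — the dynamic-field pattern here is invented for illustration:

    <dynamicField name="*_dv_i" type="int" indexed="true" stored="false"
                  multiValued="false" docValues="true"/>

Faceting on fields matching such a pattern should then avoid building FieldCache entries for them on the heap.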
Re: w/10 ? [was: Partial Counts in SOLR]
Yup!

-- Regards, Salman Akram

On Thu, Mar 20, 2014 at 5:13 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
> Hi, Guessing it's the surround query parser's support for "within", backed by span queries.
> Otis
> Solr & ElasticSearch Support
> http://sematext.com/
>
> On Mar 19, 2014 4:44 PM, T. Kuro Kurosaka k...@healthline.com wrote:
>> In the thread "Partial Counts in SOLR", Salman gave us this sample query:
>> ((stock or share*) w/10 (sale or sell* or sold or bought or buy* or purchase* or repurchase*)) w/10 (executive or director)
>> I'm not familiar with this w/10 notation. What does this mean, and what parser(s) support this syntax?
>> Kuro
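For reference: in the surround parser, proximity is written with the infix operators Nn (unordered) and Nw (ordered), so a w/10 clause maps roughly onto 10n or 10w. A hedged sketch of invoking it — the terms are illustrative:

    q={!surround}stock 10n (sale OR sold)

The operators are case-insensitive, and the parser builds span queries underneath, as Otis notes.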
Re: Solr memory usage off-heap
thanks!

On Tue, Mar 18, 2014 at 4:37 PM, Erick Erickson erickerick...@gmail.com wrote:
> Avishai: It sounds like you already understand mmap. Even so, you might be interested in this excellent writeup of MMapDirectory and Lucene by Uwe:
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
> Best, Erick
>
> On Tue, Mar 18, 2014 at 7:23 AM, Avishai Ish-Shalom avis...@fewbytes.com wrote:
>> aha! mmap explains it. thank you.
>>
>> On Tue, Mar 18, 2014 at 3:11 PM, Shawn Heisey s...@elyograg.org wrote:
>>> On 3/18/2014 5:30 AM, Avishai Ish-Shalom wrote:
>>>> My solr instances are configured with a 10GB heap (Xmx) but linux shows a resident size of 16-20GB. Even with thread stacks and permgen taken into account I'm still far off from these numbers. Could it be that JVM IO buffers take so much space? Does lucene use JNI/JNA memory allocations?
>>>
>>> Solr does not do anything off-heap. There is a project called heliosearch underway that aims to use off-heap memory extensively with Solr.
>>>
>>> There IS some misreporting of memory usage, though. See a screenshot that I just captured of top output, sorted by memory usage. The java process at the top of the list is Solr, running under the included Jetty: https://www.dropbox.com/s/03a3pp510mrtixo/solr-ram-usage-wrong.png
>>>
>>> I have a 6GB heap and 52GB of index data on this server. This makes the 62.2GB virtual memory size completely reasonable. The claimed resident memory size is 20GB, though. If you add that 20GB to the 49GB that is allocated to the OS disk cache and the 6GB that it says is free, that's 75GB. I've only got 64GB of RAM on the box, so something is being reported wrong. If I take my 20GB resident size and subtract the 14GB shared size, that is closer to reality, and it makes the numbers fit into the actual amount of RAM that's on the machine.
>>>
>>> I believe the misreporting is caused by the specific way that Java uses MMap when opening Lucene indexes. This information comes from what I remember about a conversation I witnessed in #lucene or #lucene-dev, not from my own exploration. I believe they said that the MMap methods which don't misreport memory usage would not do what Lucene requires.
>>> Thanks, Shawn
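For completeness: on a 64-bit JVM, Solr's default directory resolves to an mmap-backed implementation, and it can be pinned explicitly in solrconfig.xml if desired — a sketch:

    <directoryFactory name="DirectoryFactory" class="solr.MMapDirectoryFactory"/>

The mmapped index files are what inflate the reported process sizes: pages the OS caches for those mappings get counted against the process even though they live in the page cache rather than the Java heap.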
wrong query results with wdf and ngtf
Is there a way to tell NGramFilterFactory while indexing that numbers shall never be tokenized? Then the query should be able to find numbers. Or do I have to change the ngram-min for numbers (not alpha) to 1, if that is possible? So to speak, put the whole number in as one token and not all possible tokens. Solr analysis shows only WDF has no underscore in its tokens; the rest have it. Can I tell the query to search numbers differently with NGTF, WT, LCF or whatever?

I also tried:

    <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/>

with:

    @ => ALPHA
    _ => ALPHA

I have gotten nearly everything to work. There are two queries where I don't get back what I want:

- "avaloq frage 1" - only returns if I set minGramSize=1 while indexing
- "yh_cug" - the query parser doesn't remove _ but the indexer does (WDF), so there is no match

Is there a way to also query the whole term "avaloq frage 1" without tokenizing it?

Fieldtype:

    <fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/> <!-- remove common words -->
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German"/> <!-- remove noun/adjective inflections like plural endings -->
        <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="15"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhiteSpaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" types="at-under-alpha.txt"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_de.txt" format="snowball" enablePositionIncrements="true"/> <!-- remove common words -->
        <filter class="solr.GermanNormalizationFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory" language="German"/>
      </analyzer>
    </fieldType>

Solrconfig:

    <queryParser name="synonym_edismax" class="solr.SynonymExpandingExtendedDismaxQParserPlugin">
      <lst name="synonymAnalyzers">
        <lst name="myCoolAnalyzer">
          <lst name="tokenizer">
            <str name="class">standard</str>
          </lst>
          <lst name="filter">
            <str name="class">shingle</str>
            <str name="outputUnigramsIfNoShingles">true</str>
            <str name="outputUnigrams">true</str>
            <str name="minShingleSize">2</str>
            <str name="maxShingleSize">4</str>
          </lst>
          <lst name="filter">
            <str name="class">synonym</str>
            <str name="tokenizerFactory">solr.KeywordTokenizerFactory</str>
            <str name="synonyms">synonyms.txt</str>
            <str name="expand">true</str>
            <str name="ignoreCase">true</str>
          </lst>
        </lst>
      </lst>
    </queryParser>

    <requestHandler name="/select2" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">10</int>
        <str name="defType">synonym_edismax</str>
        <str name="synonyms">true</str>
        <str name="qf">plain_text^10 editorschoice^200 title^20 h_*^14 tags^10 thema^15 inhaltstyp^6 breadcrumb^6 doctype^10 contentmanager^5 links^5 last_modified^5 url^5</str>
        <str name="bq">(expiration:[NOW TO *] OR (*:* -expiration:*))^6</str>
        <str name="bf">div(clicks,max(displays,1))^8</str> <!-- tested -->
        <str name="df">text</str>
        <str name="fl">*,path,score</str>
        <str name="wt">json</str>
        <str name="q.op">AND</str>
        <!-- Highlighting defaults -->
        <str name="hl">on</str>
        <str name="hl.fl">plain_text,title</str>
        <str name="hl.fragSize">200</str>
        <str name="hl.simple.pre">&lt;b&gt;</str>
        <str name="hl.simple.post">&lt;/b&gt;</str>
        <!-- <lst name="invariants"> -->
        <str name="facet">on</str>
        <str name="facet.mincount">1</str>
        <str name="facet.field">{!ex=inhaltstyp_s}inhaltstyp_s</str>
        <str name="f.inhaltstyp_s.facet.sort">index</str>
        <str name="facet.field">{!ex=doctype}doctype</str>
        <str name="f.doctype.facet.sort">index</str>
        <str name="facet.field">{!ex=thema_f}thema_f</str>
        <str name="f.thema_f.facet.sort">index</str>
        <str name="facet.field">{!ex=author_s}author_s</str>
        <str name="f.author_s.facet.sort">index</str>
        <str name="facet.field">{!ex=sachverstaendiger_s}sachverstaendiger_s</str>
        <str name="f.sachverstaendiger_s.facet.sort">index</str>
        <str name="facet.field">{!ex=veranstaltung_s}veranstaltung_s</str>
        <str name="f.veranstaltung_s.facet.sort">index</str>
        <str name="facet.date">{!ex=last_modified}last_modified</str>
        <str name="facet.date.gap">+1MONTH</str>
        <str name="facet.date.end">NOW/MONTH+1MONTH</str>
        <str name="facet.date.start">NOW/MONTH-36MONTHS</str>
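A common workaround for matching a multi-word value such as "avaloq frage 1" as a single whole term is to copy it into an extra field whose analyzer does not split at all. A hedged sketch, with invented field/type names:

    <fieldType name="text_exact" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <field name="title_exact" type="text_exact" indexed="true" stored="false"/>
    <copyField source="title" dest="title_exact"/>

Queries can then hit title_exact with the full phrase quoted, while the ngram field keeps serving partial matches.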
Re: join and filter query with AND
Nope. There is no line break in the string and it is not fed from a file. What else could be the reason?

On 19 March 2014 17:57, Erick Erickson erickerick...@gmail.com wrote:
> It looks to me like you're feeding this from some kind of text file and you really _do_ have a line break after "Stara". Or you have a line break in the string you paste into the URL, or something similar. Kind of shooting in the dark though. Erick
>
> On Wed, Mar 19, 2014 at 8:48 AM, Marcin Rzewucki mrzewu...@gmail.com wrote:
>> Hi, I have the following issue with the join query parser and a filter query. For such a query:
>>
>> <str name="q">*:*</str>
>> <str name="fq">(({!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara Zagora")) AND (prod:214)</str>
>>
>> I got this error:
>>
>> <lst name="error">
>>   <str name="msg">org.apache.solr.search.SyntaxError: Cannot parse 'city:"Stara': Lexical error at line 1, column 12. Encountered: <EOF> after : "\"Stara"</str>
>>   <int name="code">400</int>
>> </lst>
>>
>> Stack:
>> DEBUG - 2014-03-19 13:35:20.825; org.eclipse.jetty.servlet.ServletHandler; chain=SolrRequestFilter->default
>> DEBUG - 2014-03-19 13:35:20.826; org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter SolrRequestFilter
>> ERROR - 2014-03-19 13:35:20.828; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Cannot parse 'city:"Stara': Lexical error at line 1, column 12. Encountered: <EOF> after : "\"Stara"
>>   at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:179)
>>   at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:193)
>>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
>>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
>>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
>>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
>>   at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
>>   at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
>>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
>>   at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
>>   at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
>>   at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
>>   at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
>>   at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
>>   at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
>>   at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
>>   at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
>>   at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
>>   at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
>>   at org.eclipse.jetty.server.Server.handle(Server.java:364)
>>   at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
>>   at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
>>   at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
>>   at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
>>   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
>>   at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
>>   at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
>>   at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
>>   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
>>   at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
>>   at java.lang.Thread.run(Thread.java:744)
>> Caused by: org.apache.solr.search.SyntaxError: Cannot parse 'city:"Stara': Lexical error at line 1, column 12. Encountered: <EOF> after : "\"Stara"
>>   at org.apache.solr.parser.SolrQueryParserBase.parse(SolrQueryParserBase.java:159)
>>   at org.apache.solr.search.LuceneQParser.parse(LuceneQParser.java:50)
>>   at org.apache.solr.search.QParser.getQuery(QParser.java:141)
>>   at
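One hedged workaround when a {!join} filter has to carry a phrase with spaces is to pass the sub-query through the v local parameter and combine clauses via the _query_ hook — a sketch only, built from the field names above:

    fq=_query_:"{!join from=inner_id to=outer_id fromIndex=othercore v='city:\"Stara Zagora\"'}" AND prod:214

This keeps the joined sub-query self-contained, so the surrounding AND clause cannot truncate it during parsing.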
Problems with the Suggest Request Handler in Solr 4.7.0
The Suggest Search Component that comes preconfigured in the Solr 4.7.0 solrconfig.xml seems to thread dump when I call it:

http://localhost:8983/solr/suggest?spellcheck=on&q=ac&wt=json&indent=true

    "msg": "No suggester named default was configured"

Can someone tell me what's going on there? However, I can stop that happening if I replace the preconfigured Suggest Search Component and Request Handler with the Search Component and Request Handler configuration detailed here: https://cwiki.apache.org/confluence/display/solr/Suggester

...but after indexing the data in exampledocs, it doesn't seem to return any suggestions either. Can anyone help suggest how I might get suggest suggesting suggestions?

-- Steve Huckle
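For reference, a SuggestComponent configuration along the lines of that cwiki page looks roughly like the following sketch (the suggester name and source field are illustrative), and the suggester must be built once — e.g. with suggest.build=true — before it returns anything:

    <searchComponent name="suggest" class="solr.SuggestComponent">
      <lst name="suggester">
        <str name="name">mySuggester</str>
        <str name="lookupImpl">FuzzyLookupFactory</str>
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">cat</str>
        <str name="suggestAnalyzerFieldType">string</str>
      </lst>
    </searchComponent>

    <requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
      <lst name="defaults">
        <str name="suggest">true</str>
        <str name="suggest.count">10</str>
        <str name="suggest.dictionary">mySuggester</str>
      </lst>
      <arr name="components">
        <str>suggest</str>
      </arr>
    </requestHandler>

Note that this handler is queried with suggest.q=ac rather than q=ac.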
Solr dih to read Clob contents
Hi, I have a requirement to index a database table with CLOB content. Each row in my table has a column which is an XML stored as a CLOB. I want to read the contents of the XML through DIH and map each of the XML tags to a separate Solr field. Below is my CLOB content:

    <root>
      <author>A</author>
      <date>02-Dec-2013</date>
      ...
    </root>

I want to read the contents of the CLOB and map "author" to author_solr and "date" to date_solr. Is this possible with a ClobTransformer or a script transformer?

Thanks, Prasi
Re: Solr dih to read Clob contents
On 20 March 2014 14:53, Prasi S prasi1...@gmail.com wrote:
> Hi, I have a requirement to index a database table with CLOB content. Each row in my table has a column which is an XML stored as a CLOB. I want to read the contents of the XML through DIH and map each of the XML tags to a separate Solr field. [...] I want to read the contents of the CLOB and map "author" to author_solr and "date" to date_solr. Is this possible with a ClobTransformer or a script transformer?

You will need to use a FieldReaderDataSource and an XPathEntityProcessor along with the ClobTransformer. You do not provide details of your DIH data configuration file, but this should look something like:

    <dataSource name="xmldata" type="FieldReaderDataSource"/>
    ...
    <document>
      <entity name="x" query="..." transformer="ClobTransformer">
        <entity name="y" dataSource="xmldata" dataField="x.clob_column"
                processor="XPathEntityProcessor" forEach="/root">
          <field column="author_solr" xpath="/author"/>
          <field column="date_solr" xpath="/date"/>
        </entity>
      </entity>
    </document>

Regards, Gora
Re: Solr4.7 No live SolrServers available to handle this request
Sathya, I assume you're using SolrCloud. Please provide your clusterstate.json while you're seeing this issue, and check your logs for any exceptions. With no information from you it's hard to troubleshoot any issues!

Thanks, Greg

On Mar 20, 2014, at 12:44 AM, Sathya sathia.blacks...@gmail.com wrote:
> Hi Friends, I am new to Solr. I have 5 Solr nodes on 5 different machines. When I index the data, sometimes a *No live SolrServers available to handle this request* exception occurs on 1 or 2 machines. I don't know why it happens or how to solve this. Kindly help me to solve this issue.
Bootstrapping SolrCloud cluster with multiple collections in different sharding/replication setup
Hi, I would like some advice about the best way to bootstrap from scratch a SolrCloud cluster housing at least two collections with different sharding/replication setups. Going through the docs/'Solr In Action' book, what I have seen so far is that there is a way to bootstrap a SolrCloud cluster with a sharding configuration using:

-DnumShards=2

but this (afaik) works only for a single collection. What I need is a way to deploy from scratch a SolrCloud cluster housing (e.g.) two collections Foo and Bar, where Foo has only one shard and is replicated everywhere, while Bar has three shards and, again, is replicated. I can't find a config file where to put this sharding plan, and I'm starting to think that the only way to do this is after the deploy, using the Collections API.

Is there a best approach to do this?

Ugo
Re: join and filter query with AND
Well, the error message really looks like your input is getting chopped off. It's vaguely possible that you have some super-low limit in your servlet container configuration that is only letting very small packets through.

What I'd do is look in the Solr log file to see exactly what is coming through. Because regardless of what you _think_ you're sending, it _really_ looks like Solr is getting the fq clause with something that breaks it up. So I'd like to absolutely nail that as being wrong before speculating. Because I can cut/paste your fq clause just fine. Of course it fails because I don't have the other core defined, but that means the query has made it through query parsing while yours hasn't in your setup.

Best, Erick

On Thu, Mar 20, 2014 at 2:19 AM, Marcin Rzewucki mrzewu...@gmail.com wrote:
> Nope. There is no line break in the string and it is not fed from a file. What else could be the reason?
>
> On 19 March 2014 17:57, Erick Erickson erickerick...@gmail.com wrote:
>> It looks to me like you're feeding this from some kind of text file and you really _do_ have a line break after "Stara". Or you have a line break in the string you paste into the URL, or something similar. Kind of shooting in the dark though. Erick
>>
>> On Wed, Mar 19, 2014 at 8:48 AM, Marcin Rzewucki mrzewu...@gmail.com wrote:
>>> Hi, I have the following issue with the join query parser and a filter query. For such a query:
>>>
>>> <str name="q">*:*</str>
>>> <str name="fq">(({!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara Zagora")) AND (prod:214)</str>
>>>
>>> I got this error:
>>>
>>> <lst name="error">
>>>   <str name="msg">org.apache.solr.search.SyntaxError: Cannot parse 'city:"Stara': Lexical error at line 1, column 12. Encountered: <EOF> after : "\"Stara"</str>
>>>   <int name="code">400</int>
>>> </lst>
>>>
>>> [stack trace quoted above snipped]
Re: Bootstrapping SolrCloud cluster with multiple collections in differene sharding/replication setup
Honestly, the best approach is to start with no collections defined and use the Collections API.

If you want to preconfigure (which has its warts and will likely go away as an option), it's tricky to do it with different numShards, as that is a global property per node. You would basically set -DnumShards=1 and start your cluster with Foo defined. Then you stop the cluster, define Bar, and start with -DnumShards=3.

The ability to preconfigure and bootstrap like this was kind of a transitional system meant to help people that knew Solr pre-SolrCloud get something up quickly, back before we had a Collections API. The Collections API is much better if you want multiple collections, and it's the future.

-- Mark Miller about.me/markrmiller

On March 20, 2014 at 10:24:18 AM, Ugo Matrangolo (ugo.matrang...@gmail.com) wrote:
> Hi, I would like some advice about the best way to bootstrap from scratch a SolrCloud cluster housing at least two collections with different sharding/replication setups. Going through the docs/'Solr In Action' book, what I have seen so far is that there is a way to bootstrap a SolrCloud cluster with a sharding configuration using -DnumShards=2, but this (afaik) works only for a single collection. What I need is a way to deploy from scratch a SolrCloud cluster housing (e.g.) two collections Foo and Bar, where Foo has only one shard and is replicated everywhere, while Bar has three shards and, again, is replicated. I can't find a config file where to put this sharding plan, and I'm starting to think that the only way to do this is after the deploy, using the Collections API. Is there a best approach to do this? Ugo
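A sketch of what the Collections API calls might look like for the two collections described (the host, config names, and replica counts are all illustrative):

    http://localhost:8983/solr/admin/collections?action=CREATE&name=Foo&numShards=1&replicationFactor=4&collection.configName=fooconf
    http://localhost:8983/solr/admin/collections?action=CREATE&name=Bar&numShards=3&replicationFactor=2&maxShardsPerNode=2&collection.configName=barconf

Each CREATE call carries its own numShards, which is exactly what the global -DnumShards bootstrap property cannot express.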
Re: Bootstrapping SolrCloud cluster with multiple collections in differene sharding/replication setup
You might find this useful: http://heliosearch.org/solrcloud-assigning-nodes-machines/

It uses the Collections API to create your collection with zero nodes, then shows how to assign your leaders to specific machines (well, at least specify the nodes the leaders will be created on; it doesn't show how to assign, for instance, shard1 to nodeX). It also shows a way to assign specific replicas on specific nodes to specific shards, although as Mark says this is a transitional technique. I know there's an addreplica command in the works for the Collections API that should make this easier, but that's not released yet.

Best, Erick

On Thu, Mar 20, 2014 at 7:23 AM, Ugo Matrangolo ugo.matrang...@gmail.com wrote:
> [original question quoted above snipped]
Multilingual indexing, search results, edismax and stopwords
On our Drupal multilingual system we use Apache Solr 3.5. The problem is well known on different blogs and sites I read: the search results are not the ones we want. In our code, in hook_apachesolr_query_alter, we override the default operator:

    $query->replaceParam('mm', '90%');

The requirement is: when I search for "biological analyses", I want to fetch only the results which have both of the words. When I search for "biological and chemical analyses", I want it to fetch only the results which have "biological", "chemical", "analyses". The "and" is not indexed due to stopwords.

If I set mm to 100% and my query has stopwords, it will not fetch any results. If I set mm to 100% and my query does not have stopwords, it will fetch the desired results. If I set mm anywhere between 50%-99%, it fetches unwanted results, such as results that contain only one of the searched keywords, or words like the searched keywords, such as "analyse" (even if I searched for "analyses"). If I search using + before the words that are mandatory, it works OK, but it is not user friendly to ask the user to type + before each word except the stopwords. Do I make any sense?

Below are some of our configuration details. All the indexed fields are of type text_language, e.g. from our schema.xml:

    <field name="label" type="text" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
    <field name="i18n_label_en" type="text_en" indexed="true" stored="true" termVectors="true" omitNorms="true"/>
    <field name="i18n_label_fr" type="text_fr" indexed="true" stored="true" termVectors="true" omitNorms="true"/>

All the text fieldtypes have the same configuration except for the protected, words, and dictionary parameters, which are language specific. E.g. from our schema.xml:

    <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent_en.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1" splitOnNumerics="1" stemEnglishPossessive="1"/>
        <filter class="solr.LengthFilterFactory" min="2" max="100"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="compoundwords_en.txt" minWordSize="5" minSubwordSize="4" maxSubwordSize="15" onlyLongestMatch="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords_en.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent_en.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms_en.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
        <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" preserveOriginal="1" splitOnNumerics="1" stemEnglishPossessive="1"/>
        <filter class="solr.LengthFilterFactory" min="2" max="100"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="compoundwords_en.txt" minWordSize="5" minSubwordSize="4" maxSubwordSize="15" onlyLongestMatch="true"/>
        <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords_en.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <solrQueryParser defaultOperator="AND"/>

solrconfig.xml:

    <requestHandler name="pinkPony" class="solr.SearchHandler" default="true">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <str name="echoParams">explicit</str>
        <bool name="omitHeader">true</bool>
        <float name="tie">0.01</float>
        <int name="timeAllowed">${solr.pinkPony.timeAllowed:-1}</int>
        <str name="q.alt">*:*</str>
        <str name="spellcheck">false</str>
        <str name="spellcheck.onlyMorePopular">true</str>
        <str name="spellcheck.extendedResults">false</str>
        <str name="spellcheck.count">1</str>
      </lst>
      <arr name="last-components">
        <str>spellcheck</str>
      </arr>
    </requestHandler>

ANY ideas are appreciated!
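One commonly suggested compromise for this stopword/mm interaction is a graduated mm specification instead of a flat percentage, so short queries require every term while longer ones tolerate a dropped (stopped) clause. A sketch — the thresholds are illustrative, not a recommendation:

    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>

Read as: queries of 1-2 clauses must match all; 3-5 clauses may miss one; 6 clauses may miss two; 7 or more require 90% of clauses.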
Re: Filter in terms component
Will it work for multi-valued fields? It is giving a "Field Cache will not work for multi valued fields" error, and most of the data in the index is in multi-valued fields.

Thanks, Jilani

On Thu, Mar 20, 2014 at 1:53 AM, Ahmet Arslan iori...@yahoo.com wrote:
> Hi, If you just need counts, maybe you can make use of http://wiki.apache.org/solr/FunctionQuery#Relevance_Functions
> Ahmet
>
> On Wednesday, March 19, 2014 9:49 PM, Jilani Shaik jilani24...@gmail.com wrote:
>> Hi Ahmet, I have gone through the facet component; as our application has 300+ million docs, it is very time consuming with this component, and it also uses cache. So I have gone through the terms component, where Solr reads the index for field terms. Is there any approach where I can get the terms using a filter, so that I can restrict some of the document terms in the counts? Basically we have a set of documents where we want to show the term counts based on those filters with a set name, instead of reading the entire index. Please let me know if you need any details in order to throw some more pointers.
>> Thanks, Jilani
>>
>> On Thu, Mar 20, 2014 at 1:04 AM, Ahmet Arslan iori...@yahoo.com wrote:
>>> Hi Jilani, What features of the terms component are you after? If it is just terms.prefix, it could be simulated with the facet component via the facet.prefix parameter. The faceting component respects filter queries.
>>>
>>> On Wednesday, March 19, 2014 8:58 PM, Jilani Shaik jilani24...@gmail.com wrote:
>>>> Hi, I have a huge index and am using Solr. I need the terms component with a filter by a field. Please let me know if there is anything with which I can get this. Please provide me some pointers, even to develop this by going through Lucene. Please suggest. Thanks, Jilani
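For the record, the facet-based alternative mentioned earlier in this thread does handle multi-valued fields and respects filters; such a request might look something like this (field names invented):

    /select?q=*:*&rows=0&fq=set_name:mySet&facet=true&facet.field=tags&facet.prefix=ab&facet.limit=100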
understand debuginfo from query
I want the info simplified so that the user can see why a doc was found. Below is the output for a doc:

    0.085597195 = (MATCH) sum of:
      0.083729245 = (MATCH) max of:
        0.0019158133 = (MATCH) weight(plain_text:test^10.0 in 601) [DefaultSimilarity], result of:
          0.0019158133 = score(doc=601,freq=9.0 = termFreq=9.0), product of:
            0.022560213 = queryWeight, product of:
              10.0 = boost
              3.6232536 = idf(docFreq=81, maxDocs=1130)
              6.2265067E-4 = queryNorm
            0.084920004 = fieldWeight in 601, product of:
              3.0 = tf(freq=9.0), with freq of:
                9.0 = termFreq=9.0
              3.6232536 = idf(docFreq=81, maxDocs=1130)
              0.0078125 = fieldNorm(doc=601)
        0.083729245 = (MATCH) weight(inhaltstyp:test^6.0 in 601) [DefaultSimilarity], result of:
          0.083729245 = score(doc=601,freq=1.0 = termFreq=1.0), product of:
            0.017686278 = queryWeight, product of:
              6.0 = boost
              4.734136 = idf(docFreq=26, maxDocs=1130)
              6.2265067E-4 = queryNorm
            4.734136 = fieldWeight in 601, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              4.734136 = idf(docFreq=26, maxDocs=1130)
              1.0 = fieldNorm(doc=601)
        0.013458222 = (MATCH) weight(title:test^20.0 in 601) [DefaultSimilarity], result of:
          0.013458222 = score(doc=601,freq=1.0 = termFreq=1.0), product of:
            0.042281017 = queryWeight, product of:
              20.0 = boost
              3.395244 = idf(docFreq=102, maxDocs=1130)
              6.2265067E-4 = queryNorm
            0.31830412 = fieldWeight in 601, product of:
              1.0 = tf(freq=1.0), with freq of:
                1.0 = termFreq=1.0
              3.395244 = idf(docFreq=102, maxDocs=1130)
              0.09375 = fieldNorm(doc=601)
      0.001867952 = (MATCH) product of:
        0.003735904 = (MATCH) sum of:
          0.003735904 = (MATCH) ConstantScore(expiration:[1395328539325 TO *]), product of:
            1.0 = boost
            0.003735904 = queryNorm
        0.5 = coord(1/2)
      0.0 = (MATCH) FunctionQuery(div(int(clicks),max(int(displays),const(1)))), product of:
        0.0 = div(int(clicks)=0,max(int(displays)=432,const(1)))
        8.0 = boost
        6.2265067E-4 = queryNorm

Why is the sum 0.085597195? This would mean 0.083729245 + 0.001867952, and these are not included in the sum: 0.0019158133 + 0.013458222 + 0.003735904. Am I looking at the wrong total? Aren't these two cases the ones I have to sum up: "x = (MATCH) sum of" or "x = score("? I'm trying to extract the fields that were used for weighing the doc.
Shingles in solr for bigrams,trigrams in parsed_query
Hi Folks, I am using shingles to index bigrams/trigrams. The same is also used for the query in the schema.xml file. But when I run the query in debug mode for a collection, I don't see the bigrams in the parsed_query. Any idea what I might be missing?

    solr/colection/select?q=best%20price&debugQuery=on

    <str name="parsedquery_toString">text:best text:price</str>

I was hoping to see:

    <str name="parsedquery_toString">text:best text:price text:best price</str>

My schema file looks like this:

    <types>
      <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
      <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
      <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
          <charFilter class="solr.HTMLStripCharFilterFactory"/>
          <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4" outputUnigrams="true"/>
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.LengthFilterFactory" min="3" max="50"/>
          <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1"/>
          <filter class="solr.StopFilterFactory"/>
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
          <filter class="solr.TrimFilterFactory"/>
        </analyzer>
        <analyzer type="query">
          <filter class="solr.LowerCaseFilterFactory"/>
          <filter class="solr.LengthFilterFactory" min="3" max="50"/>
          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
          <filter class="solr.StopFilterFactory"/>
          <filter class="solr.TrimFilterFactory"/>
          <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
          <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1"/>
          <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4" outputUnigrams="true"/>
          <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
          <!--
          <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
          <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4" outputUnigrams="true"/>
          -->
        </analyzer>
      </fieldType>
    </types>

-- Best Regards, Jyotirmoy Sundi
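An aside on why query-side shingles frequently fail to show up: the Lucene query parser splits the query on whitespace before field analysis runs, so each word reaches the shingle filter alone and no bigram can form. A hedged alternative is edismax's built-in word-pair/triple phrase boosting (the boosts here are illustrative):

    /select?defType=edismax&q=best+price&qf=text&pf2=text^5&pf3=text^3

pf2 and pf3 generate phrase queries over adjacent word pairs and triples without relying on the analyzer ever seeing multi-word input.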
Re: Parallel queries to Solr
Thanks Shawn. When we run any SolrJ application, the below message is displayed:

    org.apache.solr.client.solrj.impl.HttpClientUtil createClient
    INFO: Creating new http client, config:maxConnections=128&maxConnectionsPerHost=32&followRedirects=false

and while restarting Solr we are getting this message:

    org.apache.solr.client.solrj.impl.HttpClientUtil; Creating new http client, config:maxConnectionsPerHost=20&maxConnections=1&socketTimeout=0&connTimeout=0&retry=false

Is this indicating the default number of http connections? And can this be overridden by adding the below?

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.add(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 300);
    params.add(HttpClientUtil.PROP_MAX_CONNECTIONS, 5000);
    HttpClient httpClient = HttpClientUtil.createClient(params);

Thanks.
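Those defaults can indeed be overridden along those lines; a hedged SolrJ 4.x sketch (the URL and limits are illustrative — note set() rather than add(), since the int overloads live on set()):

    import org.apache.http.client.HttpClient;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpClientUtil;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.params.ModifiableSolrParams;

    public class ClientFactory {
        public static SolrServer create() {
            // Raise the connection-pool limits before the HttpClient is built
            ModifiableSolrParams params = new ModifiableSolrParams();
            params.set(HttpClientUtil.PROP_MAX_CONNECTIONS_PER_HOST, 300);
            params.set(HttpClientUtil.PROP_MAX_CONNECTIONS, 5000);
            HttpClient httpClient = HttpClientUtil.createClient(params);
            // Hand the customized client to the SolrServer implementation
            return new HttpSolrServer("http://localhost:8983/solr/collection1", httpClient);
        }
    }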
Re: Bootstrapping SolrCloud cluster with multiple collections in differene sharding/replication setup
Please note that although the article talks about the ADDREPLICA command, that feature is coming in Solr 4.8, so don't be confused if you can't find it yet. See https://issues.apache.org/jira/browse/SOLR-5130

On 3/20/14, 7:45 AM, Erick Erickson erickerick...@gmail.com wrote:
> You might find this useful: http://heliosearch.org/solrcloud-assigning-nodes-machines/
>
> [remainder of the thread quoted above snipped]
Re: Filter in terms component
Hi, Please provide some more pointers to go ahead in addressing this.

Thanks, Jilani

On Thu, Mar 20, 2014 at 8:50 PM, Jilani Shaik jilani24...@gmail.com wrote:
> Will it work for multi-valued fields? It is giving a "Field Cache will not work for multi valued fields" error, and most of the data in the index is in multi-valued fields. Thanks, Jilani
>
> [earlier thread quoted above snipped]
Limit on # of collections -SolrCloud
Hi there, Is there a limit on the # of collections SolrCloud can support? Can ZK/SolrCloud handle 1000s of collections? Also, I see that the bootup time of SolrCloud increases with an increase in the # of cores. I do not have any expensive warm-up queries. How do I speed up Solr startup? -- Best -- C
Re: Limit on # of collections -SolrCloud
There are no arbitrary limits on the number of collections, but yes, there are practical limits. For example, the cluster state can become a bottleneck. There is a lot of work happening on finding and addressing these problems. See https://issues.apache.org/jira/browse/SOLR-5381

Boot-up time is because of:
1) Core discovery, schema/config parsing, etc.
2) Transaction log replay on startup
3) Wait time for enough replicas to become available before leader election happens

You can't do much about #1 right now, I think. For #2, you can keep your transaction logs smaller by issuing a hard commit before shutdown. For #3 there is a leaderVoteWait setting, but I'd rather not touch that unless it becomes a problem.

On Fri, Mar 21, 2014 at 1:39 AM, Chris W chris1980@gmail.com wrote:
> Hi there, Is there a limit on the # of collections SolrCloud can support? Can ZK/SolrCloud handle 1000s of collections? Also, I see that the bootup time of SolrCloud increases with an increase in the # of cores. I do not have any expensive warm-up queries. How do I speed up Solr startup? -- Best -- C

-- Regards, Shalin Shekhar Mangar.
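A hedged illustration of the commit-before-shutdown advice (host and collection name invented): either issue an explicit hard commit before stopping a node,

    http://localhost:8983/solr/collection1/update?commit=true

or keep the transaction log perpetually small with auto-commit in solrconfig.xml:

    <autoCommit>
      <maxTime>15000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

With openSearcher=false the hard commit truncates the tlog without invalidating caches, so replay at startup has little left to do.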
Re: Solr4.7 No live SolrServers available to handle this request
I'm getting a similar exception when writing documents (on the client side). I can write one document fine, but the second (which is being routed to a different shard) generates the error. It happens every time - definitely not a resource issue or timing problem, since this database is completely empty -- I'm just getting started and running some tests, so there must be some kind of setup problem. But it's difficult to diagnose (for me, anyway)! I'd appreciate any insight, hints, guesses, etc. since I'm stuck. Thanks!

One node (the leader?) is reporting "Internal Server Error" in its log, and another node (presumably the shard where the document is being directed) bombs out like this:

    ERROR - 2014-03-20 15:56:53.022; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: ERROR adding document SolrInputDocument( ... long dump of document fields ... )
      at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:99)
      at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:166)
      at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
      at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
      at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
      at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:190)
      at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116)
      at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:173)
      at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:106)
      at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
      at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
      at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
      at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
      at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)
      ...
    Caused by: java.net.SocketException: Broken pipe
      at java.net.SocketOutputStream.socketWrite0(Native Method)
      at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
      at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
      at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:215)
      at org.apache.tomcat.util.buf.ByteChunk.flushBuffer(ByteChunk.java:480)
      at org.apache.tomcat.util.buf.ByteChunk.append(ByteChunk.java:366)
      at org.apache.coyote.http11.InternalOutputBuffer$OutputStreamOutputBuffer.doWrite(InternalOutputBuffer.java:240)
      at org.apache.coyote.http11.filters.ChunkedOutputFilter.doWrite(ChunkedOutputFilter.java:119)
      at org.apache.coyote.http11.AbstractOutputBuffer.doWrite(AbstractOutputBuffer.java:192)
      at org.apache.coyote.Response.doWrite(Response.java:520)
      at org.apache.catalina.connector.OutputBuffer.realWriteBytes(OutputBuffer.java:408)
      ... 37 more

This is with Solr 4.6.1, Tomcat 7. Here's my clusterstate.json. Updates are being sent to the test1x3 collection:

    {
      "test3x1": {
        "shards": {
          "shard1": {
            "range": "8000-d554",
            "state": "active",
            "replicas": {
              "core_node1": {
                "state": "active",
                "base_url": "http://10.4.24.37:8080/solr",
                "core": "test3x1_shard1_replica1",
                "node_name": "10.4.24.37:8080_solr",
                "leader": "true"}}},
          "shard2": {
            "range": "d555-2aa9",
            "state": "active",
            "replicas": {
              "core_node3": {
                "state": "active",
                "base_url": "http://10.4.24.39:8080/solr",
                "core": "test3x1_shard2_replica1",
                "node_name": "10.4.24.39:8080_solr",
                "leader": "true"}}},
          "shard3": {
            "range": "2aaa-7fff",
            "state": "active",
            "replicas": {
              "core_node2": {
                "state": "active",
                "base_url": "http://10.4.24.38:8080/solr",
                "core": "test3x1_shard3_replica1",
                "node_name": "10.4.24.38:8080_solr",
                "leader": "true"}}}},
        "maxShardsPerNode": "1",
        "router": {"name": "compositeId"},
        "replicationFactor": "1"},
      "test1x3": {
        "shards": {
          "shard1": {
            "range": "8000-7fff",
            "state": "active",
            "replicas": {
              "core_node1": {
                "state": "active",
                "base_url": "http://10.4.24.39:8080/solr",
                "core": "test1x3_shard1_replica2",
                "node_name": "10.4.24.39:8080_solr",
                "leader": "true"},
              "core_node2": {
                "state": "active",
                "base_url": "http://10.4.24.38:8080/solr",
                "core": "test1x3_shard1_replica1",
                "node_name": "10.4.24.38:8080_solr"},
Re: Limit on # of collections -SolrCloud
Thanks, Shalin. Making clusterstate.json per-collection sounds awesome. I am not having problems with #2. #3 is a major time hog in my environment: I have over 300+ collections, and restarting the entire cluster takes on the order of hours (2-3 hours). Can you explain more about the leaderVoteWait setting?

On Thu, Mar 20, 2014 at 1:28 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
> [reply quoted above snipped]

-- Best -- C
Re: Filter in terms component
Hi, I suggest you start a new thread describing your use case. Just describe the problem, without assumptions, with an appropriate title/subject.

Ahmet

On Thursday, March 20, 2014 10:01 PM, Jilani Shaik jilani24...@gmail.com wrote:
> Hi, Please provide some more pointers to go ahead in addressing this. Thanks, Jilani
>
> [earlier thread quoted above snipped]
Re: Limit on # of collections -SolrCloud
How many total replicas are we talking here? As in how many shards and, for each shard, how many replicas? I'm not asking for a long list here, just whether you have a bazillion replicas in aggregate. Hours is surprising. Best, Erick On Thu, Mar 20, 2014 at 2:17 PM, Chris W chris1980@gmail.com wrote: Thanks, Shalin. Making clusterstate.json on a per-collection basis sounds awesome. I am not having problems with #2. #3 is a major time hog in my environment: I have over 300 collections, and restarting the entire cluster takes on the order of hours (2-3). Can you explain more about the leaderVoteWait setting? On Thu, Mar 20, 2014 at 1:28 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: There are no arbitrary limits on the number of collections, but yes, there are practical limits. For example, the cluster state can become a bottleneck. There is a lot of work happening on finding and addressing these problems. See https://issues.apache.org/jira/browse/SOLR-5381 Boot-up time is spent on: 1) core discovery and schema/config parsing, 2) transaction log replay on startup, 3) waiting for enough replicas to become available before leader election happens. You can't do much about #1 right now, I think. For #2, you can keep your transaction logs smaller by issuing a hard commit before shutdown. For #3 there is a leaderVoteWait setting, but I'd rather not touch that unless it becomes a problem. On Fri, Mar 21, 2014 at 1:39 AM, Chris W chris1980@gmail.com wrote: Hi there. Is there a limit on the number of collections SolrCloud can support? Can ZooKeeper/SolrCloud handle thousands of collections? Also, I see that the boot-up time of SolrCloud increases with the number of cores. I do not have any expensive warm-up queries. How do I speed up Solr startup? -- Best -- C -- Regards, Shalin Shekhar Mangar. -- Best -- C
Re: Limit on # of collections -SolrCloud
Hours sounds too long indeed. We recently had a client with several thousand collections, but restart wasn't taking hours... Otis Solr ElasticSearch Support http://sematext.com/ On Mar 20, 2014 5:49 PM, Erick Erickson erickerick...@gmail.com wrote: How many total replicas are we talking here? As in how many shards and, for each shard, how many replicas? I'm not asking for a long list here, just whether you have a bazillion replicas in aggregate. Hours is surprising. Best, Erick On Thu, Mar 20, 2014 at 2:17 PM, Chris W chris1980@gmail.com wrote: Thanks, Shalin. Making clusterstate.json on a per-collection basis sounds awesome. I am not having problems with #2. #3 is a major time hog in my environment: I have over 300 collections, and restarting the entire cluster takes on the order of hours (2-3). Can you explain more about the leaderVoteWait setting? On Thu, Mar 20, 2014 at 1:28 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: There are no arbitrary limits on the number of collections, but yes, there are practical limits. For example, the cluster state can become a bottleneck. There is a lot of work happening on finding and addressing these problems. See https://issues.apache.org/jira/browse/SOLR-5381 Boot-up time is spent on: 1) core discovery and schema/config parsing, 2) transaction log replay on startup, 3) waiting for enough replicas to become available before leader election happens. You can't do much about #1 right now, I think. For #2, you can keep your transaction logs smaller by issuing a hard commit before shutdown. For #3 there is a leaderVoteWait setting, but I'd rather not touch that unless it becomes a problem. On Fri, Mar 21, 2014 at 1:39 AM, Chris W chris1980@gmail.com wrote: Hi there. Is there a limit on the number of collections SolrCloud can support? Can ZooKeeper/SolrCloud handle thousands of collections? Also, I see that the boot-up time of SolrCloud increases with the number of cores. I do not have any expensive warm-up queries. How do I speed up Solr startup? -- Best -- C -- Regards, Shalin Shekhar Mangar. -- Best -- C
Re: Limit on # of collections -SolrCloud
The replication factor is two. I have sharded all collections equally across all nodes of a 6-node cluster: 300 collections x 6 shards each, with 2 replicas per shard, which comes to almost 600 cores per machine. Also, my ZooKeeper timeout is on the order of 2-3 minutes, and I see very slow ZooKeeper responses and a lot of outstanding requests (found that out thanks to https://github.com/phunt/). On Thu, Mar 20, 2014 at 2:53 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hours sounds too long indeed. We recently had a client with several thousand collections, but restart wasn't taking hours... Otis Solr ElasticSearch Support http://sematext.com/ On Mar 20, 2014 5:49 PM, Erick Erickson erickerick...@gmail.com wrote: How many total replicas are we talking here? As in how many shards and, for each shard, how many replicas? I'm not asking for a long list here, just whether you have a bazillion replicas in aggregate. Hours is surprising. Best, Erick On Thu, Mar 20, 2014 at 2:17 PM, Chris W chris1980@gmail.com wrote: Thanks, Shalin. Making clusterstate.json on a per-collection basis sounds awesome. I am not having problems with #2. #3 is a major time hog in my environment: I have over 300 collections, and restarting the entire cluster takes on the order of hours (2-3). Can you explain more about the leaderVoteWait setting? On Thu, Mar 20, 2014 at 1:28 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: There are no arbitrary limits on the number of collections, but yes, there are practical limits. For example, the cluster state can become a bottleneck. There is a lot of work happening on finding and addressing these problems. See https://issues.apache.org/jira/browse/SOLR-5381 Boot-up time is spent on: 1) core discovery and schema/config parsing, 2) transaction log replay on startup, 3) waiting for enough replicas to become available before leader election happens. You can't do much about #1 right now, I think. For #2, you can keep your transaction logs smaller by issuing a hard commit before shutdown. For #3 there is a leaderVoteWait setting, but I'd rather not touch that unless it becomes a problem. On Fri, Mar 21, 2014 at 1:39 AM, Chris W chris1980@gmail.com wrote: Hi there. Is there a limit on the number of collections SolrCloud can support? Can ZooKeeper/SolrCloud handle thousands of collections? Also, I see that the boot-up time of SolrCloud increases with the number of cores. I do not have any expensive warm-up queries. How do I speed up Solr startup? -- Best -- C -- Regards, Shalin Shekhar Mangar. -- Best -- C -- Best -- C
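A quick way to confirm that ZooKeeper itself is the bottleneck is its stat four-letter command; a sketch, where the host is a placeholder and 2181 is the default client port:

  echo stat | nc zk1.example.com 2181

The output includes min/avg/max latency and an Outstanding request count; persistently high averages or a non-zero Outstanding backlog under load points at the ensemble (often disk or GC pauses on the ZooKeeper nodes) rather than at Solr.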
SOLR synonyms - Explicit mappings
I need some clarification on how to define explicit mappings in the synonyms.txt file. I have been using equivalent synonyms for a while and they work as expected, but I am confused by explicit mappings. I have the mapping below added to the query analyzer. I want a search on the keyword 'watch' to actually search on 'smartwatch', but the mapping seems to bring back documents that contain both keywords, 'watch' and 'smartwatch'. Am I doing anything wrong? watch => smartwatch Thanks for your help!
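For reference, a sketch of the two synonyms.txt styles, with purely illustrative terms:

  # equivalent synonyms: each term expands to all of the others
  television, tv, telly
  # explicit mapping: the left-hand side is replaced by the right-hand side
  watch => smartwatch

With the explicit mapping applied only in the query analyzer, a query for watch should be rewritten to smartwatch alone; still matching both terms usually means the synonym filter is also configured at index time, or an equivalent-style comma line with expand=true is involved somewhere in the chain.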
Best approach to handle large volume of documents with constantly high incoming rate?
Hi, I am looking for advice on handling a large volume of documents with a very high incoming rate. The size of each document is about 0.5 KB, the incoming rate could be more than 20K per second, and we want to store about one year's documents in Solr for near real-time searching. The goal is to achieve acceptable indexing and querying performance. We will use techniques like soft commits, dedicated indexing servers, etc. My main question is how to structure the collections/shards/cores to achieve the goals. Since the incoming rate is very high, we do not want the incoming documents to affect the existing older indexes. One thought is to create a latest index to hold the incoming documents (say the latest half hour's data, about 36M docs), so queries on older data could be faster since the old indexes are not touched. There seem to be three ways to grow along the time dimension, by adding/splitting/creating a new one of the following every half hour: a collection, a shard, or a core. Which is the best way to grow along the time dimension? Are there limitations in each direction, or is there a better approach? As an example, I am thinking about setting up a Solr Cloud of 4 nodes, each with Memory: 128 GB, Storage: 4 TB. How should the collections/shards/cores be set up to deal with this use case? Thanks in advance. Shushuai
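One common implementation of the "latest index" idea is a collection per time window, created through the Collections API; a sketch, where the collection name, shard count, replication factor, and config set name are all placeholders:

  curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=docs_2014-03-21_0800&numShards=4&replicationFactor=2&collection.configName=myconf"

New documents go only to the current window, so older windows' indexes and caches stay untouched; queries can span windows with the collection parameter, and an expired window can be dropped in one shot with action=DELETE instead of deleting 36M documents by query.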
Memory + WeakIdentityMap
I'm transitioning my index from a 3.x version to 4.6. I'm running a large heap (20G), primarily to accommodate a large facet cache (~5G), but I have been able to run it on 3.x stably. On 4.6.0, after stress testing, I'm finding that all of my shards are spending all of their time in GC. After taking a heap dump and analyzing it, it appears that org.apache.lucene.util.WeakIdentityMap is using many GB of memory. Does anyone have any insight into which Solr component(s) use this and whether this kind of memory consumption is to be expected? Thank You, -Harish
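For anyone repeating this kind of analysis, a sketch of capturing the dump with stock JDK tools (the pid and file name are placeholders):

  jmap -dump:live,format=b,file=solr-heap.hprof 12345

The live flag forces a full GC first so the dump contains only reachable objects; opening the resulting .hprof in Eclipse MAT or jvisualvm and sorting by retained size is what surfaces a structure like WeakIdentityMap as a top consumer.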
Rounding errors with SOLR score
When doing complex boosting/bq we are getting rounding errors in the score. To make the ordering consistent I needed to round via rint in the sort: sort=rint(product(sum($p_score,$s_score,$q_score),100)) desc,s_query asc <str name="p_score">recip(priority,1,.5,.01)</str> <str name="s_score">product(recip(synonym_rank,1,1,.01),17)</str> <str name="q_score">query({!dismax qf=user_query_edge^1 user_query^0.5 user_query_fuzzy v=$q1})</str> The issue is in the qf area. {"s_query":"Ear Irrigation","score":10.331313}, {"s_query":"Ear Piercing","score":10.331314}, {"s_query":"Ear Pinning","score":10.331313} -- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: solr cloud distributed optimize() becomes serialized
Yeah. optimize() also used to come back immediately if the index was already optimized; it just reopened the index. We used to use that for cleaning up the old directories quickly. But now it does another optimize() even though the index is already optimized. Very strange. On Tue, Mar 18, 2014 at 11:30 AM, Chris Lu chris...@gmail.com wrote: I wonder whether this is a known bug. In previous Solr Cloud versions, 4.4 or maybe 4.5, an explicit optimize() without any parameters usually took 2 minutes on a cluster with 32 cores. However, in 4.6.1, the same call took about 1 hour, and checking the index modification time for each core shows 2-minute gaps when sorted. We are using a solrj client connecting to ZooKeeper. I found it is talking to a specific Solr server A, and that server A is distributing the calls to all the other Solr servers. Here is the thread dump for this server A: at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:395) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:293) at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:226) at org.apache.solr.update.SolrCmdDistributor.distribCommit(SolrCmdDistributor.java:195) at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1250) at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69) -- Bill Bell billnb...@gmail.com cell 720-256-8076
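For the record, the call being discussed can be reproduced from the command line; a sketch assuming a collection named mycollection (the name is a placeholder):

  curl "http://localhost:8983/solr/mycollection/update?optimize=true&waitSearcher=false"

If the rollout really is serialized, each core's index directory only gets a new modification time after the previous core finishes, which would produce exactly the sorted 2-minute gaps described above.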
Re: Wiki edit rights
Please add me too. On Tue, Mar 18, 2014 at 8:33 AM, Erick Erickson erickerick...@gmail.com wrote: Done, thanks! On Tue, Mar 18, 2014 at 3:54 AM, Anders Gustafsson anders.gustafs...@pedago.fi wrote: Yes, please. My wiki ID is Anders Gustafsson. But yes, please add the howto to the wiki. You will need to get your account whitelisted first (due to spammers), so send a separate email with your Apache wiki ID and somebody will unlock you for editing. -- Anders Gustafsson Engineer, CNI, CNE6, ASE Pedago, the Aaland Islands (N60 E20) www.pedago.fi phone +358 18 12060 mobile +358 40506 7099 -- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: solr cloud distributed optimize() becomes serialized
That's not right. Which Solr versions are you on? (Question for both William and Chris.) On Fri, Mar 21, 2014 at 8:07 AM, William Bell billnb...@gmail.com wrote: Yeah. optimize() also used to come back immediately if the index was already optimized; it just reopened the index. We used to use that for cleaning up the old directories quickly. But now it does another optimize() even though the index is already optimized. Very strange. On Tue, Mar 18, 2014 at 11:30 AM, Chris Lu chris...@gmail.com wrote: I wonder whether this is a known bug. In previous Solr Cloud versions, 4.4 or maybe 4.5, an explicit optimize() without any parameters usually took 2 minutes on a cluster with 32 cores. However, in 4.6.1, the same call took about 1 hour, and checking the index modification time for each core shows 2-minute gaps when sorted. We are using a solrj client connecting to ZooKeeper. I found it is talking to a specific Solr server A, and that server A is distributing the calls to all the other Solr servers. Here is the thread dump for this server A: at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:395) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199) at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:293) at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:226) at org.apache.solr.update.SolrCmdDistributor.distribCommit(SolrCmdDistributor.java:195) at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1250) at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69) -- Bill Bell billnb...@gmail.com cell 720-256-8076 -- Regards, Shalin Shekhar Mangar.
Re: Wiki edit rights
What's your wiki username? On Fri, Mar 21, 2014 at 8:12 AM, William Bell billnb...@gmail.com wrote: Please add me too. On Tue, Mar 18, 2014 at 8:33 AM, Erick Erickson erickerick...@gmail.com wrote: Done, thanks! On Tue, Mar 18, 2014 at 3:54 AM, Anders Gustafsson anders.gustafs...@pedago.fi wrote: Yes, please. My wiki ID is Anders Gustafsson. But yes, please add the howto to the wiki. You will need to get your account whitelisted first (due to spammers), so send a separate email with your Apache wiki ID and somebody will unlock you for editing. -- Anders Gustafsson Engineer, CNI, CNE6, ASE Pedago, the Aaland Islands (N60 E20) www.pedago.fi phone +358 18 12060 mobile +358 40506 7099 -- Bill Bell billnb...@gmail.com cell 720-256-8076 -- Regards, Shalin Shekhar Mangar.
Re: Memory + WeakIdentityMap
On 3/20/2014 6:54 PM, Harish Agarwal wrote: I'm transitioning my index from a 3.x version to 4.6. I'm running a large heap (20G), primarily to accommodate a large facet cache (~5G), but I have been able to run it on 3.x stably. On 4.6.0, after stress testing, I'm finding that all of my shards are spending all of their time in GC. After taking a heap dump and analyzing it, it appears that org.apache.lucene.util.WeakIdentityMap is using many GB of memory. Does anyone have any insight into which Solr component(s) use this and whether this kind of memory consumption is to be expected? I can't really say what WeakIdentityMap is doing. I can trace its only usage in Lucene to MMapDirectory, but it doesn't make a lot of sense for that to use a lot of memory, unless this is the source of the memory misreporting that Java 7 seems to do with MMap. See this message in a recent thread on this mailing list: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201403.mbox/%3c53285ca1.9000...@elyograg.org%3E If you have a lot of facets, one approach for performance is to use facet.method=enum so that your Java heap does not need to be super large. This does not actually reduce the overall system memory requirements; it just shifts the responsibility for caching from Solr to the operating system, and it requires that you have enough memory to put a majority of the index into the OS disk cache. Ideally, there would be enough RAM for the entire index to fit. http://wiki.apache.org/solr/SolrPerformanceProblems Another option for facet memory optimization is docValues. One caveat: it is my understanding that the docValues content is the same as a stored field. Depending on your schema definition, this may be different from the indexed values that facets normally use. The docValues feature also helps with sorting. Thanks, Shawn
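To make the docValues suggestion concrete, a sketch of what the schema.xml change looks like (the field and type names are placeholders, and a full reindex is required after enabling it):

  <field name="category" type="string" indexed="true" stored="true" docValues="true"/>

Faceting and sorting on such a field then read Lucene's column-oriented docValues structures, which live largely in the OS page cache via mmap rather than in on-heap FieldCache entries, which is the point of the suggestion above.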