Re: Query Elevation exception on shard queries
OK, found the solution. Like the SpellCheckComponent, the QueryElevationComponent also requires the shards.qt param. But I still don't know why neither of these components works in the absence of shards.qt. Can anyone explain?

Thanks
Varun

On Mon, May 6, 2013 at 1:14 PM, varun srivastava varunmail...@gmail.com wrote:

Thanks Ravi. So then it is a bug.

On Mon, May 6, 2013 at 12:04 PM, Ravi Solr ravis...@gmail.com wrote:

Varun,
Since our cores were totally disjoint, i.e. they pertain to two different applications which may or may not have results for a given query, we moved the elevation outside of Solr into our Java code. As long as both cores had some results to return for a given query, elevation would work.

Thanks,
Ravi

On Sat, May 4, 2013 at 1:54 PM, varun srivastava varunmail...@gmail.com wrote:

Hi Ravi,
I am getting the same problem. Did you find a solution?

Thanks
Varun

On Fri, Mar 29, 2013 at 11:48 AM, Ravi Solr ravis...@gmail.com wrote:

Hello,
We have a Solr 3.6.2 multicore setup, where each core is a complete index for one application. In our site search we use a sharded query to query two cores at a time. The issue is: if one core has docs for an elevated query but the other core doesn't, Solr throws a 500 error.
I would really appreciate it if somebody could point me in the right direction on how to avoid this error. The following is my query:

[#|2013-03-29T13:44:55.609-0400|INFO|sun-appserver2.1|org.apache.solr.core.SolrCore|_ThreadID=22;_ThreadName=httpSSLWorkerThread-9001-0;|[core1] webapp=/solr path=/select/ params={q=civil+war&start=0&rows=10&shards=localhost:/solr/core1,localhost:/solr/core2&hl=true&hl.fragsize=0&hl.snippets=5&hl.simple.pre=<strong>&hl.simple.post=</strong>&hl.fl=body&fl=*&facet=true&facet.field=type&facet.mincount=1&facet.method=enum&fq=pubdate:[2005-01-01T00:00:00Z+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3DPast+24+Hours}pubdate:[NOW/DAY-1DAY+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3DPast+7+Days}pubdate:[NOW/DAY-7DAYS+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3DPast+60+Days}pubdate:[NOW/DAY-60DAYS+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3DPast+12+Months}pubdate:[NOW/DAY-1YEAR+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3DAll+Since+2005}pubdate:[*+TO+NOW/DAY%2B1DAY]} status=500 QTime=15 |#]

As you can see, the two cores are core1 and core2. core1 has data for the query 'civil war', however core2 doesn't have any data. We have 'civil war' in the elevate.xml, which causes Solr to throw a SolrException as follows. However, if I remove the elevate entry for this query, everything works well.
type: Status report
message: Index: 1, Size: 0

java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.solr.common.util.NamedList.getVal(NamedList.java:137)
    at org.apache.solr.handler.component.ShardFieldSortedHitQueue$ShardComparator.sortVal(ShardDoc.java:221)
    at org.apache.solr.handler.component.ShardFieldSortedHitQueue$2.compare(ShardDoc.java:260)
    at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:160)
    at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:101)
    at org.apache.lucene.util.PriorityQueue.upHeap(PriorityQueue.java:223)
    at org.apache.lucene.util.PriorityQueue.add(PriorityQueue.java:132)
    at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:148)
    at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:786)
    at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:587)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:566)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:283)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:246)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:214)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:313)
    at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218)
    at org.apache.catalina.core.StandardPipeline.doInvoke
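The shards.qt fix reported at the top of this thread can be sketched as a request shaped roughly like the following. The /elevate handler name and the 8983 port are hypothetical illustrations, not taken from the messages above; the point is only that shards.qt routes the per-shard sub-requests back through the same elevation-enabled handler:

```
http://localhost:8983/solr/core1/elevate?q=civil+war
    &shards=localhost:8983/solr/core1,localhost:8983/solr/core2
    &shards.qt=/elevate
```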
Re: Elevate Problem with Distributed query
OK, found the solution. Like the SpellCheckComponent, the QueryElevationComponent also requires the shards.qt param. But I still don't know why neither of these components works in the absence of shards.qt. Can anyone explain?

Thanks

On Sat, May 4, 2013 at 1:08 PM, varun srivastava varunmail...@gmail.com wrote:

I am getting the following exception when the sort fieldname is _elevate_:

java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.solr.common.util.NamedList.getVal(NamedList.java:136)
    at org.apache.solr.handler.component.ShardFieldSortedHitQueue$ShardComparator.sortVal(ShardDoc.java:217)
    at org.apache.solr.handler.component.ShardFieldSortedHitQueue$2.compare(ShardDoc.java:255)
    at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:159)
    at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:101)
    at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158)
    at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:863)
    at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:626)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:605)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:309)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:585)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

On Sat, May 4, 2013 at 11:10 AM, varun srivastava varunmail...@gmail.com wrote:

Hi,
Is the Query Elevation feature supposed to work with a distributed query? I have 2 shards, but when I do a distributed query I get the following exception.
I am using Solr 4.0.0. In the following bug Yonik is referring to the problem in his comment: https://issues.apache.org/jira/browse/SOLR-2949?focusedCommentId=13232736&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13232736 But it seems the bug is fixed in 4.0, so why am I getting the following exception with the _elevate_ fieldname?

java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
Re: Query Elevation exception on shard queries
Thanks Ravi. So then it is a bug.

On Mon, May 6, 2013 at 12:04 PM, Ravi Solr ravis...@gmail.com wrote:

Varun,
Since our cores were totally disjoint, i.e. they pertain to two different applications which may or may not have results for a given query, we moved the elevation outside of Solr into our Java code. As long as both cores had some results to return for a given query, elevation would work.

Thanks,
Ravi

On Sat, May 4, 2013 at 1:54 PM, varun srivastava varunmail...@gmail.com wrote:

Hi Ravi,
I am getting the same problem. Did you find a solution?

Thanks
Varun

On Fri, Mar 29, 2013 at 11:48 AM, Ravi Solr ravis...@gmail.com wrote:

Hello,
We have a Solr 3.6.2 multicore setup, where each core is a complete index for one application. In our site search we use a sharded query to query two cores at a time. The issue is: if one core has docs for an elevated query but the other core doesn't, Solr throws a 500 error.
Re: Is there a way to remove caches in SOLR?
Make the cache size 0.

On Mon, May 6, 2013 at 4:38 PM, bbarani bbar...@gmail.com wrote:

I am trying to create performance metrics for Solr. I don't want the searcher to warm up when I issue a query, since I am trying to collect metrics for cold search. Is there a way to disable warming?

--
View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-remove-caches-in-SOLR-tp4061216.html
Sent from the Solr - User mailing list archive at Nabble.com.
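For reference, a minimal sketch of what "size 0" looks like in solrconfig.xml, assuming the stock cache definitions from the example config; zeroing autowarmCount as well keeps each new searcher cold:

```xml
<!-- Disable caches for cold-search benchmarking: zero-sized caches
     with no autowarming from the previous searcher. -->
<filterCache      class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache"     size="0" initialSize="0" autowarmCount="0"/>
<documentCache    class="solr.LRUCache"     size="0" initialSize="0" autowarmCount="0"/>
```

Any newSearcher/firstSearcher warming queries registered in solrconfig.xml would also need to be removed for a truly cold measurement.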
Re: How to get solr synonyms in result set.
Hi Suneel,
After discovering that only query-time synonyms work with Solr, I found a good article on the pros and cons of query- and index-time synonyms. It may help you: http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/

Regards
Varun

On Sun, May 5, 2013 at 9:20 AM, Erick Erickson erickerick...@gmail.com wrote:

Sure, you can specify a separate synonyms list at query time: just define an index-time and a query-time analysis chain, one with the synonym filter factory and one without. Be aware that index time and query time have some different characteristics, especially around multi-word synonyms; see: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Best
Erick

On Sun, May 5, 2013 at 12:23 AM, varun srivastava varunmail...@gmail.com wrote:

Hi,
The synonyms list is used at index time, so I don't think you can pass a list at query time and make it work.

On Fri, May 3, 2013 at 11:53 PM, Suneel Pandey pandey.sun...@gmail.com wrote:

Hi,
I want to get a specific Solr synonyms terms list during query time in the result set, based on filter criteria. I have implemented synonyms in a .txt file.

Thanks

-
Regards,
Suneel Pandey
Sr. Software Developer

--
View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-solr-synonyms-in-result-set-tp4060796.html
Sent from the Solr - User mailing list archive at Nabble.com.
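Erick's suggestion of separate index-time and query-time chains can be sketched in schema.xml roughly as follows; the field type name and synonyms file name are illustrative, not from the thread:

```xml
<!-- Sketch: synonyms applied only at query time; the index-time
     chain omits the SynonymFilterFactory. -->
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```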
Re: Query Elevation exception on shard queries
Hi Ravi,
I am getting the same problem. Did you find a solution?

Thanks
Varun

On Fri, Mar 29, 2013 at 11:48 AM, Ravi Solr ravis...@gmail.com wrote:

Hello,
We have a Solr 3.6.2 multicore setup, where each core is a complete index for one application. In our site search we use a sharded query to query two cores at a time. The issue is: if one core has docs for an elevated query but the other core doesn't, Solr throws a 500 error.
Elevate Problem with Distributed query
Hi,
Is the Query Elevation feature supposed to work with a distributed query? I have 2 shards, but when I do a distributed query I get the following exception. I am using Solr 4.0.0. In the following bug Yonik is referring to the problem in his comment: https://issues.apache.org/jira/browse/SOLR-2949?focusedCommentId=13232736&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13232736 But it seems the bug is fixed in 4.0, so why am I getting the following exception with the _elevate_ fieldname?

java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
    at java.util.ArrayList.RangeCheck(ArrayList.java:547)
    at java.util.ArrayList.get(ArrayList.java:322)
    at org.apache.solr.common.util.NamedList.getVal(NamedList.java:136)
    at org.apache.solr.handler.component.ShardFieldSortedHitQueue$ShardComparator.sortVal(ShardDoc.java:217)
    at org.apache.solr.handler.component.ShardFieldSortedHitQueue$2.compare(ShardDoc.java:255)
    at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:159)
    at org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:101)
    at org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158)
    at org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:863)
    at org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:626)
    at org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:605)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:309)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:585)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

Thanks
Varun
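The shards.qt requirement mentioned elsewhere in this thread can be sketched on the config side as follows. This mirrors the stock example solrconfig.xml wiring for the QueryElevationComponent; the /elevate handler name and the shards.qt default are illustrative assumptions, not taken from the poster's actual config:

```xml
<!-- Sketch: elevation component plus a handler that routes shard
     sub-requests back to itself via a shards.qt default. -->
<searchComponent name="elevator" class="solr.QueryElevationComponent">
  <str name="queryFieldType">string</str>
  <str name="config-file">elevate.xml</str>
</searchComponent>

<requestHandler name="/elevate" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="shards.qt">/elevate</str>
  </lst>
  <arr name="last-components">
    <str>elevator</str>
  </arr>
</requestHandler>
```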
Re: Elevate Problem with Distributed query
I am getting the following exception when the sort fieldname is _elevate_:

java.lang.IndexOutOfBoundsException: Index: 1, Size: 0

On Sat, May 4, 2013 at 11:10 AM, varun srivastava varunmail...@gmail.com wrote:

Hi,
Is the Query Elevation feature supposed to work with a distributed query? I have 2 shards, but when I do a distributed query I get the following exception. I am using Solr 4.0.0. In the following bug Yonik is referring to the problem in his comment: https://issues.apache.org/jira/browse/SOLR-2949?focusedCommentId=13232736&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13232736 But it seems the bug is fixed in 4.0, so why am I getting the following exception with the _elevate_ fieldname?

java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
Re: Query Elevation exception on shard queries
Hi , I am getting same exception as Ravi when using shard query containing elevated terms. I am using solr 4.0.0 stable version which is suppose to contain fix for this. https://issues.apache.org/jira/browse/SOLR-2949 ava.lang.IndexOutOfBoundsException: Index: 1, Size: 0\n\tat java.util.ArrayList.RangeCheck(ArrayList.java:547)\n\tat java.util.ArrayList.get(ArrayList.java:322)\n\tat org.apache.solr.common.util.NamedList.getVal(NamedList.java:136)\n\tat org.apache.solr.handler.component.ShardFieldSortedHitQueue$ShardComparator.sortVal(ShardDoc.java:217)\n\tat org.apache.solr.handler.component.ShardFieldSortedHitQueue$2.compare(ShardDoc.java:255)\n\tat org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:159)\n\tat org.apache.solr.handler.component.ShardFieldSortedHitQueue.lessThan(ShardDoc.java:101)\n\tat org.apache.lucene.util.PriorityQueue.insertWithOverflow(PriorityQueue.java:158)\n\tat org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:863)\n\tat org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:626)\n\tat org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:605)\n\tat org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:309)\n\tat org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)\n\tat org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)\n\tat org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)\n\tat org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)\n\tat org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)\n\tat 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)\n\tat org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)\n\tat org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)\n\tat org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)\n\tat org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)\n\tat org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)\n\tat org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)\n\tat org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:585)\n\tat org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)\n\tat java.lang.Thread.run(Thread.java:662) On Sat, May 4, 2013 at 10:54 AM, varun srivastava varunmail...@gmail.comwrote: Hi Ravi, I am getting same probelm . You got any solution ? Thanks Varun On Fri, Mar 29, 2013 at 11:48 AM, Ravi Solr ravis...@gmail.com wrote: Hello, We have a Solr 3.6.2 multicore setup, where each core is a complete index for one application. In our site search we use sharded query to query two cores at a time. The issue is, If one core has docs but other core doesn't for an elevated query solr is throwing a 500 error. 
I would really appreciate it if somebody could point me in the right direction on how to avoid this error. The following is my query:

[#|2013-03-29T13:44:55.609-0400|INFO|sun-appserver2.1|org.apache.solr.core.SolrCore|_ThreadID=22;_ThreadName=httpSSLWorkerThread-9001-0;|[core1] webapp=/solr path=/select/ params={q=civil+war&start=0&rows=10&shards=localhost:/solr/core1,localhost:/solr/core2&hl=true&hl.fragsize=0&hl.snippets=5&hl.simple.pre=<strong>&hl.simple.post=</strong>&hl.fl=body&fl=*&facet=true&facet.field=type&facet.mincount=1&facet.method=enum&fq=pubdate:[2005-01-01T00:00:00Z+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3DPast+24+Hours}pubdate:[NOW/DAY-1DAY+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3DPast+7+Days}pubdate:[NOW/DAY-7DAYS+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3DPast+60+Days}pubdate:[NOW/DAY-60DAYS+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3DPast+12+Months}pubdate:[NOW/DAY-1YEAR+TO+NOW/DAY%2B1DAY]&facet.query={!ex%3Ddt+key%3DAll+Since+2005}pubdate:[*+TO+NOW/DAY%2B1DAY]} status=500 QTime=15 |#]

As you can see, the two cores are core1 and core2. core1 has data for the query 'civil war' but core2 doesn't. We have 'civil war' in elevate.xml, which causes Solr to throw a SolrException as follows. However, if I remove the elevate entry for this query, everything works fine. *type* Status report *message* Index: 1, Size: 0
java.lang.IndexOutOfBoundsException: Index: 1, Size: 0
	at java.util.ArrayList.RangeCheck(ArrayList.java:547
Re: How to get solr synonyms in result set.
Hi, the synonyms list is used at index time, so I don't think you can pass a list at query time and make it work. On Fri, May 3, 2013 at 11:53 PM, Suneel Pandey pandey.sun...@gmail.com wrote: Hi, I want to get a specific Solr synonym terms list at query time in the result set, based on filter criteria. I have implemented synonyms in a .txt file. Thanks - Regards, Suneel Pandey Sr. Software Developer -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-get-solr-synonyms-in-result-set-tp4060796.html Sent from the Solr - User mailing list archive at Nabble.com.
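For context, index-time synonyms are wired into the analyzer chain of a field type in schema.xml; query-time analysis simply has no synonym filter, which is why nothing can be passed at query time. A minimal sketch (the field type name, tokenizer choice, and synonyms.txt file name are illustrative, not taken from the poster's schema):

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <!-- synonyms expanded while indexing -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <!-- no synonym filter here: queries see only the raw tokens -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```

Moving the filter into the query analyzer would apply synonyms at query time instead, but either way the expansion happens inside the analysis chain, not as a per-request parameter.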
Re: SolrCloud - Sorting Problem
Hi Deniz, Your mail about distributed queries is really helpful. Can you or someone else improve the following wiki? Right now we don't have any document explaining distributed search in Solr, which is now the backbone of SolrCloud. http://wiki.apache.org/solr/WritingDistributedSearchComponents Thanks Varun On Sun, Dec 2, 2012 at 10:49 PM, deniz denizdurmu...@gmail.com wrote: I think I have figured this out, at least somewhat. After putting logs here and there in the code, especially in the SolrCore, HttpShardHandler, and SearchHandler classes, it seems that sorting is done after all of the shards finish responding, and then, before we see the results, the result set is sorted. I am not sure if this is totally correct, but it is what I see from the logs in the request headers. For a shard of a distributed search the header looks like this: status=0,QTime=4,params={df=text,fl=*,position,shard.url=blablabla and just before I see the results in my browser the header becomes this: status=0,QTime=178,params={fl=*,position,sort=myfield desc Basically, because the position field was filled before the actual sorting on the page, the positions are incorrect. Is this right? I mean, is sorting really done after everything finishes, just before we get the results? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Sorting-Problem-tp4023382p4023889.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud - Sorting Problem
Also, if anyone who understands DistributedSearch can update the following wiki, it will be really helpful for all of us. http://wiki.apache.org/solr/DistributedSearchDesign Thanks Varun On Sat, Mar 9, 2013 at 4:03 PM, varun srivastava varunmail...@gmail.com wrote: Hi Deniz, Your mail about distributed queries is really helpful. Can you or someone else improve the following wiki? Right now we don't have any document explaining distributed search in Solr, which is now the backbone of SolrCloud. http://wiki.apache.org/solr/WritingDistributedSearchComponents Thanks Varun On Sun, Dec 2, 2012 at 10:49 PM, deniz denizdurmu...@gmail.com wrote: I think I have figured this out, at least somewhat. After putting logs here and there in the code, especially in the SolrCore, HttpShardHandler, and SearchHandler classes, it seems that sorting is done after all of the shards finish responding, and then, before we see the results, the result set is sorted. I am not sure if this is totally correct, but it is what I see from the logs in the request headers. For a shard of a distributed search the header looks like this: status=0,QTime=4,params={df=text,fl=*,position,shard.url=blablabla and just before I see the results in my browser the header becomes this: status=0,QTime=178,params={fl=*,position,sort=myfield desc Basically, because the position field was filled before the actual sorting on the page, the positions are incorrect. Is this right? I mean, is sorting really done after everything finishes, just before we get the results? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Sorting-Problem-tp4023382p4023889.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: dropping fields from input data
Thanks Hoss .. Is this available in 4.0 ? On Tue, Mar 5, 2013 at 5:14 PM, Chris Hostetter hossman_luc...@fucit.orgwrote: :dynamicField name=stamp_* type=string indexed=false : stored=false multiValued=true/ Take a look at IgnoreFieldUpdateProcessorFactory... https://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/IgnoreFieldUpdateProcessorFactory.html https://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/FieldMutatingUpdateProcessorFactory.html Using that (instead of setting indexed=false stored=false in schema.xml) has the advantage that you can use it to throw away fields early in the update processor pipeline, before any distributed logic happens in SolrCloud. -Hoss
Re: Role of zookeeper at runtime
How can I set up cloud master-slave? Can you point me to any sample config or tutorial that describes the steps to get SolrCloud in a master-slave setup? As you know from my previous mails, I don't need active Solr replicas; I just need a mechanism to copy a given SolrCloud index to a new SolrCloud instance (a classic master-slave setup). Erick/Mark, we have 10 virtual data centres, set up this way because we do rolling updates: while the 1st DC is being indexed, the other 9 serve traffic. Indexing one DC takes 2 hours. With a single shard we used to index one DC and then quickly replicate the index to the other DCs via a master-slave setup. With SolrCloud we obviously can't index each DC sequentially, as it would take 2*10 hours, so we need a way of indexing 1 DC and then quickly propagating the index binaries to the others. What would you recommend for SolrCloud? Thanks Varun On Thu, Feb 28, 2013 at 6:12 AM, Mark Miller markrmil...@gmail.com wrote: On Feb 26, 2013, at 6:49 PM, varun srivastava varunmail...@gmail.com wrote: So does it mean that while adding a document the state of the cluster is fetched from ZooKeeper, and then the target shard is decided based on the hash of the doc id? We keep the ZooKeeper info cached locally. We only update it when ZooKeeper tells us it has changed. Assume we have 3 shards (with no replicas) of which 1 went down while indexing: will all the documents be routed to the remaining 2 shards, or will only 2/3 of the documents be indexed? If the remaining 2 shards get all the documents, and the 3rd shard later comes back online, will SolrCloud rebalance? All of the updates that hash to the third shard will fail. That is why we have replicas - if you have a replica, it will take over as the leader. Is the range of doc ids stored in each shard, or any other information about actual docs, stored anywhere in ZooKeeper? The range of hashes is stored for each shard in zk.
We have 2 datacentres (dc1 and dc2) which need to be indexed with exactly the same data, and we update the index only once a day. Both dc1 and dc2 have exactly the same SolrCloud config and machines. Can we populate dc2 by just copying all the index binaries from solr-cores/core0/data on dc1 to the machines in dc2 (to avoid indexing the same documents on dc2)? I guess the Solr replication API doesn't work in SolrCloud, hence I'm looking for a workaround. Thanks Varun On Tue, Feb 26, 2013 at 3:34 PM, Mark Miller markrmil...@gmail.com wrote: ZooKeeper
/
/clusterstate.json - info about the layout and state of the cluster - collections, shards, urls, etc
/collections - config to use for the collection, shard leader voting zk nodes
/configs - sets of config files
/live_nodes - ephemeral nodes, one per Solr node
/overseer - work queue for updating clusterstate.json, creating new collections, etc
/overseer_elect - overseer voting zk nodes
- Mark On Feb 26, 2013, at 6:18 PM, varun srivastava varunmail...@gmail.com wrote: Hi Mark, One more question: while doing a Solr doc update/add, what information is required from ZooKeeper? Can you tell us what information is stored in ZooKeeper other than the startup configs? Thanks Varun On Tue, Feb 26, 2013 at 3:09 PM, Mark Miller markrmil...@gmail.com wrote: On Feb 26, 2013, at 5:25 PM, varun srivastava varunmail...@gmail.com wrote: Hi All, I have some questions regarding the role of ZooKeeper in SolrCloud at runtime, while processing queries. 1) Is the ZooKeeper cluster consulted by Solr shards for processing every request, or is it only used to copy config at startup time? No, it's not used per request. Solr talks to ZooKeeper on SolrCore startup - to get configs and set itself up. Then it only talks to ZooKeeper when a cluster state change happens - in that case, ZooKeeper pings Solr and Solr will get an updated view of the cluster. That view is cached and used for requests.
In a stable state, Solr does not talk to ZooKeeper other than the heartbeat they keep to know a node is up. 2) How is load balancing done between replicas? Are traffic stats shared through ZooKeeper? Basic round robin. Traffic stats are not currently in ZK. 3) If for any reason the ZooKeeper cluster goes offline for some time, will SolrCloud be unable to serve any traffic? It will stop allowing updates, but continue serving searches. - Mark Thanks Varun
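The hash-range routing Mark describes above (each shard owns a contiguous range of the hash space, recorded in ZooKeeper, and a document always routes to the shard whose range contains the hash of its id) can be sketched roughly as follows. This is illustrative only: Solr actually hashes the uniqueKey with MurmurHash3, and CRC32 here is just a stand-in.

```python
# Illustrative sketch of hash-range document routing, NOT Solr's real code.
# The 32-bit hash space is split into one contiguous range per shard; a doc
# always routes to the shard whose range contains the hash of its id, which
# is why updates for a downed shard fail instead of being rerouted.
import zlib

NUM_SHARDS = 3
RANGE_SIZE = (1 << 32) // NUM_SHARDS

def shard_for(doc_id: str) -> int:
    """Map a document id to a shard index via its hash range."""
    h = zlib.crc32(doc_id.encode("utf-8"))  # stand-in for MurmurHash3
    # min() guards the last few values of the 32-bit space into the last shard
    return min(h // RANGE_SIZE, NUM_SHARDS - 1)

# Routing is deterministic: the same id always lands on the same shard.
print(shard_for("doc-42") == shard_for("doc-42"))  # True
```

Because the mapping depends only on the id and the published ranges, any node can route an update without asking ZooKeeper per request, matching the cached-cluster-state behavior described above.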
Re: Solr cloud deployment on tomcat in prod
Great, I will do it and send it to you all for review. Thanks Varun On Thu, Feb 28, 2013 at 4:50 AM, Erick Erickson erickerick...@gmail.com wrote: Anyone can edit the Wiki, contributions welcome! Best Erick On Mon, Feb 25, 2013 at 5:50 PM, varun srivastava varunmail...@gmail.com wrote: Hi, Is there any official documentation on deploying SolrCloud to production on Tomcat? I am looking for anything as detailed as the following. It would be good if someone could take the following tutorial and get it onto the official SolrCloud wiki after reviewing each step. http://www.myjeeva.com/2012/10/solrcloud-cluster-single-collection-deployment/ Thanks Varun
Re: Role of zookeeper at runtime
Any thoughts on this? We have 10 virtual data centres, set up this way because we do rolling updates: while the 1st DC is being indexed, the other 9 serve traffic. Indexing one DC takes 2 hours. With a single shard we used to index one DC and then quickly replicate the index to the other DCs via a master-slave setup. With SolrCloud we obviously can't index each DC sequentially, as it would take 2*10 hours, so we need a way of indexing 1 DC and then quickly propagating the index binaries to the others. What would you recommend for SolrCloud? Thanks Varun On Thu, Feb 28, 2013 at 11:33 AM, varun srivastava varunmail...@gmail.com wrote: How can I set up cloud master-slave? Can you point me to any sample config or tutorial that describes the steps to get SolrCloud in a master-slave setup? As you know from my previous mails, I don't need active Solr replicas; I just need a mechanism to copy a given SolrCloud index to a new SolrCloud instance (a classic master-slave setup). Erick/Mark, we have 10 virtual data centres, set up this way because we do rolling updates: while the 1st DC is being indexed, the other 9 serve traffic. Indexing one DC takes 2 hours. With a single shard we used to index one DC and then quickly replicate the index to the other DCs via a master-slave setup. With SolrCloud we obviously can't index each DC sequentially, as it would take 2*10 hours, so we need a way of indexing 1 DC and then quickly propagating the index binaries to the others. What would you recommend for SolrCloud? Thanks Varun On Thu, Feb 28, 2013 at 6:12 AM, Mark Miller markrmil...@gmail.com wrote: On Feb 26, 2013, at 6:49 PM, varun srivastava varunmail...@gmail.com wrote: So does it mean that while adding a document the state of the cluster is fetched from ZooKeeper, and then the target shard is decided based on the hash of the doc id? We keep the ZooKeeper info cached locally. We only update it when ZooKeeper tells us it has changed.
Assume we have 3 shards (with no replicas) of which 1 went down while indexing: will all the documents be routed to the remaining 2 shards, or will only 2/3 of the documents be indexed? If the remaining 2 shards get all the documents, and the 3rd shard later comes back online, will SolrCloud rebalance? All of the updates that hash to the third shard will fail. That is why we have replicas - if you have a replica, it will take over as the leader. Is the range of doc ids stored in each shard, or any other information about actual docs, stored anywhere in ZooKeeper? The range of hashes is stored for each shard in zk. We have 2 datacentres (dc1 and dc2) which need to be indexed with exactly the same data, and we update the index only once a day. Both dc1 and dc2 have exactly the same SolrCloud config and machines. Can we populate dc2 by just copying all the index binaries from solr-cores/core0/data on dc1 to the machines in dc2 (to avoid indexing the same documents on dc2)? I guess the Solr replication API doesn't work in SolrCloud, hence I'm looking for a workaround. Thanks Varun On Tue, Feb 26, 2013 at 3:34 PM, Mark Miller markrmil...@gmail.com wrote: ZooKeeper
/
/clusterstate.json - info about the layout and state of the cluster - collections, shards, urls, etc
/collections - config to use for the collection, shard leader voting zk nodes
/configs - sets of config files
/live_nodes - ephemeral nodes, one per Solr node
/overseer - work queue for updating clusterstate.json, creating new collections, etc
/overseer_elect - overseer voting zk nodes
- Mark On Feb 26, 2013, at 6:18 PM, varun srivastava varunmail...@gmail.com wrote: Hi Mark, One more question: while doing a Solr doc update/add, what information is required from ZooKeeper? Can you tell us what information is stored in ZooKeeper other than the startup configs?
Thanks Varun On Tue, Feb 26, 2013 at 3:09 PM, Mark Miller markrmil...@gmail.com wrote: On Feb 26, 2013, at 5:25 PM, varun srivastava varunmail...@gmail.com wrote: Hi All, I have some questions regarding the role of ZooKeeper in SolrCloud at runtime, while processing queries. 1) Is the ZooKeeper cluster consulted by Solr shards for processing every request, or is it only used to copy config at startup time? No, it's not used per request. Solr talks to ZooKeeper on SolrCore startup - to get configs and set itself up. Then it only talks to ZooKeeper when a cluster state change happens - in that case, ZooKeeper pings Solr and Solr will get an updated view of the cluster. That view is cached and used for requests. In a stable state, Solr does not talk to ZooKeeper other than the heartbeat they keep to know a node is up. 2) How is load balancing done between replicas? Are traffic stats shared through ZooKeeper? Basic round robin. Traffic stats
Re: Role of zookeeper at runtime
You can still replicate from a SolrCloud node. Just hit its replication handler and pass in the master url to replicate from. How will this work? Let's say s1dc1 is the master of s1dc2, and s2dc1 is the master for s2dc2. After hitting replicate, the index binaries will get copied, but then how will the appropriate entries be made in ZooKeeper? ZooKeeper needs to know which doc id range resides in which shard. Thanks Varun On Thu, Feb 28, 2013 at 4:27 PM, Mark Miller markrmil...@gmail.com wrote: On Feb 28, 2013, at 6:20 PM, varun srivastava varunmail...@gmail.com wrote: So we need way of indexing 1 dc and then somehow quickly propagate the index binary to others. You can still replicate from a SolrCloud node. Just hit its replication handler and pass in the master url to replicate from. It doesn't have any guarantees in terms of data loss, i.e. it's not part of SolrCloud per se, but it's a fast way to move an index. - Mark
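Concretely, the manual pull Mark suggests goes through the replication handler's fetchindex command, with masterUrl pointing at the source node's replication handler. A sketch of building that request (the host and core names echo the s1dc1/s1dc2 example and are placeholders; sending the request to a live node would trigger the copy):

```python
# Sketch: build the URL that tells one Solr node to pull its index from
# another node via the replication handler's fetchindex command. The host
# and core names below are placeholders, not real servers.
from urllib.parse import urlencode

def fetchindex_url(puller_base: str, source_base: str) -> str:
    """URL that makes `puller` fetch the index from `source`."""
    params = urlencode({
        "command": "fetchindex",
        # masterUrl must point at the *source* node's replication handler
        "masterUrl": source_base + "/replication",
    })
    return puller_base + "/replication?" + params

# e.g. make s1dc2 pull the freshly built index from s1dc1
url = fetchindex_url("http://s1dc2:8983/solr/core0",
                     "http://s1dc1:8983/solr/core0")
print(url)
```

Note that this only moves index files; as Varun points out, it does nothing to ZooKeeper, which is fine when both clusters already share the same shard layout and hash ranges.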
Re: org.apache.solr.cloud.ZkCLI timeout
Hi Markus, Do you mean keeping the file in the solr-cores/lib directory or inside collection1 (if the name of my SolrCloud collection is collection1)? If I keep it inside solr-cores/lib, will I be able to get the file by calling SolrResourceLoader.openConfig(...)? Mark, how can I tweak the ZooKeeper limits? Thanks Varun On Wed, Feb 27, 2013 at 1:08 PM, Markus Jelsma markus.jel...@openindex.io wrote: That's very big indeed; why not store it locally in the core's lib dir? It should work IIRC. -Original message- From: Mark Miller markrmil...@gmail.com Sent: Wed 27-Feb-2013 22:07 To: solr-user@lucene.apache.org Subject: Re: org.apache.solr.cloud.ZkCLI timeout Did you adjust ZooKeeper so that it will accept files greater than 1MB per node? That's more config files than I've ever tried to deal with... - Mark On Feb 27, 2013, at 4:02 PM, varun srivastava varunmail...@gmail.com wrote: Hi, I am using org.apache.solr.cloud.ZkCLI to push a 56MB config into ZooKeeper for one of my Solr components, but ZkCLI always times out. I can see that ZkCLI is not able to push any config file greater than 2MB even though the ZooKeeper server is on the same machine. Can we somehow increase the timeout for ZkCLI, or is there any other way of pushing large config files? Thanks Varun
Re: org.apache.solr.cloud.ZkCLI timeout
solr-cores is my solr/home. On Wed, Feb 27, 2013 at 1:16 PM, varun srivastava varunmail...@gmail.com wrote: Hi Markus, Do you mean keeping the file in the solr-cores/lib directory or inside collection1 (if the name of my SolrCloud collection is collection1)? If I keep it inside solr-cores/lib, will I be able to get the file by calling SolrResourceLoader.openConfig(...)? Mark, how can I tweak the ZooKeeper limits? Thanks Varun On Wed, Feb 27, 2013 at 1:08 PM, Markus Jelsma markus.jel...@openindex.io wrote: That's very big indeed; why not store it locally in the core's lib dir? It should work IIRC. -Original message- From: Mark Miller markrmil...@gmail.com Sent: Wed 27-Feb-2013 22:07 To: solr-user@lucene.apache.org Subject: Re: org.apache.solr.cloud.ZkCLI timeout Did you adjust ZooKeeper so that it will accept files greater than 1MB per node? That's more config files than I've ever tried to deal with... - Mark On Feb 27, 2013, at 4:02 PM, varun srivastava varunmail...@gmail.com wrote: Hi, I am using org.apache.solr.cloud.ZkCLI to push a 56MB config into ZooKeeper for one of my Solr components, but ZkCLI always times out. I can see that ZkCLI is not able to push any config file greater than 2MB even though the ZooKeeper server is on the same machine. Can we somehow increase the timeout for ZkCLI, or is there any other way of pushing large config files? Thanks Varun
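For reference, the ZooKeeper limit Mark alludes to is the znode size cap (1MB by default), controlled by the jute.maxbuffer system property, which has to be raised on both the ZooKeeper server and every client that touches the large znode (including ZkCLI). A sketch, with paths and the 64MB value chosen only to fit the 56MB file in this thread:

```shell
# On the ZooKeeper server (e.g. via conf/zkEnv.sh or the launch command);
# 67108864 = 64MB, must exceed the largest znode you intend to write.
SERVER_JVMFLAGS="-Djute.maxbuffer=67108864"

# On the client side, the same property must be set for the ZkCLI JVM
# (zkcli.sh ships under Solr's cloud-scripts in 4.x):
JVMFLAGS="-Djute.maxbuffer=67108864" \
  sh zkcli.sh -cmd upconfig -zkhost localhost:2181 \
     -confdir ./conf -confname myconf
```

Even with the limit raised, ZooKeeper is designed for small config znodes, so Markus's suggestion of keeping a 56MB file in the core's local lib dir is usually the better trade-off.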
Re: zk Config URL?
I agree with Darren here... setting up SolrCloud is way too complicated, especially if you are using Tomcat. Do we have any ticket to simplify the SolrCloud installation? I would love to include my suggestions in it. Thanks Varun On Mon, Feb 25, 2013 at 7:24 PM, darren dar...@ontrenet.com wrote: Ok. But it's way more complicated than it should be. It should work smarter. Sent from my Verizon Wireless 4G LTE Smartphone Original message From: Anirudha Jadhav aniru...@nyu.edu Date: To: solr-user@lucene.apache.org Subject: Re: zk Config URL? SolrCloud reads Solr config files from ZooKeeper. You need to push the config to ZooKeeper and link the collection to the config. This is exactly what Mark suggested earlier in the thread. It is also explained in the SolrCloud wiki. On Monday, February 25, 2013, Darren Govoni wrote: Hi Mark, I downloaded the latest ZK and ran it. In my Glassfish server, I set these system-wide properties: numShards = 1, zkHost = 10.x.x.x:2181, jetty.port = 8080 (port of my domain), bootstrap_config = true. I copied all the Solr 4.1 dist/*.jar files into my Glassfish domain's lib/ext directory, then deployed the Solr 4.1 war. It always throws this exception.
[#|2013-02-25T13:31:32.304+|INFO|glassfish3.1.2|javax.enterprise.system.container.web.com.sun.enterprise.web|_ThreadID=10;_ThreadName=Thread-2;|WEB0171: Created virtual server [__asadmin]|#]
[#|2013-02-25T13:31:32.768+|INFO|glassfish3.1.2|javax.enterprise.system.container.web.com.sun.enterprise.web|_ThreadID=10;_ThreadName=Thread-2;|WEB0172: Virtual server [server] loaded default web module []|#]
[#|2013-02-25T13:31:34.222+|WARNING|glassfish3.1.2|javax.enterprise.system.tools.deployment.org.glassfish.deployment.common|_ThreadID=10;_ThreadName=Thread-2;|DPL8007: Unsupported deployment descriptors element schemaLocation value http://www.bea.com/ns/weblogic/90 http://www.bea.com/ns/weblogic/90/weblogic-web-app.xsd|#]
[#|2013-02-25T13:31:34.223+|SEVERE|glassfish3.1.2|javax.enterprise.system.tools.deployment.org.glassfish.deployment.common|_ThreadID=10;_ThreadName=Thread-2;|DPL8006: get/add descriptor failure : filter-dispatched-requests-enabled TO false|#]
[#|2013-02-25T13:31:34.831+|SEVERE|glassfish3.1.2|javax.enterprise.system.container.web.com.sun.enterprise.web|_ThreadID=10;_ThreadName=Thread-2;|WebModule[/solr1]PWC1270: Exception starting filter SolrRequestFilter
java.lang.NoClassDefFoundError: javax/servlet/Filter
	at java.lang.ClassLoader.defineClass1(Native Method)
	at java.lang.ClassLoader.defineClassCond(ClassLoader.java:631)
	at java.lang.ClassLoader.defineClass(ClassLoader.java:615)
	at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
	at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
	at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
	at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at
	sun.misc.Launcher$ExtClassLoader.findClass(Launcher.java:229)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:295)
	at com.sun.enterprise.v3.server.APIClassLoaderServiceImpl$APIClassLoader.loadClass(APIClassLoaderServiceImpl.java:206)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:295)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:295)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
	at org.glassfish.web.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1456)
	at org.glassfish.web.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1359)
	at org.apache.catalina.core.ApplicationFilterConfig.loadFilterClass(ApplicationFilterConfig.java:280)
	at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:250)
	at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:120)
	at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4685)
	at org.apache.catalina.core.StandardContext.start(StandardContext.java:5377)
	at com.sun.enterprise.web.WebModule.start(WebModule.java:498)
	at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:917)
	at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:901)
	at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:733)
	at
Re: Role of zookeeper at runtime
So does it mean that while adding a document the state of the cluster is fetched from ZooKeeper, and then the target shard is decided based on the hash of the doc id? Assume we have 3 shards (with no replicas) of which 1 went down while indexing: will all the documents be routed to the remaining 2 shards, or will only 2/3 of the documents be indexed? If the remaining 2 shards get all the documents, and the 3rd shard later comes back online, will SolrCloud rebalance? Is the range of doc ids stored in each shard, or any other information about actual docs, stored anywhere in ZooKeeper? We have 2 datacentres (dc1 and dc2) which need to be indexed with exactly the same data, and we update the index only once a day. Both dc1 and dc2 have exactly the same SolrCloud config and machines. Can we populate dc2 by just copying all the index binaries from solr-cores/core0/data on dc1 to the machines in dc2 (to avoid indexing the same documents on dc2)? I guess the Solr replication API doesn't work in SolrCloud, hence I'm looking for a workaround. Thanks Varun On Tue, Feb 26, 2013 at 3:34 PM, Mark Miller markrmil...@gmail.com wrote: ZooKeeper
/
/clusterstate.json - info about the layout and state of the cluster - collections, shards, urls, etc
/collections - config to use for the collection, shard leader voting zk nodes
/configs - sets of config files
/live_nodes - ephemeral nodes, one per Solr node
/overseer - work queue for updating clusterstate.json, creating new collections, etc
/overseer_elect - overseer voting zk nodes
- Mark On Feb 26, 2013, at 6:18 PM, varun srivastava varunmail...@gmail.com wrote: Hi Mark, One more question: while doing a Solr doc update/add, what information is required from ZooKeeper? Can you tell us what information is stored in ZooKeeper other than the startup configs?
Thanks Varun On Tue, Feb 26, 2013 at 3:09 PM, Mark Miller markrmil...@gmail.com wrote: On Feb 26, 2013, at 5:25 PM, varun srivastava varunmail...@gmail.com wrote: Hi All, I have some questions regarding the role of ZooKeeper in SolrCloud at runtime, while processing queries. 1) Is the ZooKeeper cluster consulted by Solr shards for processing every request, or is it only used to copy config at startup time? No, it's not used per request. Solr talks to ZooKeeper on SolrCore startup - to get configs and set itself up. Then it only talks to ZooKeeper when a cluster state change happens - in that case, ZooKeeper pings Solr and Solr will get an updated view of the cluster. That view is cached and used for requests. In a stable state, Solr does not talk to ZooKeeper other than the heartbeat they keep to know a node is up. 2) How is load balancing done between replicas? Are traffic stats shared through ZooKeeper? Basic round robin. Traffic stats are not currently in ZK. 3) If for any reason the ZooKeeper cluster goes offline for some time, will SolrCloud be unable to serve any traffic? It will stop allowing updates, but continue serving searches. - Mark Thanks Varun
Re: zk Config URL?
Is there a page like the following for SolrCloud? http://wiki.apache.org/solr/SolrTomcat Can we set -zkHost and -zkTimeout in tomcat/webapps/solr/META-INF/context.xml? Thanks Varun On Tue, Feb 26, 2013 at 3:04 PM, Mark Miller markrmil...@gmail.com wrote: On Feb 26, 2013, at 4:35 PM, varun srivastava varunmail...@gmail.com wrote: Do we have any ticket to simplify the SolrCloud installation? I would love to include my suggestions in it. Please, throw some thoughts out on the list or start a new JIRA issue. - Mark
Re: zk Config URL?
I don't like setting parameters as system properties, but I am happy if I can set these fields inside solr.xml. So you mean the following config will work: <cores adminPath="/admin/cores" defaultCoreName="core0" zkClientTimeout="2" hostPort="tomcat port" hostContext="solr" zkHost="zookeeper hosts"> Thanks Varun On Tue, Feb 26, 2013 at 4:09 PM, Mark Miller markrmil...@gmail.com wrote: On Feb 26, 2013, at 7:01 PM, varun srivastava varunmail...@gmail.com wrote: Is there a page like the following for SolrCloud? http://wiki.apache.org/solr/SolrTomcat Not that I know of. The main hitch with Tomcat is that the hostPort in solr.xml is set up to be filled in by the jetty.port system property. So you either need to pass that property to Tomcat or rename it in solr.xml to something that makes sense in a Tomcat world. Otherwise, things are about the same as with Jetty. Can we set -zkHost and -zkTimeout in tomcat/webapps/solr/META-INF/context.xml? No, they are set in solr.xml - special syntax is used to allow them to be set as system properties though. - Mark Thanks Varun On Tue, Feb 26, 2013 at 3:04 PM, Mark Miller markrmil...@gmail.com wrote: On Feb 26, 2013, at 4:35 PM, varun srivastava varunmail...@gmail.com wrote: Do we have any ticket to simplify the SolrCloud installation? I would love to include my suggestions in it. Please, throw some thoughts out on the list or start a new JIRA issue. - Mark
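Putting the discussed attributes in context, a solr.xml along these lines is what the thread is converging on. All values here are placeholders (the list archive stripped the original markup, so this is a reconstruction, not Varun's actual file); note the `${...}` syntax Mark mentions, which lets an attribute fall back to a system property:

```xml
<solr persistent="true">
  <!-- hostPort defaults to the jetty.port sysprop; hard-code it for Tomcat -->
  <cores adminPath="/admin/cores" defaultCoreName="core0"
         zkClientTimeout="${zkClientTimeout:15000}"
         hostPort="8080" hostContext="solr"
         zkHost="zk1:2181,zk2:2181,zk3:2181">
    <core name="core0" instanceDir="core0"/>
  </cores>
</solr>
```

With zkHost baked in like this, only numShards still needs to be passed as a system property (or avoided entirely by creating the first collection through the Collections API), per Mark's reply.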
Re: zk Config URL?
Hi Mark, specifying zkHost in solr.xml is not working. It seems only the system property -DzkHost works. Can you confirm that the param name is zkHost in solr.xml? Thanks Varun On Tue, Feb 26, 2013 at 4:24 PM, Mark Miller markrmil...@gmail.com wrote: On Feb 26, 2013, at 7:15 PM, varun srivastava varunmail...@gmail.com wrote: I don't like setting parameters as system properties, They are nice for the example, and often if you are using shell scripts or something to manage your cluster when you are screwing around, but yeah, many people will be happy to just put the info in the xml file. but I am happy if I can set these fields inside solr.xml. So you mean the following config will work: <cores adminPath="/admin/cores" defaultCoreName="core0" zkClientTimeout="2" hostPort="tomcat port" hostContext="solr" zkHost="zookeeper hosts"> Yes. The only sys prop you would have to set is numShards, unless you removed the default collection and used the CoreAdmin or Collections API to create the first collection. - Mark
Re: zk Config URL?
Hi Mark, How do I provide a solr-plugin directory to a Solr collection? I have my plugins in the solr_home/lib directory, but the collection creation command is still failing because it can't find the plugin classes (/solr/admin/collections?action=CREATE&name=europe-collection&numShards=2&replicationFactor=1). Thanks Varun On Tue, Feb 26, 2013 at 5:26 PM, varun srivastava varunmail...@gmail.com wrote: Hi Mark, specifying zkHost in solr.xml is not working. It seems only the system property -DzkHost works. Can you confirm that the param name is zkHost in solr.xml? Thanks Varun On Tue, Feb 26, 2013 at 4:24 PM, Mark Miller markrmil...@gmail.com wrote: On Feb 26, 2013, at 7:15 PM, varun srivastava varunmail...@gmail.com wrote: I don't like setting parameters as system properties, They are nice for the example, and often if you are using shell scripts or something to manage your cluster when you are screwing around, but yeah, many people will be happy to just put the info in the xml file. but I am happy if I can set these fields inside solr.xml. So you mean the following config will work: <cores adminPath="/admin/cores" defaultCoreName="core0" zkClientTimeout="2" hostPort="tomcat port" hostContext="solr" zkHost="zookeeper hosts"> Yes. The only sys prop you would have to set is numShards, unless you removed the default collection and used the CoreAdmin or Collections API to create the first collection. - Mark
Re: SloppyPhraseScorer behavior change
Moreover, I just checked: autoGeneratePhraseQueries=true is set for both 3.4 and 4.0 in my schema. Thanks Varun On Fri, Jan 11, 2013 at 1:04 PM, varun srivastava varunmail...@gmail.com wrote: Hi Jack, Is this a new change in Solr 4.0? It seems the autoGeneratePhraseQueries option has been present since Solr 3.1. I just wanted to confirm that this is the difference causing the change in behavior between 3.4 and 4.0. Thanks Varun On Mon, Dec 24, 2012 at 3:00 PM, Jack Krupansky j...@basetechnology.com wrote: Thanks. A sloppy phrase requires that the query terms be in a phrase, but you don't have any quotes in your query. Depending on your schema field type, you may be running into a change in how auto-generated phrase queries are handled. It used to be that apple0ipad would always be treated as the quoted phrase "apple 0 ipad", but now that is only true if your field type has autoGeneratePhraseQueries=true set. If you don't have that option set, the term gets treated as (apple OR 0 OR ipad), which is a lot looser than the exact phrase. Look at the new example schema's text_en_splitting field type as an example. -- Jack Krupansky -Original Message- From: varun srivastava Sent: Monday, December 24, 2012 5:49 PM To: solr-user@lucene.apache.org Subject: Re: SloppyPhraseScorer behavior change Hi Jack, My query was simply /solr/select?query=ipad apple apple0ipad and the doc contained "apple ipad". If you look at the patch attached to bug 3215, you will find the following comment. I want to confirm whether the behaviour I am observing is in sync with what the patch author intended, or whether it is just a regression. In Solr 3.4 phrase order is honored, whereas in Solr 4.0 phrase order is not honored, i.e. "apple ipad" and "ipad apple" are both treated the same.

/**
+ * Score a candidate doc for all slop-valid position-combinations (matches)
+ * encountered while traversing/hopping the PhrasePositions.
+ * <br> The score contribution of a match depends on the distance:
+ * <br> - highest score for distance=0 (exact match).
+ * <br> - score gets lower as distance gets higher.
+ * <br> Example: for query "a b"~2, a document "x a b a y" can be scored twice:
+ * once for "a b" (distance=0), and once for "b a" (distance=2).
+ * <br> Possibly not all valid combinations are encountered, because for efficiency
+ * we always propagate the least PhrasePosition. This allows to base on
+ * PriorityQueue and move forward faster.
+ * As result, for example, document "a b c b a"
+ * would score differently for queries "a b c"~4 and "c b a"~4, although
+ * they really are equivalent.
+ * Similarly, for doc "a b c b a f g", query "c b"~2
+ * would get same score as "g f"~2, although "c b"~2 could be matched twice.
+ * We may want to fix this in the future (currently not, for performance reasons).
+ */

On Mon, Dec 24, 2012 at 1:21 PM, Jack Krupansky j...@basetechnology.com wrote: Could you post the full query URL, so we can see exactly what your query was? Or post the output of debug=query, which will show us what Lucene query was generated. -- Jack Krupansky -Original Message- From: varun srivastava Sent: Monday, December 24, 2012 1:53 PM To: solr-user@lucene.apache.org Subject: SloppyPhraseScorer behavior change Hi, Due to the following bug fix, https://issues.apache.org/jira/browse/LUCENE-3215, I am observing a change in the behavior of SloppyPhraseScorer. I just wanted to confirm my understanding with you all. After Solr 3.5 (the bug is fixed in 3.5), if there is a document "a b c d e", then in Solr 3.4 only the query "a b" will match the document, but from Solr 3.5 onwards both the queries "a b" and "b a" will match. Is that right? Thanks Varun
Re: SloppyPhraseScorer behavior change
Hi Jack,
My query was simply /solr/select?query=ipad apple apple0ipad and the doc contained "apple ipad". If you look at the patch attached to LUCENE-3215, you will find the following comment. I want to confirm whether the behavior I am observing is in sync with what the patch author intended, or whether it's just a regression bug. In Solr 3.4 phrase order is honored, whereas in Solr 4.0 phrase order is not honored, i.e. "apple ipad" and "ipad apple" are both treated the same.

/**
 * Score a candidate doc for all slop-valid position-combinations (matches)
 * encountered while traversing/hopping the PhrasePositions.
 * <br> The score contribution of a match depends on the distance:
 * <br> - highest score for distance=0 (exact match).
 * <br> - score gets lower as distance gets higher.
 * <br> Example: for query "a b"~2, a document "x a b a y" can be scored twice:
 * once for "a b" (distance=0), and once for "b a" (distance=2).
 * <br> Possibly not all valid combinations are encountered, because for efficiency
 * we always propagate the least PhrasePosition. This allows to base on
 * PriorityQueue and move forward faster.
 * As result, for example, document "a b c b a"
 * would score differently for queries "a b c"~4 and "c b a"~4, although
 * they really are equivalent.
 * Similarly, for doc "a b c b a f g", query "c b"~2
 * would get same score as "g f"~2, although "c b"~2 could be matched twice.
 * We may want to fix this in the future (currently not, for performance reasons).
 */

On Mon, Dec 24, 2012 at 1:21 PM, Jack Krupansky j...@basetechnology.com wrote:
Could you post the full query URL, so we can see exactly what your query was? Or, post the output of debug=query, which will show us what Lucene query was generated.
-- Jack Krupansky

-----Original Message----- From: varun srivastava Sent: Monday, December 24, 2012 1:53 PM To: solr-user@lucene.apache.org Subject: SloppyPhraseScorer behavior change

Hi,
Due to the following bug fix, https://issues.apache.org/jira/browse/LUCENE-3215, I am observing a change in the behavior of SloppyPhraseScorer. I just wanted to confirm my understanding with you all. After Solr 3.5 (the bug is fixed in 3.5), if there is a document "a b c d e", then in Solr 3.4 only the query "a b" will match the document, but from Solr 3.5 onwards both "a b" and "b a" will match. Is that right?
Thanks
Varun
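To make the LUCENE-3215 javadoc example quoted in this thread concrete, here is a toy Python model of sloppy phrase matching. This is a simplification written for illustration only, not Lucene's actual SloppyPhraseScorer: it tries every combination of term positions and accepts those whose total displacement from the expected offsets is within the slop, which is why a reversed two-term phrase costs distance 2 and so matches at slop 2 but not slop 0.

```python
from itertools import product

def sloppy_matches(doc_tokens, phrase, slop):
    """Toy model of sloppy phrase matching: return the distances of all
    slop-valid matches of `phrase` in `doc_tokens`.

    Each phrase term is tried at every position where it occurs; a
    combination matches when the terms occupy distinct positions and
    their total displacement from the expected offsets (anchored on the
    first term's position) is <= slop.
    """
    positions = {t: [i for i, tok in enumerate(doc_tokens) if tok == t]
                 for t in phrase}
    matches = []
    for combo in product(*(positions[t] for t in phrase)):
        if len(set(combo)) < len(combo):
            continue  # terms must sit at distinct positions
        # displacement of each term from where it "should" be
        dist = sum(abs((p - combo[0]) - i) for i, p in enumerate(combo))
        if dist <= slop:
            matches.append(dist)
    return sorted(matches)
```

As in the javadoc example, `sloppy_matches("x a b a y".split(), ("a", "b"), 2)` finds two matches, at distance 0 and distance 2, while on the document "a b c d e" the reversed phrase ("b", "a") matches at slop 2 but not at slop 0.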
Re: solr 4.0 missing SolrPluginUtils addOrReplaceResults
Hi Solr-Users,
Does anyone have a workaround for SolrPluginUtils.addOrReplaceResults in Solr 4.0? It should be easy to migrate the code from the 3.6 branch to the 4.0 SolrPluginUtils. Is there any specific reason why this method was dropped in 4.0?
Thanks
Varun

On Tue, Oct 23, 2012 at 11:14 AM, varun srivastava varunmail...@gmail.com wrote:
Hi,
What is the replacement for SolrPluginUtils.addOrReplaceResults in Solr 4.0?
Thanks
Varun
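One workaround (besides porting the 3.6 method) is to reimplement the merge-by-unique-key idea outside Solr. Here is a toy Python sketch of that idea; the `id` key name and the list-of-dicts representation are assumptions for illustration, not the actual SolrDocumentList API.

```python
def add_or_replace_results(current, incoming, key="id"):
    """Merge `incoming` docs into `current`: a doc whose unique key is
    already present replaces the old doc at its original position;
    docs with new keys are appended at the end.

    Toy model of a merge-by-unique-key workaround, not the real
    SolrPluginUtils code.
    """
    index = {doc[key]: i for i, doc in enumerate(current)}
    merged = list(current)
    for doc in incoming:
        if doc[key] in index:
            merged[index[doc[key]]] = doc  # replace in place
        else:
            merged.append(doc)
    return merged
```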
Re: anyone has solrcloud performance numbers?
Thanks Otis

On Tue, Oct 2, 2012 at 8:06 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
I don't have the URL handy, but the guys at LinkedIn have a benchmark tool for Solr, ElasticSearch, and Sensei. Check the list archives for the URL, and my signature below for a tool that can show metrics for any of those systems, which you'll probably want to observe during testing.
Otis
-- Performance Monitoring - http://sematext.com/spm

On Oct 2, 2012 6:47 PM, varun srivastava varunmail...@gmail.com wrote:
Hi,
Does anyone have some preliminary SolrCloud performance numbers? Or a performance comparison (throughput and latency) between Solr 3.6 and SolrCloud (a huge monolithic index vs. sharded)?
Thanks
Varun
Re: anyone has solrcloud performance numbers?
Otis,
I am looking for performance benchmark numbers rather than performance monitoring tools; SPM looks like a monitoring tool. Moreover, that benchmark is comparing Solr with Elastic Search etc., whereas I want a comparison between Solr 3.6 and SolrCloud.
Thanks
Varun

On Tue, Oct 2, 2012 at 9:15 PM, varun srivastava varunmail...@gmail.com wrote:
Thanks Otis

On Tue, Oct 2, 2012 at 8:06 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
I don't have the URL handy, but the guys at LinkedIn have a benchmark tool for Solr, ElasticSearch, and Sensei. Check the list archives for the URL, and my signature below for a tool that can show metrics for any of those systems, which you'll probably want to observe during testing.
Otis
-- Performance Monitoring - http://sematext.com/spm

On Oct 2, 2012 6:47 PM, varun srivastava varunmail...@gmail.com wrote:
Hi,
Does anyone have some preliminary SolrCloud performance numbers? Or a performance comparison (throughput and latency) between Solr 3.6 and SolrCloud (a huge monolithic index vs. sharded)?
Thanks
Varun
Re: Zookeeper setup for solr cloud
Hi,
Rephrasing my question: let me know if anyone sees a problem with the following SolrCloud deployment.

1) Have 200 SolrCloud nodes (serv1, serv2, ... serv200), with each machine running both ZooKeeper and Solr.
2) The ZooKeeper config contains the list of all servers:
server.1=serv1:2888:3888
server.2=serv2:2888:3888
...
server.200=serv200:2888:3888
3) Each Solr instance only talks to the localhost ZooKeeper: -DzkHost=localhost:9983

Thanks
Varun

On Sun, Sep 30, 2012 at 4:51 PM, Lance Norskog goks...@gmail.com wrote:
You can find Solr information with this:
http://find.searchhub.org/?q=zookeeper+cluster
http://find.searchhub.org/link?url=http://wiki.apache.org/solr/SolrCloud

- Original Message -
| From: varun srivastava varunmail...@gmail.com
| To: solr-user@lucene.apache.org
| Sent: Saturday, September 29, 2012 9:38:16 PM
| Subject: Zookeeper setup for solr cloud
|
| Hi,
| I would like to get a recommendation on ZooKeeper ensemble
| architecture. I am thinking of the following options; please let me
| know if I am correct about the pros and cons of each option. Also
| please feel free to add differentiating points I am missing.
|
| 1) Have separate boxes for the ZooKeeper ensemble, with all the
| SolrCloud instances accessing it at runtime.
| Pros: A small set of ZooKeeper instances to maintain. Sync-up
| between ZooKeeper boxes may be fast and reliable.
|
| 2) Let each Solr box also have a ZooKeeper instance, with each Solr
| instance accessing the localhost ZooKeeper.
| Pros: Solr will not incur over-the-wire cost at runtime, so it
| should be fast. More fault tolerant, as Solr is not going over the
| wire to access ZooKeeper.
| Con: Lots of ZooKeeper instances, so it may be slow to update.
|
| Thanks
| Varun
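For reference, a per-node zoo.cfg for the every-node-runs-ZooKeeper layout described above might look like the sketch below. The server list and client port come from the email; tickTime, initLimit, syncLimit, and dataDir are assumed defaults, not values from the thread.

```
# zoo.cfg sketch: every node runs ZooKeeper + Solr (option 2 above)
# tickTime/initLimit/syncLimit/dataDir are assumed defaults
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/zookeeper
# clientPort matches -DzkHost=localhost:9983 above
clientPort=9983
server.1=serv1:2888:3888
server.2=serv2:2888:3888
# ... one entry per node ...
server.200=serv200:2888:3888
```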
Re: Solr Caching - how to tune, how much to increase, and any tips on using Solr with JDK7 and G1 GC?
Hi Erick,
You mentioned that for 4.0 the memory pattern is much different than 3.x. Can you elaborate on whether it's worse or better? Does 4.0 tend to use more memory for a similar index size compared to 3.x?
Thanks
Varun

On Sat, Sep 29, 2012 at 1:58 PM, Erick Erickson erickerick...@gmail.com wrote:
Well, I haven't had experience with JDK7, so I'll skip that part...

But about caches. First, as far as memory is concerned, be sure to read Uwe's blog about MMapDirectory here: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

As to the caches, be a little careful here. Getting high hit rates on _all_ your caches is a waste.

filterCache. This is the exception: you want as high a hit ratio as you can get for this one. It's where the results of all the fq= clauses go and is a major factor in speeding up QPS.

queryResultCache. Hmmm, given the lack of updates to your index, this one may actually get more hits than I'd expect. But it's a very cheap cache memory-wise. Think of it as a map where the key is the query and the value is an array of queryResultWindowSize longs (document IDs). It's really intended mostly for paging. It's also often the case that the chance of the exact same query (except for start and rows) being issued is actually relatively small. As always, YMMV. I usually see hit rates on this cache below 10%. Evictions merely mean it's been around a long time; bumping the size of this cache probably won't affect the hit rate unless your app somehow submits just a few queries.

documentCache. Again, this often doesn't have a great hit ratio. Its main use, as I understand it, is to keep the various parts of a query component chain from having to re-access the disk. Each element in a query component chain is completely separate from the others, so if two or more components want values from the doc, having them cached is useful.
The usual recommendation is (# docs returned to user) * (expected simultaneous queries), where "# docs returned to user" is really the rows value.

One of the consequences of having huge amounts of memory allocated to the JVM can be really long garbage collections. They happen less frequently but have more work to do when they happen. Oh, and when you start using 4.0, the memory patterns are much different...

Finally, here's a great post on Solr memory tuning; too bad the image links are broken... http://searchhub.org/dev/2011/03/27/garbage-collection-bootcamp-1-0/

Best
Erick

On Sat, Sep 29, 2012 at 3:08 PM, Aaron Daubman daub...@gmail.com wrote:
Greetings,

I've recently moved to running some of our Solr (3.6.1) instances using JDK 7u7 with the G1 GC (playing with max pauses in the 20 to 100 ms range). By and large, it has been working well (or, perhaps I should say that without requiring much tuning it works much better in general than my haphazard attempts to tune CMS). I have two instances in particular, one with a heap size of 14G and one with a heap size of 60G.

I'm attempting to squeeze out additional performance by increasing Solr's cache sizes (I am still seeing the hit ratio go up as I increase the max size and decrease the number of evictions), and am guessing this is the cause of some recent situations where the 14G instance especially eventually (12-24 hrs later, under 100s of queries per minute) makes it to 80%-90% of the heap and then spirals into major-GC-with-long-pause territory.

I am wondering:
1) if anybody has experience tuning the G1 GC, especially for use with Solr (what are decent max-pause times to use?)
2) how to better tune Solr's cache sizes - e.g. how to even tell the actual amount of memory used by each cache (not # entries as the stats show, but # bytes)
3) if there are any guidelines on when increasing a cache's size (even if it does continue to increase the hit ratio) runs into the law of diminishing returns or even starts to hurt - e.g.
if the document cache has a current maxSize of 65536 and has seen 4409275 evictions, and currently has a hit ratio of 0.74, should the max be increased further? If so, how much RAM needs to be added to the heap, and how much larger should its max size be made?

I should mention that these Solr instances are read-only (so cache is probably more valuable than in other scenarios - we only invalidate the searcher every 20-24 hrs or so) and are also backed with indexes (6G and 70G for the 14G and 60G heap sizes) on IODrives, so I'm not as concerned about leaving RAM for Linux to cache the index files (I'd much rather actually cache the post-transformed values).

Thanks as always,
Aaron
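Erick's documentCache rule of thumb earlier in this thread is just a multiplication; as a worked example (the rows and concurrency values are made-up illustration numbers, not from the thread):

```python
# Rule of thumb from the thread:
#   documentCache size ~= (docs returned per query) * (simultaneous queries)
rows = 10                # rows returned per query -- assumed value
concurrent_queries = 50  # expected simultaneous queries -- assumed value

document_cache_size = rows * concurrent_queries
print(document_cache_size)  # 500
```

So for these assumed numbers a documentCache maxSize of around 500 would cover concurrent result fetching; anything much beyond that mainly buys hit ratio, which (per the discussion above) may not be worth the heap.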
Zookeeper setup for solr cloud
Hi,
I would like to get a recommendation on ZooKeeper ensemble architecture. I am thinking of the following options; please let me know if I am correct about the pros and cons of each option. Also, please feel free to add differentiating points I am missing.

1) Have separate boxes for the ZooKeeper ensemble, with all the SolrCloud instances accessing it at runtime.
Pros: A small set of ZooKeeper instances to maintain. Sync-up between ZooKeeper boxes may be fast and reliable.

2) Let each Solr box also have a ZooKeeper instance, with each Solr instance accessing the localhost ZooKeeper.
Pros: Solr will not incur over-the-wire cost at runtime, so it should be fast. More fault tolerant, as Solr is not going over the wire to access ZooKeeper.
Con: Lots of ZooKeeper instances, so it may be slow to update.

Thanks
Varun