Understanding RecoveryStrategy
Hi

Using Solr trunk with the replica feature, I see the below exception repeatedly in the Solr log. I have been looking into the code of RecoveryStrategy#commitOnLeader and read the code as follows:

1. sends a commit request (with COMMIT_END_POINT=true) to the Solr instance containing the leader of the slice
2. sends a commit request to the Solr instance containing the leader of the slice

The first results in a commit on the shards in the single leader Solr instance, and the second results in a commit on the shards in the single leader Solr plus on all other Solrs having slices or replicas belonging to the collection. I would expect that the first request is the relevant one (and enough to do a recovery of the specific replica). Am I reading the second request wrong, or is it a bug?

The code I'm referring to is:

    UpdateRequest ureq = new UpdateRequest();
    ureq.setParams(new ModifiableSolrParams());
    ureq.getParams().set(DistributedUpdateProcessor.COMMIT_END_POINT, true);
    ureq.getParams().set(RecoveryStrategy.class.getName(), baseUrl);
    // 1.
    ureq.setAction(AbstractUpdateRequest.ACTION.COMMIT, false, true).process(server);
    // 2.
    server.commit();

Thanks in advance for any input.

Best regards
Trym R. Møller

Apr 21, 2012 10:14:11 AM org.apache.solr.common.SolrException log
SEVERE: Error while trying to recover:org.apache.solr.client.solrj.SolrServerException: http://myIP:8983/solr/myShardId
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:493)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:103)
    at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:180)
    at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:156)
    at org.apache.solr.cloud.RecoveryStrategy.commitOnLeader(RecoveryStrategy.java:170)
    at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:120)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:341)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
Caused by: java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:440)
    ... 8 more
Re: Facet and totaltermfreq
The query:

    http://localhost:8983/solr/select/?qt=tvrh&q=query:the&tv.fl=query&tv.all=true&f.id.tv.tf=true&facet.field=id&facet=true&facet.limit=-1&facet.mincount=1

Be careful with facet.limit=-1; it'll pull everything matching the query. Probably paging would make more sense in your case. f.id.tv.tf will make sure to output term frequencies. In my schema the field "query" contains some text data to search against (don't get mixed up ;) ). You will also need termVectors="true" termPositions="true" termOffsets="true" for your searching field.

Relevant excerpts from the response:

    <result name="response" numFound="787" start="0">
      <doc>
        <str name="id">721</str>
        <str name="query">the doors the end</str>
        <date name="timestamp">2002-08-09T04:05:28Z</date>
        <str name="user">wifcbcdr</str>
      </doc>
      ...
    </result>
    <lst name="facet_counts">
      <lst name="facet_queries"/>
      <lst name="facet_fields">
        <lst name="id">
          <int name="721">1</int>
          ...
        </lst>
      </lst>
      <lst name="facet_dates"/>
      <lst name="facet_ranges"/>
    </lst>
    <lst name="termVectors">
      <!-- Notice how strange: the doc-id is less than the solr uniqueKey id by one. I have no idea why it is like this -->
      <lst name="doc-720">
        <str name="uniqueKey">721</str>
        <lst name="query">
          <lst name="doors">
            <int name="tf">1</int>
            <lst name="offsets">
              <int name="start">4</int>
              <int name="end">9</int>
            </lst>
            <lst name="positions">
              <int name="position">1</int>
            </lst>
            <int name="df">37</int>
            <double name="tf-idf">0.02702702702702703</double>
          </lst>
          <lst name="end">
            <int name="tf">1</int>
            <lst name="offsets">
              <int name="start">15</int>
              <int name="end">18</int>
            </lst>
            <lst name="positions">
              <int name="position">3</int>
            </lst>
            <int name="df">43</int>
            <double name="tf-idf">0.023255813953488372</double>
          </lst>
          <!-- "tf" here is the in-document term frequency -->
          <lst name="the">
            <int name="tf">2</int>
            <lst name="offsets">
              <int name="start">0</int>
              <int name="end">3</int>
              <int name="start">11</int>
              <int name="end">14</int>
            </lst>
            <lst name="positions">
              <int name="position">0</int>
              <int name="position">2</int>
            </lst>
            <int name="df">787</int>
            <double name="tf-idf">0.0025412960609911056</double>
          </lst>
        </lst>
      </lst>
      ...
    </lst>

-Dmitry

On Sat, May 5, 2012 at 12:05 AM, Jamie Johnson jej2...@gmail.com wrote:

  It might be... can you provide an example of the request/response?

  On Fri, May 4, 2012 at 3:31 PM, Dmitry Kan dmitry@gmail.com wrote:

    I have tried (as a test) combining facets and term vectors (http://wiki.apache.org/solr/TermVectorComponent) in one query and was able to get a list of facets, and for each facet there was a term freq under the termVectors section. Not sure if that's what you are trying to achieve.

    -Dmitry

    On Fri, May 4, 2012 at 8:37 PM, Jamie Johnson jej2...@gmail.com wrote:

      Is it possible when faceting to return not only the strings but also the total term frequency for those facets? I am trying to avoid building a customized faceting component and making multiple queries. In our scenario we have multivalued fields which may have duplicates, and I would like to be able to get a count of how many documents that term appears in (currently what faceting does) but also how many times that term appears in general.

--
Regards,
Dmitry Kan
Re: 1MB file to Zookeeper
ZK is not really designed for keeping large data files; from http://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#Data+Access:

  "ZooKeeper was not designed to be a general database or large object store. If large data storage is needed, the usual pattern of dealing with such data is to store it on a bulk storage system, such as NFS or HDFS, and store pointers to the storage locations in ZooKeeper."

So perhaps we should think about adding K/V store support to ResourceLoader? If a file is > 1MB, a reference to the file is stored in ZK under the original resource name, in a way that ResourceLoader can tell that it is a reference, not the complete file. We then make a simple LargeObjectStoreInterface (with get/put/del) which ResourceLoader uses to get the complete file based on the reference. To start with we can make a ZkLargeFileStoreImpl where the put(key,val) method chops up the file and stores it spanning multiple 1MB ZK nodes, and the get(key) method assembles all the parts and returns the object. It would be good enough for most, but if you require something better you can easily impl support for CouchDb, Voldemort or whatever.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 4. mai 2012, at 19:09, Yonik Seeley wrote:

  On Fri, May 4, 2012 at 12:50 PM, Mark Miller markrmil...@gmail.com wrote:

    And how should we detect if data is compressed when reading from ZooKeeper? I was thinking we could somehow use file extensions? eg synonyms.txt.gzip - then you can use different compression algs depending on the ext, etc. We would want to try and make it as transparent as possible though...

  At first I thought about adding a marker to the beginning of a file, but file extensions could work too, as long as the resource loader made it transparent (i.e. code would just need to ask for synonyms.txt, but the resource loader would search for synonyms.txt.gzip, etc., if the original name was not found). Hmmm, but this breaks down for things like watches - I guess that's where putting the encoding inside the file would be a better option.

  -Yonik
  lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
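A minimal sketch of Jan's proposal, reusing his names LargeObjectStoreInterface and ZkLargeFileStoreImpl; everything else here (the /largefiles chunk-node layout, chunk size, missing error handling) is an illustrative assumption, not existing Solr API:

    import java.io.ByteArrayOutputStream;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    import org.apache.zookeeper.CreateMode;
    import org.apache.zookeeper.ZooDefs;
    import org.apache.zookeeper.ZooKeeper;

    // The get/put/del interface proposed above.
    interface LargeObjectStoreInterface {
      void put(String key, byte[] val) throws Exception;
      byte[] get(String key) throws Exception;
      void del(String key) throws Exception;
    }

    // Chops a large value into chunks stored under /largefiles/<key>/part-NNNNNN.
    // Assumes the /largefiles parent node already exists.
    class ZkLargeFileStoreImpl implements LargeObjectStoreInterface {
      private static final int CHUNK = 900 * 1024; // stay safely under ZK's ~1MB znode limit
      private final ZooKeeper zk;

      ZkLargeFileStoreImpl(ZooKeeper zk) { this.zk = zk; }

      public void put(String key, byte[] val) throws Exception {
        String base = "/largefiles/" + key;
        zk.create(base, new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        for (int off = 0, part = 0; off < val.length; off += CHUNK, part++) {
          byte[] chunk = Arrays.copyOfRange(val, off, Math.min(off + CHUNK, val.length));
          zk.create(base + "/part-" + String.format("%06d", part), chunk,
                    ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        }
      }

      public byte[] get(String key) throws Exception {
        String base = "/largefiles/" + key;
        List<String> parts = zk.getChildren(base, false);
        Collections.sort(parts); // zero-padded names sort back into write order
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (String part : parts) {
          out.write(zk.getData(base + "/" + part, false, null));
        }
        return out.toByteArray();
      }

      public void del(String key) throws Exception {
        String base = "/largefiles/" + key;
        for (String part : zk.getChildren(base, false)) {
          zk.delete(base + "/" + part, -1);
        }
        zk.delete(base, -1);
      }
    }

Swapping in a CouchDB- or Voldemort-backed implementation would then only mean re-implementing those three methods.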
Re: 1MB file to Zookeeper
On Sat, May 5, 2012 at 8:39 AM, Jan Høydahl jan@cominvent.com wrote:

  support for CouchDb, Voldemort or whatever.

Hmmm... Or Solr!

-Yonik
solr: adding a string on to a field via DIH
Hello,

Is it possible to concatenate a field via DIH? For example, for the id field, in order to make it unique I want to add 'project' to the beginning of the id field, so the field would look like 'project1234'. Is this possible?

    <field column="id" name="id" />

Thanks
Re: solr: adding a string on to a field via DIH
There might be a Solr way of accomplishing this, but I've always done stuff like this in SQL (i.e. the CONCAT command). Doing it a Solr-native way would probably be better in terms of bandwidth consumption, but just giving you that option early in case there's not a better one.

Michael

On Sat, 2012-05-05 at 09:12 -0400, okayndc wrote:

  Hello, Is it possible to concatenate a field via DIH? For example for the id field, in order to make it unique I want to add 'project' to the beginning of the id field. So the field would look like 'project1234'. Is this possible?

      <field column="id" name="id" />

  Thanks
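The SQL route Michael describes could look something like this inside the DIH data-config (a sketch; the entity name, table, and second column are made up, and CONCAT is MySQL-style - other databases may use || instead):

    <entity name="item" query="SELECT CONCAT('project', id) AS id, name FROM item">
      <field column="id" name="id" />
    </entity>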
Re: solr: adding a string on to a field via DIH
Sounds like you need a TemplateTransformer: "... it helps to concatenate multiple values or add extra characters to the field for injection."

    <entity name="e" transformer="TemplateTransformer" ..>
      <field column="namedesc" template="hello${e.name},${eparent.surname}" />
      ...
    </entity>

See: http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer

Or did you have something different in mind?

-- Jack Krupansky

-----Original Message----- From: okayndc Sent: Saturday, May 05, 2012 9:12 AM To: solr-user@lucene.apache.org Subject: solr: adding a string on to a field via DIH

  Hello, Is it possible to concatenate a field via DIH? For example for the id field, in order to make it unique I want to add 'project' to the beginning of the id field. So the field would look like 'project1234'. Is this possible?

      <field column="id" name="id" />

  Thanks
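Applied to the original question, this would presumably look something like the following (a sketch; it assumes the entity is named "e", that the source column is "id", and that the query shown is made up):

    <entity name="e" transformer="TemplateTransformer" query="SELECT id, ... FROM mytable">
      <field column="id" template="project${e.id}" />
    </entity>

Here ${e.id} resolves to the row's id value, so an id of 1234 would be indexed as project1234.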
Re: solr: adding a string on to a field via DIH
Thanks guys. I had taken a quick look at the TemplateTransformer and it looks like it does what I need it to do... didn't see the 'hello' part when reviewing earlier.

On Sat, May 5, 2012 at 11:47 AM, Jack Krupansky j...@basetechnology.com wrote:

  Sounds like you need a TemplateTransformer: "... it helps to concatenate multiple values or add extra characters to the field for injection." [...]

  See: http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer

  Or did you have something different in mind?

  -- Jack Krupansky
Re: Solr Merge during off peak times
On 5/4/2012 8:10 PM, Lance Norskog wrote:

  Optimize takes a 'maxSegments' option. This tells it to stop when there are N segments instead of just one. If you use a very high mergeFactor and then call optimize with a sane number like 50, it only merges the little teeny segments.

When I optimize, I want only one segment. My main concern in doing occasional optimizes is removing deleted documents. Whatever speedup I get from having only one segment is just a nice bonus.

When it comes to only merging the small segments, I am concerned about that happening when regular indexing builds up enough segments to do a merge. If I start with one large optimized segment, then do indexing operations such that I reach segmentsPerTier, will it leave the large segment alone and just work on the little ones?

I am using Solr 3.5 with the following config:

    <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
      <int name="maxMergeAtOnce">35</int>
      <int name="segmentsPerTier">35</int>
      <int name="maxMergeAtOnceExplicit">105</int>
    </mergePolicy>

Thanks,
Shawn
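For reference, the maxSegments option Lance mentions can be passed as an attribute of the optimize message posted to the update handler; something along these lines should work on Solr 3.x (host and value are examples):

    curl http://localhost:8983/solr/update -H 'Content-type:text/xml' --data-binary '<optimize maxSegments="50"/>'

With maxSegments omitted, optimize merges down to a single segment, which is Shawn's stated goal above.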
Re: 1MB file to Zookeeper
On May 5, 2012, at 8:39 AM, Jan Høydahl wrote:

  ZK is not really designed for keeping large data files; from http://zookeeper.apache.org/doc/current/zookeeperProgrammers.html#Data+Access: "ZooKeeper was not designed to be a general database or large object store. If large data storage is needed, the usual pattern of dealing with such data is to store it on a bulk storage system, such as NFS or HDFS, and store pointers to the storage locations in ZooKeeper."

I don't really think it's that big a deal where Solr is concerned. We don't use ZooKeeper intensively - only for state changes, and initially loading config for SolrCore starts or reloads. ZooKeeper perf should not be a big deal for Solr. I think the main issue is going to be that ZK wants everything in RAM - but again, a few conf files of a couple MB each should be no big deal AFAICT.

Other large scale applications that constantly and intensively use ZooKeeper are a different story - when Solr is in its running state, it doesn't do anything with ZooKeeper other than maintain its heartbeat.

- Mark Miller
lucidimagination.com
Re: Understanding RecoveryStrategy
On May 5, 2012, at 2:37 AM, Trym R. Møller wrote:

  Hi Using Solr trunk with the replica feature, I see the below exception repeatedly in the Solr log. [...] I would expect that the first request is the relevant one (and enough to do a recovery of the specific replica). Am I reading the second request wrong or is it a bug?

You're right - that second server.commit() looks like a bug. I don't think it should be harmful - it would just send out a commit to the cluster - but it should not be there.

I'm not sure why that second commit is timing out though. I guess it may be that reopening the searcher so quickly twice in a row is taking longer than the 30 second timeout. I guess we have to consider if we even need that commit to open a new searcher - but either way it should be a lot better once we remove that second commit.

  [quoted code and stack trace trimmed - see the original message above]

- Mark Miller
lucidimagination.com
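A sketch of what commitOnLeader presumably looks like with the redundant call dropped, per Mark's comment (the actual change is tracked in the JIRA issue referenced in the next message; this is just the original code minus the second commit):

    UpdateRequest ureq = new UpdateRequest();
    ureq.setParams(new ModifiableSolrParams());
    ureq.getParams().set(DistributedUpdateProcessor.COMMIT_END_POINT, true);
    ureq.getParams().set(RecoveryStrategy.class.getName(), baseUrl);
    // One commit, scoped to the leader core via COMMIT_END_POINT=true:
    ureq.setAction(AbstractUpdateRequest.ACTION.COMMIT, false, true).process(server);
    // server.commit();  // <- the redundant cluster-wide commit that gets removed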
Re: Understanding RecoveryStrategy
https://issues.apache.org/jira/browse/SOLR-3437

On May 5, 2012, at 12:46 PM, Mark Miller wrote:

  [quoted message and stack trace trimmed - see the previous message]

- Mark Miller
lucidimagination.com
Re: Invalid version (expected 2, but 60) on CentOS in production please Help!!!
The first thing I'd check is if, in the log, there is a replication happening immediately prior to the error. I confess I'm not entirely up on the version thing, but is it possible you're replicating an index that is built with some other version of Solr? That would at least explain your statement that it runs OK, but then fails sometime later.

Best
Erick

On Fri, May 4, 2012 at 1:50 PM, Ravi Solr ravis...@gmail.com wrote:

  Hello,

  Recently we migrated our SOLR 3.6 server OS from Solaris to CentOS, and from then on we started seeing "Invalid version (expected 2, but 60)" errors on one of the query servers (oddly, one other query server seems fine). If we restart the server having the issue everything will be alright, but the next day in the morning we get the same exception again. I made sure that all the client applications are using the SOLR 3.6 version. The Glassfish on which all the applications and SOLR are deployed uses Java 1.6.0_29. The only differences I could see:

  1. The process indexing to the server having issues is using Java 1.6.0_31
  2. The process indexing to the server that DOES NOT have issues is using Java 1.6.0_29

  Could the Java minor version being greater than that of the SOLR instance be the cause of this issue? Can anybody please help me debug this a bit more? What else can I look at to understand the underlying problem? The stack trace is given below:

  [#|2012-05-04T09:58:43.985-0400|SEVERE|sun-appserver2.1.1|xxx...|_ThreadID=32;_ThreadName=httpSSLWorkerThread-9001-7;_RequestID=a19f92cc-2a8c-47e8-b159-a20330f14af5;
  org.apache.solr.client.solrj.SolrServerException: Error executing query
      at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
      at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:311)
      at com.wpost.ipad.feeds.FeedController.findLinksetNewsBySection(FeedController.java:743)
      at com.wpost.ipad.feeds.FeedController.findNewsBySection(FeedController.java:347)
      at sun.reflect.GeneratedMethodAccessor282.invoke(Unknown Source)
      at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
      at java.lang.reflect.Method.invoke(Method.java:597)
      at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:175)
      at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:421)
      at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:409)
      at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:774)
      at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:719)
      at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:644)
      at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:549)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
      at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
      at org.apache.catalina.core.ApplicationFilterChain.servletService(ApplicationFilterChain.java:427)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:315)
      at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218)
      at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
      at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
      at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94)
      at com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:98)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:222)
      at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
      at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
      at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
      at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:166)
      at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
      at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
      at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
      at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:1093)
      at org.apache.coyote.tomcat5.CoyoteAdapter.service(CoyoteAdapter.java:291)
Re: Single Index to Shards
Oh, isn't that easier! Need more coffee before suggesting things..

Thanks,
Erick

On Fri, May 4, 2012 at 8:16 PM, Lance Norskog goks...@gmail.com wrote:

  If you are not using SolrCloud, splitting an index is simple:
  1) copy the index
  2) remove what you do not want via delete-by-query
  3) Optimize!

  #2 brings up a basic design question: you have to decide which documents go to which shards. Mostly people use a value generated by a hash on the actual id - this allows you to assign docs evenly. http://wiki.apache.org/solr/UniqueKey

  On Fri, May 4, 2012 at 4:28 PM, Young, Cody cody.yo...@move.com wrote:

    You can also make a copy of your existing index, bring it up as a second instance/core and then send delete queries to both indexes.

    -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Friday, May 04, 2012 8:37 AM To: solr-user@lucene.apache.org Subject: Re: Single Index to Shards

    There's no way to split an _existing_ index into multiple shards, although some of the work on SolrCloud is considering being able to do this. You have a couple of choices here:
    1) Just reindex everything from scratch into two shards
    2) delete all the docs from your index that will go into shard 2, and then just index the docs for shard 2 in your new shard

    But I want to be sure you're on the right track here. You only need to shard if your index contains too many documents for your hardware to produce decent query rates. If you are getting (and I'm picking this number out of thin air) 50 QPS on your hardware (i.e. you're not stressing memory etc.) and just want to get to 150 QPS, use replication rather than sharding. See: http://wiki.apache.org/solr/SolrReplication

    Best
    Erick

    On Fri, May 4, 2012 at 9:44 AM, michaelsever sever_mich...@bah.com wrote:

      If I have a single Solr index running on a Core, can I split it or migrate it into 2 shards?

      --
      View this message in context: http://lucene.472066.n3.nabble.com/Single-Index-to-Shards-tp3962380.html
      Sent from the Solr - User mailing list archive at Nabble.com.

  --
  Lance Norskog
  goks...@gmail.com
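Lance's copy/delete/optimize recipe could look like this in SolrJ terms (a sketch only: the URLs, the shardKey field, and the modulo scheme are illustrative assumptions, presuming each doc carries a field recording hash(id) % 2 as suggested above):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class IndexSplitter {
      // After copying the index directory into two cores, carve each copy
      // down to the docs that belong to it, then optimize to expunge deletes.
      public static void carveShards() throws Exception {
        SolrServer shard1 = new CommonsHttpSolrServer("http://host1:8983/solr/shard1");
        shard1.deleteByQuery("shardKey:1"); // drop shard 2's docs from copy 1
        shard1.commit();
        shard1.optimize();

        SolrServer shard2 = new CommonsHttpSolrServer("http://host2:8983/solr/shard2");
        shard2.deleteByQuery("shardKey:0"); // drop shard 1's docs from copy 2
        shard2.commit();
        shard2.optimize();
      }
    }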
Re: Single Index to Shards
We did it at my last job. Took a few days to split a 500M-doc index.

On Sat, May 5, 2012 at 9:55 AM, Erick Erickson erickerick...@gmail.com wrote:

  [quoted thread trimmed - see the previous message]

--
Lance Norskog
goks...@gmail.com
Re: Why would solr norms come up different from Lucene norms?
Which Similarity class do you use for the Lucene code? Solr has a custom one.

On Fri, May 4, 2012 at 6:30 AM, Benson Margulies bimargul...@gmail.com wrote:

  So, I've got some code that stores the same documents in a Lucene 3.5.0 index and a Solr 3.5.0 instance. It's only five documents. For a particular field, the Solr norm is always 0.625, while the Lucene norm is .5. I've watched the code in NormsWriterPerField in both cases. In Solr we've got .577, in naked Lucene it's .5. I tried to check for boosts, and I don't see any non-1.0 document or field boosts.

  The Solr field is:

      <field name="bt_rni_NameHRK_encodedName" type="text_ws" indexed="true" stored="true" multiValued="false" />

--
Lance Norskog
goks...@gmail.com
Re: problem with date searching.
Use debugQuery=true to see exactly how the dismax parser sees this query. Also, since this is a binary query, you can use filter queries instead. Those use the Lucene syntax.

On Fri, May 4, 2012 at 8:14 AM, Erick Erickson erickerick...@gmail.com wrote:

  Right, you need to do the explicit qualification of the date field. dismax parsing is intended to work with text-type fields, not numeric or date fields. If you attach debugQuery=on, you'll see that your scanneddate field is just dropped. Furthermore, dismax was never intended to work with range queries. Note this from the DisMaxQParserPlugin page: "extremely simplified subset of the Lucene QueryParser syntax". I'll expand on this a bit on the Wiki page.

  Best
  Erick

  On Fri, May 4, 2012 at 6:45 AM, Dmitry Kan dmitry@gmail.com wrote:

    Unless something else is wrong, my question would be whether you have documents in solr stamped with these dates. You could also try, as a test, specifying the field name directly:

        q=scanneddate:[2011-09-22T22:40:30Z TO 2012-02-02T01:30:52Z]

    Also, in your first e-mail you said you used [*2012-02-02T01:30:52Z TO 2012-02-02T01:30:52Z*] with asterisks (*) - what scanneddate values did you get then?

    On Fri, May 4, 2012 at 1:37 PM, ayyappan ayyaba...@gmail.com wrote:

      Thanks for the quick response. I tried your advice, [2011-09-22T22:40:30Z TO 2012-02-02T01:30:52Z], but even then I am not getting any result.

      --
      View this message in context: http://lucene.472066.n3.nabble.com/problem-with-date-searching-tp3961761p3961833.html
      Sent from the Solr - User mailing list archive at Nabble.com.

    --
    Regards,
    Dmitry Kan

--
Lance Norskog
goks...@gmail.com
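Lance's filter-query suggestion would look something like this, using the field and dates from the thread (a sketch; host, port, and q text are placeholders, and fq is parsed with the Lucene syntax regardless of defType=dismax):

    http://localhost:8983/solr/select?defType=dismax&q=some+text&fq=scanneddate:[2011-09-22T22:40:30Z%20TO%202012-02-02T01:30:52Z]

Note the space in the range is URL-encoded as %20; filter queries are also cached independently, so repeated date-range filters are cheap.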
Re: Why would solr norms come up different from Lucene norms?
On Sat, May 5, 2012 at 7:59 PM, Lance Norskog goks...@gmail.com wrote:

  Which Similarity class do you use for the Lucene code? Solr has a custom one.

I am embarrassed to report that I also have a custom similarity that I didn't know about, and once I configured that into Solr all was well.

  [quoted original message trimmed - see the previous message]
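For anyone hitting the same norm mismatch: in Solr 3.x a custom Similarity is registered globally near the bottom of schema.xml, along these lines (the class name here is a made-up placeholder for whatever Similarity the Lucene code uses):

    <similarity class="com.example.MyCustomSimilarity"/>

With the same Similarity on both sides, the length-norm bytes written at index time should match.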