SolrCloud (7.3) and Legacy replication slaves
Is it possible to set up an existing SolrCloud cluster as the master for legacy replication to a slave server or two? It looks like another option is to use uni-directional CDCR, but I'm not sure which is the best option in this case. -- Michael Tracey
Sort order, return the first 20 results, and the last 80 results
Hey all, I'm interested in returning 100 rows in a query, sorted on a tfloat field, but with the first 20 results followed by the last 80 results. I'd like to do this without two requests, to keep requests per second down. Is there any way to do this in one query with function queries or another method? Thanks, Michael
SolrCloud Nodes autoSoftCommit and (temporary) missing documents
Hey all, I've got a number of nodes (Solr 4.4 Cloud) that I'm balancing with HaProxy for queries. I'm indexing pretty much constantly, and have autoCommit and autoSoftCommit on for Near Realtime Searching. All works nicely, except that occasionally the auto-commit cycles are far enough off that one node will return a document that another node doesn't. I don't want to have to add something like this: timestamp:[* TO NOW-30MINUTE] to every query to make sure that all the nodes have the record. Ideas? autoSoftCommit more often? 10 720 false 3 5000 Thanks, M.
Turning on KeywordRepeat and RemoveDups on an existing fieldType.
As per the stemming docs ( https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming ), I want to score the original term higher than the stemmed version by adding the KeywordRepeat and RemoveDups filters to a field type that is already created (with stemming). I have 100M documents in this index, and it gets slowly reindexed every month as records change. My question is: can I add this to the existing fieldType, or do I need to make a new fieldType, copyField the data over to it, and switch my code after it's all reindexed? I'd rather just add the lines to my existing fieldType, because I don't think I have enough disk space on my cloud members to hold my primary fulltext field twice. Just in case it helps, I'm running 4.4.0 and the field I'm wanting to mod looks like this: Thanks, M.
Re: Problems bulk adding documents to Solr Cloud in 4.5.1
Dave, those are the exact symptoms we have all had in SOLR-5402. After many attempted fixes (including upgrading Jetty, switching to Tomcat, and messing with buffer settings), my solution was to fall back to 4.4 and await a fix.

- Original Message -
From: "Dave Seltzer"
To: solr-user@lucene.apache.org
Sent: Monday, November 18, 2013 9:48:46 PM
Subject: Problems bulk adding documents to Solr Cloud in 4.5.1

Hello,

I'm having quite a bit of trouble indexing content in Solr Cloud. I built a content indexer on top of the REST API, designed to index my data quickly. It was working very well, indexing about 100 documents per "<add>" instruction.

After some tweaking of the schema I switched on a few more servers, set up a few shards, and started indexing data. Everything was working perfectly, but as soon as I switched to "Cloud" I started getting RemoteServerExceptions: "Illegal to have multiple roots." I'm using the stock Jetty container on both servers.

To get things working I reduced the number of documents per add until it worked. Unfortunately that has limited me to adding a single document per add, which is quite slow. I'm fairly sure it's not the size of the HTTP post, because things were working just fine until I moved over to Solr Cloud.

Does anyone have any information about this problem? It sounds a lot like Sai Gadde's https://issues.apache.org/jira/browse/SOLR-5402

Thanks so much!
-Dave
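The workaround described in the quoted message (shrinking the number of documents per add until the error stops) amounts to sending smaller batches. A minimal sketch of the batching step only, assuming documents are plain dicts; the name `batches` is hypothetical, and no Solr client call is shown because the thread does not say which client was used.

```python
from itertools import islice


def batches(docs, size):
    """Yield successive lists of at most `size` documents from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Hypothetical documents; each batch would become one add request.
    docs = [{"id": str(i)} for i in range(250)]
    print([len(b) for b in batches(docs, 100)])
```

Lowering `size` trades throughput for smaller request bodies, which is the trade-off the message describes; it does not address the underlying bug.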
qf match density?
While doing a search like: q=great+gatsby&defType=edismax&qf=title^1.8 records with a title of "great gatsby / great gatsby" always score higher than "great gatsby" just a single time. How do I express that a single match should be just as important as having the query match multiple times in the title field? Thanks, m.
Is this a reasonable way to boost?
I'm trying to slightly boost results whose price (a plain numeric field, not a currency field) is closer to a certain value. I want results that are neither too expensive nor too inexpensive to be favored. Here is what we are currently trying: bf=sub(1,abs(sub(15,price)))^0.2 where 15 is the "median" I want to boost towards. Is this a good way? I understand that in older Solr versions it was common to use recip(ord()) for this, but you shouldn't do so now. Thanks for any comments or advice on improving this. M.
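To see the shape of this boost, the function can be evaluated by hand for a few prices. A minimal Python sketch, assuming the usual Solr function semantics (sub(a,b) = a - b, abs(x) = |x|, recip(x,m,a,b) = a/(m*x+b)); the helper names and the recip-based alternative with its constants are illustrative assumptions, not something proposed in this post.

```python
def linear_boost(price, target=15.0):
    """Mirror of sub(1,abs(sub(15,price))): 1 at the target, minus 1 per unit of distance."""
    return 1.0 - abs(target - price)


def recip_boost(price, target=15.0, m=1.0, a=1.0, b=1.0):
    """Hypothetical alternative, recip(abs(sub(price,15)),1,1,1): a / (m*x + b), stays in (0, 1]."""
    return a / (m * abs(price - target) + b)


if __name__ == "__main__":
    for price in (15, 14, 20, 100):
        print(price, linear_boost(price), round(recip_boost(price), 4))
```

The linear form drops below zero as soon as the price is more than 1 away from 15 (for example, -4.0 at a price of 20), so with an additive bf most documents would receive a negative term. A reciprocal form stays positive and decays smoothly; whether that suits the data is a judgment call.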
SolrCloud (4.4) and CurrencyField refresh intervals
I've got a 4.4 SolrCloud cluster running, and have an external process that rebuilds the currency.xml file and uploads the latest version to ZooKeeper every X minutes. It looks like with CurrencyField the OpenExchangeRatesOrgProvider has a refreshInterval setting, but the documentation does not mention a refreshInterval on the FileExchangeRateProvider. Is there a way to pick up the new rates without reloading the whole core on each of the nodes after every update? (Ideally, I'd like the changes to be picked up at the next hard commit.) Thanks, M.
Re: Solr 4.5.1 replication Bug? "Illegal to have multiple roots (start tag in epilog?)."
Hey, this is Michael, who was having the exact error on the Jetty side, with an update. I've upgraded Jetty from the 4.5.1 embedded version (in the example directory) to version 9.0.6, which means I had to upgrade my OpenJDK from 1.6 to 1.7.0_45. Also, I added the suggested (very large) settings to my solrconfig.xml: but I am still getting the errors when I put a second server in the cloud. A single server (external ZooKeeper, but no cloud partner) works just fine. I suppose my next step is to try Tomcat, but according to your post, it will not help! Any help is appreciated, M.

- Original Message -
From: "Sai Gadde"
To: solr-user@lucene.apache.org
Sent: Monday, October 28, 2013 7:10:41 AM
Subject: Solr 4.5.1 replication Bug? "Illegal to have multiple roots (start tag in epilog?)."

We have a similar error to the one in this thread: http://www.mail-archive.com/solr-user@lucene.apache.org/msg90748.html We tried the Tomcat settings from this post, using the exact settings specified there. We merge 500 documents at a time. I am creating a new thread because Michael is using Jetty whereas we use Tomcat. The formdataUploadLimitInKB and multipartUploadLimitInKB limits are set to a very high value, 2GB, as suggested in the following thread: https://issues.apache.org/jira/browse/SOLR-5331

We use out-of-the-box Solr 4.5.1 with no customization. If we merge documents via SolrJ to a single server, it works perfectly. But as soon as we add another node to the cloud, we get the following errors while merging documents.

This is the error we are getting on the server where merging is happening (10.10.10.116; the IP is irrelevant, just for clarity). 10.10.10.119 is the new node here. This server gets RemoteSolrException:

shard update error StdNode: http://10.10.10.119:8980/solr/mycore/:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Illegal to have multiple roots (start tag in epilog?).
at [row,col {unknown-source}]: [1,12468]
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:425)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180)
    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:401)
    at org.apache.solr.update.SolrCmdDistributor$1.call(SolrCmdDistributor.java:1)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

On the other server, 10.10.10.119, we get the following error:

org.apache.solr.common.SolrException: Illegal to have multiple roots (start tag in epilog?).
at [row,col {unknown-source}]: [1,12468]
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:176)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:936)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1004)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at ja
Re: Solr 4.5.1 and Illegal to have multiple roots (start tag in epilog?). (perhaps SOLR-4327 bug?)
Thanks Chris and Sai. I was hoping to use the standard Jetty configuration (noting another thread on these forums indicating that it is the default and supported container), but will migrate to Tomcat if needed. Has anyone found a workaround that works with the standard container? We are sending updates of around 1000 records at a time, about 500k for the whole JSON document.

Sent from my iPhone

> On Oct 25, 2013, at 6:01 AM, Sai Gadde wrote:
>
> We were trying to migrate to 4.5 from 4.0 and faced a similar issue as well. I saw the ticket raised by Chris and tried setting formdataUploadLimitInKB to a higher value, which did not resolve this issue.
>
> We use Solr 4.0.0 currently and no additional container settings are required. But it is very strange, since when I tested with a single instance there was no problem at all. How come it is so difficult for two Solr instances to communicate with each other! I expect a Solr cloud setup to be independent of container configuration.
>
> Anyway, thanks Chris for the info. We will try these Tomcat settings and see if this issue goes away.
>
>> On Fri, Oct 25, 2013 at 4:35 PM, Chris Geeringh wrote:
>>
>> Hi Michael,
>>
>> I opened that ticket, and it looks like there is indeed a buffer or limit I was exceeding. As per the ticket, I guess the stream is cut off at that limit, and is then malformed. I am using Tomcat, and since increasing some limits on the connector, I haven't had any issues. I'll close that ticket.
>>
>> connectionTimeout="6" redirectPort="8443" maxPostSize="104857600" maxHttpHeaderSize="819200" maxThreads="1"/>
>>
>> Hope that helps.
>>
>> Cheers,
>> Chris
>>
>>> On 25 October 2013 03:48, Michael Tracey wrote:
>>>
>>> Hey Solr-users,
>>>
>>> I've got a single solr 4.5.1 node with 96GB ram, a 65GB index (105 million records) and a lot of daily churn of newly indexed files (auto softcommit and commits). I'm trying to bring another matching node into the mix, and am getting these errors on the new node:
>>>
>>> org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Illegal to have multiple roots (start tag in epilog?).
>>>
>>> On the old server, still running, I'm getting:
>>>
>>> shard update error StdNode: http://server1:/solr/collection/:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://server2:/solr/collection
>>>
>>> the new core never actually comes online, stays in recovery mode. The other two tiny cores (100,000+ records each and not updated frequently), work just fine.
>>>
>>> is this SOLR-4327 bug? https://issues.apache.org/jira/browse/SOLR-5331 And if so, how can I get the new node up and running so I can get back in production with some redundancy and speed?
>>>
>>> I'm running an external zookeeper, and that is all running just fine. Also internal Solrj/jetty with little to no modifications.
>>>
>>> Any ideas would be appreciated, thanks,
>>>
>>> M.
Solr 4.5.1 and Illegal to have multiple roots (start tag in epilog?). (perhaps SOLR-4327 bug?)
Hey Solr-users, I've got a single Solr 4.5.1 node with 96GB RAM, a 65GB index (105 million records) and a lot of daily churn of newly indexed files (auto soft commits and commits). I'm trying to bring another matching node into the mix, and am getting these errors on the new node: org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: Illegal to have multiple roots (start tag in epilog?). On the old server, still running, I'm getting: shard update error StdNode: http://server1:/solr/collection/:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://server2:/solr/collection The new core never actually comes online; it stays in recovery mode. The other two tiny cores (100,000+ records each and not updated frequently) work just fine. Is this the SOLR-4327 bug? https://issues.apache.org/jira/browse/SOLR-5331 And if so, how can I get the new node up and running so I can get back in production with some redundancy and speed? I'm running an external ZooKeeper, and that is all running just fine. Also internal Solrj/jetty with little to no modifications. Any ideas would be appreciated, thanks, M.
Controlling traffic between solr 4.1 nodes
Hey all, I'm new to Solr 4.x, and am wondering if there is any way I could have a single collection (single or multiple shards) replicated into two datacenters, where only one Solr instance in each datacenter communicates with the other datacenter (for example, 4 servers in one DC, 4 servers in another DC, and only one in each DC communicating). From everything I've seen, all ZooKeepers and replicas must have access to all other members. Is there something I'm missing? Thanks, M.