Re: Solr Cloud A/B Deployment Issue
Great. Thanks for the work on this patch! Jim -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Cloud-A-B-Deployment-Issue-tp4302810p4303357.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cloud A/B Deployment Issue
It appears this has all been resolved by the following ticket: https://issues.apache.org/jira/browse/SOLR-9446 My scenario fails in 6.2.1 but works in 6.3 and master, where this bug has been fixed. In the meantime, we can use our workaround of issuing a simple delete command that deletes a non-existent document. Jim
Re: Solr Cloud A/B Deployment Issue
Also, if we issue a delete-by-query where the query is "_version_:0", it creates a transaction log and then has no trouble transferring leadership between old and new nodes. Still, it seems like some sort of transaction log should be started when we ADDREPLICA. Jim
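The delete-by-query trick above as a concrete call (host and collection name are placeholders, not from the thread); a query that matches nothing still writes a transaction-log entry:

```shell
# Issue a no-op delete-by-query so the new replicas get a transaction log.
# "mycollection" is a placeholder collection name.
curl "http://localhost:8983/solr/mycollection/update?commit=true" \
  -H "Content-Type: application/json" \
  -d '{"delete": {"query": "_version_:0"}}'
```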
Re: Solr Cloud A/B Deployment Issue
Interestingly, if I simply add one document to the full cluster after all 6 nodes are active, this entire problem goes away. This appears to be because a transaction log entry is created, which in turn prevents the new nodes from going into full replication recovery upon leader change. Adding a document is a hacky solution, however. It seems like new nodes that were added via ADDREPLICA should know more about versions than they currently do.
Re: Solr 6.0 Highlighting Not Working
Perhaps you need to wrap your inner tags in a CDATA section?
Solr Cloud A/B Deployment Issue
We are running into a timing issue when trying to do a scripted deployment of our Solr Cloud cluster. Scenario to reproduce (sometimes):

1. Launch 3 clean Solr nodes connected to ZooKeeper.
2. Create a 1-shard collection with replicas on each node.
3. Load data (more data makes the problem worse).
4. Launch 3 more nodes.
5. Add replicas on each new node.
6. Once the entire cluster is healthy, start killing the first three nodes.

Depending on the timing, the second three nodes all end up in RECOVERING state without a leader. This appears to happen because when the first leader dies, all the new nodes go into full replication recovery, and if all the old boxes happen to die during that state, the new boxes are stuck. They cannot serve requests and eventually (1-8 hours) go into RECOVERY_FAILED state. This state is easy to fix with a FORCELEADER call to the Collections API, but that's only remediation, not prevention.

My question is this: why do the new nodes have to go into full replication recovery when they are already up to date? I just added the replicas, so they shouldn't have to do a full replication again.

Jim
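The FORCELEADER remediation mentioned above, spelled out (host, collection, and shard names are placeholders):

```shell
# Force a leader election for a shard stuck with all replicas in
# RECOVERING/RECOVERY_FAILED. Remediation only, not prevention.
curl "http://localhost:8983/solr/admin/collections?action=FORCELEADER&collection=mycollection&shard=shard1"
```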
Re: Solr Cloud prevent Ping Request From Forwarding Request
It seems like all the parameters in the PingHandler get processed by the remote server, so things like shards=localhost or distrib=false take effect too late.
Solr Cloud prevent Ping Request From Forwarding Request
Here's the scenario: boxes 1, 2, and 3 have replicas of collections dogs and cats. Box 4 has only a replica of dogs. All of these boxes have a healthcheck file on them that works with the PingRequestHandler to say whether the box is up or not. If I hit Box4/cats/admin/ping, Solr forwards the ping request to another box, which returns with status OK. Is there any way to stop a box from forwarding a request to another node? Thanks!
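For reference, a minimal sketch of the health-check setup described above (the file name is illustrative); the healthcheckFile gates the ping response, but it does not stop the query itself from being forwarded to another node:

```xml
<requestHandler name="/admin/ping" class="solr.PingRequestHandler">
  <!-- Ping returns OK only while this file exists. -->
  <str name="healthcheckFile">server-enabled.txt</str>
</requestHandler>
```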
Re: Can't create collection without plugin, can't load plugin without collection
Sadly, that didn't work. Without a core to hit, /[COLLECTION]/config returns a 404 error. The best bet at this point may be one of the following:

1. Programmatically modify the configoverlay.json file to add the runtime libs when I upload the config.
2. Patch Solr so that schema.xml loads custom classes directly from the BlobStore like solrconfig.xml does.
3. Patch Solr so that you can specify configSets instead of a collection when associating a runtimeLib.
Can't create collection without plugin, can't load plugin without collection
I've run into an orchestration problem while creating collections and loading plugins via the Config API in Solr Cloud. Here's the scenario:

1. I create a configSet that references a custom class in schema.xml.
2. I upload the jar to the BlobStore and issue add-runtimelib using the Config API. This fails because the collection doesn't exist yet.
3. I try to create the collection with the configSet, but it fails because the custom plugin is not available yet.

I can force this to work by removing the custom reference, creating the collection, loading the jar, and then adding the custom reference back in place. This is fine as a manual one-time setup, but not feasible in a scripted production deployment. I wish I could create a collection without actually needing to create any cores. Then I could get all of the configuration for a collection set up before creating the cores.
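Steps 2 and 3 above as concrete calls (host, collection, and jar names are placeholders); the /config call is the one that fails before the collection exists:

```shell
# 1. Upload the plugin jar to the .system blob store.
curl -X POST -H "Content-Type: application/octet-stream" \
  --data-binary @myplugin.jar "http://localhost:8983/solr/.system/blob/myplugin"

# 2. Register the blob as a runtime lib for the collection -- this
#    fails if the collection does not exist yet.
curl "http://localhost:8983/solr/mycollection/config" \
  -H "Content-Type: application/json" \
  -d '{"add-runtimelib": {"name": "myplugin", "version": 1}}'
```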
Re: Why Doesn't Solr Really Quit on Zookeeper Exceptions?
Thanks Shawn. I'm leaning towards a retry as well. So, there's no mechanism that currently exists within Solr that would allow me to automatically retry the ZooKeeper connection on launch? My options then would be:

1. Externally monitor the status of Solr (e.g. /solr/admin/collections?action=CLUSTERSTATUS or bin/solr status) and force a restart.
2. Write a patch to retry ZooKeeper connections based on configuration values that specify attempts and wait times.
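Option 1 could also live outside Solr entirely; a minimal sketch of a generic retry wrapper (the health-check command is a placeholder -- in practice it might be `bin/solr status` or a curl to CLUSTERSTATUS):

```shell
# Retry a command up to N times with a short pause, then give up.
retry() {
  attempts=$1; shift
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0            # command succeeded
    fi
    echo "attempt $i failed; retrying..." >&2
    i=$((i + 1))
    sleep 1
  done
  return 1                # all attempts failed
}

# Placeholder check: replace `true` with the real health check.
retry 3 true && echo "healthy"   # prints: healthy
```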
Why Doesn't Solr Really Quit on Zookeeper Exceptions?
When I try to launch Solr 6.0 in cloud mode and connect it to a specific chroot in ZooKeeper that doesn't exist, I get an error in my solr.log. That's expected, but the Solr process continues to launch and succeeds. Why wouldn't we want the start process to simply fail and exit? There's no mechanism to trigger a retry, so Solr just sits there like a zombie.
Re: Using a RequestHandler to expand query parameter
Never got a response on this... just looking for the best way to handle it.
Re: Using a RequestHandler to expand query parameter
So, the problem I found that's driving this is that I have several phrase synonyms set up, for example "ipod mini" into "ipad mini". This synonym is only applied if you submit it as a phrase in quotes. So the pf param doesn't help because it's not the right phrase in the first place. I can fix this by sending in the query as (ipod mini "ipod mini").
Using a RequestHandler to expand query parameter
I would like to send only one query to my custom request handler and have the request handler expand that query into a more complicated query. Example: */myHandler?q=kids+books* ... would turn into a more complicated edismax query of: *kids books "kids books"* Is this achievable via a request handler definition in solrconfig.xml? Thanks! Jim
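A partial sketch, not a confirmed answer (field names are assumptions): a handler definition in solrconfig.xml can fix edismax parameters such as pf to boost the phrase form of the query, but rewriting q itself into a new string would need a custom component:

```xml
<requestHandler name="/myHandler" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title description</str>
    <!-- pf re-scores documents where the whole query matches as a phrase -->
    <str name="pf">title^10</str>
  </lst>
</requestHandler>
```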
CloudSolrServer vs Software/Hardware Load Balancer
Hi there, We're trying to evaluate whether to use the CloudSolrServer in SolrJ or to use the HttpSolrServer pointed at a software or hardware load balancer such as haproxy or F5. This would be in production. Can anyone provide any experiential pros or cons on these? In addition to performance, I'm interested in management, scalability, and stability. Technically at this point we can already support both, so I'm really looking for best practices. Thanks! Jim
Help importing xml file as raw xml
Hi, I found a few threads out there dealing with this problem, but there didn't really seem to be much detail to the solution. I have large XML files (500 MB to 2+ GB) with a complex nested structure. It's impossible for me to import the exact structure into a Solr representation, and, honestly, I don't need to. But I do need to store the raw XML for each main item in a Solr field for use by other clients.

I tried using the xsl option for the XPathEntityProcessor, and it works perfectly for small files. However, it cannot handle the big file -- or at least the machine I have doesn't have enough memory for the task. A normal import with the XPathEntityProcessor takes just a few minutes. I do this job a couple times a day and I don't want it to eat up all the memory on one of my nodes. I tried using xsltproc to pre-transform the file, but it also took a long time and eventually failed due to memory.

My best option now would seem to be using awk or sed to transform the file prior to Solr import, perhaps by removing line breaks and using the LineEntityProcessor and some scripts. My other thought is that since the XPathEntityProcessor knows the structure, there must be some way for it to be extended so that it outputs the raw input if requested. Anyone have any other thoughts? Thanks! Jim
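Not from the thread, but one more option worth sketching: Python's stdlib iterparse can stream an arbitrarily large file and emit each item's raw XML with flat memory use. The <item> tag name and the sample document are assumptions:

```python
import io
import xml.etree.ElementTree as ET

def iter_raw_items(source, tag="item"):
    """Stream `source` and yield the raw XML of each <tag> element."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            yield ET.tostring(elem, encoding="unicode")
            elem.clear()  # release the subtree so memory stays flat

# Tiny stand-in for a multi-gigabyte file:
sample = io.BytesIO(b"<root><item id='1'><a>x</a></item><item id='2'/></root>")
raw_docs = list(iter_raw_items(sample))
```

Each yielded string could then be posted into a stored Solr field alongside whatever structured fields you do extract.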
How to uncache a query to debug?
I have a query that runs slowly occasionally. I'm having trouble debugging it because once it's cached, it runs fast -- under 10 ms. But throughout the day it occasionally takes up to 3 secs. It seems like it could be one of the following:

1. My autoCommit (30 and openSearcher=false) and softAutoCommit (1) settings
2. Something to do with distributed search -- there are three nodes, but only 1 shard each.
3. Just a slow query that is getting blown out of cache periodically

This is in Solr 4.2. I like that it runs fast when cached, but if it's going to be blown out quickly, then I'd really like to just optimize the query to run fast uncached. *Is there any way to run a query using no caching whatsoever?* The query changes, but has *:* for the q param and 4 fq parameters. It's also trying to do field collapsing.

Jim
Re: How to uncache a query to debug?
Thanks, but that doesn't seem to do much. I've added it to all four of the fq params and the q param, but it only makes things marginally slower -- like 50 ms instead of 2 ms. There appears to be a deeper or more widely encompassing cache at work here.

Jim

On Thu, Aug 1, 2013 at 2:49 PM, Mikhail Khludnev wrote:

Hello Jim, does q={!cache=false}lorem ipsum work for you?
Re: How to uncache a query to debug?
Thanks. I'd rather not turn off caching completely because it only seems to show up in production and I don't want to turn reboot all the solr processes on each node. Jim On Thu, Aug 1, 2013 at 12:30 PM, Roman Chyla [via Lucene] ml-node+s472066n4082014...@n3.nabble.com wrote: When you set your cache (solrconfig.xml) to size=0, you are not using a cache. so you can debug more easily roman On Thu, Aug 1, 2013 at 1:12 PM, jimtronic [hidden email]http://user/SendEmail.jtp?type=nodenode=4082014i=0 wrote: I have a query that runs slow occasionally. I'm having trouble debugging it because once it's cached, it runs fast -- under 10 ms. But throughout the day it occasionally takes up to 3 secs. It seems like it could be one of the following: 1. My autoCommit (30 and openSearcher=false) and softAutoCommit (1) settings 2. Something to do with distributed search -- There are three nodes, but only 1 shard each. 3. Just a slow query that is getting blown out of cache periodically This is in Solr 4.2. I like that it runs fast when cached, but if it's going to be blown out quickly, then I'd really like to just optimize the query to run fast uncached. *Is there any way to run a query using no caching whatsoever?* The query changes, but has *:* for the q param and 4 fq parameters. It's also trying to do field collapsing. Jim -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-uncache-a-query-to-debug-tp4082010.html Sent from the Solr - User mailing list archive at Nabble.com. -- If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/How-to-uncache-a-query-to-debug-tp4082010p4082014.html To unsubscribe from How to uncache a query to debug?, click herehttp://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=4082010code=amltdHJvbmljQGdtYWlsLmNvbXw0MDgyMDEwfDEzMjQ4NDk0MTQ= . 
NAMLhttp://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-uncache-a-query-to-debug-tp4082010p4082047.html Sent from the Solr - User mailing list archive at Nabble.com.
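Roman's suggestion above as a solrconfig.xml sketch (the cache classes and zero sizes are illustrative); note this requires restarting or reloading the cores, which is exactly what Jim wants to avoid in production:

```xml
<filterCache class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
```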
How to debug an OutOfMemoryError?
I've encountered an OOM that seems to come after the server has been up for a few weeks. While I would love for someone to just tell me "you did X wrong," I'm more interested in trying to debug this. So, given the error below, where would I look next? The only odd thing that sticks out to me is that my log file had grown to about 70 GB. Would that cause an error like this? This is Solr 4.2.

Jul 24, 2013 3:08:09 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:365)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:642)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:619)
Caused by: java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.OpenBitSet.init(OpenBitSet.java:88)
    at org.apache.solr.search.DocSetCollector.collect(DocSetCollector.java:65)
    at org.apache.lucene.search.Scorer.score(Scorer.java:64)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:605)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297)
    at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:1060)
    at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:763)
    at org.apache.solr.search.SolrIndexSearcher.getProcessedFilter(SolrIndexSearcher.java:880)
    at org.apache.solr.search.Grouping.execute(Grouping.java:284)
    at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:384)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:208)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1307)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:453)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:560)
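Not an answer from the thread, but standard HotSpot flags that make the next OOM debuggable: capture a heap dump at the moment of failure and analyze it offline (the paths and heap size are placeholders):

```shell
java -Xmx2g \
  -XX:+HeapDumpOnOutOfMemoryError \
  -XX:HeapDumpPath=/var/log/solr/ \
  -jar start.jar
```

The resulting dump can then be opened in a heap analyzer such as Eclipse MAT to see what was holding the memory.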
Re: Node down, but not out
Wow! Awesome. Give me a bit to try to plug this into my environment.

The other way I was going to attempt this was to use the health-check file option for the ping request handler. I would have to write a separate process in Python or something that would poll ZooKeeper for active nodes, and if the current box's IP is there, create the health-check file, which would make the ping work. I'd prefer not to introduce yet another process that I need to keep running, so this looks promising.

Jim

On Wed, Jul 24, 2013 at 11:49 AM, Timothy Potter wrote:

Hi Jim,

Based on our discussion, I cooked up this solution for my book Solr in Action and would appreciate you looking it over to see if it meets your needs. The basic idea is to extend Solr's built-in PingRequestHandler to verify a replica is connected to Zookeeper and is in the active state. To enable this, install the custom JAR and then update your solrconfig.xml to use this class instead of the built-in one for the /admin/ping request handler:

<requestHandler name="/admin/ping" class="sia.ch13.ClusterStateAwarePingRequestHandler"/>

Code:

package sia.ch13;

import org.apache.solr.cloud.CloudDescriptor;
import org.apache.solr.cloud.ZkController;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.cloud.ClusterState;
import org.apache.solr.common.cloud.Slice;
import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.CoreDescriptor;
import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.PingRequestHandler;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/**
 * Extends Solr's PingRequestHandler to check a replica's cluster status
 * as part of the health check.
 */
public class ClusterStateAwarePingRequestHandler extends PingRequestHandler {
  public static Logger log = LoggerFactory.getLogger(ClusterStateAwarePingRequestHandler.class);

  @Override
  public void handleRequestBody(SolrQueryRequest solrQueryRequest, SolrQueryResponse solrQueryResponse) throws Exception {
    // delegate to the base class to check the status of this local index
    super.handleRequestBody(solrQueryRequest, solrQueryResponse);
    // if ping status is OK, then check cluster state of this core
    if ("OK".equals(solrQueryResponse.getValues().get("status"))) {
      verifyThisReplicaIsActive(solrQueryRequest.getCore());
    }
  }

  /** Verifies this replica is active. */
  protected void verifyThisReplicaIsActive(SolrCore solrCore) throws SolrException {
    String replicaState = "unknown";
    String nodeName = "?";
    String shardName = "?";
    String collectionName = "?";
    String role = "?";
    Exception exc = null;
    try {
      CoreDescriptor coreDescriptor = solrCore.getCoreDescriptor();
      CoreContainer coreContainer = coreDescriptor.getCoreContainer();
      CloudDescriptor cloud = coreDescriptor.getCloudDescriptor();
      shardName = cloud.getShardId();
      collectionName = cloud.getCollectionName();
      role = (cloud.isLeader() ? "Leader" : "Replica");
      ZkController zkController = coreContainer.getZkController();
      if (zkController != null) {
        nodeName = zkController.getNodeName();
        if (zkController.isConnected()) {
          ClusterState clusterState = zkController.getClusterState();
          Slice slice = clusterState.getSlice(collectionName, shardName);
          replicaState = (slice != null) ? slice.getState() : "gone";
        } else {
          replicaState = "not connected to Zookeeper";
        }
      } else {
        replicaState = "Zookeeper not enabled/configured";
      }
    } catch (Exception e) {
      replicaState = "error determining cluster state";
      exc = e;
    }
    if ("active".equals(replicaState)) {
      log.info(String.format("%s at %s for %s in the %s collection is active.",
          role, nodeName, shardName, collectionName));
    } else {
      // fail the ping by raising an exception
      String errMsg = String.format("%s at %s for %s in the %s collection is not active! State is: %s",
          role, nodeName, shardName, collectionName, replicaState);
      if (exc != null) {
        throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, errMsg, exc);
      } else {
        throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, errMsg);
      }
    }
  }
}
Re: Node down, but not out
I think the best bet here would be a ping-like handler that would simply return the state of only this box in the cluster: something like /admin/state, which would return down, active, leader, or recovering. I'm not really sure where to begin, however. Any ideas?

Jim

On Mon, Jul 22, 2013 at 12:52 PM, Timothy Potter wrote:

There is, but I couldn't get it to work in my environment on Jetty, see: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201306.mbox/%3CCAJt9Wnib+p_woYODtrSPhF==v8Vx==mDBd_qH=x_knbw-bn...@mail.gmail.com%3E Let me know if you have any better luck. I had to resort to something hacky but was out of time I could devote to such unproductive endeavors ;-)
Node down, but not out
I've run into a problem recently that's difficult to debug and search for: I have three nodes in a cluster, and this weekend one of the nodes went partially down. It no longer responds to distributed updates and it is marked as GONE in the Cloud view of the admin screen. That's not ideal, but there are still two boxes up, so not the end of the world. The problem is that it is still responding to ping requests and returning queries successfully. In my setup, I have the three servers behind an haproxy load balancer so that I can distribute requests and have clients stick to a specific Solr box. Because the bad node still returns OK to the ping requests and still returns results for simple queries, the load balancer does not remove it from the group. Is there a ping-like request handler that would tell me whether the given box I'm hitting is still in the cloud? Thanks! Jim Musil
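For context, this is the shape of the health check being discussed, a sketch only (the backend name, hostnames, ports, and core name are all placeholders, not from the thread). haproxy's httpchk points at Solr's PingRequestHandler:

```
# haproxy backend sketch (hypothetical hosts/ports); /admin/ping returns
# a non-200 status when the configured ping query fails on that core
backend solr_nodes
    option httpchk GET /solr/collection1/admin/ping
    server solr1 10.0.0.1:8983 check inter 2000 rise 2 fall 3
    server solr2 10.0.0.2:8983 check inter 2000 rise 2 fall 3
```

As the message above describes, though, this check is exactly what proved insufficient: /admin/ping can keep answering OK on a node that has dropped out of the cloud, which is why a state-aware handler is being asked for.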
Re: Node down, but not out
I'm not sure why it went down exactly -- I restarted the process and lost the logs. (d'oh!) An OOM seems likely, however. Is there a setting for killing the process when Solr encounters an OOM? Thanks! Jim
Best way to match umlauts
I'm trying to make Brüno come up in my results when the user types in Bruno. What's the best way to accomplish this? Using Solr 4.2.
Re: Best way to match umlauts
Thanks! Sorry for the basic question, but I was having trouble finding the results through Google.

On Thu, Jun 13, 2013 at 10:39 AM, Jack Krupansky-2 [via Lucene] wrote:

<charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>

-- Jack Krupansky
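For completeness, a minimal field type built around that charFilter (a sketch: the field type name and the rest of the analysis chain are my assumptions, not part of the reply). Applied at both index and query time, "Brüno" and "Bruno" analyze to the same tokens:

```xml
<!-- hypothetical field type; the charFilter line is the one from the reply -->
<fieldType name="text_folded" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

solr.ASCIIFoldingFilterFactory is a common alternative that folds accented characters without needing a mapping file.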
Re: dataimporter.last_index_time SolrCloud
Is this a bug? I can create the ticket in Jira if it is, but it's not clear to me what should be happening. I noticed that it is using the value set in the home directory, but that value does not get updated, so my imports get slower and slower. I guess I could create a cron job to update that time, but this seems kind of wonky. Thanks! Jim
dataimporter.last_index_time SolrCloud
My data-config files use the dataimporter.last_index_time variable, but it seems to have stopped working when I upgraded to 4.2. In previous 4.x versions, I saw that it was being written to ZooKeeper, but now there's nothing there. Did anything change? Or should I be doing something differently? Thanks! Jim
bootstrap_conf without restarting
I'm making fairly frequent changes to the data-config.xml files on some of my cores in a Solr Cloud setup. Is there any way to get these files active and up to ZooKeeper without restarting the instance? I've noticed that if I just launch another instance of Solr with the bootstrap_conf flag set to true, it uploads the new settings, but it dies because there's already a Solr instance running on that port. It also seems to make the original one unresponsive, or at least down in ZooKeeper's eyes. I then just restart that instance and everything is back up. It'd be nice if I could bootstrap without actually starting Solr. What's the best practice for deploying changes to data-config.xml? Thanks, Jim
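One way to do this without launching a second Solr instance (a sketch, assuming a 4.x distribution that ships the cloud-scripts/zkcli.sh tool; the ZooKeeper host, config directory, config name, and collection name are placeholders):

```shell
# upload the edited config set straight to ZooKeeper
cloud-scripts/zkcli.sh -cmd upconfig \
  -zkhost zookeeper:2181 \
  -confdir /path/to/conf \
  -confname myconf

# then reload the collection so the running nodes pick up the change
curl 'http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection'
```

This keeps bootstrap_conf as a one-time bootstrapping step rather than the ongoing deployment mechanism.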
Re: Did something change with Payloads?
Created: https://issues.apache.org/jira/browse/SOLR-4639 Thanks!

On Fri, Mar 22, 2013 at 5:01 PM, Mark Miller-3 [via Lucene] wrote:

On Mar 22, 2013, at 5:54 PM, jimtronic wrote:

Ok, this is very bizarre. If I insert more than one document at a time using the update handler, like so: [{"id":1,"foo_ap":"bar|50"},{"id":2,"foo_ap":"bar|75"}] it actually stores the same payload value (50) for both docs. That seems like a bug, no? There was a core change in 4.1 to how payloads were stored. I'm wondering if Solr is not handling them properly?

This could be - if you have compiled a lot of evidence (sorry I have not had time to follow up on this myself), please create a jira issue for more prominence. - Mark

Jim
Re: Did something change with Payloads?
Ok, this is very bizarre. If I insert more than one document at a time using the update handler, like so: [{"id":1,"foo_ap":"bar|50"},{"id":2,"foo_ap":"bar|75"}] it actually stores the same payload value (50) for both docs. That seems like a bug, no? There was a core change in 4.1 to how payloads were stored. I'm wondering if Solr is not handling them properly? Jim
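As an aside, the snippet as originally posted had an extra closing brace and unquoted keys, so it wasn't valid JSON. A well-formed update body (a sketch with hypothetical ids) looks like this, and each document carries its own payload value in the request, which is what makes the identical stored payloads look like an indexing-side bug:

```python
import json

# two docs, each with a distinct payload after the "|" delimiter
docs = [
    {"id": "1", "foo_ap": "bar|50"},
    {"id": "2", "foo_ap": "bar|75"},
]
body = json.dumps(docs)  # this is the body POSTed to /update

# round-trip the body: the payloads are distinct per document
parsed = json.loads(body)
payloads = [d["foo_ap"].split("|")[1] for d in parsed]
```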
Re: Did something change with Payloads?
Ok, yes, I have now recompiled against the 4.2.0 libraries. I needed to change a few things, but the problem still exists using the new libraries. I think the problem may actually be on the indexing side of things. Here's why:

1. I had an old index created under 4.0, running 4.0. Works as expected.
2. I used the same index, but running under 4.2. Works as expected.
3. I started fresh with 4.2 and did a fresh import of the data. Does not work.

By "does not work" I mean this: the payload values that I enter are not the payload values I get back using my custom query plugin. These are stored fields, so I can see clearly what the payload value should be. Is there any way to see what the payload value is at a very low level? Thanks! Jim
Re: Did something change with Payloads?
Something has definitely changed in 4.1. I've installed 4.0, 4.1, and 4.2 side by side and conducted the same tests on each one. Only 4.0 is returning the expected results. Apologies for cross-posting this here and in the Lucene forum, but I really can't tell if this is a Solr or a Lucene issue. In my tests, I have the following two documents and a custom query plugin that should average the payload of the term bing and use that as the score.

In 4.1 and 4.2, I get:

docs: [
  { id: 3, foo_ap: ["bing|9", "bing|7"], score: 9.0 },
  { id: 1, foo_ap: ["bing|9 bing|7", "badda|9 bing|7"], score: 9.0 }
]

Using 4.0, I get these results:

docs: [
  { id: 1, foo_ap: ["bing|9 bing|7", "badda|9 bing|7"], score: 7.665 },
  { id: 3, foo_ap: ["bing|9", "bing|7"], score: 8.0 }
]

Thanks for any input.
Did something change with Payloads?
I've been using payloads through several versions of Solr, including 4.0, but now they are no longer working correctly in 4.2. I had originally followed Grant's article here: http://searchhub.org/2009/08/05/getting-started-with-payloads/ I have a custom query plugin, {!payload}, that returns the payload value for a given term, but now it's returning erratic results. No errors, just the wrong values. Thanks for any help! Jim
Re: Did something change with Payloads?
Actually, this is more like the code I've got in place: http://sujitpal.blogspot.com/2011/01/payloads-with-solr.html Jim
Zookeeper specs
I understand this may be a better question for the ZooKeeper list, but I'm asking here because I'm not completely clear on how much load ZooKeeper takes on in a Solr Cloud setup. I'm trying to determine what specs my ZooKeeper boxes should have. I'm on EC2, so what I'm curious about is whether ZooKeeper should have high I/O, high memory, or high CPU. I've been running ZooKeeper on micro instances with no problem, but I want to understand what the potential bottlenecks might be. Thanks for any input! Jim
Practicality of enormous fields
What are the likely ramifications of having a stored field with millions of words? For example, if I had an article and wanted to store the user id of every user who has read it, stuck into a simple whitespace-delimited field. What would go wrong, and when? My tests lead me to believe this is not a problem, but it feels weird. Jim
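A rough back-of-envelope for the scenario (my own arithmetic, not from the thread; the id length and reader count are assumptions): a few million whitespace-delimited ids already makes a single stored field tens of megabytes, all of which is re-read and re-written every time that document is updated.

```python
# rough size estimate for one whitespace-delimited reader-id field
n_readers = 2_000_000        # assumed readers of a single article
avg_id_len = 8               # assumed characters per user id
field_bytes = n_readers * (avg_id_len + 1)  # +1 for the separator space
field_mb = field_bytes / (1024 * 1024)      # tens of MB for one field value
```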
Scaling SolrCloud and DIH
I'm curious how people are using DIH with SolrCloud. I have cron jobs set up to trigger the dataimports, which come from both XML files and a SQL database. Some are frequent small delta imports, while others are larger daily XML imports. Here's what I've tried:

1. Set up a micro box that sends the dataimport requests to a load balancer using cron. This didn't work because frequent requests would get spread around, and at one point all my nodes were doing dataimport requests at the same time.

2. Designate one box as the indexer and call dataimport via localhost. The problem here is that I now have a single point of failure for indexing -- I always have to have that box running.

I love that SolrCloud is distributed, so I can have 3 boxes in my cluster and not care which one goes down. I don't really know what the solution is, but I guess it would be nice if the dataimport was cloud aware, meaning that the cluster knows an update is happening on one of the boxes and won't let another one start. That way I could just send the dataimport request up through the load balancer and forget about it. Anyway, I thought I would see how others are handling this issue. Cheers, Jim
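One mitigation for the overlap problem in option 1 (my own sketch, not from the thread; the core name, URLs, and the exact status string in the DIH response are assumptions to verify against your setup): serialize imports on the calling side, and only fire when the handler reports idle.

```shell
#!/bin/sh
# hypothetical wrapper for a cron-triggered DIH delta import;
# flock ensures at most one import request per core is in flight
(
  flock -n 9 || exit 1   # previous run still holds the lock; skip this cycle
  status=$(curl -s 'http://localhost:8983/solr/mycore/dataimport?command=status&wt=json')
  echo "$status" | grep -q '"status":"idle"' || exit 1
  curl -s 'http://localhost:8983/solr/mycore/dataimport?command=delta-import'
) 9>/var/lock/mycore-dih.lock
```

This doesn't make DIH cloud aware, but it does stop a single scheduler from stacking imports on top of each other.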
Some nodes have all the load
I was doing some rolling updates of my cluster (12 cores, 4 servers) and I ended up in a situation where one node was elected leader for all the cores. This seemed very taxing to that one node. It was also still trying to serve query requests, so it slowed everything down. I'm trying to do a lot of frequent atomic updates along with some periodic DIH syncs. My solution to this situation was to try to take the supreme leader out of the cluster and let leader election start. This was not easy, as there was so much load on it that I couldn't take it out gracefully. Some of my cores became unreachable for a while. This was all under fictitious load, but it made me nervous about a high-load production situation. I'm sure there are several things I'm doing wrong in all this, so I thought I'd see what you guys think. Jim
Re: Some nodes have all the load
The load test was fairly heavy (i.e. lots of users) and designed to mimic a fully operational system with lots of users doing normal things. There were two things I gleaned from the logs: "PERFORMANCE WARNING: Overlapping onDeckSearchers=2" appeared for several of my more active cores, and the non-leaders were throwing errors saying that the leader was not responding while trying to forward updates. (Sorry, can't find that specific error now.) My best guess is that it has something to do with the commits:

a. frequent user-generated writes using /update?commitWithin=500&waitFlush=false&waitSearcher=false
b. softCommit set to 3000
c. autoCommit set to 300,000 and openSearcher false
d. frequent periodic DIH updates, which I guess are commit=true by default

Should I omit commitWithin, set DIH to commit=false, and just let soft commit and autoCommit do their jobs? Cheers, Jim
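For reference, the settings in (b) and (c) as they would appear in solrconfig.xml (a sketch of the configuration described above, not a recommendation from the thread):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: flush to disk every 5 minutes, no new searcher -->
  <autoCommit>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: new documents become visible within 3 seconds -->
  <autoSoftCommit>
    <maxTime>3000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With this in place, per-request commitWithin and DIH's default commit=true mostly add extra searcher openings on top of the scheduled ones, which is consistent with the overlapping onDeckSearchers warning.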
Re: Feeding Custom QueryParser with Nested Query
It seems like I could accomplish this by following the JoinQParserPlugin logic. I can actually get pretty close using the join query, but I need to do some extra math in the middle. The difference in my case is that I need to access both the id and the score. I *think* the logic would go something like this:

1. Do a sub-query to get doc ids and scores.
2. Feed the resulting doc ids into another query.
3. Write a custom scorer that uses the score from the sub-query to determine the scores of the final results.

Thanks for any suggestions... Jim
optimal maxWarmingSearchers in solr cloud
The notes for maxWarmingSearchers in solrconfig.xml state: "Recommend values of 1-2 for read-only slaves, higher for masters w/o cache warming." Since Solr Cloud nodes can be both leader and non-leader depending on the current state of the cloud, what would be the optimal setting here? Thanks! Jim
Re: Multiple Collections in one Zookeeper
Ok, I'm a little confused. I had originally bootstrapped ZooKeeper using a solr.xml file which specified the following cores: cats, dogs, birds. In my /solr/#/cloud?view=tree view I see that I have:

/collections
  /cats
  /dogs
  /birds
/configs
  /cats
  /dogs
  /birds

When I launch a new server and connect it to ZooKeeper, it creates all three collections. What I'd like to do is move cats to its own set of boxes. When I run:

java -DzkHost=zookeeper:9893/cats -jar start.jar

or

java -DzkHost=zookeeper:9893,zookeeper:9893/cats -jar start.jar

I get this error: SEVERE: Could not create Overseer node. For simplicity, I'd like to have only one ZooKeeper ensemble.
Feeding Custom QueryParser with Nested Query
I've written a custom query parser that we'll call {!doFoo}, which takes two parameters: a field name and a space-delimited list of values. The parser does some calculations between the list of values and the field in question. In some cases, the list is quite long, and as it turns out, the core already has the information. I think most of my latency in this operation is just passing big lists around. Ideally, I'd like to accomplish something like this:

{!doFoo f=my_field v='query(...)'}

Or, even better, if I could just pass a parameter in and get the results:

{!doFoo with='bar'}

Thanks for any advice! Jim
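One built-in piece that may partly help (a sketch; doFoo and the parameter names are of course hypothetical): local-params values can dereference another request parameter with $, so the long list only has to appear once per request:

```
/select?q={!doFoo f=my_field v=$vals}&vals=17 42 99 103
```

This removes duplication within the request but still ships the list over the wire each time; keeping the list server-side would require the parser itself to look it up, along the lines of the join-style approach discussed in the thread.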
Multiple Collections in one Zookeeper
Hi, I have a SolrCloud cluster running several cores and pointing at one ZooKeeper. For performance reasons, I'd like to move one of the cores onto its own dedicated cluster of servers. Can I use the same ZooKeeper to keep track of both clusters? Thanks! Jim
Nodes out of sync, deletes fail
I'm not sure how it happened, but one of my nodes has different data than the others. When I try to delete the offending document by posting JSON to the /update URL, it hangs, and after a minute it just fails with no reply. I disconnected the offending node from the cloud and was able to delete the problem docs without issue. It seems as though there's a real problem here, though, if a delete tries to propagate to other nodes that don't have that document. I tried deleting by id and by query.
Re: Nodes out of sync, deletes fail
solrspec: 5.0.0.2012.12.03.13.10.02
Re: Nodes out of sync, deletes fail
Oddly, not much info there. Here's what I do know:

- I had a three-node cluster running.
- Adding documents was also failing in the same exact way.
- Updates/deletes would make it to the elected leader, but then never show up on the other nodes.
- Eventually, after 30 seconds or so, the write to the leader would succeed, but it never showed up on any other node. This caused my nodes to be out of sync.
- Once I restarted Solr on the other nodes, everything worked great. Updates/deletes worked immediately.

It seems odd that the write should succeed on the leader even though it didn't work on the other nodes. Jim

On Wed, Feb 27, 2013 at 1:06 PM, Mark Miller-3 [via Lucene] wrote:

You are working off trunk? Do you have any interesting info in the logs? - Mark
Re: Nodes out of sync, deletes fail
Mark wrote:

"Currently, a leader does an update locally before sending in parallel to all replicas. If we can't send an update to a replica, because it crashed, or because of some other reason, we ask that replica to recover if we can. In that case, it's either gone and will come back and recover, or, oddly, the request failed and it's still in normal operations, in which case we ask it to recover because something must be wrong. So if a leader can't send to any replicas, he's going to assume they are all screwed (they are if he can't send to them) and think he is the only part of the cluster. It might be nice if we had a param for you to say, consider this a fail unless it hits this many replicas - but still the leader is going to have carried out the request."

This seems to violate the strong consistency model, doesn't it? If a write doesn't succeed at a replica, it shouldn't succeed anywhere. Cassandra seems to have this same problem -- http://www.datastax.com/dev/blog/how-cassandra-deals-with-replica-failure -- except that it returns a timeout error and saves the hint for later. I was assuming that Solr was acting like CONSISTENCY ALL for writes and CONSISTENCY ANY for reads. If that were the case, I'd like to ensure that my nodes don't get out of sync when an otherwise healthy node can't perform the update, and that the original write would be rolled back.

Mark wrote:

"What you need to figure out is why the leader could not talk to the replicas - very weird to not see log errors about that! Were the replicas responding to requests? OOMs are bad for SolrCloud, by the way - a JVM that has OOMed is out of control - you really want to use the option that kills the JVM on OOMs."

This does seem to be the biggest problem. The replica was responding normally. I'll try upping the memory and getting the latest version.
Re: SolrCloud as my primary data store
Yes, these are good points. I'm using Solr to leverage user preference data, and I need that data available in real time. SQL just can't do the kinds of things I'm able to do in Solr, so I have to wait until the write (a user action, a user preference, etc.) gets to Solr from the db anyway. I'm kind of curious how many single documents I can send through via the JSON update in a day. Millions would be nice, but I wonder what the upper limit would be.
SolrCloud as my primary data store
Now that I've been running Solr Cloud for a couple of months and gotten comfortable with it, I think it's time to revisit this subject. When I search online for the topic of using Solr as a primary db, I get lots of discussions from 2-3 years ago, and usually they point out a lot of hurdles that have now largely been eliminated with the release of Solr Cloud. I've stopped using the standard method of writing to my db and pushing out periodically to Solr. Instead, I'm writing simultaneously to Solr and the db, with less frequent syncs from the database just to be safe. I find this to be much faster and easier than doing delta imports via the DIH handler. In fact, it's gone so smoothly that I'm really wondering why I need to keep writing to the db at all. I've always got several nodes running, and launching new ones takes only minutes to be fully operational. I'm taking frequent snapshots, and my test restores have been painless and quick. So, if I'm looking at other NoSQL solutions like MongoDB or Cassandra, why wouldn't I just use Solr? It's distributed, fast, and stable. It has a great HTTP API and it's nearly schema-less using dynamic fields. And, most importantly, it offers the most powerful query language available. I'd really like to hear from someone who has made the leap. Cheers, Jim
DIH clean=true behavior in SolrCloud
I'm confused about the behavior of clean=true with the DataImportHandler. When I use clean=true on just one instance, it doesn't blow all the data out until the import succeeds. In a cluster, however, it appears to blow all the data out of the other nodes first, then starts adding new docs. Am I wrong about this? Jim
If bootstrap a new solrconfig file to zookeeper, do I need to restart all nodes?
I have a simple cluster of three servers and a dedicated ZooKeeper server running separately. If I make a change to my solrconfig.xml file on one of the servers and restart that server with the bootstrap_conf=true option, will the change be sent to the other nodes? Or will I have to log into each node and restart it?
Filter results based on custom scoring and _val_
I'm using Solr function queries to generate my own custom score. I achieve this using something along these lines:

q=_val_:my_custom_function()

This populates the score field as expected, but it also includes documents that score 0. I need a way to filter the results so that scores at or below zero are not included. I realize that I'm using score in a non-standard way, and that normally the score Lucene/Solr produces is not absolute. However, producing my own score works really well for my needs. I've tried using {!frange l=0}, but this causes the score for all documents to be 1.0. I've found that I can do the following:

q=*:*&fl=foo:my_custom_function()&fq={!frange l=1}my_custom_function()

This puts my custom score into foo, but it requires me to list all the logic twice. Sometimes my logic is very long.
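One way to avoid repeating the function (a sketch using parameter dereferencing in local params; my_custom_function is the placeholder from the question): define the function once as its own request parameter and reference it with $ from both the main query and the frange filter. The {!func} query keeps the function value as the score, while the frange fq only filters.

```
/select?q={!func v=$myfunc}&fq={!frange l=0 incl=false v=$myfunc}&myfunc=my_custom_function()
```

incl=false makes the lower bound exclusive, so only documents with a strictly positive function value survive. The function is still evaluated by both the query and the filter, but its text appears only once in the request.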
Re: How to post atomic updates using xml
For multi-valued fields, you can use "add" to add a value to the list. If the value already exists, it will be there twice. "set" will replace the entire list with the value(s) that you specify. There's currently no method to remove a single value, although the issue has been logged: https://issues.apache.org/jira/browse/SOLR-3862 You can always edit the list by pulling down all the values and uploading the new set. Jim
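Since the question was about XML specifically, a minimal sketch of an atomic "add" on a multi-valued field (the field name and id are hypothetical):

```xml
<add>
  <doc>
    <field name="id">99</field>
    <!-- appends "python" to the existing skills list -->
    <field name="skills" update="add">python</field>
    <!-- update="set" here would instead replace the whole list -->
  </doc>
</add>
```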
Re: need best solution for indexing and searching multiple, related database tables
I'm not sure if this will be relevant for you, but this is roughly what I do. Apologies if it's too basic. I have a complex view that normalizes all the data that I need together -- from over a dozen different tables. For one-to-many and many-to-many relationships, I have SQL turn the data into a comma-delimited string, which the data import handler and the RegexTransformer will split into a multi-valued field. So you might have a document like this:

<id>123</id>
<name_s>John Smith</name_s>
<attr_products>
  <str>python</str>
  <str>java</str>
  <str>javascript</str>
</attr_products>

Often I've found that I don't really need the data together in one Solr core, and it works better to just create a separate core for that schema.
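A sketch of the DIH side of that approach (data-config.xml; the view, column, and field names are made up): the RegexTransformer's splitBy turns the comma-delimited SQL string into a multi-valued field.

```xml
<entity name="person" transformer="RegexTransformer"
        query="SELECT id, name_s, products_csv FROM person_view">
  <field column="id" />
  <field column="name_s" />
  <!-- splits "python,java,javascript" into multiple attr_products values -->
  <field column="attr_products" sourceColName="products_csv" splitBy="," />
</entity>
```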
Re: some general solr 4.0 questions
I've got a setup like yours -- lots of cores and replicas, but no need for shards -- and here's what I've found so far:

1. ZooKeeper is tiny. I would think network I/O is going to be the biggest concern.

2. I think this is more about high availability than performance. I've been experimenting with taking down parts of my setup to see what happens. When ZooKeeper goes down, the Solr instances still serve requests. It appears, however, that updating and replication stop. I want to make frequent updates, so this is a big concern for me.

3. On EC2, I launch a server which is configured to register itself with my ZooKeeper box upon launch. When it's ready, I add it to my load balancer. Theoretically, ZooKeeper would help further balance the nodes, but right now I find those queries to be too slow. Since the load balancer is already distributing the load, I'm adding the parameter distrib=false to my queries. This forces the request to stay on the box the load balancer chose.

4. This is interesting. I started down the path of wanting to maintain a master, but I've moved towards a system where all of my update requests go through my load balancer. Since ZooKeeper dynamically elects a leader, no matter which box gets the update, the leader gets it anyway. This is very nice for me because I want all my Solr instances to be identical.

Since there's not a lot of documentation on this yet, I hope other people share their findings, too.
Backup strategy for SolrCloud
I'm trying to determine my options for backing up data from a SolrCloud cluster. For me, bringing up my cluster from scratch can take several hours. It's way faster to take snapshots of the index periodically and then use one of these when booting a new instance. Since I use static xml files and delta-imports, everything catches up quickly. Sorry if this is a dumb question, but where do I pull the snapshots from? Zookeeper? Any box in the cluster? The leader? Thanks! Jim -- View this message in context: http://lucene.472066.n3.nabble.com/Backup-strategy-for-SolrCloud-tp4009291.html Sent from the Solr - User mailing list archive at Nabble.com.
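Not an answer from the thread, but one way snapshots can be pulled is the ReplicationHandler's backup command, issued against the core you want to snapshot. A sketch (host, core name, and backup location are placeholder assumptions):

```python
from urllib.parse import urlencode

# Sketch: trigger an index snapshot via the ReplicationHandler's
# backup command. Host, core name, and location are assumptions;
# the snapshot is written on the node that receives the request.
params = {"command": "backup", "location": "/var/backups/solr"}
url = "http://localhost:8983/solr/mycore/replication?" + urlencode(params)
print(url)
```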
Re: deleting a single value from multivalued field
Just added this today. https://issues.apache.org/jira/browse/SOLR-3862 -- View this message in context: http://lucene.472066.n3.nabble.com/deleting-a-single-value-from-multivalued-field-tp4009092p4009292.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Backup strategy for SolrCloud
I'm thinking about catastrophic failure and recovery. If, for some reason, the cluster should go down or become unusable and I simply want to bring it back up as quickly as possible, what's the best way to accomplish that? Maybe I'm thinking about this incorrectly? Is this not a concern? -- View this message in context: http://lucene.472066.n3.nabble.com/Backup-strategy-for-SolrCloud-tp4009291p4009297.html Sent from the Solr - User mailing list archive at Nabble.com.
Help with slow Solr Cloud query
Hi, I've got a setup as follows:

- 13 cores
- 2 servers
- running Solr 4.0 Beta with numShards=1 and an embedded zookeeper.

I'm trying to figure out why some complex queries are running so slowly in this setup versus quickly in standalone mode. Given a query like /select?q=(some complex query), it runs fast and gets faster (caches) when only running one server:

1. ?fl=*&q=(complex query)&wt=json&rows=24 (QTime 3)

When I issue the same query to the cluster and watch the logs, it looks like it's actually performing the query 3 times, like so:

1. ?q=(complex query)&distrib=false&wt=javabin&rows=24&version=2&NOW=1347911018556&shard.url=(server1)|(server2)&fl=id,score&df=text&start=0&isShard=true&fsv=true (QTime 2)

2. ?ids=(ids from query 1)&distrib=false&wt=javabin&rows=24&version=2&df=text&fl=*&shard.url=(server1)|(server2)&NOW=1347911018556&start=0&q=(complex query)&isShard=true (QTime 4)

3. ?fl=*&q=(complex query)&wt=json&rows=24 (QTime 459)

Why is it performing #3? It already has everything it needs in #2, and #3 seems to be really slow even when warmed and cached. As stated above, this query is fast when running on a single server that is warmed and cached. Since my query is complex, I could understand some slowness if I was attempting this across multiple shards, but since there's only one shard, shouldn't it just pick one server and query it? Thanks! Jim -- View this message in context: http://lucene.472066.n3.nabble.com/Help-with-slow-Solr-Cloud-query-tp4008448.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to post atomic updates using xml
Actually, the correct method appears to be this: an atomic update in JSON:

{ "id" : "book1", "author" : {"set" : "Neal Stephenson"} }

the same in XML:

<add>
  <doc>
    <field name="id">book1</field>
    <field name="author" update="set">Neal Stephenson</field>
  </doc>
</add>

Jim -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-post-atomic-updates-using-xml-tp4007323p4007517.html Sent from the Solr - User mailing list archive at Nabble.com.
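The XML form above can be generated with the standard library; a sketch that only builds the payload (posting it to /update is left out, and the document values are the ones from the post):

```python
import xml.etree.ElementTree as ET

# Build the atomic-update XML shown above: a set operation on the
# "author" field of document "book1".
add = ET.Element("add")
doc = ET.SubElement(add, "doc")
ET.SubElement(doc, "field", name="id").text = "book1"
ET.SubElement(doc, "field", name="author", update="set").text = "Neal Stephenson"

payload = ET.tostring(add, encoding="unicode")
print(payload)  # one-line <add><doc>...</doc></add> payload
```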
How to post atomic updates using xml
There's a good intro to atomic updates here: http://yonik.com/solr/atomic-updates/ but it does not describe how to structure the updates using xml. Anyone have any idea on how these would look? Thanks! Jim -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-post-atomic-updates-using-xml-tp4007323.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to post atomic updates using xml
Figured it out. In JSON:

{ "id" : "book1", "author" : {"set" : "Neal Stephenson"} }

In XML:

<add>
  <doc>
    <field name="id">book1</field>
    <field name="author" set="Neal Stephenson"></field>
  </doc>
</add>

This seems to work. Jim -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-post-atomic-updates-using-xml-tp4007323p4007325.html Sent from the Solr - User mailing list archive at Nabble.com.
Atomic Updates, Payloads, Non-stored data
Hi, I'm using payloads to tie a value to an attribute for a document -- eg a user's rating for a document. I do not store this data, but I index it and access the value through function queries. I was really excited about atomic updates, but they do not work for me because they are blowing out all of my non-stored payload data. I can make the fields stored, but that is not desirable as in some cases there's a lot of data. I was wondering how feasible it would be for me to modify the DistributedUpdateProcessor so that it preserves my non-stored payloads while performing the atomic updates. Thanks! Jim -- View this message in context: http://lucene.472066.n3.nabble.com/Atomic-Updates-Payloads-Non-stored-data-tp4006678.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How can I use a function or fieldvalue as the default for query(subquery, default)?
I was able to use solr 3.1 functions to accomplish this logic: /solr/select?q=_val_:sum(query({!dismax qf=text v='solr rocks'}),product(map(query({!dismax qf=text v='solr rocks'},-1),0,100,0,1), product(this_field,that_field))) -- View this message in context: http://lucene.472066.n3.nabble.com/How-can-I-use-a-function-or-fieldvalue-as-the-default-for-query-subquery-default-tp3924172p3926183.html Sent from the Solr - User mailing list archive at Nabble.com.
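My reading of the map() trick above, as a plain-Python sketch (not Solr code): map(query(...,-1),0,100,0,1) acts as a "did not match" indicator, so the sum takes the subquery score when it matches and falls back to product(this_field,that_field) otherwise. The score range [0,100] is the assumption baked into the map arguments.

```python
# Plain-Python sketch of the fallback the function query encodes.
# query(sub, -1) returns the score if sub matches, else -1;
# map(x, 0, 100, 0, 1) maps in-range scores to 0 and -1 to 1,
# turning the subquery result into a no-match indicator.
def fallback_score(sub_score, this_field, that_field):
    q = sub_score if sub_score is not None else -1.0    # query(sub, -1)
    indicator = 0.0 if 0.0 <= q <= 100.0 else 1.0       # map(q, 0, 100, 0, 1)
    return max(q, 0.0) + indicator * this_field * that_field

print(fallback_score(1.5, 2.0, 3.0))   # 1.5 (subquery matched)
print(fallback_score(None, 2.0, 3.0))  # 6.0 (fallback: 2.0 * 3.0)
```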
How can I use a function or fieldvalue as the default for query(subquery, default)?
Hi, For the solr function query(subquery, default) I'd like to be able to specify the value of another field or even a function as the default. For example, I might have: /solr/select?q=_val_:query({!dismax qf=text v='solr rocks'}, product(this_field, that_field)) Is this possible? I see that Boolean functions are coming in Solr 4, but it is unclear whether these would accept functions as defaults. Thanks, Jim -- View this message in context: http://lucene.472066.n3.nabble.com/How-can-I-use-a-function-or-fieldvalue-as-the-default-for-query-subquery-default-tp3924172p3924172.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Concatenate multivalued DIH fields
I solved this problem using the flatten=true attribute. Given this XML:

<people>
  <person>
    <names>
      <name>
        <firstName>Joe</firstName>
        <lastName>Smith</lastName>
      </name>
    </names>
  </person>
</people>

and this data import handler field:

<field column="attr_names" xpath="/people/person/names/name" flatten="true" />

attr_names is a multiValued field in my schema.xml. The flatten attribute tells Solr to take all the text from the specified node and below. -- View this message in context: http://lucene.472066.n3.nabble.com/Concatenate-multivalued-DIH-fields-tp2749988p2875435.html Sent from the Solr - User mailing list archive at Nabble.com.