Re: 4.3.1 SC - IndexWriter issues causing replication + failures
Some more info to provide:
- Replication almost never completes after the "this IndexWriter is closed" stack traces appear.
- When replication begins after the "this IndexWriter is closed" error, over a few hours the replica eventually fills the disk to 100% with index files under data/. There are so many files in the data directory that it can't be listed, and it takes a very long time to delete. It seems the frequent replications are filling the disk with new files whose combined size is roughly 3 times that of the real index. Is it leaking file handles, or forgetting that it has already downloaded something?

Is this a better question for the Lucene list? It seems (see below) that this stack trace occurs in the Lucene layer rather than in Solr, but maybe someone could confirm?

ERROR [2014-01-27 18:28:49.368] [org.apache.solr.common.SolrException] org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
  at org.apache.lucene.index.DocumentsWriter.ensureOpen(DocumentsWriter.java:199)
  at org.apache.lucene.index.DocumentsWriter.preUpdate(DocumentsWriter.java:338)
  at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:419)
  at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1508)
  at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
  at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
  at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:519)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:655)
  at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:398)
  at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
  at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
  at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
  at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820)
  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359)
  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
  ... chopped

Thanks!
Tim

On 5 February 2014 13:04, Tim Vaillancourt t...@elementspace.com wrote:
[...]
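A minimal Python sketch for checking whether replication is leaving old index copies behind under data/. It assumes that Solr 4.x replication can download into a new data/index.<timestamp> directory and record the active one in data/index.properties under the key "index"; every other index* directory is reported as a candidate leftover. That layout and property name are assumptions to verify against your own data/ directory before deleting anything.

```python
from pathlib import Path


def read_active_index_dir(data_dir: Path) -> str:
    """Name of the index directory Solr should be using.

    Assumption: if data/index.properties exists, its "index" key names the
    active directory (e.g. index.<timestamp>); otherwise it is "index".
    """
    props = data_dir / "index.properties"
    if props.is_file():
        for raw in props.read_text(encoding="utf-8").splitlines():
            line = raw.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and Java-properties comments
            key, _, value = line.partition("=")
            if key.strip() == "index":
                return value.strip()
    return "index"


def find_stale_index_dirs(data_dir: Path) -> list:
    """Return (name, total_bytes, file_count) for each inactive index* directory."""
    active = read_active_index_dir(data_dir)
    stale = []
    for entry in sorted(data_dir.iterdir()):
        is_index_dir = entry.is_dir() and (
            entry.name == "index" or entry.name.startswith("index.")
        )
        if not is_index_dir or entry.name == active:
            continue
        files = [p for p in entry.rglob("*") if p.is_file() and not p.is_symlink()]
        stale.append((entry.name, sum(p.stat().st_size for p in files), len(files)))
    return stale
```

For example, find_stale_index_dirs(Path("/path/to/core/data")) (a hypothetical path) lists the inactive directories with their sizes; comparing their total with the active index is one way to check the "roughly 3 times larger" observation.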
4.3.1 SC - IndexWriter issues causing replication + failures
Hey guys,

I am troubleshooting an issue on a 4.3.1 SolrCloud: 1 collection and 2 shards over 4 Solr instances (which results in 1 core per Solr instance). After some time in production without issues, we are seeing errors related to the IndexWriter all over our logs, and an infinite loop of failing replication from the leader on our 2 replicas.

We see a flood of "org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed" stack traces, then the Solr replica tries to replicate/recover, replication fails, and then the following 2 errors show up:
1) SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
2) Error closing IndexWriter, trying rollback (which results in a null-pointer exception).

I'm guessing the best way forward would be to upgrade to the latest version, but that is an undertaking that will take significant time/testing. In the meantime, is there anything I can do to mitigate or better understand the issue? Does anyone know what the IndexWriter errors refer to?

Below is a URL to a .txt file with summarized portions of my solr.log. Any help is really appreciated as always!!
http://timvaillancourt.com.s3.amazonaws.com/tmp/solr.log-summarized.txt

Thanks all,
Tim
Re: Perl Client for SolrCloud
I'm pretty interested in taking a stab at a Perl CPAN module for SolrCloud that is Zookeeper-aware; it's the least I can do for Solr as a non-Java developer. :)

A quick question though: how would I write the shard logic to behave similarly to Java's Zookeeper-aware client? I'm able to get the hash/hex range needed for each shard from clusterstate.json, but how do I know which field to hash on? I'm guessing I also need to read the collection's schema.xml from Zookeeper to get the uniqueKey and then use that for sharding, or does the Java client take the sharding field as input?

Looking for ideas here. Thanks!
Tim

On 08/01/14 09:35 AM, Chris Hostetter wrote:
: I couldn't find any client which can connect to SolrCloud similar to SolrJ's
: CloudSolrServer.
:
: Since I have a load balancer in front of 8 nodes, WebService::Solr[1] still
: works fine.

Right -- just because SolrJ is ZooKeeper aware doesn't mean you can *only* talk to SolrCloud with SolrJ -- you can still use any HTTP client of your choice to connect to your Solr nodes in a round robin fashion (or via a load balancer) if you wish -- just like with a non-SolrCloud deployment using something like master/slave.

What you might want to consider is taking a look at something like Net::ZooKeeper to have a ZK-aware Perl client layer that could wrap WebService::Solr.

-Hoss
http://www.lucidworks.com/
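A minimal Python sketch of the client-side routing question above, under stated assumptions: the collection uses the default compositeId router, a document is routed by hashing its uniqueKey value with MurmurHash3 (x86, 32-bit, seed 0) over the value's UTF-8 bytes, and each shard's "range" in clusterstate.json is a pair of hex bounds read as signed 32-bit integers. Ids containing '!' use a composite scheme that this sketch does not handle, and the clusterstate layout in the test is simplified.

```python
def murmurhash3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86 32-bit; returns an unsigned 32-bit int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    n_blocks = len(data) // 4
    for i in range(n_blocks):  # body: 4-byte little-endian blocks
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    tail = data[4 * n_blocks:]  # 0-3 leftover bytes
    k = 0
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    h ^= len(data)  # finalization mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    h ^= h >> 16
    return h


def to_signed32(x: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed (Java int) value."""
    return x - (1 << 32) if x & 0x80000000 else x


def parse_range(spec: str) -> tuple:
    """Parse a clusterstate range like "80000000-ffffffff" into signed bounds."""
    lo, _, hi = spec.partition("-")
    return to_signed32(int(lo, 16)), to_signed32(int(hi, 16))


def shard_for_id(clusterstate: dict, collection: str, doc_id: str) -> str:
    """Pick the shard whose range contains the hash of doc_id (plain ids only)."""
    if "!" in doc_id:
        raise ValueError("composite ids (containing '!') are not handled in this sketch")
    h = to_signed32(murmurhash3_x86_32(doc_id.encode("utf-8"), seed=0))
    for name, shard in sorted(clusterstate[collection]["shards"].items()):
        lo, hi = parse_range(shard["range"])
        if lo <= h <= hi:
            return name
    raise LookupError(f"no shard range covers hash {h}")
```

A Perl port would presumably need the same three pieces (the hash, the signed range parsing, and the range lookup); the name of the uniqueKey field would still have to come from the schema or from configuration.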
Re: Redis as Solr Cache
This is a neat idea, but it could be too close to Lucene/etc. You could jump up one level in the stack and use Redis/memcache as a distributed HTTP cache in conjunction with Solr's HTTP caching and a proxy. I tried doing this myself with Nginx, but I forget what issue I hit - I think misses needed logic outside of Nginx, but I didn't spend too much time on it.
Tim

On 2 January 2014 07:51, Alexander Ramos Jardim alexander.ramos.jar...@gmail.com wrote:
You touched an interesting point. I am really wondering whether a quick win scenario is even possible. But what would be the advantage of using Redis to keep the Solr cache if each node would keep its own Redis cache?

2013/12/29 Upayavira u...@odoko.co.uk
On Sun, Dec 29, 2013, at 02:35 PM, Alexander Ramos Jardim wrote:
While researching Solr caching options and interesting cases, I bumped into https://github.com/dfdeshom/solr-redis-cache. Does anyone have any experience with this setup, using Redis as a Solr cache?

I see a lot of advantages in having a distributed cache for Solr. One Solr node benefiting from the cache generated on another one would be beautiful.

I see problems too. Performance-wise, I don't know if it would be viable for Solr to write its cache over the network to a Redis master node. And what if I have Solr nodes with different index versions looking at the same cache?

IMO, if Redis isn't used as a distributed cache, I don't think it's possible to get better performance using it.

This idea makes assumptions about how a Solr/Lucene index operates. Certainly, in a SolrCloud setup, each node is responsible for its own committing, and its caches exist for the timespan between commits. Thus, the cache one node needs will not necessarily be the same as the one needed by another node, which might have a commit interval slightly out of sync with the first.

So, whilst this may be possible, and may give some benefits, I'd reckon that it would be a rather substantial engineering exercise, rather than the quick win you seem to be assuming it might be.

Upayavira
--
Alexander Ramos Jardim
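To illustrate Upayavira's point about commit timing, here is a minimal Python sketch (not the design of solr-redis-cache) of a shared query cache whose keys include the index version a node is currently serving. An in-memory OrderedDict stands in for Redis, and index_version is a hypothetical per-node commit-point identifier; two nodes only share entries while they sit on the same version.

```python
from collections import OrderedDict
from typing import Callable, Hashable


class VersionedQueryCache:
    """LRU cache keyed by (index_version, query), standing in for a shared store."""

    def __init__(self, max_entries: int = 1000):
        self._entries = OrderedDict()
        self.max_entries = max_entries
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, index_version: int, query: Hashable,
                       compute: Callable[[], object]) -> object:
        key = (index_version, query)
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        value = compute()  # run the search locally on a miss
        self._entries[key] = value
        if len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used
        return value
```

A node on version 17 and a node on version 18 asking for the same query produce two separate entries, which is exactly the sharing problem described above: the benefit depends on how often nodes happen to be on the same commit point.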
Re: Inconsistent numFound in SC when querying core directly
Very good point. I've seen this issue occur once before when I was playing with 4.3.1 and don't remember it happening since 4.5.0+, so that is good news - we are just behind. For anyone that is curious, on my earlier mention that Zookeeper/clusterstate.json was not taking updates: this was NOT correct. Zookeeper has no issues taking set/creates to clusterstate.json (or any znode), just this one node seemed to stay stuck as state: active while it was very inconsistent for reasons unknown, potentially just bugs. The good news is this will be resolved today with a create/destroy of the bad replica. Thanks all! Tim On 4 December 2013 16:50, Mark Miller markrmil...@gmail.com wrote: Keep in mind, there have been a *lot* of bug fixes since 4.3.1. - Mark On Dec 4, 2013, at 7:07 PM, Tim Vaillancourt t...@elementspace.com wrote: [...]
Re: Inconsistent numFound in SC when querying core directly
I spoke too soon, my plan for fixing this didn't quite work. I've moved this issue into a new thread/topic: No /clusterstate.json updates on Solrcloud 4.3.1 Cores API UNLOAD/CREATE. Thanks all for the help on this one! Tim On 5 December 2013 11:37, Tim Vaillancourt t...@elementspace.com wrote: [...]
No /clusterstate.json updates on Solrcloud 4.3.1 Cores API UNLOAD/CREATE
Hey guys,

I've been having an issue with 1 of my 4 replicas being inconsistent, and have been trying to fix it. At the core of this issue, I've noticed /clusterstate.json doesn't seem to be receiving updates when cores become unhealthy, or even when they are added/removed.

Today I decided I would remove the bad replica from the SolrCloud and force a sync of a new, clean replica, so I ran '/admin/cores?command=UNLOAD&name=name' to drop it. After this, on the instance with the bad replica, the core was removed from solr.xml but strangely NOT from /clusterstate.json in Zookeeper - it remained in Zookeeper unchanged, still with state: active :(.

So, I then manually edited clusterstate.json with a Perl script, removing the JSON data for the bad replica. I checked that all nodes saw the change themselves, and things looked good. Then I brought the node down/up to check that it was properly adding/removing itself from the /live_nodes znode in Zookeeper. That all worked perfectly, too.

Here is the really odd part: when I created a new replica on this node (to replace the bad replica), the core was created on the node, and NO update was made to /clusterstate.json. At this point this node had no cores, no cores with state in /clusterstate.json, and all data dirs deleted, so this is quite confusing.

Upon checking the ACLs on /clusterstate.json, it is world/anyone accessible:

[zk: localhost:2181(CONNECTED) 18] getAcl /clusterstate.json
'world,'anyone
: cdrwa

Also, keep in mind my external Perl script had no issue updating /clusterstate.json. Can anyone suggest why /clusterstate.json isn't getting updated when I create this new core?

One other thing I checked was the health of the Zookeeper ensemble: all 3 Zookeepers have the same mZxid, ctime, mtime, etc. for /clusterstate.json and receive updates without problems; it's just this node that isn't updating Zookeeper somehow.

Any thoughts are much appreciated!

Thanks!
Tim
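For reference, the manual edit described above might look like the following minimal Python sketch (the original was a Perl script, which is not shown). It assumes a clusterstate.json layout of collection, then "shards", then shard, then "replicas", then replica key mapping to an object with a "core" field; the replica keys in the test data are hypothetical. Reading and writing the znode itself is left out, and a hand edit like this can race with updates Solr makes to the same znode.

```python
import json


def remove_replica(clusterstate: dict, collection: str, core_name: str) -> list:
    """Delete every replica whose "core" equals core_name (modifies in place).

    Returns the (shard, replica_key) pairs that were removed.
    """
    removed = []
    for shard_name, shard in clusterstate[collection]["shards"].items():
        replicas = shard.get("replicas", {})
        doomed = [key for key, r in replicas.items() if r.get("core") == core_name]
        for key in doomed:
            del replicas[key]
            removed.append((shard_name, key))
    return removed


def remove_replica_json(text: str, collection: str, core_name: str) -> str:
    """Same as remove_replica, but on the raw JSON text of the znode."""
    state = json.loads(text)
    if not remove_replica(state, collection, core_name):
        raise KeyError(f"core {core_name!r} not found in {collection!r}")
    return json.dumps(state, indent=2, sort_keys=True)
```

Failing loudly when nothing matches is a deliberate choice: a silent no-op write back to ZooKeeper would be easy to mistake for a successful removal.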
Inconsistent numFound in SC when querying core directly
Hey guys,

I'm looking into a strange issue on an unhealthy 4.3.1 SolrCloud with a 3-node external Zookeeper ensemble and 1 collection (2 shards, 2 replicas). Currently we are noticing inconsistent results from the SolrCloud when performing the same simple /select query many times against our collection. Almost every other query, the numFound count (and the returned data) jumps between two very different values.

Initially I suspected a replica in a shard of the collection was inconsistent (and every other request hit that node), so I started sending the same /select query directly to the individual cores of the SolrCloud collection on each instance, only to notice the same problem - the count jumps between two very different values!

I may be incorrect here, but I assumed that when querying a single core of a SolrCloud collection, the SolrCloud routing is bypassed and I am talking directly to a plain/non-SolrCloud core.

As you can see here, the count for 1 core of my SolrCloud collection fluctuates wildly, and that core is only receiving updates, with no deletes to explain the jumps:

solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep numFound
  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]
solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep numFound
  "response":{"numFound":84739144,"start":0,"maxScore":1.0,"docs":[]
solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep numFound
  "response":{"numFound":123596839,"start":0,"maxScore":1.0,"docs":[]
solrcloud [tvaillancourt@prodapp solr_cloud]$ curl -s 'http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0&indent=true'|grep numFound
  "response":{"numFound":84771358,"start":0,"maxScore":1.0,"docs":[]

Could anyone help me understand why the same /select query sent directly to a single core would return inconsistent, flapping results if there are no deletes issued in my app to cause such jumps? Am I incorrect in my assumption that I am querying the core directly?

An interesting observation is that when I do an /admin/cores call to see the docCount of the core's index, it does not fluctuate; only the query result does.

That was hard to explain, hopefully someone has some insight! :)

Thanks!
Tim
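A minimal Python sketch for quantifying the flapping: send the same query a number of times and tally the distinct numFound values. fetch is any zero-argument callable returning a wt=json response body, so it can wrap urllib, saved curl output, or canned test data; for a stable core every attempt should report the same count.

```python
import json
import urllib.request
from collections import Counter
from typing import Callable


def num_found(body: str) -> int:
    """Extract response.numFound from a wt=json /select response body."""
    return json.loads(body)["response"]["numFound"]


def probe(fetch: Callable[[], str], attempts: int = 10) -> Counter:
    """Run the same query `attempts` times and tally each distinct numFound."""
    return Counter(num_found(fetch()) for _ in range(attempts))


def http_fetcher(url: str, timeout: float = 10.0) -> Callable[[], str]:
    """Return a fetch() that GETs url and returns the decoded response body."""
    def fetch() -> str:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    return fetch
```

For example, probe(http_fetcher("http://backend:8983/solr/app_shard2_replica2/select?q=*:*&wt=json&rows=0")) returning more than one key means the same request is being answered from different data.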
Re: Inconsistent numFound in SC when querying core directly
To add two more pieces of data: 1) This occurs with real, conditional queries as well (eg: q=key:timvaillancourt), not just the q=*:* I provided in my email. 2) I've noticed when I bring a node of the SolrCloud down it is remaining state: active in my /clusterstate.json - something is really wrong with this cloud! Would a Zookeeper issue explain my varied results when querying a core directly? Thanks again! Tim On 04/12/13 02:17 PM, Tim Vaillancourt wrote: [...]
Re: Inconsistent numFound in SC when querying core directly
Thanks Markus,

I'm not sure if I'm encountering the same issue. This JIRA mentions differences of tens of docs; I'm seeing differences in the multi-millions of docs, and, even more strangely, it very predictably flaps between a 123M value and an 87M value, a 30M+ doc difference.

Secondly, I'm not comparing values from 2 instances (leader to replica); I'm performing the same curl call to the same core directly and am seeing flapping results each time I perform the query, so this is currently happening within a single instance/core, unless I am misunderstanding how to directly query a core.

Cheers,
Tim

On 04/12/13 02:46 PM, Markus Jelsma wrote:
https://issues.apache.org/jira/browse/SOLR-4260 Join the club Tim! Can you upgrade to trunk or incorporate the latest patches of related issues? You can fix it by trashing the bad node's data, although without multiple clusters it may be difficult to decide which node is bad. We use the latest commits now (since Tuesday) and are still waiting for it to happen again.

-Original message-
From: Tim Vaillancourt t...@elementspace.com
Sent: Wednesday 4th December 2013 23:38
To: solr-user@lucene.apache.org
Subject: Re: Inconsistent numFound in SC when querying core directly
[...]
Re: Inconsistent numFound in SC when querying core directly
Chris, this is extremely helpful and it's silly I didn't think of this sooner! Thanks a lot, this makes the situation much clearer. I will gather some proper data with your suggestion and get back to the thread shortly.

Thanks!!
Tim

On 04/12/13 02:57 PM, Chris Hostetter wrote:
:
: I may be incorrect here, but I assumed when querying a single core of a
: SolrCloud collection, the SolrCloud routing is bypassed and I am talking
: directly to a plain/non-SolrCloud core.

No ... every query received from a client by Solr is handled by a single core -- if that core knows it's part of a SolrCloud collection then it will do a distributed search across a random replica from each shard in that collection.

If you want to bypass the distributed search logic, you have to say so explicitly... To ask an arbitrary replica to only search itself, add distrib=false to the request. Alternatively: you can ask that only certain shard names (or certain explicit replicas) be included in a distributed request.

https://cwiki.apache.org/confluence/display/solr/Distributed+Requests

-Hoss
http://www.lucidworks.com/
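Following Hoss's suggestion, a minimal Python sketch that builds one non-distributed (distrib=false) /select URL per core, so each replica can be checked on its own. The base URL and core name in the example come from this thread; host1 in the test is hypothetical, and any other params you pass through are up to you.

```python
from urllib.parse import urlencode


def core_query_url(base_url: str, core: str, params: dict, distrib: bool = False) -> str:
    """Build a /select URL for one core; distrib=false keeps the search local."""
    query = dict(params)
    query["distrib"] = "true" if distrib else "false"
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(query)}"


def replica_urls(replicas: dict, params: dict) -> dict:
    """Map core name -> non-distributed URL, given core name -> Solr base URL."""
    return {core: core_query_url(base, core, params) for core, base in replicas.items()}
```

For example, core_query_url("http://backend:8983/solr", "app_shard2_replica2", {"q": "*:*", "rows": 0, "wt": "json"}) asks only that core for its count, instead of fanning out to a random replica of each shard.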
Re: Inconsistent numFound in SC when querying core directly
Hey all,

Now that I am getting correct results with distrib=false, I've identified that 1 of my nodes has just one third of the total data set, which totally explains the flapping in results. The fix for this is obvious (rebuild the replica) but the cause is less obvious.

There is definitely more than one issue going on with this SolrCloud (but 1 down thanks to Chris' suggestion!), so I'm guessing the fact that /clusterstate.json doesn't seem to get updated when nodes are brought down/up is the reason why this replica remained in the distributed request chain without recovering/re-replicating from the leader. I imagine my Zookeeper ensemble is having some problems unrelated to Solr that are the real root cause.

Thanks!
Tim

On 04/12/13 03:00 PM, Tim Vaillancourt wrote:
[...]
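With per-replica counts from distrib=false queries in hand, a minimal Python sketch for spotting a replica like this one. It assumes, as in this cluster, that the application issues no deletes, so within a shard the replica with the most documents is taken as the reference and any replica more than a tolerance below it is flagged. The tolerance and the counts in the test are hypothetical.

```python
def lagging_replicas(counts: dict, tolerance: float = 0.01) -> list:
    """Flag replicas whose numFound is well below the best replica in their shard.

    counts maps shard name -> {core name: numFound from a distrib=false query}.
    Returns (shard, core, count, reference_count) tuples.
    """
    flagged = []
    for shard, per_core in sorted(counts.items()):
        best = max(per_core.values())  # assumption: no deletes, so more docs = more complete
        for core, n in sorted(per_core.items()):
            if best and (best - n) / best > tolerance:
                flagged.append((shard, core, n, best))
    return flagged
```

Using the maximum rather than a median matters with only 2 replicas per shard: a median of two values sits halfway between them and would not say which one is short.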
Re: difference between apache tomcat vs Jetty
I (jokingly) propose we take it a step further and drop Java :)! I'm getting tired of trying to scale GC'ing JVMs!
Tim

On 25/10/13 09:02 AM, Mark Miller wrote:
Just to add to the “use jetty for Solr” argument - Solr 5.0 will no longer consider itself a webapp and will consider the fact that Jetty is used an implementation detail. We won’t necessarily make it impossible to use a different container, but the project won’t condone it or support it and may do some things that assume Jetty. Solr is taking over this layer in 5.0.
- Mark

On Oct 25, 2013, at 11:18 AM, Cassandra Targett casstarg...@gmail.com wrote:
In terms of adding or fixing documentation, the Installing Solr page (https://cwiki.apache.org/confluence/display/solr/Installing+Solr) includes a yellow box that says: "Solr ships with a working Jetty server, with optimized settings for Solr, inside the example directory. It is recommended that you use the provided Jetty server for optimal performance. If you absolutely must use a different servlet container then continue to the next section on how to install Solr."

So, it's stated, but maybe not in a way that makes it clear to most users. And maybe it needs to be repeated in another section. Suggestions?

I did find this page, https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+Jetty, which pretty much contradicts the previous text. I'll fix that now. Other recommendations for where the docs could be clearer are welcome.

On Thu, Oct 24, 2013 at 7:14 PM, Tim Vaillancourt t...@elementspace.com wrote:
[...]
Re: difference between apache tomcat vs Jetty
I agree with Jonathan (and Shawn on the Jetty explanation), I think the docs should make this a bit more clear - I notice many people choosing Tomcat and then learning these details after, possibly regretting it. I'd be glad to modify the docs but I want to be careful how it is worded. Is it fair to go as far as saying Jetty is 100% THE recommended container for Solr, or should a recommendation be avoided, and maybe just a list of pros/cons? Cheers, Tim
Re: difference between apache tomcat vs Jetty
Hmm, that's an interesting move. I'm on the fence on that one, but it surely simplifies some things. Good info, thanks! Tim On 24 October 2013 16:46, Anshum Gupta ans...@anshumgupta.net wrote: Thought you may want to have a look at this: https://issues.apache.org/jira/browse/SOLR-4792 P.S.: There are no timelines for 5.0 for now, but it's the future nevertheless. On Fri, Oct 25, 2013 at 3:39 AM, Tim Vaillancourt t...@elementspace.com wrote: I agree with Jonathan (and Shawn on the Jetty explanation); I think the docs should make this a bit clearer. I notice many people choosing Tomcat and then learning these details afterwards, possibly regretting it. I'd be glad to modify the docs, but I want to be careful about how it is worded. Is it fair to go as far as saying Jetty is 100% THE recommended container for Solr, or should a recommendation be avoided, with maybe just a list of pros/cons instead? Cheers, Tim -- Anshum Gupta http://www.anshumgupta.net
Re: Skipping caches on a /select
Thanks Yonik, does cache=false apply to all caches? The docs make it sound like it is for the filterCache only, but I could be misunderstanding. When I force a commit and perform a /select query many times with cache=false, I notice my query still gets cached (my guess is in the queryResultCache). At first the query takes 500ms+, then all subsequent requests take 0-1ms. I'll confirm this queryResultCache assumption today. Cheers, Tim On 16/10/13 06:33 PM, Yonik Seeley wrote: On Wed, Oct 16, 2013 at 6:18 PM, Tim Vaillancourt t...@elementspace.com wrote: I am debugging some /select queries on my Solr tier and would like to see if there is a way to tell Solr to skip the caches on a given /select query if it happens to ALREADY be in the cache. Live queries are being inserted and read from the caches, but I want my debug queries to bypass the cache entirely. I do know about the cache=false param (that causes the results of a select to not be INSERTED into the cache), but what I am looking for instead is a way to tell Solr to not read the cache at all, even if there actually is a cached result for my query. Yeah, cache=false for q or fq should already not use the cache at all (read or write). -Yonik
Re: Skipping caches on a /select
Awesome, this makes a lot of sense now. Thanks a lot, guys. Currently the only mention of this setting in the docs is under filterQuery on the "SolrCaching" page, as: "Solr3.4 Adding the localParam flag of {!cache=false} to a query will prevent the filterCache from being consulted for that query." I will update the docs sometime soon to reflect that this can apply to any query (q or fq). Cheers, Tim On 17/10/13 01:44 PM, Chris Hostetter wrote: : Does "cache=false" apply to all caches? The docs make it sound like it is for : filterCache only, but I could be misunderstanding. it's per *query* -- not per cache, or per request... /select?q={!cache=true}foo&fq={!cache=false}bar&fq={!cache=true}baz ...should cause 1 lookup/insert in the filterCache (baz) and 1 lookup/insert into the queryResultCache (for the main query with its associated filters and pagination) -Hoss
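The Java below is a sketch of Hoss's per-query example. It assembles the same /select query string and URL-encodes each `{!cache=...}` local-param prefix, so the braces, `!` and `=` arrive intact as parameter values. `SelectUrlBuilder` is a hypothetical helper written for illustration and is not part of SolrJ. A real request would also need a host, port and core path in front of `/select`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not SolrJ): builds a /select query string in which each
// q/fq carries its own {!cache=...} local param, as in Hoss's example.
class SelectUrlBuilder {
    private final List<String> params = new ArrayList<>();

    // Prefixes the query with {!cache=true|false} and URL-encodes the result.
    private static String withCacheFlag(String query, boolean cache) {
        try {
            return URLEncoder.encode("{!cache=" + cache + "}" + query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is guaranteed to be available on every JVM.
            throw new IllegalStateException(e);
        }
    }

    SelectUrlBuilder q(String query, boolean cache) {
        params.add("q=" + withCacheFlag(query, cache));
        return this;
    }

    SelectUrlBuilder fq(String filter, boolean cache) {
        params.add("fq=" + withCacheFlag(filter, cache));
        return this;
    }

    // Joins all parameters with '&' after the given path.
    String build(String path) {
        return path + "?" + String.join("&", params);
    }

    public static void main(String[] args) {
        String url = new SelectUrlBuilder()
                .q("foo", true)
                .fq("bar", false)
                .fq("baz", true)
                .build("/select");
        // prints /select?q=%7B%21cache%3Dtrue%7Dfoo&fq=%7B%21cache%3Dfalse%7Dbar&fq=%7B%21cache%3Dtrue%7Dbaz
        System.out.println(url);
    }
}
```

According to Hoss's reply, `bar` skips the filterCache entirely with these flags. `baz` and the main query (with its filters and pagination) are looked up or inserted as usual.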
Skipping caches on a /select
Hey guys, I am debugging some /select queries on my Solr tier and would like to see if there is a way to tell Solr to skip the caches on a given /select query if it happens to ALREADY be in the cache. Live queries are being inserted and read from the caches, but I want my debug queries to bypass the cache entirely. I do know about the cache=false param (that causes the results of a select to not be INSERTED into the cache), but what I am looking for instead is a way to tell Solr to not read the cache at all, even if there actually is a cached result for my query. Is there a way to do this (without disabling my caches in solrconfig.xml), or is this a feature request? Thanks! Tim Vaillancourt
Re: SolrCloud on SSL
Not important, but I'm also curious why you would want SSL on Solr (it adds overhead and complexity, makes things harder to troubleshoot, etc.)? To avoid the overhead, could you put Solr on a separate VLAN (with ACLs to client servers)? Cheers, Tim On 12 October 2013 17:30, Shawn Heisey s...@elyograg.org wrote: On 10/11/2013 9:38 AM, Christopher Gross wrote: On Fri, Oct 11, 2013 at 11:08 AM, Shawn Heisey s...@elyograg.org wrote: On 10/11/2013 8:17 AM, Christopher Gross wrote: Is there a spot in a Solr configuration where I can set this up to use HTTPS? From what I can tell, not yet. https://issues.apache.org/jira/browse/SOLR-3854 https://issues.apache.org/jira/browse/SOLR-4407 https://issues.apache.org/jira/browse/SOLR-4470 Dang. Christopher, I was just looking through Solr source code for a completely different issue, and it seems that there *IS* a way to do this in your configuration. If you were to use "https://hostname" or "https://ipaddress" as the host parameter in your solr.xml file on each machine, it should do what you want. The parameter is described here, but not the behavior that I have discovered: http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params Boring details: In the org.apache.solr.cloud package, there is a ZkController class. The getHostAddress method is where I discovered that you can do this. If you could try this out and confirm that it works, I will get the wiki page updated and look into the Solr reference guide as well. Thanks, Shawn
fq caching question
Hey guys, Sorry for such a simple question, but I am curious about the differences in caching between a combined filter query and many separate filter queries. Here are 2 example queries, one with separate fqs, one combined: 1) /select?q=*:*&fq=type:bid&fq=user_id:3 2) /select?q=*:*&fq=(type:bid%20AND%20user_id:3) For query #1: am I correct that the first query will keep 2 independent entries in the filterCache, for type:bid and user_id:3? For query #2: is it correct that the 2nd query will keep 1 entry in the filterCache that satisfies all conditions? Lastly, is it a fair statement that under general query patterns, many separate filter queries are more cacheable than 1 combined one? E.g., if I performed query #2 (in the filterCache) and then changed the user_id, nothing about my new query is cacheable, correct (but if I used 2 separate filter queries, then 1 of 2 is still cached)? Cheers, Tim Vaillancourt
Re: fq caching question
Thanks Koji! Cheers, Tim On 14/10/13 03:56 PM, Koji Sekiguchi wrote: Hi Tim, (13/10/15 5:22), Tim Vaillancourt wrote: Hey guys, Sorry for such a simple question, but I am curious about the differences in caching between a combined filter query and many separate filter queries. Here are 2 example queries, one with separate fqs, one combined: 1) /select?q=*:*&fq=type:bid&fq=user_id:3 2) /select?q=*:*&fq=(type:bid%20AND%20user_id:3) For query #1: am I correct that the first query will keep 2 independent entries in the filterCache, for type:bid and user_id:3? Correct. For query #2: is it correct that the 2nd query will keep 1 entry in the filterCache that satisfies all conditions? Correct. Lastly, is it a fair statement that under general query patterns, many separate filter queries are more cacheable than 1 combined one? E.g., if I performed query #2 (in the filterCache) and then changed the user_id, nothing about my new query is cacheable, correct (but if I used 2 separate filter queries, then 1 of 2 is still cached)? Yes, it is. koji
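Koji's three answers can be illustrated with a toy model in Java. It treats the filterCache as a set keyed by the literal fq string and counts hits across two requests that differ only in user_id. This is a deliberate simplification and not Solr's implementation. The real filterCache stores document sets keyed by the parsed query, has a size limit and eviction, and starts fresh (optionally autowarmed) when a new searcher opens.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: one cache entry per distinct fq string, unbounded, never
// invalidated. Real Solr caches document sets keyed by the parsed query.
class ToyFilterCache {
    private final Set<String> entries = new HashSet<>();
    private int hits = 0;

    // Looks up every fq of one request; a miss inserts a new entry.
    void run(List<String> fqs) {
        for (String fq : fqs) {
            if (!entries.add(fq)) {
                hits++; // already present: counts as a cache hit
            }
        }
    }

    int hits() { return hits; }
    int size() { return entries.size(); }

    public static void main(String[] args) {
        ToyFilterCache separate = new ToyFilterCache();
        separate.run(Arrays.asList("type:bid", "user_id:3"));
        separate.run(Arrays.asList("type:bid", "user_id:4")); // type:bid is reused

        ToyFilterCache combined = new ToyFilterCache();
        combined.run(Arrays.asList("type:bid AND user_id:3"));
        combined.run(Arrays.asList("type:bid AND user_id:4")); // nothing is reused

        // prints separate: hits=1 entries=3
        System.out.println("separate: hits=" + separate.hits() + " entries=" + separate.size());
        // prints combined: hits=0 entries=2
        System.out.println("combined: hits=" + combined.hits() + " entries=" + combined.size());
    }
}
```

The separate-fq version pays for one extra entry but reuses `type:bid`. The combined version reuses nothing once user_id changes, which is the trade-off Tim asks about.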
Re: {soft}Commit and cache flushing
Apologies, all. I think the suggestion that I was replying to get noticed is what irked me; otherwise I would have moved on. I'll follow this advice. Cheers, Tim On 9 October 2013 05:20, Erick Erickson erickerick...@gmail.com wrote: Tim: I think you're misinterpreting. By replying to a post with the subject "{soft}Commit and cache flushing" but going in a different direction, it's easy for people to think "I'm not interested in that thread, I'll ignore it", thereby missing the fact that you're asking a somewhat different question that they might have information about. It's not about whether you're doing anything particularly wrong with the question. It's about making it easy for people to help. See http://people.apache.org/~hossman/#threadhijack Best, Erick On Tue, Oct 8, 2013 at 6:23 PM, Tim Vaillancourt t...@elementspace.com wrote: I have a genuine question with substance here. If anything, this nonconstructive, rude response was to get noticed. Thanks for contributing to the discussion. Tim On 8 October 2013 05:31, Dmitry Kan solrexp...@gmail.com wrote: Tim, I suggest you open a new thread and not reply to this one to get noticed. Dmitry On Mon, Oct 7, 2013 at 9:44 PM, Tim Vaillancourt t...@elementspace.com wrote: Is there a way to make autoCommit only commit if there are pending changes, i.e. if there are 0 adds pending commit, don't autoCommit (open a searcher and wipe the caches)? Cheers, Tim On 2 October 2013 00:52, Dmitry Kan solrexp...@gmail.com wrote: Right. We've only got the auto hard commit configured atm. The soft commits are controlled on the client. It was just easier to implement the first version of our internal commit policy, which commits to all Solr instances at once. This is where we noticed the reported behavior. On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam bram.van...@intix.eu wrote: if there are no modifications to an index and a softCommit or hardCommit is issued, then Solr flushes the cache. Indeed.
The easiest way to work around this is by disabling auto commits and only committing when you have to.
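Bram's workaround, together with Tim's question about committing only when changes are pending, could be approximated on the client side. The sketch below is hypothetical. `Committer` stands in for whatever actually sends the commit (for example, a SolrJ commit call). The pending-add counter is bookkeeping done by the application, not a Solr feature or setting.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for whatever actually sends the commit to Solr.
interface Committer {
    void commit();
}

// Client-side replacement for autoCommit: counts adds since the last commit and
// skips the commit entirely when nothing is pending.
class PendingChangesCommitGate {
    private final AtomicLong pending = new AtomicLong();
    private final Committer committer;

    PendingChangesCommitGate(Committer committer) {
        this.committer = committer;
    }

    // Call after each successful add/update request.
    void recordAdds(long docs) {
        pending.addAndGet(docs);
    }

    // Call on a schedule; returns true only if a commit was sent. Adds recorded
    // while a commit is in flight count toward the next tick, which may
    // occasionally cause one extra commit.
    boolean commitIfPending() {
        long n = pending.getAndSet(0);
        if (n == 0) {
            return false;
        }
        try {
            committer.commit();
            return true;
        } catch (RuntimeException e) {
            pending.addAndGet(n); // put the count back so the next tick retries
            throw e;
        }
    }

    public static void main(String[] args) {
        int[] commits = {0};
        PendingChangesCommitGate gate = new PendingChangesCommitGate(() -> commits[0]++);
        gate.commitIfPending();   // nothing pending: skipped
        gate.recordAdds(10);
        gate.commitIfPending();   // one commit sent
        gate.commitIfPending();   // nothing new: skipped
        System.out.println("commits=" + commits[0]); // prints commits=1
    }
}
```

A scheduler would call `commitIfPending()` at the interval autoCommit used to cover. When nothing was added, no commit is sent. Given the behavior described in this thread, that should leave the current searcher and its caches alone.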
Re: solr cpu usage
Yes, you've saved us all lots of time with this article. I'm about to do the same for the old "Jetty or Tomcat?" container question ;). Tim On 7 October 2013 18:55, Erick Erickson erickerick...@gmail.com wrote: Tim: Thanks! Mostly I wrote it to have something official-looking to hide behind when I didn't have a good answer to the hardware sizing question :). On Mon, Oct 7, 2013 at 2:48 PM, Tim Vaillancourt t...@elementspace.com wrote: Fantastic article! Tim On 5 October 2013 18:14, Erick Erickson erickerick...@gmail.com wrote: From my perspective, your question is almost impossible to answer; there are too many variables. See: http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best, Erick On Thu, Oct 3, 2013 at 9:38 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, More CPU cores means more concurrency. This is good if you need to handle high query rates. Faster cores mean lower query latency, assuming you are not bottlenecked by memory, disk IO or network IO. So what is ideal for you depends on your concurrency and latency needs. Otis Solr ElasticSearch Support http://sematext.com/ On Oct 1, 2013 9:33 AM, adfel70 adfe...@gmail.com wrote: Hi, we're building a spec for a machine to purchase. We're going to buy 10 machines. We aren't sure yet how many processes we will run per machine. The question is: should we buy faster CPUs with fewer cores or slower CPUs with more cores? In any case we will have 2 CPUs in each machine. Should we buy a 2.6GHz CPU with 8 cores or a 3.5GHz CPU with 4 cores? What will we gain by having many cores? What kinds of usage would make the CPU the bottleneck? -- View this message in context: http://lucene.472066.n3.nabble.com/solr-cpu-usage-tp4092938.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: {soft}Commit and cache flushing
I have a genuine question with substance here. If anything, this nonconstructive, rude response was to get noticed. Thanks for contributing to the discussion. Tim On 8 October 2013 05:31, Dmitry Kan solrexp...@gmail.com wrote: Tim, I suggest you open a new thread and not reply to this one to get noticed. Dmitry On Mon, Oct 7, 2013 at 9:44 PM, Tim Vaillancourt t...@elementspace.com wrote: Is there a way to make autoCommit only commit if there are pending changes, i.e. if there are 0 adds pending commit, don't autoCommit (open a searcher and wipe the caches)? Cheers, Tim On 2 October 2013 00:52, Dmitry Kan solrexp...@gmail.com wrote: Right. We've only got the auto hard commit configured atm. The soft commits are controlled on the client. It was just easier to implement the first version of our internal commit policy, which commits to all Solr instances at once. This is where we noticed the reported behavior. On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam bram.van...@intix.eu wrote: if there are no modifications to an index and a softCommit or hardCommit is issued, then Solr flushes the cache. Indeed. The easiest way to work around this is by disabling auto commits and only committing when you have to.
Re: {soft}Commit and cache flushing
Is there a way to make autoCommit only commit if there are pending changes, i.e. if there are 0 adds pending commit, don't autoCommit (open a searcher and wipe the caches)? Cheers, Tim On 2 October 2013 00:52, Dmitry Kan solrexp...@gmail.com wrote: Right. We've only got the auto hard commit configured atm. The soft commits are controlled on the client. It was just easier to implement the first version of our internal commit policy, which commits to all Solr instances at once. This is where we noticed the reported behavior. On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam bram.van...@intix.eu wrote: if there are no modifications to an index and a softCommit or hardCommit is issued, then Solr flushes the cache. Indeed. The easiest way to work around this is by disabling auto commits and only committing when you have to.
Re: solr cpu usage
Fantastic article! Tim On 5 October 2013 18:14, Erick Erickson erickerick...@gmail.com wrote: From my perspective, your question is almost impossible to answer; there are too many variables. See: http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best, Erick On Thu, Oct 3, 2013 at 9:38 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, More CPU cores means more concurrency. This is good if you need to handle high query rates. Faster cores mean lower query latency, assuming you are not bottlenecked by memory, disk IO or network IO. So what is ideal for you depends on your concurrency and latency needs. Otis Solr ElasticSearch Support http://sematext.com/ On Oct 1, 2013 9:33 AM, adfel70 adfe...@gmail.com wrote: Hi, we're building a spec for a machine to purchase. We're going to buy 10 machines. We aren't sure yet how many processes we will run per machine. The question is: should we buy faster CPUs with fewer cores or slower CPUs with more cores? In any case we will have 2 CPUs in each machine. Should we buy a 2.6GHz CPU with 8 cores or a 3.5GHz CPU with 4 cores? What will we gain by having many cores? What kinds of usage would make the CPU the bottleneck? -- View this message in context: http://lucene.472066.n3.nabble.com/solr-cpu-usage-tp4092938.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: App server?
Jetty should be sufficient, and it is the more common container for Solr. Also, Solr's tests are written for Jetty. Lastly, I'd argue Jetty is just as enterprise-grade as Tomcat; Google App Engine (running lots of enterprise workloads) is Jetty-based, for example. Cheers, Tim On 2 October 2013 15:44, Mark static.void@gmail.com wrote: Is Jetty sufficient for running Solr, or should I go with something a little more enterprise like Tomcat? Any others?
Re: SolrCloud 4.x hangs under high update volume
) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1083) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:379) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1017) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:445) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:260) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:225) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:596) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:527) at java.lang.Thread.run(Thread.java:724) On your live_nodes question, I don't have historical data on this from when the crash occurred, which I guess is what you're looking for. I could add this to our monitoring for future tests, however. I'd be glad to continue further testing, but I think first more monitoring is needed to understand this further. Could we come up with a list of metrics that would be useful to see following another test and successful crash? Metrics needed: 1) # of live_nodes. 2) Full stack traces. 3) CPU used by Solr's JVM specifically (instead of system-wide). 4) Solr's JVM thread count (already done) 5) ? 
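For item 4 in that list, the standard `java.lang.management.ThreadMXBean` exposes the JVM thread count. Its peak count would also capture a spike like the ~10k threads discussed elsewhere in this thread. This minimal sketch reads the counts for the JVM it runs in. Sampling a running Solr would instead mean reading the same MXBean over remote JMX, which is assumed here rather than shown. `JvmThreadSample` and the `nearLimit` threshold are illustrative names only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Snapshot of the current JVM's thread counts via the standard ThreadMXBean.
class JvmThreadSample {
    final int live;
    final int peak;
    final int daemon;

    JvmThreadSample(int live, int peak, int daemon) {
        this.live = live;
        this.peak = peak;
        this.daemon = daemon;
    }

    static JvmThreadSample take() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int live = mx.getThreadCount();        // currently live threads
        int peak = mx.getPeakThreadCount();    // max since JVM start (or reset)
        int daemon = mx.getDaemonThreadCount();
        return new JvmThreadSample(live, peak, daemon);
    }

    // True when the live count is within `margin` of `limit` (for example an
    // OS per-user process limit); both values are assumptions you supply.
    boolean nearLimit(int limit, int margin) {
        return live >= limit - margin;
    }

    public static void main(String[] args) {
        JvmThreadSample s = take();
        System.out.println("live=" + s.live + " peak=" + s.peak + " daemon=" + s.daemon);
    }
}
```

Logging `live` and `peak` on a fixed interval alongside the other metrics would show whether the thread count climbs gradually or jumps just before a crash.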
Cheers, Tim Vaillancourt On 6 September 2013 13:11, Mark Miller markrmil...@gmail.com wrote: Did you ever get to index that long before without hitting the deadlock? There really isn't anything negative the patch could be introducing, other than allowing for some more threads to possibly run at once. If I had to guess, I would say it's likely this patch fixes the deadlock issue and you're seeing another issue - which looks like the system cannot keep up with the requests or something for some reason - perhaps due to some OS networking settings or something (more guessing). Connection refused generally happens when there is nothing listening on the port. Do you see anything interesting change with the rest of the system? CPU usage spikes or something like that? Clamping down further on the overall number of threads might help (which would require making something configurable). How many nodes are listed in ZK under live_nodes? Mark Sent from my iPhone On Sep 6, 2013, at 12:02 PM, Tim Vaillancourt t...@elementspace.com wrote: Hey guys, (copy of my post to SOLR-5216) We tested this patch and unfortunately encountered some serious issues after a few hours of 500 update-batches/sec. Our update batch is 10 docs, so we are writing about 5000 docs/sec total, using autoCommit to commit the updates (no explicit commits). Our environment: Solr 4.3.1 w/SOLR-5216 patch. Jetty 9, Java 1.7. 3 Solr instances, 1 per physical server. 1 collection. 3 shards. 2 replicas (each instance is a leader and a replica). Soft autoCommit is 1000ms. Hard autoCommit is 15000ms. After about 6 hours of stress-testing this patch, we see many of these stalled transactions (below), and the Solr instances start to see each other as down, flooding our Solr logs with Connection Refused exceptions, and otherwise no obviously useful logs that I could see. I did notice some stalled transactions on both /select and /update, however. This never occurred without this patch.
Stack trace of the stalled /select: http://pastebin.com/Y1NCrXGC Stack trace of the stalled /update: http://pastebin.com/cFLbC8Y9 Lastly, I have a summary of the ERROR-severity logs from this 24-hour soak. My script normalizes the ERROR-severity stack traces and returns them in order of occurrence. Summary of my solr.log: http://pastebin.com/pBdMAWeb Thanks! Tim Vaillancourt On 6 September 2013 07:27, Markus Jelsma markus.jel...@openindex.io wrote: Thanks! -Original message- From: Erick Erickson erickerick...@gmail.com Sent: Friday 6th September 2013 16:20 To: solr-user@lucene.apache.org Subject: Re: SolrCloud 4.x hangs under high update volume Markus: See: https://issues.apache.org/jira/browse/SOLR-5216 On Wed, Sep 4, 2013 at 11:04 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi Mark, Got an issue to watch? Thanks, Markus
Re: SolrCloud 4.x hangs under high update volume
Lol at breaking during a demo - it's always the way! :) I agree, we are just tiptoeing around the issue, but waiting for 4.5 is definitely an option if we get by for now in testing; patched Solr versions seem to make people uneasy sometimes :). Seeing as there seems to be some danger in SOLR-5216 (in some ways it blows up worse due to fewer limits on threads), I'm guessing only SOLR-5232 and SOLR-4816 are making it into 4.5? I feel those 2 in combination will make a world of difference! Thanks so much again, guys! Tim On 12 September 2013 03:43, Erick Erickson erickerick...@gmail.com wrote: Fewer client threads updating makes sense, and going to 1 core also seems like it might help. But it's all a crapshoot unless the underlying cause gets fixed. Both would improve things, but you'll still hit the problem sometime, probably when doing a demo for your boss ;). Adrien has branched the code for SOLR 4.5 in preparation for a release candidate tentatively scheduled for next week. You might just start working with that branch if you can, rather than applying individual patches... I suspect there'll be a couple more changes to this code (looks like Shikhar already raised an issue, for instance) before 4.5 is finally cut... FWIW, Erick On Thu, Sep 12, 2013 at 2:13 AM, Tim Vaillancourt t...@elementspace.com wrote: Thanks Erick! Yeah, I think the next step will be CloudSolrServer with the SOLR-4816 patch. I think that is a very, very useful patch, by the way. SOLR-5232 seems promising as well. I see your point on the more-shards idea; this is obviously a global/instance-level lock. If I really had to, I suppose I could run more Solr instances to reduce locking then? Currently I have 2 cores per instance and I could go 1-to-1 to simplify things. The good news is we seem to be more stable since changing to a bigger client-to-Solr batch size and fewer client threads updating.
Cheers, Tim On 11/09/13 04:19 AM, Erick Erickson wrote: If you use CloudSolrServer, you need to apply SOLR-4816 or use a recent copy of the 4x branch. By recent, I mean like today; it looks like Mark applied this early this morning. But several reports indicate that this will solve your problem. I would expect that increasing the number of shards would make the problem worse, not better. There's also SOLR-5232... Best Erick On Tue, Sep 10, 2013 at 5:20 PM, Tim Vaillancourt t...@elementspace.com wrote: Hey guys, Based on my understanding of the problem we are encountering, I feel we've been able to reduce the likelihood of this issue by making the following changes to our app's usage of SolrCloud: 1) We increased our document batch size to 200 from 10 - our app batches updates to reduce HTTP requests/overhead. The theory is that increasing the batch size reduces the likelihood of this issue happening. 2) We reduced to 1 application node sending updates to SolrCloud - we write Solr updates to Redis, and have previously had 4 application nodes pushing the updates to Solr (popping off the Redis queue). Reducing the number of nodes pushing to Solr reduces the concurrency on SolrCloud. 3) Fewer threads pushing to SolrCloud - due to the increase in batch size, we were able to go down to 5 update threads on the update-pushing app (from 10 threads). To be clear, the above only reduces the likelihood of the issue happening, and DOES NOT actually resolve the issue at hand. If we happen to encounter issues with the above 3 changes, the next steps (I could use some advice on) are: 1) Increase the number of shards (2x) - the theory here is that this reduces the locking on shards because there are more shards. Am I onto something here, or will this not help at all? 2) Use CloudSolrServer - currently we have a plain old least-connection HTTP VIP. If we go direct to what we need to update, this will reduce concurrency in SolrCloud a bit. Thoughts? Thanks, all!
Cheers, Tim On 6 September 2013 14:47, Tim Vaillancourt t...@elementspace.com wrote: Enjoy your trip, Mark! Thanks again for the help! Tim On 6 September 2013 14:18, Mark Miller markrmil...@gmail.com wrote: Okay, thanks, useful info. Getting on a plane, but I'll look more at this soon. That 10k thread spike is good to know - that's no good and could easily be part of the problem. We want to keep that from happening. Mark Sent from my iPhone On Sep 6, 2013, at 2:05 PM, Tim Vaillancourt t...@elementspace.com wrote: Hey Mark, The farthest we've made it at the same batch size/volume was 12 hours without this patch, but that isn't consistent. Sometimes we would only get to 6 hours or less. During the crash I can see an amazing spike in threads to 10k, which is essentially our ulimit
Re: SolrCloud 4.x hangs under high update volume
That makes sense; thanks, Erick and Mark, for your help! :) I'll see if I can find a place to assist with the testing of SOLR-5232. Cheers, Tim On 12 September 2013 11:16, Mark Miller markrmil...@gmail.com wrote: Right, unfortunately I don't see SOLR-5232 making 4.5. It could perhaps make a 4.5.1 - it does resolve a critical issue - but 4.5 is in motion and SOLR-5232 is not quite ready - we need some testing. - Mark On Sep 12, 2013, at 2:12 PM, Erick Erickson erickerick...@gmail.com wrote: My take on it is this, assuming I'm reading this right: 1) SOLR-5216 - probably not going anywhere; 5232 will take care of it. 2) SOLR-5232 - expected to fix the underlying issue no matter whether you're using CloudSolrServer from SolrJ or sending lots of updates from lots of clients. 3) SOLR-4816 - use this patch and CloudSolrServer from SolrJ in the meantime. I don't quite know whether SOLR-5232 will make it into 4.5 or not; it hasn't been committed anywhere yet. The Solr 4.5 release is imminent; RC0 is looking like it'll be ready to cut next week, so it might not be included. Best, Erick On Thu, Sep 12, 2013 at 1:42 PM, Tim Vaillancourt t...@elementspace.com wrote: Lol at breaking during a demo - it's always the way! :) I agree, we are just tiptoeing around the issue, but waiting for 4.5 is definitely an option if we get by for now in testing; patched Solr versions seem to make people uneasy sometimes :). Seeing as there seems to be some danger in SOLR-5216 (in some ways it blows up worse due to fewer limits on threads), I'm guessing only SOLR-5232 and SOLR-4816 are making it into 4.5? I feel those 2 in combination will make a world of difference! Thanks so much again, guys! Tim On 12 September 2013 03:43, Erick Erickson erickerick...@gmail.com wrote: Fewer client threads updating makes sense, and going to 1 core also seems like it might help. But it's all a crapshoot unless the underlying cause gets fixed.
Both would improve things, but you'll still hit the problem sometime, probably when doing a demo for your boss ;). Adrien has branched the code for SOLR 4.5 in preparation for a release candidate tentatively scheduled for next week. You might just start working with that branch if you can, rather than applying individual patches... I suspect there'll be a couple more changes to this code (looks like Shikhar already raised an issue, for instance) before 4.5 is finally cut... FWIW, Erick On Thu, Sep 12, 2013 at 2:13 AM, Tim Vaillancourt t...@elementspace.com wrote: Thanks Erick! Yeah, I think the next step will be CloudSolrServer with the SOLR-4816 patch. I think that is a very, very useful patch, by the way. SOLR-5232 seems promising as well. I see your point on the more-shards idea; this is obviously a global/instance-level lock. If I really had to, I suppose I could run more Solr instances to reduce locking then? Currently I have 2 cores per instance and I could go 1-to-1 to simplify things. The good news is we seem to be more stable since changing to a bigger client-to-Solr batch size and fewer client threads updating. Cheers, Tim On 11/09/13 04:19 AM, Erick Erickson wrote: If you use CloudSolrServer, you need to apply SOLR-4816 or use a recent copy of the 4x branch. By recent, I mean like today; it looks like Mark applied this early this morning. But several reports indicate that this will solve your problem. I would expect that increasing the number of shards would make the problem worse, not better. There's also SOLR-5232... Best Erick On Tue, Sep 10, 2013 at 5:20 PM, Tim Vaillancourt t...@elementspace.com wrote: Hey guys, Based on my understanding of the problem we are encountering, I feel we've been able to reduce the likelihood of this issue by making the following changes to our app's usage of SolrCloud: 1) We increased our document batch size to 200 from 10 - our app batches updates to reduce HTTP requests/overhead.
The theory is that increasing the batch size reduces the likelihood of this issue happening. 2) We reduced to 1 application node sending updates to SolrCloud - we write Solr updates to Redis, and have previously had 4 application nodes pushing the updates to Solr (popping off the Redis queue). Reducing the number of nodes pushing to Solr reduces the concurrency on SolrCloud. 3) Fewer threads pushing to SolrCloud - due to the increase in batch size, we were able to go down to 5 update threads on the update-pushing app (from 10 threads). To be clear, the above only reduces the likelihood of the issue happening, and DOES NOT actually resolve the issue at hand. If we happen to encounter issues with the above 3 changes, the next steps (I could use some advice on) are: 1) Increase
Re: SolrCloud 4.x hangs under high update volume
Hey guys, Based on my understanding of the problem we are encountering, I feel we've been able to reduce the likelihood of this issue by making the following changes to our app's usage of SolrCloud: 1) We increased our document batch size to 200 from 10 - our app batches updates to reduce HTTP requests/overhead. The theory is that increasing the batch size reduces the likelihood of this issue happening. 2) We reduced to 1 application node sending updates to SolrCloud - we write Solr updates to Redis, and have previously had 4 application nodes pushing the updates to Solr (popping off the Redis queue). Reducing the number of nodes pushing to Solr reduces the concurrency on SolrCloud. 3) Fewer threads pushing to SolrCloud - due to the increase in batch size, we were able to go down to 5 update threads on the update-pushing app (from 10 threads). To be clear, the above only reduces the likelihood of the issue happening, and DOES NOT actually resolve the issue at hand. If we happen to encounter issues with the above 3 changes, the next steps (I could use some advice on) are: 1) Increase the number of shards (2x) - the theory here is that this reduces the locking on shards because there are more shards. Am I onto something here, or will this not help at all? 2) Use CloudSolrServer - currently we have a plain old least-connection HTTP VIP. If we go direct to what we need to update, this will reduce concurrency in SolrCloud a bit. Thoughts? Thanks, all! Cheers, Tim On 6 September 2013 14:47, Tim Vaillancourt t...@elementspace.com wrote: Enjoy your trip, Mark! Thanks again for the help! Tim On 6 September 2013 14:18, Mark Miller markrmil...@gmail.com wrote: Okay, thanks, useful info. Getting on a plane, but I'll look more at this soon. That 10k thread spike is good to know - that's no good and could easily be part of the problem. We want to keep that from happening.
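Point 1 (larger batches) mostly changes how many HTTP update requests the cluster sees. As a rough illustration, take the ~5000 docs/sec reported in the earlier test. Batches of 10 mean about 500 requests/sec, while batches of 200 mean about 25. The Java sketch below shows the client-side partitioning and that arithmetic. It assumes nothing about the Solr client in use, and `UpdateBatcher` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Client-side batching: split pending documents into batches of at most
// batchSize, one batch per HTTP update request. No Solr client is assumed.
class UpdateBatcher {
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            // Copy so each batch is independent of the source list.
            batches.add(new ArrayList<>(docs.subList(start, end)));
        }
        return batches;
    }

    // Update requests/sec needed to sustain docsPerSec at a batch size, rounded up.
    static long requestsPerSecond(long docsPerSec, int batchSize) {
        return (docsPerSec + batchSize - 1) / batchSize;
    }

    public static void main(String[] args) {
        System.out.println(requestsPerSecond(5000, 10));  // prints 500
        System.out.println(requestsPerSecond(5000, 200)); // prints 25
    }
}
```

Fewer, larger requests reduce per-request overhead and concurrent update handling on the Solr side. As the thread stresses, though, this lowers the likelihood of the hang rather than fixing its cause.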
Mark Sent from my iPhone On Sep 6, 2013, at 2:05 PM, Tim Vaillancourt t...@elementspace.com wrote: Hey Mark, The farthest we've made it at the same batch size/volume was 12 hours without this patch, but that isn't consistent. Sometimes we would only get to 6 hours or less. During the crash I can see an amazing spike in threads to 10k, which is essentially our ulimit for the JVM, but strangely I see none of the "OutOfMemory: cannot open native thread" errors that always follow this. Weird! We also notice a spike in CPU around the crash. The instability caused some shard recovery/replication though, so that CPU may be a symptom of the replication, or is possibly the root cause. The CPU spikes fairly sharply from about 20-30% utilization (system + user) to 60%, so the CPU, while spiking, isn't quite pinned (very beefy Dell R720s - 16-core Xeons, whole index is in 128GB RAM, 6xRAID10 15k). More on resources: our disk I/O seemed to spike about 2x during the crash (from about 1300kbps written to 3500kbps), but this may have been the replication, or ERROR logging (we generally log nothing, due to WARN severity, unless something breaks).
Lastly, I found this stack trace occurring frequently, and have no idea what it is (may be useful or not): java.lang.IllegalStateException : at org.eclipse.jetty.server.Response.resetBuffer(Response.java:964) at org.eclipse.jetty.server.Response.sendError(Response.java:325) at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:692) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1083) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:379) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1017) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:445
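The arithmetic behind change 1) is simple: update requests per second equal docs per second divided by the batch size. At the roughly 5000 docs/sec used in the soak tests in this thread (500 batches of 10), batches of 200 cut the update request rate from 500/sec to 25/sec. A minimal sketch of the partitioning and the request-rate math follows; the class and method names are hypothetical (not SolrJ or the app's real code), and actually sending each batch is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not SolrJ): groups queued updates into fixed-size batches. */
final class UpdateBatcher {
    private UpdateBatcher() { }

    /** Splits docs into consecutive batches of at most batchSize items; each batch = one HTTP request. */
    static <T> List<List<T>> partition(List<T> docs, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += batchSize) {
            int end = Math.min(start + batchSize, docs.size());
            // Copy so each batch is independent of the source list.
            batches.add(new ArrayList<>(docs.subList(start, end)));
        }
        return batches;
    }

    /** Update requests per second needed to sustain docsPerSecond, rounding a partial batch up. */
    static long requestsPerSecond(long docsPerSecond, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        return (docsPerSecond + batchSize - 1) / batchSize;
    }
}
```

With these helpers, requestsPerSecond(5000, 10) returns 500 and requestsPerSecond(5000, 200) returns 25. Fewer, larger requests also means fewer concurrent distributed sub-requests inside SolrCloud, which fits the theory above, though it does not remove the underlying hang.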
Re: SolrCloud 4.x hangs under high update volume
Hey guys, (copy of my post to SOLR-5216) We tested this patch and unfortunately encountered some serious issues after a few hours of 500 update-batches/sec. Our update batch is 10 docs, so we are writing about 5000 docs/sec total, using autoCommit to commit the updates (no explicit commits). Our environment: Solr 4.3.1 w/SOLR-5216 patch. Jetty 9, Java 1.7. 3 Solr instances, 1 per physical server. 1 collection. 3 shards. 2 replicas (each instance is a leader and a replica). Soft autoCommit is 1000ms. Hard autoCommit is 15000ms. After about 6 hours of stress-testing this patch, we see many of these stalled transactions (below), and the Solr instances start to see each other as down, flooding our Solr logs with Connection Refused exceptions, and otherwise no obviously useful logs that I could see. I did notice some stalled transactions on both /select and /update, however. This never occurred without this patch. Stack /select seems stalled on: http://pastebin.com/Y1NCrXGC Stack /update seems stalled on: http://pastebin.com/cFLbC8Y9 Lastly, I have a summary of the ERROR-severity logs from this 24-hour soak. My script normalizes the ERROR-severity stack traces and returns them in order of occurrence. Summary of my solr.log: http://pastebin.com/pBdMAWeb Thanks! Tim Vaillancourt On 6 September 2013 07:27, Markus Jelsma markus.jel...@openindex.io wrote: Thanks! -Original message- From:Erick Erickson erickerick...@gmail.com Sent: Friday 6th September 2013 16:20 To: solr-user@lucene.apache.org Subject: Re: SolrCloud 4.x hangs under high update volume Markus: See: https://issues.apache.org/jira/browse/SOLR-5216 On Wed, Sep 4, 2013 at 11:04 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi Mark, Got an issue to watch?
Thanks, Markus
Re: SolrCloud 4.x hangs under high update volume
Hey Mark, The farthest we've made it at the same batch size/volume was 12 hours without this patch, but that isn't consistent. Sometimes we would only get to 6 hours or less. During the crash I can see an amazing spike in threads to 10k, which is essentially our ulimit for the JVM, but strangely I see none of the "OutOfMemoryError: unable to create new native thread" errors that always follow this. Weird! We also notice a spike in CPU around the crash. The instability caused some shard recovery/replication, though, so that CPU spike may be a symptom of the replication, or possibly the root cause. The CPU spikes fairly sharply from about 20-30% utilization (system + user) to 60%, so the CPU, while spiking, isn't quite pinned (very beefy Dell R720s - 16-core Xeons, whole index is in 128GB RAM, 6xRAID10 15k). More on resources: our disk I/O seemed to spike more than 2x during the crash (from about 1300kbps to 3500kbps written), but this may have been the replication, or ERROR logging (with logging at WARN severity we generally log nothing unless something breaks).
Lastly, I found this stack trace occurring frequently, and have no idea what it is (it may or may not be useful):
java.lang.IllegalStateException
    at org.eclipse.jetty.server.Response.resetBuffer(Response.java:964)
    at org.eclipse.jetty.server.Response.sendError(Response.java:325)
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:692)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1423)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:450)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1083)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:379)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1017)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:445)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:260)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:225)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:596)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:527)
    at java.lang.Thread.run(Thread.java:724)
On your live_nodes question, I don't have historical data on this from when the crash occurred, which I guess is what you're looking for. I could add this to our monitoring for future tests, however. I'd be glad to continue testing, but I think more monitoring is needed first to understand this further. Could we come up with a list of metrics that would be useful to see following another test and successful crash? Metrics needed:
1) # of live_nodes.
2) Full stack traces.
3) CPU used by Solr's JVM specifically (instead of system-wide).
4) Solr's JVM thread count (already done).
5) ?
Cheers, Tim Vaillancourt On 6 September 2013 13:11, Mark Miller markrmil...@gmail.com wrote: Did you ever get to index that long before without hitting the deadlock? There really isn't anything negative the patch could be introducing, other than allowing for some more threads to possibly run at once. If I had to guess, I would say it's likely this patch fixes the deadlock issue and you're seeing another issue - which looks like the system cannot keep up with the requests or something for some reason - perhaps due to some OS networking settings or something (more guessing). Connection refused generally happens when there is nothing listening on the port. Do you see anything interesting change with the rest of the system? CPU usage spikes or something like that? Clamping down further on the overall number of threads might help (which would require making something configurable). How many nodes are listed in zk under live_nodes? Mark Sent from my iPhone On Sep 6, 2013, at 12:02 PM, Tim Vaillancourt t...@elementspace.com wrote: Hey guys
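For metrics 2) and 4), plus a per-lock view that bears on "what is 0x0007216e68d8?", the JDK's java.lang.management API exposes live and peak thread counts, per-state counts, and the object each waiting thread is parked on. A minimal sketch follows, assuming it runs inside the JVM being measured; to watch Solr's JVM you would have to read the same ThreadMXBean over remote JMX or keep using jstack. Note that ThreadInfo.getLockName() reports the lock's class name plus its identity hash code (e.g. java.util.concurrent.Semaphore$NonfairSync@<hash>), not the address jstack prints, but it still groups all waiters on the same object together. The class name is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;
import java.util.TreeMap;

/** In-process sketch: thread metrics for the JVM this code runs in (not a remote Solr JVM). */
final class ThreadStats {
    private static final ThreadMXBean MX = ManagementFactory.getThreadMXBean();

    private ThreadStats() { }

    /** Live thread count right now (metric 4). */
    static int liveThreads() {
        return MX.getThreadCount();
    }

    /** Highest live thread count since the JVM started (or the last reset); catches short spikes. */
    static int peakThreads() {
        return MX.getPeakThreadCount();
    }

    /** Live threads grouped by state; a pile-up shows as a large WAITING count. */
    static Map<Thread.State, Integer> byState() {
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : MX.dumpAllThreads(false, false)) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Number of threads blocked or parked on each object, keyed by ThreadInfo.getLockName(). */
    static Map<String, Integer> waitersByLock() {
        Map<String, Integer> counts = new TreeMap<>();
        for (ThreadInfo info : MX.dumpAllThreads(false, false)) {
            // getLockName() is null when the thread is not waiting on any object.
            String lock = (info == null) ? null : info.getLockName();
            if (lock != null) {
                counts.merge(lock, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Sampling waitersByLock() every few seconds during a soak and logging any lock with hundreds of waiters would likely have flagged the Semaphore$NonfairSync pile-up well before the JVM hit its thread limit.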
Re: SolrCloud 4.x hangs under high update volume
Enjoy your trip, Mark! Thanks again for the help! Tim On 6 September 2013 14:18, Mark Miller markrmil...@gmail.com wrote: Okay, thanks, useful info. Getting on a plane, but ill look more at this soon. That 10k thread spike is good to know - that's no good and could easily be part of the problem. We want to keep that from happening. Mark Sent from my iPhone
Re: solrcloud shards backup/restoration
I wouldn't say I love this idea, but wouldn't it be safe to LVM-snapshot the Solr index? I think this may even work on a live server, depending on some file I/O details. Has anyone tried this? An in-Solr solution sounds more elegant, but considering the tlog concern Shalin mentioned, I think this may work as an interim solution. Cheers! Tim On 6 September 2013 15:41, Aditya Sakhuja aditya.sakh...@gmail.com wrote: Thanks Shalin and Mark for your responses. I am on the same page about the conventions for taking the backup. However, I am less sure about the restoration of the index. Let's say we have 3 shards across 3 SolrCloud servers. 1. I am assuming we should take a backup from each of the shard leaders to get a complete collection. Do you think that will get the complete index (not worrying about what is not hard-committed at the time of backup)? 2. How do we go about restoring the index in a fresh SolrCloud cluster? From the structure of the snapshot I took, I did not see any replication.properties or index.properties, which I normally see on healthy SolrCloud cluster nodes. If I have a snapshot named snapshot.20130905, do the contents of snapshot.20130905/* go into data/index? Thanks Aditya On Fri, Sep 6, 2013 at 7:28 AM, Mark Miller markrmil...@gmail.com wrote: Phone typing. The end should not say don't hard commit - it should say do a hard commit and take a snapshot. Mark Sent from my iPhone On Sep 6, 2013, at 7:26 AM, Mark Miller markrmil...@gmail.com wrote: I don't know that it's too bad though - it's always been the case that if you do a backup while indexing, it's just going to get up to the last hard commit. With SolrCloud that will still be the case.
So just make sure you do a hard commit right before taking the backup - yes, it might miss a few docs in the tran log, but if you are taking a backup while indexing, you don't have great precision in any case - you will roughly get a snapshot for around that time - even without SolrCloud, if you are worried about precision and getting every update into that backup, you want to stop indexing and commit first. But if you just want a rough snapshot for around that time, in both cases you can still just don't hard commit and take a snapshot. Mark Sent from my iPhone On Sep 6, 2013, at 1:13 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: The replication handler's backup command was built for pre-SolrCloud. It takes a snapshot of the index, but it is unaware of the transaction log, which is a key component in SolrCloud. Hence, unless you stop updates, commit your changes and then take a backup, you will likely miss some updates. That being said, I'm curious to see how peer sync behaves when you try to restore from a snapshot. When you say that you haven't been successful in restoring, what exactly is the behaviour you observed? On Fri, Sep 6, 2013 at 5:14 AM, Aditya Sakhuja aditya.sakh...@gmail.com wrote: Hello, I was looking for a good backup/recovery solution for the SolrCloud indexes. I am mainly looking at restoring the indexes from the index snapshot, which can be taken using the ReplicationHandler's backup command. I am looking for something that works with SolrCloud 4.3 eventually, but it is still relevant if you tested with a previous version. I haven't been successful in having the restored index replicate across the new replicas after I restart all the nodes, with one node having the restored index. Is restoring the indexes on all the nodes the best way to do it? -- Regards, -Aditya Sakhuja -- Regards, Shalin Shekhar Mangar. -- Regards, -Aditya Sakhuja
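Following Mark's corrected advice (do a hard commit right before the snapshot, and take it from each shard leader), the per-leader sequence is two HTTP requests: /update?commit=true, then the ReplicationHandler's /replication?command=backup. The sketch below only builds those URLs in order. The leader core URLs are placeholders you would look up yourself (in 4.x, clusterstate.json in ZooKeeper marks the leaders), it assumes the replication handler is registered at /replication as in the example solrconfig.xml, and the class name is hypothetical. As I understand it, the backup command may return before the snapshot is fully written, so check that the snapshot.* directory is complete before relying on it. For an exact backup, stop indexing first, as Shalin and Mark note.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: hard commit, then a ReplicationHandler backup, for each shard leader core. */
final class BackupPlan {
    private BackupPlan() { }

    /** Returns the requests to send, in order, given the base URL of each leader core. */
    static List<String> requestsFor(List<String> leaderCoreUrls) {
        List<String> requests = new ArrayList<>();
        for (String core : leaderCoreUrls) {
            // Normalize a trailing slash so paths join cleanly.
            String base = core.endsWith("/") ? core.substring(0, core.length() - 1) : core;
            requests.add(base + "/update?commit=true");         // 1) hard commit
            requests.add(base + "/replication?command=backup"); // 2) snapshot that commit point
        }
        return requests;
    }
}
```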
Re: SolrCloud 4.x hangs under high update volume
Update: It is a bit too soon to tell, but about 6 hours into testing there are no crashes with this patch. :) We are pushing 500 batches of 10 updates per second to the 3-node, 3-shard cluster I mentioned above: 5000 updates per second total. More tomorrow after a 24-hour soak! Tim On Wednesday, 4 September 2013, Tim Vaillancourt wrote: Thanks so much for the explanation, Mark, I owe you one (many)! We have this on our high-TPS cluster and will run it through its paces tomorrow. I'll provide any feedback I can, more soon! :D Cheers, Tim
Re: DIH + Solr Cloud
Hey Alejandro, I guess it depends on what you call "more than one instance". The request handlers are at the core level, not the Solr instance/global level, and within each of those cores you could have one or more data import handlers. Most setups have 1 DIH per core at the handler location /dataimport, but I believe you could have several, i.e. /dataimport2, /dataimport3, if you had different DIH configs for each handler. Within a single data import handler, you can have several entities, which tell the DIH process how to get and index the data. What you can do here is have several entities that construct your index, and execute those entities with several separate HTTP calls to the DIH, thus creating more than one instance of the DIH process within 1 core and 1 DIH handler, e.g.:
curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=suppliers"
curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=parts"
curl "http://localhost:8983/solr/core1/dataimport?command=full-import&entity=companies"
http://wiki.apache.org/solr/DataImportHandler#Commands Cheers, Tim On 03/09/13 09:25 AM, Alejandro Calbazana wrote: Hi, Quick question about data import handlers in SolrCloud. Does anyone use more than one instance to support the DIH process? Or is the typical setup to have one box set up as only the DIH and keep this responsibility outside of the SolrCloud environment? I'm just trying to get a picture of how this is typically deployed. Thanks! Alejandro
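The per-entity approach can be sketched as a list of request URLs. Two hedges apply. First, DIH's clean parameter defaults to true for full-import, which deletes the existing index before importing, so when entities are imported by separate requests this sketch passes clean=false to keep one entity's import from wiping another's documents. Second, as far as I know a single DIH handler runs only one import at a time (a request that arrives while it is busy is not started), so the URLs are meant to be sent one after another, polling command=status until the handler reports idle; truly parallel imports would need separate handlers such as /dataimport2. The helper name is hypothetical, and entity names are assumed to be URL-safe.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: one DIH full-import request per entity, plus a status URL to poll. */
final class DihRequests {
    private DihRequests() { }

    /** Full-import URLs, one per entity, in the order they should be sent. */
    static List<String> perEntityFullImports(String handlerUrl, List<String> entities) {
        List<String> urls = new ArrayList<>();
        for (String entity : entities) {
            // clean=false: do not delete documents loaded by the other entities' imports.
            urls.add(handlerUrl + "?command=full-import&entity=" + entity + "&clean=false");
        }
        return urls;
    }

    /** Poll this between imports until the handler reports it is idle. */
    static String statusUrl(String handlerUrl) {
        return handlerUrl + "?command=status";
    }
}
```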
Re: SolrCloud 4.x hangs under high update volume
Thanks guys! :) Mark: this patch is much appreciated; I will try to test it shortly, hopefully today. For my curiosity/understanding, could someone quickly explain to me what locks SolrCloud takes on updates? Was I onto something in thinking that more shards decrease the chance of locking? Secondly, I was wondering if someone could summarize what this patch 'fixes'? I'm not too familiar with Java and the Solr codebase (working on that though :D). Cheers, Tim On 4 September 2013 09:52, Mark Miller markrmil...@gmail.com wrote: There is an issue if I remember right, but I can't find it right now. If anyone who has the problem could try this patch, that would be very helpful: http://pastebin.com/raw.php?i=aaRWwSGP - Mark On Wed, Sep 4, 2013 at 8:04 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi Mark, Got an issue to watch? Thanks, Markus -Original message- From:Mark Miller markrmil...@gmail.com Sent: Wednesday 4th September 2013 16:55 To: solr-user@lucene.apache.org Subject: Re: SolrCloud 4.x hangs under high update volume I'm going to try and fix the root cause for 4.5 - I've suspected what it is since early this year, but it's never personally been an issue, so it's rolled along for a long time. Mark Sent from my iPhone On Sep 3, 2013, at 4:30 PM, Tim Vaillancourt t...@elementspace.com wrote: Hey guys, I am looking into an issue we've been having with SolrCloud since the beginning of our testing, all the way from 4.1 to 4.3 (haven't tested 4.4.0 yet). I've noticed other users with this same issue, so I'd really like to get to the bottom of it. Under a very, very high rate of updates (2000+/sec), after 1-12 hours we see stalled transactions that snowball to consume all Jetty threads in the JVM. This eventually causes the JVM to hang with most threads waiting on the condition/stack provided at the bottom of this message.
At this point SolrCloud instances start to see their neighbors (who also have all threads hung) as down w/Connection Refused, and the shards become "down" in state. Sometimes a node or two survives and just returns 503 "no server hosting shard" errors. As a workaround/experiment, we have tuned the number of threads sending updates to Solr, as well as the batch size (we batch updates from client to Solr), and the soft/hard autoCommits, all to no avail. We also tried turning off client-to-Solr batching (1 update = 1 call to Solr), which did not help either. Certain combinations of update threads and batch sizes seem to mask/help the problem, but not resolve it entirely. Our current environment is the following: - 3 x Solr 4.3.1 instances in Jetty 9 w/Java 7. - 3 x Zookeeper instances, external Java 7 JVM. - 1 collection, 3 shards, 2 replicas (each node is a leader of 1 shard and a replica of 1 shard). - Log4j 1.2 for Solr logs, set to WARN. This log has no movement on a good day. - 5000 max Jetty threads (well above what we use when we are healthy), Linux-user threads ulimit is 6000. - Occurs under Jetty 8 or 9 (many versions). - Occurs under Java 1.6 or 1.7 (several minor versions). - Occurs under several JVM tunings. - Everything seems to point to Solr itself, and not a Jetty or Java version (I hope I'm wrong).
The stack trace that is holding up all my Jetty QTP threads is the following, which seems to be waiting on a lock that I would very much like to understand further:
java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x0007216e68d8 (a java.util.concurrent.Semaphore$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:834)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:994)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1303)
    at java.util.concurrent.Semaphore.acquire(Semaphore.java:317)
    at org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:418)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:368)
    at org.apache.solr.update.SolrCmdDistributor.flushAdds(SolrCmdDistributor.java:300)
    at org.apache.solr.update.SolrCmdDistributor.finish(SolrCmdDistributor.java:96)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:462
Re: SolrCloud 4.x hangs under high update volume
Thanks so much for the explanation, Mark, I owe you one (many)! We have this on our high-TPS cluster and will run it through its paces tomorrow. I'll provide any feedback I can, more soon! :D Cheers, Tim
SolrCloud 4.x hangs under high update volume
)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
    at org.eclipse.jetty.server.Server.handle(Server.java:445)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:268)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:229)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532)
    at java.lang.Thread.run(Thread.java:724)
Some questions I had were:
1) What exclusive locks does SolrCloud take when performing an update?
2) Keeping in mind I do not read or write Java (sorry :D), could someone help me understand what Solr is locking in this case at org.apache.solr.util.AdjustableSemaphore.acquire(AdjustableSemaphore.java:61) when performing an update? That will help me understand where to look next.
3) It seems all threads in this state are waiting for 0x0007216e68d8; is there a way to tell what 0x0007216e68d8 is?
4) Is there a limit to how many updates you can do in SolrCloud?
5) Wild-ass theory: would more shards provide more locks (whatever they are) on update, and thus more update throughput?
To those interested, I've provided a stack trace of 1 of 3 nodes at this URL in gzipped form: https://s3.amazonaws.com/timvaillancourt.com/tmp/solr-jstack-2013-08-23.gz Any help/suggestions/ideas on this issue, big or small, would be much appreciated. Thanks so much all! Tim Vaillancourt
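On questions 2) and 3): the stack traces in this thread show SolrCmdDistributor.submit calling AdjustableSemaphore.acquire, which in turn calls java.util.concurrent.Semaphore.acquire, and the "parking to wait for 0x0007216e68d8 (a java.util.concurrent.Semaphore$NonfairSync)" line names the object being waited on: that semaphore's internal synchronizer. So these threads appear to be waiting not on an index lock but for a permit from a counting semaphore that seems to bound how many distributed update requests may be outstanding. The toy model below is not Solr's code (class and method names are made up); it shows the mechanics: once every permit is held by a request that does not finish, every further caller parks in acquire(). If the permit holders are themselves waiting on other nodes, as the deadlock discussion in this thread suggests, no permit is ever released and threads pile up. trySubmit, with a timeout, is shown only for contrast.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy model, not Solr code: a fixed number of permits caps how many sub-requests are in flight. */
final class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Blocks until a permit is free, runs the task, and always gives the permit back. */
    void submit(Runnable task) throws InterruptedException {
        permits.acquire();                  // parks here while every permit is taken
        try {
            task.run();
        } finally {
            permits.release();
        }
    }

    /** Like submit, but gives up after timeoutMs instead of parking indefinitely. */
    boolean trySubmit(Runnable task, long timeoutMs) {
        try {
            if (!permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS)) {
                return false;               // nobody released a permit in time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            task.run();
            return true;
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    /** One permit, held by a "slow request" on another thread. Returns {admittedWhileHeld, admittedAfterRelease}. */
    static boolean[] demo() {
        InFlightLimiter limiter = new InFlightLimiter(1);
        CountDownLatch holding = new CountDownLatch(1);
        CountDownLatch finish = new CountDownLatch(1);
        Thread slowRequest = new Thread(() -> limiter.trySubmit(() -> {
            holding.countDown();            // the only permit is now taken
            awaitQuietly(finish);           // hold it until told to finish
        }, 1_000));
        slowRequest.start();
        awaitQuietly(holding);
        boolean whileHeld = limiter.trySubmit(() -> { }, 50);
        finish.countDown();
        try {
            slowRequest.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean afterRelease = limiter.trySubmit(() -> { }, 50);
        return new boolean[] { whileHeld, afterRelease };
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

In demo(), the second caller is turned away while the only permit is held and admitted once it is released. With submit() instead of trySubmit(), that second caller would simply park, exactly like the QTP threads in the jstack output.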
Re: Sharing SolrCloud collection configs w/overrides
Here you go, Erick; feel free to update this. I am unable to assign it to you, but I asked for someone to do so: https://issues.apache.org/jira/browse/SOLR-5208 Cheers, Tim On 21/08/13 10:40 AM, Tim Vaillancourt wrote: Well, the mention of DIH is a bit off-topic. I'll simplify and say all I need is the ability to set ANY variables in solrconfig.xml without having to make N copies of the same configuration to achieve that.
Re: Sharing SolrCloud collection configs w/overrides
Well, the mention of DIH is a bit off-topic. I'll simplify and say all I need is the ability to set ANY variables in solrconfig.xml without having to make N copies of the same configuration to achieve that. Essentially I need 10+ collections to use the exact same config dir in Zookeeper with minor/trivial differences set in variables. Your proposal of taking in values at core-creation time is a neat one and would be a very flexible solution for a lot of use cases. My only concern for my really specific use case is that I'd be setting DB users/passwords via plain-text HTTP calls, but having this feature is better than not. In a perfect world I'd like to be able to include files in Zookeeper (like XInclude) that are outside the common config dir (e.g. '/configs/sharedconfig') that all the collections would be sharing. On the other hand, that sort of solution would open up the Zookeeper layout to arbitrary files and could end up a nightmare if not done carefully. Would it be possible for Solr to support specifying multiple configs at collection creation that are merged or concatenated? This idea sounds terrible to me even at this moment, but I wonder if there is something there. Tim
Sharing SolrCloud collection configs w/overrides
Hey guys, I have a situation where I have a lot of collections that share the same core config in Zookeeper. For each of my SolrCloud collections, 99.9% of the config (schema.xml, solrconfig.xml) is the same; only the DataImportHandler parameters are different, for different database names/credentials per collection. To provide the different DIH credentials per collection, I currently upload many copies of the exact same Solr config dir, each with 1 XIncluded file containing the 4-5 database parameters that differ, alongside the schema.xml and solrconfig.xml. I don't feel this is ideal, and it wastes space in Zookeeper considering most of my configs are duplicated. At a high level, is there a way for me to share one config in Zookeeper while having minor overrides to the variables? Is there a way for me to XInclude a file outside of my Zookeeper config dir, i.e. could I XInclude arbitrary locations in Zookeeper so that I can have the same config dir for all collections and a file in Zookeeper, external to the common config dir, that applies the collection-specific overrides? To extend my question to Solr 4.4 core.properties files: am I stuck in the same boat under Solr 4.4 if I have, say, 10 collections sharing one config, but I want each to have a unique core.properties? Cheers! Tim
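Solr config files already accept ${name} and ${name:default} placeholders (the stock solrconfig.xml uses ${solr.data.dir:}), resolved from system properties and core properties, which is the mechanism per-collection overrides would build on. The toy resolver below is not Solr's implementation; it only illustrates one shared template plus a per-collection map of values. The property names (dih.db.name, dih.db.user) and the class name are made up for the example, and it does nothing about the plain-text-credentials concern raised in this thread.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy resolver for ${name} and ${name:default} placeholders. Not Solr's implementation. */
final class ConfigTemplate {
    // Group 1 = property name, group 2 = optional default after the colon.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    private ConfigTemplate() { }

    /** Substitutes each placeholder from props, falling back to its default; fails if neither exists. */
    static String resolve(String template, Map<String, String> props) {
        Matcher m = VAR.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = props.get(m.group(1));
            if (value == null) {
                value = m.group(2);                 // null when no ":default" was given
            }
            if (value == null) {
                throw new IllegalArgumentException("No value for ${" + m.group(1) + "}");
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Each collection would then share the same template text and differ only in the small property map passed in, which is the "one config dir, minor overrides" layout asked about above.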
Re: Problems installing Solr4 in Jetty9
Try adding 'ext' to your OPTIONS= line for Jetty. Tim

On 16/08/13 05:04 AM, Dmitry Kan wrote: Hi, I have the following jars in jetty/lib/ext: log4j-1.2.16.jar, slf4j-api-1.6.6.jar, slf4j-log4j12-1.6.6.jar, jcl-over-slf4j-1.6.6.jar, jul-to-slf4j-1.6.6.jar. Do you? Dmitry

On Thu, Aug 8, 2013 at 12:49 PM, Spadez <james_will...@hotmail.com> wrote: Apparently this is the error: 2013-08-08 09:35:19.994:WARN:oejw.WebAppContext:main: Failed startup of context o.e.j.w.WebAppContext@64a20878 {/solr,file:/tmp/jetty-0.0.0.0-8080-solr.war-_solr-any-/webapp/,STARTING}{/solr.war} org.apache.solr.common.SolrException: Could not find necessary SLF4j logging jars. If using Jetty, the SLF4j logging jars need to go in the jetty lib/ext directory. For other containers, the corresponding directory should be used. For more information, see: http://wiki.apache.org/solr/SolrLogging -- View this message in context: http://lucene.472066.n3.nabble.com/Problems-installing-Solr4-in-Jetty9-tp4083209p4083224.html Sent from the Solr - User mailing list archive at Nabble.com.
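Dmitry's list also works as a checklist. Below is a rough sketch that reports which of those logging jars are missing from a Jetty lib/ext directory. The prefixes come from his list. This sketch matches by name prefix, so any version of each jar counts; that is an assumption of the sketch, not a rule that Solr or Jetty applies. The default path `jetty/lib/ext` is also just a placeholder.

```python
import os
import sys

# Jar name prefixes taken from the list above; any version of each is accepted.
REQUIRED_PREFIXES = [
    "log4j-",
    "slf4j-api-",
    "slf4j-log4j12-",
    "jcl-over-slf4j-",
    "jul-to-slf4j-",
]


def missing_logging_jars(ext_dir):
    """Return the required prefixes for which no matching .jar exists in ext_dir."""
    jars = [f for f in os.listdir(ext_dir) if f.endswith(".jar")]
    return [p for p in REQUIRED_PREFIXES if not any(j.startswith(p) for j in jars)]


if __name__ == "__main__":
    ext = sys.argv[1] if len(sys.argv) > 1 else "jetty/lib/ext"
    missing = missing_logging_jars(ext)
    print("all present" if not missing else "missing: " + ", ".join(missing))
```

Having the jars present is only half of it. As Tim notes, Jetty also has to load lib/ext, which is what adding 'ext' to OPTIONS is for.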
Re: SolrCloud Load Balancer weight
Soon ended up being a while :), feel free to add any thoughts: https://issues.apache.org/jira/browse/SOLR-5166 Tim

On 07/06/13 03:07 PM, Vaillancourt, Tim wrote: Cool! Having those values influenced by stats is a neat idea too. I'll get on that soon. Tim -Original Message- From: Mark Miller [mailto:markrmil...@gmail.com] Sent: Monday, June 03, 2013 5:07 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud Load Balancer weight On Jun 3, 2013, at 3:33 PM, Tim Vaillancourt <t...@elementspace.com> wrote: Should I JIRA this? Thoughts? Yeah - it's always been in the back of my mind - it's come up a few times - eventually we would like nodes to report some stats to zk to influence load balancing. - mark
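SOLR-5166 is about giving replicas a load-balancing weight. The sketch below is not Solr's implementation. It only illustrates weighted random selection over replica URLs using Python's `random.choices`. The node URLs and weights are made up. In this scheme, a weight of 0 effectively drains a node.

```python
import random

# Hypothetical replica URLs and weights; a higher weight means a larger share of requests.
REPLICAS = {
    "http://node1:8983/solr/collection1": 3.0,
    "http://node2:8983/solr/collection1": 1.0,
    "http://node3:8983/solr/collection1": 0.0,  # weight 0: never chosen
}


def pick_replica(weights, rng):
    """Choose one replica with probability proportional to its weight."""
    urls = list(weights)
    return rng.choices(urls, weights=[weights[u] for u in urls], k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)
    counts = {u: 0 for u in REPLICAS}
    for _ in range(10_000):
        counts[pick_replica(REPLICAS, rng)] += 1
    for url, n in counts.items():
        print(url, n)  # roughly a 75% / 25% / 0% split
```

If node stats reported to ZooKeeper were used, as Mark suggests, they would feed the weights map rather than the selection logic.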
Re: Adding Postgres and Mysql JDBC drivers to Solr
Another option is defining the location of these jars in your solrconfig.xml and storing the libraries outside of Jetty, which has some advantages. For example, if the MySQL connector is located at '/opt/mysql_connector', add this to your solrconfig.xml alongside the other <lib> entries:

<lib dir="/opt/mysql_connector/" regex=".*\.jar" />

Cheers, Tim

On 06/08/13 08:02 AM, Spadez wrote: Thank you very much
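You can check ahead of time which files a `<lib dir=... regex=.../>` entry would pick up. The small sketch below lists them. It assumes the regex has to match the whole file name, so it uses `re.fullmatch`. Solr's exact matching rule is not spelled out in this thread, so treat that as an assumption.

```python
import os
import re
import sys


def jars_matching(lib_dir, pattern=r".*\.jar"):
    """List regular files in lib_dir whose whole name matches pattern, sorted for stable output."""
    regex = re.compile(pattern)
    return sorted(
        name
        for name in os.listdir(lib_dir)
        if os.path.isfile(os.path.join(lib_dir, name)) and regex.fullmatch(name)
    )


if __name__ == "__main__":
    directory = sys.argv[1] if len(sys.argv) > 1 else "/opt/mysql_connector"
    for name in jars_matching(directory):
        print(name)
```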
Re: Internal shard communication - performance?
For me the biggest deal with increased chatter between SolrCloud nodes is object creation and GCs. The resulting CPU load from the increased GC activity seems to affect performance in some of my load tests, but I'm still trying to gather hard numbers on it. Cheers, Tim

On 07/08/13 04:05 PM, Shawn Heisey wrote: On 8/7/2013 2:45 PM, Torsten Albrecht wrote: I would like to run Zookeeper externally on my old master server. So I have two Zookeepers to control my cloud. The third and fourth Zookeepers will be virtual machines. For true HA with Zookeeper, you need at least three instances on separate physical hardware. If you want to use VMs, that would be fine, but you must ensure that you aren't running more than one instance on the same physical server. For best results, use an odd number of ZK instances. With three ZK instances, one can go down and everything still works. With five, two can go down and everything still works. If you've got a fully switched network that's at least gigabit speed, then the network latency involved in internal communication shouldn't really matter. Thanks, Shawn
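Shawn's numbers follow from ZooKeeper's majority quorum. An ensemble of n servers stays available while a strict majority (floor(n/2) + 1) is up, so it tolerates floor((n - 1) / 2) failures. The quick sketch below works through that arithmetic. It also shows why Torsten's fourth instance adds no failure tolerance over three.

```python
def quorum_size(n):
    """Smallest strict majority of an n-server ZooKeeper ensemble."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many servers can fail while a majority is still up."""
    return n - quorum_size(n)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} servers: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```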
Re: debian package for solr with jetty
Hey guys,

It is by no means perfect or pretty, but I use the script below to build Solr into a .deb package. The package installs Solr to /opt/solr-VERSION with 'example' and 'docs' removed, plus a symlink at /opt/solr. When building, the script wgets the tgz, builds it in a temporary directory within the cwd and makes a .deb. There is no container included, so this essentially builds a library-style package of Solr to be included by other packages. It's probably not entirely what people are looking for here, but here goes:

solr-dpkg.sh:

#!/bin/bash
set -e

VERSION="$1"
if test -z "${VERSION}"; then
  echo "Usage: $0 [SOLR VERSION]"
  exit 1
fi

NAME=solr
MIRROR_BASE="http://apache.mirror.iweb.ca"
PREFIX=/opt
PNAME="solr_${VERSION}"
BUILD_BASE="$$"
BUILD_DIR="${BUILD_BASE}/${PNAME}"
START_DIR="${PWD}"

# Clean build dir:
if test -e "${BUILD_DIR}"; then
  rm -rf "${BUILD_DIR}"
fi

# Wget solr:
SOLR_TAR="solr-${VERSION}.tgz"
if test ! -e "${SOLR_TAR}"; then
  wget -N "${MIRROR_BASE}/lucene/solr/${VERSION}/${SOLR_TAR}"
fi

# Debian metadata:
mkdir -p "${BUILD_DIR}" "${BUILD_DIR}/DEBIAN"
cat <<EOF > "${BUILD_DIR}/DEBIAN/control"
Package: solr
Priority: extra
Maintainer: Tim Vaillancourt <t...@timvaillancourt.com>
Section: libs
Homepage: http://lucene.apache.org/solr/
Version: ${VERSION}
Description: Apache Solr ${VERSION}
Architecture: all
EOF

# Unpack solr in correct location:
mkdir -p "${BUILD_DIR}${PREFIX}"
tar xfz "${SOLR_TAR}" -C "${BUILD_DIR}${PREFIX}"
rm -rf "${BUILD_DIR}${PREFIX}/solr-${VERSION}"/{docs,example}
ln -s "${PREFIX}/solr-${VERSION}" "${BUILD_DIR}${PREFIX}/solr"

# Package and cleanup after:
cd "${BUILD_BASE}"
dpkg-deb -b "${PNAME}" && \
  mv "${PNAME}.deb" "${START_DIR}/${PNAME}.deb"
cd "${START_DIR}"
rm -rf "${BUILD_BASE}"
exit 0

Usage example:

./solr-dpkg.sh 4.4.0

In my setup I have other packages pointing to this package's path as a library, with solr, jetty and the 'instance-package' separated. These packages depend on the version of the solr 'library package' built by this script.

Enjoy!
Tim

On 01/08/13 08:14 PM, Yago Riveiro wrote: Some time ago I found this: https://github.com/LucidWorks/solr-fabric/blob/master/solr-fabric-guide.md. Instead of Puppet or Chef (I don't know if that is a requirement), it is developed with Fabric. -- Yago Riveiro

On Friday, August 2, 2013 at 3:32 AM, Alexandre Rafalovitch wrote: Well, it is one of the requests with a couple of votes on the Solr Usability Contest: https://solrstart.uservoice.com/forums/216001-usability-contest/suggestions/4249809-puppet-chef-configuration-to-automatically-setup-s So, it would be great if somebody with knowledge of those tools could review the space and figure out what the state of the art for this is. If somebody could identify the gap and fill it in, it would be awesome. :-) Regards, Alex.

On Thu, Aug 1, 2013 at 10:25 PM, Michael Della Bitta <michael.della.bi...@appinions.com> wrote: There should be at least a good Chef recipe, since Chef uses Solr internally. I'm not using anything of theirs, since we've thus far been a Tomcat shop. If nothing exists, I should whip something up.

On Aug 1, 2013 3:06 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote: And are there good chef/puppet/etc rules for public use? I could not find any when I looked. Regards, Alex

On 1 Aug 2013 11:32, Michael Della Bitta <michael.della.bi...@appinions.com> wrote: Hi Manasi, We use Chef for this type of thing here at my current job. Have you considered something like it? Other ones to look at are Puppet, CFEngine, Salt, and Ansible. Michael Della Bitta, Applications Developer, appinions inc.

On Wed, Jul 31, 2013 at 8:10 PM, smanad <sma...@gmail.com> wrote: Hi, I am trying to create a Debian package for Solr 4.3 (default installation with Jetty). Is there anything already available? Also, I need 3 different cores, so I plan to create a corresponding package for each of them that creates the Solr core using the admin/cores or collections API. I also want to use a SolrCloud setup with an external Zookeeper ensemble. What's the best way to create a Debian package for updating the Zookeeper config files as well? Please suggest. Any pointers will be helpful. Thanks, -Manasi
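For the three-core packaging question, each package's post-install step could call the Collections API. The sketch below only builds the CREATE URLs and does not send them. `action`, `name`, `numShards`, `replicationFactor` and `collection.configName` are Collections API parameters. The host, collection names, config name and counts are placeholders.

```python
from urllib.parse import urlencode

# Placeholder base URL; point it at a real node of the cluster.
SOLR_BASE = "http://localhost:8983/solr"


def create_collection_url(name, config_name, shards=1, replicas=1):
    """Build (but do not send) a Collections API CREATE request URL."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": shards,
        "replicationFactor": replicas,
        "collection.configName": config_name,
    }
    return f"{SOLR_BASE}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    for core in ("core1", "core2", "core3"):
        print(create_collection_url(core, "sharedconfig"))
```

Keeping the URL building separate from sending it makes it easy to log or dry-run what a package would do before it touches the cluster.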
Re: SolrCloud 4.3.1 - Failure to open existing log file (non fatal) errors under high load
Thanks for the reply Erick,

Hard commit: 15000ms, openSearcher=false
Soft commit: 1000ms, openSearcher=true

The 15-second hard commit was sort of a guess; I could try a smaller number. When you say "getting too large", what limit do you think it would be hitting: a ulimit (nofiles), disk space, number of changes, or a limit in Solr itself? By my math there would be at most 15 tlogs per core, but I don't really know how it all works. It would help if someone could fill me in or point me somewhere. Cheers, Tim

On 27/07/13 07:57 AM, Erick Erickson wrote: What is your autocommit limit? Is it possible that your transaction logs are simply getting too large? tlogs are truncated whenever you do a hard commit (autocommit), with openSearcher either true or false; it doesn't matter. FWIW, Erick

On Fri, Jul 26, 2013 at 12:56 AM, Tim Vaillancourt <t...@elementspace.com> wrote: Thanks Shawn and Yonik! snip
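On the ulimit (nofiles) question, Linux lets you count the descriptors a process holds from `/proc/<pid>/fd`. You can then compare that count with the soft limit listed in `/proc/<pid>/limits`. The sketch below is Linux-only and needs permission to read the target process. It defaults to its own PID, so pass the Solr JVM's PID to check Solr.

```python
import os
import sys


def open_fd_count(pid):
    """Count entries in /proc/<pid>/fd (Linux only; needs permission to read that process)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def max_open_files(pid):
    """Soft 'Max open files' limit from /proc/<pid>/limits ('unlimited' or a number)."""
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max open files"):
                # Columns: name (3 words), soft limit, hard limit, units
                return line.split()[3]
    raise ValueError("no 'Max open files' line found")


if __name__ == "__main__":
    # Pass the Solr JVM's PID; defaults to this Python process for a quick self-check.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(f"pid {pid}: {open_fd_count(pid)} open fds, soft limit {max_open_files(pid)}")
```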
Re: SolrCloud 4.3.1 - Failure to open existing log file (non fatal) errors under high load
Thanks Jack/Erick,

I don't know if this is true or not, but I've read that there is a tlog per soft commit, which is then truncated by the hard commit. If this were true, a 15-second hard commit with a 1-second soft commit could generate around 15 tlogs, but I've never checked. I like Erick's scenario more if it is 1 tlog per core, though. I'll try to find out some more.

A few more things I really should try for sanity:
- Java 1.6 and Jetty 8: just to rule things out (I wouldn't actually launch this way).
- Raising the ulimit for 'nofiles': the default is pretty high, but why not?
- Monitoring the size and number of tlogs.

I'll be sure to share findings, and I really appreciate the help, guys! PS: This is asking a lot, but I would really appreciate it if anyone could take a look at that thread dump. Pointers on what to look for in a thread dump from a stall/thread pile-up like this would also help. I'm quite weak at deciphering those (I use Thread Dump Analyzer), but I'm sure it would tell a lot. Cheers, Tim

On 27/07/13 02:24 PM, Erick Erickson wrote: Tim: 15 seconds isn't unreasonable; I was mostly wondering if it was hours. Take a look at the size of the tlogs as you're indexing; you should see them truncate every 15 seconds or so. There'll be a varying number of tlogs kept around, although under heavy indexing I'd only expect 1 or 2 inactive ones. The internal rule is that enough tlogs are kept around to hold 100 docs. There should only be 1 open tlog per core, as I understand it. When a commit happens (hard, openSearcher = true or false doesn't matter), the current tlog is closed and a new one opened. Then some cleanup happens so there are only enough tlogs kept around to hold 100 docs. Strange, I'm kind of out of ideas. Erick

On Sat, Jul 27, 2013 at 4:41 PM, Jack Krupansky <j...@basetechnology.com> wrote: No hard numbers, but the general guidance is that you should set your hard commit interval to match your expectations for how quickly nodes should come up if they need to be restarted. Specifically, a hard commit assures that all changes have been committed to disk and are ready for immediate access on restart. However, any and all soft commit changes since the last hard commit must be replayed (re-executed) on restart of a node. How long does it take to replay the changes in the update log? No firm numbers, but treat it as if all of those uncommitted updates had to be resent and reprocessed by Solr. It's probably faster than that, but you get the picture. I would suggest thinking in terms of minutes rather than seconds for hard commits: 5, 10, 15, 20, 30 minutes. Hard commits may result in kicking off segment merges, so too rapid a rate of segment creation might cause problems, or at least be counterproductive. So, instead of 15 seconds, try 15 minutes. OTOH, if you really need to handle 4,000 updates a second... you are clearly in uncharted territory and should expect to do some heavy-duty trial-and-error tuning on your own. -- Jack Krupansky -Original Message- From: Tim Vaillancourt Sent: Saturday, July 27, 2013 4:21 PM To: solr-user@lucene.apache.org Subject: Re: SolrCloud 4.3.1 - Failure to open existing log file (non fatal) errors under high load Thanks for the reply Erick, snip
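Erick's description suggests a simple model: one tlog is closed per hard commit, and the newest closed tlogs are kept until together they hold at least 100 docs. The sketch below encodes that model. It is our reading of his explanation, not Solr's UpdateLog code. It also adds Jack's point that the updates written since the last hard commit bound the replay work on restart. The rates and intervals are the figures from this thread.

```python
def tlogs_to_keep(doc_counts, min_docs=100):
    """Given per-tlog doc counts (newest first), count how many are retained to cover min_docs."""
    total = 0
    for kept, count in enumerate(doc_counts, start=1):
        total += count
        if total >= min_docs:
            return kept
    return len(doc_counts)


def max_replay_updates(updates_per_sec, hard_commit_secs):
    """Upper bound on updates written since the last hard commit (what a restart may replay)."""
    return updates_per_sec * hard_commit_secs


if __name__ == "__main__":
    # At 2000-4000 updates/sec, each 15 s tlog holds tens of thousands of docs,
    # so a single closed tlog already covers 100 docs.
    print(tlogs_to_keep([60_000, 60_000, 60_000]))  # 1
    print(max_replay_updates(4000, 15))             # 60000 (15 s hard commit)
    print(max_replay_updates(4000, 15 * 60))        # 3600000 (15 min hard commit)
```

Under this model the tlog count stays small. The trade-off Jack describes shows up instead in the replay bound, which grows linearly with the hard-commit interval.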
SolrCloud 4.3.1 - Failure to open existing log file (non fatal) errors under high load
Hey guys,

I am reaching out to the Solr list with a very vague issue. Under high load against a SolrCloud 4.3.1 cluster of 3 instances, 3 shards, 2 replicas (2 cores per instance), I eventually see failure messages related to transaction logs. Shortly after these stack traces occur, the cluster starts to fall apart.

To explain my setup:
- SolrCloud 4.3.1.
- Jetty 9.x.
- Oracle/Sun JDK 1.7.0_25 w/CMS.
- RHEL 6.x 64-bit.
- 3 instances, 1 per server.
- 3 shards.
- 2 replicas per shard.

The transaction log error I receive after about 10-30 minutes of load testing is:

ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException] Failure to open existing log file (non fatal) /opt/easw/easw_apps/easo_solr_cloud/solr/xmshd_shard3_replica2/data/tlog/tlog.078:org.apache.solr.common.SolrException: java.io.EOFException
    at org.apache.solr.update.TransactionLog.init(TransactionLog.java:182)
    at org.apache.solr.update.UpdateLog.init(UpdateLog.java:233)
    at org.apache.solr.update.UpdateHandler.initLog(UpdateHandler.java:83)
    at org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:138)
    at org.apache.solr.update.UpdateHandler.init(UpdateHandler.java:125)
    at org.apache.solr.update.DirectUpdateHandler2.init(DirectUpdateHandler2.java:95)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:525)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:525)
    at org.apache.solr.core.SolrCore.createUpdateHandler(SolrCore.java:596)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:805)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:618)
    at org.apache.solr.core.CoreContainer.createFromZk(CoreContainer.java:894)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:982)
    at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:597)
    at org.apache.solr.core.CoreContainer$2.call(CoreContainer.java:592)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.EOFException
    at org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:73)
    at org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:216)
    at org.apache.solr.update.TransactionLog.readHeader(TransactionLog.java:266)
    at org.apache.solr.update.TransactionLog.init(TransactionLog.java:160)
    ... 25 more

Eventually, after a few of these stack traces, the cluster starts to lose shards and replicas fail. Jetty then creates hung threads until it hits OutOfMemory on native threads due to the maximum process ulimit.

I know this is quite a vague issue, so I'm not expecting a silver-bullet answer, but I was wondering if anyone has suggestions on where to look next. Does this sound Solr-related at all, or possibly system-related? Has anyone seen this issue before, or does anyone have a hypothesis on how to find out more? I will reply shortly with a thread dump, taken from 1 locked-up node.

Thanks for any suggestions! Tim
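The error names the tlog file. Grouping these errors by core can therefore show whether one replica or all of them are affected. Below is a minimal sketch that pulls the tlog path out of log lines shaped like the one above. The regex is tailored to that single sample. It also assumes the core directory is the path component just before `data/tlog/`.

```python
import re
import sys
from collections import Counter

# Tailored to the sample log line above: "...(non fatal) <path>/<core>/data/tlog/<file>:<exception>"
TLOG_ERROR = re.compile(
    r"Failure to open existing log file \(non fatal\)\s+"
    r"(?P<path>\S*/(?P<core>[^/]+)/data/tlog/[^:\s]+)"
)


def tlog_errors_by_core(lines):
    """Count 'Failure to open existing log file' errors per core directory name."""
    counts = Counter()
    for line in lines:
        match = TLOG_ERROR.search(line)
        if match:
            counts[match.group("core")] += 1
    return counts


if __name__ == "__main__":
    with open(sys.argv[1]) as log:
        for core, n in tlog_errors_by_core(log).most_common():
            print(core, n)
```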
Re: SolrCloud 4.3.1 - Failure to open existing log file (non fatal) errors under high load
Stack trace: http://timvaillancourt.com.s3.amazonaws.com/tmp/solrcloud.nodeC.2013-07-25-16.jstack.gz Cheers! Tim

On 25 July 2013 16:44, Tim Vaillancourt <t...@elementspace.com> wrote: Hey guys, I am reaching out to the Solr list with a very vague issue: snip
Re: SolrCloud 4.3.1 - Failure to open existing log file (non fatal) errors under high load
Thanks for the reply Shawn, I can always count on you :).

To answer the JVM question: we are using 10GB heaps and have over 100GB of OS cache free. Young has about 50% of the heap, all CMS. Our max number of processes for the JVM user is 10k, which is where Solr dies when it blows up with 'cannot create native thread'.

I also want to say this is system-related, but I am seeing this occur on all 3 servers, which are brand-new Dell R720s. I'm not saying this is impossible, but I don't see much to suggest it, and it would need to be one hell of a coincidence.

To add more confusion to the mix, we actually run a 2nd SolrCloud cluster on the same Solr, Jetty and JVM versions that does not exhibit this issue. That cluster uses a completely different schema, servers and access patterns, although it is also at high TPS. That is some evidence that the current software stack is OK. Alternatively, maybe this only occurs under an extreme load that the 2nd cluster does not see, or only with a certain schema.

Lastly, to add a bit more detail to my original description, so far I have tried:
- Entirely rebuilding my cluster from scratch, reinstalling all deps and configs, and reindexing the data (in case I screwed up somewhere). The EXACT same issue occurs under load about 20-45 minutes in.
- Moving to Java 1.7.0_21 from _25 due to some known bugs. The same issue occurs after some load.
- Restarting SolrCloud / forcing rebuilds of cores. The same issue occurs after some load.
Cheers, Tim

On 25 July 2013 17:13, Shawn Heisey <s...@elyograg.org> wrote: On 7/25/2013 5:44 PM, Tim Vaillancourt wrote: The transaction log error I receive after about 10-30 minutes of load testing is: ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException] Failure to open existing log file (non fatal) /opt/easw/easw_apps/easo_solr_cloud/solr/xmshd_shard3_replica2/data/tlog/tlog.078:org.apache.solr.common.SolrException: java.io.EOFException snip Caused by: java.io.EOFException at org.apache.solr.common.util.FastInputStream.readUnsignedByte(FastInputStream.java:73) at org.apache.solr.common.util.FastInputStream.readInt(FastInputStream.java:216) at org.apache.solr.update.TransactionLog.readHeader(TransactionLog.java:266) at org.apache.solr.update.TransactionLog.init(TransactionLog.java:160) ... 25 more

This looks to me like a system problem. RHEL should be pretty solid; I use CentOS without any trouble. My initial guesses are a corrupt filesystem, failing hardware, or possibly a kernel problem with your specific hardware. I'm running Jetty 8, which is the version that the example uses. Could Jetty 9 be a problem here? I couldn't really say, though my initial guess is that it's not a problem. I'm running Oracle Java 1.7.0_13. Normally later releases are better, but Java bugs do exist and do get introduced in later releases. Because you're on the absolute latest, I'm guessing that you had the problem with an earlier release and upgraded to see if it went away. If that's what happened, it is less likely that it's Java. My first instinct would be to do a 'yum distro-sync' followed by 'touch /forcefsck', then reboot with console access to the server so that you can deal with any fsck problems. Perhaps you've already tried that. I'm aware that this could be very, very hard to get pushed through strict change management procedures. I did some searching. SOLR-4519 is a different problem, but it looks like it has a similar underlying exception, with no resolution. It was filed when Solr 4.1.0 was current. Could there be a resource problem - heap too small, not enough OS disk cache, etc? Thanks, Shawn
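Tim's JVM dies at the 10k max-processes limit with 'cannot create native thread'. On Linux, every Java thread counts against the user's process limit (RLIMIT_NPROC). The Linux-only sketch below watches a process's thread count via `/proc/<pid>/task` and shows it next to the 'Max processes' line in `/proc/<pid>/limits`. The limit applies to all of the user's processes and threads combined, so this per-process count is only part of the total.

```python
import os
import sys


def thread_count(pid):
    """Number of threads in a process: one /proc/<pid>/task entry per thread (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/task"))


def max_processes(pid):
    """Soft 'Max processes' limit from /proc/<pid>/limits ('unlimited' or a number)."""
    with open(f"/proc/{pid}/limits") as fh:
        for line in fh:
            if line.startswith("Max processes"):
                # Columns: name (2 words), soft limit, hard limit, units
                return line.split()[2]
    raise ValueError("no 'Max processes' line found")


if __name__ == "__main__":
    # Pass the Solr JVM's PID; defaults to this Python process for a quick self-check.
    pid = int(sys.argv[1]) if len(sys.argv) > 1 else os.getpid()
    print(f"pid {pid}: {thread_count(pid)} threads, soft max processes {max_processes(pid)}")
```

Sampling this every few seconds during a load test would show whether the thread count climbs steadily before the failure or jumps suddenly.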
Re: SolrCloud 4.3.1 - Failure to open existing log file (non fatal) errors under high load
Thanks Shawn and Yonik!

Yonik: I understand this error is usually fairly trivial, but here it is not appearing after a previous crash. Every time I run the high-volume test that produced my stack trace, I zero out the logs, Solr data and Zookeeper data. I then start over from scratch with a brand-new collection. The test is mostly high volume (2000-4000 updates/sec), and at the start SolrCloud runs decently for a good 20-60 minutes with no errors in the logs at all. Then that stack trace occurs on all 3 nodes (staggered). I immediately get some replica-down messages, and then some cannot-connect errors to all other cluster nodes, which have all crashed the same way. Perhaps the tlog error is a symptom of running out of threads.

Shawn: thanks so much for sharing those details! Yes, they seem to be nice servers, for sure - I don't get to touch/see them, but they're fast! I'll look into firmware for sure and will try again after updating it. These Solr instances are not bare metal; they are actually KVM VMs, so that's another layer to look into, although it is consistent between the two clusters. I am not currently increasing the 'nofiles' ulimit above the default like you are, but does Solr use 10,000+ file handles? It won't hurt to try it, I guess :). To rule out Java 7, I'll probably also try Jetty 8 and Java 1.6 as an experiment.

Thanks! Tim

On 25/07/13 05:55 PM, Yonik Seeley wrote: On Thu, Jul 25, 2013 at 7:44 PM, Tim Vaillancourt <t...@elementspace.com> wrote: ERROR [2013-07-25 19:34:24.264] [org.apache.solr.common.SolrException] Failure to open existing log file (non fatal) That itself isn't necessarily a problem (and why it says non fatal) - it just means that most likely a transaction log file was truncated from a previous crash. It may be unrelated to the other issues you are seeing. -Yonik http://lucidworks.com
Re: preferred container for running SolrCloud
We run Jetty 8 and 9 with Solr. No issues I can think of. We use Jetty internally anyway, and it seemed to be the most common container out there for Solr (from reading this mailing list, articles, etc.). That made me feel a bit better in case I needed advice or help from the community - not to say there isn't a lot of Tomcat + Solr knowledge on the list. Performance-wise, years back I heard Jetty was the faster, lighter-on-RAM container compared to Tomcat. Recent benchmarks I've seen seem to indicate Tomcat is on par or possibly faster now, although I believe it uses more RAM. Don't quote me here. I'd love it if someone could do a Solr-specific benchmark. Another neat, but sort of unimportant, tidbit is that Google App Engine went with Jetty, which to me indicates the Jetty project isn't going away anytime soon. Who knows, Google may even submit valuable improvements back to the project. Live in hope! Tim

On 11/07/13 08:14 PM, Saikat Kanjilal wrote: One last thing, no issues with Jetty. The issues we did have were actually with running separate Zookeeper clusters. From: sxk1...@hotmail.com To: solr-user@lucene.apache.org Subject: RE: preferred container for running SolrCloud Date: Thu, 11 Jul 2013 20:13:27 -0700 Separate Zookeeper. Date: Thu, 11 Jul 2013 19:27:18 -0700 Subject: Re: preferred container for running SolrCloud From: docbook@gmail.com To: solr-user@lucene.apache.org With the embedded Zookeeper or separate Zookeeper? Also, have you run into any issues with running SolrCloud on Jetty? On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilal <sxk1...@hotmail.com> wrote: We're running under Jetty. On Jul 11, 2013, at 6:06 PM, Ali, Saqib <docbook@gmail.com> wrote: 1) JBoss 2) Jetty 3) Tomcat 4) Other.. ?
Re: preferred container for running SolrCloud
Very good point, Furkan. The unit tests being ran against Jetty is another very good reason to feel safer on Jetty, IMHO. I'm assuming the SolrCloud ChaosMonkey tests are ran against Jetty as well? Tim On 13/07/13 02:46 PM, Furkan KAMACI wrote: Of course you may have some reasons to use Tomcat or anything else (i.e. your stuff may have more experience at Tomcat etc.) However developers generally runs Jetty because it is default for Solr and I should point that Solr unit tests run against jetty (in fact, a specific version of Jetty) and well tested (if you search in mail list you can find some conversations about it). If you follow Solr developer list you may realize using a well tested container or not. For example: https://issues.apache.org/jira/browse/SOLR-4716 and https://issues.apache.org/jira/browse/SOLR-4584?focusedCommentId=13625276page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13625276can show that there maybe some bugs for non Jetty containers and if you choose any other container except for Jetty you can hit one of them. If you want to look at the comparison of Jetty vs. Tomcat I suggest you look at here: http://www.openlogic.com/wazi/bid/257366/Power-Java-based-web-apps-with-Jetty-application-server and here: http://www.infoq.com/news/2009/08/google-chose-jetty 2013/7/13 Tim Vaillancourtt...@elementspace.com We run Jetty 8 and 9 with Solr. No issues I can think of. We use Jetty interally anyways, and it seemed to be the most common container out there for Solr (from reading this mailinglist, articles, etc), so that made me feel a bit better if I needed advice or help from the community - not to say there isn't a lot of Tomcat + Solr knowledge on the list. Performance-wise, years back I heard Jetty was the faster/lighter-on-RAM container in regards to Tomcat, but recent benchmarks I've seen out there seem to indicate Tomcat is on par or possibly faster now, although I believe while using more RAM. Don't quote me here. 
I'd love it if someone could do a Solr-specific benchmark. Another neat, but sort of unimportant, tidbit is that Google App Engine went with Jetty, which to me indicates the Jetty project isn't going away anytime soon. Who knows, Google may even submit valuable improvements back to the project. Live in hope! Tim
On 11/07/13 08:14 PM, Saikat Kanjilal wrote: One last thing, no issues with Jetty. The issues we did have were actually with running separate Zookeeper clusters.
From: sxk1...@hotmail.com To: solr-user@lucene.apache.org Subject: RE: preferred container for running SolrCloud Date: Thu, 11 Jul 2013 20:13:27 -0700 Separate Zookeeper.
Date: Thu, 11 Jul 2013 19:27:18 -0700 Subject: Re: preferred container for running SolrCloud From: docbook@gmail.com To: solr-user@lucene.apache.org With the embedded Zookeeper or separate Zookeeper? Also, have you run into any issues with running SolrCloud on Jetty?
On Thu, Jul 11, 2013 at 7:01 PM, Saikat Kanjilal sxk1...@hotmail.com wrote: We're running under Jetty. Sent from my iPhone
On Jul 11, 2013, at 6:06 PM, Ali, Saqib docbook@gmail.com wrote: 1) JBoss 2) Jetty 3) Tomcat 4) Other...?
Re: documentCache not used in 4.3.1?
That's a good idea, I'll try that next week. Thanks! Tim
On 29/06/13 12:39 PM, Erick Erickson wrote: Tim: Yeah, this doesn't make much sense to me either since, as you say, you should be seeing some metrics on occasion. But do note that the underlying cache only gets filled when getting documents to return in query results; since there's no autowarming going on, it may come and go. But you can test this pretty quickly by lengthening your autocommit interval or just not indexing anything for a while, then running a bunch of queries and looking at your cache stats. That'll at least tell you whether it works at all. You'll have to have hard commits turned off (or openSearcher set to 'false') for that check too. Best, Erick
On Sat, Jun 29, 2013 at 2:48 PM, Vaillancourt, Tim tvaillanco...@ea.com wrote: Yes, we are softCommit'ing every 1000ms, but that should be enough time to see metrics though, right? For example, I still get non-cumulative metrics from the other caches (which are also thrown away). I've also curl/sampled enough that I probably should have seen a value by now. If anyone else can reproduce this on 4.3.1 I will feel less crazy :). Cheers, Tim
-----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Saturday, June 29, 2013 10:13 AM To: solr-user@lucene.apache.org Subject: Re: documentCache not used in 4.3.1? It's especially weird that the hit ratio is so high and you're not seeing anything in the cache. Are you perhaps soft committing frequently? Soft commits throw away all the top-level caches, including documentCache, I think. Erick
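Erick's point that soft commits throw away the top-level caches can be illustrated with a toy model. The class below is hypothetical and is not Solr's LRUCache. Per-searcher counters and cache contents reset whenever a new searcher opens, while the cumulative counters survive. This is one plausible way a 1000ms softCommit could leave the non-cumulative numbers at zero whenever they are sampled.

```python
from collections import OrderedDict

ZERO = {"lookups": 0, "hits": 0, "inserts": 0, "evictions": 0}


class ToySearcherCache:
    """Hypothetical LRU cache whose per-searcher stats reset on new_searcher()."""

    def __init__(self, max_size=512):
        self.max_size = max_size
        self.cumulative = dict(ZERO)  # survives across searchers
        self.new_searcher()

    def new_searcher(self):
        # A new searcher starts with an empty cache (no autowarming here)
        # and zeroed per-searcher counters.
        self.entries = OrderedDict()
        self.stats = dict(ZERO)

    def _bump(self, key):
        self.stats[key] += 1
        self.cumulative[key] += 1

    def get(self, doc_id, load):
        self._bump("lookups")
        if doc_id in self.entries:
            self._bump("hits")
            self.entries.move_to_end(doc_id)  # mark as most recently used
            return self.entries[doc_id]
        value = load(doc_id)  # miss: load the document and insert it
        self._bump("inserts")
        self.entries[doc_id] = value
        if len(self.entries) > self.max_size:
            self.entries.popitem(last=False)  # evict least recently used
            self._bump("evictions")
        return value


cache = ToySearcherCache(max_size=2)
for doc in [1, 2, 1, 3]:
    cache.get(doc, lambda d: "doc-%d" % d)
cache.new_searcher()  # e.g. the next soft commit
print(cache.stats["lookups"], cache.cumulative["lookups"])  # 0 4
```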
documentCache not used in 4.3.1?
Hey guys, This has to be a stupid question/I must be doing something wrong, but after frequent load testing with documentCache enabled under Solr 4.3.1 with autowarmCount=150, I'm noticing that my documentCache metrics are always zero for non-cumulative. At first I thought my commit rate was fast enough that I just never saw the non-cumulative result, but after 100s of samples I still always get zero values. Here is the current output of my documentCache from Solr's admin for 1 core:
- documentCache http://localhost:8983/solr/#/channels_shard1_replica2/plugins/cache?entry=documentCache
- class:org.apache.solr.search.LRUCache
- version:1.0
- description:LRU Cache(maxSize=512, initialSize=512, autowarmCount=150, regenerator=null)
- src:$URL: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_3/solr/core/src/java/org/apache/solr/search/LRUCache.java $
- stats:
  - lookups:0
  - hits:0
  - hitratio:0.00
  - inserts:0
  - evictions:0
  - size:0
  - warmupTime:0
  - cumulative_lookups:65198986
  - cumulative_hits:63075669
  - cumulative_hitratio:0.96
  - cumulative_inserts:2123317
  - cumulative_evictions:1010262
The cumulative values seem to rise, suggesting the doc cache is working, but at the same time it seems I never see non-cumulative metrics, most importantly warmupTime. Am I doing something wrong, is this normal/by-design, or is there an issue here? Thanks for helping with my silly question! Have a good weekend, Tim
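The cumulative figures in that output are internally consistent, which supports the "cache is working" reading. The quick check below assumes the admin UI truncates the ratio to two decimals rather than rounding it, because 63075669/65198986 ≈ 0.967 is displayed as 0.96. It also shows that every cumulative miss was inserted. The zero per-searcher counters then look like counters belonging to a freshly opened searcher.

```python
def calc_hit_ratio(lookups: int, hits: int) -> str:
    """Hit ratio as a two-decimal string, rounded down (assumed admin-UI behaviour)."""
    if lookups == 0:
        return "0.00"
    if hits == lookups:
        return "1.00"
    hundredths = hits * 100 // lookups  # integer division truncates
    return "0.%02d" % hundredths


# Figures copied from the documentCache output above.
cumulative_lookups = 65198986
cumulative_hits = 63075669
cumulative_inserts = 2123317

print(calc_hit_ratio(0, 0))                                 # 0.00 (per-searcher)
print(calc_hit_ratio(cumulative_lookups, cumulative_hits))  # 0.96
# Every miss (lookup without a hit) was inserted into the cache:
print(cumulative_lookups - cumulative_hits == cumulative_inserts)  # True
```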
Re: documentCache not used in 4.3.1?
To answer some of my own question, Shawn H's great reply on this thread explains why I see no autowarming on the doc cache: http://www.marshut.com/iznwr/soft-commit-and-document-cache.html It is still unclear to me why I see no other metrics, however. Thanks Shawn, Tim
Re: documentCache not used in 4.3.1?
Thanks Otis, Yeah I realized after sending my e-mail that the doc cache does not warm; however, I'm still lost on why there are no other metrics. Thanks! Tim
On 28 June 2013 16:22, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi Tim, Not sure about the zeros in 4.3.1, but in SPM we see all these numbers are non-0, though I haven't had the chance to confirm with Solr 4.3.1. Note that you can't really autowarm document cache... Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm
Re: Dataless nodes in SolrCloud?
To answer Otis' question of whether or not this would be useful, the trouble is, I don't know! :) It very well could be useful for my use case. Is there any way to determine the impact of result merging (time spent, etc.) aside from just 'trying it'? Cheers, Tim
On 10 June 2013 14:48, Otis Gospodnetic otis.gospodne...@gmail.com wrote: I think it would be useful. I know people using ElasticSearch use it relatively often. Is aggregation expensive enough to warrant a separate box? I think it can get expensive if X in rows=X is highish. We've seen this reported here on the Solr ML before. So to do the sorting/merging of N result sets from N data nodes on this aggregator node, you may want to get all the CPU you can get and not have the CPU simultaneously also try to handle incoming queries. Otis -- Solr ElasticSearch Support http://sematext.com/
On Mon, Jun 10, 2013 at 5:32 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: No, there's no such notion in SolrCloud. Each node that is part of a collection/shard is a replica and will handle indexing/querying. Even though you can send a request to a node containing a different collection, the request would just be forwarded to the right node and will be executed there. That being said, do people find such a feature useful? Is aggregation expensive enough to warrant a separate box? In a distributed search, the local index is used. One would just be adding a couple of extra network requests if you don't have a local index.
On Sun, Jun 9, 2013 at 11:18 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, Is there a notion of a data node vs. non-data node in SolrCloud? Something a la http://www.elasticsearch.org/guide/reference/modules/node/ Thanks, Otis Solr ElasticSearch Support http://sematext.com/ -- Regards, Shalin Shekhar Mangar.
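Here is a rough picture of the work on an aggregating node. It is a minimal sketch, not Solr's actual merge code, and it combines N shard responses into a global top rows=X. Each shard response is assumed to be already sorted by score. The aggregator has to receive up to N*X entries before it can keep X, which is one reason a high rows value makes this step heavier. The shard contents below are made up.

```python
import heapq
from itertools import islice


def merge_shard_results(shard_results, rows):
    """Merge per-shard hit lists into a global top-`rows` list.

    Each shard list holds (score, doc_id) pairs sorted by descending score,
    as if every shard had been asked for rows=X. heapq.merge consumes them
    lazily; key=-score turns descending scores into the ascending order it expects.
    """
    merged = heapq.merge(*shard_results, key=lambda hit: -hit[0])
    return list(islice(merged, rows))


shards = [
    [(9.1, "a1"), (4.0, "a2")],
    [(8.7, "b1"), (7.5, "b2")],
    [(9.5, "c1"), (1.2, "c2")],
]
print(merge_shard_results(shards, rows=3))  # [(9.5, 'c1'), (9.1, 'a1'), (8.7, 'b1')]
```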
Re: Lucene/Solr Filesystem tunings
I figured as much for atime, thanks Otis! I haven't run benchmarks just yet, but I'll be sure to share whatever I find. I plan to try ext4 vs xfs. I am also curious what effect disabling journaling (ext2) would have, relying on SolrCloud to manage 'consistency' over many instances vs FS journaling. Anyone have opinions there? If I test I'll share the results. Cheers, Tim
On 4 June 2013 16:11, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, You can use noatime, nodiratime; nothing in Solr depends on that as far as I know. We tend to use ext4. Some people love xfs. Want to run some benchmarks and publish the results? :) Otis -- Solr ElasticSearch Support http://sematext.com/
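Use the sketch below to check whether the Solr data directory is actually mounted with noatime/nodiratime. It parses /proc/mounts-style text on Linux. The mount point /var/solr is a hypothetical example. If a mount point appears more than once, the last entry is taken as the effective one.

```python
def mount_options(mounts_text, mount_point):
    """Return the set of mount options for mount_point, or None if absent.

    Expects /proc/mounts-style lines: device mountpoint fstype options dump pass.
    Mount points containing spaces (escaped as \\040) are not decoded here.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            found = set(fields[3].split(","))  # later entries override earlier ones
    return found


sample = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /var/solr xfs rw,noatime,nodiratime 0 0\n"
)
opts = mount_options(sample, "/var/solr")
print("noatime" in opts, "nodiratime" in opts)  # True True
# On a real Linux host: mount_options(open("/proc/mounts").read(), "/var/solr")
```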
Re: Two instances of solr - the same datadir?
If it makes you feel better, I also considered this approach when I was in the same situation with a separate indexer and searcher on one physical Linux machine. My main concern was re-using the FS cache between both instances - if I replicated to myself there would be two independent copies of the index, FS-cached separately. I like the suggestion of using autoCommit to reload the index. If I'm reading that right, you'd set an autoCommit on 'zero docs changing', or just 'every N seconds'? Did that work? Best of luck! Tim
On 5 June 2013 10:19, Roman Chyla roman.ch...@gmail.com wrote: So here it is, for the record, how I am solving it right now: Write-master is started with: -Dmontysolr.warming.enabled=false -Dmontysolr.write.master=true -Dmontysolr.read.master=http://localhost:5005 Read-master is started with: -Dmontysolr.warming.enabled=true -Dmontysolr.write.master=false solrconfig.xml changes:
1. all index-changing components have this bit, enable="${montysolr.master:true}" - i.e. <updateHandler class="solr.DirectUpdateHandler2" enable="${montysolr.master:true}">
2. for cache warming de/activation: <listener event="newSearcher" class="solr.QuerySenderListener" enable="${montysolr.enable.warming:true}">...
3. to trigger refresh of the read-only master (from the write-master): <listener event="postCommit" class="solr.RunExecutableListener" enable="${montysolr.master:true}"> <str name="exe">curl</str> <str name="dir">.</str> <bool name="wait">false</bool> <arr name="args"> <str>${montysolr.read.master:http://localhost}/solr/admin/cores?wt=json&amp;action=RELOAD&amp;core=collection1</str></arr> </listener>
This works; I still don't like the reload of the whole core, but it seems like the easiest thing to do now. -- roman
On Wed, Jun 5, 2013 at 12:07 PM, Roman Chyla roman.ch...@gmail.com wrote: Hi Peter, Thank you, I am glad to read that this use case is not alien. I'd like to make the second instance (searcher) completely read-only, so I have disabled all the components that can write.
(being lazy ;)) I'll probably use http://wiki.apache.org/solr/CollectionDistribution to call curl after commit, or write some IndexReaderFactory that checks for changes. The problem with calling the 'core reload' is that it seems like lots of work just to open a new searcher, eeekkk... somewhere I read that it is cheap to reload a core, but re-opening the index searcher must definitely be cheaper... roman
On Wed, Jun 5, 2013 at 4:03 AM, Peter Sturge peter.stu...@gmail.com wrote: Hi, We use this very same scenario to great effect - 2 instances using the same dataDir with many cores - 1 is a writer (no caching), the other is a searcher (lots of caching). To get the searcher to see the index changes from the writer, you need the searcher to do an empty commit - i.e. you invoke a commit with 0 documents. This will refresh the caches (including autowarming), [re]build the relevant searchers, etc. and make any index changes visible to the RO instance. Also, make sure to use <lockType>native</lockType> in solrconfig.xml to ensure the two instances don't try to commit at the same time. There are several ways to trigger a commit: Call commit() periodically within your own code. Use autoCommit in solrconfig.xml. Use an RPC/IPC mechanism between the 2 instance processes to tell the searcher the index has changed, then call commit when called (more complex coding, but good if the index changes on an ad-hoc basis). Note, doing things this way isn't really suitable for an NRT environment. HTH, Peter
On Tue, Jun 4, 2013 at 11:23 PM, Roman Chyla roman.ch...@gmail.com wrote: Replication is fine, I am going to use it, but I wanted it for instances *distributed* across several (physical) machines - but here I have one physical machine, and it has many cores.
I want to run 2 instances of Solr because I think it has these benefits:
1) I can give less RAM to the writer (4GB), and use more RAM for the searcher (28GB)
2) I can deactivate warming for the writer and keep it for the searcher (this considerably speeds up indexing - each time we commit, the server is rebuilding a citation network of 80M edges)
3) saving disk space and better OS caching (the OS should be able to use more RAM for caching, which should result in faster operations - the two processes are accessing the same index)
Maybe I should just forget it and go with replication, but it doesn't 'feel right' IFF it is on the same physical machine. And Lucene specifically has a method for discovering changes and re-opening the index (DirectoryReader.openIfChanged). Am I not seeing something? roman
On Tue, Jun 4, 2013 at 5:30 PM, Jason Hellman jhell...@innoventsolutions.com wrote: Roman, Could you be more specific as to
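The two refresh strategies in this thread come down to two HTTP calls. One is Roman's CoreAdmin RELOAD on the read-only instance. The other is Peter's empty commit. The minimal sketch below only builds the URLs. The host, port, and core name are placeholders taken from Roman's listener config.

```python
from urllib.parse import urlencode


def core_reload_url(base_url, core):
    """CoreAdmin RELOAD, as issued by Roman's postCommit listener."""
    query = urlencode({"action": "RELOAD", "core": core, "wt": "json"})
    return f"{base_url}/solr/admin/cores?{query}"


def empty_commit_url(base_url, core):
    """An update request with no documents, only commit=true (Peter's approach)."""
    query = urlencode({"commit": "true", "wt": "json"})
    return f"{base_url}/solr/{core}/update?{query}"


print(core_reload_url("http://localhost:5005", "collection1"))
print(empty_commit_url("http://localhost:5005", "collection1"))
```

Either URL can be sent with urllib.request.urlopen(url). The empty-commit route assumes the searcher instance still accepts /update requests. Roman's setup disables index-changing components on the read instance, which is presumably why he reloads the core instead.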
Lucene/Solr Filesystem tunings
Hey all, Does anyone have any advice or special filesystem tuning to share for Lucene/Solr, and which filesystems they like more? Also, does Lucene/Solr care about access times if I turn them off (I think it doesn't)? A bit unrelated: what are people's opinions on relaxing some consistency features like filesystem journaling, etc. (ext2?) given SolrCloud's additional HA with replicas? How about RAID 0 x 3 replicas or so? Thanks! Tim Vaillancourt
SolrCloud Load Balancer weight
Hey guys, I have recently looked into an issue with my SolrCloud related to very high load when performing a full-import on DIH. While some work could be done to improve my queries, etc. in DIH, this led me to a new feature idea in Solr: weighted internal load balancing. Basically, I can think of two use cases, and how a weight on load balancing could help:
1) My situation from above - I'm doing a huge import and want SolrCloud to direct fewer queries to the node handling the DIH full-import, say weight 10/100 (10%) instead of 100/100.
2) Mixed hardware - Although I wouldn't recommend doing this, some people may have mixed hardware, some capable of handling more or less traffic.
These weights wouldn't be expected to be exact, just best-effort, so they can generally influence load on nodes inside the cluster. They of course would only matter on reads (/get, /select, etc.). A full-blown approach would have weight awareness in the Zookeeper-aware client implementation, and on inter-node replica requests. Should I JIRA this? Thoughts? Tim
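Solr 4.x has no such weight setting. The following is only a hypothetical sketch of the best-effort behaviour proposed above: it picks a replica with probability proportional to its weight. The node names are made up. node1 stands in for the node running the DIH full-import, with weight 10 versus 100 for the others.

```python
import random
from collections import Counter


def pick_replica(weights, rng=random):
    """Pick one node with probability proportional to its weight (hypothetical)."""
    nodes = list(weights)
    return rng.choices(nodes, weights=[weights[n] for n in nodes], k=1)[0]


# node1 is running the DIH full-import, so it gets 10 instead of 100.
weights = {"node1:8983": 10, "node2:8983": 100, "node3:8983": 100}
rng = random.Random(42)
counts = Counter(pick_replica(weights, rng) for _ in range(21000))
print({node: round(n / 21000, 3) for node, n in counts.items()})  # node1 gets roughly 10/210
```

A real implementation would also need to handle weights changing at runtime, for example when the import finishes. That part is out of scope for this sketch.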
Re: seeing lots of autowarming messages in log during DIH indexing
Interesting. In your scenario would you use commit=true or commit=false, and do you use auto soft/hard commits? Secondly, if you did use auto soft/hard commits, how would they affect this scenario? I'm guessing that even with commit=false, the autoCommits would be triggered either by time or max docs, which opens a searcher anyway. A total guess, though. I'm interested in doing full-imports without committing/opening new searchers until the import is complete. Cheers! Tim
On 20/05/13 03:59 PM, shreejay wrote: geeky2 wrote: you mean i would add this switch to my script that kicks off the dataimport? example: OUTPUT=$(curl -v http://${SERVER}.intra.searshc.com:${PORT}/solrpartscat/${CORE}/dataimport -F command=full-import -F clean=${CLEAN} -F commit=${COMMIT} -F optimize=${OPTIMIZE} -F openSearcher=false) Yes. That's correct. geeky2 wrote: what needs to be done _AFTER_ the DIH finishes (if anything)? e.g., does this need to be turned back on after the DIH has finished? Yes. You need to open the searcher to be able to search. Just run another commit with openSearcher=true once your indexing process finishes. -- View this message in context: http://lucene.472066.n3.nabble.com/seeing-lots-of-autowarming-messages-in-log-during-DIH-indexing-tp4064649p4064768.html Sent from the Solr - User mailing list archive at Nabble.com.
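Combined, shreejay's advice gives a three-step sequence:
1. Start the full-import with openSearcher=false.
2. Wait for DIH to finish. By default DIH runs the import in the background, so the HTTP call returns before the import is done.
3. Issue one commit with openSearcher=true.

The minimal sketch below assumes a core URL like http://localhost:8983/solr/collection1. It also assumes the handler honours openSearcher as described in the thread, which is not verified here. fetch_json is any callable you supply that GETs a URL and decodes the JSON body.

```python
import time
from urllib.parse import urlencode


def full_import_url(core_url, clean=True, commit=True):
    # openSearcher=false per shreejay; whether DIH honours it is assumed here.
    params = {
        "command": "full-import",
        "clean": str(clean).lower(),
        "commit": str(commit).lower(),
        "openSearcher": "false",
    }
    return f"{core_url}/dataimport?{urlencode(params)}"


def wait_until_idle(fetch_json, core_url, poll_seconds=5.0):
    """Poll the DIH status command until it reports "idle".

    fetch_json is any callable that GETs a URL and returns the decoded JSON,
    e.g. lambda u: json.load(urllib.request.urlopen(u)).
    """
    status_url = f"{core_url}/dataimport?{urlencode({'command': 'status', 'wt': 'json'})}"
    while fetch_json(status_url).get("status") != "idle":
        time.sleep(poll_seconds)


def open_searcher_url(core_url):
    # One explicit commit after the import so the new documents become searchable.
    return f"{core_url}/update?{urlencode({'commit': 'true', 'openSearcher': 'true'})}"
```

Tim's concern still applies: any autoCommit configured with openSearcher=true would presumably keep opening searchers during the import.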
Re: protect solr pages
A lot of people (including me) are asking for this type of support in this JIRA: https://issues.apache.org/jira/browse/SOLR-4470 Although it is brought up frequently on the list, the effort doesn't seem to be moving much. I can confirm that the most recent patch on this JIRA works with the specific revision of 4.2.x, though. Cheers, Tim
On 17 May 2013 13:11, gpssolr2020 psgoms...@gmail.com wrote: Hi, I want to implement security through a Jetty realm in Solr 4. So I configured the related stuff in realm.properties, jetty.xml, and webdefault.xml under /solrhome/example/etc. But it is still not working. Please advise. Thanks.
Re: Does solr cloud support rename or swap function for collection?
I added a brief description of CREATEALIAS here, feel free to tweak: http://wiki.apache.org/solr/SolrCloud#Managing_collections_via_the_Collections_API Tim
On 07/04/13 05:29 PM, Mark Miller wrote: It's pretty simple - just as Brad said, it's just http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=alias&collections=collection1,collection2,… You also have action=DELETEALIAS. CREATEALIAS will create and update. For update requests, you only want a 1-to-1 alias. For read requests, you can map 1-to-1 or 1-to-N. I've also started work on shard-level aliases, but I've yet to get back to finishing it. - Mark
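Mark's URL can also be built programmatically. This sketch encodes his rule that an alias used for update requests should point at a single collection. The alias and collection names are hypothetical. urlencode percent-encodes the comma as %2C, which Solr decodes back.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collections, for_updates=False):
    """Build a Collections API CREATEALIAS URL (re-issuing it repoints the alias)."""
    if for_updates and len(collections) != 1:
        # Per Mark: aliases that receive update requests should map 1-to-1.
        raise ValueError("update aliases should map to exactly one collection")
    params = {"action": "CREATEALIAS", "name": alias, "collections": ",".join(collections)}
    return f"{base_url}/solr/admin/collections?{urlencode(params)}"


def delete_alias_url(base_url, alias):
    return f"{base_url}/solr/admin/collections?{urlencode({'action': 'DELETEALIAS', 'name': alias})}"


# Read alias spanning two collections (1-to-N):
print(create_alias_url("http://localhost:8983", "products", ["products_v1", "products_v2"]))
# Update alias swapped to a freshly built collection (1-to-1):
print(create_alias_url("http://localhost:8983", "products_write", ["products_v2"], for_updates=True))
```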
Re: Storing Solr Index on NFS
If centralization of storage is your goal in choosing NFS, iSCSI works reasonably well with Solr indexes, although good local storage will always be the overall winner. I noticed a nearly 5% degradation in overall search performance (casual testing, nothing scientific) when moving 40-50GB indexes to iSCSI (10GbE network) from a 4x7200rpm RAID 10 local SATA disk setup. Tim
On 15/04/13 09:59 AM, Walter Underwood wrote: Solr 4.2 does have field compression, which makes smaller indexes. That will reduce the amount of network traffic. That probably does not help much, though, because I think the latency of NFS is what causes problems. wunder
On Apr 15, 2013, at 9:52 AM, Ali, Saqib wrote: Hello Walter, Thanks for the response. That has been my experience in the past as well. But I was wondering if there are new things in Solr 4 and NFS 4.1 that make storing indexes on an NFS mount feasible. Thanks, Saqib
On Mon, Apr 15, 2013 at 9:47 AM, Walter Underwood wun...@wunderwood.org wrote: On Apr 15, 2013, at 9:40 AM, Ali, Saqib wrote: Greetings, Are there any issues with storing Solr indexes on an NFS share? Also, any recommendations for using NFS for Solr indexes? I recommend that you do not put Solr indexes on NFS. It can be very slow; I measured indexing as 100X slower on NFS a few years ago. It is not safe to share Solr index files between two Solr servers, so there is no benefit to NFS. wunder -- Walter Underwood wun...@wunderwood.org -- Walter Underwood wun...@wunderwood.org
Re: Basic auth on SolrCloud /admin/* calls
I've thought about this too, and have heard of some people running a lightweight HTTP proxy upstream of Solr. With the right network restrictions (the only way for a client to reach Solr is via the proxy, while the nodes can still talk to each other), you could achieve the same thing SOLR-4470 is doing, with the drawback of additional proxy and firewall components to maintain, plus added overhead on HTTP calls. A benefit, though, is that a lightweight proxy in front of Solr could implement HTTP caching, taking some load off of Solr. In a perfect world, I'd say rolling out SOLR-4470 is the best solution, but again, it seems to be losing momentum (please vote for/support the discussion!). While proxies can achieve this, I think enough people have pondered it to justify implementing it as a feature in Solr. Tim
On 14/04/13 12:32 AM, adfel70 wrote: Did anyone try blocking access to the ports at the firewall level, and allowing only the Solr servers in the cluster plus given control machines? Assuming that search requests to Solr run through a proxy...
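In the firewall-style approach adfel70 describes, the core decision is whether the client address falls inside an allowed network. This sketch shows that check using Python's ipaddress module. The subnets are hypothetical stand-ins for the Solr nodes and the proxy/control machines.

```python
import ipaddress

# Hypothetical: Solr nodes live in 10.0.1.0/24; 192.168.50.10 is the proxy/control host.
ALLOWED_NETWORKS = [ipaddress.ip_network(n) for n in ("10.0.1.0/24", "192.168.50.10/32")]


def is_allowed(client_ip, allowed=ALLOWED_NETWORKS):
    """True if client_ip falls inside any allowed network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in allowed)


print(is_allowed("10.0.1.17"), is_allowed("10.0.2.5"))  # True False
```

In practice this rule would live in the firewall or proxy configuration rather than in application code. The sketch only makes the membership test explicit.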
Re: Basic auth on SolrCloud /admin/* calls
This JIRA covers a lot of what you're asking: https://issues.apache.org/jira/browse/SOLR-4470 I am also trying to get this sort of solution in place, but it seems to be dying off a bit. Hopefully we can get some interest in this again; this question comes up every few weeks, it seems. I can confirm the latest patch from this JIRA works as expected, although my primary concern is that the credentials appear in the JVM command line, and I'd like to move them to a file. Cheers, Tim
On 11/04/13 10:41 AM, Michael Della Bitta wrote: It's fairly easy to lock down Solr behind basic auth using just the servlet container it's running in, but the problem becomes letting services that *should* be able to access Solr in. I've rolled with basic auth in some setups, but certain deployments such as SolrCloud or sharded setups don't play well with auth because there's no good way to configure them to use it. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game
On Thu, Apr 11, 2013 at 1:19 PM, Raymond Wiker rwi...@gmail.com wrote: On Apr 11, 2013, at 17:12, adfel70 adfe...@gmail.com wrote: Hi, I need to implement security in Solr as follows: 1. prevent unauthorized users from accessing the Solr admin pages. 2. prevent unauthorized users from performing Solr operations - both /admin and /update. Is the conclusion of this thread that this is not possible at the moment? The obvious solution (to me, at least) would be to (1) restrict access to Solr to localhost, and (2) use a reverse proxy (e.g., Apache) on the same node to provide authenticated, restricted access to Solr. I think I've seen recipes for (1) somewhere, and I've used (2) fairly extensively for similar purposes.
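Once the container enforces basic auth, a service that *should* get in just needs to send an Authorization header. The minimal client-side sketch below uses the standard library. The URL and credentials are placeholders. Basic credentials are only base64-encoded, so this belongs behind HTTPS or on a trusted network. The sketch does not address the inter-node problem Michael mentions, which is what SOLR-4470 targets.

```python
import base64
from urllib.request import Request


def basic_auth_request(url, user, password):
    """Build a GET request carrying an HTTP Basic Authorization header (RFC 7617)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return Request(url, headers={"Authorization": f"Basic {token}"})


req = basic_auth_request("http://localhost:8983/solr/admin/cores?action=STATUS", "indexer", "s3cret")
# urllib.request.urlopen(req) would send it; a 401 response means the container rejected the credentials.
```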
CSS appearing in Solr 4.2.1 logs
Hey guys, This sounds crazy, but does anyone see strange CSS/HTML in their Solr 4.2.x logs? Often I am finding entire CSS documents (likely from Solr's Admin UI) in my Jetty stderrout log. Example:
2013-04-12 00:23:20.363:WARN:oejh.HttpGenerator:Ignoring extra content /**
 * @license RequireJS order 1.0.5 Copyright (c) 2010-2011, The Dojo Foundation All Rights Reserved.
 * Available via the MIT or new BSD license.
 * see: http://github.com/jrburke/requirejs for details
 */
/*jslint nomen: false, plusplus: false, strict: false */
/*global require: false, define: false, window: false, document: false, setTimeout: false */
//Specify that requirejs optimizer should wrap this code in a closure that
//maps the namespaced requirejs API to non-namespaced local variables.
/*requirejs namespace: true */
(function () {
    //Sadly necessary browser inference due to differences in the way
    //that browsers load and execute dynamically inserted javascript
    //and whether the script/cache method works when ordered execution is
    //desired. Currently, Gecko and Opera do not load/fire onload for scripts with
    //type="script/cache" but they execute injected scripts in order
    //unless the 'async' flag is present.
    //However, this is all changing in latest browsers implementing HTML5
    //spec. With compliant browsers .async true by default, and
    //if false, then it will execute in order. Favor that test first for forward
    //compatibility.
    var testScript = typeof document !== "undefined" &&
                     typeof window !== "undefined" &&
                     document.createElement("script"),
        supportsInOrderExecution = testScript && (testScript.async ||
            ((window.opera &&
              Object.prototype.toString.call(window.opera) === "[object Opera]") ||
            //If Firefox 2 does not have to be supported, then
            //a better check may be:
            //('mozIsLocallyAvailable' in window.navigator)
            ("MozAppearance" in document.documentElement.style))),
Due to this, my logs are getting really huge, and sometimes it breaks my tail -F commands on the logs, printing what looks like binary, so there is possibly some other junk in my logs aside from CSS. I am running Jetty 8.1.10 and Solr 4.2.1 (stable build). Cheers! Tim Vaillancourt
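Until the root cause is found, one reader-side workaround is to filter these warnings out when viewing the log. This is a hypothetical sketch, not a Jetty feature. It drops each "Ignoring extra content" warning and every following line up to the next timestamped entry. The timestamp pattern is inferred from the example above. The dumped body shown in the example is JavaScript (a RequireJS plugin), presumably served by the admin UI.

```python
import re

# Jetty log entries begin with a timestamp like "2013-04-12 00:23:20.363:".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}:")
NOISE = "oejh.HttpGenerator:Ignoring extra content"


def strip_extra_content(lines):
    """Yield log lines, skipping each 'Ignoring extra content' warning and the
    dumped response body that follows it, up to the next timestamped entry."""
    skipping = False
    for line in lines:
        if TIMESTAMP.match(line):
            skipping = NOISE in line
        if not skipping:
            yield line
```

For example, `sys.stdout.writelines(strip_extra_content(open(path, errors="replace")))` prints a filtered view. The `errors="replace"` argument keeps binary-looking junk from raising decode errors.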
/admin/stats.jsp in SolrCloud
Hey guys, This feels like a silly question already, but here goes: In SolrCloud it isn't obvious to me where one can grab stats regarding caches for a given core using an HTTP call (JSON/XML). Those values are available in the web-based admin app, but I am looking for an HTTP call that would return the same data. In 3.x this was located at /admin/stats.jsp, and I used a script to grab the data, but in SolrCloud I am unclear and would like to add that to the docs below: http://wiki.apache.org/solr/SolrCaching#Overview http://wiki.apache.org/solr/SolrAdminStats Thanks! Tim
Re: /admin/stats.jsp in SolrCloud
There we go, thanks Stefan! You're right, 3.x has this as well; I guess I missed it. I'll add this to the docs for SolrCaching. Cheers! Tim
On 10 April 2013 13:19, Stefan Matheis matheis.ste...@gmail.com wrote: Hey Tim, SolrCloud mode or not does not really matter for this... in 4.x (and afaik in 3.x as well) you can find the stats here: http://host:port/solr/admin/mbeans?stats=true in XML or JSON (setting the response writer with wt=json) - as you like. HTH Stefan
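Stefan's mbeans endpoint returns the same cache stats that the admin UI shows. The minimal sketch below pulls documentCache out of the JSON response. It assumes a per-core URL such as http://localhost:8983/solr/collection1/admin/mbeans?stats=true&wt=json. It also assumes the default flat NamedList layout, in which "solr-mbeans" alternates category names and category objects. The sample response is abbreviated and made up.

```python
import json


def cache_stats(mbeans_json, cache_name="documentCache"):
    """Return the stats dict for one cache from an mbeans?stats=true&wt=json body.

    Assumes "solr-mbeans" is a flat list alternating category names and
    category objects, e.g. ["CORE", {...}, "CACHE", {...}].
    """
    beans = json.loads(mbeans_json)["solr-mbeans"]
    categories = dict(zip(beans[0::2], beans[1::2]))
    return categories["CACHE"][cache_name]["stats"]


# Abbreviated, made-up response for illustration.
sample = json.dumps({
    "responseHeader": {"status": 0, "QTime": 1},
    "solr-mbeans": [
        "CORE", {},
        "CACHE", {"documentCache": {
            "class": "org.apache.solr.search.LRUCache",
            "stats": {"lookups": 0, "warmupTime": 0, "cumulative_hitratio": "0.96"},
        }},
    ],
})
print(cache_stats(sample)["cumulative_hitratio"])  # 0.96
```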
Re: Solr 4.2.1 Branch
There is also this path for the SVN guys out there: https://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_4_2_1 Cheers, Tim On 05/04/13 05:53 PM, Jagdish Nomula wrote: That works out. Thanks for shooting the link. On Fri, Apr 5, 2013 at 5:51 PM, Jack Krupansky j...@basetechnology.com wrote: You want the tagged branch: https://github.com/apache/lucene-solr/tree/lucene_solr_4_2_1 -- Jack Krupansky -----Original Message----- From: Jagdish Nomula Sent: Friday, April 05, 2013 8:36 PM To: solr-user@lucene.apache.org Subject: Solr 4.2.1 Branch Hello, I was trying to get hold of the Solr 4.2.1 branch on GitHub. I see https://github.com/apache/lucene-solr/tree/lucene_solr_4_2, but I don't see any branch for 4.2.1. Am I missing anything? Thanks in advance for your help. -- Jagdish Nomula, Sr. Manager Search, Simply Hired, Inc. 370 San Aleso Ave., Ste 200 Sunnyvale, CA 94085 office - 408.400.4700 cell - 408.431.2916 email - jagd...@simplyhired.com www.simplyhired.com
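The links in this thread follow one naming pattern: the release tag is lucene_solr_4_2_1, while the lucene_solr_4_2 name Jagdish found is the 4.2 release branch. A small Python sketch that derives the SVN and GitHub locations from a version string, assuming (not verified here) that the same convention holds for other releases:

```python
def release_tag(version):
    """'4.2.1' -> 'lucene_solr_4_2_1' (assumed tag naming convention)."""
    return "lucene_solr_" + version.replace(".", "_")


def release_branch(version):
    """'4.2.1' -> 'lucene_solr_4_2' (assumed major.minor branch naming)."""
    major, minor = version.split(".")[:2]
    return f"lucene_solr_{major}_{minor}"


def source_locations(version):
    """Where to find the tagged source for a given release."""
    tag = release_tag(version)
    return {
        "svn": f"https://svn.apache.org/repos/asf/lucene/dev/tags/{tag}",
        "github": f"https://github.com/apache/lucene-solr/tree/{tag}",
        # git clone accepts a tag name for --branch
        "git_clone": f"git clone --branch {tag} https://github.com/apache/lucene-solr.git",
    }


if __name__ == "__main__":
    for kind, location in source_locations("4.2.1").items():
        print(f"{kind}: {location}")
```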
Re: Does solr cloud support rename or swap function for collection?
I aim to use this feature more in testing soon. I'll be sure to doc what I can. Cheers, Tim On 07/04/13 12:28 PM, Mark Miller wrote: On Apr 7, 2013, at 9:44 AM, bradhill99 bradhil...@yahoo.com wrote: Thanks, Mark, for this great feature, but I suggest you update the wiki too. Yeah, looking back, I stopped updating the wiki a while ago: paralysis over how to handle versions (I didn't want to do the standard 'this applies to 4.1', 'this applied to 4.0' all over the page), plus the likely move to a new Confluence wiki with docs based on documentation LucidWorks recently donated to the project. That's all a lot of work away still, I guess. I'll try to add some basic docs for this to the SolrCloud wiki page soon. - Mark
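For readers arriving from the subject line: my understanding (an assumption, since the thread never names it) is that the feature discussed here is collection aliasing, exposed in the Collections API as action=CREATEALIAS around 4.2. Instead of renaming a collection, you query through an alias and later re-point the alias to "swap" collections. A minimal Python sketch that only builds the request URLs; the alias and collection names are hypothetical.

```python
from urllib.parse import urlencode


def create_alias_url(base_url, alias, collections):
    """Build a Collections API CREATEALIAS request URL.

    Assumed behaviour: re-issuing CREATEALIAS with an existing alias
    name re-points that alias, which is how a "swap" is done.
    """
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),  # comma-separated list
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"


if __name__ == "__main__":
    base = "http://localhost:8983/solr"
    # Hypothetical names: clients query "products"; we rebuild into v1, then v2.
    print(create_alias_url(base, "products", ["products_v1"]))
    print(create_alias_url(base, "products", ["products_v2"]))  # the "swap"
```

Because clients only ever see the alias name, the swap is a single request and no client config changes.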
Re: Zookeeper dataimport.properties node
If it's in your SolrCloud-based collection's config, it won't be on disk; it lives only in Zookeeper. What I did was use the XInclude feature to include a file with my dataimport handler properties, so I'm assuming you're doing the same. Use a path relative to the config dir in Zookeeper, i.e. no path, just 'dataimport.properties', or 'subdir/dataimport.properties' if it is in a subdirectory of your config. I have a deployment system template the properties file before it is inserted into Zookeeper. Tim On 03/04/13 08:48 PM, Nathan Findley wrote: - Is dataimport.properties ever written to the filesystem? (Trying to determine if I have a permissions error because I don't see it anywhere on disk.) - How do you manually edit dataimport.properties? My system is periodically pulling in new data. If that process has issues, I want to be able to reset to an earlier known good timestamp value. Regards, Nate
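Templating the file before it goes into Zookeeper also answers Nate's second question (resetting to a known good timestamp). A minimal Python sketch of that templating step, assuming DIH's default yyyy-MM-dd HH:mm:ss format, the last_index_time and <entity>.last_index_time keys, and java.util.Properties-style escaping of ':'. The entity name item is hypothetical, and uploading the result into Zookeeper is left to your deployment tooling.

```python
from datetime import datetime

# Assumed DIH default date format (Java pattern: yyyy-MM-dd HH:mm:ss).
DIH_DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def escape_properties_value(value):
    """Escape '\\', ':' and '=' the way java.util.Properties.store() does."""
    return (value.replace("\\", "\\\\")
                 .replace(":", "\\:")
                 .replace("=", "\\="))


def render_dataimport_properties(last_good, entities=()):
    """Render dataimport.properties pinned to a known good timestamp."""
    ts = escape_properties_value(last_good.strftime(DIH_DATE_FORMAT))
    lines = [f"last_index_time={ts}"]
    # One key per entity, e.g. "item.last_index_time" (hypothetical name).
    lines += [f"{entity}.last_index_time={ts}" for entity in entities]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Write this output into the config directory you upload to Zookeeper.
    print(render_dataimport_properties(datetime(2013, 4, 3, 20, 48), ["item"]), end="")
```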