Re: Finding out optimal hash ranges for shard split
Yes - I'm using 2-level composite ids, and that has caused the imbalance for some shards. It's used-car data, and the composite ids are of the form year+make!model!other-specifications, e.g. 2013Ford!Edge!123456 - but there are just far too many 2013 or 2011 Fords, and they all end up occupying the same shards. This was done because co-location of these docs is required for a few of the search use cases - to avoid hitting all shards all the time - and since all queries always specify the year and make combination, it's easy to work out the target shard for a query. Regarding storing the hash against each document and then querying to find the optimal ranges - could Solr instead maintain incremental counters for each hash in the shard's range, so that the Collections SPLITSHARD API could use them internally to propose the optimal shard ranges for the split? -- View this message in context: http://lucene.472066.n3.nabble.com/Finding-out-optimal-hash-ranges-for-shard-split-tp4203609p4204124.html Sent from the Solr - User mailing list archive at Nabble.com.
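For context on why all 2013 Fords co-locate: in Solr's CompositeIdRouter, a two-part id `a!b` is hashed so that the top 16 bits of the 32-bit route hash come from MurmurHash3 of the prefix and the bottom 16 bits from the rest of the id. A minimal Python sketch of that scheme - the MurmurHash3 x86/32 reimplementation below is the standard algorithm, and treating everything after the first `!` as the second part is a simplification (ids with two `!` separators can also be routed tri-level in Solr 4.7+):

```python
def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 x86/32 - the hash Solr's CompositeIdRouter is based on."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed
    rounded = len(data) & ~3
    for i in range(0, rounded, 4):               # 4-byte body blocks
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    tail, k = data[rounded:], 0                  # 1-3 trailing bytes
    if len(tail) >= 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    h ^= len(data)                               # finalization mix
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    return h ^ (h >> 16)

def composite_hash(doc_id: str) -> int:
    """Two-level composite routing: top 16 bits from the prefix hash,
    bottom 16 bits from the hash of the rest of the id."""
    prefix, sep, rest = doc_id.partition("!")
    if not sep:
        return murmur3_x86_32(doc_id.encode("utf-8"))
    return (murmur3_x86_32(prefix.encode("utf-8")) & 0xFFFF0000) | \
           (murmur3_x86_32(rest.encode("utf-8")) & 0x0000FFFF)

# Every id sharing the "2013Ford" prefix lands in the same 1/65536 hash slice:
print(hex(composite_hash("2013Ford!Edge!123456")))
print(hex(composite_hash("2013Ford!F150!999999")))
```

Because the prefix fixes the upper half of the hash, every popular year+make is pinned to one narrow slice of the ring, which is exactly the imbalance described above.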
Re: Finding out optimal hash ranges for shard split
Okay - thanks for the confirmation, Shalin. Could this be a feature request for the Collections API: a SPLITSHARD dry-run API that accepts a sub-shard count as a request param and returns the optimal shard ranges for the requested number of sub-shards, along with the respective document counts for each sub-shard? Users could then use these shard ranges for the actual split.
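Until such a dry-run API exists, the same computation can be approximated client-side: gather the 32-bit route hash of every document on the shard, sort them, and cut at the quantiles. A sketch, assuming the hash values have already been collected (e.g. by recomputing them from the ids on the client):

```python
def propose_ranges(hashes, num_sub_shards):
    """Given the route hashes of every doc on a shard, propose contiguous
    sub-ranges holding roughly equal doc counts. Returns a list of
    (lower, upper, doc_count) tuples covering the input. Assumes hash
    values at the quantile boundaries are distinct."""
    hs = sorted(hashes)
    n = len(hs)
    ranges = []
    for i in range(num_sub_shards):
        lo_idx = i * n // num_sub_shards
        hi_idx = (i + 1) * n // num_sub_shards - 1
        lower = hs[0] if i == 0 else ranges[-1][1] + 1
        upper = hs[-1] if i == num_sub_shards - 1 else hs[hi_idx]
        ranges.append((lower, upper, hi_idx - lo_idx + 1))
    return ranges

# 100 evenly spread hashes split 4 ways -> 25 docs per proposed sub-shard
print(propose_ranges(list(range(100)), 4))
```

The resulting lower/upper bounds, rendered as hex, are the shape SPLITSHARD's `ranges` parameter expects (e.g. `ranges=0-1f4,1f5-3e7`).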
Re: Finding out optimal hash ranges for shard split
Looks like it's not possible to find the optimal hash ranges for a split before you actually perform it. So the only way out is to keep splitting the large sub-shards?
Finding out optimal hash ranges for shard split
Hi all, Before doing a SPLITSHARD, is there a way to figure out the hash ranges that would evenly split a shard's documents across the new sub-shards? Sort of a dry run of the actual SPLITSHARD command with the ranges parameter: something that just shows the number of docs that would end up on each new sub-shard if the command were executed with a given hash range? Thanks, Anand
Delete Replica API Async Calls not being processed
Hi, I needed to delete a couple of replicas for a shard and used the async Collections API calls to do that. I see all my requests in the 'submitted' state, but none have been processed yet (it's been 4 hours or so). How do I know whether these requests are being processed at all? And, if required, how can I delete them now? I'm using Solr 4.10. Thanks, Anand
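For async Collections API calls (available since Solr 4.8), the REQUESTSTATUS action reports whether a given request id is submitted, running, completed, or failed. A sketch of interpreting its JSON response; the sample payload below is illustrative, not captured from a live cluster:

```python
import json

def request_state(response_text: str) -> str:
    """Extract the async request state ('submitted', 'running',
    'completed', or 'failed') from a REQUESTSTATUS JSON response."""
    body = json.loads(response_text)
    return body["status"]["state"]

# REQUESTSTATUS is queried as:
#   /admin/collections?action=REQUESTSTATUS&requestid=<id>&wt=json
# An illustrative response body for a request still sitting in the queue:
sample = ('{"responseHeader":{"status":0,"QTime":1},'
          '"status":{"state":"submitted","msg":"found 1001 in submitted tasks"}}')
print(request_state(sample))  # -> submitted
```

A request stuck in 'submitted' for hours usually means the Overseer is not consuming its work queue; restarting the node currently acting as Overseer (visible under /overseer_elect in ZooKeeper) is the common remedy on 4.x.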
Re: Leaders in Recovery Failed state
Erick Erickson gmail.com> writes:
> What version of Solr?
>
> On Tue, Jan 20, 2015 at 7:07 AM, anand.mahajan zerebral.co.in> wrote:
> [quoted original trimmed - the full message appears below]
Leaders in Recovery Failed state
Hi all,

I have a cluster with 36 shards and 3 replicas per shard. I recently had to restart the entire cluster - most of the shards and replicas are back up, but a few shards had no leader for a very long time (close to 18 hours now). I tried reloading these cores and even the servlet containers hosting them. Only now do all the shards have leaders allocated, but a few of these leaders are still shown in the Recovery Failed state on the Solr Cloud tree view.

I see the following in the logs for these shards:

INFO - 2015-01-20 14:38:19.797; org.apache.solr.handler.admin.CoreAdminHandler; In WaitForState(recovering): collection=collection1, shard=shard1, thisCore=collection1_shard1_replica3, leaderDoesNotNeedRecovery=false, isLeader? true, live=true, checkLive=true, currentState=recovering, localState=recovery_failed, nodeName=10.68.77.9:8983_solr, coreNodeName=core_node2, onlyIfActiveCheckResult=true, nodeProps: core_node2:{"state":"recovering","core":"collection1_shard1_replica1","node_name":"10.68.77.9:8983_solr","base_url":"http://10.68.77.9:8983/solr"}

And on another server hosting a replica for this shard:

ERROR - 2015-01-20 14:38:20.768; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: I was asked to wait on state recovering for shard3 in collection1 on 10.68.77.9:8983_solr but I still do not see the requested state. I see state: recovering live:true leader from ZK: http://10.68.77.3:8983/solr/collection1_shard3_replica3/
 at org.apache.solr.handler.admin.CoreAdminHandler.handleWaitForStateAction(CoreAdminHandler.java:999)
 at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:245)
 at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:188)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
 at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:258)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
 at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
 at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
 at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
 at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
 at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
 at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
 at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
 at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
 at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
 at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
 at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
 at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
 at org.eclipse.jetty.server.Server.handle(Server.java:368)
 at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
 at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
 at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
 at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
 at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
 at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
 at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
 at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
 at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
 at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
 at java.lang.Thread.run(Unknown Source)

I see that there is no replica catch-up going on between any of these servers now. A couple of questions:
1. What is Solr Cloud waiting on before it allocates the leaders for such shards?
2. Why do a few of these shards show leaders in the Recovery Failed state? And how do I recover such shards?

Thanks, Anand
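One way to see exactly which shards are missing a live leader is to read clusterstate.json from ZooKeeper (the same data behind the admin UI's Cloud > Tree view). A sketch that scans a pared-down, illustrative clusterstate in the 4.x layout for shards whose leader replica is not active:

```python
import json

def shards_without_active_leader(clusterstate_json: str, collection: str):
    """Return names of shards in `collection` with no replica that is
    both marked leader and in the 'active' state."""
    state = json.loads(clusterstate_json)
    bad = []
    for shard_name, shard in state[collection]["shards"].items():
        has_leader = any(
            r.get("leader") == "true" and r.get("state") == "active"
            for r in shard["replicas"].values()
        )
        if not has_leader:
            bad.append(shard_name)
    return bad

# Pared-down illustrative example of the Solr 4.x clusterstate.json layout:
sample = json.dumps({
    "collection1": {"shards": {
        "shard1": {"replicas": {
            "core_node1": {"state": "active", "leader": "true"},
            "core_node2": {"state": "recovering"}}},
        "shard3": {"replicas": {
            "core_node7": {"state": "recovery_failed", "leader": "true"}}},
    }}
})
print(shards_without_active_leader(sample, "collection1"))  # -> ['shard3']
```

Running this against the real clusterstate narrows the investigation to the shards whose leaders are stuck, instead of eyeballing 108 cores in the tree view.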
Re: SolrCloud Slow to boot up
1. I've hosted it with Helios v0.07, which ships with Solr 4.10.
2. Changes to solrconfig.xml:
a. hard commits every 10 mins
b. soft commits every 10 secs
c. disabled all caches, as the usage is very random (no end users, only services doing the searches) and mostly single requests
d. useColdSearcher = true
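For reference, the settings described in 2a, 2b, and 2d would look roughly like this in solrconfig.xml - a sketch of the configuration as described, not the poster's actual file (times are in milliseconds):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit every 10 minutes; don't open a new searcher on hard commit -->
  <autoCommit>
    <maxTime>600000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit every 10 seconds makes new documents visible to searches -->
  <autoSoftCommit>
    <maxTime>10000</maxTime>
  </autoSoftCommit>
</updateHandler>

<query>
  <!-- Don't block initial queries waiting for a fully warmed searcher -->
  <useColdSearcher>true</useColdSearcher>
</query>
```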
SolrCloud Slow to boot up
Hello all, I hosted a SolrCloud of 6 nodes - 36 shards x 3 replicas each -> 108 cores across 6 servers - and moved about 250M documents into this cluster. When I restart the cluster, only the leader of each shard comes up live instantly (within a minute); all the replicas are shown as Recovering on the Cloud screen, and all 6 servers are doing some processing (consuming about 4 CPUs each and doing a lot of network IO too). In essence it's not doing any reads or writes to the index, and I don't see any replication/catch-up activity going on either, yet the RAM grows until it consumes all 96 GB available on each box. The Recovering replicas then recover one by one over about an hour. Why is it taking so long to boot up, and what is it doing that consumes so much CPU, RAM, and network IO? All disks on all servers read at 100% during this boot-up. Is there a setting I might have missed that would help? FYI - the ZooKeeper cluster is on the same 6 boxes. The Solr data dir is about 150 GB per server and each box has 96 GB RAM. Thanks, Anand
Re: SolrCloud Scale Struggle
Hello all, Thank you for your suggestions. With the autoCommit (every 10 mins) and softCommit (every 10 secs) frequencies reduced, things work much better now. CPU usage has gone down considerably (by about 60%), and the read/write throughput is showing considerable improvement too. Certain shards are still giving poor response times - these have over 10M listings - I guess because they are starving for RAM? Would it help to split these into smaller shards, but on the existing hardware? (I cannot allocate more machines to the cloud yet.) Thanks, Anand
Re: SolrCloud Scale Struggle
Thanks Shawn. I'm using 2-level composite id routing right now. These are all used-car listings, and all search queries always have the car year and make in the search criteria - hence it made sense to have year+make as level 1 of the composite id. The second level of the composite id is based on about 8 car attributes, which means all listings for a similar type of car are grouped together and co-located in the SolrCloud. Even so, there is still an imbalance in the cluster: certain car makes are popular, and the many listings for those cars all go to the same shard. Will splitting these up on the existing hardware help at all?
Re: SolrCloud Scale Struggle
Thank you everyone for your responses. I increased the hard commit interval to 10 mins and autoSoftCommit to 10 secs. (I won't really need real-time get - I tweaked the app code to cache the doc and use the app-side cached version instead of fetching it from Solr.) I will watch it for a day or two and clock the throughput. For this deployment the peak lasts throughout the day as more data keeps streaming in - there are no direct users with search queries here (as of now) - but every incoming doc is compared against the existing docs in Solr to check whether it's a new one or an updated version of an existing one, and only then is the doc inserted/updated. Right now it's adding about 1100 docs a minute (~20 docs a second), but that's because it has to run a search first to determine whether it's an insert or an update. Also, since there are already 18 JVMs per machine - how do I go about merging these existing cores under just 1 JVM? Would I need to create one Solr instance with 18 cores inside and then migrate the data from the separate JVMs into the new instance?
Re: SolrCloud Scale Struggle
Thanks for the reply, Shalin. 1. I'll try increasing the softCommit interval and the autoCommit too. One mistake I realized just now: I am using /solr/select and expecting it to do NRT - for real-time get it has to be the /get handler. Please confirm. 2. On the number of shards - I made 36 (even with 6 machines) because I was hoping to get more hardware and be able to distribute the existing shards onto the new boxes. That has not happened yet. But even with the current deployment - fewer shards would mean more docs per shard; would that slow down search queries? 3. Increasing the commit interval would mean more RAM usage - could that make the situation worse, given there is already less RAM than the total doc size (with all fields stored)? [FYI - ramBufferSizeMB and maxBufferedDocs are at their defaults, 100MB and 1000 respectively.] 4. I read that DataStax Enterprise edition could be an answer here? Is there an easy way to migrate to DSE - something that would not require too many code changes? (I had a discussion with the DSE folks a few weeks ago and they mentioned migration from Solr to DSE would be a breeze, with no code changes required on the ingestion and search code. Perhaps I was talking to a sales guy?) With DSE the data would sit in Cassandra and search would still be Solr plugged into DSE - but would that work with a 6-node cluster? (Sorry if I'm deviating a bit from the core problem I'm trying to fix - but if DSE could work with minimal time and effort, I wouldn't mind trying it.)
Re: SolrCloud Scale Struggle
Oops - my bad - it's autoSoftCommit that is set after every doc, not autoCommit. The following snippet is from the solrconfig (the XML tags were stripped by the archive; only the values survive): 1 true 1. Shall I increase the autoCommit time as well? But would that mean more RAM is consumed by all the instances running on the box?
SolrCloud Scale Struggle
Hello all, Struggling to get this going with SolrCloud.

Requirement in brief:
- Ingest about 4M used-car listings a day and track all unique cars for changes
- 4M automated searches a day (during the ingestion phase, to check whether a doc already exists in the index (based on the values of 4-5 key fields), is new, or is an updated version)
- Of the 4M, about 3M are updates to existing docs (one for every non-key value change)
- About 1M inserts a day (I'm assuming that many new listings come in every day)
- Daily bulk CSV exports of the last 24 hours' inserts/updates, over various snapshots of the data, to various clients

My current deployment:
i) Solr 4.8 on a SolrCloud of 6 dedicated machines - 24 cores + 96 GB RAM each.
ii) There are over 190M docs in the SolrCloud at the moment (across all replicas it consumes 2340 GB of disk overall, which implies each doc is about 5-8 KB in size).
iii) The docs are split into 36 shards with 3 replicas per shard (108 Solr Jetty processes over 6 servers, i.e. 18 Jetty JVMs running on each host).
iv) There are 60 fields per doc and all fields are stored at the moment :( (the backend is only Solr at the moment).
v) The current shard/routing key is a combination of car year, make, and some other car-level attributes that help classify the cars.
vi) We are mostly on the default Solr config as of now - no heavy caching, as the search is pretty random in nature.
vii) Autocommit is on, with maxDocs = 1.

Current throughput & issues: With the above deployment the daily throughput is only about 1.5M on average (inserts + updates) - falling way short of what is required. Search is slow - some queries take about 15 seconds to return - and since every insert depends on at least one search, that degrades the write throughput too. (This is not a Solr issue - the app demands it.)

Questions:
1. Autocommit with maxDocs = 1 - is that a goof-up, and could it be slowing down indexing? It's a requirement that all docs are available as soon as they are indexed.
2. Would I have been better served by deploying a single Jetty Solr instance per server with multiple cores running inside? The servers start to swap after a couple of days of Solr uptime - right now we reboot the entire cluster every 4 days.
3. The routing key is not able to balance the docs effectively across the available shards - a few shards have just about 2M docs, while others have over 11M. Shall I split the larger shards? I do not have more nodes/hardware to allocate to this deployment - in that case, would splitting the large shards still give better read/write throughput?
4. To remain on the current hardware, would it help to remove one replica from each shard? But that would mean that when just one node for a shard goes down, only one live node is left, and it would not serve the write requests.
5. Also, is there a way to control where the split-shard replicas go? Is there a pattern/rule Solr follows when it creates replicas for split shards?
6. I read somewhere that creating a core costs the OS one thread and a file handle. Since a core represents an index in its entirety, would it not be allocated the configured number of write threads (the default being 8)?
7. The ZooKeeper cluster is deployed on the same boxes as the Solr instances - would separating the ZK cluster out help?

Sorry for the long thread - I thought I'd ask all of these at once rather than posting separate ones. Thanks, Anand
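On question 3, the imbalance can be estimated before splitting anything: assign each routing prefix to one of N evenly divided slices of the 32-bit hash ring and sum the listing counts per slice. A sketch using zlib.crc32 as a stand-in for Solr's MurmurHash3 (so shard numbers won't match a real cluster, but the skew computation is the same); the prefix counts below are made-up illustrative numbers, not the poster's data:

```python
import zlib

def shard_of(prefix: str, num_shards: int) -> int:
    """Map a routing prefix to one of num_shards evenly divided slices
    of the 32-bit hash ring. crc32 stands in for MurmurHash3 here."""
    h = zlib.crc32(prefix.encode("utf-8")) & 0xFFFFFFFF
    return h * num_shards // 2**32

def docs_per_shard(prefix_counts: dict, num_shards: int) -> list:
    """Sum listing counts per shard slice to expose routing-key skew."""
    totals = [0] * num_shards
    for prefix, count in prefix_counts.items():
        totals[shard_of(prefix, num_shards)] += count
    return totals

# Made-up listing counts per year+make routing prefix:
counts = {"2013Ford": 11_000_000, "2011Ford": 9_000_000,
          "2013Honda": 4_000_000, "2009Saab": 200_000}
totals = docs_per_shard(counts, 36)
print(max(totals), min(totals))  # popular prefixes pile onto a few shards
```

Running this over the real per-prefix counts (a facet on the routing-prefix field gives them) shows whether splitting a hot shard helps, or whether one giant prefix will simply land whole on one of the new sub-shards.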