Thanks for all of your clarifications. I know that SolrCloud is a better configuration than any other, but it also comes with much higher complexity. I just want to share the pain points I noticed while gathering all the information I could find on SolrCloud.
1) The ZooKeeper documentation says that for the best experience you should have a dedicated filesystem for persistence, and ZooKeeper should never swap to disk. I have not found any guidelines on how to size a ZooKeeper machine: how much RAM, how much disk? Can I install ZooKeeper on the same machines where Solr resides? I suspect not, because the Solr machines are under stress, and if ZooKeeper starts swapping it can lead to problems.

2) What about updates? If I need to update my SolrCloud installation and the new version requires a new version of ZooKeeper, which is the path to follow? Do I update ZooKeeper first, or upgrade Solr on the existing machines first, or something else? Maybe I did not search well, but I did not find a comprehensive guideline explaining how to upgrade a SolrCloud installation in the various situations.

3) What are the best practices for running DIH in SolrCloud? I think I can trigger DIH imports round-robin across the different servers composing the cloud infrastructure, or is there a better way? (I will probably need to trigger a DIH run every 5-10 minutes, but the number of new records is really small.)

4) Since I believe it is not best practice to install ZooKeeper on the same machines as Solr (as a separate process, not the embedded ZooKeeper), I need at least three more machines to maintain / monitor / upgrade, and I also need to monitor ZooKeeper, a new appliance that the IT infrastructure team needs to master. Are there any guidelines on how to automate promoting a slave to master in a classic master/slave setup? I did not find anything official, but automatically promoting a slave to master could solve my problem.

--
Gian Maria Ricci
Cell: +39 320 0136949

-----Original Message-----
From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com]
Sent: martedì 8 dicembre 2015 11:25
To: solr-user@lucene.apache.org
Subject: Re: Use multiple istance simultaneously

Can you tolerate having indices in different states, or do you plan to keep them in sync with controlled commits?
DIH-ing content from the source when a new machine is needed will probably be slow, and I am afraid you will end up simulating the master-slave model (copying state from one of the healthy nodes and DIH-ing the diff). I would recommend using SolrCloud with a single shard and letting Solr do the hard work.

Regards,
Emir

On 04.12.2015 14:37, Gian Maria Ricci - aka Alkampfer wrote:
> Many thanks for your response.
>
> I worked with Solr until early version 4.0, then switched to
> ElasticSearch for a variety of reasons. I've used replication in the
> past with Solr, but with Elasticsearch I basically had no problems
> because it works similarly to SolrCloud by default and with almost
> zero configuration.
>
> Now I have a customer that wants to use Solr, and he wants the
> simplest possible setup to maintain in production. Since most of the
> work will be done by the Data Import Handler, having multiple
> parallel and independent machines is easy to maintain. If one machine
> fails, it is enough to configure another machine, configure the core
> and restart DIH.
>
> I'd like to know if other people went down this path in the past.
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: giovedì 3 dicembre 2015 10:15
> To: solr-user@lucene.apache.org
> Subject: Re: Use multiple istance simultaneously
>
> On 12/3/2015 1:25 AM, Gian Maria Ricci - aka Alkampfer wrote:
>> In such a scenario, could it be feasible to simply configure 2 or 3
>> identical instances of Solr and configure the application that
>> transfers data to Solr to feed all the instances simultaneously (the
>> approach will be an incremental DIH for some cores and an external
>> application that pushes data continuously for other cores)? What
>> could be the drawbacks of this approach?
>
> When I first set up Solr, I used replication.
> Then version 3.1.0 was released, including a non-backward-compatible
> upgrade to javabin, and it was not possible to replicate between 1.x
> and 3.x.
>
> This incompatibility meant that it would not be possible to do a
> gradual upgrade to 3.x, where the slaves are upgraded first and then
> the master.
>
> To get around the problem, I basically did exactly what you've
> described. I turned off replication and configured a second copy of
> my build program to update what used to be slave servers.
>
> Later, when I moved to a SolrJ program for index maintenance, I made
> one copy of the maintenance program capable of updating multiple
> copies of the index in parallel.
>
> I have stuck with this architecture through 4.x and moving into 5.x,
> even though I could go back to replication or switch to SolrCloud.
> Having completely independent indexes allows a great deal of
> flexibility with upgrades and testing new configurations, flexibility
> that isn't available with SolrCloud or master-slave replication.
>
> Thanks,
> Shawn

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
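P.S. To make the round-robin DIH triggering from my point 3 concrete, here is a minimal sketch. The host names, core name, and scheduling counter are my own assumptions, not anything from this thread; only the `/dataimport?command=delta-import` endpoint is the standard DIH API.

```python
# Hypothetical sketch: trigger a DIH delta-import on a different Solr node
# each run, rotating round-robin. Node URLs and core name are assumptions.
import urllib.request

SOLR_NODES = ["http://solr1:8983", "http://solr2:8983", "http://solr3:8983"]
CORE = "orders"  # hypothetical core name


def pick_node(nodes, tick):
    """Round-robin selection: tick is an incrementing run counter
    (e.g. one tick per scheduled 5-10 minute run)."""
    return nodes[tick % len(nodes)]


def trigger_delta_import(tick):
    """Fire a DIH delta-import (standard DIH request handler) on the
    node chosen for this tick, committing when the import finishes."""
    node = pick_node(SOLR_NODES, tick)
    url = f"{node}/solr/{CORE}/dataimport?command=delta-import&commit=true"
    with urllib.request.urlopen(url) as resp:  # network call to a live node
        return node, resp.status
```

A cron job or scheduler would persist `tick` between runs and call `trigger_delta_import(tick)`; since the new-record volume is small, any single node should handle the delta cheaply.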
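P.P.S. Shawn's approach of one indexer updating multiple independent copies in parallel could look roughly like this (his real program uses SolrJ; this is only an illustrative sketch with made-up instance URLs, using Solr's standard JSON `/update` endpoint):

```python
# Hypothetical sketch: fan the same document batch out to several fully
# independent Solr instances in parallel. URLs below are assumptions.
import concurrent.futures
import json
import urllib.request

INSTANCES = [
    "http://solr-a:8983/solr/core1",
    "http://solr-b:8983/solr/core1",
]


def build_update(base_url, docs):
    """Build the update URL and JSON body shared by every instance."""
    return f"{base_url}/update?commit=true", json.dumps(docs).encode("utf-8")


def send_update(base_url, docs):
    """POST one batch of documents to a single independent instance."""
    url, body = build_update(base_url, docs)
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:  # network call to a live node
        return base_url, resp.status


def update_all(docs):
    """Send the batch to every instance in parallel; each index stays
    independent, so one failed instance never blocks the others."""
    with concurrent.futures.ThreadPoolExecutor() as pool:
        return list(pool.map(lambda u: send_update(u, docs), INSTANCES))
```

The trade-off Shawn describes falls out of this shape: because no instance knows about the others, you can upgrade or reconfigure one copy while the rest keep serving, at the cost of handling per-instance failures and drift yourself.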