Thanks for all of your clarifications. I know that SolrCloud is a better
configuration than the alternatives, but it is also considerably more
complex. I just want to share the pain points I noticed while gathering
all the information I could find on SolrCloud.

1) The ZooKeeper documentation says that, for best results, you should
have a dedicated filesystem for persistence and it should never swap to
disk. I have not found any guidelines on how to size a ZooKeeper machine:
how much RAM, how much disk? Can I install ZooKeeper on the same machines
where Solr resides? (I suspect not, because the Solr machines are under
stress, and if ZooKeeper starts swapping it can lead to problems.)

2) What about upgrades? If I need to upgrade my SolrCloud instance and the
new version requires a new version of ZooKeeper, what is the right path? Do
I upgrade ZooKeeper first, or upgrade Solr on the existing machines first,
or something else? Maybe I did not search well, but I did not find a
comprehensive guideline that explains how to upgrade a SolrCloud
installation in the various situations.

3) What are the best practices for running DIH in SolrCloud? I think I can
trigger DIH imports round-robin across the servers that make up the cloud
infrastructure, or is there a better way? (I will probably need to trigger
a DIH import every 5 to 10 minutes, but the number of new records is really
small.)
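
To make the round-robin idea concrete, here is a minimal sketch in Python.
The node addresses, the core name "mycore", and the "/dataimport" handler
path are assumptions for illustration (the path is whatever is registered in
solrconfig.xml). It only rotates through the nodes and builds the
delta-import request; scheduling and failure handling are left out.

```python
from itertools import cycle
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical node addresses and core name; replace with your own.
NODES = ["http://solr1:8983", "http://solr2:8983", "http://solr3:8983"]
CORE = "mycore"


def delta_import_url(node, core):
    """Build the DIH delta-import URL for one node.

    Assumes the handler is registered at /dataimport in solrconfig.xml.
    """
    params = urlencode({"command": "delta-import", "commit": "true", "wt": "json"})
    return f"{node}/solr/{core}/dataimport?{params}"


def round_robin_urls(nodes, core):
    """Yield delta-import URLs, rotating through the nodes indefinitely."""
    for node in cycle(nodes):
        yield delta_import_url(node, core)


def trigger(url, timeout=30):
    """Send the request and return the HTTP status code."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.status
```

A scheduler (cron, for example) would call trigger(next(urls)) every 5 to 10
minutes, where urls = round_robin_urls(NODES, CORE), so that each run lands
on the next node in the list.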

4) Since I believe it is not best practice to install ZooKeeper on the same
machines as Solr (as a separate process, not the built-in ZooKeeper), I need
at least three more machines to maintain, monitor, and upgrade. ZooKeeper
itself also needs monitoring, and it is a new component that the IT
infrastructure team has to master.

Are there any guidelines on how to automate promoting a slave to master in
a classic master-slave setup? I did not find anything official, and
automatically promoting a slave to master could solve my problem.

--
Gian Maria Ricci
Cell: +39 320 0136949
    

-----Original Message-----
From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com] 
Sent: Tuesday, 8 December 2015 11:25
To: solr-user@lucene.apache.org
Subject: Re: Use multiple istance simultaneously

Can you tolerate having indices in different states, or do you plan to keep
them in sync with controlled commits? DIH-ing content from the source when a
new machine is needed will probably be slow, and I am afraid that you will
end up simulating the master-slave model (copying state from one of the
healthy nodes and DIH-ing the diff). I would recommend using SolrCloud with
a single shard and letting Solr do the hard work.

Regards,
Emir

On 04.12.2015 14:37, Gian Maria Ricci - aka Alkampfer wrote:
> Many thanks for your response.
>
> I worked with Solr until the early 4.0 versions, then switched to
> Elasticsearch for a variety of reasons. I have used replication in the
> past with Solr, but with Elasticsearch I basically had no problems
> because it works similarly to SolrCloud by default, with almost zero
> configuration.
>
> Now I have a customer that wants to use Solr, and he wants the simplest
> possible setup to maintain in production. Since most of the work will
> be done by the Data Import Handler, having multiple parallel and
> independent machines is easy to maintain. If one machine fails, it is
> enough to set up another machine, configure the core, and restart DIH.
>
> I'd like to know if other people went through this path in the past.
>
> --
> Gian Maria Ricci
> Cell: +39 320 0136949
>      
>
> -----Original Message-----
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Thursday, 3 December 2015 10:15
> To: solr-user@lucene.apache.org
> Subject: Re: Use multiple istance simultaneously
>
> On 12/3/2015 1:25 AM, Gian Maria Ricci - aka Alkampfer wrote:
>> In such a scenario, would it be feasible to simply configure 2 or 3
>> identical instances of Solr and configure the application that
>> transfers data to Solr to send it to all the instances simultaneously
>> (the approach would be an incremental DIH for some cores and an external
>> application that pushes data continuously for other cores)? What would
>> be the drawbacks of this approach?
> When I first set up Solr, I used replication.  Then version 3.1.0 was
> released, including a non-backward-compatible upgrade to javabin, and it
> was not possible to replicate between 1.x and 3.x.
>
> This incompatibility meant that it would not be possible to do a
> gradual upgrade to 3.x, where the slaves are upgraded first and then the
> master.
>
> To get around the problem, I basically did exactly what you've described.
> I turned off replication and configured a second copy of my build 
> program to update what used to be slave servers.
>
> Later, when I moved to a SolrJ program for index maintenance, I made 
> one copy of the maintenance program capable of updating multiple 
> copies of the index in parallel.
>
> I have stuck with this architecture through 4.x and moving into 5.x, 
> even though I could go back to replication or switch to SolrCloud.
> Having completely independent indexes allows a great deal of 
> flexibility with upgrades and testing new configurations, flexibility 
> that isn't available with SolrCloud or master-slave replication.
>
> Thanks,
> Shawn
>

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
