Re: Getting rid of zookeeper

2020-06-10 Thread matthew sporleder
FWIW -- ZooKeeper is pretty set-and-forget in my experience, with
settings like autopurge.snapRetainCount and autopurge.purgeInterval, and
rotating the zookeeper.out stdout file.

It is a big hassle to set up the individual myid files and keep them in
sync with the server.$id=hostname entries in zoo.cfg but, again, it is a
one-time pain.
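
For reference, a minimal sketch of that pairing -- hostnames and paths
below are placeholders, not a recommendation:

    # zoo.cfg -- identical on every node in the ensemble
    dataDir=/var/lib/zookeeper
    clientPort=2181
    # keep the last 3 snapshots, purge every 24 hours
    autopurge.snapRetainCount=3
    autopurge.purgeInterval=24
    server.1=zk1.example.com:2888:3888
    server.2=zk2.example.com:2888:3888
    server.3=zk3.example.com:2888:3888

    # on each node, myid must match that node's server.N line; e.g. on zk1:
    echo 1 > /var/lib/zookeeper/myid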

I think smaller Solr deployments could benefit from some easier way to
configure the embedded ZooKeeper (like the improved zk upconfig and
friends), which might address this entire point?  The only reason I
don't run embedded ZK (I use three small EC2 instances) is that cpu/disk
contention on the same server has burned me in the past.
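
To be clear on what "embedded" buys you today: cloud mode starts the
embedded ZK automatically on the Solr port plus 1000, so the whole dance
is roughly (default ports shown):

    # first node -- embedded ZK comes up on 9983 (8983 + 1000)
    bin/solr start -c -p 8983

    # additional nodes point at the first node's embedded ZK
    bin/solr start -c -p 7574 -z localhost:9983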


Re: Getting rid of zookeeper

2020-06-10 Thread Jan Høydahl
Curator is just on the client (Solr) side, to make it easier to integrate with 
ZooKeeper, right?

If you study Elastic, they had terrible cluster stability a few years ago since 
everything
was too «dynamic» and «zero config». That led to the system outsmarting itself 
when facing
real-life network partitions and other failures. Solr did not have these issues 
exactly because
it relies on ZooKeeper, which is very static and hard to change (on purpose), 
and thus delivers
a strong, stable quorum. So what did Elastic do a couple of years ago? They 
adopted the same
best practice as ZK, recommending 3 or 5 (statically defined) master nodes that 
own the
cluster state.
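
To make that concrete, the statically defined master list in Elasticsearch 7+
looks roughly like this (node names here are made up):

    # elasticsearch.yml on each master-eligible node
    node.master: true
    cluster.initial_master_nodes:
      - master-1
      - master-2
      - master-3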

Solr could get rid of ZK the same way as Kafka. But while Kafka already has a
distributed log it could replace ZK with (hey, Kafka IS a log), Solr would 
need to add
such a log, and it would need to be embedded in the Solr process to avoid that 
extra runtime.
I believe it could be done with Apache Ratis 
(https://ratis.incubator.apache.org), which
is a Raft Java library. But I’m doubtful the project has the bandwidth and 
dedication right
now to embark on such a project. It would probably be a multi-year effort: 
first building
abstractions on top of ZK, then moving one piece of ZK dependency over to Raft 
at a time,
needing both systems in parallel, before ZK could finally go away.

I’d like to see it happen. Especially for smaller deployments it would be 
fantastic.

Jan


Re: Getting rid of zookeeper

2020-06-09 Thread Erick Erickson
The intermediate solution is to migrate to Curator. I don’t know all the ins 
and outs
of that, or whether it would be easier to set up and maintain.

I do know that ZooKeeper is deeply embedded in Solr, and replacing it with
most anything would be a major pain.

I’m also certain that rewriting ZooKeeper is a rat-hole that would take a major
effort. If anyone would like to try it, all patches welcome.

FWIW,
er...@curmudgeon.com


Re: Getting rid of zookeeper

2020-06-09 Thread Dave
Is it horrible that I’m already burnt out from just reading that?

I’m going to stick to the classic Solr master/slave setup for the foreseeable 
future; at least that lets me focus more on the search theory rather than the 
back-end system non-stop.


Re: Getting rid of zookeeper

2020-06-09 Thread Vincenzo D'Amore
My 2 cents: I have a few SolrCloud production installations, and I would like
to share some thoughts on what I have learned over the last 4-5 years (FWIW),
just as they come to mind.

- To configure a SolrCloud *production* cluster you have to be a ZooKeeper
expert, even if you only need Solr.
- The ZooKeeper ensemble (3 or 5 ZooKeeper nodes) is recommended to run on
separate machines, but for many customers this is too expensive. And for the
rest it is expensive just to have the instances (e.g. Docker containers). It is
expensive even to have people who know ZooKeeper, or even just to train them.
- Given the high-availability role of a ZooKeeper cluster, you have to
monitor it and be able to promptly back it up and restore it. But it is hard to
monitor (and to configure the monitoring), and it is even harder to back up and
restore while it is running.
- You can't add or remove nodes in ZooKeeper while it is up. Only the latest
version (3.5's dynamic reconfiguration) finally makes it possible to add/remove
nodes at runtime, but AFAIK this is still not supported by SolrCloud (out of
the box).
- Many people fail when they try to run a SolrCloud cluster because it is
hard to set up; for example, SolrCloud's zkcli runs poorly on Windows.
- It is hard to administer ZooKeeper remotely; there are basically no
utilities that let you easily list/read/write/delete files on a ZooKeeper
filesystem (see the commands sketched after this list).
- It was really hard to create a ZooKeeper ensemble in Kubernetes; only
recently have a few solutions appeared. This has been counter-productive for
the Solr project, because the world is moving to Kubernetes and there is
basically no support.
- Well, after all these troubles, when the SolrCloud clusters are
configured correctly then, well, they are (rock) solid. Even if a few
Solr nodes/replicas go down, the entire cluster can restore itself almost
automatically. But how much work it takes to get there.
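
On the remote-admin, reconfiguration and monitoring points, a few commands do
exist, they are just not friendly. A rough sketch, with placeholder hosts and
paths:

    # ZooKeeper's own shell for listing/reading/deleting znodes, e.g.:
    #   ls /collections; get /clusterstate.json; deleteall /configs/oldconf
    zkCli.sh -server zk1.example.com:2181

    # Solr 7+ bundles zk subcommands for the same chores
    bin/solr zk ls /configs -z zk1.example.com:2181
    bin/solr zk cp zk:/configs/myconf/schema.xml schema.xml -z zk1.example.com:2181
    bin/solr zk rm -r /configs/oldconf -z zk1.example.com:2181

    # ZooKeeper 3.5+ dynamic reconfiguration (from inside zkCli.sh)
    reconfig -add server.4=zk4.example.com:2888:3888:participant;2181

    # crude health checks via the four-letter words
    # (newer ZooKeeper versions require whitelisting these in zoo.cfg)
    echo ruok | nc zk1.example.com 2181   # answers "imok" if alive
    echo mntr | nc zk1.example.com 2181   # dumps basic metrics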

Believe me, I like Solr, but at the end of this long journey, sometimes I
would really rather use a PaaS/SaaS instead of having to deal with all these
troubles.


Re: Getting rid of zookeeper

2020-06-09 Thread Walter Underwood
ZooKeeper was created because fault-tolerant algorithms are extremely hard to 
test and get correct. Maybe the hardest thing in computing. Using a trusted 
implementation frees up lots of developer time.

To get an idea of the difficulty, read through the kinds of things fixed in the 
ZooKeeper release notes.

https://zookeeper.apache.org/releases.html

Elasticsearch does not have a good record on fault-tolerance. I haven’t checked 
recently, but it was losing updates during leader elections for several years’ 
worth of software releases.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


Re: Getting rid of zookeeper

2020-06-09 Thread David Hastings
ZooKeeper is annoying to both set up and manage, but then again the same
thing can be said about SolrCloud. Not certain why you would want to deal
with either.


Getting rid of zookeeper

2020-06-09 Thread S G
Hello,

I recently stumbled across KIP-500: Replace ZooKeeper with a Self-Managed
Metadata Quorum:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum
Elasticsearch does this too.
And so do many other systems.

Is there some work to go in this direction?
It would be nice to get rid of another totally disparate system.
Hardware savings would be nice to have too.

Best,
SG