Re: ZooKeeper snapCount Tuning

2020-04-03 Thread Michael Han
Under the current ZK implementation, the workload is a more decisive factor
than the hardware when tuning zookeeper.snapCount and other config
parameters. I am afraid there is no universal value that is applicable to
every case, although we can provide recommended settings by benchmarking
predictable, typical workloads. In any case, a larger snap count leads to
less frequent snapshotting, which should improve system performance, but at
the cost of longer recovery time.
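
To make the trade-off concrete, here is a rough back-of-the-envelope model
in Java. It is only a sketch: it assumes, as a simplification, one snapshot
per snapCount transactions and a restart that replays at most snapCount
transactions from the log. The write rate and replay rate are made-up
numbers, and the class and method names are hypothetical, not ZooKeeper
code.

```java
public class SnapCountTradeoff {

    // Approximate snapshots per hour, ASSUMING one snapshot per snapCount transactions.
    public static double snapshotsPerHour(long snapCount, double writesPerSecond) {
        return writesPerSecond * 3600.0 / snapCount;
    }

    // Rough upper bound on log replay time at restart, ASSUMING at most
    // snapCount transactions have to be replayed after the last snapshot.
    public static double worstCaseReplaySeconds(long snapCount, double replayTxnsPerSecond) {
        return snapCount / replayTxnsPerSecond;
    }

    public static void main(String[] args) {
        double writesPerSecond = 1_000.0;      // hypothetical workload
        double replayTxnsPerSecond = 50_000.0; // hypothetical replay speed

        for (long snapCount : new long[] {100_000L, 1_000_000L, 10_000_000L}) {
            System.out.printf(
                "snapCount=%d snapshots/hour=%.2f worst-case replay=%.1f s%n",
                snapCount,
                snapshotsPerHour(snapCount, writesPerSecond),
                worstCaseReplaySeconds(snapCount, replayTxnsPerSecond));
        }
    }
}
```

Raising snapCount by 10x cuts the snapshot frequency by 10x and stretches
the replay bound by the same factor. Whether that is a good trade depends on
the workload, which is the point above.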

For new hardware, I think most of the time ZK just gets the benefit for
free. For preallocation, I agree and think that'll still be useful, as
that's a file system thing and works regardless of the underlying medium.
Getting optimal usage out of the new hardware would require more thought.
Here are some ideas borrowed from the database world that might be
applicable to ZK:
* Offloading snapshots to a dedicated hardware accelerator such as an FPGA.
* SyncRequestProcessor could flush transactions to NVRAM directly, without
buffering and group commit.
* A durable ZK data tree on NVRAM that does not require a WAL or snapshots.
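
For readers unfamiliar with the group commit mentioned in the second
bullet, the idea can be sketched as below. This is a toy model, not the
SyncRequestProcessor code: the class, its methods and the batch sizes are
all hypothetical, and a "flush" here is only a counter standing in for one
expensive sync to stable storage.

```java
import java.util.ArrayList;
import java.util.List;

public class GroupCommitSketch {
    private final List<String> pending = new ArrayList<>();
    private final int maxBatch;
    private int flushes = 0;

    public GroupCommitSketch(int maxBatch) {
        this.maxBatch = maxBatch;
    }

    // Buffer a transaction and flush only once the batch is full.
    public void append(String txn) {
        pending.add(txn);
        if (pending.size() >= maxBatch) {
            flush();
        }
    }

    // One flush stands in for one sync that covers every buffered transaction.
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        flushes++;
        pending.clear();
    }

    public int flushCount() {
        return flushes;
    }

    // Number of flushes needed to make `txns` transactions durable.
    public static int flushesFor(int txns, int maxBatch) {
        GroupCommitSketch log = new GroupCommitSketch(maxBatch);
        for (int i = 0; i < txns; i++) {
            log.append("txn-" + i);
        }
        log.flush(); // flush whatever is left in the buffer
        return log.flushCount();
    }

    public static void main(String[] args) {
        System.out.println("batch=1:   " + flushesFor(1000, 1) + " flushes");
        System.out.println("batch=100: " + flushesFor(1000, 100) + " flushes");
    }
}
```

A batch size of 1 corresponds to flushing every transaction on its own.
On slow media that is prohibitively expensive, which is why batching
exists. On byte-addressable persistent memory the per-flush cost may be low
enough that the buffering is no longer worth its added latency.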

I suspect not much is going on here because ZK, unlike databases, has never
seen heavy enough workloads (which is by design) to justify the investment.



On Fri, Apr 3, 2020 at 1:34 PM Ted Dunning wrote:

> On Fri, Apr 3, 2020 at 10:01 AM Patrick Hunt wrote:
>
> > ...
> > Makes sense. For eg. SSD characteristics are vastly diff from spinning
> > media.
>
>
> super true.
>
>
> > I suspect it would be worth looking into this in even more depth -
> > we pre-allocate certain files, perhaps that's no longer necessary, etc...
> >
>
> The preallocation still makes sense on most file systems since meta-data
> changes (i.e. changing file length) are much more expensive than data
> changes (overwriting previously allocated blocks).
>
> > Makes sense. If we do something it would be great to have a set of tests
> > that could be used/reused to explore the various types even beyond SSD
> > itself.
> >
>
> Indeed. Storage class memory, for example, could make for an amazing ZK
> implementation. So could use of the upcoming SSD devices that implement
> key-value stores.
>
>
> >
> > Regards,
> >
> > Patrick
> >
> >
> > > My hypothesis is: with a larger snapCount value, ZK can have higher
> > > throughput because it is spending less time creating snapshots.
> > >
> > > Thanks!
> > >
> >
>


Re: ZooKeeper snapCount Tuning

2020-04-03 Thread Ted Dunning
On Fri, Apr 3, 2020 at 10:01 AM Patrick Hunt wrote:

> ...
> Makes sense. For eg. SSD characteristics are vastly diff from spinning
> media.


super true.


> I suspect it would be worth looking into this in even more depth -
> we pre-allocate certain files, perhaps that's no longer necessary, etc...
>

The preallocation still makes sense on most file systems since meta-data
changes (i.e. changing file length) are much more expensive than data
changes (overwriting previously allocated blocks).
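
As an illustration of the preallocation idea, here is a minimal Java
sketch. It is not ZooKeeper's own padding code: the class name, the 64 KiB
chunk and the 4 KiB margin are hypothetical values chosen for the example.
The file is grown a whole chunk at a time, so that ordinary appends land
inside the existing file length instead of changing it on every write.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PreallocSketch {

    // File size needed so that at least `margin` bytes exist past `position`,
    // growing only in whole chunks.
    public static long paddedSize(long position, long fileSize, long chunk, long margin) {
        if (chunk <= 0) {
            throw new IllegalArgumentException("chunk must be positive");
        }
        long size = fileSize;
        while (position + margin > size) {
            size += chunk;
        }
        return size;
    }

    // Extend the file to the padded size by writing one zero byte at its last offset.
    public static void preallocate(FileChannel channel, long chunk, long margin)
            throws IOException {
        long target = paddedSize(channel.position(), channel.size(), chunk, margin);
        if (target > channel.size()) {
            // Positional write: grows the file without moving the channel position.
            channel.write(ByteBuffer.wrap(new byte[] {0}), target - 1);
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("txnlog", ".bin");
        try (FileChannel ch = FileChannel.open(log, StandardOpenOption.WRITE)) {
            preallocate(ch, 64L * 1024, 4096);
            System.out.println("size after preallocation: " + ch.size());
            ch.write(ByteBuffer.wrap("txn-1".getBytes(StandardCharsets.UTF_8)));
            System.out.println("size after first append:  " + ch.size());
        } finally {
            Files.deleteIfExists(log);
        }
    }
}
```

One caveat: on file systems that support sparse files, extending the length
this way may not actually allocate the blocks. In that case it saves the
length updates, but not necessarily the block allocation.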

> Makes sense. If we do something it would be great to have a set of tests
> that could be used/reused to explore the various types even beyond SSD
> itself.
>

Indeed. Storage class memory, for example, could make for an amazing ZK
implementation. So could use of the upcoming SSD devices that implement
key-value stores.


>
> Regards,
>
> Patrick
>
>
> > My hypothesis is: with a larger snapCount value, ZK can have higher
> > throughput because it is spending less time creating snapshots.
> >
> > Thanks!
> >
>


Re: ZooKeeper snapCount Tuning

2020-04-03 Thread Patrick Hunt
On Fri, Apr 3, 2020 at 6:19 AM David Mollitor wrote:

> Hello Community,
>
> The configuration zookeeper.snapCount defaults to a value of 100,000 and
> has been at this default for 11 years now.
>
>
> https://github.com/apache/zookeeper/blob/e87bad6774e7269ef21a156aff9dad089ef54794/zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java#L1149
>
> Based on the last ZK meetup, I know there has been some recent attempts to
> re-run the baseline performance benchmarks.
>
> The current value may be a "safe" value.  However, I think we can all agree
> that hardware has improved quite a bit in the past 11 years.  Does anyone
> have any experience tweaking and testing this number on a production
> system?  Are there any recommendations out there for how to set this value?
>
>
Makes sense. For example, SSD characteristics are vastly different from
spinning media. I suspect it would be worth looking into this in even more
depth: we pre-allocate certain files, perhaps that's no longer necessary,
etc. If we do something, it would be great to have a set of tests that
could be used/reused to explore the various storage types, even beyond SSD
itself.

Regards,

Patrick


> My hypothesis is: with a larger snapCount value, ZK can have higher
> throughput because it is spending less time creating snapshots.
>
> Thanks!
>


ZooKeeper snapCount Tuning

2020-04-03 Thread David Mollitor
Hello Community,

The configuration zookeeper.snapCount defaults to a value of 100,000 and
has been at this default for 11 years now.

https://github.com/apache/zookeeper/blob/e87bad6774e7269ef21a156aff9dad089ef54794/zookeeper-server/src/main/java/org/apache/zookeeper/server/ZooKeeperServer.java#L1149

Based on the last ZK meetup, I know there have been some recent attempts to
re-run the baseline performance benchmarks.

The current value may be a "safe" value.  However, I think we can all agree
that hardware has improved quite a bit in the past 11 years.  Does anyone
have any experience tweaking and testing this number on a production
system?  Are there any recommendations out there for how to set this value?

My hypothesis is: with a larger snapCount value, ZK can have higher
throughput because it is spending less time creating snapshots.

Thanks!