Re: Production deployment requirements for memory backend storage

2017-04-12 Thread nrjpoddar
Thanks, Charlie, for the detailed reply. It clarifies a lot of things for me.
A few more follow-up questions:

1) Based on your reply about "platform_data_dir", the size of the directory is
bounded for a stable cluster (i.e. few ring ownership changes, and no bucket
types/buckets being created with custom properties). A newly created node
joining the cluster obtains all relevant cluster/ring metadata from its
peers and persists it in this directory. Is my understanding correct?

2) Is there any documentation on memory overhead for the memory storage
backend? I found overhead documentation for the bitcask backend but none for
memory. I'm looking for the overhead added by Riak per key/value pair. I'm
guessing that the frequency of updates, which might affect vector clock sizes,
influences this number, but average and worst-case overhead numbers would be
very useful. In my scenario I will be using bucket types with allow_mult set
to false and last_write_wins set to false, to disable sibling creation but
still use vector clocks for resolving conflicts.

Thanks again!



--
View this message in context: 
http://riak-users.197444.n3.nabble.com/Production-deployment-requirements-for-memory-backend-storage-tp4035073p4035080.html
Sent from the Riak Users mailing list archive at Nabble.com.

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: Production deployment requirements for memory backend storage

2017-04-12 Thread Charlie Voiselle
Neeraj:

Thanks for your interest in Riak. I will copy your questions into this email for
reference and answer them inline.

1. What is the “platform_data_dir” used for when memory is used as storage 
backend? Is it only needed for active anti-entropy and cluster metadata? Do I 
need to persist this data i.e. if a node goes down and restarts in this 
configuration, is persistence of data in “platform_data_dir” required?
As you have pointed out, the platform_data_dir contains more than just the 
actual data stored in the cluster. There are three folders that must be 
persisted for a node to remain a member of a cluster and to not create issues 
with the sizes of the vector clocks internal to the objects. They are:

ring - The binary files that describe the cluster and the vnode ownership 
mappings. Deleting this folder will cause the node to start up and create a new 
default ring. This default ring will allocate 100% of the partitions to that 
node. This is non-fatal and is resolved by rejoining the node to the cluster. 
This extra work can be avoided by persisting the ring file properly.

cluster_meta - This folder contains the properties for bucket types and typed 
custom buckets.

kv_vnode - This folder contains generated actor-ids for each Riak vnode. The 
routine loss of this directory will cause orphaned vnode actor-ids to 
potentially accumulate in objects’ vclocks.
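
As a rough illustration of the three folders listed above, the following Python sketch reports which of them are absent under a given platform_data_dir. The folder names come from the list; the function name and the /var/lib/riak path are assumptions made for the example, so substitute the platform_data_dir from your own configuration.

```python
import os

# Folder names taken from the list above.
REQUIRED_DIRS = ("ring", "cluster_meta", "kv_vnode")


def missing_state_dirs(platform_data_dir):
    """Return the required folders that are absent under platform_data_dir."""
    return [
        name
        for name in REQUIRED_DIRS
        if not os.path.isdir(os.path.join(platform_data_dir, name))
    ]


if __name__ == "__main__":
    # /var/lib/riak is an assumed location, not a value from this thread.
    missing = missing_state_dirs("/var/lib/riak")
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all node state folders present")
```

A check like this could run after a node restart, before the node is allowed back into service, to catch a data directory that was not persisted.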

Active anti-entropy is a process to prevent bit-rot in long-lived data. Since
your questions concern ephemeral data, we would recommend that it be
disabled, because creating and maintaining the trees carries overhead that
makes no sense for ephemeral data.
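
One way to confirm that a node is configured as described is to scan its riak.conf for the relevant settings. The Python sketch below assumes the Riak KV 2.x setting names storage_backend and anti_entropy, with "memory" and "passive" as the values for this scenario; please verify both against the documentation for your version. The helper names are invented for this example.

```python
def parse_conf(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Assumed setting names and values for an ephemeral, memory-only node.
EXPECTED = {"storage_backend": "memory", "anti_entropy": "passive"}


def check_ephemeral_settings(text):
    """Return {key: (expected, actual)} for every setting that differs."""
    settings = parse_conf(text)
    return {
        key: (want, settings.get(key))
        for key, want in EXPECTED.items()
        if settings.get(key) != want
    }


if __name__ == "__main__":
    # Illustrative fragment only, not a complete riak.conf.
    sample = """
    ## storage and anti-entropy settings
    storage_backend = memory
    anti_entropy = active
    """
    print(check_ephemeral_settings(sample))
```

An empty result means both settings match; any entry in the result names a setting to review.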

2. What is the minimum memory requirement of an empty Riak node in this 
configuration?
On a sample node that I brought up (an empty Riak KV 2.2.3 node), the beam.smp
process was using 1.5 GB of RAM with an empty memory backend and AAE disabled.

3. What is the minimum disk and CPU requirement of a Riak node in this 
configuration?
There are a few variables that dictate how much actual disk throughput you will
use in a Riak cluster that only uses the memory backend: logging overhead, ring
changes, and cluster metadata changes.

Logging throughput is determined by the general health of the cluster and is 
minimal in clusters that are well-behaved. The logfiles themselves have 
configurable size caps and set numbers of rotations (by default 5 logs capped 
at 50 MB for each file). There are some other logfiles that are not managed by
lager and they can grow beyond these expected limits. If you are building nodes 
optimized for storage, you will want to monitor the size of this folder and 
trim it as appropriate.
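
To make the monitoring suggestion concrete, here is a minimal Python sketch that measures a log folder and compares it with the space the rotated logs alone should need. It assumes the default of 5 files at 50 MB applies to each managed log, treats 1 MB as 1024 * 1024 bytes, and uses a hypothetical /var/log/riak path and a hypothetical count of three managed logs; none of those values come from this thread.

```python
import os


def dir_size_bytes(path):
    """Sum the sizes of all regular files under path, skipping symlinks."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full_path = os.path.join(root, name)
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total


def rotated_log_budget_bytes(managed_logs, rotations=5, cap_mb=50):
    """Upper bound for rotated logs: logs * files per log * cap per file."""
    return managed_logs * rotations * cap_mb * 1024 * 1024


if __name__ == "__main__":
    log_dir = "/var/log/riak"  # assumed location
    budget = rotated_log_budget_bytes(managed_logs=3)  # assumed count
    used = dir_size_bytes(log_dir)
    print(f"used {used} bytes, rotated-log budget {budget} bytes")
    if used > budget:
        print("log folder exceeds the rotated-log budget; check unmanaged logs")
```

A folder that grows past the budget points at the logfiles that are not subject to rotation, which are the ones that need manual trimming.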

The ring is a data structure that is used to hold information about the 
cluster’s membership, the node capabilities, MDC replication configuration, and 
the legacy custom bucket metadata. In stable clusters that are using no custom 
buckets the impact of writes to the ring is negligible; however there are 
certain antipatterns involving the creation of a large number of buckets with 
custom properties in the “default” bucket type that will bloat the ring file 
and result in a large amount of ring gossip.

Finally, Riak bucket types and their properties, as well as the custom bucket
properties of typed buckets, are stored in cluster metadata. This backend is a
dets-based store that uses hashtree comparisons to maintain consistency across 
members of the cluster. This backend’s storage also depends on the amount and 
speed with which you create metadata within your cluster.

There is more generically-applicable information about [cluster capacity
planning] in the Riak KV documentation.

Thanks again for your interest,

Charlie Voiselle
Sr. Product Manager, Riak KV/Clients
Basho Technologies
@angrycub

[cluster capacity planning] - 
http://docs.basho.com/riak/kv/2.2.3/setup/planning/cluster-capacity/ 



> On Apr 10, 2017, at 3:30 PM, Neeraj Poddar  wrote:
> 
> Hello,
>  
> I wanted to understand the production requirements for using Riak as a 
> non-persistent ephemeral data store. In particular the following questions 
> relate to using Riak with “memory” configured as storage backend:
>  
> 1.   What is the “platform_data_dir” used for when memory is used as 
> storage backend? Is it only needed for active anti-entropy and cluster 
> metadata? Do I need to persist this data i.e. if a node goes down and 
> restarts in this configuration, is persistence of data in “platform_data_dir” 
> required?
> 2.   What is the minimum memory requirement of an empty Riak node in this 
> configuration?
> 3.   What is the minimum disk and CPU requirement of a Riak node in this 
> configuration?
>  
> -- 
> Regards,
> Neeraj Poddar
>  

Re: Production deployment requirements for memory backend storage

2017-04-12 Thread nrjpoddar
Hi, it would be very helpful if someone could provide input on the questions
posted above. They relate to Riak KV 2.2.0 and above in particular.



--
View this message in context: 
http://riak-users.197444.n3.nabble.com/Production-deployment-requirements-for-memory-backend-storage-tp4035073p4035077.html
Sent from the Riak Users mailing list archive at Nabble.com.



Production deployment requirements for memory backend storage

2017-04-10 Thread Neeraj Poddar
Hello,

I wanted to understand the production requirements for using Riak as a 
non-persistent ephemeral data store. In particular the following questions 
relate to using Riak with “memory” configured as storage backend:


1.   What is the “platform_data_dir” used for when memory is used as 
storage backend? Is it only needed for active anti-entropy and cluster 
metadata? Do I need to persist this data i.e. if a node goes down and restarts 
in this configuration, is persistence of data in “platform_data_dir” required?

2.   What is the minimum memory requirement of an empty Riak node in this 
configuration?

3.   What is the minimum disk and CPU requirement of a Riak node in this 
configuration?


--
Regards,
Neeraj Poddar
