Re: Issues with partition distribution across nodes

2017-05-24 Thread Russell Brown

On 24 May 2017, at 15:44, Denis wrote:

> [...]
> We can try to flush the cluster data once again and then add the nodes one
> by one (committing the cluster change each time), waiting for the partition
> transfers to finish each time.

Or add two more nodes in one go; that might be quicker, and, if time is money, 
cheaper.
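
Something like this, staged as one plan (just a sketch; riak@riak01.example is 
a placeholder for one of your real node names, and riak07/riak08 for the new 
ones):

  # On each of the two new nodes (riak07, riak08), stage a join to the cluster:
  riak-admin cluster join riak@riak01.example
  # Then, from any node, review and commit one combined plan:
  riak-admin cluster plan
  riak-admin cluster commit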



Re: Issues with partition distribution across nodes

2017-05-24 Thread Denis
Hi Russell

Thank you for your suggestions. I found that diag says: "The following
preflists do not satisfy the n_val. Please add more nodes". It seems that a
ring size of 128 is hard to arrange evenly across 6 nodes.
The history of our cluster is a long story, because we are testing it in our
lab. Initially it was deployed with 5 nodes without issues. Then it was
expanded to 6 nodes, again without issues. After some time all the storage
space on the whole cluster was fully utilized and we had to remove all data
from the leveldb dir, flush the ring dir and rebuild the cluster. This was
done by adding all 6 nodes at one time, so that may be the cause. We can try
to flush the cluster data once again and then add the nodes one by one
(committing the cluster change each time), waiting for the partition
transfers to finish each time.
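
That is, for each new node, something like the following sketch
(riak@riak01.example stands in for one of our real node names):

  # On the joining node: stage a join towards a node already in the cluster.
  riak-admin cluster join riak@riak01.example
  # From any cluster node: review, then commit the staged change.
  riak-admin cluster plan
  riak-admin cluster commit
  # Poll until all ownership handoff has finished before adding the next node.
  riak-admin transfers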



Re: Issues with partition distribution across nodes

2017-05-24 Thread Russell Brown
Hi,

This is just a quick reply, since this is something of a current topic on the ML.

On 24 May 2017, at 12:57, Denis Gudtsov wrote:

> Hello
> 
> We have a 6-node cluster configured with a ring size of 128. The problem is
> that two partitions have replicas on only two nodes rather than three as
> required (n_val=3). We have tried several times to clean the leveldb and
> ring directories and then rebuild the cluster, but the issue is still present. 

There was a fairly long discussion about this very issue recently (see 
http://lists.basho.com/pipermail/riak-users_lists.basho.com/2017-May/019281.html)

I ran a little code and the following {RingSize, NodeCount, IsViolated} tuples 
were the result. If you built any of these clusters from scratch (i.e. you 
started NodeCount nodes and used riak-admin cluster join, riak-admin cluster 
plan, riak-admin cluster commit to create a cluster of NodeCount nodes in one 
go), then you have tail violations in your ring.

[{16,3,true},
 {16,5,true},
 {16,7,true},
 {16,13,true},
 {16,14,true},
 {32,3,true},
 {32,5,true},
 {32,6,true},
 {32,10,true},
 {64,3,true},
 {64,7,true},
 {64,9,true},
 {128,3,true},
 {128,5,true},
 {128,6,true},
 {128,7,true},
 {128,9,true},
 {128,14,true},
 {256,3,true},
 {256,5,true},
 {256,11,true},
 {512,3,true},
 {512,5,true},
 {512,6,true},
 {512,7,true},
 {512,10,true}]
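
For illustration only, here is a toy model of the failure mode. It is my
sketch, not the actual claim code (which is cleverer than round-robin), so
don't expect it to reproduce the table above exactly:

  %% tail_check.erl -- a toy model of preflist tail violations.
  %% NOT the riak_core claim algorithm: it assumes partitions are handed out
  %% round-robin, which is enough to show why the wrap-around at the tail of
  %% the ring can leave a window with fewer distinct owners than n_val.
  -module(tail_check).
  -export([check/3]).

  %% check(RingSize, NodeCount, TargetN) -> true if some window of TargetN
  %% consecutive partitions (wrapping at the tail) has fewer than TargetN
  %% distinct owners.
  check(Q, M, TargetN) ->
      Owners = [(I rem M) + 1 || I <- lists:seq(0, Q - 1)],
      %% Append the first TargetN-1 owners so windows wrap past the tail.
      Wrapped = Owners ++ lists:sublist(Owners, TargetN - 1),
      Windows = [lists:sublist(Wrapped, I, TargetN) || I <- lists:seq(1, Q)],
      lists:any(fun(W) -> length(lists:usort(W)) < TargetN end, Windows).

For example, tail_check:check(128, 6, 3) returns true (the windows that wrap
the tail repeat an owner), while tail_check:check(128, 4, 3) returns false.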


> How can we diagnose where the issue is and fix it?

WRT your problem, a quick experiment suggests that adding 2 new nodes will 
solve it; just adding one doesn't look like it does. I tried adding just one 
new node and still had a single violated preflist, but I threw this little 
experiment together quickly, so I could well be wrong. It doesn't actually 
build any clusters, and it uses the claim code out of context, so YMMV.

> Is there any way we
> can assign a partition to a node manually? 

I don’t know of a way, but that would be very useful.

Do you remember if this cluster was built all at once as a 6-node cluster, or 
has it grown over time? Have you run the command riak-admin diag ring_preflists 
as documented here 
http://docs.basho.com/riak/kv/2.2.3/setup/upgrading/checklist/#confirming-configuration-with-riaknostic?
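
That is:

  riak-admin diag ring_preflists

which, if I remember right (worth double-checking), warns "The following
preflists do not satisfy the n_val. Please add more nodes" when any preflist
falls short.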

Sorry I can’t be more help

Cheers

Russell

> 
> Please find the output of member-status below and a screenshot of the Riak
> Control ring status:
> [root@riak01 ~]# riak-admin member-status
> ================================ Membership =================================
> Status     Ring    Pending    Node
> -----------------------------------------------------------------------------
> valid      17.2%      --      'riak@riak01.
> valid      17.2%      --      'riak@riak02.
> valid      16.4%      --      'riak@riak03.
> valid      16.4%      --      'riak@riak04.
> valid      16.4%      --      'riak@riak05.
> valid      16.4%      --      'riak@riak06.
> -----------------------------------------------------------------------------
> Valid:6 / Leaving:0 / Exiting:0 / Joining:0 / Down:0
> 
>  
> 
> Thank you.
> 
> 
> 
> 


___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com