Hi Jan,

When you update the Kubernetes nodes, do you have to do them all at once, or can they be done in a rolling fashion (one after another)?
If you can do them in a rolling fashion, you should be able to do the following for each node, one at a time:

1. Shut down Riak
2. Shut down/restart/upgrade the Kubernetes node
3. Start Riak
4. Use `riak-admin cluster force-replace` to rename the old node name to the new node name (a rough command sketch follows below the quoted message)
5. Repeat on the remaining nodes

This is covered in the "Rename Multi-Node Clusters" doc:
http://docs.basho.com/riak/kv/2.1.4/using/cluster-operations/changing-cluster-info/#rename-multi-node-clusters

As for your current predicament, have you created any new buckets or changed any bucket properties in the default namespace since you restarted? Or have you only done regular operations since?

Thanks,
Alex

On Mon, Jun 6, 2016 at 5:25 AM Jan-Philip Loos <maxda...@gmail.com> wrote:

> Hi,
>
> we are using Riak in a Kubernetes cluster (on GKE). Sometimes it's
> necessary to reboot the complete cluster to update the Kubernetes nodes.
> This results in a complete shutdown of the Riak cluster, and the Riak
> nodes are rescheduled with new IPs. So how can I handle this situation?
> How can I form a new Riak cluster out of the old nodes with new names?
>
> The /var/lib/riak directory is persisted. I had to delete the
> /var/lib/riak/ring folder, otherwise "riak start" crashed with this
> message (but I saved the old ring state in a tar):
>
>> {"Kernel pid terminated",application_controller,"{application_start_failure,riak_core,{{shutdown,{failed_to_start_child,riak_core_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['riak@10.44.2.8',[]],[{file,\"orddict.erl\"},{line,72}]},{riak_core_broadcast,init_peers,1,[{file,\"src/riak_core_broadcast.erl\"},{line,616}]},{riak_core_broadcast,start_link,0,[{file,\"src/riak_core_broadcast.erl\"},{line,116}]},{supervisor,do_start_child,2,[{file,\"supervisor.erl\"},{line,310}]},{supervisor,start_children,3,[{file,\"supervisor.erl\"},{line,293}]},{supervisor,init_children,2,[{file,\"supervisor.erl\"},{line,259}]},{gen_server,init_it,6,[{file,\"gen_server.erl\"},{line,304}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,239}]}]}}}},{riak_core_app,start,[normal,[]]}}}"}
>> Crash dump was written to: /var/log/riak/erl_crash.dump
>> Kernel pid terminated (application_controller) ({application_start_failure,riak_core,{{shutdown,{failed_to_start_child,riak_core_broadcast,{'EXIT',{function_clause,[{orddict,fetch,['riak@10.44.2.8',
>
> Then I formed a new cluster via join & plan & commit.
>
> But now I have discovered a problem with incomplete and inconsistent
> partitions:
>
> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
> 3064
>
> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
> 2987
>
> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
> 705
>
> $ curl -Ss "http://riak.default.svc.cluster.local:8098/buckets/users/keys?keys=true" | jq '.[] | length'
> 3064
>
> Is there a way to fix this? I guess this is caused by the missing old
> ring state?
>
> Greetings
>
> Jan
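For reference, here is a minimal shell sketch of steps 1-4 for a single node, assuming node names of the form riak@<pod-ip>; the OLD_NODE/NEW_NODE values below are hypothetical placeholders, and the full sequence (including marking the old name down and re-joining, if needed) is spelled out in the "Rename Multi-Node Clusters" doc linked above:

  # Sketch only -- the node names/IPs here are made-up placeholders.
  OLD_NODE='riak@10.44.2.8'     # name the node had before the Kubernetes restart
  NEW_NODE='riak@10.44.3.15'    # name the node came back with after rescheduling

  riak stop                     # 1. shut down Riak on this node
  # 2. shut down / upgrade / restart the Kubernetes node here
  riak start                    # 3. start Riak again under its new name

  # 4. rename the old node name to the new one; cluster changes are staged,
  #    so they only take effect after plan + commit
  riak-admin cluster force-replace "$OLD_NODE" "$NEW_NODE"
  riak-admin cluster plan
  riak-admin cluster commit

After each node, `riak-admin member-status` and `riak-admin transfers` are useful to confirm the rename took effect and handoff has settled before moving on to the next node.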
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com