Hi Alexander and Andrew, Thanks for the follow-up!
Although I would expect to have used `riak-admin cluster leave`, it’s been months at this point and I can’t be sure. Perhaps I did something weird when I was getting started… Given the uncertain state of the system, it may make sense for me to migrate everything to a fresh cluster, unless a simple solution exists. It’s small enough that this would be practical, albeit inconvenient.

Your timing in following up is interesting: I just today attempted to `riak-admin cluster leave` a node (104.131.130.237) and it’s still in state “leaving” with 0.0% of ring and the logs filling up with messages like:

2015-04-18 02:45:30.927 [warning] <0.9069.0>@riak_kv_ensemble_backend:handle_down:173 Vnode for Idx: 548063113999088594326381812268606132370974703616 crashed with reason: normal.

Output of `riak-admin member-status`:

================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
leaving     0.0%      --      '[email protected]'
valid      34.4%      --      '[email protected]'
valid      32.8%      --      '[email protected]'
valid      32.8%      --      '[email protected]'
-------------------------------------------------------------------------------
Valid:3 / Leaving:1 / Exiting:0 / Joining:0 / Down:0

Output of `riak-admin ring-status`:

================================== Claimant ===================================
Claimant:  '[email protected]'
Status:     up
Ring Ready: true

============================== Ownership Handoff ==============================
No pending changes.

============================== Unreachable Nodes ==============================
All nodes are up and reachable

With regard to staging being spread out across NA, my thinking was that staging under extreme conditions would serve as a canary as well as help me familiarize myself with the performance characteristics of Riak.
However it ended up working perfectly (including strong consistency), so I never ended up moving the servers to be in the same geographical area. I'd be reluctant to put everything in one LAN when the key requirement that led us to pick Riak was high availability, and network issues at a single datacenter seem to be our most frequent mode of failure. I benchmarked under various network configurations and all seemed to work flawlessly and with acceptable performance. Do you think this is reasonable?

Thanks again!

Jonathan Koff B.CS.
co-founder of Projexity
www.projexity.com <http://www.projexity.com/>

follow us on facebook at: www.facebook.com/projexity <http://www.facebook.com/projexity>
follow us on twitter at: twitter.com/projexity <http://twitter.com/projexity>

> On Apr 17, 2015, at 7:49 PM, Alexander Sicular <[email protected]> wrote:
>
> Hi Jonathan,
>
> "staging (3 servers across NA)"
>
> If this means you're spreading your cluster across North America I would
> suggest you reconsider. A Riak cluster is meant to be deployed in one data
> center, more specifically in one LAN. Connecting Riak nodes over a WAN
> introduces network latencies. Riak's approach to multi datacenter replication
> is as a cluster of clusters. That said, I don't believe strong consistency is
> supported yet in an mdc environment.
>
> -Alexander
>
> @siculars
> http://siculars.posthaven.com <http://siculars.posthaven.com/>
>
> Sent from my iRotaryPhone
>
> On Apr 17, 2015, at 16:19, Andrew Stone <[email protected]
> <mailto:[email protected]>> wrote:
>
>> Hi Jonathan,
>>
>> Sorry for the late reply. It looks like riak_ensemble still thinks that
>> those old nodes are part of the cluster. Did you remove them with
>> 'riak-admin cluster leave' ? If so they should have been removed from the
>> root ensemble also, and the machines shouldn't have actually left the
>> cluster until all the ensembles were reconfigured via joint consensus.
>> Can you paste the results from the following commands:
>>
>> riak-admin member-status
>> riak-admin ring-status
>>
>> Thanks,
>> Andrew
>>
>>
>> On Mon, Mar 23, 2015 at 11:25 AM, Jonathan Koff <[email protected]
>> <mailto:[email protected]>> wrote:
>> Hi all,
>>
>> I recently used Riak’s Strong Consistency functionality to get
>> auto-incrementing IDs for a feature of an application I’m working on, and
>> although this worked great in dev (5 nodes in 1 VM) and staging (3 servers
>> across NA) environments, I’ve run into some odd behaviour in production
>> (originally 3 servers, now 4) that prevents it from working.
>>
>> I initially noticed that consistent requests were immediately failing as
>> timeouts, and upon checking `riak-admin ensemble-status` saw that many
>> ensembles were at 0 / 3, from the vantage point of the box I was SSH’d into.
>> Interestingly, SSH-ing into different boxes showed different results. Here’s
>> a brief snippet of what I see now, after adding a fourth server in a
>> troubleshooting attempt:
>>
>> *Machine 1* (104.131.39.61)
>>
>> ============================== Consensus System ===============================
>> Enabled:     true
>> Active:      true
>> Ring Ready:  true
>> Validation:  strong (trusted majority required)
>> Metadata:    best-effort replication (asynchronous)
>>
>> ================================== Ensembles ==================================
>> Ensemble     Quorum        Nodes      Leader
>> -------------------------------------------------------------------------------
>>   root       0 / 6         3 / 6      --
>>    2         0 / 3         3 / 3      --
>>    3         3 / 3         3 / 3      [email protected]
>>    4         3 / 3         3 / 3      [email protected]
>>    5         3 / 3         3 / 3      [email protected]
>>    6         0 / 3         3 / 3      --
>>    7         0 / 3         3 / 3      --
>>    8         0 / 3         3 / 3      --
>>    9         3 / 3         3 / 3      [email protected]
>>    10        3 / 3         3 / 3      [email protected]
>>    11        0 / 3         3 / 3      --
>>
>> *Machine 2* (104.236.79.78)
>>
>> ============================== Consensus System ===============================
>> Enabled:     true
>> Active:      true
>> Ring Ready:  true
>> Validation:  strong (trusted majority required)
>> Metadata:    best-effort replication (asynchronous)
>>
>> ================================== Ensembles ==================================
>> Ensemble     Quorum        Nodes      Leader
>> -------------------------------------------------------------------------------
>>   root       0 / 6         3 / 6      --
>>    2         3 / 3         3 / 3      [email protected]
>>    3         3 / 3         3 / 3      [email protected]
>>    4         3 / 3         3 / 3      [email protected]
>>    5         3 / 3         3 / 3      [email protected]
>>    6         3 / 3         3 / 3      [email protected]
>>    7         0 / 3         3 / 3      --
>>    8         0 / 3         3 / 3      --
>>    9         3 / 3         3 / 3      [email protected]
>>    10        3 / 3         3 / 3      [email protected]
>>    11        3 / 3         3 / 3      [email protected]
>>
>> *Machine 3* (104.131.130.237)
>>
>> ============================== Consensus System ===============================
>> Enabled:     true
>> Active:      true
>> Ring Ready:  true
>> Validation:  strong (trusted majority required)
>> Metadata:    best-effort replication (asynchronous)
>>
>> ================================== Ensembles ==================================
>> Ensemble     Quorum        Nodes      Leader
>> -------------------------------------------------------------------------------
>>   root       0 / 6         3 / 6      --
>>    2         0 / 3         3 / 3      --
>>    3         3 / 3         3 / 3      [email protected]
>>    4         3 / 3         3 / 3      [email protected]
>>    5         3 / 3         3 / 3      [email protected]
>>    6         0 / 3         3 / 3      --
>>    7         0 / 3         3 / 3      --
>>    8         0 / 3         3 / 3      --
>>    9         3 / 3         3 / 3      [email protected]
>>    10        3 / 3         3 / 3      [email protected]
>>    11        0 / 3         3 / 3      --
>>
>> *Machine 4* (162.243.5.87)
>>
>> ============================== Consensus System ===============================
>> Enabled:     true
>> Active:      true
>> Ring Ready:  true
>> Validation:  strong (trusted majority required)
>> Metadata:    best-effort replication (asynchronous)
>>
>> ================================== Ensembles ==================================
>> Ensemble     Quorum        Nodes      Leader
>> -------------------------------------------------------------------------------
>>   root       0 / 6         3 / 6      --
>>    2         3 / 3         3 / 3      [email protected]
>>    3         3 / 3         3 / 3      [email protected]
>>    4         3 / 3         3 / 3      [email protected]
>>    5         3 / 3         3 / 3      [email protected]
>>    6         3 / 3         3 / 3      [email protected]
>>    7         3 / 3         3 / 3      [email protected]
>>    8         3 / 3         3 / 3      [email protected]
>>    9         3 / 3         3 / 3      [email protected]
>>    10        3 / 3         3 / 3      [email protected]
>>    11        3 / 3         3 / 3      [email protected]
>>
>>
>> Interestingly, Machine 4 has full quora for all ensembles except for root,
>> while Machine 3 only sees itself as a leader.
>>
>> Another interesting point is the output of `riak-admin ensemble-status root`:
>>
>> ================================= Ensemble #1 =================================
>> Id:           root
>> Leader:       --
>> Leader ready: false
>>
>> ==================================== Peers ====================================
>> Peer  Status     Trusted          Epoch         Node
>> -------------------------------------------------------------------------------
>>  1    (offline)    --              --           [email protected]
>>  2    probe        no              8            [email protected]
>>  3    (offline)    --              --           [email protected]
>>  4    (offline)    --              --           [email protected]
>>  5    probe        no              8            [email protected]
>>  6    probe        no              8            [email protected]
>>
>> This is consistent across all 4 machines, and seems to include some old IPs
>> from machines that left the cluster quite a while back, almost definitely
>> before I’d used Riak's Strong Consistency. Note that the reason I added the
>> fourth machine (104.131.39.61) was to see if this output would change,
>> perhaps resulting in a quorum for the root ensemble.
>>
>> For reference, here’s the status of a sample ensemble that isn’t “Leader
>> ready”, from the perspective of Machine 2:
>> ================================ Ensemble #62 =================================
>> Id:           {kv,1370157784997721485815954530671515330927436759040,3}
>> Leader:       --
>> Leader ready: false
>>
>> ==================================== Peers ====================================
>> Peer  Status     Trusted          Epoch         Node
>> -------------------------------------------------------------------------------
>>  1    following    yes             43           [email protected]
>>  2    following    yes             43           [email protected]
>>  3    leading      yes             43           [email protected]
>>
>>
>> My config consists of riak.conf with:
>>
>> strong_consistency = on
>>
>> and advanced.config with:
>>
>> [
>>  {riak_core,
>>   [
>>    {target_n_val, 5}
>>   ]},
>>  {riak_ensemble,
>>   [
>>    {ensemble_tick, 5000}
>>   ]}
>> ].
>>
>> though I’ve experimented with the latter in an attempt to get this resolved.
>>
>> I didn’t see any relevant-looking log output on any of the servers.
>>
>> Has anyone come across this before?
>>
>> Thanks!
>>
>> Jonathan Koff B.CS.
>> co-founder of Projexity
>> www.projexity.com <http://www.projexity.com/>
>>
>> follow us on facebook at: www.facebook.com/projexity
>> <http://www.facebook.com/projexity>
>> follow us on twitter at: twitter.com/projexity <http://twitter.com/projexity>
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected] <mailto:[email protected]>
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> <http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com>
>>
>>
>> _______________________________________________
>> riak-users mailing list
>> [email protected] <mailto:[email protected]>
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>> <http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com>
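
A note on the root ensemble rows quoted above: every machine reports `root 0 / 6 3 / 6`, and the consensus system says a trusted majority is required. The sketch below is not part of Riak. `parse_row`, `majority` and `can_reach_majority` are hypothetical helpers, and it assumes the Nodes column counts online nodes out of the ensemble's members. Under that assumption, six peers need four for a majority, so three reachable nodes would not be enough to elect a root leader.

```python
import re

# Matches rows such as "root 0 / 6 3 / 6 --" from `riak-admin ensemble-status`.
# The column order (Ensemble, Quorum, Nodes, Leader) is taken from the output
# quoted in this thread; the parser itself is a hypothetical helper.
ROW = re.compile(
    r"^\s*(\S+)\s+(\d+)\s*/\s*(\d+)\s+(\d+)\s*/\s*(\d+)\s+(.+?)\s*$"
)


def majority(peers: int) -> int:
    """Smallest number of peers that forms a strict majority."""
    return peers // 2 + 1


def parse_row(line: str) -> dict:
    """Split one ensemble row into its four columns."""
    m = ROW.match(line)
    if m is None:
        raise ValueError(f"not an ensemble row: {line!r}")
    name, q_have, q_total, n_have, n_total, leader = m.groups()
    return {
        "ensemble": name,
        "quorum": (int(q_have), int(q_total)),
        "nodes": (int(n_have), int(n_total)),
        "leader": None if leader == "--" else leader,
    }


def can_reach_majority(row: dict) -> bool:
    """True if the online nodes could, in principle, form a majority.

    Assumption: the Nodes column is (online, total members).
    """
    online, total = row["nodes"]
    return online >= majority(total)


root = parse_row("root 0 / 6 3 / 6 --")
print(majority(6), can_reach_majority(root))
```

The same check passes for a row such as `2 0 / 3 3 / 3 --`, so this arithmetic alone does not explain the leaderless kv ensembles; it only covers the root ensemble rows.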
_______________________________________________
riak-users mailing list
[email protected]
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
