Hi Alexander and Andrew,

Thanks for the follow-up!

Although I would expect to have used `riak-admin cluster leave`, it’s been 
months at this point and I can’t be sure. Perhaps I did something weird when I 
was getting started…

Given the uncertain state of the system, it may make sense for me to migrate 
everything to a fresh cluster, unless a simple solution exists. It’s small 
enough that this would be practical, albeit inconvenient.

Your timing in following up is interesting: just today I attempted to 
`riak-admin cluster leave` a node (104.131.130.237), and it’s still in state 
“leaving” with 0.0% of the ring, while the logs fill up with messages like:
2015-04-18 02:45:30.927 [warning] <0.9069.0>@riak_kv_ensemble_backend:handle_down:173 Vnode for Idx: 548063113999088594326381812268606132370974703616 crashed with reason: normal.

Output of `riak-admin member-status`:
================================= Membership ==================================
Status     Ring    Pending    Node
-------------------------------------------------------------------------------
leaving     0.0%      --      '[email protected]'
valid      34.4%      --      '[email protected]'
valid      32.8%      --      '[email protected]'
valid      32.8%      --      '[email protected]'
-------------------------------------------------------------------------------
Valid:3 / Leaving:1 / Exiting:0 / Joining:0 / Down:0

Output of `riak-admin ring-status`:
================================== Claimant ===================================
Claimant:  '[email protected]'
Status:     up
Ring Ready: true

============================== Ownership Handoff ==============================
No pending changes.

============================== Unreachable Nodes ==============================
All nodes are up and reachable
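
As a rough sanity check on the root ensemble from my original message (6 peers listed, 3 of them offline), here is a minimal Python sketch of the majority arithmetic. It assumes an ensemble needs a strict majority of its listed peers before it can elect a leader, which is how I read "trusted majority required" in the ensemble-status output. The function names are my own, purely for illustration, and are not part of Riak.

```python
def majority(peer_count: int) -> int:
    """Smallest number of peers that forms a strict majority."""
    return peer_count // 2 + 1


def has_quorum(peer_count: int, reachable: int) -> bool:
    """True if the reachable peers can form a strict majority."""
    return reachable >= majority(peer_count)


# Root ensemble as reported by `riak-admin ensemble-status root`:
# 6 peers listed, 3 of them offline.
print(majority(6))       # peers needed for a majority of 6
print(has_quorum(6, 3))  # can the 3 live peers reach it?

# Hypothetical: the same 3 live peers if the stale peers were gone.
print(has_quorum(3, 3))
```

If that assumption holds, the three stale peers alone would be enough to keep root from electing a leader (3 reachable against 4 required), which would match the 0 / 6 quorum every machine reports.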



With regard to staging being spread out across NA, my thinking was that staging 
under extreme conditions would serve as a canary and help me familiarize 
myself with the performance characteristics of Riak. However, it worked 
perfectly (including strong consistency), so I never moved the servers into 
the same geographical area.

I'd be reluctant to put everything in one LAN when the key requirement that 
led us to pick Riak was high availability, and network issues at a single 
datacenter seem to be our most frequent mode of failure. I benchmarked under 
various network configurations, and all seemed to work flawlessly and with 
acceptable performance. Do you think this is reasonable?


Thanks again!

Jonathan Koff B.CS.
co-founder of Projexity
www.projexity.com

follow us on facebook at: www.facebook.com/projexity
follow us on twitter at: twitter.com/projexity

> On Apr 17, 2015, at 7:49 PM, Alexander Sicular <[email protected]> wrote:
> 
> Hi Jonathan,
> 
> "staging (3 servers across NA)"
> 
> If this means you're spreading your cluster across North America I would 
> suggest you reconsider. A Riak cluster is meant to be deployed in one data 
> center, more specifically in one LAN. Connecting Riak nodes over a WAN 
> introduces network latencies. Riak's approach to multi datacenter replication 
> is as a cluster of clusters. That said, I don't believe strong consistency is 
> supported yet in an mdc environment. 
> 
> -Alexander 
> 
> @siculars
> http://siculars.posthaven.com <http://siculars.posthaven.com/>
> 
> Sent from my iRotaryPhone
> 
> On Apr 17, 2015, at 16:19, Andrew Stone <[email protected]> wrote:
> 
>> Hi Jonathan,
>>  
>> Sorry for the late reply. It looks like riak_ensemble still thinks that 
>> those old nodes are part of the cluster. Did you remove them with 
>> 'riak-admin cluster leave' ? If so they should have been removed from the 
>> root ensemble also, and the machines shouldn't have actually left the 
>> cluster until all the ensembles were reconfigured via joint consensus. Can 
>> you paste the results from the following commands:
>> 
>> riak-admin member-status
>> riak-admin ring-status
>> 
>> Thanks,
>> Andrew
>> 
>> 
>> On Mon, Mar 23, 2015 at 11:25 AM, Jonathan Koff <[email protected]> wrote:
>> Hi all,
>> 
>> I recently used Riak’s Strong Consistency functionality to get 
>> auto-incrementing IDs for a feature of an application I’m working on, and 
>> although this worked great in dev (5 nodes in 1 VM) and staging (3 servers 
>> across NA) environments, I’ve run into some odd behaviour in production 
>> (originally 3 servers, now 4) that prevents it from working.
>> 
>> I initially noticed that consistent requests were failing immediately with 
>> timeouts, and upon checking `riak-admin ensemble-status` saw that many 
>> ensembles were at 0 / 3, from the vantage point of the box I was SSH’d into. 
>> Interestingly, SSH-ing into different boxes showed different results. Here’s 
>> a brief snippet of what I see now, after adding a fourth server in a 
>> troubleshooting attempt:
>> 
>> *Machine 1* (104.131.39.61)
>> 
>> ============================== Consensus System ===============================
>> Enabled:     true
>> Active:      true
>> Ring Ready:  true
>> Validation:  strong (trusted majority required)
>> Metadata:    best-effort replication (asynchronous)
>> 
>> ================================== Ensembles ==================================
>>  Ensemble     Quorum        Nodes      Leader
>> -------------------------------------------------------------------------------
>>    root       0 / 6         3 / 6      --
>>     2         0 / 3         3 / 3      --
>>     3         3 / 3         3 / 3      [email protected]
>>     4         3 / 3         3 / 3      [email protected]
>>     5         3 / 3         3 / 3      [email protected]
>>     6         0 / 3         3 / 3      --
>>     7         0 / 3         3 / 3      --
>>     8         0 / 3         3 / 3      --
>>     9         3 / 3         3 / 3      [email protected]
>>     10        3 / 3         3 / 3      [email protected]
>>     11        0 / 3         3 / 3      --
>> 
>> *Machine 2* (104.236.79.78)
>> 
>> ============================== Consensus System ===============================
>> Enabled:     true
>> Active:      true
>> Ring Ready:  true
>> Validation:  strong (trusted majority required)
>> Metadata:    best-effort replication (asynchronous)
>> 
>> ================================== Ensembles ==================================
>>  Ensemble     Quorum        Nodes      Leader
>> -------------------------------------------------------------------------------
>>    root       0 / 6         3 / 6      --
>>     2         3 / 3         3 / 3      [email protected]
>>     3         3 / 3         3 / 3      [email protected]
>>     4         3 / 3         3 / 3      [email protected]
>>     5         3 / 3         3 / 3      [email protected]
>>     6         3 / 3         3 / 3      [email protected]
>>     7         0 / 3         3 / 3      --
>>     8         0 / 3         3 / 3      --
>>     9         3 / 3         3 / 3      [email protected]
>>     10        3 / 3         3 / 3      [email protected]
>>     11        3 / 3         3 / 3      [email protected]
>> 
>> *Machine 3* (104.131.130.237)
>> 
>> ============================== Consensus System ===============================
>> Enabled:     true
>> Active:      true
>> Ring Ready:  true
>> Validation:  strong (trusted majority required)
>> Metadata:    best-effort replication (asynchronous)
>> 
>> ================================== Ensembles ==================================
>>  Ensemble     Quorum        Nodes      Leader
>> -------------------------------------------------------------------------------
>>    root       0 / 6         3 / 6      --
>>     2         0 / 3         3 / 3      --
>>     3         3 / 3         3 / 3      [email protected]
>>     4         3 / 3         3 / 3      [email protected]
>>     5         3 / 3         3 / 3      [email protected]
>>     6         0 / 3         3 / 3      --
>>     7         0 / 3         3 / 3      --
>>     8         0 / 3         3 / 3      --
>>     9         3 / 3         3 / 3      [email protected]
>>     10        3 / 3         3 / 3      [email protected]
>>     11        0 / 3         3 / 3      --
>> 
>> *Machine 4* (162.243.5.87)
>> 
>> ============================== Consensus System ===============================
>> Enabled:     true
>> Active:      true
>> Ring Ready:  true
>> Validation:  strong (trusted majority required)
>> Metadata:    best-effort replication (asynchronous)
>> 
>> ================================== Ensembles ==================================
>>  Ensemble     Quorum        Nodes      Leader
>> -------------------------------------------------------------------------------
>>    root       0 / 6         3 / 6      --
>>     2         3 / 3         3 / 3      [email protected]
>>     3         3 / 3         3 / 3      [email protected]
>>     4         3 / 3         3 / 3      [email protected]
>>     5         3 / 3         3 / 3      [email protected]
>>     6         3 / 3         3 / 3      [email protected]
>>     7         3 / 3         3 / 3      [email protected]
>>     8         3 / 3         3 / 3      [email protected]
>>     9         3 / 3         3 / 3      [email protected]
>>     10        3 / 3         3 / 3      [email protected]
>>     11        3 / 3         3 / 3      [email protected]
>> 
>> 
>> Interestingly, Machine 4 has full quora for all ensembles except for root, 
>> while Machine 3 only sees itself as a leader.
>> 
>> Another interesting point is the output of `riak-admin ensemble-status root`:
>> 
>> ================================= Ensemble #1 =================================
>> Id:           root
>> Leader:       --
>> Leader ready: false
>> 
>> ==================================== Peers ====================================
>>  Peer  Status     Trusted          Epoch         Node
>> -------------------------------------------------------------------------------
>>   1    (offline)    --              --           [email protected]
>>   2      probe      no              8            [email protected]
>>   3    (offline)    --              --           [email protected]
>>   4    (offline)    --              --           [email protected]
>>   5      probe      no              8            [email protected]
>>   6      probe      no              8            [email protected]
>> 
>> This is consistent across all 4 machines, and seems to include some old IPs 
>> from machines that left the cluster quite a while back, almost definitely 
>> before I’d used Riak's Strong Consistency. Note that the reason I added the 
>> fourth machine (104.131.39.61) was to see if this output would change, 
>> perhaps resulting in a quorum for the root ensemble.
>> 
>> For reference, here’s the status of a sample ensemble that isn’t “Leader 
>> ready”, from the perspective of Machine 2:
>> ================================ Ensemble #62 =================================
>> Id:           {kv,1370157784997721485815954530671515330927436759040,3}
>> Leader:       --
>> Leader ready: false
>> 
>> ==================================== Peers ====================================
>>  Peer  Status     Trusted          Epoch         Node
>> -------------------------------------------------------------------------------
>>   1    following    yes             43           [email protected]
>>   2    following    yes             43           [email protected]
>>   3     leading     yes             43           [email protected]
>> 
>> 
>> My config consists of riak.conf with:
>> 
>> strong_consistency = on
>> 
>> and advanced.config with:
>> 
>> [
>>   {riak_core,
>>     [
>>       {target_n_val, 5}
>>       ]},
>>   {riak_ensemble,
>>     [
>>       {ensemble_tick, 5000}
>>     ]}
>> ].
>> 
>> though I’ve experimented with the latter in an attempt to get this resolved.
>> 
>> I didn’t see any relevant-looking log output on any of the servers.
>> 
>> Has anyone come across this before?
>> 
>> Thanks!
>> 
>> Jonathan Koff B.CS.
>> co-founder of Projexity
>> www.projexity.com
>> 
>> follow us on facebook at: www.facebook.com/projexity
>> follow us on twitter at: twitter.com/projexity
>> 
>> _______________________________________________
>> riak-users mailing list
>> [email protected] <mailto:[email protected]>
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com 
>> <http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com>
>> 
>> 
