[jira] [Commented] (CASSANDRA-1451) Shutting down a node cleanly still kills client requests when the node goes down

2011-05-15 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033695#comment-13033695
 ] 

Jonathan Ellis commented on CASSANDRA-1451:
---

Instead of trying to make this an integrated part of drain, if we had a JMX 
control to shut down gossip manually, we could have a simple workflow of

1) shut down gossip
2) wait for everyone to mark node-to-drain as down
3) drain

with no further changes required to internals, and no new exception types to 
introduce.
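
The three steps above could be scripted roughly as follows. This is only a 
sketch: stop_gossip, peers_seeing_node_up and drain are hypothetical hooks 
that an operator would have to wire to whatever JMX or tooling calls exist in 
their version, and the polling interval and timeout are arbitrary assumptions.

```python
import time
from typing import Callable, Iterable


def drain_after_gossip_shutdown(
    stop_gossip: Callable[[], None],
    peers_seeing_node_up: Callable[[], Iterable[str]],
    drain: Callable[[], None],
    poll_interval: float = 1.0,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Run the workflow in order. All callables are hypothetical hooks."""
    # 1) shut down gossip on the node that is about to be drained
    stop_gossip()

    # 2) wait until no peer still considers the node up
    deadline = clock() + timeout
    while list(peers_seeing_node_up()):
        if clock() >= deadline:
            raise TimeoutError("some peers still consider the node up")
        sleep(poll_interval)

    # 3) drain only after every peer has marked the node as down
    drain()
```

The wait in step 2 is bounded so that a peer that never convicts the node does 
not block the shutdown forever; in that case the sketch raises instead of 
draining.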

 Shutting down a node cleanly still kills client requests when the node goes 
 down
 --

 Key: CASSANDRA-1451
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1451
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: David King
Priority: Minor
 Fix For: 1.0


 Shutting down a node, even more cleanly through drain, still kills some 
 requests with timeoutexceptions. Ideally, operations would not be sent at all 
 to nodes that are known to be shutting down, perhaps by shutting down gossip 
 before starting the draining process. 
 Other nodes will still need to have the phi convict threshold exceeded, but 
 presumably that's usually shorter than drain

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-1451) Shutting down a node cleanly still kills client requests when the node goes down

2011-05-15 Thread Gary Dusbabek (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13033806#comment-13033806
 ] 

Gary Dusbabek commented on CASSANDRA-1451:
--

We do have such a control.



[jira] Commented: (CASSANDRA-1451) Shutting down a node cleanly still kills client requests when the node goes down

2011-02-09 Thread Matthew F. Dennis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12992806#comment-12992806
 ] 

Matthew F. Dennis commented on CASSANDRA-1451:
--

Now that CASSANDRA-1108 is in place, it seems like whenever we start draining 
we should:

1) Shut down the gossiper. This will prevent future messages from other nodes 
(once it propagates).
2) Immediately reply to any request with a TException(node shutting down). 
This will allow clients to distinguish between "node is busy" and "node is not 
playing right now". It will also allow other nodes to continue instead of 
waiting for RPCTimeout.

thoughts?
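
A toy model of the two steps proposed above, to make the intended behaviour 
concrete. Node, NodeShuttingDown and RequestTimedOut are hypothetical names 
used only for this sketch and are not Cassandra classes; the overloaded flag 
stands in for whatever would normally cause a timeout.

```python
class NodeShuttingDown(Exception):
    """Hypothetical error: the node is draining and will not take requests."""


class RequestTimedOut(Exception):
    """Hypothetical error: the node was too busy to answer in time."""


class Node:
    def __init__(self) -> None:
        self.gossip_enabled = True
        self.draining = False

    def start_drain(self) -> None:
        # 1) stop gossiping so that peers eventually stop routing here
        self.gossip_enabled = False
        # 2) from now on, reject every request immediately
        self.draining = True

    def handle(self, request: str, overloaded: bool = False) -> str:
        if self.draining:
            # Fast, explicit rejection: the caller does not wait for a timeout.
            raise NodeShuttingDown("node shutting down")
        if overloaded:
            raise RequestTimedOut("request timed out")
        return f"ok:{request}"
```

Because the two failure modes raise different exception types, a caller can 
retry on RequestTimedOut and move to another node on NodeShuttingDown.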

 Shutting down a node cleanly still kills client requests when the node goes 
 down
 --

 Key: CASSANDRA-1451
 URL: https://issues.apache.org/jira/browse/CASSANDRA-1451
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 0.6.5
Reporter: David King

 Shutting down a node, even more cleanly through drain, still kills some 
 requests with timeoutexceptions. Ideally, operations would not be sent at all 
 to nodes that are known to be shutting down, perhaps by shutting down gossip 
 before starting the draining process. 
 Other nodes will still need to have the phi convict threshold exceeded, but 
 presumably that's usually shorter than drain





[jira] Commented: (CASSANDRA-1451) Shutting down a node cleanly still kills client requests when the node goes down

2010-11-23 Thread Gary Dusbabek (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934831#action_12934831
 ] 

Gary Dusbabek commented on CASSANDRA-1451:
--

An approach:

More cleanly: introduce a gossip state in conjunction with 
ApplicationState.STATUS that basically proclaims "I'm up, but stop routing 
requests to me" (e.g., see StorageService.startLeaving()). But then you'd be 
at the mercy of when that information makes it to every node. We might already 
have such a state, but it doesn't imply these semantics.

Even more cleanly: when a node is in that state and it receives a request from 
another node that doesn't know it, have it send a message that politely 
explains the situation and asks the sender to stop sending requests. Ideally, 
this would be done by forcing a gossip to the node that doesn't know the 
leaving node doesn't want requests (as opposed to creating a new message, verb 
handler, etc.).
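
The first idea above amounts to filtering replicas by their last gossiped 
status before routing. A minimal sketch, assuming a hypothetical status value 
NO_REQUESTS and a plain dictionary as each node's view of gossip state (neither 
is taken from Cassandra's code); endpoints with no known status are assumed to 
be routable.

```python
from typing import Dict, List

NORMAL = "NORMAL"
NO_REQUESTS = "NO_REQUESTS"  # hypothetical status: up, but do not route to me


def routable_endpoints(replicas: List[str],
                       gossiped_status: Dict[str, str]) -> List[str]:
    """Keep replicas whose last gossiped status still accepts requests."""
    return [r for r in replicas
            if gossiped_status.get(r, NORMAL) != NO_REQUESTS]


replicas = ["10.0.0.1", "10.0.0.2"]

# A peer that has already received the new state skips the draining node.
fresh_view = {"10.0.0.1": NORMAL, "10.0.0.2": NO_REQUESTS}
fresh_choice = routable_endpoints(replicas, fresh_view)

# A peer whose view is stale still routes to it until gossip catches up.
stale_view = {"10.0.0.1": NORMAL, "10.0.0.2": NORMAL}
stale_choice = routable_endpoints(replicas, stale_view)
```

The stale view shows the caveat raised in the comment: routing only improves 
once the state has propagated to the node making the decision.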

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (CASSANDRA-1451) Shutting down a node cleanly still kills client requests when the node goes down

2010-11-23 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934833#action_12934833
 ] 

Jonathan Ellis commented on CASSANDRA-1451:
---

bq. When a node is in that state and it receives a request from another node 
that doesn't know it, have send a message that politely explains the situation 
and please stop sending me requests.

You're basically guaranteed to get these from every node in the cluster because 
of gossip delay, if it's under load. I'd say let's use gossip and accept that 
there will be some delay.




[jira] Commented: (CASSANDRA-1451) Shutting down a node cleanly still kills client requests when the node goes down

2010-11-22 Thread Erik Onnen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934646#action_12934646
 ] 

Erik Onnen commented on CASSANDRA-1451:
---

Here's what we observed that led to this being discussed in IRC.

After nodetool drain is executed, a node is no longer able to accept new write 
operations. This is problematic for several reasons in the current 
implementation:

1) The draining node actually accepts writes; it won't process them locally, 
but it will ship them to remote endpoints. In 0.6.8, the write can actually 
succeed even though a timeout error is reported back to the client when the 
local write fails, causing the client to think the write failed when it in 
fact succeeded.
2) The draining node can still process some writes, just not writes for which 
it is a natural endpoint. This leads to non-deterministic behavior for 
clients, where some writes succeed but others fail.
3) The draining node can still process reads. This causes some upstream client 
libraries to think the node is healthy when in reality it should be shunned 
(at least for writes).
4) When a local write is rejected, it surfaces as a timeout exception. This is 
the same behavior that occurs when the pending operations for the read/write 
stages are full. In many cases it's proper for a client to retry when the 
read/write stages are full, but because of how this appears to the client, it 
cannot distinguish whether reads/writes are backed up or the local node is 
simply rejecting the write because it is draining. Clients can't self-help in 
this case; they are left to guess, which is bad.
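
Points 1 and 2 can be restated as a small toy model. This only mirrors the 
behaviour reported above and is not derived from Cassandra's code; the function 
name, the node names and the result dictionary are all made up for 
illustration.

```python
from typing import Dict, List, Union


def write_via_draining_coordinator(
    coordinator: str, natural_endpoints: List[str]
) -> Dict[str, Union[str, List[str]]]:
    """Model what a client sees when it writes through a draining node."""
    # The draining node still ships the write to the remote replicas.
    applied_on = [e for e in natural_endpoints if e != coordinator]
    if coordinator in natural_endpoints:
        # The local write is rejected and surfaces as a timeout, even though
        # the remote replicas may have applied the write.
        return {"client_sees": "timeout", "applied_on": applied_on}
    # The draining node is not a replica for this key, so nothing fails locally.
    return {"client_sees": "success", "applied_on": applied_on}


# Same draining coordinator, different keys, different outcomes.
as_replica = write_via_draining_coordinator("n1", ["n1", "n2", "n3"])
not_replica = write_via_draining_coordinator("n1", ["n2", "n3", "n4"])
```

In the first call the client is told the write timed out although two replicas 
hold it; in the second the same node reports success, which is the 
non-determinism described in point 2.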




[jira] Commented: (CASSANDRA-1451) Shutting down a node cleanly still kills client requests when the node goes down

2010-11-22 Thread Erik Onnen (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-1451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934652#action_12934652
 ] 

Erik Onnen commented on CASSANDRA-1451:
---

I'm happy to work on a fix for this if someone can point me in the right 
direction for getting started.
