[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-31 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094684#comment-13094684
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

What messages is RemoveTest waiting for that are causing the problem?

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Patricio Echague
 Fix For: 1.0

 Attachments: 2034-formatting.txt, 2034-v16.txt, 2034-v17.txt, 
 2034-v18.txt, 2034-v19-rebased.txt, 2034-v19.txt, 2034-v20.txt, 2034-v21.txt, 
 CASSANDRA-2034-trunk-v10.patch, CASSANDRA-2034-trunk-v11.patch, 
 CASSANDRA-2034-trunk-v11.patch, CASSANDRA-2034-trunk-v12.patch, 
 CASSANDRA-2034-trunk-v13.patch, CASSANDRA-2034-trunk-v14.patch, 
 CASSANDRA-2034-trunk-v15.patch, CASSANDRA-2034-trunk-v2.patch, 
 CASSANDRA-2034-trunk-v3.patch, CASSANDRA-2034-trunk-v4.patch, 
 CASSANDRA-2034-trunk-v5.patch, CASSANDRA-2034-trunk-v6.patch, 
 CASSANDRA-2034-trunk-v7.patch, CASSANDRA-2034-trunk-v8.patch, 
 CASSANDRA-2034-trunk-v9.patch, CASSANDRA-2034-trunk.patch

   Original Estimate: 8h
  Remaining Estimate: 8h

 Currently, HH is purely an optimization -- if a machine goes down, enabling 
 HH means RR/AES will have less work to do, but you can't disable RR entirely 
 in most situations since HH doesn't kick in until the FailureDetector does.
 Let's add a scheduled task to the mutate path, such that we return to the 
 client normally after ConsistencyLevel is achieved, but after RpcTimeout we 
 check the responseHandler write acks and write local hints for any missing 
 targets.
 This would make disabling RR when HH is enabled a much more reasonable 
 option, which has a huge impact on read throughput.
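
 The proposal above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the patch itself: AckTracker, HintAfterTimeoutSketch and scheduleHintCheck are hypothetical names, replicas are plain strings, and "writing a hint" is reduced to returning the list of targets that never acknowledged.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for a write response handler: it only records acks.
final class AckTracker
{
    private final List<String> targets;
    private final Set<String> acked = ConcurrentHashMap.newKeySet();

    AckTracker(List<String> targets) { this.targets = targets; }

    void onAck(String replica) { acked.add(replica); }

    // Targets that have not acknowledged the write at the time of the call.
    List<String> missingTargets()
    {
        List<String> missing = new ArrayList<>();
        for (String target : targets)
            if (!acked.contains(target))
                missing.add(target);
        return missing;
    }
}

final class HintAfterTimeoutSketch
{
    // Runs once, rpcTimeoutMs after the write was sent, and reports who still needs a hint.
    static ScheduledFuture<List<String>> scheduleHintCheck(ScheduledExecutorService timer,
                                                           AckTracker tracker,
                                                           long rpcTimeoutMs)
    {
        Callable<List<String>> check = tracker::missingTargets;
        return timer.schedule(check, rpcTimeoutMs, TimeUnit.MILLISECONDS);
    }

    static List<String> demo()
    {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        try
        {
            AckTracker tracker = new AckTracker(List.of("A", "B", "C"));
            ScheduledFuture<List<String>> check = scheduleHintCheck(timer, tracker, 200);
            tracker.onAck("A");
            tracker.onAck("B"); // enough acks for the client to get its answer
            return check.get(); // replica C never answered, so it would get a local hint
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
        finally
        {
            timer.shutdownNow();
        }
    }

    public static void main(String[] args)
    {
        System.out.println("hint targets: " + demo());
    }
}
```

 The point of the design is that the client-facing path never waits for the check: the acks for the requested ConsistencyLevel arrive first, and the scheduled task only looks at what is still missing once the timeout has passed.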

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-31 Thread Patricio Echague (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13094699#comment-13094699
 ] 

Patricio Echague commented on CASSANDRA-2034:
-

I fixed the test.

If I'm not wrong,

{code}
for (InetAddress host : hosts)
{
    Message msg = new Message(host,
                              StorageService.Verb.REPLICATION_FINISHED,
                              new byte[0],
                              MessagingService.version_);
    MessagingService.instance().sendRR(msg, FBUtilities.getBroadcastAddress());
}
{code}

sends 5 messages, but 4 of them are caught by a local SinkManager 
implementation (for testing purposes) and are never processed.
Since you added a wait for the callbacks to be processed before exiting, I 
had to add a forced shutdown (for testing purposes) to make the test 
complete successfully.



[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-23 Thread Patricio Echague (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089922#comment-13089922
 ] 

Patricio Echague commented on CASSANDRA-2034:
-

Jonathan, I noticed you slightly modified CallbackInfo.shouldHint():

{code}
public boolean shouldHint()
{
    return message != null && StorageProxy.shouldHint(target);
}
{code}

I'm not sure whether you meant that your changes address the issue of not 
hinting when CL is not reached.
The new shouldHint method you added should be OK, since it is evaluated upon 
RPCTimeout regardless of whether the CL was achieved.



[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-23 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13089943#comment-13089943
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

Right, that's what I was referring to when I said [the old shouldHint] does 
not achieve our goal of making read-repair unnecessary. For that, we need to 
always hint when an attempted write fails.


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-16 Thread Patricio Echague (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085858#comment-13085858
 ] 

Patricio Echague commented on CASSANDRA-2034:
-

Thanks Jonathan for the snippet of code. I didn't notice it was broken.

I don't see where CallbackInfo.shouldHint is broken. 

{code}
public boolean shouldHint()
{
    if (StorageProxy.shouldHint(target) && isMutation)
    {
        try
        {
            ((IWriteResponseHandler) callback).get();
            return true;
        }
        catch (TimeoutException e)
        {
            // CL was not achieved. We should not hint.
        }
    }
    return false;
}
{code}

I process the callback after the message has expired. If the CL was achieved 
(and the requirements for a hint are met), I return true for this target, 
meaning that a hint needs to be written. 
On the other hand, if the message expires and the CL was not achieved, I 
return false (for this target).

Perhaps it needs special treatment during shutdown?


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-16 Thread Patricio Echague (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085966#comment-13085966
 ] 

Patricio Echague commented on CASSANDRA-2034:
-

{quote}
currentHintsQueueSize [now totalHints] increment needs to be done OUTSIDE the 
runnable or it will never get above the number of task executors
{quote}

Interesting. I must have forgotten it after one of the patches. I remember 
fixing it before.

{quote}
Yes. We should probably either wait for the messages to time out (which is 
mildly annoying to the user) or just write hints for everything (which may be 
confusing: why are there hints being sent after I restart, when no node was 
ever down?) I don't see a perfect solution.
{quote}

I think I prefer making the user wait for RPCTimeout, since it is not that 
long and is perhaps clearer than saving hints just in case.


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-16 Thread Patricio Echague (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085978#comment-13085978
 ] 

Patricio Echague commented on CASSANDRA-2034:
-

{quote}
Also, still need to address this:

currentHintsQueueSize [now totalHints] increment needs to be done OUTSIDE the 
runnable or it will never get above the number of task executors
{quote}

hintsInProgress.incrementAndGet() happens outside the executor, before the 
task is scheduled.
totalHints.incrementAndGet(), on the other hand, runs within the task, right 
after the hint has been written.

Is that not right?
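
The difference between the two counters can be illustrated with a small Java sketch. It is not taken from the patch: HintCountersSketch, submitHint and peakInProgress are hypothetical names, and the executor is a plain two-thread pool. The in-progress counter is bumped on the submitting thread, so it also counts queued tasks and can exceed the number of executor threads; the total is bumped inside the task once the (simulated) hint write has finished.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class HintCountersSketch
{
    final AtomicInteger hintsInProgress = new AtomicInteger();
    final AtomicInteger totalHints = new AtomicInteger();
    final AtomicInteger peakInProgress = new AtomicInteger();
    private final ExecutorService executor = Executors.newFixedThreadPool(2);

    void submitHint(Runnable writeHint)
    {
        // Incremented by the caller, before the task is queued.
        int now = hintsInProgress.incrementAndGet();
        peakInProgress.accumulateAndGet(now, Math::max);
        executor.execute(() -> {
            try
            {
                writeHint.run();
                totalHints.incrementAndGet(); // counted only after the write finished
            }
            finally
            {
                hintsInProgress.decrementAndGet();
            }
        });
    }

    void shutdownAndWait()
    {
        executor.shutdown();
        try
        {
            executor.awaitTermination(5, TimeUnit.SECONDS);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
    }

    // Returns { peak in-progress, total written, in-progress at the end }.
    static int[] demo()
    {
        HintCountersSketch sketch = new HintCountersSketch();
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 10; i++)
            sketch.submitHint(() -> {
                try { release.await(); } // keep every task pending until all are submitted
                catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            });
        int peak = sketch.peakInProgress.get();
        release.countDown();
        sketch.shutdownAndWait();
        return new int[] { peak, sketch.totalHints.get(), sketch.hintsInProgress.get() };
    }

    public static void main(String[] args)
    {
        int[] r = demo();
        System.out.println("peak=" + r[0] + " total=" + r[1] + " inProgress=" + r[2]);
    }
}
```

If the increment were moved inside the task, the counter could only ever reflect tasks that a thread has picked up, which is the concern raised in the quoted review note.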


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-15 Thread Patricio Echague (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085142#comment-13085142
 ] 

Patricio Echague commented on CASSANDRA-2034:
-

I can do that.

Also, I'm not quite happy with waiting for local hints to complete per 
mutation. I'm thinking of adding them to the handler so that we can wait for 
the hints after scheduling all the mutations.

It has pros and cons:

Pros: If the coordinator node is overwhelmed, we can tell the client right away.
Cons: For large mutations, we are actually blocking for local hints (if any) 
per mutation, which is not ideal.


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-15 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085151#comment-13085151
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

Yes, that should be in mutate() or the handler so the waiting is parallelized.
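
A rough Java sketch of what "parallelized waiting" could look like, assuming a hypothetical mutate helper and an ordinary ExecutorService standing in for the hint-writing stage: every hint write is submitted first, and the caller blocks only once, after all of them have been scheduled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelHintWaitSketch
{
    // Hypothetical helper: schedules one hint write per mutation, then waits for all of them.
    static List<String> mutate(List<String> mutations, ExecutorService hintStage)
    {
        List<Future<String>> pending = new ArrayList<>();
        for (String mutation : mutations)
            pending.add(hintStage.submit(() -> "hint written for " + mutation)); // no waiting here

        List<String> results = new ArrayList<>();
        for (Future<String> future : pending)
        {
            try
            {
                results.add(future.get()); // the writes overlap, so the waits overlap too
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
            catch (ExecutionException e)
            {
                throw new RuntimeException(e.getCause());
            }
        }
        return results;
    }

    static List<String> demo()
    {
        ExecutorService hintStage = Executors.newFixedThreadPool(3);
        try
        {
            return mutate(List.of("m1", "m2", "m3"), hintStage);
        }
        finally
        {
            hintStage.shutdown();
        }
    }

    public static void main(String[] args)
    {
        System.out.println(demo());
    }
}
```

Blocking inside the first loop instead would serialize the hint writes, which is the per-mutation cost described in the previous comment.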


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-15 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13085497#comment-13085497
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

Hmm. I think you're right: it would work better to do the hints in the handler 
instead of passing these lists around.  Sorry; let's change it to do it that 
way.

Other notes:

SP.shouldHint is broken (will always return true when hints are disabled).  I 
would write it like this:
{code}
public static boolean shouldHint(InetAddress ep)
{
    if (!isHintedHandoffEnabled())
        return false;

    boolean hintWindowExpired = Gossiper.instance.getEndpointDowntime(ep) > maxHintWindow;
    if (hintWindowExpired)
        logger.debug("not hinting {} which has been down {}ms", ep, Gossiper.instance.getEndpointDowntime(ep));
    return !hintWindowExpired;
}
{code}

CallbackInfo.shouldHint is broken a different way.  It should be returning true 
if and only if the write to the target failed.  (Calling this variable "from" 
is odd -- "from" is used to refer to localhost in a MessageService context.)  
Currently, it returns true if the overall CL is achieved, which in the general 
case tells us nothing about the individual replica in question.
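
To make the distinction concrete, here is a small single-threaded Java sketch. It assumes a hypothetical PerReplicaHintSketch class that tracks acks per replica; none of these names come from the patch. With three replicas and a blockFor of 2, the CL can be achieved while one replica still needs a hint, so the per-replica answer and the overall answer differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PerReplicaHintSketch
{
    private final Map<String, Boolean> ackedByReplica = new LinkedHashMap<>();
    private final int blockFor;

    PerReplicaHintSketch(int blockFor, String... replicas)
    {
        this.blockFor = blockFor;
        for (String replica : replicas)
            ackedByReplica.put(replica, false);
    }

    void onAck(String replica) { ackedByReplica.put(replica, true); }

    // Overall outcome: says nothing about any single replica.
    boolean consistencyLevelAchieved()
    {
        int acks = 0;
        for (boolean acked : ackedByReplica.values())
            if (acked)
                acks++;
        return acks >= blockFor;
    }

    // Per-replica outcome: true only for a known replica whose write was not acknowledged.
    boolean shouldHint(String replica)
    {
        return Boolean.FALSE.equals(ackedByReplica.get(replica));
    }

    public static void main(String[] args)
    {
        PerReplicaHintSketch write = new PerReplicaHintSketch(2, "A", "B", "C");
        write.onAck("A");
        write.onAck("B");
        System.out.println("CL achieved: " + write.consistencyLevelAchieved());
        System.out.println("hint A: " + write.shouldHint("A") + ", hint C: " + write.shouldHint("C"));
    }
}
```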


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-13 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084779#comment-13084779
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

I don't think passing a list of unavailableEndpoints around everywhere is 
actually necessary.  I may be missing a use case, but what I see is

- in sendToHintedEndpoints
- in assureSufficientLiveNodes implementation

Both of which can be replaced in a straightforward manner with FailureDetector 
calls.  (Note that it is not necessary for FD state to remain unchanged between 
assureSufficient and sending.)

In fact using the same list in both places is a bug: assureSufficient only 
cares about what FD thinks, so mixing hinted-handoff-enabledness in as 
getUnavailableEndpoints does will cause assureSufficient to return false 
positives w/ HH off.

So I'd make assureSufficient use FD directly, and sendTHE use FD + HH state.  
Bonus: no List allocation in the common case where everything is healthy.
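
A minimal Java sketch of that split, with the failure detector reduced to a Predicate and all names (EndpointChecksSketch, planFor) hypothetical. The real assureSufficientLiveNodes signature is not shown in this thread, so the boolean return here is an assumption made for illustration. The liveness check looks only at failure-detector state, while the send path also considers whether hinted handoff is enabled.

```java
import java.util.List;
import java.util.function.Predicate;

final class EndpointChecksSketch
{
    // Depends on failure-detector state only; hinted handoff plays no part here.
    static boolean assureSufficientLiveNodes(List<String> writeEndpoints,
                                             Predicate<String> isAlive,
                                             int blockFor)
    {
        int live = 0;
        for (String ep : writeEndpoints)
            if (isAlive.test(ep))
                live++;
        return live >= blockFor;
    }

    // Send path: failure-detector state plus the hinted-handoff setting.
    static String planFor(String ep, Predicate<String> isAlive, boolean hintedHandoffEnabled)
    {
        if (isAlive.test(ep))
            return "send";
        return hintedHandoffEnabled ? "hint" : "skip";
    }

    public static void main(String[] args)
    {
        Predicate<String> isAlive = ep -> !ep.equals("C"); // pretend C is down
        List<String> endpoints = List.of("A", "B", "C");
        System.out.println("sufficient for 2: " + assureSufficientLiveNodes(endpoints, isAlive, 2));
        for (String ep : endpoints)
            System.out.println(ep + " -> " + planFor(ep, isAlive, true));
    }
}
```

Neither method builds a list of unavailable endpoints, so nothing is allocated when every replica is up.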


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-12 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084154#comment-13084154
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

- there's an unused overload of MS.addCallback
- shouldHint should probably be a CallbackInfo method
- the .warn in scheduleMH should be .error
- any reason scheduleMH is protected instead of private?
- technically MS.shutdown should probably collect the Futures of the hints 
being written and wait on them 

in
{code}
if (consistencyLevel != null && consistencyLevel == ConsistencyLevel.ANY)
{code}

did you mean this?
{code}
if (responseHandler != null && consistencyLevel == ConsistencyLevel.ANY)
{code}

- not a big fan of the "subclass just exposes a different constructor than its 
parent" idiom.  I think the normal expectation is that a subclass encapsulates 
some kind of different behavior than its parent.  I'd get rid of CIWM and just 
expose a with-Message constructor in CallbackInfo.
- Would also make CI message and isMutation final
- avoid declaring @Override on abstract methods (looking at writeLocalHint 
Runnable, also CTAF, may be others)
- avoid comments that repeat what the code says, like "// One more task in the 
hints queue."
- would name hintCounter -> totalHints (I had to look at usages to figure out 
what it does)
- there's no actual hint queue anymore so would name currentHintsQueueSize -> 
hintsInProgress (similarly, maxHintsQueueSize)
- prefer camelCase variable names (CTAF overall_timeout)
- unit.toMillis(timeout) would be more idiomatic than 
TimeUnit.MILLISECONDS.convert(timeout, unit)
- otherwise CTAF is a good clean encapsulation, nice job
- generics is upset about hintsFutures.add(new 
CreationTimeAwareFuture(hintfuture)) -- can you fix the unchecked warning 
there?
- unnecessary whitespace added to the method below "// wait for writes.  throws 
TimeoutException if necessary"
- would prefer to avoid allocating the hintFutures list unless we actually need 
to write hints, since this is on the write inner loop
- still think we can simplify sendToHinted by getting rid of getHintedEndpoints 
and operating directly on the raw writeEndpoints (and consulting FD to decide 
whether to write a hint)
- currentHintsQueueSize increment needs to be done OUTSIDE the runnable or it 
will never get above the number of task executors
- it would be nice if we could tell the CallbackInfo whether the original write 
reached ConsistencyLevel.  Maybe we could do this by changing isMutation to 
volatile isHintRequired, something like that.  (And just set it to false if CL 
is not reached.)  If not, we don't have to write hints for timed-out replicas 
-- this will help avoid OOM if nodes in the cluster start getting overloaded 
and dropping writes.  Otherwise, the coordinator could run out of memory 
queuing up hints for writes that timed out and that the client will retry.
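
For the TimeUnit item in the list above, a short Java sketch shows that the two spellings compute the same value. The class and method names are illustrative only; the TimeUnit calls are the standard java.util.concurrent API.

```java
import java.util.concurrent.TimeUnit;

final class TimeoutConversionSketch
{
    // The form used in the patch under review.
    static long verbose(long timeout, TimeUnit unit)
    {
        return TimeUnit.MILLISECONDS.convert(timeout, unit);
    }

    // The form suggested in the review.
    static long idiomatic(long timeout, TimeUnit unit)
    {
        return unit.toMillis(timeout);
    }

    public static void main(String[] args)
    {
        System.out.println(verbose(3, TimeUnit.SECONDS) + " " + idiomatic(3, TimeUnit.SECONDS));
    }
}
```

Both forms truncate when converting from a finer unit, so they agree for sub-millisecond inputs as well.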


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-12 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13084164#comment-13084164
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

- the timeout-based waitOnFutures overload should only accept 
CreationTimeAwareFuture (CTAF) objects since it will NOT behave as expected 
with others, e.g., FutureTask objects



[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-11 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13083486#comment-13083486
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

I think v11 is missing the new Callback classes.  Can you rebase to trunk when 
you add those?


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-10 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13082492#comment-13082492
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

bq. I don't think local hints need to be put on their own queue / thread-pool. 
Just write the hint to the local mutation queue and increment the hint counters

Agreed, that is a better solution than complicating things with extra queues / 
executors.  Since the message dropping is done by the Task, not the executor, 
there's no problem in that respect.  And we can use a simple counter to avoid 
clogging the queue with hints.
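
A minimal sketch of the "simple counter" idea, for illustration only and not taken from any attached patch. The class name HintThrottle and the maxInFlight limit are hypothetical; the point is just that an atomic counter can refuse new hints once too many are already queued.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: caps the number of hint writes queued at once. */
final class HintThrottle {
    private final int maxInFlight;                         // assumed limit, not a real Cassandra setting
    private final AtomicInteger inFlight = new AtomicInteger();

    HintThrottle(int maxInFlight) {
        this.maxInFlight = maxInFlight;
    }

    /** Returns true if the caller may enqueue a hint, false if the cap is reached. */
    boolean tryAcquire() {
        while (true) {
            int current = inFlight.get();
            if (current >= maxInFlight)
                return false;
            if (inFlight.compareAndSet(current, current + 1))
                return true;
        }
    }

    /** Call once the hint has been written (or dropped). */
    void release() {
        inFlight.decrementAndGet();
    }

    int inFlight() {
        return inFlight.get();
    }
}
```

A caller would check tryAcquire() before putting the hint on the mutation queue and call release() from the task once the hint is on disk.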


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-08 Thread T Jake Luciani (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080979#comment-13080979
 ] 

T Jake Luciani commented on CASSANDRA-2034:
---

Comments on v8:

   - You are still waiting on hints in the client; I don't think we need 
CreationTimeAwareFuture anymore.
   - I don't think local hints need to be put on their own queue / thread-pool. 
Just write the hint to the local mutation queue and increment the hint counters.
   - The shutdown process should not wait for the mutation map to expire; it 
should simply write any outstanding hints from the map and shut down.


Nitpick:
   - The new expiring map logic is backwards in my opinion. I would rather see 
the expiration callback handle the hint write than see the MessagingService call 
StorageProxy.






[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-05 Thread Lior Golan (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13079936#comment-13079936
 ] 

Lior Golan commented on CASSANDRA-2034:
---

Writes are fast, but you also have network latency for communicating with all 
the nodes. What would happen in the worst-of-N case for multi-datacenter 
deployments?


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-05 Thread Patricio Echague (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080115#comment-13080115
 ] 

Patricio Echague commented on CASSANDRA-2034:
-

{quote} return to the client normally after ConsistencyLevel is achieved, but 
after RpcTimeout we check the responseHandler write acks and write local hints 
for any missing targets {quote}

The way I'm planning on implementing this last part is by adding a 
maybeWriteHint call in the hook that already exists for the ExpiringMap in 
MessagingService.
The only problem is that I don't have the mutation needed to generate a hint in 
MessagingService.
I might catch which nodes haven't replied in the StorageProxy and store one 
mutation per node that hasn't replied in an ExpiringMap in MessagingService.

The downside is that for CL.ONE I will end up storing basically one mutation 
per replica minus the one that responded, and it may lead to memory issues.

Thoughts?


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-05 Thread T Jake Luciani (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080129#comment-13080129
 ] 

T Jake Luciani commented on CASSANDRA-2034:
---

I think you can reuse the same RowMutation object across messages, so it 
shouldn't cause duplicates in memory.

The only issue is that if too many mutations queue up you might OOM, but this is 
the same problem we currently have with the write stage.  So if you use an 
ExpiringMap you should add an onExpiration hook to write the hint locally for 
the replicas that never responded to the mutation.  This covers the case where 
mutations expire before a response is received.

Then all the MessageTask needs to do is clear the messageId from the expiring 
map before the expiration time.
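
The expiration-hook idea can be sketched roughly as follows. This is an illustrative stand-in, not Cassandra's ExpiringMap: the class ExpiringCallbacks, its methods, and the explicit nowMillis parameter are all assumptions made to keep the example small and deterministic. An entry removed before the timeout (a replica answered) never triggers the hook; an entry still present at expiry does, which is where a local hint would be written.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

/** Hypothetical stand-in for an expiring map with an expiration hook. */
final class ExpiringCallbacks<K, V> {
    private static final class Pending<V> {
        final V value;
        final long createdAtMillis;

        Pending(V value, long createdAtMillis) {
            this.value = value;
            this.createdAtMillis = createdAtMillis;
        }
    }

    private final Map<K, Pending<V>> entries = new ConcurrentHashMap<>();
    private final long timeoutMillis;
    private final BiConsumer<K, V> onExpiration;   // e.g. "write a local hint for this mutation"

    ExpiringCallbacks(long timeoutMillis, BiConsumer<K, V> onExpiration) {
        this.timeoutMillis = timeoutMillis;
        this.onExpiration = onExpiration;
    }

    /** Registers a pending message, keyed by its id, with the mutation as the value. */
    void put(K messageId, V mutation, long nowMillis) {
        entries.put(messageId, new Pending<>(mutation, nowMillis));
    }

    /** Called when the replica responds in time: no hint is needed. */
    V remove(K messageId) {
        Pending<V> pending = entries.remove(messageId);
        return pending == null ? null : pending.value;
    }

    /** Fires the hook for every entry that has been pending for at least the timeout. */
    void expire(long nowMillis) {
        for (Map.Entry<K, Pending<V>> e : entries.entrySet()) {
            Pending<V> pending = e.getValue();
            boolean expired = nowMillis - pending.createdAtMillis >= timeoutMillis;
            // remove(key, value) only succeeds if no response removed the entry first
            if (expired && entries.remove(e.getKey(), pending))
                onExpiration.accept(e.getKey(), pending.value);
        }
    }
}
```

Because the same mutation object can be stored under several message ids, the map holds references rather than copies, which matches the point above about not duplicating the mutation in memory.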




[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-05 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080137#comment-13080137
]

Jonathan Ellis commented on CASSANDRA-2034:
---

bq. The only issue is if too many mutations queue up you might OOM but this is
the same problem we currently have with the write stage

Right. Either way, the worst case is already that you need to be able to buffer
up to rpc_timeout's worth of writes in memory.

bq. if you use an ExpiringMap you should add an onExpiration hook to write the
hint locally for the replicas that never responded to the Mutation

I'm not sure you want a separate ExpiringMap from the one you already have in
MS. Might make for weird corner cases. But that's an implementation detail;
the approach is sound.


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-05 Thread Patricio Echague (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080155#comment-13080155
 ] 

Patricio Echague commented on CASSANDRA-2034:
-

The reason I need the extra map is to store the mutation. Otherwise I cannot 
generate a hint.




[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-05 Thread Patricio Echague (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080203#comment-13080203
 ] 

Patricio Echague commented on CASSANDRA-2034:
-

{quote}
But, you are guaranteed that successful writes have been hinted (if necessary) 
so you do not have to repair unless there is hardware permadeath. (Otherwise 
you would have to repair after power failure or crashes, too.) 
{quote}

Since we are queuing up a hint on failure/RPC timeout after acknowledging to 
the client, it looks like we need to repair every time a node is shut down, given 
the crash-only way of shutting down Cassandra, since we can have tasks in the 
queue yet to be executed. Right? I don't think repair is needed only on hardware 
permadeath.


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-05 Thread T Jake Luciani (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080207#comment-13080207
 ] 

T Jake Luciani commented on CASSANDRA-2034:
---

There is a shutdown hook we added for these kinds of things. See StorageService.


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-05 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080208#comment-13080208
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

bq. I don't think repair is needed only on hardware permadeath.

It's right there in what you quoted: Otherwise [if you're not waiting for all 
replica acks before returning to client] you would have to repair after power 
failure or crashes, too.

bq. it looks like we need to repair every time a node is shut down given the 
crash-only way of shutting down Cassandra

As discussed on chat, you need to add a shutdown hook to let the executor 
finish for non-crash shutdowns.  Look for Runtime.getRuntime().addShutdownHook 
in StorageService.
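
For illustration, a shutdown hook that lets an executor finish its queued work might look roughly like this. The class HintShutdown and its method names are hypothetical and this is not the hook in StorageService; only Runtime.getRuntime().addShutdownHook and the java.util.concurrent calls are real APIs. The hook only helps on orderly JVM exits, not on a crash or kill -9.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: drain a hint-writing executor on non-crash shutdown. */
final class HintShutdown {
    /** Stops accepting new tasks and waits for queued ones; true if all of them finished. */
    static boolean drain(ExecutorService executor, long timeout, TimeUnit unit) {
        executor.shutdown();                       // already-queued tasks still run
        try {
            return executor.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();    // preserve the interrupt for the caller
            return false;
        }
    }

    /** Registers a JVM shutdown hook that drains the executor. */
    static Thread registerHook(ExecutorService executor, long timeout, TimeUnit unit) {
        Thread hook = new Thread(() -> drain(executor, timeout, unit), "hint-drain-hook");
        Runtime.getRuntime().addShutdownHook(hook);
        return hook;
    }
}
```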


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-05 Thread Patricio Echague (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13080213#comment-13080213
 ] 

Patricio Echague commented on CASSANDRA-2034:
-

Thanks.

As for the TimeoutException when trying to queue up a hint while writing hints 
after ACK timeout: would it be OK to have a second executor with a non-capped 
queue? I'm currently using a capped queue for writing hints to replicas that 
are down before starting the mutation.

The downside is that the queue can grow big (by default we stop scheduling 
hints once a replica has been down for one hour).


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-04 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13079618#comment-13079618
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

So I proposed two mutually exclusive approaches here:

- return to the client normally after ConsistencyLevel is achieved, but after 
RpcTimeout we check the responseHandler write acks and write local hints for 
any missing targets
- add a separate executor here, with a blocking, capped queue. When we go to do 
a hint-after-failure we enqueue [and] wait for the write and then return 
success to the client

The difference can be summarized as: do we wait for all hints to be written 
before returning to the client?  If you do, then CL.ONE write latency becomes 
worst-of-N instead of best-of-N.  But, you are guaranteed that successful 
writes have been hinted (if necessary) so you do not have to repair unless 
there is hardware permadeath.  (Otherwise you would have to repair after power 
failure or crashes, too.)

I'm inclined to think that the first option is better, partly because writes 
are *fast* so worst-of-N really isn't that different from best-of-N.  Also, we 
could use CASSANDRA-2819 to reduce the default write timeout while still being 
conservative for reads (which might hit disk).
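
As a rough illustration of the second bullet above (a separate executor with a blocking, capped queue, where the caller enqueues and waits for the write), one possible shape is sketched below. BlockingHintExecutor and its methods are hypothetical names; the blocking behaviour comes from a rejection handler that puts the task back on the bounded queue, which is a common pattern and not necessarily what any attached patch does.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch: executor whose submitters block when the capped queue is full. */
final class BlockingHintExecutor {
    static ThreadPoolExecutor create(int threads, int queueCapacity) {
        return new ThreadPoolExecutor(
            threads, threads, 0L, TimeUnit.MILLISECONDS,
            new ArrayBlockingQueue<>(queueCapacity),
            (task, executor) -> {
                // Queue full: block the submitting thread instead of dropping the task.
                if (executor.isShutdown())
                    throw new RejectedExecutionException("executor is shut down");
                try {
                    executor.getQueue().put(task);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new RejectedExecutionException(e);
                }
            });
    }

    /** Enqueues the hint write and waits until it has run; true on success. */
    static boolean writeAndWait(ThreadPoolExecutor executor, Runnable hintWrite) {
        Future<?> done = executor.submit(hintWrite);
        try {
            done.get();
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false;                          // the hint write itself failed
        }
    }
}
```

The capped queue bounds memory, and the wait in writeAndWait is what turns CL.ONE latency into worst-of-N, as described above.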


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-03 Thread Patricio Echague (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13078928#comment-13078928
 ] 

Patricio Echague commented on CASSANDRA-2034:
-

This ticket assumes counters will not be written with CL == ANY (CASSANDRA-2990)


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-03 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13078935#comment-13078935
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

I think CreationTimeAwareFuture is missing?

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Patricio Echague
 Fix For: 1.0

 Attachments: CASSANDRA-2034-trunk-v2.patch, 
 CASSANDRA-2034-trunk-v3.patch, CASSANDRA-2034-trunk-v4.patch, 
 CASSANDRA-2034-trunk.patch

   Original Estimate: 8h
  Remaining Estimate: 8h


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-01 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073791#comment-13073791
]

Jonathan Ellis commented on CASSANDRA-2034:
---

Looks reasonable. I'd move the wait method into FBUtilities as an overload of
waitOnFutures.

Does the timeout on get start when the future is created, or when get is
called? I think it is the latter but I am not sure. If so, we should track
the total time waited and reduce after each get() so we do not allow total of
up to timeout * hints.
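
For `java.util.concurrent.Future`, the timeout passed to the timed `get` is measured from the moment `get` is called, not from when the future was created. A minimal sketch of the shared-deadline idea described above follows; the class name `FutureWaits` and this `waitOnFutures` signature are assumptions for illustration, not the actual FBUtilities code.

```java
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper, not the real FBUtilities overload.
final class FutureWaits {
    /**
     * Waits for every future against one shared deadline, so the total wait
     * is bounded by timeoutMillis rather than timeoutMillis * futures.size().
     */
    static void waitOnFutures(List<? extends Future<?>> futures, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        for (Future<?> future : futures) {
            // Only the time left before the shared deadline is granted to this get().
            long remainingNanos = Math.max(0L, deadlineNanos - System.nanoTime());
            // With zero time left, get() still returns for completed futures
            // and throws TimeoutException for unfinished ones.
            future.get(remainingNanos, TimeUnit.NANOSECONDS);
        }
    }
}
```

An alternative is to record the creation time inside the future itself, which lets each future compute its own remaining time no matter who waits on it.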

Make Read Repair unnecessary when Hinted Handoff is enabled
---

Attachments: CASSANDRA-2034-trunk.patch

Original Estimate: 8h
Remaining Estimate: 8h


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-01 Thread Patricio Echague (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073807#comment-13073807
]

Patricio Echague commented on CASSANDRA-2034:
-

{quote} Does the timeout on get start when the future is created, or when get
is called? I think it is the latter but I am not sure. If so, we should track
the total time waited and reduce after each get() so we do not allow total of
up to timeout * hints. {quote}
Yeah, I need to add a creation time and do something similar to what
IAsyncResult does.

I noticed I forgot to skip hint creation when HH is disabled.
Some thoughts on this that I would like feedback on:

Note: remember that hints are written locally on the coordinator node now.

| Hinted Handoff | Consist. Level |
| on             | >=1            | --> wait for hints. We DO NOT notify the handler with handler.response() for hints.
| on             | ANY            | --> wait for hints. Responses count towards consistency.
| off            | >=1            | --> DO NOT fire hints. And DO NOT wait for them to complete.
| off            | ANY            | --> Fire hints but don't wait for them. They count towards consistency.

Make Read Repair unnecessary when Hinted Handoff is enabled
---

Attachments: CASSANDRA-2034-trunk.patch

Original Estimate: 8h
Remaining Estimate: 8h


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-01 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13073812#comment-13073812
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

Looks good to me for the first 3.

I think ANY should be equal to ONE for hints=off.  I.e., when it's off we 
*never* create hints.

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Patricio Echague
 Fix For: 1.0

 Attachments: CASSANDRA-2034-trunk.patch

   Original Estimate: 8h
  Remaining Estimate: 8h


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-08-01 Thread Patricio Echague (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13075977#comment-13075977
]

Patricio Echague commented on CASSANDRA-2034:
-

v2 patch replaces v1.

Changes in v2:
- Fixes the cumulative timeouts by adding a CreationAwareFuture
- Implements the matrix discussed previously

| Hinted Handoff | Consist. Level |
| on             | >=1            | --> wait for hints. We DO NOT notify the handler with handler.response() for hints.
| on             | ANY            | --> wait for hints. Responses count towards consistency.
| off            | >=1            | --> DO NOT fire hints. And DO NOT wait for them to complete.
| off            | ANY            | --> DO NOT fire hints. And DO NOT wait for them to complete.
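
The v2 matrix can be read as a small decision function. The sketch below is only an illustration: `HintPolicy`, `Decision`, and the trimmed `ConsistencyLevel` enum are hypothetical names, and it assumes the non-ANY row of the matrix covers every consistency level other than ANY.

```java
// Hypothetical encoding of the hinted handoff / consistency level matrix.
final class HintPolicy {
    enum ConsistencyLevel { ANY, ONE, QUORUM, ALL }

    /** What the coordinator does with hints for a single write. */
    static final class Decision {
        final boolean writeHints;         // create hints for replicas that did not ack
        final boolean waitForHints;       // block the client until the hints are stored
        final boolean hintsCountTowardCl; // a stored hint counts as a response

        Decision(boolean writeHints, boolean waitForHints, boolean hintsCountTowardCl) {
            this.writeHints = writeHints;
            this.waitForHints = waitForHints;
            this.hintsCountTowardCl = hintsCountTowardCl;
        }
    }

    static Decision decide(boolean hintedHandoffEnabled, ConsistencyLevel cl) {
        if (!hintedHandoffEnabled) {
            // HH off: never create hints, regardless of consistency level.
            return new Decision(false, false, false);
        }
        // HH on: always write and wait; only ANY lets a hint satisfy the write.
        return new Decision(true, true, cl == ConsistencyLevel.ANY);
    }
}
```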

Make Read Repair unnecessary when Hinted Handoff is enabled
---

Attachments: CASSANDRA-2034-trunk-v2.patch, CASSANDRA-2034-trunk.patch

Original Estimate: 8h
Remaining Estimate: 8h


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-07-21 Thread Patricio Echague (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13069275#comment-13069275
 ] 

Patricio Echague commented on CASSANDRA-2034:
-

bq. after RpcTimeout we check the responseHandler write acks and write local 
hints for any missing targets.

CASSANDRA-2914 handles the local storage of hints.

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Patricio Echague
 Fix For: 1.0

   Original Estimate: 8h
  Remaining Estimate: 8h


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-07-18 Thread Nicholas Telford (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067059#comment-13067059
 ] 

Nicholas Telford commented on CASSANDRA-2034:
-

bq. Don't hints timeout?

Yes, the timeout is set to the GCGraceSeconds of the CF the hint is for. This 
is to prevent deletes from being undone by an old hint being replayed.

Post-#2045 hints contain the RM for multiple CFs; as such, the TTL for a hint 
is the minimum GCGraceSeconds from all of the CFs it references. This could 
cause hints to expire before delivery if one of the CFs for a mutation has a 
particularly short GCGraceSeconds.
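
As a rough illustration of the rule above, the hint TTL is the minimum GCGraceSeconds over the CFs a mutation references. `HintTtl` and its map-based lookup are assumptions made for this sketch; the real code reads the value from CF metadata.

```java
import java.util.Collection;
import java.util.Map;

// Hypothetical helper: the lookup map stands in for CF metadata.
final class HintTtl {
    /** Returns the smallest GCGraceSeconds among the CFs a mutation touches. */
    static int hintTtlSeconds(Collection<String> cfNames, Map<String, Integer> gcGraceSecondsByCf) {
        if (cfNames.isEmpty()) {
            throw new IllegalArgumentException("mutation references no CFs");
        }
        int ttl = Integer.MAX_VALUE;
        for (String cf : cfNames) {
            Integer gcGrace = gcGraceSecondsByCf.get(cf);
            if (gcGrace == null) {
                throw new IllegalArgumentException("unknown CF: " + cf);
            }
            // One short-lived CF shortens the TTL of the whole hint.
            ttl = Math.min(ttl, gcGrace);
        }
        return ttl;
    }
}
```

For example, a mutation touching one CF with GCGraceSeconds of 864000 and another with 3600 would yield a hint that expires after 3600 seconds.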

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Patricio Echague
 Fix For: 1.0

   Original Estimate: 8h
  Remaining Estimate: 8h


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-07-18 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13067106#comment-13067106
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

bq. the timeout is set to the GCGraceSeconds of the CF the hint is for

That's right.

So the guidance we can give is, you need to run repair if a node is down for 
longer than GCGraceSeconds, or if you lose data because of a hardware problem.

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Patricio Echague
 Fix For: 1.0

   Original Estimate: 8h
  Remaining Estimate: 8h


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-06-30 Thread Patricio Echague (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058044#comment-13058044
 ] 

Patricio Echague commented on CASSANDRA-2034:
-

Depends on CASSANDRA-2045. 

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Patricio Echague
 Fix For: 1.0

   Original Estimate: 8h
  Remaining Estimate: 8h


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-06-28 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13056910#comment-13056910
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

bq. If we had different timeouts for writes than reads then it might be nice to 
use say 80% of the timeout for the normal write, and reserve 20% for the hint 
phase

Different r/w timeouts are being added in CASSANDRA-2819.

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Patricio Echague
 Fix For: 1.0

   Original Estimate: 8h
  Remaining Estimate: 8h


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-05-25 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13039181#comment-13039181
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

Note that hint writes MUST be synchronous w/ the writes-as-perceived-by-client 
(which the design above accomplishes) or else you lose hints silently if a 
coordinator crashes (or is shut down/killed normally).

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Sylvain Lebresne
 Fix For: 1.0

   Original Estimate: 8h
  Remaining Estimate: 8h


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-05-13 Thread T Jake Luciani (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033147#comment-13033147
 ] 

T Jake Luciani commented on CASSANDRA-2034:
---

Don't hints time out? Would there be a chance of never resolving the 
discrepancy with this approach?

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
 Fix For: 1.0

   Original Estimate: 8h
  Remaining Estimate: 8h


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-05-13 Thread Stu Hood (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033159#comment-13033159
]

Stu Hood commented on CASSANDRA-2034:
-

bq. Better would be to add a hook to messagingservice callback expiration, and
fire hint recording from there
bq. So I think what we want to do, with this option on, is to attempt the hint
write but if we can't do it in a reasonable time...
+1. An expiration handler on the messaging service queue, and throttled local
hint writing should work very well.

Make Read Repair unnecessary when Hinted Handoff is enabled
---

Original Estimate: 8h
Remaining Estimate: 8h


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-05-13 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033202#comment-13033202
]

Jonathan Ellis commented on CASSANDRA-2034:
---

bq. Don't hints timeout?

No (but cleanup can purge hinted rows, so don't do that unless all hints have
been replayed).

bq. would there be a chance of never resolving the discrepancy with this
approach?

Definitely, in the case where a node went down, so I wrote some hints, but then
the hinted node lost its HDD too. In that case you'd need to run AE repair.

So the idea is not to make AE repair (completely) obsolete, only RR.

Make Read Repair unnecessary when Hinted Handoff is enabled
---

Original Estimate: 8h
Remaining Estimate: 8h


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-05-13 Thread Stu Hood (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13033273#comment-13033273
 ] 

Stu Hood commented on CASSANDRA-2034:
-

IMO, disabling RR entirely is never a good idea unless we are going to 
_guarantee_ hint delivery. But I agree that this ticket is a good idea because 
increasing the probability of hint delivery is healthy.

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
 Fix For: 1.0

   Original Estimate: 8h
  Remaining Estimate: 8h


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-05-12 Thread Jonathan Ellis (JIRA)

[
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13032696#comment-13032696
]

Jonathan Ellis commented on CASSANDRA-2034:
---

bq. there is the potential to take us back to the Bad Old Days when HH could
cause cascading failure

To elaborate, the scenario here is, we did a write that succeeded on some
nodes, but not others. So we need to write a local hint to replay to the
down-or-slow nodes later. But those nodes being down-or-slow means load has
increased on the rest of the cluster, and writing the extra hint will increase
that further, possibly enough that other nodes will see this coordinator as
down-or-slow, too, and so on.

So I think what we want to do, with this option on, is to attempt the hint
write but if we can't do it in a reasonable time, throw back a
TimedOutException which is already our signal that your cluster may be
overloaded, you need to back off.

Specifically, we could add a separate executor here, with a blocking, capped
queue. When we go to do a hint-after-failure we enqueue the write but if it is
rejected because queue is full we throw the TOE. Otherwise, we wait for the
write and then return success to the client.

The tricky part is the queue needs to be large enough to handle load spikes but
small enough that wait-for-success-post-enqueue is negligible compared to
RpcTimeout. If we had different timeouts for writes than reads (which we don't
-- CASSANDRA-959) then it might be nice to use say 80% of the timeout for the
normal write, and reserve 20% for the hint phase.
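
A minimal sketch of the capped-queue executor described above, using only `java.util.concurrent`. The names `BoundedHintWriter` and `HintTimedOutException` are hypothetical; the latter stands in for TimedOutException. The bounded wait on the write itself is an added assumption, not something the comment specifies.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical hint writer with a capped queue; not the actual patch.
final class BoundedHintWriter {
    /** Stand-in for the TimedOutException returned to the client. */
    static final class HintTimedOutException extends Exception {
        HintTimedOutException(String message) { super(message); }
    }

    private final ThreadPoolExecutor executor;

    BoundedHintWriter(int threads, int queueCapacity) {
        // AbortPolicy throws RejectedExecutionException once the queue is full.
        executor = new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity),
                new ThreadPoolExecutor.AbortPolicy());
    }

    /** Enqueues the hint write and waits for it; signals overload instead of queueing forever. */
    void writeHint(Runnable hintWrite, long timeoutMillis)
            throws HintTimedOutException, InterruptedException {
        Future<?> pending;
        try {
            pending = executor.submit(hintWrite);
        } catch (RejectedExecutionException e) {
            throw new HintTimedOutException("hint queue is full");
        }
        try {
            pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            pending.cancel(true);
            throw new HintTimedOutException("hint write did not finish in time");
        } catch (ExecutionException e) {
            throw new HintTimedOutException("hint write failed: " + e.getCause());
        }
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

Sizing `queueCapacity` is the hard part noted above: too small and load spikes turn into client errors, too large and the wait after enqueue stops being negligible compared to RpcTimeout.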

Make Read Repair unnecessary when Hinted Handoff is enabled
---

Original Estimate: 8h
Remaining Estimate: 8h


[jira] [Commented] (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-04-29 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13027151#comment-13027151
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

bq. add a scheduled task

This is the wrong approach, as we found out when we tried something similar for 
read repair, which we fixed in CASSANDRA-2069.

Better would be to add a hook to messagingservice callback expiration, and fire 
hint recording from there if MS expires the callback before all acks are 
received.  (We could refactor the dynamic snitch latency update into a similar 
hook for reads.)
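
To make the expiration-hook idea concrete, here is a small sketch of a callback registry that fires a hook for every callback that expires without an ack. It is not the MessagingService API: `ExpiringCallbacks`, its methods, and the explicit clock parameters are all assumptions chosen to keep the example deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical registry; the real MessagingService callback map differs.
final class ExpiringCallbacks {
    private static final class Pending {
        final String target;
        final long expiresAtMillis;

        Pending(String target, long expiresAtMillis) {
            this.target = target;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<Long, Pending> pending = new ConcurrentHashMap<Long, Pending>();
    private final BiConsumer<Long, String> onExpire; // e.g. record a hint for the target

    ExpiringCallbacks(BiConsumer<Long, String> onExpire) {
        this.onExpire = onExpire;
    }

    void register(long messageId, String target, long nowMillis, long timeoutMillis) {
        pending.put(messageId, new Pending(target, nowMillis + timeoutMillis));
    }

    /** Returns true if the ack arrived before the callback expired. */
    boolean ack(long messageId) {
        return pending.remove(messageId) != null;
    }

    /** Expires overdue callbacks and fires the hook once for each; returns how many fired. */
    int expire(long nowMillis) {
        int expired = 0;
        for (Map.Entry<Long, Pending> entry : pending.entrySet()) {
            Pending p = entry.getValue();
            // remove(key, value) guards against racing with a late ack.
            if (p.expiresAtMillis <= nowMillis && pending.remove(entry.getKey(), p)) {
                onExpire.accept(entry.getKey(), p.target);
                expired++;
            }
        }
        return expired;
    }
}
```

Because the hook runs only when a callback actually expires, no per-write scheduled task is needed.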

bq. This would need a separate executor for local writes that doesn't drop 
writes when it's behind

I'm more worried about this; there is the potential to take us back to the Bad 
Old Days when HH could cause cascading failure. (Of course the right answer is, 
Don't run your cluster so close to the edge of capacity, but we still want to 
degrade gracefully when this is ignored.)

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
 Fix For: 1.0

   Original Estimate: 8h
  Remaining Estimate: 8h


[jira] Commented: (CASSANDRA-2034) Make Read Repair unnecessary when Hinted Handoff is enabled

2011-01-28 Thread Jonathan Ellis (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12988173#action_12988173
 ] 

Jonathan Ellis commented on CASSANDRA-2034:
---

This would need a separate executor for local writes that doesn't drop writes 
when it's behind (and blocks when it's full) to avoid problems in overcapacity 
situations.
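
One common way to get an executor that never drops work and blocks the submitter when full is a rejection handler that puts the task back on the queue. This is a sketch under that assumption; `BlockingSubmit` is a hypothetical name and this is not necessarily how Cassandra's own executors do it.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.RejectedExecutionHandler;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory for an executor that applies backpressure instead of dropping.
final class BlockingSubmit {
    static ThreadPoolExecutor newBlockingExecutor(int threads, int queueCapacity) {
        RejectedExecutionHandler blockUntilSpace = (task, executor) -> {
            if (executor.isShutdown()) {
                throw new RejectedExecutionException("executor is shut down");
            }
            try {
                // Blocks the submitting thread until a queue slot frees up.
                executor.getQueue().put(task);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RejectedExecutionException("interrupted while waiting for queue space", e);
            }
        };
        return new ThreadPoolExecutor(threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity), blockUntilSpace);
    }
}
```

The effect is that an overloaded coordinator slows its own clients down rather than silently losing local writes.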

I'm about halfway convinced that while blocking clients for writes to {any node 
in the cluster} is a bad way to control overload situations, blocking clients when 
the coordinator itself is overloaded is ok.

 Make Read Repair unnecessary when Hinted Handoff is enabled
 ---

 Key: CASSANDRA-2034
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2034
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Jonathan Ellis
Assignee: Jonathan Ellis
 Fix For: 0.7.2



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
