[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-08-01 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726617#comment-13726617
 ] 

Yuki Morishita commented on CASSANDRA-2698:
---

Benedict,

Thanks for the update.

bq. 1) I'm not sure what you mean by not serializing those 

I meant you don't have to send them back to the coordinator.
Changing the serialized format means we have to bump the messaging version 
defined in MessagingService.
2.0 is in feature freeze, so I think it's better to wait until the next minor 
version for the message change.

Also, I looked at the change made to MerkleTree#differenceHelper, and I'm still 
not sure how the row count helps improve the logic.
How is it different from just using the hash value?

bq. 5) One thing we might want to consider changing is the format of the 
EstimatedHistogram ranges in the log messages

Yeah, especially the -1 in (-1, 0] feels weird. How about omitting the lower 
bound from the label and printing something like:

{code}
 ~0: xxx
~10: xxx
~20: xxx
{code}

nit: you should surround the whole output logic in Validator#complete with a 
`logger.isDebugEnabled` check.
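
The nit above could look roughly like the following. This is only a minimal 
sketch: DebugLog is a hypothetical stand-in for the real logger (SLF4J's Logger 
offers isDebugEnabled()), and formatHistogram and the class names are 
illustrative, not code from the patch. It also uses the "~upper bound" label 
style suggested above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a logger; not the SLF4J API itself.
class DebugLog {
    private final boolean debugEnabled;
    final List<String> lines = new ArrayList<>();

    DebugLog(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

    boolean isDebugEnabled() { return debugEnabled; }

    void debug(String message) {
        if (debugEnabled) lines.add(message);
    }
}

class ValidatorLoggingSketch {
    static int formatCalls = 0; // counts how often the formatting work ran

    // Illustrative helper: builds one "~upperBound: count" line per bucket.
    static String formatHistogram(long[] upperBounds, long[] counts) {
        formatCalls++;
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < upperBounds.length; i++)
            sb.append('~').append(upperBounds[i]).append(": ").append(counts[i]).append('\n');
        return sb.toString();
    }

    // Guarding the whole output block skips the formatting work when debug is off.
    static void complete(DebugLog logger, long[] upperBounds, long[] counts) {
        if (logger.isDebugEnabled()) {
            logger.debug(formatHistogram(upperBounds, counts));
        }
    }
}
```

The point of the guard is that the histogram string is never built unless debug 
logging is actually on.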

 Instrument repair to be able to assess it's efficiency (precision)
 --

 Key: CASSANDRA-2698
 URL: https://issues.apache.org/jira/browse/CASSANDRA-2698
 Project: Cassandra
  Issue Type: Improvement
Reporter: Sylvain Lebresne
Assignee: Benedict
Priority: Minor
  Labels: lhf
 Attachments: nodetool_repair_and_cfhistogram.tar.gz, 
 patch_2698_v1.txt, patch.diff, patch-rebased.diff, patch.taketwo.alpha.diff, 
 patch.taketwo.forreview.diff


 Some reports indicate that repair sometimes transfers huge amounts of data. One 
 hypothesis is that the merkle tree precision may deteriorate too much at some 
 data size. To check this hypothesis, it would be reasonable to gather 
 statistics during the merkle tree building on how many rows each merkle tree 
 range accounts for (and the size that this represents). It is probably an 
 interesting statistic to have anyway.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-08-01 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726641#comment-13726641
 ] 

Benedict commented on CASSANDRA-2698:
-

Good points.

As to the changes to differenceHelper(): the row count lets us avoid breaking 
up contiguous ranges of differences that happen to be separated by unpopulated 
leaves. (I realised that using just the hash to determine whether a leaf was 
populated is dangerous, as you cannot distinguish between no rows and a 
non-zero number of empty rows.) In my previous patch those breaks were 
generating a lot of ugly log messages. After sending my patch last night I must 
admit I began to doubt the sense of keeping the changes in; it was probably a 
hangover from wanting to retain what I could from the previous patch. I think 
"kill your babies" is the mantra to apply here: the change doesn't serve any 
purpose at the moment, and if we don't intend to send counts over the wire it 
would be actively dangerous.

I'll strip out those changes, modify the messages and fire over another 
patch.

That said, I'd prefer to emit the lower bound as well so we know the starting 
point; ~100: xxx doesn't tell you if the distribution is 0-100 or 99-100, 
which might be useful information. This is only helpful for the first item, so 
we could emit it only for that, but for neatness I'd probably retain it for 
all; since we're dealing with integers there's an easy fix of just bumping both 
start/end by 1 and swapping the brackets.
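
To illustrate the "bump by 1 and swap the brackets" idea: over integers, 
(lower, upper] contains exactly the same values as [lower + 1, upper + 1). A 
minimal sketch, with a hypothetical class and method names that are not part of 
the patch:

```java
class RangeLabelSketch {
    // EstimatedHistogram-style label: exclusive lower bound, inclusive upper bound.
    static String exclusiveInclusive(long lower, long upper) {
        return "(" + lower + ", " + upper + "]";
    }

    // Over integers, (lower, upper] holds the same values as [lower + 1, upper + 1),
    // so both bounds are bumped by 1 and the brackets are swapped.
    static String inclusiveExclusive(long lower, long upper) {
        return "[" + (lower + 1) + ", " + (upper + 1) + ")";
    }
}
```

With this, the bucket that holds only 0 is labelled [0, 1) rather than (-1, 0].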



[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-08-01 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13726665#comment-13726665
 ] 

Benedict commented on CASSANDRA-2698:
-

Of course the other option is to always emit the range [0..lb] even if it's not 
populated, to demarcate the lower bound (lb) - which is your preference?



[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-07-31 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725604#comment-13725604
 ] 

Benedict commented on CASSANDRA-2698:
-

Hi Yuki,

Had some fun rebasing, but think everything looks good now. A few things to 
note:

1) I'm not sure what you mean by not serializing those - for correctness I 
serialize all of the data in a node. Do you want me to change the serialization 
methods to not send these values? I don't log them at the other end, but I 
would prefer they were sent to ensure no surprises for users of the data, and 
also because some optimisations to difference() rely on knowing the number of 
rows for each sub-tree. It's not a tremendous amount of data after all.

2) I've modified DifferencerTest and created two versions of the 
testDifference() method: one that tests differences on an empty tree, and one 
that tests a tree that has been populated with rows. Previously only the 
former was tested. This is because the changes I made to difference() for my 
previous patch, which I have retained and which ensure contiguous ranges are 
emitted where possible, treat the entire empty tree as one contiguous 
difference range (since the only non-empty sub-range in the tree is different), 
which was breaking the previous test. This test now works with the fully 
populated tree, and the previous test now confirms that the whole tree is 
considered different when it is empty. You may not want to deploy these 
improvements in this patch, but it seems a good idea to me whilst the code is 
being modified, and given that I'd made the change already. Since we're not 
logging the ranges themselves at this time it won't have any direct impact, but 
it will be useful if that ever changes, and might help with future debugging.

3) I've updated the MerkleTreeTest methods to test the serialization and 
difference changes, and introduced a new HistogramBuilderTest.

4) The histogram is built differently from my first patch, and is described in 
HistogramBuilder. Basically, rather than creating neat linear ranges, I 
calculate the mean and create ranges that are multiples of the standard 
deviation either side of the mean, up to min/max (or, in this case, 3 stdevs, 
plus one range to min/max).

5) One thing we might want to consider changing is the format of the 
EstimatedHistogram ranges in the log messages. I've reproduced faithfully the 
boundary conventions of the EstimatedHistogram, but this is not a user-friendly 
convention - it has an exclusive lower bound and inclusive upper bound, as 
opposed to the typical opposite convention. As such we get ranges like (-1, 0] 
to represent the range containing only 0, as opposed to [0, 1).
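
Point 4 can be sketched roughly as follows. This is not the HistogramBuilder 
from the patch, just an illustration under stated assumptions: the class and 
method names are hypothetical, the input is assumed non-empty, the population 
standard deviation is used, and the mean itself is treated as a boundary.

```java
import java.util.TreeSet;

class StdDevBucketsSketch {
    // Returns sorted bucket boundaries: min, mean + k * stdev for k = -maxDevs..maxDevs
    // (kept only where they fall strictly between min and max), and max.
    // Assumes values is non-empty.
    static double[] boundaries(long[] values, int maxDevs) {
        double sum = 0, min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (long v : values) {
            sum += v;
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double mean = sum / values.length;
        double sqDiff = 0;
        for (long v : values)
            sqDiff += (v - mean) * (v - mean);
        double stdev = Math.sqrt(sqDiff / values.length); // population standard deviation

        TreeSet<Double> bounds = new TreeSet<>(); // sorted, duplicates dropped
        bounds.add(min);
        bounds.add(max);
        for (int k = -maxDevs; k <= maxDevs; k++) {
            double b = mean + k * stdev;
            if (b > min && b < max)
                bounds.add(b);
        }
        double[] result = new double[bounds.size()];
        int i = 0;
        for (double b : bounds)
            result[i++] = b;
        return result;
    }
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 (mean 5, stdev 2) and maxDevs = 3, the 
boundaries come out as 2, 3, 5, 7, 9: the outer stdev multiples fall outside 
min/max and are replaced by the min and max themselves.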

Think that's everything. Should respond quickly to queries at the moment, so 
drop me a line if you have any questions.



[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-07-30 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13724263#comment-13724263
 ] 

Benedict commented on CASSANDRA-2698:
-

Thanks Yuki, sounds like my changes are fine assuming they test okay. I have a 
bit of a furlough between my various trips now (trips on which I had rather 
optimistically expected to find time to test this), so I should be able to get 
a patch over in the next couple of days or so.



[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-07-10 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13704655#comment-13704655
 ] 

Yuki Morishita commented on CASSANDRA-2698:
---

Hi Benedict,

Sorry for the late reply.
I think the part that calculates the number of rows and their size per range is 
fine.
One thing to point out is that we don't need to serialize those and return them 
to the initiator; just logging locally as you do is enough for now.

p.s. ActiveRepairService has been broken up into the o.a.c.repair package, so 
be careful when rebasing.



[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-06-20 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689804#comment-13689804
 ] 

Benedict commented on CASSANDRA-2698:
-

Hi Yuki,

{quote}
how many rows each merkle tree range account for (and the size that this 
represent)
{quote}

I take it two histograms, one of row distribution, the other of size (of row) 
distribution, are what you're looking for?

I've attached a new patch which is *not* complete, in that I have not tested it 
and may want to change a few of the final details (such as, possibly, where and 
what is logged with the histogram), but before I iron out those kinks I wanted 
to run the main crux of the changes past you to see if it's what you're looking 
for. Simply put, the merkle tree ranges now retain both a sizeOfRange (i.e. 
size of rows added) and rowsInRange (i.e. number of rows added). MerkleTree now 
exposes two histogramXXX() methods which use these, and which as of now are 
logged in Validator.complete(). As it stands, I serialise both these new values 
over the wire with any merkle tree, to ensure no unexpected behaviour for 
future users of the class, and as such I also retained my TreeDifference 
changes to the merkle tree, which report the size and row count of each side 
of a difference. These latter two changes may be slightly controversial, so I 
want to run them past you, as well as confirm that the basic information I'm 
printing is what you're looking for.
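
As a rough illustration of the per-range bookkeeping described above (not the 
actual MerkleTree code): the field names sizeOfRange and rowsInRange come from 
the description, while the class, addRow() and rowsHistogram() are hypothetical 
names made up for this sketch.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class LeafStatsSketch {
    long sizeOfRange = 0; // total size of the rows added to this range
    long rowsInRange = 0; // number of rows added to this range

    // Called once per row that hashes into this range.
    void addRow(long rowSize) {
        sizeOfRange += rowSize;
        rowsInRange++;
    }

    // Crude histogram: how many ranges hold each distinct row count.
    static Map<Long, Integer> rowsHistogram(List<LeafStatsSketch> leaves) {
        Map<Long, Integer> histogram = new TreeMap<>();
        for (LeafStatsSketch leaf : leaves) {
            Integer seen = histogram.get(leaf.rowsInRange);
            histogram.put(leaf.rowsInRange, seen == null ? 1 : seen + 1);
        }
        return histogram;
    }
}
```

A tree whose ranges hold very different row counts would show up here as a wide 
spread of keys rather than a single dominant bucket.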

Cheers!



[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-06-20 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13689854#comment-13689854
 ] 

Benedict commented on CASSANDRA-2698:
-

Just noticed that patch doesn't include the HistogramBuilder class - after I 
posted I noticed I needed to pull in the latest remote changes, which 
unfortunately use 1.7 syntax, and since I'm on an old eclipse my problems 
window exploded. It's late here so I didn't/don't want to faff around too much, 
but if you have trouble let me know and I'll upload another patch after 
upgrading eclipse. The patch uploaded should be easy to quickly scan with my 
description to let me know if I'm still barking up the wrong tree, or if there 
are any changes you disagree with in principle.



[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-06-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13684430#comment-13684430
 ] 

Benedict commented on CASSANDRA-2698:
-

Hi Yuki,

Sorry for the glacial response.

{quote}
If you do that, you should create your own class with labels and array since 
you're not using default offsets nor other histogram related methods. It 
confused me at first why you are doing addToIndex to EstimatedHistogram.
{quote}

Agreed. I was a little reluctant to introduce a different histogram class, but 
it is a little ugly as it stands.

{quote}
But looking at this from the beginning again, what we want to see is whether we 
have a Merkle tree with keys (or rows) evenly distributed across each hash. You 
can use EstimatedHistogram or your own class to show that. For now, just using 
the logger to log that distribution at the end of Merkle tree creation, with 
the corresponding repair session Id, is fine, instead of sending stats back to 
the coordinator.
{quote}

It sounds like all you want logged is the number of rows per merkle leaf, to 
see if the tree is roughly balanced? In which case I misinterpreted the goal 
entirely, though it makes sense now I understand more how the repair works. Is 
it worth leaving in the streaming of the leaf sizes with the merkle tree so 
that the deltas can be logged in future, should that be desired? 

I'll strip out the logging of the size of the ranges being streamed for now as 
well.





[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-04-11 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13629454#comment-13629454
 ] 

Jonathan Ellis commented on CASSANDRA-2698:
---

Jake has an Eclipse formatter at 
https://github.com/tjake/cassandra-style-eclipse.  Not sure if it's 100% 
accurate but Jake's patches look okay to me so it's probably close. :)



[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-04-09 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626520#comment-13626520
 ] 

Benedict commented on CASSANDRA-2698:
-

Hi Yuki,

Without in some way collecting (or at least sampling) the size of the 
differences, I don't know what bucket sizes to use. Since I need to reinsert 
all the records once I've decided this anyway, I need to retain them all, which 
I chose to do in EstimatedHistogram as they do, in effect, constitute a 
histogram. I also sample the largest records which I figure could be useful for 
debugging purposes (though that was just a guess). I don't see why 1000s of 
items is a major issue?

I agree that logging is suboptimal for this data. Presumably similar data for 
other tasks may be optionally logged in future, and so I would guess this 
should form part of a discussion about metric logging?

{quote}
fix coding style (especially whitespace) to match other code.
{quote}
Do you have an Eclipse formatter profile I could use for your coding 
convention? I did my best to keep it correct manually, but it is difficult to 
spot differences in an unfamiliar convention. Whitespace should be 
comparatively easy though.

{quote}
EstimatedHistogram#testGroupBy is failing.
{quote}
Noted - will fix and resubmit

{quote}
comparator in Arrays#sort in EstimatedHistogram#logSummary has the same 
conditions in both if and else if.
{quote}
Thanks, good spot. I'm surprised Eclipse didn't warn me.






[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-04-08 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626208#comment-13626208
 ] 

Yuki Morishita commented on CASSANDRA-2698:
---

[~benedict]

Hmm, it is not clear to me why you create an EstimatedHistogram of the sizes of 
the differences. I sometimes see more than 1000 differences on large clusters. 
You should just create it with a reasonable bucket count. You don't have to 
keep every size for every range.

Using logging to output statistics is fine at this point, but I think we 
should come up with another way, so that it is easy to see all the related logs 
and statistics for a given repair session. I don't have a specific idea yet, 
though. (Maybe another system cf similar to Tracing?)

nits:

- fix coding style (especially whitespace) to match other code.
- EstimatedHistogram#testGroupBy is failing.
- comparator in Arrays#sort in EstimatedHistogram#logSummary has the same 
conditions in both if and else if.



[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-03-28 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13616569#comment-13616569
 ] 

Benedict commented on CASSANDRA-2698:
-

Hi Yuki,

The patch was created some time ago, and there were some minor renames/changes 
to MerkleTree and AntiEntropyService in the meantime. I've pulled the latest 
changes, merged, and regenerated the patch. This is against the main trunk.



[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-03-25 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13612683#comment-13612683
 ] 

Yuki Morishita commented on CASSANDRA-2698:
---

Hi, Benedict,
The patch does not apply to any branch. Can you rebase and re-upload?
When you do, please tell us which branch the patch is for.




[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-03-21 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13609625#comment-13609625
 ] 

Benedict commented on CASSANDRA-2698:
-

Hi,

I've uploaded a patch for this issue (patch.diff - apologies for the 
potentially future-clashing name). Logging is performed in two places, one on 
each node of any comparison:

1) On the requesting node in AntiEntropyService.Difference.run(), after the 
MerkleTree difference is calculated and before the StreamingRepairTask is 
created
2) On the source node, on which StreamingRepairTask is run, in 
StreamOut.createPendingFiles()

In both cases we log, at debug level, a sample of the largest ranges followed 
by a histogram of the range size distribution.  The first is achieved by 
inserting each range directly into an EstimatedHistogram, on which we call the 
new logSummary() method; the second by calling the new groupByFrequency() 
method on that same histogram, to yield a histogram based on the frequency of 
sizes present in the original (on which we simply call log()).

In case 1, we construct the MerkleTree to include a size taken from the 
AbstractCompactedRow we compute the hash from, and use this in 
MerkleTree.difference to estimate the size of mismatching ranges. This estimate 
tends to come out around 15% lower than the size reported by StreamOut. One 
design decision of note here: instead of modifying AbstractCompactedRow to 
return a size (which would be invasive and in some cases incur an unnecessary 
penalty) we use a custom implementation of MessageDigest that counts the number 
of bytes provided to it.
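
A minimal sketch of such a byte-counting digest, for illustration only. The 
class name CountingDigest, the bytesSeen() accessor and the choice of SHA-256 
are assumptions made for this example; this is not the code in patch.diff.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical class (not the one in patch.diff): forwards every byte to a
// real digest and keeps a running count of how many bytes it has been given.
final class CountingDigest extends MessageDigest {
    private final MessageDigest delegate;
    private long count;

    CountingDigest(MessageDigest delegate) {
        super(delegate.getAlgorithm());
        this.delegate = delegate;
    }

    // Convenience factory; SHA-256 is an assumption made for this sketch.
    static CountingDigest sha256() {
        try {
            return new CountingDigest(MessageDigest.getInstance("SHA-256"));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    @Override
    protected void engineUpdate(byte input) {
        delegate.update(input);
        count++;
    }

    @Override
    protected void engineUpdate(byte[] input, int offset, int len) {
        delegate.update(input, offset, len);
        count += len;
    }

    @Override
    protected byte[] engineDigest() {
        // The count is deliberately kept so it can be read after hashing.
        return delegate.digest();
    }

    @Override
    protected void engineReset() {
        delegate.reset();
        count = 0;
    }

    // Number of bytes fed to the digest since construction or the last reset().
    long bytesSeen() {
        return count;
    }

    public static void main(String[] args) {
        CountingDigest digest = CountingDigest.sha256();
        digest.update("row-key".getBytes(StandardCharsets.UTF_8));
        digest.update("column-data".getBytes(StandardCharsets.UTF_8));
        byte[] hash = digest.digest();
        System.out.println(hash.length + "-byte hash over " + digest.bytesSeen() + " bytes");
    }
}
```

Because the wrapper is itself a MessageDigest, code that only knows how to 
update a digest needs no change; the caller reads the count once hashing is 
done.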

Case 2 is much simpler, as we already have the ranges and their sizes available 
to us.

There are some other changes, particularly in MerkleTree, with some 
refactoring/renames/new subclasses as part of updating MerkleTree.difference(). 
In particular, TreeDifference is returned instead of TreeRange (to accommodate 
the extra size information), and it is used generally in place of it within 
this method tree where applicable; hash() and hashHelper() have also been 
renamed to find() and findHelper(), with a new hash() implementation depending 
on find(). I'm sure there are other minutiae, but hopefully nothing too opaque. 
If you need any clarification, feel free to ask.



[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2013-02-11 Thread Ivan Sobolev (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575928#comment-13575928
 ] 

Ivan Sobolev commented on CASSANDRA-2698:
-

Hi guys, it appears this would be useful for me too. I'm going to give 
it a try, following the plan summarized by [~jbellis].
If you feel there is something I should know, please shout :)



[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2012-07-06 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13408329#comment-13408329
 ] 

Jonathan Ellis commented on CASSANDRA-2698:
---

Following the comments at the top, you want two things here:

# A histogram of TreeRange row counts
# For each pair of merkle trees, the number of ranges that differ and the 
corresponding streamed size of the data 

1. is easy: add an EstimatedHistogram to the MerkleTree class, and when the 
ranges are finished computing, you'd iterate over each and add its row count to 
the histogram
2. is a bit more involved: you want to extend the logging done by Differencer 
to include the given information, which is going to involve poking into the 
guts of (probably) MerkleTree.difference and performStreamingRepair.
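
A rough sketch of step 1, for illustration only. RowCountHistogram is a 
hypothetical stand-in that buckets by powers of two; it does not use or 
reproduce the API of Cassandra's EstimatedHistogram, and the row counts in 
main() are made up.

```java
// Hypothetical stand-in for a histogram of per-range row counts.
// Bucket 0 holds counts of 0 or 1; bucket i holds counts in (2^(i-1), 2^i].
final class RowCountHistogram {
    private static final int LAST_BUCKET = 62;
    private final long[] buckets = new long[LAST_BUCKET + 1];

    // Index of the bucket a row count falls into; the last bucket also
    // absorbs anything larger than 2^62.
    static int bucketFor(long rowCount) {
        if (rowCount <= 1)
            return 0;
        int index = 64 - Long.numberOfLeadingZeros(rowCount - 1);
        return Math.min(index, LAST_BUCKET);
    }

    // Record the row count of one finished tree range.
    void add(long rowCount) {
        buckets[bucketFor(rowCount)]++;
    }

    // One line per non-empty bucket, labelled with the bucket's upper bound.
    String summary() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < buckets.length; i++) {
            if (buckets[i] > 0)
                sb.append("<=").append(1L << i).append(": ").append(buckets[i]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Made-up row counts, one per tree range.
        long[] rowCountsPerRange = {0, 1, 3, 4, 100, 100, 5000};
        RowCountHistogram histogram = new RowCountHistogram();
        for (long rows : rowCountsPerRange)
            histogram.add(rows);
        System.out.print(histogram.summary());
    }
}
```

A few ranges landing in much higher buckets than the rest would be the kind 
of signal the ticket is after: those ranges are hashed too coarsely for the 
data they cover.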

I agree though that repair is an intimidating part of the code base.  If you 
want to start with something simpler, that's fine too.





[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2012-07-04 Thread Jason Wee (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13406548#comment-13406548
 ] 

Jason Wee commented on CASSANDRA-2698:
--

Hello Jonathan, I read through this ticket and am trying to understand the 
context. Unfortunately, at this stage, I have no clue where to start. However, 
attached is the output of nodetool repair and nodetool cfhistograms, which may 
help in the context of this ticket. Please let me know if the attachment helps, 
or whether I should start with a trivial bug first so that I have more time to 
understand the code as well as the development culture and procedures here. 





[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2012-07-03 Thread Jason Wee (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405822#comment-13405822
 ] 

Jason Wee commented on CASSANDRA-2698:
--

Hello, I've been using Cassandra and developing a client application that 
interfaces (CRUD) with it. It has been great software and I would like to 
contribute back to Cassandra. I've read 
http://wiki.apache.org/cassandra/HowToContribute 
which links to 
https://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&jqlQuery=project+%3D+12310865+AND+labels+%3D+lhf+AND+status+!%3D+resolved
 . Since this is new to me, I'm not sure if this is the right place to start 
contributing, and I hope you can tell me where and how I can contribute.
Thank you.





[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2012-07-03 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13405990#comment-13405990
 ] 

Jonathan Ellis commented on CASSANDRA-2698:
---

Hi Jason, you've come to the right place.  Fire away.





[jira] [Commented] (CASSANDRA-2698) Instrument repair to be able to assess it's efficiency (precision)

2011-09-01 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-2698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13095348#comment-13095348
 ] 

Sylvain Lebresne commented on CASSANDRA-2698:
-

An EstimatedHistogram would be just fine. That plus, for each pair of merkle 
trees, the number of ranges that differ and the corresponding streamed size of 
the data would be a very good start imho.

I think the only thing we need to figure out for this patch is where it makes 
the most sense to record that data. What I mean here is that the merkle trees 
are computed on each node participating in a repair (and thus that is where the 
EstimatedHistogram can be computed), while the computation of the differences is 
only done on the coordinator. But in an ideal world, it would seem more useful 
to store that information together (for a given repair) because the two are 
related.
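
To make the per-pair statistic concrete, here is a simplified sketch. It 
compares two already-flattened lists of leaf ranges position by position, 
which is not how the recursive MerkleTree.difference works; the class 
LeafDiffSketch, its Leaf type and the rule of taking the larger of the two 
sizes as the estimate are all assumptions made for this example.

```java
import java.util.Arrays;

// Hypothetical sketch: counts differing leaf ranges between two trees and
// sums an estimate of the data behind them.
final class LeafDiffSketch {
    // One leaf range: its hash plus the row count and byte size seen while hashing.
    static final class Leaf {
        final byte[] hash;
        final long rows;
        final long bytes;

        Leaf(byte[] hash, long rows, long bytes) {
            this.hash = hash;
            this.rows = rows;
            this.bytes = bytes;
        }
    }

    // Returns {number of differing ranges, estimated bytes to stream}.
    // Assumes both arrays cover the same ranges in the same order.
    static long[] diff(Leaf[] local, Leaf[] remote) {
        if (local.length != remote.length)
            throw new IllegalArgumentException("trees must have the same number of leaves");
        long ranges = 0;
        long bytes = 0;
        for (int i = 0; i < local.length; i++) {
            if (!Arrays.equals(local[i].hash, remote[i].hash)) {
                ranges++;
                // Assumption: use the larger side as the size estimate.
                bytes += Math.max(local[i].bytes, remote[i].bytes);
            }
        }
        return new long[] {ranges, bytes};
    }

    public static void main(String[] args) {
        // Made-up leaves: the second range differs between the two nodes.
        Leaf[] local = {new Leaf(new byte[] {1}, 10, 1000), new Leaf(new byte[] {2}, 5, 400)};
        Leaf[] remote = {new Leaf(new byte[] {1}, 10, 1000), new Leaf(new byte[] {9}, 7, 650)};
        long[] result = diff(local, remote);
        System.out.println(result[0] + " differing range(s), ~" + result[1] + " bytes");
    }
}
```

Logging this pair of numbers on the coordinator, next to the histograms 
built on each participating node, would put the related figures for a given 
repair in one place.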
