[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-07-31 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14649200#comment-14649200
 ] 

Jonathan Ellis commented on CASSANDRA-5791:
---

Created CASSANDRA-9947 to follow up.

 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: sankalp kohli
Assignee: Jeff Jirsa
Priority: Minor
 Fix For: 2.2.0 beta 1

 Attachments: cassandra-5791-20150319.diff, 
 cassandra-5791-patch-3.diff, cassandra-5791.patch-2


 Currently there is no nodetool command to validate all sstables on disk. The 
 only way to do this is to run a repair and see if it succeeds, but we cannot 
 repair the system keyspace. 
 We can also run upgradesstables, but that rewrites all the sstables. 
 This command should check the hash of all sstables and report whether all 
 data is readable or not. It should NOT care about consistency. 
 Compressed sstables do not have a hash, so it is not clear how this will work there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-07-09 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14620541#comment-14620541
 ] 

Jonathan Ellis commented on CASSANDRA-5791:
---

bq. without the marking-unrepaired part, incremental repair does not handle the 
bitrot case formerly handled by non-incremental repair

The problem I'm bringing up is that even *with* marking unrepaired, incremental 
repair does not handle bitrot.  What we'd need to do is mark unrepaired the 
exact sstables that contain the same data as the bitrotted one, on the other 
replicas, which is impossible.

Incremental repair syncs up all the replicas, but it doesn't repair the case 
where we lose data that we once had; that's the tradeoff.  So for 
bitrot, like other data loss, you need full repair.  (But, we can do better 
here than a typical human would, by only doing a full repair on the range 
covered by the corrupt sstable.)



[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-07-09 Thread Robert Coli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619966#comment-14619966
 ] 

Robert Coli commented on CASSANDRA-5791:


{quote}
[...] so we mark sstables that fail verification as unrepaired? Because that's 
not going to help much [...]
IMO what we should do is:
# scrub, because it's quite likely we'll fail reading from the sstable 
otherwise and
# full repair across the data range covered by the sstable
{quote}
This is the wrinkle from the (otherwise duplicate of this ticket) 
CASSANDRA-8703 :
{quote}
From my understanding, if bitrot is detected (via eg the CRC on the read path) 
then all SSTables containing the corrupted range need to be marked unrepaired 
on all replicas. Per marcuse@IRC, the naive/simplest response would be to just 
trigger a full repair in this case.
{quote}
As an operator, my personal interest in 5791/8703 is that without the 
marking-unrepaired part, incremental repair does not handle the bitrot case 
formerly handled by non-incremental repair. 

tl;dr - I am +1 to the above approach!



[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-07-08 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619190#comment-14619190
 ] 

Jonathan Ellis commented on CASSANDRA-5791:
---

... so we mark sstables that fail verification as unrepaired?  Because that's 
not going to help much: it means the local node will use that sstable in the 
next repair, but other nodes will not.  So all we'll end up doing is streaming 
whatever data we can read from it, to the other replicas.

IMO what we should do is:

# scrub, because it's quite likely we'll fail reading from the sstable 
otherwise and
# full repair across the data range covered by the sstable





[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-07-08 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14619194#comment-14619194
 ] 

Jonathan Ellis commented on CASSANDRA-5791:
---

More minor point:

I'm not sure that keeping extended verify code around is worth it.  Since the 
point is to work around not having a checksum, we could just scrub instead.  
This is slightly more heavyweight but it would be a one-time cost (scrub would 
build a new checksum) and we wouldn't have to worry about keeping two versions 
of almost-the-same-code in sync.



[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-04-01 Thread Dave Brosius (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391891#comment-14391891
 ] 

Dave Brosius commented on CASSANDRA-5791:
-

ok, thanks.

i ninja'ed a minor change to make it more obvious this was intentional, so as 
not to confuse the reader

-SSTableIdentityIterator atoms = new SSTableIdentityIterator(sstable, dataFile, key, true);
+//mimic the scrub read path
+new SSTableIdentityIterator(sstable, dataFile, key, true);


 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: sankalp kohli
Assignee: Jeff Jirsa
Priority: Minor
 Fix For: 3.0

 Attachments: cassandra-5791-20150319.diff, 
 cassandra-5791-patch-3.diff, cassandra-5791.patch-2




[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-04-01 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391712#comment-14391712
 ] 

Jeff Jirsa commented on CASSANDRA-5791:
---

Sylvain asked the same thing in IRC this morning - the intent was to mimic the 
scrub read path, which would allow the SSTableIdentityIterator to discover 
compression checksum exceptions while decompressing, though - as Sylvain 
pointed out to me - it's probably insufficient for large partitions.  I'll need 
to investigate to determine whether or not it will actually throw on blocks beyond 
the start of the partition. 





[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-04-01 Thread Dave Brosius (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391691#comment-14391691
 ] 

Dave Brosius commented on CASSANDRA-5791:
-

In Verifier, around line 187, in the method
public void verify(boolean extended) throws IOException

there is

SSTableIdentityIterator atoms = new SSTableIdentityIterator(sstable, dataFile, key, true);

Is that doing something here? A side effect perhaps, or is it just left over 
and should be removed?





[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-03-31 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14388344#comment-14388344
 ] 

Branimir Lambov commented on CASSANDRA-5791:


+1

I agree that there is no need to test DataIntegrityMetadata separately.



[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-03-19 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370665#comment-14370665
 ] 

Jeff Jirsa commented on CASSANDRA-5791:
---

So the same bug corrected in CASSANDRA-8778 was re-introduced by CASSANDRA-8709, 
as it was developed in parallel and was likely merged/reviewed without the 
benefit of knowing about #8778.

I've done the following:

1) Rebased to trunk as of 20150319
2) Removed o.a.c.io.DataIntegrityMetadata#append
3) Corrected o.a.c.io.DataIntegrityMetadata#appendDirect
4) Brought over [~benedict]'s PureJavaCRC32 fix from above (which was 
correct - 7bef6f93aea3a6897b53e909688f5948c018ccdf) 

Commit: 
https://github.com/jeffjirsa/cassandra/commit/79642ea4f56a33f249e807abdd562f89d20f6c36
Diff at 
https://github.com/apache/cassandra/compare/trunk...jeffjirsa:cassandra-5791.diff
(I'll also attach it here as cassandra-5791-20150319.diff)

Passing: 
{noformat}
[junit] -  ---
[junit] Testsuite: org.apache.cassandra.db.VerifyTest
[junit] Tests run: 10, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 2.803 sec
[junit] 
{noformat}

[~jbellis] and [~benedict] - if you want unit tests for DataIntegrityMetadata, 
Jira it, assign me, and I'll write them. I'd have done it tonight but I can't 
convince myself that they're not redundant with the (included) verifier unit 
tests which will test the checksums anyway. 




[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-03-19 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14369750#comment-14369750
 ] 

Jonathan Ellis commented on CASSANDRA-5791:
---

reverted in b25adc765769869d16410f1ca156227745d9b17b until the tests can be 
fixed

 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: sankalp kohli
Assignee: Jeff Jirsa
Priority: Minor
 Fix For: 3.0

 Attachments: cassandra-5791-patch-3.diff, cassandra-5791.patch-2




[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-03-19 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14370077#comment-14370077
 ] 

Jeff Jirsa commented on CASSANDRA-5791:
---

The tests are failing because checksums are incorrect for compressed sstables 
again. 

{noformat}
# cat /Users/jeff/.ccm/snapshot/node1/data/test2/metrics-aded07e0ce7711e4897c85b755fc16c4/la-1-big-Digest.adler32
822598308
# java AdlerCheckSum /Users/jeff/.ccm/snapshot/node1/data/test2/metrics-aded07e0ce7711e4897c85b755fc16c4/la-1-big-Data.db
864477438
{noformat}

The checksums should have been corrected by CASSANDRA-8778 so I'll figure out 
where the regression happened tonight after business hours PST.
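
For readers who want to repeat this comparison: the AdlerCheckSum helper itself is not shown in the thread, so the following is only a minimal sketch of what such a check might look like. The class name AdlerDigestCheck and its methods are hypothetical, and it assumes the Digest.adler32 component stores the checksum of the whole data file as a decimal string, as the cat output above suggests.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.Adler32;
import java.util.zip.CheckedInputStream;

// Hypothetical stand-in for the AdlerCheckSum helper mentioned above; not the original tool.
final class AdlerDigestCheck
{
    /** Streams the whole file through Adler32 and returns the checksum value. */
    static long adler32Of(Path file) throws IOException
    {
        try (InputStream in = Files.newInputStream(file);
             CheckedInputStream checked = new CheckedInputStream(in, new Adler32()))
        {
            byte[] buffer = new byte[64 * 1024];
            while (checked.read(buffer) != -1)
            {
                // Reading is enough: CheckedInputStream updates the checksum as bytes pass through.
            }
            return checked.getChecksum().getValue();
        }
    }

    /** Assumption: the digest file holds the checksum as a decimal string. */
    static boolean matchesDigest(Path dataFile, Path digestFile) throws IOException
    {
        String recorded = new String(Files.readAllBytes(digestFile), StandardCharsets.UTF_8).trim();
        return Long.parseLong(recorded) == adler32Of(dataFile);
    }

    public static void main(String[] args) throws IOException
    {
        if (args.length != 2)
        {
            System.err.println("usage: java AdlerDigestCheck <data file> <digest file>");
            return;
        }
        boolean ok = matchesDigest(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(ok ? "digest matches" : "digest MISMATCH");
        System.exit(ok ? 0 : 1);
    }
}
```

Streaming through CheckedInputStream keeps memory use flat regardless of the data file size, which matters for large sstables.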





[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-03-18 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14366959#comment-14366959
 ] 

Benedict commented on CASSANDRA-5791:
-

Looks like this referenced PureJavaCRC32, which is no longer found on trunk. 
Could you have a look to confirm the patch otherwise applied cleanly?



[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-03-17 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14364963#comment-14364963
 ] 

Branimir Lambov commented on CASSANDRA-5791:


+1

[~benedict], I think the patch is ready. Could you commit it?

 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: sankalp kohli
Assignee: Jeff Jirsa
Priority: Minor
 Attachments: cassandra-5791-patch-3.diff, cassandra-5791.patch-2




[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-03-10 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14355775#comment-14355775
 ] 

Jeff Jirsa commented on CASSANDRA-5791:
---

New Diff at 
https://github.com/apache/cassandra/compare/trunk...jeffjirsa:cassandra-5791.diff




[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-03-09 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353049#comment-14353049
 ] 

Branimir Lambov commented on CASSANDRA-5791:


+1, with two nits:
[DataIntegrityMetadata 115,125|https://github.com/jeffjirsa/cassandra/commit/89c1153def3f0ef0804d45883d12b09e04bb872d#diff-be889b1991c498fde94c039b5e327269R125]:
This looks risky. As {{Verifier}} will not try to build a 
{{FileDigestValidator}} when the digest is not present, should we still have 
this special case?
[StandaloneVerifier 126|https://github.com/jeffjirsa/cassandra/commit/89c1153def3f0ef0804d45883d12b09e04bb872d#diff-85d9fb1ffe8c937029e4ec870f662f6fR126]:
You may want to exit with code 1 if {{hasFailed}} is true.
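
The exit-code nit could be sketched roughly as below. This is an illustration only, not the StandaloneVerifier code: the SSTableCheck interface and the runAll method are hypothetical names standing in for whatever verifies a single sstable.

```java
import java.util.Arrays;
import java.util.List;

final class VerifyExitCodeSketch
{
    /** Hypothetical stand-in for verifying one sstable; throws when verification fails. */
    interface SSTableCheck
    {
        void verify() throws Exception;
    }

    /** Runs every check and returns the exit status: 0 if all passed, 1 otherwise. */
    static int runAll(List<SSTableCheck> checks)
    {
        boolean hasFailed = false;
        for (SSTableCheck check : checks)
        {
            try
            {
                check.verify();
            }
            catch (Exception e)
            {
                System.err.println("Verification failed: " + e.getMessage());
                hasFailed = true; // remember the failure but keep verifying the rest
            }
        }
        return hasFailed ? 1 : 0;
    }

    public static void main(String[] args)
    {
        SSTableCheck ok = () -> {};
        SSTableCheck corrupt = () -> { throw new Exception("digest mismatch"); };
        int status = runAll(Arrays.asList(ok, corrupt));
        // A real command-line tool would call System.exit(status) here.
        System.out.println("exit status: " + status);
    }
}
```

Returning the status from a method, rather than calling System.exit deep inside the loop, keeps the behaviour easy to unit test.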



[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-03-07 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14351923#comment-14351923
 ] 

Jeff Jirsa commented on CASSANDRA-5791:
---

Since [~benedict] pulled in CASSANDRA-8778 (thanks Benedict!), I've rebased to 
exclude that patch from this change set, added unit tests, and collapsed all 
requested changes into a single commit for easy merging. 

Find it attached, or online at: 
https://github.com/jeffjirsa/cassandra/compare/cassandra-5791.diff or 
https://github.com/jeffjirsa/cassandra/commit/89c1153def3f0ef0804d45883d12b09e04bb872d







[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-02-26 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14338652#comment-14338652
 ] 

Branimir Lambov commented on CASSANDRA-5791:


Thanks for the updates. Please extract the mutateRepairedAt/throw sequence to a 
private method to avoid the repetition.

The verifier also needs a test, preferably one that writes an SSTable and 
checks both that it validates correctly (so that anyone changing SSTable format 
has to update this as well) and that changing a byte in it/its index causes a 
validation failure. 
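
The shape of the requested test can be sketched without any Cassandra classes. The example below is an assumption-laden stand-in: it uses a plain temporary file instead of a real SSTable and a whole-file CRC32 instead of the Verifier, and the names CorruptionDetectionSketch, crc32Of and corruptByte are hypothetical. It only shows the two assertions being asked for: intact data validates, and a changed byte does not.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.CRC32;

final class CorruptionDetectionSketch
{
    /** Computes a CRC32 over the entire file contents. */
    static long crc32Of(Path file) throws IOException
    {
        CRC32 crc = new CRC32();
        crc.update(Files.readAllBytes(file));
        return crc.getValue();
    }

    /** Flips every bit of the byte at the given offset, simulating on-disk corruption. */
    static void corruptByte(Path file, int offset) throws IOException
    {
        byte[] bytes = Files.readAllBytes(file);
        bytes[offset] ^= (byte) 0xFF;
        Files.write(file, bytes);
    }

    public static void main(String[] args) throws IOException
    {
        Path file = Files.createTempFile("sstable-sketch", ".db");
        try
        {
            Files.write(file, new byte[] { 1, 2, 3, 4, 5, 6, 7, 8 });
            long recorded = crc32Of(file); // checksum taken at write time

            if (crc32Of(file) != recorded)
                throw new AssertionError("intact file should validate");

            corruptByte(file, 3);
            if (crc32Of(file) == recorded)
                throw new AssertionError("corrupted file should fail validation");

            System.out.println("both checks passed");
        }
        finally
        {
            Files.deleteIfExists(file);
        }
    }
}
```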

 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: sankalp kohli
Assignee: Jeff Jirsa
Priority: Minor
 Attachments: cassandra-5791.patch-2




[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-02-19 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327399#comment-14327399
 ] 

Branimir Lambov commented on CASSANDRA-5791:


Looks mostly good, I have a few questions about things you don't currently 
treat as failures and a couple of nits:

[Verifier.java 
104|https://github.com/jeffjirsa/cassandra/commit/7367b2fd88fe025006c783d6e07c8a4890690f0a#diff-fa9585e54785eef42f1245aceda15806R104]:
 Should we not mutateRepaired on missing digest?
[Verifier.java 
130|https://github.com/jeffjirsa/cassandra/commit/7367b2fd88fe025006c783d6e07c8a4890690f0a#diff-fa9585e54785eef42f1245aceda15806R130]:
 If the first position in the index is always supposed to be 0, doesn't a 
different value signify an index corruption? Why assert instead of treating 
this as a validation failure? 
[Verifier.java 
202|https://github.com/jeffjirsa/cassandra/commit/7367b2fd88fe025006c783d6e07c8a4890690f0a#diff-fa9585e54785eef42f1245aceda15806R202]
 and 
[168|https://github.com/jeffjirsa/cassandra/commit/7367b2fd88fe025006c783d6e07c8a4890690f0a#diff-fa9585e54785eef42f1245aceda15806R168]:
 You only warn on invalid index entries, but still use 
{{nextRowPositionFromIndex}} for walking the data-- shouldn't then an invalid 
index be also treated as a validation failure?
[Verifier.java 
196|https://github.com/jeffjirsa/cassandra/commit/7367b2fd88fe025006c783d6e07c8a4890690f0a#diff-fa9585e54785eef42f1245aceda15806R196]:
 Is this a good nor bad row? Shouldn't order violation be a validation error?
[Verifier.java 
227|https://github.com/jeffjirsa/cassandra/commit/7367b2fd88fe025006c783d6e07c8a4890690f0a#diff-fa9585e54785eef42f1245aceda15806R227]:
 Is this code reachable? The only place that sets badRows also throws.
[StandaloneVerifier.java 
128|https://github.com/jeffjirsa/cassandra/commit/7367b2fd88fe025006c783d6e07c8a4890690f0a#diff-85d9fb1ffe8c937029e4ec870f662f6fR128]:
 Should we stop on the first error? If there are more problems with the data 
wouldn't we want to mark all ranges as unrepaired?
[sstableverify.bat 
23|https://github.com/jeffjirsa/cassandra/commit/7367b2fd88fe025006c783d6e07c8a4890690f0a#diff-05cb8af4cf28268cb3ca58e70e47da22R23]:
 I'd remove the possibility to specify {{CASSANDRA_MAIN}} to avoid unexpected 
behaviour for people who might have it set for other purposes. Put the class 
name directly into the command as in {{nodetool.bat}}.
[DataIntegrityMetadata.java 
130|https://github.com/jeffjirsa/cassandra/commit/7367b2fd88fe025006c783d6e07c8a4890690f0a#diff-be889b1991c498fde94c039b5e327269R130]:
 Why not just use {{while (true)}} with {{break}} in the validate loop?
[Component.java 
50|https://github.com/jeffjirsa/cassandra/commit/7367b2fd88fe025006c783d6e07c8a4890690f0a#diff-2292395ba109e6df9e1650745810d30aR50]:
 Update comment too.
[NodeTool.java 
1313|https://github.com/jeffjirsa/cassandra/commit/7367b2fd88fe025006c783d6e07c8a4890690f0a#diff-1c11f86cc8881893cd6d369ffc23a809R1313]:
 Change the error message (flush-verification).
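
To illustrate the point about line 130 (assert versus validation failure), here is a minimal sketch. The class, method, and exception names are hypothetical and are not taken from the patch; the only thing assumed from the discussion is that the first index position is expected to be 0.

```java
final class IndexChecks
{
    /** Hypothetical exception type used to report a failed verification. */
    static final class VerificationFailure extends Exception
    {
        VerificationFailure(String message)
        {
            super(message);
        }
    }

    /**
     * Checks the position of the first index entry. Unlike an assert, this
     * check always runs and gives the caller something to catch and act on.
     */
    static void checkFirstPosition(long firstRowPositionFromIndex) throws VerificationFailure
    {
        if (firstRowPositionFromIndex != 0)
            throw new VerificationFailure("first index position is " + firstRowPositionFromIndex + ", expected 0");
    }
}
```

A caller could catch the exception and handle it the same way as any other verification failure, rather than dying on an AssertionError.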


 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: sankalp kohli
Assignee: Jeff Jirsa
Priority: Minor
 Attachments: cassandra-5791.patch-2


 Currently there is no nodetool command to validate all sstables on disk. The 
 only way to do this is to run a repair and see if it succeeds. But we cannot 
 repair the system keyspace. 
 We can also run upgradesstables, but that rewrites all the sstables. 
 This command should check the hash of all sstables and return whether all 
 data is readable or not. This should NOT care about consistency. 
 Compressed sstables do not have a hash, so it is not clear how it will work there.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-02-19 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328280#comment-14328280
 ] 

Jeff Jirsa commented on CASSANDRA-5791:
---

Thanks for the feedback.

On whether a missing digest indicates corruption: in the case of a missing 
digest, does it make more sense to imply --extended and verify atoms? Doing 
that would at least verify the inline checksums for compressed sstables. 

Most of the remaining nits are 100% valid, and are due to me basing this on the 
scrub path without eliminating all of the obsolete code. I am cleaning up to 
address them. The only nit that seems off: the ability to specify 
CASSANDRA_MAIN in sstableverify.bat is consistent with other similar tools 
(sstablescrub, sstableupgrade, sstableloader, sstablekeys).



 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: sankalp kohli
Assignee: Jeff Jirsa
Priority: Minor
 Attachments: cassandra-5791.patch-2




[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-02-12 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14319695#comment-14319695
 ] 

Jeff Jirsa commented on CASSANDRA-5791:
---

The previously attached cassandra-5791.patch.txt creates invalid digests for 
uncompressed tables (related to #8778). I will modify the patch to separate the 
fix for #8778 from the rest of #5791.



 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: sankalp kohli
Assignee: Jeff Jirsa
Priority: Minor



[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2015-02-06 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14310025#comment-14310025
 ] 

Jeff Jirsa commented on CASSANDRA-5791:
---

Duplicating my comment from 8703 here, since it's a dupe and prone to closure:

I've got a version at 
https://github.com/jeffjirsa/cassandra/commits/cassandra-8703 that follows the 
scrub read path and implements nodetool verify / sstableverify. This works for 
both compressed and uncompressed sstables, but requires walking the entire 
sstable and verifying each on-disk atom. It isn't very fast (though it is 
thorough).
The faster method will be checking against the Digest.sha1 file (which actually 
contains an adler32 hash), and skipping the full iteration. I'll rebase and 
work that in, using the 'walk all atoms' approach above as an optional extended 
verify (-e) or similar, unless someone objects. Also going to rename the DIGEST 
sstable component to Digest.adler32 since it's definitely not sha1 anymore.
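
For anyone following along, the fast path amounts to something like the sketch below. This is not the code from the branch. It assumes the digest component stores the Adler-32 value as decimal text, and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.Adler32;
import java.util.zip.CheckedInputStream;

final class DigestCheck
{
    /** Streams the file once and returns its Adler-32 checksum. */
    static long adler32Of(Path dataFile) throws IOException
    {
        try (InputStream in = Files.newInputStream(dataFile);
             CheckedInputStream checked = new CheckedInputStream(in, new Adler32()))
        {
            byte[] buffer = new byte[64 * 1024];
            while (checked.read(buffer) != -1)
            {
                // reading drives the checksum; the bytes themselves are discarded
            }
            return checked.getChecksum().getValue();
        }
    }

    /** Assumption: the digest file holds the checksum as decimal text. */
    static boolean matchesDigest(Path dataFile, Path digestFile) throws IOException
    {
        String stored = new String(Files.readAllBytes(digestFile), StandardCharsets.UTF_8).trim();
        return Long.parseLong(stored) == adler32Of(dataFile);
    }
}
```

This reads the data file once without deserializing anything, which is why it should be much cheaper than walking every atom. It only says that some byte changed, not which row is affected.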

 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: sankalp kohli
Priority: Minor



[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2014-10-23 Thread John Sumsion (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181858#comment-14181858
 ] 

John Sumsion commented on CASSANDRA-5791:
-

If there is bitrot that causes a checksum failure, I assume that this issue 
would cause the configured disk_failure_policy to take effect. Is that true?

 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: sankalp kohli
Priority: Minor



[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2014-03-13 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13934263#comment-13934263
 ] 

Jonathan Ellis commented on CASSANDRA-5791:
---

CASSANDRA-4165 is committed.

 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: sankalp kohli
Priority: Minor



[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2013-12-06 Thread Radovan Zvoncek (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13841241#comment-13841241
 ] 

Radovan Zvoncek commented on CASSANDRA-5791:


Once CASSANDRA-4165 gets solved, it might be quite neat to reuse the 
upgradesstables code to iterate over sstables. This way we'd only need to add 
verification of the digests. As long as nobody is currently looking into this, 
I'd like to give it a shot.

 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: sankalp kohli
Priority: Minor



[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2013-08-14 Thread sankalp kohli (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13740301#comment-13740301
 ] 

sankalp kohli commented on CASSANDRA-5791:
--

We might also want to check whether each sstable is sorted and there is no 
overlap. Something like an online, read-only scrub.
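
A rough sketch of the ordering part of such a check is below. It assumes the keys can be read back as any Comparable type and that "sorted with no overlap" means strictly ascending; the class and method names are hypothetical.

```java
import java.util.Iterator;

final class OrderCheck
{
    /**
     * Returns the index of the first key that is not strictly greater than its
     * predecessor, or -1 if the whole sequence is strictly ascending.
     */
    static <K extends Comparable<? super K>> int firstOutOfOrder(Iterator<K> keys)
    {
        K previous = null;
        int index = 0;
        while (keys.hasNext())
        {
            K current = keys.next();
            // an equal key counts as a violation too, since it would be a duplicate
            if (previous != null && previous.compareTo(current) >= 0)
                return index;
            previous = current;
            index++;
        }
        return -1;
    }
}
```

Because it only reads keys and never rewrites anything, a check of this shape could run against live sstables.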

 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: sankalp kohli
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 1.2.9




[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2013-08-07 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13732273#comment-13732273
 ] 

Jonathan Ellis commented on CASSANDRA-5791:
---

This may be something we can slip into 1.2.x, but if it looks hairy we'll push 
to 2.0.y.

 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: sankalp kohli
Assignee: Tyler Hobbs
Priority: Minor
 Fix For: 1.2.9




[jira] [Commented] (CASSANDRA-5791) A nodetool command to validate all sstables in a node

2013-07-22 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13715695#comment-13715695
 ] 

Brandon Williams commented on CASSANDRA-5791:
-

I think we should write a checksum for compressed sstables to make this easier, 
since it's trivial compared to extracting info from the CompressionInfo 
component.  Then it's just a matter of iterating the sstables and comparing 
checksums.
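
Roughly, the iterate-and-compare step could look like the sketch below. Everything specific in it is an assumption made for illustration: the "-Digest.adler32" file name, the decimal text format of the stored checksum, the choice of Adler-32, and the class and method names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.Adler32;

final class SSTableSweep
{
    private static final String DATA_SUFFIX = "Data.db";

    /** Returns the data files whose checksum file is missing or does not match. */
    static List<Path> findSuspect(Path dir) throws IOException
    {
        List<Path> suspect = new ArrayList<>();
        try (DirectoryStream<Path> dataFiles = Files.newDirectoryStream(dir, "*-" + DATA_SUFFIX))
        {
            for (Path data : dataFiles)
            {
                String name = data.getFileName().toString();
                String prefix = name.substring(0, name.length() - DATA_SUFFIX.length());
                // hypothetical naming: the checksum lives next to the data file
                Path digest = data.resolveSibling(prefix + "Digest.adler32");
                if (!Files.exists(digest) || !matches(data, digest))
                    suspect.add(data);
            }
        }
        return suspect;
    }

    private static boolean matches(Path data, Path digest) throws IOException
    {
        // reading the whole file keeps the sketch short; real sstables should be streamed
        byte[] bytes = Files.readAllBytes(data);
        Adler32 checksum = new Adler32();
        checksum.update(bytes, 0, bytes.length);
        String stored = new String(Files.readAllBytes(digest), StandardCharsets.UTF_8).trim();
        return stored.equals(Long.toString(checksum.getValue()));
    }
}
```

Treating a missing checksum file as suspect is a design choice in this sketch, not something settled here.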

 A nodetool command to validate all sstables in a node
 -

 Key: CASSANDRA-5791
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5791
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: sankalp kohli
Priority: Minor
