[jira] [Commented] (CASSANDRA-13282) Commitlog replay may fail if last mutation is within 4 bytes of end of segment

2017-03-08 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15902606#comment-15902606
 ] 

Branimir Lambov commented on CASSANDRA-13282:
-

Not adding a guard is fine. I would change the comment a little, though, to 
mention such sections come from 2.1 and earlier.

> Commitlog replay may fail if last mutation is within 4 bytes of end of segment
> --
>
> Key: CASSANDRA-13282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13282
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
> Attachments: whiteboard.png
>
>
> Following CASSANDRA-9749 , stricter correctness checks on commitlog replay 
> can incorrectly detect "corrupt segments" and stop commitlog replay (and 
> potentially stop cassandra, depending on the configured policy). In 
> {{CommitlogReplayer#replaySyncSection}} we try to read a 4 byte int 
> {{serializedSize}}, and if it's 0 (which will happen due to zeroing when the 
> segment was created), we continue on to the next segment. However, it appears 
> that if a mutation is sized such that it ends with 1, 2, or 3 bytes remaining 
> in the segment, we'll pass the {{isEOF}} on the while loop but fail to read 
> the {{serializedSize}} int, and fail. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13282) Commitlog replay may fail if last mutation is within 4 bytes of end of segment

2017-03-08 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901363#comment-15901363
 ] 

Jeff Jirsa commented on CASSANDRA-13282:


[~blambov] Testing did encounter this with 2.1 commitlogs, yes. I'm not sure 
it's worth adding a descriptor guard onto it - it should be safe in both 
situations, though (as you mention) less meaningful in 2.2+. I'm not opposed to 
it, but I'm also not convinced it's necessary. Will defer to your opinion on 
that.

> Commitlog replay may fail if last mutation is within 4 bytes of end of segment
> --
>
> Key: CASSANDRA-13282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13282
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
> Attachments: whiteboard.png
>
>
> Following CASSANDRA-9749 , stricter correctness checks on commitlog replay 
> can incorrectly detect "corrupt segments" and stop commitlog replay (and 
> potentially stop cassandra, depending on the configured policy). In 
> {{CommitlogReplayer#replaySyncSection}} we try to read a 4 byte int 
> {{serializedSize}}, and if it's 0 (which will happen due to zeroing when the 
> segment was created), we continue on to the next segment. However, it appears 
> that if a mutation is sized such that it ends with 1, 2, or 3 bytes remaining 
> in the segment, we'll pass the {{isEOF}} on the while loop but fail to read 
> the {{serializedSize}} int, and fail. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13282) Commitlog replay may fail if last mutation is within 4 bytes of end of segment

2017-03-08 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15901219#comment-15901219
 ] 

Branimir Lambov commented on CASSANDRA-13282:
-

Patch LGTM, but I want to make sure this isn't showing a more serious problem. 
Is an upgrade involved in triggering this issue?

In 2.2+ we write the exact end of written data ({{endOfBuffer}}) on {{sync()}} 
and sync sections are thus always precisely sized, thus this should not be 
happening. We could end up in this situation upgrading from 2.1 and earlier, 
though. Perhaps we should only ignore this if the descriptor version is pre-2.2?

> Commitlog replay may fail if last mutation is within 4 bytes of end of segment
> --
>
> Key: CASSANDRA-13282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13282
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
> Attachments: whiteboard.png
>
>
> Following CASSANDRA-9749 , stricter correctness checks on commitlog replay 
> can incorrectly detect "corrupt segments" and stop commitlog replay (and 
> potentially stop cassandra, depending on the configured policy). In 
> {{CommitlogReplayer#replaySyncSection}} we try to read a 4 byte int 
> {{serializedSize}}, and if it's 0 (which will happen due to zeroing when the 
> segment was created), we continue on to the next segment. However, it appears 
> that if a mutation is sized such that it ends with 1, 2, or 3 bytes remaining 
> in the segment, we'll pass the {{isEOF}} on the while loop but fail to read 
> the {{serializedSize}} int, and fail. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13282) Commitlog replay may fail if last mutation is within 4 bytes of end of segment

2017-03-07 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15899472#comment-15899472
 ] 

Joshua McKenzie commented on CASSANDRA-13282:
-

[~blambov] to review.

> Commitlog replay may fail if last mutation is within 4 bytes of end of segment
> --
>
> Key: CASSANDRA-13282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13282
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
> Attachments: whiteboard.png
>
>
> Following CASSANDRA-9749 , stricter correctness checks on commitlog replay 
> can incorrectly detect "corrupt segments" and stop commitlog replay (and 
> potentially stop cassandra, depending on the configured policy). In 
> {{CommitlogReplayer#replaySyncSection}} we try to read a 4 byte int 
> {{serializedSize}}, and if it's 0 (which will happen due to zeroing when the 
> segment was created), we continue on to the next segment. However, it appears 
> that if a mutation is sized such that it ends with 1, 2, or 3 bytes remaining 
> in the segment, we'll pass the {{isEOF}} on the while loop but fail to read 
> the {{serializedSize}} int, and fail. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13282) Commitlog replay may fail if last mutation is within 4 bytes of end of segment

2017-03-02 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892849#comment-15892849
 ] 

Jeff Jirsa commented on CASSANDRA-13282:


Attaching a drawing for whichever reviewer wants this ticket - drawing created 
while discussing this offline because it's somewhat nuanced apparently. 
Basically when we allocate in {{sync()}}, if we're at the end of a file, we 
return -1, and then the end marker for the segment gets set to the end of the 
file. Therefore within the while loop as we replay an individual sync section, 
we can get to a point where we throw trying to read an int from the unused tail 
of the section. 

> Commitlog replay may fail if last mutation is within 4 bytes of end of segment
> --
>
> Key: CASSANDRA-13282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13282
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
> Attachments: whiteboard.png
>
>
> Following CASSANDRA-9749 , stricter correctness checks on commitlog replay 
> can incorrectly detect "corrupt segments" and stop commitlog replay (and 
> potentially stop cassandra, depending on the configured policy). In 
> {{CommitlogReplayer#replaySyncSection}} we try to read a 4 byte int 
> {{serializedSize}}, and if it's 0 (which will happen due to zeroing when the 
> segment was created), we continue on to the next segment. However, it appears 
> that if a mutation is sized such that it ends with 1, 2, or 3 bytes remaining 
> in the segment, we'll pass the {{isEOF}} on the while loop but fail to read 
> the {{serializedSize}} int, and fail. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (CASSANDRA-13282) Commitlog replay may fail if last mutation is within 4 bytes of end of segment

2017-03-02 Thread Jeff Jirsa (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15892831#comment-15892831
 ] 

Jeff Jirsa commented on CASSANDRA-13282:


Will kick off CI in a bit, not going to do all 8 tests at once because I don't 
want to monopolize cassci, but links should be accurate once tests start. 

|| Branch || Unit Tests || DTests ||
| [2.2|https://github.com/jeffjirsa/cassandra/commits/cassandra-2.2-13282] | 
[testall|http://cassci.datastax.com/job/jeffjirsa-cassandra-2.2-13282-testall/] 
| [dtest|http://cassci.datastax.com/job/jeffjirsa-cassandra-2.2-13282-dtest/] |
| [3.0|https://github.com/jeffjirsa/cassandra/commits/cassandra-3.0-13282] | 
[testall|http://cassci.datastax.com/job/jeffjirsa-cassandra-3.0-13282-testall/] 
| [dtest|http://cassci.datastax.com/job/jeffjirsa-cassandra-3.0-13282-dtest/] |
| [3.11|https://github.com/jeffjirsa/cassandra/commits/cassandra-3.11-13282] | 
[testall|http://cassci.datastax.com/job/jeffjirsa-cassandra-3.11-13282-testall/]
 | [dtest|http://cassci.datastax.com/job/jeffjirsa-cassandra-3.11-13282-dtest/] 
|
| [trunk|https://github.com/jeffjirsa/cassandra/commits/cassandra-13282] | 
[testall|http://cassci.datastax.com/job/jeffjirsa-cassandra-13282-testall/] | 
[dtest|http://cassci.datastax.com/job/jeffjirsa-cassandra-13282-dtest/] |


> Commitlog replay may fail if last mutation is within 4 bytes of end of segment
> --
>
> Key: CASSANDRA-13282
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13282
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Jeff Jirsa
>Assignee: Jeff Jirsa
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> Following CASSANDRA-9749 , stricter correctness checks on commitlog replay 
> can incorrectly detect "corrupt segments" and stop commitlog replay (and 
> potentially stop cassandra, depending on the configured policy). In 
> {{CommitlogReplayer#replaySyncSection}} we try to read a 4 byte int 
> {{serializedSize}}, and if it's 0 (which will happen due to zeroing when the 
> segment was created), we continue on to the next segment. However, it appears 
> that if a mutation is sized such that it ends with 1, 2, or 3 bytes remaining 
> in the segment, we'll pass the {{isEOF}} on the while loop but fail to read 
> the {{serializedSize}} int, and fail. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)