[jira] [Commented] (CASSANDRA-13794) Fix short read protection logic for querying more rows

2017-09-20 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173500#comment-16173500
 ] 

Aleksey Yeschenko commented on CASSANDRA-13794:
---

Committed to 3.0 as 
[f93e6e3401c343dec74687d8b079b5697813ab28|https://github.com/apache/cassandra/commit/f93e6e3401c343dec74687d8b079b5697813ab28]
 and merged with 3.11 and trunk.

Circle run for 3.0 [here|https://circleci.com/gh/iamaleksey/cassandra/39] has 
two completely unrelated {{CommitLogSegmentManagerTest}} failures, and [dtest 
run|https://builds.apache.org/view/A-D/view/Cassandra/job/Cassandra-devbranch-dtest/314/testReport/]
 here is mostly failures to git clone.

The passing tests include the 3 new dtests added since this JIRA was created. 
My initial plan was to cover it with proper unit tests, too - similar to read 
repair tests we have - but doing it properly has proven to be too time 
consuming. In addition to the tests we have, I did a lot of manual testing 
(which uncovered a couple more issues - not affecting my branch). But more unit 
test coverage will be added later - we've budgeted a significant chunk of time 
on {{DataResolver}} testing alone.

Follow up JIRAs I'll file soonish. Thanks for the review! 

> Fix short read protection logic for querying more rows
> --
>
> Key: CASSANDRA-13794
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13794
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Benedict
>Assignee: Aleksey Yeschenko
>  Labels: Correctness
> Fix For: 3.0.15, 3.11.1
>
>
> Discovered by [~benedict] while reviewing CASSANDRA-13747:
> {quote}
> While reviewing I got a little suspicious of the modified line 
> {{DataResolver}} :479, as it seemed that n and x were the wrong way around... 
> and, reading the comment of intent directly above, and reproducing the 
> calculation, they are indeed.
> This is probably a significant enough bug that it warrants its own ticket for 
> record keeping, though I'm fairly agnostic on that decision.
> I'm a little concerned about our current short read behaviour, as right now 
> it seems we should be requesting exactly one row, for any size of under-read, 
> which could mean extremely poor performance in case of large under-reads.
> I would suggest that the outer unconditional {{Math.max}} is a bad idea, has 
> been (poorly) insulating us from this error, and that we should first be 
> asserting that the calculation yields a value >= 0 before setting to 1.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13794) Fix short read protection logic for querying more rows

2017-09-14 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16166838#comment-16166838
 ] 

Benedict commented on CASSANDRA-13794:
--

This patch is great (excepting a couple of extraneous edits).  Love the 
comments.

+1

I would suggest filing two follow-up tickets to address some short comings with 
this code path, but they're edge-case, and not straight-forward enough, to not 
block this merge.

# For some extreme users, 16 rows could be a huge amount of data.  There should 
probably be some modulation of this lower bound based on known data sizes in 
the table, or the like.
# Conversely, in an overloaded cluster on which users are commonly performing 
fairly large-limit reads (say, 1k+ moderate sized rows) of larger partitions, 
we could find ourselves doubling the amount of work the cluster needs to do; 
during overload we can expect dropped writes, and a single missing row in any 
read would trigger a same-sized read.  This could iteratively compound the 
overload. 
 The best solutions to this problem are probably non-trivial, though a simplish 
approach might be to use exponential growth, bounded on both sides by a minimum 
and maximum (perhaps similarly determined from the known data size 
distribution) - with the query limit being used as the first value if it is 
small enough.

I _would_ say that (2) is now worse than the status-quo, given the per-request 
overheads are probably greater than the per-datum overheads in a typical 
cluster, but CASSANDRA-12872 suggests we haven't been incurring the full 
overheads of SRP, so we cannot claim that.  I still think it is reasonable to 
address this in a follow-up ticket, however.

> Fix short read protection logic for querying more rows
> --
>
> Key: CASSANDRA-13794
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13794
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Benedict
>Assignee: Aleksey Yeschenko
>  Labels: Correctness
> Fix For: 3.0.x, 3.11.x
>
>
> Discovered by [~benedict] while reviewing CASSANDRA-13747:
> {quote}
> While reviewing I got a little suspicious of the modified line 
> {{DataResolver}} :479, as it seemed that n and x were the wrong way around... 
> and, reading the comment of intent directly above, and reproducing the 
> calculation, they are indeed.
> This is probably a significant enough bug that it warrants its own ticket for 
> record keeping, though I'm fairly agnostic on that decision.
> I'm a little concerned about our current short read behaviour, as right now 
> it seems we should be requesting exactly one row, for any size of under-read, 
> which could mean extremely poor performance in case of large under-reads.
> I would suggest that the outer unconditional {{Math.max}} is a bad idea, has 
> been (poorly) insulating us from this error, and that we should first be 
> asserting that the calculation yields a value >= 0 before setting to 1.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13794) Fix short read protection logic for querying more rows

2017-09-04 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16152884#comment-16152884
 ] 

Aleksey Yeschenko commented on CASSANDRA-13794:
---

Work in progress branch 
[here|https://github.com/iamaleksey/cassandra/tree/13794-3.0]. Currently 
missing (new) tests, but I want to get the underlying logic reviewed and 
approved, first. Would add coverage before committing it.

A short summary of the issue: the code right now has two variables swapped, 
which ultimately results in us always fetching 1 extra row per short read 
protection requests, in a blocking manner, making it very inefficient. But upon 
closer look, there are some other inefficiencies here that can and should be 
addressed:

1. One of our stop conditions is {{lastCount == counter.counted()}}. It's 
supposed to abort a short read if our previous attempt to fetch more rows 
yielded 0 extra rows. It's not incorrect, but is only a special case of the 
more general scenario: our previous attempt to fetch more extra rows yielded 
fewer results than we requested for. That would mean there is no more rows to 
fetch at that replica, and allows us to abort earlier and more frequently.

2. Another of our stop conditions is {{!counter.isDoneForPartition()}}. Once 
again, it isn't incorrect, but it can be extended further. Due to the way 
{{isDoneForPartition()}} is defined ({{isDone() || rowInCurrentPartition >= 
perPartitionLimit}}) and because of that counter being counting-only, it is 
possible for us to have fetched enough rows total for other partitions short 
read retries previously to hit the global limit of rows in the counter. That 
would make {{isDone()}} return {{true}} always, and have 
{{isDoneForPartition()}} return false positives even if the partition currently 
processed only has a partition level deletion and/or tombstones. That can 
affect queries that set per partition limit explicitly or when running {{SELECT 
DISTINCT}} queries. Spotted that during CASSANDRA-13747 fixing.

3. Once we've swapped {{x}} and {{n}} in {{moreContents()}} to fix the logic 
error, we'd still have some issues. In degenerate cases where we have some 
nodes missing a fresh partition deletion, for example, the formula would fetch 
*a lot* of rows {{n * (n - 1)}}, with {{n}} growing exponentially with every 
attempt.

Upon closer inspection, the formula doesn't make 100% sense. It claims that we 
miss {{n - x}} rows - where {{n = counter.countedInCurrentPartition()}} and {{x 
= postReconciliationCounter.countedInCurrentPartition()}}, but the number we 
really miss is {{limit - postReconciliationCounter.counted()}} or 
{{perPartitionLimit - postReconciliationCounter.countedInCurrentPartition()}}. 
They might be the same on our first short read protection iteration, but will 
be diverging further and further with each request. In addition to that, it 
seems to assume a uniform distribution of tombstones (in the end result) rows 
in the source partition, which can't be true for most workloads.

I couldn't come up with some ideal heuristic that covers all workloads, so I 
stuck to something safe that respects client paging limits but still attempts 
to minimise the # of requests we make by fetching (in most cases) more rows 
than is minimally necessary. I'm not completely sure about it, but I welcome 
any ideas on how to make it better. Either way, anything we do should be 
significantly more efficient than what we have now.

I've also made some renames, refactorings, and moved a few things around to 
better understand the code myself, and make it clearer for future contributors 
- including future me. The most significant noticeable change is application of 
the per-response counter shift to {{mergeWithShortReadProtection()}} method, 
instead of overloading {{ShortReadRowProtection}} with responsibilities - I 
also like it to be next to the global counter creation, so you can see the 
contrast in arguments.

> Fix short read protection logic for querying more rows
> --
>
> Key: CASSANDRA-13794
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13794
> Project: Cassandra
>  Issue Type: Bug
>  Components: Coordination
>Reporter: Benedict
>Assignee: Aleksey Yeschenko
> Fix For: 3.0.x, 3.11.x
>
>
> Discovered by [~benedict] while reviewing CASSANDRA-13747:
> {quote}
> While reviewing I got a little suspicious of the modified line 
> {{DataResolver}} :479, as it seemed that n and x were the wrong way around... 
> and, reading the comment of intent directly above, and reproducing the 
> calculation, they are indeed.
> This is probably a significant enough bug that it warrants its own ticket for 
> record keeping, though I'm fairly agnostic on that decision.
> I'm a little concerned about our current short