[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-07-16 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629583#comment-14629583
 ] 

Branimir Lambov commented on CASSANDRA-8915:


The rebased and squashed code is 
[here|https://github.com/blambov/cassandra/tree/8195-rebase] with 
[testall|http://cassci.datastax.com/job/blambov-8195-rebase-testall/] and 
[dtest|http://cassci.datastax.com/job/blambov-8195-rebase-dtest/].

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor
  Labels: compaction, performance
 Fix For: 3.x

  Time Spent: 0.2h
  Remaining Estimate: 0h

 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-07-16 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14629604#comment-14629604
 ] 

Benedict commented on CASSANDRA-8915:
-

Thanks. 
[Committed|https://github.com/apache/cassandra/commit/c29001b94df769fe428f414c042daffbb6dd209d].

For future reference, our commit form usually ends with a line like:

patch by branimir; reviewed by benedict for CASSANDRA-8915

It's not a perfect git history, but better to do our best to stick to it when 
we can (it also helps later analysis of commit/review). 

I think we should endeavour to create more descriptive commits alongside, 
though, like you have done here.

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor
  Labels: compaction, performance
 Fix For: 3.x

  Time Spent: 0.2h
  Remaining Estimate: 0h

 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-07-15 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627840#comment-14627840
 ] 

Benedict commented on CASSANDRA-8915:
-

[~blambov]: I realise this isn't marked as critical for 3.0, but it's a nice 
improvement and the only hurdle is a rebase / flatten - otherwise I'm happy and 
ready to commit. Would you mind prepping that when you have a moment?

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor
  Labels: compaction, performance
 Fix For: 3.x

  Time Spent: 0.2h
  Remaining Estimate: 0h

 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-06-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570791#comment-14570791
 ] 

Benedict commented on CASSANDRA-8915:
-

Thanks for taking the time to provide this. I have some comments and 
suggestions for clarifying the proof for future readers. It is always tricky to 
find your bearings in a new proof, and I found myself confused by some subtle 
ambiguities that are hard to spot as the author.


* It would help to state that {{=}} refers to the partial order over key 
equality, not node identity (the first sentence in particular lead to some 
double-takes) 
* stands for and predicate (and use of a function-like identifier) to 
descibe {{σ(N)}} is confusing. This caused a surprising number of 
misunderstandings that took a good hour to crystalise in the realisation that 
{{σ(N)}} meant the flag or property that encodes the result of the predicate 
{{ρ(N) = N}}, not a short-hand for that predicate
* It would help to state {{P ≙ ep-heap(M,N)}} to indicate the ep-heap P rooted 
at N that leads to M, to avoid confusion with the similarity of rooted at and 
leading to 
* I also find that switching case to capitals for heaps/structures and 
retaining lower case for nodes/values is helpful for clarity. P and R are in 
the same run of variables as N, but refer to a different category of things. 
i.e. I would have found it clearer to follow the statement {{P ≙ ep-heap(m,n)}}.
* \{\{\}\} can then be used to mark the boundaries of mathematical expressions 
(so lower case chars are still easily parsed as not prose) 

bq. 3. ... If ¬σ(P), set σ(R) := (P = R), otherwise leave σ(R) unchanged.

Perhaps clarify, that σ(P) = σ(R), so already correct. I realise you state 
this later, but here is where the question is raised. The later sentence is 
also very hard to parse, and could be simplified with the earlier short 
explanation.

bq. 3. ...  Use the algorithm recursively to *turn* the two ep-subheaps leading 
to P at its children and the new value C  P into an ep-heap leading to P

Consider *merge*, to clarify that this step restores the binary-tree property.

I haven't yet corroborated the code against the  proof, but will do so shortly. 
I would consider supplying the initial proof outline with its three cases along 
with the code, so that we can annotate each major branch with a short 
demonstration of invariant maintenance.

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor
  Labels: compaction, performance
 Fix For: 3.x


 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-06-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570983#comment-14570983
 ] 

Benedict commented on CASSANDRA-8915:
-

np - they were just that: suggestions. Hopefully anyone who would be confused 
by the aspects that I was will be helped sufficiently by the combination of 
your updates and my comment.

I've finished vetting the code. I really like this implementation: not only 
does it ace the comparison tests, it is very clear to read. Perhaps reading the 
proof made it clearer, but I found it as - or easier - to read than the 
non-optimised variant. 

I've pushed some small comments and one minor cleanup (merging two 
near-identical paths in the replaceAndSink method). I've also applied my 
earlier updates to MergeIteratorComparisonTest. I ran these against the other 
two versions, and found it to meet or beat on every test (or be close enough 
that it didn't matter), which matches with expectation.  I'm happy to merge 
once we squash and get clean CI runs.

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor
  Labels: compaction, performance
 Fix For: 3.x


 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-06-03 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14570981#comment-14570981
 ] 

Branimir Lambov commented on CASSANDRA-8915:


Updated above with some of your suggestions.

bq. Perhaps clarify, that σ(P) = σ(R), so already correct.

This is not entirely true, {{(σ(P)  P = R) = σ(R)}}. I would find it 
confusing to put that info there, though, I prefer to split actions from proof.

bq. It would help to state {{P ≙ ep-heap(M,N)}} ... and more.

I hesitate to add this; I think the way it is now is enough to convince people 
that it is sound even if it is not very precise (esp wrt bottom).

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor
  Labels: compaction, performance
 Fix For: 3.x


 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-06-02 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14569392#comment-14569392
 ] 

Branimir Lambov commented on CASSANDRA-8915:


I wasn't completely sure of the correctness of the flagging approach and I 
wrote the following proof to make sure I'm not missing something:
{panel}
In the discussion below ρ(N) stands for parent of N and σ(N) stands for the 
'equalParent' predicate, i.e. σ(N) ↔ ρ(N) = N.

We define ep-heap leading to M to be a binary tree of nodes where each node N 
satisfies ρ(N) ≤ N  σ(N) ↔ ρ(N) = N and the root has M as its parent. Full 
ep-heaps are ep-heaps leading to top, where 'top' is assumed to be a special 
value strictly smaller than any other (thus σ(R) is always false for the root 
of a full ep-heap). For simplicity we will take all heaps to end at a special 
node called 'bottom' whose value is assumed to be greater than any other value.

The critical step here is adding a new value C  M to combine two existing 
well-formed ep-heaps P and R leading to the same M into a bigger single 
well-formed ep-heap leading to M. Without loss of generality we can take P ≤ R 
(otherwise we can swap the places of P and R in the discussion below).

We distinguish three cases:
# C  P (necessarily ¬σ(P)): We place C at the top by setting ρ(P), ρ(R) := C, 
ρ(C) := M and leaving σ(P), σ(R) := false (Note that σ(R) cannot be true as 
that contradicts with C  M). σ(C) := false. The ep-heap rooted at C is 
well-formed leading to M.
# C = P (necessarily ¬σ(P)): We place C at the top, setting ρ(P), ρ(R) := C, 
ρ(C) := M, σ(P) := true and σ(R) := (P = R). σ(C) := false. The heap rooted at 
C is well-formed as C = P  σ(P) as well as C = P ≤ R  C = R if and only if 
σ(R).
# C  P (σ(P) can be true or false): Set ρ(R) := P. If ¬σ(P), set σ(R) := (P = 
R), otherwise leave σ(R) unchanged. Use the algorithm recursively to turn the 
two ep-subheaps leading to P at its children and the new value C  P into an 
ep-heap leading to P. Let its root be Q. The necessary conditions for an 
element of an ep-heap hold for Q. They are also true for R as P ≤ R and either 
σ(P) in which case P = M and σ(R) if and only if R = M = P, or ¬σ(P) and σ(R) 
is explicitly set to be true if and only if P = R. We also have ρ(P) = M as 
well as P = M if and only if σ(P), thus the constructed ep-heap rooted at P 
leads to M.

The recursion always ends for finite heaps as we must reach the first case due 
to the ordering of bottom.

For performance the order of comparing elements and identifying the case to use 
can be changed to sequentially checking:
- σ(P): apply pt. 3 for P and return (P must be ≤ R). No update of σ(P)/σ(R) 
necessary.
- σ(R): apply pt. 3 for R and return (R must now be  P). No update of 
σ(P)/σ(R) necessary.
- compare P and R.
- If R  P, perform the next steps with P and R swapped.
- compare C and P.
- if C  P apply pt. 1.
- if C = P apply pt. 2.
- if C  P apply pt. 3.

Once an ep-heap with root R is formed, it trivially follows that for each 
element E in the heap E = R if and only if σ is true for E and the chain of 
parents of E leading to but not including the root. To consume the equal 
elements we can thus walk the sub-tree formed by the children which have σ set.

To advance the elements equal to the root, we can:
- Process the elements in the deepest level of the σ sub-tree in any order. For 
each of them:
  -- Advance the element. Let the new value be C.
  -- The children P, R of the element must have ¬σ(P), ¬σ(R). Overwrite ρ(P) 
and ρ(R) with top*. P and R now form ep-heaps leading to top.
  -- Use the algorithm to combine P, R and C into a new ep-heap leading to top.
  -- Let its root be Q. Because the heap leads to top we have ¬σ(Q).
- Continue with the next deepest level of the hierarchy until the top is 
reached.

\* The algorithm never actually uses the values of the parent function ρ, only 
the fact that σ is set correctly with respect to that parent. This means this 
step does not need to be actually performed.
{panel}

Updated [the 
branch|https://github.com/apache/cassandra/compare/trunk...blambov:8915-mergeiterator]
 to use this solution and cleaned the code a bit.

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor
  Labels: compaction, performance
 Fix For: 3.x


 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} 

[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-05-05 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528493#comment-14528493
 ] 

Branimir Lambov commented on CASSANDRA-8915:


bq. My optimisation... We could instead perform a merge, but even with this 
imperfect implementation the result is typically half as many comparisons as 
NoEqual, and never more.

I think this is better served by other heap types (e.g. Fibonacci), which I was 
planning to explore at a later time. Do you prefer to do this now?

You have convinced me, though, that avoiding that second comparison for 
{{consume}} is something that must be done, so I added a [{{FlagEqual}} 
variation|https://github.com/apache/cassandra/blob/545aa9c66e568b853673ea40da05f13bd3c86a73/src/java/org/apache/cassandra/utils/MergeIteratorFlagEqual.java]
 of the non-collapsing approach that flags equality when it finds it. The flag 
is then sufficient for {{consume}} and skipping over equal elements in the 
heap, providing equal comparison complexity with {{AllEqual}} but without the 
need to remove/readd iterators. The benchmarks show it to be on par with 
{{AllEqual}} in the random cases, and better than {{NoEqual}} in the limited 
overlap scenarios.

Also, I don't see any links in your comment. Could I also look at your 
experiments?

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor
 Fix For: 3.x


 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-05-05 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14528503#comment-14528503
 ] 

Benedict commented on CASSANDRA-8915:
-

My mistake, I thought I linked it. My version with updated tests and 
optimization is 
[here|https://github.com/belliottsmith/cassandra/tree/mergeiterator-comparison]

That sounds like a much simpler solution; if we can get even 50% of the benefit 
with that approach I'm happy. I'll take a look shortly

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor
 Fix For: 3.x


 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-05-01 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523753#comment-14523753
 ] 

Benedict commented on CASSANDRA-8915:
-

Oh, one other minor quibble: would you mind applying the project codestyle to 
your IDE?

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor
 Fix For: 3.x


 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-05-01 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14523745#comment-14523745
 ] 

Benedict commented on CASSANDRA-8915:
-

First off, I agree that the MergeIteratorAllEqual is unfortunately complex, 
though I haven't yet attempted to fully reason about its behaviour. Are we 
certain there's no way to reduce its complexity? If you don't think it can be 
improved complexity-wise from its current position, I'll find some time to chew 
on it myself as well, so we have at least had two distinct attempts to reduce 
that burden.

However from a performance point of view, I have two major quibbles:

* There is a simple optimisation of AllEqual, which I have introduced, and it 
is now universally faster (or as fast)
* I don't _think_ that's the best characterisation of LCS (please feel free to 
prove me wrong, though, as I may have a poor mental model of LCS)

My optimisation as stands is actually pretty rubbish; it could be done much 
better. What I do is construct a new heap from the equal candidates, and then 
push down the top element and bubble up the remainder. All this does really is 
compress the equal items, push down the one most likely to stay at the head, 
and expand the remainder from the smallest item upwards so minimizing the 
number of swaps. We could instead perform a merge, but even with this imperfect 
implementation the result is typically half as many comparisons as NoEqual, and 
never more.

As to LCS: to my knowledge, it only ever compacts two levels together, but 
within each level there is no compaction/comparison, they are simply 
concatenated together. Between the levels there's no guarantee of overlap, and 
the values will typically be randomly distributed since we default to hashed 
partitioning. Given each level is 10 times larger than the previous, we can 
also expect that the number of _runs_ of equality are very low. I've introduced 
simulation of varying numbers of levels being compacted together, along with 
varying numbers of L0 sstables in the mix, and with varying levels of overlap 
between the levels (100%, 50%, 0%; here I mean percentage of values in a level 
which definitely occur in a lower level). These all come out as either the same 
or faster under both versions of AllEqual.

Of course the statement likely stands for a variety of workloads with many CQL 
row overwrites, so catering for the case of many runs of overlaps is still an 
important one.

bq.  but please do not hesitate to disagree or bring other 
concerns/questions/scenarios to the discussion.

ditto :)

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor
 Fix For: 3.x


 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-04-27 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14514183#comment-14514183
 ] 

Branimir Lambov commented on CASSANDRA-8915:


[~benedict]: I invested some time exploring the linked list of colliding 
iterators idea. 
[Here|https://github.com/apache/cassandra/compare/trunk...blambov:mergeiterator-comparison]
 you can find a branch that includes and can be used to compare four versions 
of the iterator:
- {{MergeIteratorNoEqual}}: my initial implementation, without collapsing equal 
iterators
- {{MergeIteratorAllEqual}}: fully implements collapse of equal iterators
- {{MergeIterator.ManyToOne}}: mixture of the two above, only collapsing when 
it is easy
- {{MergeIteratorComparisonTest.MergeIteratorPQ}}: the old implementation

What I can tell from the performance tests:
- AllEqual is noticeably faster for randomly spaced inputs, as well as for 
completely non-colliding iterators
- NoEqual, on the other hand, is better when the overlap is organized in 
sections (as in LeveledCompaction)
- the mixture is always slower than either

The explanation is that AllEqual can avoid the equality comparisons in 
{{consume}} (halving the number of comparisons in the no-overlap case), but if 
there are collisions it must pay the price when advancing multiple equal 
iterators, having to place all but one at the end of the heap and possibly 
having to pull them up all the way to the top. The latter is not a disadvantage 
for random data, but IMHO Cassandra's use cases do produce or impose structure 
where it would be. As the AllEqual implementation is also quite a bit more 
complicated and harder to follow, the code I would choose to continue with is 
the original, NoEqual version, but please do not hesitate to disagree or bring 
other concerns/questions/scenarios to the discussion.

Other heap structures may improve this further, perhaps this can be revisited 
when there is need to go further and/or we have more time on our hands?

[~Stefania]: Good stuff! It is orthogonal to this improvement-- it will work 
just as well or even better here, in practically the same way that you have 
implemented it. The bound-to-actual transition will only sink it down from the 
top, which is not going to be wasting much, practically nothing if the bound is 
good enough. We don't want to open iterators before they have reached the top 
as opening is incomparably slower than any waste the mergeIterator can have.

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor

 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-04-27 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14516046#comment-14516046
 ] 

Stefania commented on CASSANDRA-8915:
-

Great to hear it's orthogonal, thanks!

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor

 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-04-03 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394244#comment-14394244
 ] 

Stefania commented on CASSANDRA-8915:
-

In case you guys have not seen it yet, please check the changes proposed by 
CASSANDRA-8180, specifically this comment here: 
https://issues.apache.org/jira/browse/CASSANDRA-8180?focusedCommentId=14381674page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14381674.

The idea is that there will be two type of candidates, one greedy that knows 
its first value as it is the case right now. Another one, lazy, that gets 
compared based on a less accurate lower bound. What this means is that once 
this lazy candidate is picked, only then will it access the iterator to 
determine the exact first value, which could be much higher that the initial 
lower bound. 

The way I implemented this with the present implementation of the merge 
iterator is to add the lazy candidate back to the priority queue after it has 
calculated its first accurate value. It's not very elegant however and it is 
kind of wasteful.

If too complex to merge both approaches in one algorithm, we can always 
specialize a separate merge iterator implementation for supporting lazy 
candidates.

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor

 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-04-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14394250#comment-14394250
 ] 

Benedict commented on CASSANDRA-8915:
-

I perhaps should have commented when I first saw the link. It should be quite 
viable to merge the behaviours; the Candidate just needs to have a flag 
indicating if the value is real or not, and to just discard the not-real 
values it encounters.

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor

 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-04-02 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14393517#comment-14393517
 ] 

Benedict commented on CASSANDRA-8915:
-

Suggestion: convert each Candidate into a linked-list of colliding iterators, 
so that consume() becomes guaranteed O(1). In the event of many equal items, 
this would incur only linear costs, and only on push down, rather than 
logarithmic costs on both push down and advance. This would particularly help 
the partition level (sstable/memtable) merge, as we are likely to encounter the 
same DecoratedKey many times.

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor

 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-03-12 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358578#comment-14358578
 ] 

Benedict commented on CASSANDRA-8915:
-

For this test, I would suggest running with a small memtable size (to amplify 
compaction costs), and running stress against a schema very large partition 
with multiple clustering components, and using the revisit parameter, and the 
sorted visitation order, to spread each partition sequentially across multiple 
flushes.

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor

 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-03-12 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14358560#comment-14358560
 ] 

Branimir Lambov commented on CASSANDRA-8915:


Patch is available 
[here|https://github.com/apache/cassandra/compare/trunk...blambov:8915-mergeiterator].

The code turned out to be somewhat more complicated than I expected because of 
the need to support lazy advance of the underlying iterators. The new iterator 
is between several times and 10-20% faster depending on the overlap of the 
inputs and the complexity of comparisons. I wasn't able to demonstrate the 
effect in cstar_perf (see 
[this|http://cstar.datastax.com/graph?stats=fb747232-c819-11e4-91b4-42010af0688fmetric=op_rateoperation=1_write]
 but also 
[this|http://cstar.datastax.com/graph?stats=ad2dbdb6-c843-11e4-a8b4-42010af0688fmetric=op_rateoperation=1_write]).
 Any suggestions on how to test compaction performance?

At the moment the patch fails {{BlacklistingCompactionsTest}} because of 
CASSANDRA-8960.

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor

 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-03-05 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348795#comment-14348795
 ] 

Branimir Lambov commented on CASSANDRA-8915:


Yes, what I'm suggesting is a replacement of the priority queue with a custom 
version that supports {{swapHead}}. It actually doesn't have to be a full PQ 
implementation, {{swapHead}} is the only operation needed (the initial heap 
construction can be done by sorting).

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor

 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-03-05 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348800#comment-14348800
 ] 

Benedict commented on CASSANDRA-8915:
-

+1 then

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor

 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

2015-03-05 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14348732#comment-14348732
 ] 

Benedict commented on CASSANDRA-8915:
-

It sounds like you're suggesting a full priority queue implementation with a 
swapHead() (or similar) method that heapifies the new value (possibly 
terminating immediately), but could you confirm? This seems preferable to me, 
but it could be taken that this proposal is for a simple wrapper that just 
extracts the head of the queue and wraps an existing implementation, which 
would only help situations with mostly disjoint files (so any single file with 
wide overlap would result in close to prior performance).

 Improve MergeIterator performance
 -

 Key: CASSANDRA-8915
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
 Project: Cassandra
  Issue Type: Improvement
Reporter: Branimir Lambov
Assignee: Branimir Lambov
Priority: Minor

 The implementation of {{MergeIterator}} uses a priority queue and applies a 
 pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
 This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
 comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
 for example in the case where the inputs largely don't overlap (where {{N}} 
 is the number of iterators being merged).
 This can easily be replaced with a simple custom structure that can perform 
 replacement of the top of the queue in a single step, which will very often 
 complete after a couple of comparisons and in the worst case scenarios will 
 match the complexity of the current implementation.
 This should significantly improve merge performance for iterators with 
 limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)