[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-03-13 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360030#comment-14360030
 ] 

Marcus Eriksson commented on CASSANDRA-8833:


[~thobbs] do we have any statistics on how much we gain in real life by the 
redistribution?

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-03-13 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14360188#comment-14360188
 ] 

Benedict commented on CASSANDRA-8833:
-

At the risk of sounding like a broken record (in case my earlier statement was 
missed): bloom filter resizing would need to break this assumption also, and is 
sort of intrinsically linked to the discussion around summary resizing. Whether 
or not we want this is another matter I'll leave aside for the moment. 

Just to outline the remainder of my strategy for mitigating any and all of 
these risks:

* In CASSANDRA-8568, I will:
** introduce a stable set of sstables that never changes, and all 
non-hot-path accesses will use these so they don't risk confusion, and don't 
have to worry about first/last issues
** ensure compaction strategies are only informed of changes to this stable 
set of readers
** remove the shadowed state of an sstable
** make the modification of tracker state transactional and more declarative, 
so both easier to follow and much harder to let get into a bad state
* CASSANDRA-8893, CASSANDRA-7066 and some related work will:
** eliminate the distinction between early open, temporary, and final files on 
disk, so eliminate at least one layer of the cleanup logic (i.e. make its 
requirements equivalent to summary/bf resizing)
** which also permits us to simplify the early open logic, by special casing it 
much less

In conjunction with the major overhaul of resource cleanup, AFAICT this 
mitigates most of the problems: 

* resource counting is now much easier to reason about, and soon will be even 
easier. it is also safer to get it wrong.
* only paths we know are safe to use overlapping sstables will do so (and in 
parallel we also enforce the non-overlapping rule)
* compaction doesn't have to even be aware it is happening
* what I think has been the biggest problem, the actual safe application of 
state changes (which were never atomic and could actually screw themselves up 
willfully through assertions) will be transactional and ensure exceptions do 
not interrupt their execution. it will also encapsulate its own safe rollback, 
so if we screw up somewhere, it will fix it for us.

I don't pretend it'll be 100% first time, but I think this new state will be 
safer by a significant margin than the pre-early-open state, which we are still 
seeing bug reports for in the 2.0 line, and has been the cause of many serious 
bugs (and at least one major public downtime of a well known deployment). I 
very much hope all of these changes will restore confidence in not only the 
early open feature, but resource management in general, and hopefully reduce 
the burden on all maintainers.

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-03-09 Thread Ryan McGuire (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352980#comment-14352980
 ] 

Ryan McGuire commented on CASSANDRA-8833:
-

 [~benedict] I think what [~krummas] was saying was run the exact same test (on 
the exact same versions) as CASSANDRA-6916, since that original test was done 
on spinning disk. We would only test a newer C* after those results came in for 
SSDs. Is that the test you have in mind [~krummas]?


 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-03-09 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353003#comment-14353003
 ] 

Benedict commented on CASSANDRA-8833:
-

bq.  I think it would be OK to run it straight up current 2.1 vs current 2.1 
with sstable_preemptive_open_interval_in_mb: -1 as well, just to get a feeling 
for it

This isn't a like-for-like comparison, as without preemptive open we need to 
evict the page cache for compaction results to avoid inactive data polluting it.

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-03-09 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352991#comment-14352991
 ] 

Benedict commented on CASSANDRA-8833:
-

bq. run the exact same test 

that works too; should give a reasonable idea of ballpark effect on current 
state

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-03-09 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352999#comment-14352999
 ] 

Marcus Eriksson commented on CASSANDRA-8833:


[~enigmacurry] sure, that works (though, I think it would be OK to run it 
straight up current 2.1 vs current 2.1 with 
{{sstable_preemptive_open_interval_in_mb: -1}} as well, just to get a feeling 
for it)

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-03-09 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352815#comment-14352815
 ] 

Benedict commented on CASSANDRA-8833:
-

bq. run benchmarks again, this time on ssds (Ryan McGuire could your team help 
out with that?)

This is a little tricky, since we've disabled the eviction of page cache 
entries for the new files in 2.1, so we'll need to roll a patch to reintroduce 
it. It's also worth caveating in advance that, since testing SSDs we likely 
care about latency more than throughput, Gil Tene's arguments (my final 
thoughts about which can be found 
[here|https://www.google.co.uk/url?sa=trct=jq=esrc=ssource=webcd=1cad=rjauact=8ved=0CCEQFjAAurl=https%3A%2F%2Fgroups.google.com%2Fd%2Fmsg%2Fmechanical-sympathy%2FicNZJejUHfE%2FLMQdQZJwvxEJei=2X39VLi5B4XbUdKUgMAOusg=AFQjCNF4_q-i6496RrmjZiXHZzAEmOy48Abvm=bv.87611401,d.d24])
 about stress in its current state do hold water, and we we will significantly 
underestimate any negative effect on latency, although I may be able to 
calculate a rough adjustment after the fact. This isn't a problem for 
provisioning tests, since you should never saturate throughput for such a test, 
but for testing these kinds of effects we absolutely should. Fixing stress to 
properly simulate all of the messages at a target throughput rate is on my hit 
list, but won't be done for a few weeks at least.

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-03-09 Thread Tyler Hobbs (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14353097#comment-14353097
 ] 

Tyler Hobbs commented on CASSANDRA-8833:


bq. see if we can solve the index summary redistribution some other way (only 
do it on compaction?) to be able to make SSTableReader totally immutable again, 
got any insight here Tyler Hobbs?

We could switch to doing it on compaction only.  I suppose we would determine 
the new summary interval based on the read rates for SSTables participating in 
the compaction.  The main downside is that it would be less responsive to 
changes.  For example, if a new SSTable is flushed that shadows most of the 
entries in an old SSTable, the old SSTable may keep its existing (and possibly 
expensive) summary interval until the next compaction.

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-03-09 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14352719#comment-14352719
 ] 

Marcus Eriksson commented on CASSANDRA-8833:


Coming back to this, what I would like is:

# run benchmarks again, this time on ssds ([~enigmacurry] could your team help 
out with that?)
# see if we can solve the index summary redistribution some other way (only do 
it on compaction?) to be able to make SSTableReader totally immutable again, 
got any insight here [~thobbs]?

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-02-20 Thread Matt Stump (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329192#comment-14329192
 ] 

Matt Stump commented on CASSANDRA-8833:
---

I'm not against refactoring of SSTableReader, I think it's overly complicated, 
but I also think that the fix being proposed is overly complicated. You can't 
get away from OS specific code because the code to manage the file system cache 
is OS specific. The trySkipCache logic in CLibrary is currently specific to 
Linux. I've slightly broadened it to Linux and some BSD variants (excluding 
OSX) in my patch for CASSANDRA-8160. We can get the desired behavior for 
Windows through selective use of the SEC_NOCACHE flag when opening the memory 
mapped file for windows, but that would require pushing the cache advisory 
logic one lower deeper. I don't think switching to an NIO channel will 
alleviate us from the need to provide hints to the OS of which IO is or is not 
appropriate to cache.

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-02-20 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329221#comment-14329221
 ] 

T Jake Luciani commented on CASSANDRA-8833:
---

I think the reason pre-opening results has exposed so many issues is we had 
always assumed in the code that sstables are immutable.  pre-opening basically 
makes them mutable so in hindsight it's pretty clearly a major issue.  I'm on 
the fence as to what we should do here. Perhaps we can solve this performance 
problem differently once CASSANDRA-5863 is in.  If we control the page cache we 
can expunge the old sstables and warm the new one more intelligently than with 
the FADVISE tricks

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-02-20 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329253#comment-14329253
 ] 

Benedict commented on CASSANDRA-8833:
-

I'm not sure I follow. Is your comment intended for this ticket?

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-02-20 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14329257#comment-14329257
 ] 

Benedict commented on CASSANDRA-8833:
-

bq.  pre-opening basically makes them mutable so in hindsight it's pretty 
clearly a major issue.

Pre-opening did _not_ make them mutable. Index summary redistribution did this, 
and I followed suit in the way it dealt with this. We _will_ have to eliminate 
index summary redistribution as well if we want to restore immutability.

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-02-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327970#comment-14327970
 ] 

Benedict commented on CASSANDRA-8833:
-

bq. I suspect that memory mapping will eclipse performance gains
bq. I'd prefer hard #'s from test beds.

Make your mind up ;)

Joking aside, the two aren't mutually exclusive. We only rule out early 
reopening with mmap because of hard links, but they only live for the duration 
of the early reopening. We could delay mmapping until the final opening. The 
two are also very different performance profile implications; early opening 
mitigates worst case behaviour far more importantly than improving overall 
system performance (although it does that too).

bq.  Code-changes to stabilize this have been increasing quite a bit recently

They almost all happened within the past month, when I finished up my other 
commitments and decided to try and put a lid on this and the other correlated 
problems. So I'm not sure that's indicative of much.


 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-02-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328025#comment-14328025
 ] 

Benedict commented on CASSANDRA-8833:
-

bq. there may be unintended side-effects to some of those changes

I have no doubt, which is partially why I consider this premature. We had a 
period of problems (not isolated to this, I want to reiterate), and we've had a 
very recent attempt to put them to bed. But not let that settle down to see how 
successful it has been. I'm expecting a few more minor hiccups. If we see some 
more major ones, especially regressions, that's a different matter.

bq. Thus far we've avoided platform-specific code-paths as much as possible, 
but that seems a simple enough solution that it would be worth looking into.

I'd actually be totally cool with (and maybe even prefer) making this a 
universal behaviour. That way we don't have multiple mappings of the same file, 
and we still retain most of the upside, with no behavioural split.

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-02-19 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328019#comment-14328019
 ] 

Joshua McKenzie commented on CASSANDRA-8833:


bq. Make your mind up
Touché.  :)  I have numbers on another ticket and Branimir's been seeing 
similar effects on the read path on Windows, so this is more a symptom of me 
being lazy on bringing them into this discussion (combined with a lack of rigor 
on producing those #'s thus far).

bq. the two aren't mutually exclusive
Absolutely true, but the headache associated with renaming or deleting files on 
Windows is compounded with hard-links and memory-mapping. Delaying the mapping 
until finalization of the sstable would be a simple solution however that's 
adding more complexity on top of an already complex situation.  Thus far we've 
avoided platform-specific code-paths as much as possible, but that seems a 
simple enough solution that it would be worth looking into.

Regarding the flurry of recent fixes - as you're well aware, little in this 
code-base is as simple as it may appear at first glance.  I suspect we're going 
to have other things we need to tidy up with these recent commits and there may 
be unintended side-effects to some of those changes.  A lot of hand-waving 
here, certainly, but past experience indicates that changes that touch that 
many places in the code-base almost always have some surprises in store for us.

bq. I'm not certain what you're referring to here
Ah - so the 9% w/populate_io_cache_on_flush was the pathological case and the 
crazy cliff drop-off was the normal use-case then?  A bit of clarification 
there goes a long way.

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-02-19 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327612#comment-14327612
 ] 

Jonathan Ellis commented on CASSANDRA-8833:
---

/cc [~pmcfadin] [~mstump] [~rssvihla]

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-02-19 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327961#comment-14327961
 ] 

Joshua McKenzie commented on CASSANDRA-8833:


Regarding Windows: early re-open with memory-mapped index files isn't a 
technical feasibility due to ntfs restrictions and I suspect that memory 
mapping will eclipse performance gains from early re-open for most use-cases.  
CASSANDRA-8709 will make it an option w/out mmap and we can leave that as the 
default path.

I'm familiar with the cost leading up to this point in time to stabilize this 
feature and the other systems it stressed.  Code-changes to stabilize this have 
been increasing quite a bit recently so I don't see evidence that we're near 
the end of stabilization for this feature yet, and as you stated:

bq. every line touched is a new bug in waiting

Also, the performance gains from early re-open look to vary greatly depending 
on test setup; the graphs referenced in CASSANDRA-6916 vary from a 
night-and-day comparison down to a 9% total ops improvement on the regression 
case. When speaking of the performance improvement of this feature in the 
comments above I see several phrases that stick out to me: can be, Possibly 
introducing, may see, could be. I'd prefer hard #'s from test beds.

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-02-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14328236#comment-14328236
 ] 

Benedict commented on CASSANDRA-8833:
-

bq. Ah - so the 9% w/populate_io_cache_on_flush was the pathological case and 
the crazy cliff drop-off was the normal use-case then? A bit of clarification 
there goes a long way.

Sorry, I didn't respond to this, so to avoid any lack of clarity: yep, spot on.

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-02-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327688#comment-14327688
 ] 

Benedict commented on CASSANDRA-8833:
-

We still have to deal with the complexity of the fact there are multiple 
instances of a given reader, which is I think where almost all of the problems 
stem from, and is certainly the most complex part to reason about and get 
right. I don't think the existence of other kinds of instance is significantly 
more complex, but I may be wrong about that.

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-02-19 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327685#comment-14327685
 ] 

Marcus Eriksson commented on CASSANDRA-8833:


bq. SSTableReader lifecycles are still approximately as complex due to index 
summary redistribution.
Don't see how that can be as complex as we don't open early, move starts or 
shadow any files when doing this

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-02-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327699#comment-14327699
 ] 

Benedict commented on CASSANDRA-8833:
-

The TL;DR of my view is: We've only just really (or nearly) finished the work 
for this, and many of its problems have been more general than just it. So 
aborting now is premature. However it is broader in scope than originally 
envisaged, and it has been painful - especially for Marcus (and also for 
myself) - so I understand the desire to see the back of it, and won't fight for 
its retention.

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-02-19 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327678#comment-14327678
 ] 

Benedict commented on CASSANDRA-8833:
-

Well, it certainly has turned out to be more complicated than expected. In 
particular shadowed is something I would have liked to avoid.

The mistake I made when implementing this, apart from not realising LCS 
required no overlaps, was to not expend a great deal of effort making it _more_ 
robust than the existing solution. Many of the problems encountered here have 
exhibited in the prior framework, or in other new pieces that are unrelated, 
just less frequently. Cleanup of compaction has been buggy since time 
immemorial; bloom filters, index summaries and entire sstables have been left 
allocated indefinitely until very recent fixes. Index summary redistribution 
has had all of the same DataTracker problems. So an argument could be made that 
this is actually helping us out, by exercising these codepaths more 
aggressively. After a major overhaul, there was an oversight (CASSANDRA-8764) 
which is probably to be expected, and is the impetus for this ticket. This 
overhaul has fixed numerous problems prior as well as new.

Removing this will introduce its own risks (every line touched is a new bug in 
waiting), and doesn't eliminate as much complexity as might be desired. 
SSTableReader lifecycles are still approximately as complex due to index 
summary redistribution. Bloom filter resizing (CASSANDRA-6633) will have the 
same requirement. The same goes for shared resources.

Reference counting has been broken for a long time, and we are now in a safer 
position than we have ever been (although improvements are still needed, such 
as CASSANDRA-8568), and the latest bugs we are finding in 2.0 are known not to 
affect 2.1. We wouldn't be there without this as an impetus.

So, many of the changes will not be rolled back, because they were necessary in 
themselves. They were just less pressing. There would still be a significant 
removal of code, though.

As to the improvement: the compaction cliff can be pretty significant if 
you're compacting huge files. Possibly introducing several minutes (or longer) 
of seriously degraded performance. If you compact tens of gigabytes you may see 
your entire page cache wiped out. Even with SSDs that's going to be painful for 
a period. Without SSDs it could be tens of minutes.

Other than that, I'll let everyone else reach a conclusion about this.

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8833) Stop opening compaction results early

2015-02-19 Thread Marcus Eriksson (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14327746#comment-14327746
 ] 

Marcus Eriksson commented on CASSANDRA-8833:


I agree that we might almost be there and it would suck to 'give up' at this 
point, but the reason I created the ticket now is that I fear the issues around 
this will continue, and that it will make it harder to implement new features 
around compaction (or anything involving sstables really) in the future.

 Stop opening compaction results early
 -

 Key: CASSANDRA-8833
 URL: https://issues.apache.org/jira/browse/CASSANDRA-8833
 Project: Cassandra
  Issue Type: Improvement
Reporter: Marcus Eriksson
 Fix For: 2.1.4


 We should simplify the code base by not doing early opening of compaction 
 results. It makes it very hard to reason about sstable life cycles since they 
 can be in many different states, opened early, starts moved, shadowed, 
 final, instead of as before, basically just one (tmp files are not really 
 'live' yet so I don't count those). The ref counting of shared resources 
 between sstables in these different states is also hard to reason about. This 
 has caused quite a few issues since we released 2.1
 I think it all boils down to a performance vs code complexity issue, is 
 opening compaction results early really 'worth it' wrt the performance gain? 
 The results in CASSANDRA-6916 sure look like the benefits are big enough, but 
 the difference should not be as big for people on SSDs (which most people who 
 care about latencies are)
 WDYT [~benedict] [~jbellis] [~iamaleksey] [~JoshuaMcKenzie]?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)