[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur

2011-09-14 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104481#comment-13104481
 ] 

Sylvain Lebresne commented on CASSANDRA-3181:
-

+1

 Compaction fails to occur
 -

 Key: CASSANDRA-3181
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3181
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.0
Reporter: Brandon Williams
Assignee: Jonathan Ellis
  Labels: compaction
 Fix For: 1.0.0

 Attachments: 3181-2.txt, 3181.txt


 Compaction just stops running at some point.  To repro, insert like 20M rows 
 with a 1G heap and you'll get around 1k sstables.  Restarting doesn't help, 
 you have to invoke a major to get anything to happen.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur

2011-09-13 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103821#comment-13103821
 ] 

Sylvain Lebresne commented on CASSANDRA-3181:
-

What is unclear is what stopped the compactions from happening. Brandon said 
that restarting didn't help, which I take to mean that compaction had already 
stopped before the restart. Basically, if the only thing that happened is that 
no compaction runs after a restart when you don't do any inserts, then OK, 
that's not a big deal. But if it is indeed the case that compaction just stops 
running at some point, then there is something that we need to fix.

 Compaction fails to occur
 -

 Key: CASSANDRA-3181
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3181
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.0
Reporter: Brandon Williams
Assignee: Benjamin Coverston
 Fix For: 1.0.0


 Compaction just stops running at some point.  To repro, insert like 20M rows 
 with a 1G heap and you'll get around 1k sstables.  Restarting doesn't help, 
 you have to invoke a major to get anything to happen.





[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur

2011-09-13 Thread Benjamin Coverston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103826#comment-13103826
 ] 

Benjamin Coverston commented on CASSANDRA-3181:
---

Using his repro, I was able to get compactions going again with a simple 
write/flush.

Brandon says he's not sure how the compactions stopped. There are only two 
scenarios where I see compactions stopping like this with the tiered 
compaction strategy:

# the server was restarted
# a compaction failed unexpectedly

The fact that there were no errors in the log makes #2 unlikely. I'm pretty 
sure that we're just looking at a problem where a server restarts and there is 
no more activity triggering a flush.

 Compaction fails to occur
 -

 Key: CASSANDRA-3181
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3181
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.0
Reporter: Brandon Williams
Assignee: Benjamin Coverston
 Fix For: 1.0.0


 Compaction just stops running at some point.  To repro, insert like 20M rows 
 with a 1G heap and you'll get around 1k sstables.  Restarting doesn't help, 
 you have to invoke a major to get anything to happen.





[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur

2011-09-13 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103949#comment-13103949
 ] 

Sylvain Lebresne commented on CASSANDRA-3181:
-

bq. . So I think repair was blocking minors

repair should be on its own executor now, so it shouldn't block minors.

bq. a compaction failed unexpectedly

Hm, a compaction failing shouldn't stop other compactions. If it does, that is 
worth fixing.

bq. I'm pretty sure that we're just looking at a problem where a server 
restarts and there is no more activity triggering a flush

If that's the case, then is there really much we want to do? And even if we 
do, we should move that to 1.0.1. I just want to make sure this doesn't hide a 
real, unknown problem.

 Compaction fails to occur
 -

 Key: CASSANDRA-3181
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3181
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.0
Reporter: Brandon Williams
Assignee: Benjamin Coverston
 Fix For: 1.0.0


 Compaction just stops running at some point.  To repro, insert like 20M rows 
 with a 1G heap and you'll get around 1k sstables.  Restarting doesn't help, 
 you have to invoke a major to get anything to happen.





[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur

2011-09-13 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103952#comment-13103952
 ] 

Jonathan Ellis commented on CASSANDRA-3181:
---

bq. CASSANDRA-2444 got in the way

I'm not sure what the right solution is here.  I buy the premise of 2444 that 
you don't necessarily want to get hammered by compaction when you're first 
starting up (warming up caches).  So I don't think "check for compactions every 
N seconds" is a great policy.  But I'm not sure "check every N seconds, 
starting M minutes after startup" is great either, because it's not something a 
user will just guess when he's wondering "why aren't compactions happening yet?"

Any other ideas?

 Compaction fails to occur
 -

 Key: CASSANDRA-3181
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3181
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.0
Reporter: Brandon Williams
Assignee: Benjamin Coverston
 Fix For: 1.0.0


 Compaction just stops running at some point.  To repro, insert like 20M rows 
 with a 1G heap and you'll get around 1k sstables.  Restarting doesn't help, 
 you have to invoke a major to get anything to happen.





[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur

2011-09-13 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103950#comment-13103950
 ] 

Brandon Williams commented on CASSANDRA-3181:
-

bq. repair should be on its own executor now, so it shouldn't block minors.

Ok, then maybe I just hit one of the OOM bugs, and compaction had never fully 
completed.  After restarting I never did any more writes, and we know 
compaction won't happen at startup.

bq. If that's the case, then is there really much we want to do ? And even if 
we want, we should move that to 1.0.1. Just want to make sure this doesn't hide 
a real, unknown, problem.

I think CASSANDRA-2444 was wrong.  It should be an option, and one that is off 
by default.  Starting a server with 1k sstables and having nothing happen is a 
bit of a shock, and having no way out of it besides hacks like forcing a flush 
or a major isn't great.
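
For illustration only, here is a minimal sketch of what "an option that is off by default" could look like. The class and option names are hypothetical and are not taken from the Cassandra code base or from the attached patches; the Runnable merely stands in for whatever asks the compaction strategy whether work is pending.

```java
// Hypothetical sketch; the option name and class are illustrative, not Cassandra's.
final class StartupCompactionOption {
    // When true, the startup compaction check is skipped (the 2444 behavior).
    private final boolean skipCompactionCheckOnStartup;

    // Off by default: a freshly started server still checks for pending compactions.
    StartupCompactionOption() {
        this(false);
    }

    StartupCompactionOption(boolean skipCompactionCheckOnStartup) {
        this.skipCompactionCheckOnStartup = skipCompactionCheckOnStartup;
    }

    // Runs the supplied check unless the operator opted out.
    // Returns true if the check was run, false if it was skipped.
    boolean onStartup(Runnable compactionCheck) {
        if (skipCompactionCheckOnStartup) {
            return false;
        }
        compactionCheck.run();
        return true;
    }
}
```

With a default like this, a node that comes up with 1k sstables starts working through them without a forced flush or a major, while operators who want quiet startups can still opt out.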

 Compaction fails to occur
 -

 Key: CASSANDRA-3181
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3181
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.0
Reporter: Brandon Williams
Assignee: Benjamin Coverston
 Fix For: 1.0.0


 Compaction just stops running at some point.  To repro, insert like 20M rows 
 with a 1G heap and you'll get around 1k sstables.  Restarting doesn't help, 
 you have to invoke a major to get anything to happen.





[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur

2011-09-13 Thread Benjamin Coverston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13104059#comment-13104059
 ] 

Benjamin Coverston commented on CASSANDRA-3181:
---

I was a little concerned about removing the scheduled compaction from leveldb, 
but the mechanics are really no different from tiered compaction: it will stop 
when it's finished, assuming that nothing goes wrong with the running 
compactions.

To be (probably overly) pedantic, another advantage of this is that you are 
essentially kicking off only a single compaction, whereas there were probably 
multiple compactions in flight when the server was brought down.

+1



 Compaction fails to occur
 -

 Key: CASSANDRA-3181
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3181
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.0
Reporter: Brandon Williams
Assignee: Jonathan Ellis
  Labels: compaction
 Fix For: 1.0.0

 Attachments: 3181.txt


 Compaction just stops running at some point.  To repro, insert like 20M rows 
 with a 1G heap and you'll get around 1k sstables.  Restarting doesn't help, 
 you have to invoke a major to get anything to happen.





[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur

2011-09-12 Thread Brandon Williams (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102751#comment-13102751
 ] 

Brandon Williams commented on CASSANDRA-3181:
-

Note that this is with the default compaction strategy.

 Compaction fails to occur
 -

 Key: CASSANDRA-3181
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3181
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.0
Reporter: Brandon Williams
Assignee: Benjamin Coverston
 Fix For: 1.0.0


 Compaction just stops running at some point.  To repro, insert like 20M rows 
 with a 1G heap and you'll get around 1k sstables.  Restarting doesn't help, 
 you have to invoke a major to get anything to happen.





[jira] [Commented] (CASSANDRA-3181) Compaction fails to occur

2011-09-12 Thread Benjamin Coverston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13103334#comment-13103334
 ] 

Benjamin Coverston commented on CASSANDRA-3181:
---

Tested this a bit and I was able to reproduce the problem.

Basically what I think happened was:

# There were compactions in-flight and scheduled. 
# Something happened to halt the compactions (probably restarted the server)
# https://issues.apache.org/jira/browse/CASSANDRA-2444 got in the way

I propose one of the following: 

* add the option as originally proposed
OR
* put compactions on a timer and ask periodically `does any compaction need to 
be done`. Also stagger the start of the compactions by some variable amount 
(5-10 minutes), similar to what we did with Hinted Handoff in 0.7.
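
As a rough sketch of the second option, assuming a plain java.util.concurrent scheduler: the class and method names below are hypothetical and do not come from the Cassandra code base, and the Runnable stands in for the "does any compaction need to be done" question. The random initial delay implements the stagger; the demo in main uses millisecond-scale values only so that it finishes quickly.

```java
import java.util.Random;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; none of these names come from the Cassandra code base.
final class StaggeredCompactionCheck {

    // Pick a random start delay in [minMillis, maxMillis] so that nodes
    // restarted together do not all begin compacting at the same moment.
    static long staggeredDelayMillis(Random random, long minMillis, long maxMillis) {
        if (minMillis < 0 || maxMillis < minMillis) {
            throw new IllegalArgumentException("need 0 <= minMillis <= maxMillis");
        }
        long span = maxMillis - minMillis;
        return minMillis + (long) (random.nextDouble() * (span + 1));
    }

    // Run the check once after the initial delay, then again periodMillis
    // after each run finishes, so slow checks never pile up.
    static ScheduledFuture<?> schedule(ScheduledExecutorService executor, Runnable check,
                                       long initialDelayMillis, long periodMillis) {
        return executor.scheduleWithFixedDelay(check, initialDelayMillis, periodMillis,
                                               TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // The proposal's 5-10 minute window, expressed in milliseconds.
        long delay = staggeredDelayMillis(new Random(),
                                          TimeUnit.MINUTES.toMillis(5),
                                          TimeUnit.MINUTES.toMillis(10));
        System.out.println("would wait " + delay + " ms before the first check");

        // Demo with short intervals so the program exits quickly.
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        schedule(executor, () -> System.out.println("does any compaction need to be done?"), 10, 50);
        Thread.sleep(200);
        executor.shutdownNow();
    }
}
```

scheduleWithFixedDelay is used rather than scheduleAtFixedRate so that the next check is timed from the end of the previous one; that choice is an assumption of this sketch, not something the proposal specifies.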

 

 Compaction fails to occur
 -

 Key: CASSANDRA-3181
 URL: https://issues.apache.org/jira/browse/CASSANDRA-3181
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Affects Versions: 1.0.0
Reporter: Brandon Williams
Assignee: Benjamin Coverston
 Fix For: 1.0.0


 Compaction just stops running at some point.  To repro, insert like 20M rows 
 with a 1G heap and you'll get around 1k sstables.  Restarting doesn't help, 
 you have to invoke a major to get anything to happen.
