[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server
[ https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018401#comment-14018401 ]

Vincent Mallet commented on CASSANDRA-6788:
-------------------------------------------

+1 on the port to 1.2. We're hoping to grab your patch as soon as you feel comfortable with it and commit it for 1.2.17.

> Race condition silently kills thrift server
> -------------------------------------------
>
>                 Key: CASSANDRA-6788
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6788
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Christian Rolf
>            Assignee: Christian Rolf
>             Fix For: 1.2.17, 2.0.7, 2.1 beta2
>
>         Attachments: 6788-v2.txt, 6788-v3.txt, 6793-v3-rebased.txt, race_patch.diff
>
> There's a race condition in CustomTThreadPoolServer that can cause the thrift server to silently stop listening for connections. It happens when the executor service throws a RejectedExecutionException, which is not caught. "Silent" in the sense that OpsCenter doesn't notice any problem, since JMX is still running fine.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
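A minimal sketch of the failure mode and the guard. This is illustrative Java only — it is not the CustomTThreadPoolServer code or the attached patch, and the class and method names are made up for the example. A bounded thread pool with a full queue makes execute() throw RejectedExecutionException; unless the submitting loop catches it, the exception unwinds the serving thread and the server stops accepting connections, while JMX keeps answering as if nothing happened:

```java
import java.util.concurrent.*;

// Hypothetical sketch: why an unguarded ExecutorService.execute() can
// silently kill a server's accept loop.
public class AcceptLoopSketch {
    // Submit `tasks` jobs; return how many were accepted. The catch block
    // is the essential guard: without it, the first rejection propagates
    // out of the loop and the "server" stops accepting work.
    static int submitAll(ExecutorService workers, int tasks) {
        int accepted = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                workers.execute(() -> {
                    try { Thread.sleep(100); } catch (InterruptedException e) { }
                });
                accepted++;
            } catch (RejectedExecutionException e) {
                // Keep serving instead of dying silently; a real server
                // would log and possibly back off here.
            }
        }
        return accepted;
    }

    // Bounded pool with a one-slot queue so rejections actually occur
    // (the default AbortPolicy throws RejectedExecutionException).
    public static int demo() {
        ExecutorService workers = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(1));
        int accepted = submitAll(workers, 10);
        workers.shutdownNow();
        return accepted;
    }

    public static void main(String[] args) {
        System.out.println("accepted " + demo() + " of 10; loop survived");
    }
}
```

The key point is that the loop reaches the end at all: with the catch removed, the third submission would throw and the remaining iterations would never run.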
[jira] [Commented] (CASSANDRA-6756) Provide option to avoid loading orphan SSTables on startup
[ https://issues.apache.org/jira/browse/CASSANDRA-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918313#comment-13918313 ]

Vincent Mallet commented on CASSANDRA-6756:
-------------------------------------------

+1 on the lost+found idea. By the way, we're trying to analyze the source of some of these SSTables we're finding in some clusters, and there seem to be causes other than failed repairs (in 1.1): OOM, problems with compaction, etc.; still investigating. Having that option would make us sleep better at night.

> Provide option to avoid loading orphan SSTables on startup
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-6756
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6756
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Vincent Mallet
>             Fix For: 1.2.16
>
> When Cassandra starts up, it enumerates all SSTables on disk for a known column family and proceeds to load all of them, even those that were left behind before the restart because of a problem of some sort. This can lead to data gain (resurrected data), which is just as bad as data loss. The ask is to provide a yaml config option that would allow one to turn that behavior off by default, so a Cassandra cluster would be immune to data gain when nodes get restarted (at least with Leveled compaction, where Cassandra keeps track of SSTables). This is sort of a follow-up to CASSANDRA-6503 (fixed in 1.2.14). We're just extremely nervous that orphan SSTables could appear because of some other potential problem somewhere else and cause zombie data on a random reboot.
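The lost+found idea endorsed above can be sketched roughly as follows. This is a hypothetical illustration, not Cassandra's actual startup code: `OrphanSstableSweep`, the file names, and the notion of a "known" generation set are all invented for the example. The idea is simply to diff what is on disk against what the node is known to own, and quarantine the difference instead of loading it:

```java
import java.util.*;

// Hypothetical sketch of a "lost+found" startup sweep: only SSTables the
// node is known to own get loaded; anything else is flagged as an orphan.
public class OrphanSstableSweep {
    // Returns the on-disk files that are NOT in the known set (the orphans).
    static List<String> findOrphans(List<String> onDisk, Set<String> known) {
        List<String> orphans = new ArrayList<>();
        for (String f : onDisk) {
            if (!known.contains(f)) {
                orphans.add(f);
            }
        }
        return orphans;
    }

    public static void main(String[] args) {
        // Invented file names in the rough shape of 1.x SSTable components.
        List<String> onDisk = Arrays.asList(
                "ks-cf-ic-1-Data.db", "ks-cf-ic-2-Data.db", "ks-cf-ic-7-Data.db");
        Set<String> known = new HashSet<>(Arrays.asList(
                "ks-cf-ic-1-Data.db", "ks-cf-ic-2-Data.db"));
        // Generation 7 was left behind (failed repair, OOM, compaction
        // problem, ...): move it to a lost+found directory rather than
        // loading it and resurrecting deleted data.
        System.out.println("orphans: " + findOrphans(onDisk, known));
    }
}
```

A real implementation would need an authoritative record of owned generations (which, as the description notes, Leveled compaction maintains); without one, there is nothing to diff against and this sketch does not apply.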
[jira] [Created] (CASSANDRA-6756) Provide option to avoid loading orphan SSTables on startup
Vincent Mallet created CASSANDRA-6756:
------------------------------------------

             Summary: Provide option to avoid loading orphan SSTables on startup
                 Key: CASSANDRA-6756
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6756
             Project: Cassandra
          Issue Type: Improvement
          Components: Core
            Reporter: Vincent Mallet
             Fix For: 1.2.16
[jira] [Commented] (CASSANDRA-6756) Provide option to avoid loading orphan SSTables on startup
[ https://issues.apache.org/jira/browse/CASSANDRA-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910074#comment-13910074 ]

Vincent Mallet commented on CASSANDRA-6756:
-------------------------------------------

Any kind, really. The stalled-repair problem hit us pretty massively on a recent cluster bounce, and I'm thinking: who knows what other problem or bug is going to leave orphan SSTables behind. Fair enough, there shouldn't be any, but the day there are, it's not worth us paying the price of zombie data. We're also thinking of grabbing that patch and porting it to 1.1 while we're at it, until we migrate to 1.2. The default behavior of sucking in any SSTables that are lying around is just making us very nervous. Hope that makes sense, thanks.

> Provide option to avoid loading orphan SSTables on startup
> ----------------------------------------------------------
>
>                 Key: CASSANDRA-6756
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6756
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Vincent Mallet
>             Fix For: 1.2.16
[jira] [Commented] (CASSANDRA-5893) CqlParser throws StackOverflowError on bigger batch operation
[ https://issues.apache.org/jira/browse/CASSANDRA-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750416#comment-13750416 ]

Vincent Mallet commented on CASSANDRA-5893:
-------------------------------------------

Great, thanks!

> CqlParser throws StackOverflowError on bigger batch operation
> -------------------------------------------------------------
>
>                 Key: CASSANDRA-5893
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5893
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Vincent Mallet
>            Assignee: Aleksey Yeschenko
>             Fix For: 1.2.9
>
>         Attachments: 5893.txt
>
> We are seeing a problem with CQL3/Cassandra 1.2.8 where a large batch operation causes the CqlParser to throw a StackOverflowError (-Xss180k initially, then -Xss325k). Shouldn't a batch be processed iteratively, to avoid having to bump stack sizes to unreasonably large values? Here is more info from the original problem description:
> It looks like the CqlParser in 1.2.8 (probably 1.2.x, but I didn't look) is implemented recursively in such a way that large batch statements blow up the stack. We, of course on a Friday night, have a particular piece of code that's hitting a degenerate case that creates a batch of inserts with a VERY large number of collection items, and it manifests as a StackOverflowError coming out of the cass servers:
> java.lang.StackOverflowError
>         at org.apache.cassandra.cql3.CqlParser.value(CqlParser.java:5266)
>         at org.apache.cassandra.cql3.CqlParser.term(CqlParser.java:5627)
>         at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4807)
>         at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
>         at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
>         at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
>         at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
>         at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
>         at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
>         ...
> I think in the short term I can give up the atomicity of a batch in this code and kind of suck it up, but obviously I'd prefer not to. I'm also not sure whether, if I kept a single batch but split it into smaller pieces in each statement, it would still fail. I'm guessing I could also crank the hell out of the stack size on the servers, but that feels pretty dirty. It seems like the CqlParser should probably be implemented in a way that isn't quite so vulnerable to this, though I fully accept that this batch is koo-koo-bananas. Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
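The short-term workaround the reporter mentions — giving up batch atomicity and splitting the work at the client — can be sketched as follows. This is a hypothetical illustration, not Cassandra's fix (the attached patch addressed the parser itself): `BatchSplitter` and its names are invented, and, as the comment itself wonders, splitting by statement may not help when a single collection literal is itself enormous, since the `set_tail` recursion above is per collection, not per batch:

```java
import java.util.*;

// Hypothetical client-side workaround sketch: cap the number of statements
// per CQL batch so the server-side parser never recurses over one enormous
// batch, at the cost of atomicity across the resulting sub-batches.
public class BatchSplitter {
    // Split `statements` into consecutive chunks of at most `maxPerBatch`.
    static List<List<String>> split(List<String> statements, int maxPerBatch) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < statements.size(); i += maxPerBatch) {
            int end = Math.min(i + maxPerBatch, statements.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(statements.subList(i, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> stmts = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            stmts.add("INSERT INTO ks.cf (k) VALUES (" + i + ");");
        }
        // 10 statements with at most 4 per batch -> chunks of 4, 4, 2; each
        // chunk would be wrapped in its own BEGIN BATCH ... APPLY BATCH.
        for (List<String> b : split(stmts, 4)) {
            System.out.println("batch of " + b.size());
        }
    }
}
```

Each chunk succeeds or fails on its own, which is exactly the atomicity trade-off the comment describes as "kind of sucking it up."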
[jira] [Created] (CASSANDRA-5893) CqlParser throws StackOverflowError on bigger batch operation
Vincent Mallet created CASSANDRA-5893:
------------------------------------------

             Summary: CqlParser throws StackOverflowError on bigger batch operation
                 Key: CASSANDRA-5893
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5893
             Project: Cassandra
          Issue Type: Bug
          Components: Core
            Reporter: Vincent Mallet
             Fix For: 1.2.8