[jira] [Commented] (CASSANDRA-6788) Race condition silently kills thrift server

2014-06-04 Thread Vincent Mallet (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14018401#comment-14018401
 ] 

Vincent Mallet commented on CASSANDRA-6788:
---

+1 on the port to 1.2; we're hoping to grab your patch as soon as you feel 
comfortable enough with it to commit it for 1.2.17.


 Race condition silently kills thrift server
 ---

 Key: CASSANDRA-6788
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6788
 Project: Cassandra
  Issue Type: Bug
Reporter: Christian Rolf
Assignee: Christian Rolf
 Fix For: 1.2.17, 2.0.7, 2.1 beta2

 Attachments: 6788-v2.txt, 6788-v3.txt, 6793-v3-rebased.txt, 
 race_patch.diff


 There's a race condition in CustomTThreadPoolServer that can cause the thrift 
 server to silently stop listening for connections. 
 It happens when the executor service throws a RejectedExecutionException, 
 which is not caught.
  
 "Silent" in the sense that OpsCenter doesn't notice any problem, since JMX is 
 still running fine.
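The failure mode is easy to reproduce in isolation. The sketch below is illustrative only, not Cassandra's actual CustomTThreadPoolServer code: a saturated bounded executor throws RejectedExecutionException from execute(), and if the thread that accepts connections doesn't catch it, that thread dies while the JVM (and JMX) stay up, so the server silently stops listening.

```java
import java.util.concurrent.*;

public class RejectedExecutionDemo {
    public static void main(String[] args) throws Exception {
        // One worker, no queue slack: the second task has nowhere to go.
        ExecutorService workers = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>());

        workers.execute(() -> sleep(200));   // occupies the only worker thread

        boolean rejected = false;
        try {
            workers.execute(() -> {});       // pool saturated: rejected immediately
        } catch (RejectedExecutionException e) {
            // The fix is to catch this inside the accept loop and keep serving
            // (e.g. drop the connection) instead of letting the loop thread die.
            rejected = true;
        }
        System.out.println("rejected=" + rejected);
        workers.shutdownNow();
    }

    private static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException ignored) {}
    }
}
```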



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (CASSANDRA-6756) Provide option to avoid loading orphan SSTables on startup

2014-03-03 Thread Vincent Mallet (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13918313#comment-13918313
 ] 

Vincent Mallet commented on CASSANDRA-6756:
---

+1 on the lost+found idea.
Btw, we're trying to analyze the source of some of these SSTables we're finding 
in some clusters, and there seem to be causes other than failed repairs (in 
1.1): OOM, problems with compaction, etc.; still investigating. Having that 
option would make us sleep better at night.



 Provide option to avoid loading orphan SSTables on startup
 --

 Key: CASSANDRA-6756
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6756
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Vincent Mallet
 Fix For: 1.2.16


 When Cassandra starts up, it enumerates all SSTables on disk for a known 
 column family and proceeds to load all of them, even those that were left 
 behind before the restart because of a problem of some sort. This can lead to 
 data gain (resurrected data), which is just as bad as data loss.
 The ask is to provide a yaml config option which would allow one to turn that 
 behavior off by default, so a Cassandra cluster would be immune to data gain 
 when nodes get restarted (at least with Leveled compaction, where Cassandra 
 keeps track of SSTables).
 This is sort of a follow-up to CASSANDRA-6503 (fixed in 1.2.14). We're just 
 extremely nervous that orphan SSTables could appear because of some other 
 potential problem somewhere else and cause zombie data on a random reboot. 





[jira] [Created] (CASSANDRA-6756) Provide option to avoid loading orphan SSTables on startup

2014-02-23 Thread Vincent Mallet (JIRA)
Vincent Mallet created CASSANDRA-6756:
-

 Summary: Provide option to avoid loading orphan SSTables on startup
 Key: CASSANDRA-6756
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6756
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Vincent Mallet
 Fix For: 1.2.16


When Cassandra starts up, it enumerates all SSTables on disk for a known column 
family and proceeds to load all of them, even those that were left behind 
before the restart because of a problem of some sort. This can lead to data 
gain (resurrected data), which is just as bad as data loss.

The ask is to provide a yaml config option which would allow one to turn that 
behavior off by default, so a Cassandra cluster would be immune to data gain 
when nodes get restarted (at least with Leveled compaction, where Cassandra 
keeps track of SSTables).

This is sort of a follow-up to CASSANDRA-6503 (fixed in 1.2.14). We're just 
extremely nervous that orphan SSTables could appear because of some other 
potential problem somewhere else and cause zombie data on a random reboot. 
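The requested behavior could look roughly like the sketch below. The names here (generation sets, a quarantine list) are hypothetical illustrations, not Cassandra's actual startup API: the idea is to load only SSTables that a manifest or system table already tracks, and set orphans aside in a lost+found directory instead of resurrecting their data.

```java
import java.util.*;

public class OrphanSSTableFilter {
    // Split on-disk SSTables (file name -> generation) into the set to load
    // (tracked generations) and a quarantine list (orphans, left behind by
    // a failed repair, compaction problem, OOM, etc.).
    static List<String> partition(Set<Integer> knownGenerations,
                                  Map<String, Integer> onDisk,
                                  List<String> quarantined) {
        List<String> toLoad = new ArrayList<>();
        for (Map.Entry<String, Integer> e : onDisk.entrySet()) {
            if (knownGenerations.contains(e.getValue())) {
                toLoad.add(e.getKey());        // tracked: safe to load
            } else {
                quarantined.add(e.getKey());   // orphan: move to lost+found
            }
        }
        Collections.sort(toLoad);
        return toLoad;
    }

    public static void main(String[] args) {
        Map<String, Integer> onDisk = new LinkedHashMap<>();
        onDisk.put("ks-cf-ic-1-Data.db", 1);
        onDisk.put("ks-cf-ic-2-Data.db", 2);
        onDisk.put("ks-cf-ic-7-Data.db", 7);   // left behind before the restart
        Set<Integer> known = new HashSet<>(Arrays.asList(1, 2));

        List<String> quarantined = new ArrayList<>();
        System.out.println("load=" + partition(known, onDisk, quarantined));
        System.out.println("quarantine=" + quarantined);
    }
}
```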




--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (CASSANDRA-6756) Provide option to avoid loading orphan SSTables on startup

2014-02-23 Thread Vincent Mallet (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13910074#comment-13910074
 ] 

Vincent Mallet commented on CASSANDRA-6756:
---

Any kind, really. The stalled repair problem hit us pretty massively on a 
recent cluster bounce, and I'm thinking who knows what other problem or bug is 
going to leave orphan SSTables behind. Fair enough, there shouldn't be any, but 
the day there are, it's not worth us paying the price of zombie data. We're 
also thinking of grabbing that patch and porting it to 1.1 while we're at it, 
until we migrate to 1.2. The default behavior of sucking in any SSTables that 
are lying around is just making us very nervous.

Hope that makes sense, thanks.


 Provide option to avoid loading orphan SSTables on startup
 --

 Key: CASSANDRA-6756
 URL: https://issues.apache.org/jira/browse/CASSANDRA-6756
 Project: Cassandra
  Issue Type: Improvement
  Components: Core
Reporter: Vincent Mallet
 Fix For: 1.2.16


 When Cassandra starts up, it enumerates all SSTables on disk for a known 
 column family and proceeds to load all of them, even those that were left 
 behind before the restart because of a problem of some sort. This can lead to 
 data gain (resurrected data), which is just as bad as data loss.
 The ask is to provide a yaml config option which would allow one to turn that 
 behavior off by default, so a Cassandra cluster would be immune to data gain 
 when nodes get restarted (at least with Leveled compaction, where Cassandra 
 keeps track of SSTables).
 This is sort of a follow-up to CASSANDRA-6503 (fixed in 1.2.14). We're just 
 extremely nervous that orphan SSTables could appear because of some other 
 potential problem somewhere else and cause zombie data on a random reboot. 





[jira] [Commented] (CASSANDRA-5893) CqlParser throws StackOverflowError on bigger batch operation

2013-08-26 Thread Vincent Mallet (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13750416#comment-13750416
 ] 

Vincent Mallet commented on CASSANDRA-5893:
---

Great, thanks!

 CqlParser throws StackOverflowError on bigger batch operation
 -

 Key: CASSANDRA-5893
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5893
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Vincent Mallet
Assignee: Aleksey Yeschenko
 Fix For: 1.2.9

 Attachments: 5893.txt


 We are seeing a problem with CQL3/Cassandra 1.2.8 where a large batch 
 operation causes the CqlParser to throw a StackOverflowError (-Xss180k 
 initially, then -Xss325k).
 Shouldn't a batch be processed iteratively to avoid having to bump stack 
 sizes to unreasonably large values?
 Here is more info from the original problem description:
 
 It looks like the CqlParser in 1.2.8 (probably 1.2.x, but I didn't check) is 
 implemented recursively in such a way that large batch statements blow up the 
 stack. We, of course on a Friday night, have a particular piece of code 
 that's hitting a degenerate case that creates a batch of inserts with a VERY 
 large number of collection items, and it manifests as a StackOverflowError 
 coming out of the Cassandra servers:
 java.lang.StackOverflowError
    at org.apache.cassandra.cql3.CqlParser.value(CqlParser.java:5266)
    at org.apache.cassandra.cql3.CqlParser.term(CqlParser.java:5627)
    at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4807)
    at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
    at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
    at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
    at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
    at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
    at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
    ...
   
 I think in the short term I can give up the atomicity of a batch in this code 
 and kind of suck it up, but obviously I'd prefer not to. I'm also not sure if 
 I kept a single batch, but split this into smaller pieces in each statement, 
 whether that would still fail. I'm guessing I could also crank the hell out 
 of the stack size on the servers, but that feels pretty dirty.
 It seems like the CqlParser should probably be implemented in a way that 
 isn't quite so vulnerable to this, though I fully accept that this batch is 
 koo-koo-bananas.
 
 Thanks!
  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (CASSANDRA-5893) CqlParser throws StackOverflowError on bigger batch operation

2013-08-15 Thread Vincent Mallet (JIRA)
Vincent Mallet created CASSANDRA-5893:
-

 Summary: CqlParser throws StackOverflowError on bigger batch 
operation
 Key: CASSANDRA-5893
 URL: https://issues.apache.org/jira/browse/CASSANDRA-5893
 Project: Cassandra
  Issue Type: Bug
  Components: Core
Reporter: Vincent Mallet
 Fix For: 1.2.8


We are seeing a problem with CQL3/Cassandra 1.2.8 where a large batch operation 
causes the CqlParser to throw a StackOverflowError (-Xss180k initially, then 
-Xss325k).

Shouldn't a batch be processed iteratively to avoid having to bump stack sizes 
to unreasonably large values?

Here is more info from the original problem description:



It looks like the CqlParser in 1.2.8 (probably 1.2.x, but I didn't check) is 
implemented recursively in such a way that large batch statements blow up the 
stack. We, of course on a Friday night, have a particular piece of code that's 
hitting a degenerate case that creates a batch of inserts with a VERY large 
number of collection items, and it manifests as a StackOverflowError coming out 
of the Cassandra servers:

java.lang.StackOverflowError
   at org.apache.cassandra.cql3.CqlParser.value(CqlParser.java:5266)
   at org.apache.cassandra.cql3.CqlParser.term(CqlParser.java:5627)
   at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4807)
   at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
   at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
   at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
   at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
   at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
   at org.apache.cassandra.cql3.CqlParser.set_tail(CqlParser.java:4813)
...

I think in the short term I can give up the atomicity of a batch in this code 
and kind of suck it up, but obviously I'd prefer not to. I'm also not sure if I 
kept a single batch, but split this into smaller pieces in each statement, 
whether that would still fail. I'm guessing I could also crank the hell out of 
the stack size on the servers, but that feels pretty dirty.

It seems like the CqlParser should probably be implemented in a way that isn't 
quite so vulnerable to this, though I fully accept that this batch is 
koo-koo-bananas.
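The client-side workaround mentioned above (splitting one huge batch into bounded-size batches, at the cost of the single batch's atomicity) can be sketched as follows; the splitter is a hypothetical helper, not part of any driver API.

```java
import java.util.*;

public class BatchSplitter {
    // Chunk a list of statements into batches of at most maxPerBatch each,
    // so no single batch is large enough to blow the parser's stack.
    static List<List<String>> split(List<String> statements, int maxPerBatch) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < statements.size(); i += maxPerBatch) {
            batches.add(new ArrayList<>(statements.subList(
                    i, Math.min(i + maxPerBatch, statements.size()))));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> stmts = new ArrayList<>();
        for (int i = 0; i < 2500; i++) stmts.add("INSERT ... /* row " + i + " */");
        List<List<String>> batches = split(stmts, 1000);
        System.out.println("batches=" + batches.size()
                + " last=" + batches.get(batches.size() - 1).size());
    }
}
```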


Thanks!

 
