Bootstrap stuck: vnode enabled 1.2.12

2014-02-14 Thread Arindam Barua

After our otherwise successful upgrade procedure to enable vnodes, one non-seed 
host ran into a hardware issue during bootstrap while we were adding new hosts 
back to our cluster. By the time the hardware issue was fixed a week later, all 
the other nodes had been added successfully, cleaned, and repaired. The disks on 
this node were untouched, and when the node was started back up, it detected an 
interrupted bootstrap and attempted to bootstrap again. However, after ~24 hrs it 
was still stuck in the 'JOINING' state according to nodetool netstats on that 
node, even though no streams were flowing to or from it. It also did not appear 
in nodetool status in any way, shape, or form (not even as JOINING).

From a couple of observed thread dumps, the stack of the thread blocked during 
bootstrap is shown at [1].

Since the node wasn't making any progress, I ended up stopping Cassandra, 
cleaning up the data and commitlog directories, and attempting a fresh 
bootstrap. Nodetool netstats immediately reported a whole bunch of streams 
queued up, and data started streaming to the node. The data directory quickly 
grew to 18 GB (the other nodes had ~25 GB, but we have a lot of data with low 
TTLs). However, the node ended up in the earlier reported state, i.e. 
nodetool netstats doesn't have anything queued, but it still reports the JOINING 
state, even though it's been over 24 hrs. There are no other ERRORs in the logs, 
and new data written to the cluster makes it to this node just fine, 
triggering compactions, etc. from time to time.

Any help is appreciated.

Thanks,
Arindam

[1] Thread dump
Thread 3708: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Compiled frame; information may
   be imprecise)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object) @bci=14,
   line=156 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt()
   @bci=1, line=811 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(int)
   @bci=55, line=969 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(int)
   @bci=24, line=1281 (Interpreted frame)
 - java.util.concurrent.CountDownLatch.await() @bci=5, line=207 (Interpreted
   frame)
 - org.apache.cassandra.dht.RangeStreamer.fetch() @bci=209, line=256
   (Interpreted frame)
 - org.apache.cassandra.dht.BootStrapper.bootstrap() @bci=120, line=84
   (Interpreted frame)
 - org.apache.cassandra.service.StorageService.bootstrap(java.util.Collection)
   @bci=172, line=978 (Interpreted frame)
 - org.apache.cassandra.service.StorageService.joinTokenRing(int) @bci=827,
   line=744 (Interpreted frame)
 - org.apache.cassandra.service.StorageService.initServer(int) @bci=363,
   line=585 (Interpreted frame)
 - org.apache.cassandra.service.StorageService.initServer() @bci=4, line=482
   (Interpreted frame)
 - org.apache.cassandra.service.CassandraDaemon.setup() @bci=1069, line=348
   (Interpreted frame)
 - org.apache.cassandra.service.CassandraDaemon.activate() @bci=59, line=447
   (Interpreted frame)
 - org.apache.cassandra.service.CassandraDaemon.main(java.lang.String[]) @bci=3,
   line=490 (Interpreted frame)
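The top frames show bootstrap blocked in CountDownLatch.await() inside 
RangeStreamer.fetch(): the joining node waits for each requested stream session 
to count a latch down, and if a session dies without ever reporting completion 
or failure, the latch never reaches zero and the node can sit in JOINING 
indefinitely. A minimal sketch of that waiting pattern (toy class names, not 
Cassandra's actual code):

```python
import threading

class CountDownLatch:
    """Tiny stand-in for java.util.concurrent.CountDownLatch."""
    def __init__(self, count):
        self._count = count
        self._cond = threading.Condition()

    def count_down(self):
        with self._cond:
            if self._count > 0:
                self._count -= 1
                if self._count == 0:
                    self._cond.notify_all()

    def await_(self, timeout=None):
        """Block until the count hits zero; returns False on timeout."""
        with self._cond:
            self._cond.wait_for(lambda: self._count == 0, timeout)
            return self._count == 0

# Two stream sessions are requested, but one dies silently and never
# calls count_down() -- await_() then blocks until its timeout, which
# is roughly what a node stuck forever in JOINING looks like.
latch = CountDownLatch(2)
latch.count_down()                      # first session completes
finished = latch.await_(timeout=0.1)    # second session never reports
print(finished)                         # False: still "joining"
```

If the missing session were to eventually report, the latch would release and 
bootstrap would proceed; the sketch only illustrates why the thread parks 
forever when it does not.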


Re: TimedOutException in Java but not in cqlsh

2014-02-14 Thread Cyril Scetbon
After a few tests, it does not depend on the query. Whatever cql3 query I do, I 
always get the same exception. If someone sees something ...
-- 
Cyril SCETBON

On 13 Feb 2014, at 17:22, Cyril Scetbon cyril.scet...@free.fr wrote:

 Hi,
 
 I get a weird issue with cassandra 1.2.13. As written in the subject, a query 
 executed by the class CqlPagingRecordReader raises a TimedOutException 
 in Java, but I don't get any error when I use it with cqlsh. What's the 
 difference between these 2 ways? Does cqlsh bypass some configuration 
 compared to Java?
 
 You can find my sample code at http://pastebin.com/vbAFyAys (don't mind 
 the way it's coded, it's just sample code). FYI, I can't reproduce 
 it on another cluster. Here is the output of the 2 different ways (java and 
 cqlsh) I used: http://pastebin.com/umMNXJRw
 
 Thanks
 -- 
 Cyril SCETBON
 



Re: TimedOutException in Java but not in cqlsh

2014-02-14 Thread Vivek Mishra
Check the consistency level and socket timeout settings on the client side.

-Vivek


On Fri, Feb 14, 2014 at 2:36 PM, Cyril Scetbon cyril.scet...@free.fr wrote:

 After a few tests, it does not depend on the query. Whatever cql3 query I
 do, I always get the same exception. If someone sees something ...
 --
 Cyril SCETBON

 On 13 Feb 2014, at 17:22, Cyril Scetbon cyril.scet...@free.fr wrote:

  Hi,
 
  I get a weird issue with cassandra 1.2.13. As written in the subject, a
 query executed by class CqlPagingRecordReader raises a TimedOutException
 exception in Java but I don't have any error when I use it with cqlsh.
 What's the difference between those 2 ways ? Does cqlsh bypass some
 configuration compared to Java ?
 
  You can find my sample code at http://pastebin.com/vbAFyAys (don't take
 care of the way it's coded cause it's just a sample code). FYI, I can't
 reproduce it on another cluster. Here is the output of the 2 different ways
 (java and cqlsh) I used http://pastebin.com/umMNXJRw
 
  Thanks
  --
  Cyril SCETBON
 




Re: TimedOutException in Java but not in cqlsh

2014-02-14 Thread Cyril Scetbon
Hi,

Good advice. I found earlier in the morning that it's related to the consistency 
level LOCAL_ONE. I'll check later whether it should raise an error in some cases.

Thanks
-- 
Cyril SCETBON

On 14 Feb 2014, at 10:12, Vivek Mishra mishra.v...@gmail.com wrote:

 Check the consistency level and socket timeout settings on the client side.
 
 -Vivek
 
 
 On Fri, Feb 14, 2014 at 2:36 PM, Cyril Scetbon cyril.scet...@free.fr wrote:
 After a few tests, it does not depend on the query. Whatever cql3 query I do, 
 I always get the same exception. If someone sees something ...
 --
 Cyril SCETBON
 
 On 13 Feb 2014, at 17:22, Cyril Scetbon cyril.scet...@free.fr wrote:
 
  Hi,
 
  I get a weird issue with cassandra 1.2.13. As written in the subject, a 
  query executed by class CqlPagingRecordReader raises a TimedOutException 
  exception in Java but I don't have any error when I use it with cqlsh. 
  What's the difference between those 2 ways ? Does cqlsh bypass some 
  configuration compared to Java ?
 
  You can find my sample code at http://pastebin.com/vbAFyAys (don't take 
  care of the way it's coded cause it's just a sample code). FYI, I can't 
  reproduce it on another cluster. Here is the output of the 2 different ways 
  (java and cqlsh) I used http://pastebin.com/umMNXJRw
 
  Thanks
  --
  Cyril SCETBON
 
 
 



Re: Bootstrap failure on C* 1.2.13

2014-02-14 Thread Alain RODRIGUEZ
Hi Paulo,

Did you find out how to fix this issue? I am experiencing the exact same
issue after trying to help you on this exact subject a few days ago :).

Config : 32 C*1.2.11 nodes, Vnodes enabled, RF=3, 1 DC, On AWS EC2
m1.xlarge.

We added a few nodes (4) and it seems that this occurs on one node out of
two...

INFO 12:52:16,889 Finished streaming session
d5e4d014-9558-11e3-950d-cd6aba92807e from /xxx.xxx.xxx.xxx
java.lang.RuntimeException: Unable to fetch range
[(20078703525355016727168231761171377180,20105424945623564908585534414693308183],
(129753652951782325468767616123724624016,129754698153613057562227134647005586420],
(449910615740630024413140540076738,4524540663392564361402125588359485564],
(122461441134035840782923349842361962551,122462803389597917496737056756119104930],
(107970238065835199457922160357012606207,107987706615224138615506976884972465320],
(129754698153613057562227134647005586420,129760990520285412763184172827801136526],
(38338043252657275110873170917842646549,38368318768493907804399955985800320618],
(42022774431506526693485667522039962965,42053289032932587102300879230918436885],
(66836265760288088017242608238099612345,66844191330959602627129212011239690831],
(52540232739182066369547232798226785314,52559117354438503565212218200939569114],
(145046787539667961591986998676504957238,145057153206926436867917708334845130444],
(108279691586280658015556401795266720050,108305470056478513440634738885678702409],
(40039571254531814244837067525035822613,40053379084508254942645157728035688263],
(132027653159543236812527609067336099062,132029648290617316887203744857701890860],
(52516518106546460227349801041398186304,52540232739182066369547232798226785314],
(151797253868519929321029931533765036527,151828244658375264200603444399788004805],
(145057153206926436867917708334845130444,145084033851007428646660791831082771964],
(107963567982152736714636832273817259428,107970238065835199457922160357012606207]]
for keyspace foo_bar from any hosts
at org.apache.cassandra.dht.RangeStreamer.fetch(RangeStreamer.java:260)
at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:84)
at
org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:973)
at
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:740)
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:584)
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:481)
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:348)
at
org.apache.cassandra.service.CassandraDaemon.init(CassandraDaemon.java:381)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:212)

Cannot load daemon

Service exit with a return value of 3

Hope you'll be able to help me on this one :)


2014-02-07 19:24 GMT+01:00 Robert Coli rc...@eventbrite.com:

 On Fri, Feb 7, 2014 at 4:41 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 From changelog :


 1.2.15
  * Move handling of migration event source to solve bootstrap race 
 (CASSANDRA-6648)

 Maybe you should give this new version a try, if you suspect your issue is 
 related to CASSANDRA-6648.

 6648 appears to have been introduced in 1.2.14, by :

 https://issues.apache.org/jira/browse/CASSANDRA-6615

 So it should only affect 1.2.14.

 =Rob




Exception in cassandra logs while processing the message

2014-02-14 Thread ankit tyagi
Hello,

I am seeing below exception in my cassandra
logs(/var/log/cassandra/system.log).

INFO [ScheduledTasks:1] 2014-02-13 13:13:57,641 GCInspector.java (line 119)
GC for ParNew: 273 ms for 1 collections, 2319121816 used; max is 4456448000
INFO [ScheduledTasks:1] 2014-02-13 13:14:02,695 GCInspector.java (line 119)
GC for ParNew: 214 ms for 1 collections, 2315368976 used; max is 4456448000
INFO [OptionalTasks:1] 2014-02-13 13:14:08,093 MeteredFlusher.java (line 64)
flushing high-traffic column family CFS(Keyspace='comsdb',
ColumnFamily='product_update') (estimated 213624220 bytes)
INFO [OptionalTasks:1] 2014-02-13 13:14:08,093 ColumnFamilyStore.java
(line 626) Enqueuing flush of Memtable-product_update@1067619242(31239028/
213625108 serialized/live bytes, 222393 ops)
INFO [FlushWriter:94] 2014-02-13 13:14:08,127 Memtable.java (line 400)
Writing Memtable-product_update@1067619242(31239028/213625108 serialized/
live bytes, 222393 ops)
INFO [ScheduledTasks:1] 2014-02-13 13:14:08,696 GCInspector.java (line 119)
GC for ParNew: 214 ms for 1 collections, 2480175160 used; max is 4456448000
INFO [FlushWriter:94] 2014-02-13 13:14:10,836 Memtable.java (line 438)
Completed flushing /cassandra1/data/comsdb/product_update/comsdb-product_update-ic-416-Data.db
(15707248 bytes) for commitlog position
ReplayPosition(segmentId=1391568233618, position=13712751)
ERROR [Thrift:13] 2014-02-13 13:15:45,694 CustomTThreadPoolServer.java
(line 213) Thrift error occurred during processing of message.
org.apache.thrift.TException: Negative length: -2147418111
at
org.apache.thrift.protocol.TBinaryProtocol.checkReadLength(TBinaryProtocol.java:388)
at
org.apache.thrift.protocol.TBinaryProtocol.readBinary(TBinaryProtocol.java:363)
at
org.apache.cassandra.thrift.Cassandra$batch_mutate_args.read(Cassandra.java:20304)
at
org.apache.thrift.ProcessFunction.process(ProcessFunction.java:21)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
at
org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:199)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
ERROR [Thrift:103] 2014-02-13 13:21:25,719 CustomTThreadPoolServer.java
(line 213) Thrift error occurred during processing of message.
org.apache.thrift.TException: Negative length: -2147418111


Below is my cassandra version and hector client version, which is being
used currently.

Cassandra-version: 1.2.11
Hector-client: 1.0-2

Any lead would be appreciated. We are planning to move to Cassandra 2.0
with the java-driver, but that may take some time; meanwhile we need to find
the root cause and resolve this issue.
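One hedged observation, offered as an assumption rather than a diagnosis: the 
reported "Negative length: -2147418111" is suspicious in a useful way. Read as 
32-bit big-endian bytes it is 0x80010001, which matches a Thrift binary-protocol 
message header (the strict-version marker 0x8001 followed by a CALL message 
type) rather than any real length. That pattern typically shows up when the two 
sides disagree on transport, e.g. an unframed client talking to a server that 
expects framed transport, so a message header lands where a frame length was 
expected. The arithmetic can be checked directly:

```python
import struct

# Interpret the reported "negative length" as raw big-endian bytes.
raw = struct.pack(">i", -2147418111)
print(raw.hex())            # 80010001

# 0x8001____ matches TBinaryProtocol's strict-version marker
# (VERSION_1 = 0x80010000), and the low byte 0x01 is the CALL
# message type -- i.e. a header was read where a length belonged.
header = struct.unpack(">I", raw)[0]
print(hex(header & 0xFFFF0000))   # 0x80010000
print(header & 0xFF)              # 1 (CALL)
```

If that reading is right, it would point at a client/server transport or 
protocol mismatch (or a corrupted/oversized request) rather than bad data on 
disk; worth comparing the Thrift transport settings on both ends.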


Regards,
Ankit Tyagi


Expired column showing up

2014-02-14 Thread mahesh rajamani
Hi,

I am using Cassandra version 2.0.2. On a wide row (approx. 1 columns),
I expire a few columns by setting a TTL of 1 second. At times these columns
show up during slice queries.

When I have this issue, running the count and get commands for that row using
cassandra-cli gives different column counts.

But once I run flush and compact, the issue goes away and the expired columns
no longer show up.

Can someone provide some help on this issue?
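What's described is consistent with expired cells still being carried as data 
until compaction purges them: a TTL'd cell stops being live at write_time + ttl, 
but code paths that count or slice before flush/compaction can disagree about 
liveness. A toy model of the expiry rule, purely illustrative and not 
Cassandra's implementation:

```python
import time

class Cell:
    def __init__(self, name, value, write_time, ttl=None):
        self.name, self.value = name, value
        self.write_time, self.ttl = write_time, ttl

    def is_live(self, now):
        # A TTL'd cell is live only until write_time + ttl.
        return self.ttl is None or now < self.write_time + self.ttl

now = time.time()
row = [
    Cell("a", 1, now - 10),          # no TTL: always live
    Cell("b", 2, now - 10, ttl=1),   # TTL 1s, written 10s ago: expired
    Cell("c", 3, now, ttl=60),       # TTL 60s, just written: live
]

live = [c.name for c in row if c.is_live(now)]
print(live)          # ['a', 'c']
# A code path that skips the liveness check would report 3 columns
# instead of 2 -- the kind of disagreement described above.
print(len(row))      # 3
```

Under that model, flush + compact "fixing" things makes sense: compaction 
physically drops the expired cells, after which every code path agrees.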

-- 
Regards,
Mahesh Rajamani


Re: Expired column showing up

2014-02-14 Thread Yogi Nerella
I am just learning and don't know the answer to your question, but what is the
use case for a TTL of 1 second?




On Fri, Feb 14, 2014 at 6:45 AM, mahesh rajamani
rajamani.mah...@gmail.com wrote:

 Hi,

 I am using Cassandra 2.0.2 version. On a wide row (approx. 1 columns),
 I expire few column by setting TTL as 1 second. At times these columns show
 up during slice query.

 When I have this issue, running count and get commands for that row using
 Cassandra cli it gives different column counts.

 But once I run flush and compact, the issue goes off and expired columns
 don't show up.

 Can someone provide some help on this issue.

 --
 Regards,
 Mahesh Rajamani



Re: Bootstrap failure on C* 1.2.13

2014-02-14 Thread Paulo Ricardo Motta Gomes
Hello Alain,

I solved this with a brute force solution, but didn't understand exactly
what happened behind the scenes. What I did was:

a) removed the failed node from the ring with the unsafeAssassinate JMX
option.
b) this caused requests to that node to be routed to the following node
which didn't have the data, so in order to fix the problem I inserted a new
dummy node with the same token as the failed node, but with
autobootstrap=false
c) after the node joined the ring again, I did a clean shutdown with
nodetool -h localhost disablethrift
nodetool -h localhost disablegossip && sleep 10
nodetool -h localhost drain
d) restart the bootstrap process again in the new node.

But in our case, our cluster was not using VNodes, so this workaround will
probably not work with VNodes, since you cannot specify the 256 tokens from
the old node.

This really seems like some kind of metadata inconsistency in gossip, so you
probably should check if your nodetool gossipinfo shows a node that's not
supposed to be in the ring and unsafeAssassinate it. This post has more
info about it: http://nartax.com/2012/09/assassinate-cassandra-node/

But be careful to know what you're doing, as this can be a dangerous
operation.

Good luck!

Cheers,

Paulo




On Fri, Feb 14, 2014 at 11:17 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 Hi Paulo,

 Did you find out how to fix this issue? I am experiencing the exact same
 issue after trying to help you on this exact subject a few days ago :).

 Config : 32 C*1.2.11 nodes, Vnodes enabled, RF=3, 1 DC, On AWS EC2
 m1.xlarge.

 We added a few nodes (4) and it seems that this occurs on one node out of
 two...

 INFO 12:52:16,889 Finished streaming session
 d5e4d014-9558-11e3-950d-cd6aba92807e from /xxx.xxx.xxx.xxx
 java.lang.RuntimeException: Unable to fetch range [...] for keyspace
 foo_bar from any hosts
 [snip -- same token list and stack trace as in the original message above]

 Cannot load daemon

 Service exit with a return value of 3

 Hope you'll be able to help me on this one :)


 2014-02-07 19:24 GMT+01:00 Robert Coli rc...@eventbrite.com:

 On Fri, Feb 7, 2014 at 4:41 AM, Alain RODRIGUEZ arodr...@gmail.com wrote:

 From changelog :



 1.2.15
  * Move handling of migration event source to solve bootstrap race 
 (CASSANDRA-6648)

  Maybe you should give this new version a try, if you suspect your issue is 
  related to CASSANDRA-6648.

 6648 appears to have been introduced in 1.2.14, by :

 https://issues.apache.org/jira/browse/CASSANDRA-6615

 So it should only affect 1.2.14.

 =Rob






Re: Expired column showing up

2014-02-14 Thread Edward Capriolo
You should upgrade. Cassandra 2.0.2 is not the latest version. If you still
have the problem after upgrading, report a bug.


On Fri, Feb 14, 2014 at 12:50 PM, Yogi Nerella ynerella...@gmail.com wrote:

 I am just learning, I don't know answer to your question, but What is the
 use case for TTL as 1 second?




 On Fri, Feb 14, 2014 at 6:45 AM, mahesh rajamani 
 rajamani.mah...@gmail.com wrote:

 Hi,

 I am using Cassandra 2.0.2 version. On a wide row (approx. 1
 columns), I expire few column by setting TTL as 1 second. At times these
 columns show up during slice query.

 When I have this issue, running count and get commands for that row using
 Cassandra cli it gives different column counts.

 But once I run flush and compact, the issue goes off and expired columns
 don't show up.

 Can someone provide some help on this issue.

 --
 Regards,
 Mahesh Rajamani





Re: Intermittent long application pauses on nodes

2014-02-14 Thread Frank Ng
Sorry, I have not had a chance to file a JIRA ticket.  We have not been
able to resolve the issue.  But since Joel mentioned that upgrading to
Cassandra 2.0.x solved it for them, we may need to upgrade.  We are
currently on Java 1.7 and Cassandra 1.2.8.



On Thu, Feb 13, 2014 at 12:40 PM, Keith Wright kwri...@nanigans.com wrote:

 You're running 2.0.* in production?  May I ask what C* version and OS?
  Any hardware details would be appreciated as well.  Thx!

 From: Joel Samuelsson samuelsson.j...@gmail.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Thursday, February 13, 2014 at 11:39 AM

 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: Intermittent long application pauses on nodes

 We have had similar issues and upgrading C* to 2.0.x and Java to 1.7 seems
 to have helped our issues.


 2014-02-13 Keith Wright kwri...@nanigans.com:

 Frank, did you ever file a ticket for this issue or find the root cause?
  I believe we are seeing the same issues when attempting to bootstrap.

 Thanks

 From: Robert Coli rc...@eventbrite.com
 Reply-To: user@cassandra.apache.org user@cassandra.apache.org
 Date: Monday, February 3, 2014 at 6:10 PM
 To: user@cassandra.apache.org user@cassandra.apache.org
 Subject: Re: Intermittent long application pauses on nodes

 On Mon, Feb 3, 2014 at 8:52 AM, Benedict Elliott Smith 
 belliottsm...@datastax.com wrote:


 It's possible that this is a JVM issue, but if so there may be some
 remedial action we can take anyway. There are some more flags we should
 add, but we can discuss that once you open a ticket. If you could include
 the strange JMX error as well, that might be helpful.


 It would be appreciated if you could inform this thread of the JIRA
 ticket number, for the benefit of the community and google searchers. :)

 =Rob





Re: Expired column showing up

2014-02-14 Thread horschi
Hi Mahesh,

is it possible you are creating columns with a long TTL, then updating these
columns with a smaller TTL?
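The scenario behind that question can be modeled with last-write-wins 
reconciliation (a toy sketch, not Cassandra's code): the cell with the newer 
timestamp shadows the older one, TTL and all, so whether the short TTL actually 
takes effect depends entirely on the timestamps.

```python
def reconcile(a, b):
    """Last-write-wins: the cell with the newer timestamp shadows
    the older one, carrying its own TTL (toy model)."""
    return a if a["ts"] >= b["ts"] else b

long_lived  = {"value": 1, "ts": 100, "ttl": 86400}  # written first
short_lived = {"value": 2, "ts": 200, "ttl": 1}      # update, small TTL

winner = reconcile(long_lived, short_lived)
print(winner["ttl"])   # 1 -- the update's short TTL governs expiry

# But if clocks (or client-supplied timestamps) run backwards, the
# update loses and the long TTL silently survives:
stale_update = {"value": 3, "ts": 50, "ttl": 1}
print(reconcile(long_lived, stale_update)["ttl"])    # 86400
```

That second case is one way a column you believe you expired can keep showing 
up until the SSTables holding both versions are compacted together.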

kind regards,
Christian


On Fri, Feb 14, 2014 at 3:45 PM, mahesh rajamani
rajamani.mah...@gmail.com wrote:

 Hi,

 I am using Cassandra 2.0.2 version. On a wide row (approx. 1 columns),
 I expire few column by setting TTL as 1 second. At times these columns show
 up during slice query.

 When I have this issue, running count and get commands for that row using
 Cassandra cli it gives different column counts.

 But once I run flush and compact, the issue goes off and expired columns
 don't show up.

 Can someone provide some help on this issue.

 --
 Regards,
 Mahesh Rajamani



Re: Expired column showing up

2014-02-14 Thread Jacob Rhoden
It is my understanding that rows with TTLs don't mix well with rows that don't 
have TTLs, i.e. they should all have a TTL or all not have one.

That said, if you can create a small Java class (test case) that demonstrates 
the problem, I'm happy to try it out on 2.0.5. The code can be attached to a 
JIRA ticket if needed.

__
Sent from iPhone

 On 15 Feb 2014, at 1:45 am, mahesh rajamani rajamani.mah...@gmail.com wrote:
 
 Hi,
 
 I am using Cassandra 2.0.2 version. On a wide row (approx. 1 columns),
 I expire few column by setting TTL as 1 second. At times these columns show
 up during slice query.
 
 When I have this issue, running count and get commands for that row using
 Cassandra cli it gives different column counts.
 
 But once I run flush and compact, the issue goes off and expired columns
 don't show up.
 
 Can someone provide some help on this issue.
 
 -- 
 Regards,
 Mahesh Rajamani


Re: Bootstrap failure on C* 1.2.13

2014-02-14 Thread Robert Coli
On Fri, Feb 14, 2014 at 10:08 AM, Paulo Ricardo Motta Gomes 
paulo.mo...@chaordicsystems.com wrote:

 But in our case, our cluster was not using VNodes, so this workaround will
 probably not work with VNodes, since you cannot specify the 256 tokens from
 the old node.


Sure you can, in a comma delimited list. I plan to write a short blog post
about this, but...

I recommend that anyone using Cassandra, vnodes or not, always explicitly
populate their initial_token line in cassandra.yaml. There are a number of
cases where you will lose if you do not do so, and AFAICT no cases where
you lose by doing so.

If one is using vnodes and wants to do this, the process goes like :

1) set num_tokens to the desired number of vnodes
2) start node/bootstrap
3) use a one liner like jeffj's :

nodetool info -T | grep ^Token | awk '{ print $3 }' | tr \\n , | sed -e
's/,$/\n/'

to get a comma delimited list of the vnode tokens
4) insert this comma delimited list in initial_token, and comment out
num_tokens (though it is a NOOP)
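For anyone without the shell tools handy, jeffj's one-liner in step 3 can be 
mirrored in Python. This assumes `nodetool info -T` prints one `Token : <value>` 
line per vnode token (the sample output below is hypothetical):

```python
def initial_token_line(info_output: str) -> str:
    """Collect vnode tokens from `nodetool info -T` output into a
    comma-delimited string suitable for the initial_token setting.
    Mirrors: grep ^Token | awk '{ print $3 }' joined with commas."""
    tokens = [line.split()[2]
              for line in info_output.splitlines()
              if line.startswith("Token")]
    return ",".join(tokens)

# Hypothetical sample output (a real run lists num_tokens lines):
sample = (
    "ID     : 3a7fde73-b1ca-4503-b5f2-b4cd9b41032c\n"
    "Token  : -9106127198214592799\n"
    "Token  : -8904468851208093071\n"
    "Token  : 1203531468375828032\n"
)
print(initial_token_line(sample))
# -9106127198214592799,-8904468851208093071,1203531468375828032
```

The resulting string is what goes on the initial_token line in cassandra.yaml 
in step 4.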

=Rob


Re: supervisord and cassandra

2014-02-14 Thread David Montgomery
Hi,

Now using Oracle Java 7, and commented out the StringTableSize=103 line.

Same issue, but nothing in the log file now.

It works when I start from the command line, though.


Thanks





On Fri, Feb 14, 2014 at 9:48 AM, Michael Shuler mich...@pbandjelly.org wrote:

 On 02/13/2014 07:03 PM, David Montgomery wrote:

 I only added the -f flag after the first time it did not work.  If I
 don't use the -f flag:

 cassandra_server:cassandra   FATAL  Exited too quickly (process
 log may have details)


 From your original message:


  Unrecognized VM option 'StringTableSize=103'
  Could not create the Java virtual machine.

 Comment out the -XX:StringTableSize=103 line in conf/cassandra-env.sh
 and see what happens.


  java version 1.7.0_25
  OpenJDK Runtime Environment (IcedTea 2.3.10)
 (7u25-2.3.10-1ubuntu0.12.04.2)

 Use Oracle's JVM and see what happens.

 --
 Michael



Re: supervisord and cassandra

2014-02-14 Thread Michael Shuler

On 02/14/2014 06:58 PM, David Montgomery wrote:

Hi,

Using now oracle 7.  commented out the line StringTableSize=103
same issue.  but nothing in the log file now.

but I start from, the command line the works.


What user are you running c* with, when running from the command line? 
What user is running c* via supervisord?


--
Michael



Re: supervisord and cassandra

2014-02-14 Thread Michael Shuler

On 02/14/2014 07:34 PM, Michael Shuler wrote:

On 02/14/2014 06:58 PM, David Montgomery wrote:

Hi,

Using now oracle 7.  commented out the line StringTableSize=103
same issue.  but nothing in the log file now.

but I start from, the command line the works.


What user are you running c* with, when running from the command line?
What user is running c* via supervisord?


So you piqued my interest, and I tried supervisord in a VM.  I think you 
need to probably go hit up the supervisord community with some "how do I 
do this correctly" questions.


Attached a console log and the conf I used.  Here's what I did:

- installed c* 2.0.5 with /var/{lib,log}/cassandra owned by my user, as 
usual

- verified c* runs fine from the command line
- killed c*
- installed supervisor package and added the attached conf
- stopped/started supervisord to pick up the new conf
- c* is running fine and nodetool confirms
- supervisorctl status shows ignorance of c* running (wrong config, I 
assume)

- stopped supervisord, c* still running (not sure if this is normal..)

I have never played with supervisord.  It's interesting, but my guess is 
there is some additional magic needed by supervisor experts to help you 
with a properly behaving configuration.


Good luck and do report back with a good config for the archives!

--
Kind regards,
Michael
mshuler@debian:~$ ps axu|grep [j]ava
mshuler@debian:~$ 
mshuler@debian:~$ sudo invoke-rc.d supervisor start
Starting supervisor: supervisord.
mshuler@debian:~$ 
mshuler@debian:~$ ps axu|grep [j]ava
mshuler   5313 75.6 16.1 1053044 166152 ?  Sl   19:58   0:03 java -ea 
-javaagent:/opt/cassandra/bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities 
-XX:ThreadPriorityPolicy=42 -Xms501M -Xmx501M -Xmn100M 
-XX:+HeapDumpOnOutOfMemoryError -Xss256k -XX:StringTableSize=103 
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled 
-XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1 
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly 
-XX:+UseTLAB -XX:+UseCondCardMark -Djava.net.preferIPv4Stack=true 
-Dcom.sun.management.jmxremote.port=7199 
-Dcom.sun.management.jmxremote.ssl=false 
-Dcom.sun.management.jmxremote.authenticate=false 
-Dlog4j.configuration=log4j-server.properties -Dlog4j.defaultInitOverride=true 
-cp 
/opt/cassandra/bin/../conf:/opt/cassandra/bin/../build/classes/main:/opt/cassandra/bin/../build/classes/thrift:/opt/cassandra/bin/../lib/antlr-3.2.jar:/opt/cassandra/bin/../lib/apache-cassandra-2.0.5.jar:/opt/cassandra/bin/../lib/apache-cassandra-clientutil-2.0.5.jar:/opt/cassandra/bin/../lib/apache-cassandra-thrift-2.0.5.jar:/opt/cassandra/bin/../lib/commons-cli-1.1.jar:/opt/cassandra/bin/../lib/commons-codec-1.2.jar:/opt/cassandra/bin/../lib/commons-lang3-3.1.jar:/opt/cassandra/bin/../lib/compress-lzf-0.8.4.jar:/opt/cassandra/bin/../lib/concurrentlinkedhashmap-lru-1.3.jar:/opt/cassandra/bin/../lib/disruptor-3.0.1.jar:/opt/cassandra/bin/../lib/guava-15.0.jar:/opt/cassandra/bin/../lib/high-scale-lib-1.1.2.jar:/opt/cassandra/bin/../lib/jackson-core-asl-1.9.2.jar:/opt/cassandra/bin/../lib/jackson-mapper-asl-1.9.2.jar:/opt/cassandra/bin/../lib/jamm-0.2.5.jar:/opt/cassandra/bin/../lib/jbcrypt-0.3m.jar:/opt/cassandra/bin/../lib/jline-1.0.jar:/opt/cassandra/bin/../lib/json-simple-1.1.jar:/opt/cassandra/bin/../lib/libthrift-0.9.1.jar:/opt/cassandra/bin/../lib/log4j-1.2.16.jar:/opt/cassandra/bin/../lib/lz4-1.2.0.jar:/opt/cassandra/bin/../lib/metrics-core-2.2.0.jar:/opt/cassandra/bin/../lib/netty-3.6.6.Final.jar:/opt/cassandra/bin/../lib/reporter-config-2.1.0.jar:/opt/cassandra/bin/../lib/servlet-api-2.5-20081211.jar:/opt/cassandra/bin/../lib/slf4j-api-1.7.2.jar:/opt/cassandra/bin/../lib/slf4j-log4j12-1.7.2.jar:/opt/cassandra/bin/../lib/snakeyaml-1.11.jar:/opt/cassandra/bin/../lib/snappy-java-1.0.5.jar:/opt/cassandra/bin/../lib/snaptree-0.1.jar:/opt/cassandra/bin/../lib/thrift-server-0.3.3.jar
 org.apache.cassandra.service.CassandraDaemon
mshuler@debian:~$ 
mshuler@debian:~$ sudo supervisorctl status
cassandra_server:cassandra   FATAL  Exited too quickly (process log may 
have details)
mshuler@debian:~$ 
mshuler@debian:~$ /opt/cassandra/bin/nodetool status
Datacenter: datacenter1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  AddressLoad   Tokens  Owns (effective)  Host ID 
  Rack
UN  127.0.0.1  114.22 KB  256 100.0%
3a7fde73-b1ca-4503-b5f2-b4cd9b41032c  rack1
mshuler@debian:~$ 
mshuler@debian:~$ killall java
mshuler@debian:~$ 
mshuler@debian:~$ ps axu|grep [j]ava
mshuler@debian:~$ 
mshuler@debian:~$ ps axu|grep [j]ava
mshuler@debian:~$ 
mshuler@debian:~$ date
Fri Feb 14 20:00:09 CST 2014
mshuler@debian:~$ 
mshuler@debian:~$ ps axu|grep [j]ava
mshuler@debian:~$ 
mshuler@debian:~$ cat /tmp/cassandra.*
mshuler@debian:~$ 
mshuler@debian:~$ grep cassandra /var/log/supervisor/supervisord.log
2014-02-14 19:54:11,521 WARN Included extra file 

Re: supervisord and cassandra

2014-02-14 Thread Michael Shuler

On 02/14/2014 08:10 PM, Michael Shuler wrote:

Attached a console log and the conf I used.  Here's what I did:

- installed c* 2.0.5 with /var/{lib,log}/cassandra owned by my user, as
usual
- verified c* runs fine from the command line
- killed c*
- installed supervisor package and added the attached conf
- stopped/started supervisord to pick up the new conf
- c* is running fine and nodetool confirms
- supervisorctl status shows ignorance of c* running (wrong config, I
assume)


I missed a couple steps in here:

- killed c* and supervisor never restarted it
- restarted supervisor service, which starts up c* fine
  (not in my console paste)


- stopped supervisord, c* still running (not sure if this is normal..)

  (not in my console paste)

Anyway, let us know how this works out!

--
Michael


Re: supervisord and cassandra

2014-02-14 Thread Michael Shuler

On 02/14/2014 08:10 PM, Michael Shuler wrote:

mshuler@debian:~$ sudo supervisorctl status
cassandra_server:cassandra   FATAL  Exited too quickly (process log may 
have details)


I imagine the problems all stem from the fact that the launcher script, 
in my case /opt/cassandra/bin/cassandra, is executed, forks the JVM, and 
exits as soon as it has started c* (hence Exited too quickly), while the 
process that actually needs to be supervised is the java process (thus 
supervisord neither knows c* is running nor notices when it is killed).
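A tiny shell sketch of that daemonizing-launcher problem (the script path and 
sleep stand-in are hypothetical, for illustration only):

```shell
# Mimic what bin/cassandra does by default: background the real process
# and return immediately. A supervisor only ever sees the short-lived
# launcher, so it reports "Exited too quickly" and loses track of the child.
cat > /tmp/fake_launcher.sh <<'EOF'
#!/bin/sh
sleep 1 &   # stand-in for the detached java process
exit 0      # launcher exits at once
EOF
sh /tmp/fake_launcher.sh
echo "launcher exit status: $?"   # prints: launcher exit status: 0
```

The launcher reports success and is gone, while the backgrounded child lives 
on unsupervised, which is exactly the symptom described above.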


--
Michael


Re: supervisord and cassandra

2014-02-14 Thread Michael Shuler

On 02/14/2014 08:27 PM, Michael Shuler wrote:

On 02/14/2014 08:10 PM, Michael Shuler wrote:

mshuler@debian:~$ sudo supervisorctl status
cassandra_server:cassandra   FATAL  Exited too quickly
(process log may have details)


I imagine the problems all stem from the fact that the launcher script,
in my case /opt/cassandra/bin/cassandra, is executed, forks the JVM, and
exits as soon as it has started c* (hence Exited too quickly), while the
process that actually needs to be supervised is the java process (thus
supervisord neither knows c* is running nor notices when it is killed).


Yup.
https://lists.supervisord.org/pipermail/supervisor-users/2012-December/001207.html

--
Michael


Re: supervisord and cassandra

2014-02-14 Thread Michael Shuler

On 02/14/2014 08:32 PM, Michael Shuler wrote:

On 02/14/2014 08:27 PM, Michael Shuler wrote:

On 02/14/2014 08:10 PM, Michael Shuler wrote:

mshuler@debian:~$ sudo supervisorctl status
cassandra_server:cassandra   FATAL  Exited too quickly
(process log may have details)


I imagine the problems all stem from the fact that the launcher script,
in my case /opt/cassandra/bin/cassandra, is executed, forks the JVM, and
exits as soon as it has started c* (hence Exited too quickly), while the
process that actually needs to be supervised is the java process (thus
supervisord neither knows c* is running nor notices when it is killed).


Yup.
https://lists.supervisord.org/pipermail/supervisor-users/2012-December/001207.html


(Self reply again..)

with cassandra -f, which is not being backgrounded in the exec line of 
the script, my conf works:


mshuler@debian:~$ sudo supervisorctl status
cassandra_server:cassandra   RUNNINGpid 2784, uptime 0:00:23
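For the archives, a program section along these lines should reproduce the 
working setup above. This is an illustrative sketch, not the poster's attached 
conf (which is not preserved here); the paths, user, and log file names are 
assumptions based on the thread:

```ini
; /etc/supervisor/conf.d/cassandra.conf -- hypothetical reconstruction
[group:cassandra_server]
programs=cassandra

[program:cassandra]
; -f keeps Cassandra in the foreground, so supervisord tracks the JVM itself
; rather than a launcher script that forks and exits
command=/opt/cassandra/bin/cassandra -f
user=mshuler
autostart=true
autorestart=true
; matches the "stayed up for > than 15 seconds (startsecs)" log lines
startsecs=15
stdout_logfile=/var/log/cassandra/supervisor-stdout.log
stderr_logfile=/var/log/cassandra/supervisor-stderr.log
```

The key detail is the -f flag on the command line; everything else is ordinary 
supervisord boilerplate.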

mshuler@debian:~$ pkill java
mshuler@debian:~$ ps axu|grep java
mshuler   2988  0.0  0.0   7828   876 pts/0S+   20:45   0:00 grep java
mshuler@debian:~$ ps axu|grep java
mshuler   2989 18.1 16.4 1056412 168492 ?  Sl   20:45   0:04 java 
-ea -javaagent:/opt/cassandra/bin/../lib/jamm-0.2.5.jar 
-XX:+UseThreadPriorities ...


and in the supervisor log:

2014-02-14 20:42:23,067 INFO daemonizing the supervisord process
2014-02-14 20:42:23,067 INFO supervisord started with pid 2777
2014-02-14 20:42:24,072 INFO spawned: 'cassandra' with pid 2784
2014-02-14 20:42:39,302 INFO success: cassandra entered RUNNING state, 
process has stayed up for > than 15 seconds (startsecs)
2014-02-14 20:45:57,131 INFO exited: cassandra (exit status 143; not 
expected)

2014-02-14 20:45:58,134 INFO spawned: 'cassandra' with pid 2989
2014-02-14 20:46:13,241 INFO success: cassandra entered RUNNING state, 
process has stayed up for > than 15 seconds (startsecs)


--
Michael



Re: supervisord and cassandra

2014-02-14 Thread Michael Shuler
So.. see the rest of my replies for a working configuration, but I 
wanted to reply to your initial post.


What problem are you trying to solve, and why do you think using 
supervisord to restart a failed c* node will help?


You really don't want a node to be bouncing up and down.  A dead or 
dying node should stay down until you can troubleshoot *why* it is 
dead or dying and determine whether it should be replaced by a new node, or 
has a repairable issue that will allow you to rejoin it to the ring.


--
Kind regards,
Michael


Re: supervisord and cassandra

2014-02-14 Thread David Montgomery
I had to give up on supervisor.  I installed the deb package rather than
from source.  That worked, though.

thanks


On Sat, Feb 15, 2014 at 10:10 AM, Michael Shuler mich...@pbandjelly.org wrote:

 On 02/14/2014 07:34 PM, Michael Shuler wrote:

 On 02/14/2014 06:58 PM, David Montgomery wrote:

 Hi,

 Now using Oracle 7.  I commented out the StringTableSize=103 line;
 same issue, but nothing in the log file now.

 But when I start from the command line, it works.


 What user are you running c* with, when running from the command line?
 What user is running c* via supervisord?


 So you piqued my interest, and I tried supervisord in a vm.  I think you
 probably need to go hit up the supervisord community for some how do I do
 this correctly questions.

 Attached a console log and the conf I used.  Here's what I did:

 - installed c* 2.0.5 with /var/{lib,log}/cassandra owned by my user, as
 usual
 - verified c* runs fine from the command line
 - killed c*
 - installed supervisor package and added the attached conf
 - stopped/started supervisord to pick up the new conf
 - c* is running fine and nodetool confirms
 - supervisorctl status shows ignorance of c* running (wrong config, I
 assume)
 - stopped supervisord, c* still running (not sure if this is normal..)

 I have never played with supervisord.  It's interesting, but my guess is
 there is some additional magic needed by supervisor experts to help you
 with a properly behaving configuration.

 Good luck and do report back with a good config for the archives!

 --
 Kind regards,
 Michael