Fwd: Re: virtual memory of all cassandra-nodes is growing extremly since Cassandra 1.1.0

2012-08-01 Thread Thomas Spengler
we monitor the standard *nix stuff via zabbix,
and Cassandra, like all our other Java services, via MBeans and zabbix.
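For illustration, here is a minimal sketch of the kind of JMX read a zabbix Java poller performs against a node; the host name "node1" and the JMX port 7199 are assumptions, not our actual config:

import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class HeapPoll {
    public static void main(String[] args) throws Exception {
        // Connect to the Cassandra JVM over RMI/JMX (host and port assumed)
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://node1:7199/jmxrmi");
        JMXConnector jmxc = JMXConnectorFactory.connect(url);
        MBeanServerConnection mbsc = jmxc.getMBeanServerConnection();

        // Proxy the standard java.lang:type=Memory MBean and read the heap
        MemoryMXBean mem = ManagementFactory.newPlatformMXBeanProxy(
                mbsc, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
        MemoryUsage heap = mem.getHeapMemoryUsage();
        System.out.printf("heap used=%d max=%d%n", heap.getUsed(), heap.getMax());
        jmxc.close();
    }
}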

Best
Tom


 Original Message 
Subject: Re: virtual memory of all cassandra-nodes is growing extremly
since Cassandra 1.1.0
Date: Wed, 1 Aug 2012 14:43:17 -0500
From: Greg Fausak 
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org

Mina,

Thanks for that post.  Very interesting :-)

What sort of things are you graphing?  Standard *nix stuff
(mem/cpu/etc)?  Or do you
have some hooks into the C* process (I saw something about port 1414
in the .yaml file)?

Best,

-g


On Thu, Jul 26, 2012 at 9:27 AM, Mina Naguib
 wrote:
>
> Hi Thomas
>
> On a modern 64bit server, I recommend you pay little attention to the virtual 
> size.  It's made up of almost everything within the process's address space, 
> including on-disk files mmap()ed in for zero-copy access.  It's not 
> unreasonable for a machine with N amount of RAM to have a process whose virtual 
> size is several times the value of N.  That in and of itself is not 
> problematic.
>
> In a default cassandra 1.1.x setup, the bulk of that will be your sstables' 
> data and index files.  On linux you can invoke the "pmap" tool on the 
> cassandra process's PID to see what's in there.  Much of it will be anonymous 
> memory allocations (the JVM heap itself, off-heap data structures, etc), but 
> lots of it will be references to files on disk (binaries, libraries, mmap()ed 
> files, etc).
>
> What's more important to keep an eye on is the JVM heap - typically 
> statically allocated to a fixed size at cassandra startup.  You can get info 
> about its used/capacity values via "nodetool -h localhost info".  You can 
> also hook up jconsole and trend it over time.
>
> The other critical piece is the process's RESident memory size, which 
> includes the JVM heap but also other off-heap data structures and 
> miscellanea.  Cassandra has recently been making more use of off-heap 
> structures (for example, row caching via SerializingCacheProvider).  This is 
> done as a matter of efficiency - a serialized off-heap row is much smaller 
> than a classical object sitting in the JVM heap - so you can do more with 
> less.
>
> Unfortunately, in my experience, it's not perfect.  They still have a cost, 
> in terms of on-heap usage, as well as off-heap growth over time.
>
> Specifically, my experience with cassandra 1.1.0 showed that off-heap row 
> caches incurred a very high on-heap cost (ironic) - see my post at 
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3c6feb097f-287b-471d-bea2-48862b30f...@bloomdigital.com%3E
>  - as documented in that email, I managed that with regularly scheduled full 
> GC runs via System.gc()
>
> I have, since then, moved away from scheduled System.gc() to scheduled row 
> cache invalidations.  While this had the same effect as System.gc() I 
> described in my email, it eliminated the 20-30 second pause associated with 
> it.  It did however introduce (or maybe I never noticed it earlier) a slow creep 
> in memory usage outside of the heap.
>
> It's typical in my case for example for a process configured with 6G of JVM 
> heap to start up, stabilize at 6.5 - 7GB RESident usage, then creep up slowly 
> throughout a week to 10-11GB range.  Depending on what else the box is doing, 
> I've experienced the linux OOM killer killing cassandra as you've described, 
> or heavy swap usage bringing everything down (we're latency-sensitive), etc..
>
> And now for the good news.  Since I've upgraded to 1.1.2:
> 1. There's no more need for regularly scheduled System.gc()
> 2. There's no more need for regularly scheduled row cache invalidation
> 3. The HEAP usage within the JVM is stable over time
> 4. The RESident size of the process appears also stable over time
>
> Point #4 above is still pending as I only have 3 day graphs since the 
> upgrade, but they show promising results compared to the slope of the same 
> graph before the upgrade to 1.1.2
>
> So my advice is give 1.1.2 a shot - just be mindful of 
> https://issues.apache.org/jira/browse/CASSANDRA-4411
>
>
> On 2012-07-26, at 2:18 AM, Thomas Spengler wrote:
>
>> I saw this.
>>
>> All works fine up to version 1.1.0:
>> the 0.8.x takes 5GB of memory on an 8GB machine,
>> the 1.0.x takes between 6 and 7 GB on an 8GB machine,
>> and
>> the 1.1.0 takes it all
>>
>> and it is a problem.
>> For me it is no solution to wait for the OOM killer from the linux kernel
>> and restart the cassandra process.
>>
>> When my machine has less than 100MB of RAM available, then I have a problem.
>>
>>
>>
>> On 07/25/2012 07:06 PM, Tyler Hobbs wrote:
>>> Are you actually seeing any problems from this? High virtual memory usage
>>> on its own really doesn't mean anything. See
>>> http://wiki.apache.org/cassandra/FAQ#mmap
>>>
>>> On Wed, Jul 25, 2012 at 1:21 AM, Thomas Spengler <
>>> thomas.speng...@toptarif.de> wrote:
>>>
 No one has any idea?

 

Re: Does Cassandra support operations in a transaction?

2012-08-01 Thread Jeffrey Kesselman
True consistency, btw, is pretty much only possible in a transactional
environment.

On Thu, Aug 2, 2012 at 12:56 AM, Jeffrey Kesselman  wrote:

> Roshni,
>
> That's not what consistency in ACID means.  It's not consistency of reading
> the same data, it's referential integrity between related pieces of data.
>
> "Consistency
> Data is in a consistent state when a transaction starts and when it ends. For
> example, in an application that transfers funds from one account to
> another, the consistency property ensures that the total value of funds in
> both the accounts is the same at the start and end of each transaction. "
>
> http://publib.boulder.ibm.com/infocenter/cicsts/v3r2/index.jsp?topic=%2Fcom.ibm.cics.ts.productoverview.doc%2Fconcepts%2Facid.html
>
> A lot of people in the NoSql world use the term "consistency" when what
> they mean is "durability."
>
> " Durability After a transaction successfully completes, changes to data
> persist and are not undone, even in the event of a system failure. "
>
> Many NoSql databases (including Cassandra) are eventually durable, in the
> sense that a read immediately after a write may not reflect that write,
> but at some later point, it will.
>
> None provide true consistency that I am aware of.
>
>
>
>
> On Thu, Aug 2, 2012 at 12:24 AM, Roshni Rajagopal <
> roshni.rajago...@wal-mart.com> wrote:
>
>> Hi Ivan,
>>
>> Cassandra supports 'tunable consistency'.  If you always read and write
>> at quorum (or local quorum for multi-data-center), you can
>> guarantee that the results will be consistent, in that all the replicas will be
>> compared, the latest data will be returned, and no data will be out of date.
>> This comes at a cost in performance: it is fastest to just read and write
>> at one node rather than check a quorum of nodes.
>>
>> What you choose depends on what your application's needs are. Is it ok if
>> some users receive out-of-date data (it isn't earth-shattering if someone
>> doesn't know what you're eating right now), or is it a banking transaction
>> system where all entities must be consistently updated?
>>
>> So designing in cassandra prioritizes de-normalization. You cannot rely on
>> referential integrity to keep 2 tables (column families in cassandra) in sync
>> the way a relational database does with foreign keys. The
>> application needs to ensure that all data in column families is accurate
>> and not out of sync, because data elements may be duplicated in different
>> column families.
>>
>>
>> You cannot update 2 different entities and ensure that changes to both will
>> be applied together and only then become visible to others.
>>
>>
>> Regards,
>>
>>
>> From: Jeffrey Kesselman <jef...@gmail.com>
>> Reply-To: user@cassandra.apache.org
>> To: user@cassandra.apache.org
>> Subject: Re: Does Cassandra support operations in a transaction?
>>
>> Short story is that few if any of the NoSql systems support transactions
>> natively.  That's one of the big compromises they make.  What they call
>> "eventual consistency" is actually eventual durability in ACID terms.
>>
>> Consistency, as meant by the C in ACID, is not guaranteed at all.
>>
>> On Wed, Aug 1, 2012 at 6:21 AM, Ivan Jiang <wiwi1...@gmail.com> wrote:
>> Hi,
>> I am new to Cassandra, and I wonder whether it is possible to call Cassandra
>> within one transaction, as in a relational DB.
>>
>> Thanks in advance.
>>
>> Best Regards,
>> Ivan Jiang
>>
>>
>>
>> --
>> It's always darkest just before you are eaten by a grue.
>>
>>
>
>
>
> --
> It's always darkest just before you are eaten by a grue.
>



-- 
It's always darkest just before you are eaten by a grue.


Re: Does Cassandra support operations in a transaction?

2012-08-01 Thread Jeffrey Kesselman
Roshni,

That's not what consistency in ACID means.  It's not consistency of reading
the same data, it's referential integrity between related pieces of data.

"Consistency
Data is in a consistent state when a transaction starts and when it ends. For
example, in an application that transfers funds from one account to
another, the consistency property ensures that the total value of funds in
both the accounts is the same at the start and end of each transaction. "
http://publib.boulder.ibm.com/infocenter/cicsts/v3r2/index.jsp?topic=%2Fcom.ibm.cics.ts.productoverview.doc%2Fconcepts%2Facid.html

A lot of people in the NoSql world use the term "consistency" when what
they mean is "durability."

" Durability After a transaction successfully completes, changes to data
persist and are not undone, even in the event of a system failure. "

Many NoSql databases (including Cassandra) are eventually durable, in the
sense that a read immediately after a write may not reflect that write,
but at some later point, it will.

None provide true consistency that I am aware of.




On Thu, Aug 2, 2012 at 12:24 AM, Roshni Rajagopal <
roshni.rajago...@wal-mart.com> wrote:

> Hi Ivan,
>
> Cassandra supports 'tunable consistency'.  If you always read and write at
> quorum (or local quorum for multi-data-center), you can
> guarantee that the results will be consistent, in that all the replicas will be
> compared, the latest data will be returned, and no data will be out of date.
> This comes at a cost in performance: it is fastest to just read and write
> at one node rather than check a quorum of nodes.
>
> What you choose depends on what your application's needs are. Is it ok if
> some users receive out-of-date data (it isn't earth-shattering if someone
> doesn't know what you're eating right now), or is it a banking transaction
> system where all entities must be consistently updated?
>
> So designing in cassandra prioritizes de-normalization. You cannot rely on
> referential integrity to keep 2 tables (column families in cassandra) in sync
> the way a relational database does with foreign keys. The
> application needs to ensure that all data in column families is accurate
> and not out of sync, because data elements may be duplicated in different
> column families.
>
>
> You cannot update 2 different entities and ensure that changes to both will
> be applied together and only then become visible to others.
>
>
> Regards,
>
>
> From: Jeffrey Kesselman <jef...@gmail.com>
> Reply-To: user@cassandra.apache.org
> To: user@cassandra.apache.org
> Subject: Re: Does Cassandra support operations in a transaction?
>
> Short story is that few if any of the NoSql systems support transactions
> natively.  That's one of the big compromises they make.  What they call
> "eventual consistency" is actually eventual durability in ACID terms.
>
> Consistency, as meant by the C in ACID, is not guaranteed at all.
>
> On Wed, Aug 1, 2012 at 6:21 AM, Ivan Jiang <wiwi1...@gmail.com> wrote:
> Hi,
> I am new to Cassandra, and I wonder whether it is possible to call Cassandra
> within one transaction, as in a relational DB.
>
> Thanks in advance.
>
> Best Regards,
> Ivan Jiang
>
>
>
> --
> It's always darkest just before you are eaten by a grue.
>
>



-- 
It's always darkest just before you are eaten by a grue.


Re: Looking for a good Ruby client

2012-08-01 Thread Thorsten von Eicken
Harry, we're in a similar situation and are starting to work out our own
ruby client. The biggest issue is that it doesn't make much sense to
build a higher level abstraction on anything other than CQL3, given
where things are headed. At least this is our opinion.
At the same time, CQL3 is just barely becoming usable and still seems
rather deficient for wide-row usage. The tricky part is that with the
current CQL3 you have to construct quite complex iterators to retrieve a
large result set, which means that you end up having to either parse
incoming CQL3 to insert the iteration logic, or pass CQL3
fragments in and compose them together with iterator clauses. Not fun
either way.
The only good solution I see is to switch to a streaming protocol (or
build some form of "continue" on top of thrift) such that the client can
ask for a huge result set and the cassandra coordinator can break it
into sub-queries as it sees fit and return results chunk-by-chunk. If
this is really the path forward then all abstractions built above CQL3
before that will either have a good piece of complex code that can be
deleted or worse, will have an interface that is no longer best practice.
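For what it's worth, a minimal Hector sketch of that hand-rolled "continue" over thrift, paging one wide row by restarting each slice at the last column seen; the column family, key, and string comparators are assumptions:

import java.util.List;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

public class WideRowPager {
    public static void pageRow(Keyspace ks, String cf, String rowKey) {
        final int PAGE = 100;
        StringSerializer ss = StringSerializer.get();
        String start = "";
        while (true) {
            SliceQuery<String, String, String> q =
                    HFactory.createSliceQuery(ks, ss, ss, ss);
            q.setColumnFamily(cf).setKey(rowKey);
            // the start column is inclusive, so ask for one extra per page
            q.setRange(start, "", false, PAGE + 1);
            List<HColumn<String, String>> cols = q.execute().get().getColumns();
            int skip = start.length() == 0 ? 0 : 1; // drop the repeated start column
            for (int i = skip; i < cols.size(); i++) {
                HColumn<String, String> c = cols.get(i);
                System.out.println(c.getName() + " = " + c.getValue());
            }
            if (cols.size() < PAGE + 1) {
                break; // short page: we've reached the end of the row
            }
            start = cols.get(cols.size() - 1).getName();
        }
    }
}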
Good luck!
Thorsten


On 8/1/2012 1:47 PM, Harry Wilkinson wrote:
> Hi,
>
> I'm looking for a Ruby client for Cassandra that is pretty high-level.
>  I am really hoping to find a Ruby gem of high quality that allows a
> developer to create models like you would with ActiveModel.
>
> So far I have figured out that the canonical Ruby client for Cassandra
> is Twitter's Cassandra gem  of
> the same name.  It looks great - mature, still in active development,
> etc.  No stated support for Ruby 1.9.3 that I can see, but I can
> probably live with that for now.
>
> What I'm looking for is a higher-level gem built on that gem that
> works like ActiveModel in that you just include a module in your model
> class and that gives you methods to declare your model's serialized
> attributes and also the usual ActiveModel methods like 'save!',
> 'valid?', 'find', etc.
>
> I've been trying out some different NoSQL databases recently, and for
> example there is an official Ruby client
>  for Riak with a domain
> model that is close to Riak's, but then there's also a gem called
> 'Ripple'  that uses a domain
> model that is closer to what most Ruby developers are used to.  So it
> looks like Twitter's Cassandra gem is the one that stays close to the
> domain model of Cassandra, and what I'm looking for is a gem that's a
> Cassandra equivalent of Ripple.
>
> From some searching I found cassandra_object
> , which has been inactive
> for a couple of years, but there's a fork
>  that looks like it's
> being maintained, but I have not found any kind of information to
> suggest the maintained fork is in general use yet.  I have found quite
> a lot of gems of a similar style that people have started and then not
> really got very far with.
>
> So, does anybody know of a suitable gem?  Would you recommend it?  Or
> perhaps you would recommend not using such a gem and sticking with the
> lower-level client gem?
>
> Thanks in advance for your advice.
>
> Harry




Re: Does Cassandra support operations in a transaction?

2012-08-01 Thread Roshni Rajagopal
Hi Ivan,

Cassandra supports 'tunable consistency'.  If you always read and write at
quorum (or local quorum for multi-data-center), you can guarantee
that the results will be consistent, in that all the replicas will be compared,
the latest data will be returned, and no data will be out of date. This comes at
a cost in performance: it is fastest to just read and write at one node rather
than check a quorum of nodes.
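As a minimal Hector sketch of reading and writing at quorum (the cluster, host, keyspace, and column names here are all assumptions):

import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.HConsistencyLevel;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class QuorumExample {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "node1:9160");

        // Read and write at QUORUM so every read sees the latest completed write
        ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
        ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
        ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);

        Keyspace ks = HFactory.createKeyspace("MyKeyspace", cluster, ccl);
        Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
        m.insert("rowkey", "MyCF",
                HFactory.createStringColumn("col", "value"));
    }
}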

What you choose depends on what your application's needs are. Is it ok if some
users receive out-of-date data (it isn't earth-shattering if someone doesn't
know what you're eating right now), or is it a banking transaction system where
all entities must be consistently updated?

So designing in cassandra prioritizes de-normalization. You cannot rely on
referential integrity to keep 2 tables (column families in cassandra) in sync
the way a relational database does with foreign keys. The
application needs to ensure that all data in column families is accurate and
not out of sync, because data elements may be duplicated in different column
families.


You cannot update 2 different entities and ensure that changes to both will be
applied together and only then become visible to others.


Regards,


From: Jeffrey Kesselman <jef...@gmail.com>
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Re: Does Cassandra support operations in a transaction?

Short story is that few if any of the NoSql systems support transactions
natively.  That's one of the big compromises they make.  What they call
"eventual consistency" is actually eventual durability in ACID terms.

Consistency, as meant by the C in ACID, is not guaranteed at all.

On Wed, Aug 1, 2012 at 6:21 AM, Ivan Jiang <wiwi1...@gmail.com> wrote:
Hi,
I am new to Cassandra, and I wonder whether it is possible to call Cassandra in
one transaction, as in a relational DB.

Thanks in advance.

Best Regards,
Ivan Jiang



--
It's always darkest just before you are eaten by a grue.



Re: Does Cassandra support operations in a transaction?

2012-08-01 Thread Jeffrey Kesselman
Short story is that few if any of the NoSql systems support transactions
natively.  That's one of the big compromises they make.  What they call
"eventual consistency" is actually eventual durability in ACID terms.

Consistency, as meant by the C in ACID, is not guaranteed at all.

On Wed, Aug 1, 2012 at 6:21 AM, Ivan Jiang  wrote:

> Hi,
>     I am new to Cassandra, and I wonder whether it is possible to call Cassandra
> within one transaction, as in a relational DB.
>
> Thanks in advance.
>
> Best Regards,
> Ivan Jiang
>



-- 
It's always darkest just before you are eaten by a grue.


Re: Does Cassandra support operations in a transaction?

2012-08-01 Thread Ivan Jiang
Hi Greg,

Thank you for your answers.

I will have to convert my thinking from relational SQL to NoSQL while using Cassandra.

Best Regards,
Ivan



On Wed, Aug 1, 2012 at 9:20 PM, Greg Fausak  wrote:

> Hi Ivan,
>
> No, Cassandra does not support transactions.
>
> I believe each operation is atomic.  If that operation returns
> a successful result, then it worked.  You can't do things like
> bind two operations together and guarantee that if either fails, they both fail.
>
> You will find that Cassandra doesn't do a lot of things compared to a sql
> db :-)
>
> But, it does write a lot of data quickly.
>
> -g
>
>
> On Wed, Aug 1, 2012 at 5:21 AM, Ivan Jiang  wrote:
> > Hi,
> > I am new to Cassandra, and I wonder whether it is possible to call Cassandra
> > in one transaction, as in a relational DB.
> >
> > Thanks in advance.
> >
> > Best Regards,
> > Ivan Jiang
>


Looking for a good Ruby client

2012-08-01 Thread Harry Wilkinson
Hi,

I'm looking for a Ruby client for Cassandra that is pretty high-level.  I
am really hoping to find a Ruby gem of high quality that allows a developer
to create models like you would with ActiveModel.

So far I have figured out that the canonical Ruby client for Cassandra
is Twitter's
Cassandra gem  of the same name.  It
looks great - mature, still in active development, etc.  No stated support
for Ruby 1.9.3 that I can see, but I can probably live with that for now.

What I'm looking for is a higher-level gem built on that gem that works
like ActiveModel in that you just include a module in your model class and
that gives you methods to declare your model's serialized attributes and
also the usual ActiveModel methods like 'save!', 'valid?', 'find', etc.

I've been trying out some different NoSQL databases recently, and for
example there is an official Ruby client for Riak with a
domain model that is close to Riak's, but then there's also
a gem called 'Ripple' that uses a
domain model that is closer to what most Ruby developers are used to.  So
it looks like Twitter's Cassandra gem is the one that stays close to the
domain model of Cassandra, and what I'm looking for is a gem that's a
Cassandra equivalent of Ripple.

From some searching I found cassandra_object,
which has been inactive for a couple of years, but there's a
fork that looks like it's being maintained, but I have not found any kind of
information to suggest the maintained fork is in general use yet.  I have
found quite a lot of gems of a similar style that people have started and
then not really got very far with.

So, does anybody know of a suitable gem?  Would you recommend it?  Or
perhaps you would recommend not using such a gem and sticking with the
lower-level client gem?

Thanks in advance for your advice.

Harry


Re: virtual memory of all cassandra-nodes is growing extremly since Cassandra 1.1.0

2012-08-01 Thread Mina Naguib

All our servers (cassandra and otherwise) get monitored with nagios + get many 
basic metrics graphed by pnp4nagios.  This covers a large chunk of a box's 
health, as well as cassandra basics (specifically the pending tasks, JVM heap 
state).  IMO it's not possible to clearly debug a cassandra issue if you don't 
have a good holistic view of the boxes' health (CPU, RAM, swap, disk 
throughput, etc.)

Separate from that we have an operational dashboard.  It's a bunch of 
manually-defined RRD files and custom scripts that grab metrics, store, and 
graph the health of various layers in the infrastructure in an an 
easy-to-digest way (for example, each data center gets a color scheme - stacked 
machines within multiple DCs can just be eyeballed).  There we can see for 
example our total read volume, total write volume, struggling boxes, dynamic 
endpoint snitch reaction, etc...

Finally, almost all the software we write integrates with statsd + graphite.  
In graphite we have more metrics than we know what to do with, but it's better 
than the other way around.  From there for example we can see cassandra's 
response time including things cassandra itself can't measure (network, thrift, 
etc), across various different client softwares that talk to it.  Within 
graphite we have several dashboards defined (users make their own, some 
infrastructure components have shared dashboards.)
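To make the statsd side concrete, a minimal sketch of the fire-and-forget UDP write a client emits per measurement; the statsd host and the metric name are assumptions, and 8125 is statsd's default port:

import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;

public class StatsdTimer {
    public static void main(String[] args) throws Exception {
        long t0 = System.nanoTime();
        // ... the cassandra request being measured goes here ...
        long elapsedMs = (System.nanoTime() - t0) / 1000000L;

        // statsd line protocol: "<metric>:<value>|ms" for a timer sample
        String msg = "cassandra.client.read_latency:" + elapsedMs + "|ms";
        byte[] bytes = msg.getBytes("UTF-8");
        DatagramSocket sock = new DatagramSocket();
        sock.send(new DatagramPacket(bytes, bytes.length,
                InetAddress.getByName("statsd.example.com"), 8125));
        sock.close();
    }
}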


--
Mina Naguib :: Director, Infrastructure Engineering
Bloom Digital Platforms :: T 514.394.7951 #208
http://bloom-hq.com/



On 2012-08-01, at 3:43 PM, Greg Fausak wrote:

> Mina,
> 
> Thanks for that post.  Very interesting :-)
> 
> What sort of things are you graphing?  Standard *nix stuff
> (mem/cpu/etc)?  Or do you
> have some hooks into the C* process (I saw something about port 1414
> in the .yaml file)?
> 
> Best,
> 
> -g
> 
> 
> On Thu, Jul 26, 2012 at 9:27 AM, Mina Naguib
>  wrote:
>> 
>> Hi Thomas
>> 
>> On a modern 64bit server, I recommend you pay little attention to the 
>> virtual size.  It's made up of almost everything within the process's 
>> address space, including on-disk files mmap()ed in for zero-copy access.  
>> It's not unreasonable for a machine with N amount of RAM to have a process 
>> whose virtual size is several times the value of N.  That in and of itself 
>> is not problematic.
>> 
>> In a default cassandra 1.1.x setup, the bulk of that will be your sstables' 
>> data and index files.  On linux you can invoke the "pmap" tool on the 
>> cassandra process's PID to see what's in there.  Much of it will be 
>> anonymous memory allocations (the JVM heap itself, off-heap data structures, 
>> etc), but lots of it will be references to files on disk (binaries, 
>> libraries, mmap()ed files, etc).
>> 
>> What's more important to keep an eye on is the JVM heap - typically 
>> statically allocated to a fixed size at cassandra startup.  You can get info 
>> about its used/capacity values via "nodetool -h localhost info".  You can 
>> also hook up jconsole and trend it over time.
>> 
>> The other critical piece is the process's RESident memory size, which 
>> includes the JVM heap but also other off-heap data structures and 
>> miscellanea.  Cassandra has recently been making more use of off-heap 
>> structures (for example, row caching via SerializingCacheProvider).  This is 
>> done as a matter of efficiency - a serialized off-heap row is much smaller 
>> than a classical object sitting in the JVM heap - so you can do more with 
>> less.
>> 
>> Unfortunately, in my experience, it's not perfect.  They still have a cost, 
>> in terms of on-heap usage, as well as off-heap growth over time.
>> 
>> Specifically, my experience with cassandra 1.1.0 showed that off-heap row 
>> caches incurred a very high on-heap cost (ironic) - see my post at 
>> http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3c6feb097f-287b-471d-bea2-48862b30f...@bloomdigital.com%3E
>>  - as documented in that email, I managed that with regularly scheduled full 
>> GC runs via System.gc()
>> 
>> I have, since then, moved away from scheduled System.gc() to scheduled row 
>> cache invalidations.  While this had the same effect as System.gc() I 
>> described in my email, it eliminated the 20-30 second pause associated with 
>> it.  It did however introduce (or maybe I never noticed it earlier) a slow 
>> creep in memory usage outside of the heap.
>> 
>> It's typical in my case for example for a process configured with 6G of JVM 
>> heap to start up, stabilize at 6.5 - 7GB RESident usage, then creep up 
>> slowly throughout a week to 10-11GB range.  Depending on what else the box 
>> is doing, I've experienced the linux OOM killer killing cassandra as you've 
>> described, or heavy swap usage bringing everything down (we're 
>> latency-sensitive), etc..
>> 
>> And now for the good news.  Since I've upgraded to 1.1.2:
>>1. There's no more need for regularly scheduled System.gc()
>>2. There's no more need for regularly scheduled row cache invalidation

Re: virtual memory of all cassandra-nodes is growing extremly since Cassandra 1.1.0

2012-08-01 Thread Greg Fausak
Mina,

Thanks for that post.  Very interesting :-)

What sort of things are you graphing?  Standard *nix stuff
(mem/cpu/etc)?  Or do you
have some hooks into the C* process (I saw something about port 1414
in the .yaml file)?

Best,

-g


On Thu, Jul 26, 2012 at 9:27 AM, Mina Naguib
 wrote:
>
> Hi Thomas
>
> On a modern 64bit server, I recommend you pay little attention to the virtual 
> size.  It's made up of almost everything within the process's address space, 
> including on-disk files mmap()ed in for zero-copy access.  It's not 
> unreasonable for a machine with N amount of RAM to have a process whose virtual 
> size is several times the value of N.  That in and of itself is not 
> problematic.
>
> In a default cassandra 1.1.x setup, the bulk of that will be your sstables' 
> data and index files.  On linux you can invoke the "pmap" tool on the 
> cassandra process's PID to see what's in there.  Much of it will be anonymous 
> memory allocations (the JVM heap itself, off-heap data structures, etc), but 
> lots of it will be references to files on disk (binaries, libraries, mmap()ed 
> files, etc).
>
> What's more important to keep an eye on is the JVM heap - typically 
> statically allocated to a fixed size at cassandra startup.  You can get info 
> about its used/capacity values via "nodetool -h localhost info".  You can 
> also hook up jconsole and trend it over time.
>
> The other critical piece is the process's RESident memory size, which 
> includes the JVM heap but also other off-heap data structures and 
> miscellanea.  Cassandra has recently been making more use of off-heap 
> structures (for example, row caching via SerializingCacheProvider).  This is 
> done as a matter of efficiency - a serialized off-heap row is much smaller 
> than a classical object sitting in the JVM heap - so you can do more with 
> less.
>
> Unfortunately, in my experience, it's not perfect.  They still have a cost, 
> in terms of on-heap usage, as well as off-heap growth over time.
>
> Specifically, my experience with cassandra 1.1.0 showed that off-heap row 
> caches incurred a very high on-heap cost (ironic) - see my post at 
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3c6feb097f-287b-471d-bea2-48862b30f...@bloomdigital.com%3E
>  - as documented in that email, I managed that with regularly scheduled full 
> GC runs via System.gc()
>
> I have, since then, moved away from scheduled System.gc() to scheduled row 
> cache invalidations.  While this had the same effect as System.gc() I 
> described in my email, it eliminated the 20-30 second pause associated with 
> it.  It did however introduce (or maybe I never noticed it earlier) a slow creep 
> in memory usage outside of the heap.
>
> It's typical in my case for example for a process configured with 6G of JVM 
> heap to start up, stabilize at 6.5 - 7GB RESident usage, then creep up slowly 
> throughout a week to 10-11GB range.  Depending on what else the box is doing, 
> I've experienced the linux OOM killer killing cassandra as you've described, 
> or heavy swap usage bringing everything down (we're latency-sensitive), etc..
>
> And now for the good news.  Since I've upgraded to 1.1.2:
> 1. There's no more need for regularly scheduled System.gc()
> 2. There's no more need for regularly scheduled row cache invalidation
> 3. The HEAP usage within the JVM is stable over time
> 4. The RESident size of the process appears also stable over time
>
> Point #4 above is still pending as I only have 3 day graphs since the 
> upgrade, but they show promising results compared to the slope of the same 
> graph before the upgrade to 1.1.2
>
> So my advice is give 1.1.2 a shot - just be mindful of 
> https://issues.apache.org/jira/browse/CASSANDRA-4411
>
>
> On 2012-07-26, at 2:18 AM, Thomas Spengler wrote:
>
>> I saw this.
>>
>> All works fine up to version 1.1.0:
>> the 0.8.x takes 5GB of memory on an 8GB machine,
>> the 1.0.x takes between 6 and 7 GB on an 8GB machine,
>> and
>> the 1.1.0 takes it all
>>
>> and it is a problem.
>> For me it is no solution to wait for the OOM killer from the linux kernel
>> and restart the cassandra process.
>>
>> When my machine has less than 100MB of RAM available, then I have a problem.
>>
>>
>>
>> On 07/25/2012 07:06 PM, Tyler Hobbs wrote:
>>> Are you actually seeing any problems from this? High virtual memory usage
>>> on its own really doesn't mean anything. See
>>> http://wiki.apache.org/cassandra/FAQ#mmap
>>>
>>> On Wed, Jul 25, 2012 at 1:21 AM, Thomas Spengler <
>>> thomas.speng...@toptarif.de> wrote:
>>>
 No one has any idea?

 we tried

 update to 1.1.2
 DiskAccessMode standard, indexAccessMode standard
 row_cache_size_in_mb: 0
 key_cache_size_in_mb: 0


 Our next try will to change

 SerializingCacheProvider to ConcurrentLinkedHashCacheProvider

 any other proposals are welcome

 On 07/04/2012 02:13 PM, Thomas Spengler wrote:
>

Re: Creating counter columns in cassandra

2012-08-01 Thread Pushpalanka Jayawardhana
Hi All,

I faced this same problem when trying to query the counter values. I am
using a phone number as row key and updating the number of calls taken to
that number. So my query is like

SELECT KEY FROM <CF> WHERE No_of_Calls > 5

This returns no data and no exception, though I am 100% sure
that entries are there which satisfy that query.
I used the same code as Amila mentioned. My suspicion is that this is due to some
type mismatch between the counter value representation and the query value, but I
failed to resolve it. :(

Any ideas or guidance is greatly helpful.
Thanks in advance!
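In case it helps narrow things down, a minimal Hector read of a single counter by row key, which sidesteps the indexed-query path entirely; the column family name "CallCounters" is an assumption:

import me.prettyprint.cassandra.model.thrift.ThriftCounterColumnQuery;
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HCounterColumn;
import me.prettyprint.hector.api.query.CounterQuery;

public class CounterRead {
    // Fetch one counter value directly by row key (the phone number)
    public static long readCalls(Keyspace keyspace, String phoneNumber) {
        CounterQuery<String, String> q =
                new ThriftCounterColumnQuery<String, String>(
                        keyspace, StringSerializer.get(), StringSerializer.get());
        q.setColumnFamily("CallCounters").setKey(phoneNumber).setName("No_of_Calls");
        HCounterColumn<String> col = q.execute().get();
        return col == null ? 0L : col.getValue();
    }
}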


On Tue, Jul 31, 2012 at 1:49 PM, Amila Paranawithana wrote:

> Hi all,
> Thanks all for the valuable feedback. I have a problem with running
> queries with Cqlsh.
> My query is  SELECT * FROM rule1 WHERE sms=3;
>
> java.lang.NumberFormatException: An hex string representing bytes must
> have an even length
>  at org.apache.cassandra.utils.Hex.hexToBytes(Hex.java:52)
> at
> org.apache.cassandra.utils.ByteBufferUtil.hexToBytes(ByteBufferUtil.java:501)
>  at
> org.apache.cassandra.db.marshal.CounterColumnType.fromString(CounterColumnType.java:57)
>  at org.apache.cassandra.cql.Term.getByteBuffer(Term.java:96)
> at
> org.apache.cassandra.cql.QueryProcessor.multiRangeSlice(QueryProcessor.java:185)
>  at
> org.apache.cassandra.cql.QueryProcessor.processStatement(QueryProcessor.java:484)
> at org.apache.cassandra.cql.QueryProcessor.process(QueryProcessor.java:877)
>  at
> org.apache.cassandra.thrift.CassandraServer.execute_cql_query(CassandraServer.java:1235)
>  at
> org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3542)
>  at
> org.apache.cassandra.thrift.Cassandra$Processor$execute_cql_query.getResult(Cassandra.java:3530)
>  at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:32)
> at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:34)
>  at
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:186)
>  at
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>  at java.lang.Thread.run(Thread.java:662)
>
> but when I say SELECT * FROM rule1 WHERE sms=03; no exception is shown,
> yet even though I have entries where the sms count is 3, that entry is not retrieved.
>
> And for queries like SELECT * FROM rule1 WHERE sms>=03;
> Bad Request: No indexed columns present in by-columns clause with "equals"
> operator
>
> Can anyone spot the problem here?
>
> Following are the methods I used.
>
> //for indexing columns
> void indexColumn(String idxColumnName,String CountercfName){
>
> Cluster cluster = HFactory.getOrCreateCluster(
> BasicConf.CASSANDRA_CLUSTER, BasicConf.CLUSTER_PORT);
> KeyspaceDefinition keyspaceDefinition =
> cluster.describeKeyspace(BasicConf.KEYSPACE);
>
> List<ColumnFamilyDefinition> cdfs = keyspaceDefinition.getCfDefs();
> ColumnFamilyDefinition cfd = null;
> for(ColumnFamilyDefinition c:cdfs){
>  if(c.getName().toString().equals(CountercfName)) {
>  System.out.println(c.getName());
>  cfd=c;
>  break;
>  }
> }
>
> BasicColumnFamilyDefinition columnFamilyDefinition = new
> BasicColumnFamilyDefinition(cfd);
>
> BasicColumnDefinition bcdf = new BasicColumnDefinition();
> bcdf.setName(StringSerializer.get().toByteBuffer(idxColumnName));
> bcdf.setIndexName(idxColumnName+"index");
> bcdf.setIndexType(ColumnIndexType.KEYS);
> bcdf.setValidationClass(ComparatorType.COUNTERTYPE.getClassName());
>
> columnFamilyDefinition.addColumnDefinition(bcdf);
> cluster.updateColumnFamily(new
> ThriftCfDef(columnFamilyDefinition));
>
>  }
>
> // for adding a new counter column
> void insertCounterColumn(String cfName, String counterColumnName,
>  String phoneNumberKey) {
>
> Mutator<String> mutator = HFactory.createMutator(keyspace,
>  StringSerializer.get());
>  mutator.insertCounter(phoneNumberKey, cfName, HFactory
> .createCounterColumn(counterColumnName, 1L,
>  StringSerializer.get()));
>  mutator.execute();
> CounterQuery<String, String> counter = new
> ThriftCounterColumnQuery<String, String>(
>  keyspace, StringSerializer.get(), StringSerializer.get());
> counter.setColumnFamily(cfName).setKey(phoneNumberKey)
>  .setName(counterColumnName);
>
>  indexColumn(counterColumnName, cfName);
>
>  }
>
> // incrementing counter values
> void incrementCounter(String ruleName, String columnName,
> HashMap<String, Long> entries) {
>
> Mutator mutator = HFactory.createMutator(keyspace,
> StringSerializer.get());
>
> Set<String> keys = entries.keySet();
> for (String s : keys) {
>  mutator.incrementCounter(s, ruleName, columnName, entries.get(s));
>
> }
>
> mutator.execute();
>
> }
>
>
>
> On Sun, Jul 29, 2012 at 3:29 PM, Paolo Bernardi wrote:
>
>> On Sun, Jul 29, 2012 at 9:30 AM, Abhijit Chanda
>>  wrote:
>> > There should be at least one "=" (equals) in the WHERE

Re: Unsuccessful attempt to add a second node to a ring.

2012-08-01 Thread Jakub Glapa
I found a similar thread from March:
http://www.mail-archive.com/user@cassandra.apache.org/msg21007.html

For me clearing the data and starting from the beginning didn't help.

It's interesting because on my dev environment I was able to add another
node without any problems.

The only difference is that the second node is now in a different data
center (but I'm not using any different settings, just SimpleSnitch).
Ports 7000, 9160, and 7199 are open between those 2 nodes.

How else can I check if the communication between those 2 nodes is working?
In the logs I see that:
DEBUG [WRITE-NODE1/node1.ip] 2012-07-31 13:50:39,642
OutboundTcpConnection.java (line 206) attempting to connect to
NODE1/node1.ip

So I assume that the communication is somehow established?
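As one more quick check beyond telnet, a trivial probe of the storage port from code; "node1.ip" follows the placeholder in the logs above and the timeout is arbitrary:

import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {
    public static void main(String[] args) throws Exception {
        Socket s = new Socket();
        // Try the inter-node storage port (7000) with a 3s timeout
        s.connect(new InetSocketAddress("node1.ip", 7000), 3000);
        System.out.println("storage port reachable");
        s.close();
    }
}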


--
regards,
Jakub Glapa


On Wed, Aug 1, 2012 at 11:36 AM, Jakub Glapa  wrote:

> yes it's the same
>
>
>
> --
> regards,
> pozdrawiam,
> Jakub Glapa
>
>
> On Wed, Aug 1, 2012 at 11:24 AM, Roshni Rajagopal <
> roshni.rajago...@wal-mart.com> wrote:
>
>> Ok, sorry it may not be required,
>> I was thinking of a configuration I had done on my local laptop, where I
>> had aliased my IP address.
>> In that case the directories and jmx port needed to be different.
>>
>> Cluster name is same right?
>>
>>
>> From: Jakub Glapa <jakub.gl...@gmail.com>
>> Reply-To: user@cassandra.apache.org
>> To: user@cassandra.apache.org
>> Subject: Re: Unsuccessful attempt to add a second node to a ring.
>>
>> Hi Roshni,
>> no they are the same, my changes in cassandra.yaml were only in the
>> listen_address, rpc_address, seeds and initial_token field.
>> The rest is exactly the same as on node1.
>>
>> That's how the file looks on node2:
>>
>>
>>
>> cluster_name: 'Test Cluster'
>> initial_token: 85070591730234615865843651857942052864
>> hinted_handoff_enabled: true
>> hinted_handoff_throttle_delay_in_ms: 1
>> authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
>> authority: org.apache.cassandra.auth.AllowAllAuthority
>> partitioner: org.apache.cassandra.dht.RandomPartitioner
>> data_file_directories:
>> - /data/servers/cassandra_sbe_edtool/cassandra_data/data
>> commitlog_directory:
>> /data/servers/cassandra_sbe_edtool/cassandra_data/commitlog
>> saved_caches_directory:
>> /data/servers/cassandra_sbe_edtool/cassandra_data/saved_caches
>> commitlog_sync: periodic
>> commitlog_sync_period_in_ms: 1
>> seed_provider:
>> - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>>   parameters:
>>   - seeds: "NODE1"
>> flush_largest_memtables_at: 0.75
>> reduce_cache_sizes_at: 0.85
>> reduce_cache_capacity_to: 0.6
>> concurrent_reads: 32
>> concurrent_writes: 32
>> memtable_flush_queue_size: 4
>> sliced_buffer_size_in_kb: 64
>> storage_port: 7000
>> ssl_storage_port: 7001
>> listen_address: NODE2
>> rpc_address: NODE2
>> rpc_port: 9160
>> rpc_keepalive: true
>> rpc_server_type: sync
>> thrift_framed_transport_size_in_mb: 15
>> thrift_max_message_length_in_mb: 16
>> incremental_backups: false
>> snapshot_before_compaction: false
>> column_index_size_in_kb: 64
>> in_memory_compaction_limit_in_mb: 64
>> multithreaded_compaction: false
>> compaction_throughput_mb_per_sec: 16
>> compaction_preheat_key_cache: true
>> rpc_timeout_in_ms: 1
>> endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch
>> dynamic_snitch_update_interval_in_ms: 100
>> dynamic_snitch_reset_interval_in_ms: 60
>> dynamic_snitch_badness_threshold: 0.1
>> request_scheduler: org.apache.cassandra.scheduler.NoScheduler
>> index_interval: 128
>> encryption_options:
>> internode_encryption: none
>> keystore: conf/.keystore
>> keystore_password: cassandra
>> truststore: conf/.truststore
>> truststore_password: cassandra
>>
>>
>>
>>
>> --
>> regards,
>> pozdrawiam,
>> Jakub Glapa
>>
>>
>> On Wed, Aug 1, 2012 at 10:29 AM, Roshni Rajagopal <
>> roshni.rajago...@wal-mart.com>
>> wrote:
>> Jakub,
>>
>> Have you set the
>> Data, commitlog, saved cache directories to different ones in each yaml
>> file for each node?
>>
>> Regards,
>> Roshni
>>
>>
> From: Jakub Glapa <jakub.gl...@gmail.com>
> Reply-To: user@cassandra.apache.org
> To: user@cassandra.apache.org
> Subject: Unsuccessful attempt to add a second node to a ring.

Re: virtual memory of all cassandra-nodes is growing extremly since Cassandra 1.1.0

2012-08-01 Thread Thomas Spengler
Just for information:

we are running on 1.1.2.
JNA or not made no difference;
manually calling a full GC made no difference.

But in my case,

the reduction of
commitlog_total_space_in_mb to 2048 (from the default 4096)
makes the difference.
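For reference, the "manually call full gc" we tried corresponds to something like the following in-JVM sketch; the scheduling interval is illustrative (Mina's post doesn't say how he triggered his GC runs):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class ScheduledGc {
    public static void main(String[] args) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor();
        // Request a full collection once an hour.  Note this is a no-op
        // if the JVM runs with -XX:+DisableExplicitGC.
        ses.scheduleAtFixedRate(new Runnable() {
            public void run() {
                System.gc();
            }
        }, 1, 1, TimeUnit.HOURS);
    }
}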




On 07/26/2012 04:27 PM, Mina Naguib wrote:
> 
> Hi Thomas
> 
> On a modern 64bit server, I recommend you pay little attention to the virtual 
> size.  It's made up of almost everything within the process's address space, 
> including on-disk files mmap()ed in for zero-copy access.  It's not 
> unreasonable for a machine with N amount of RAM to have a process whose virtual 
> size is several times the value of N.  That in and of itself is not 
> problematic.
> 
> In a default cassandra 1.1.x setup, the bulk of that will be your sstables' 
> data and index files.  On linux you can invoke the "pmap" tool on the 
> cassandra process's PID to see what's in there.  Much of it will be anonymous 
> memory allocations (the JVM heap itself, off-heap data structures, etc), but 
> lots of it will be references to files on disk (binaries, libraries, mmap()ed 
> files, etc).
> 
> What's more important to keep an eye on is the JVM heap - typically 
> statically allocated to a fixed size at cassandra startup.  You can get info 
> about its used/capacity values via "nodetool -h localhost info".  You can 
> also hook up jconsole and trend it over time.
> 
> The other critical piece is the process's RESident memory size, which 
> includes the JVM heap but also other off-heap data structures and 
> miscellanea.  Cassandra has recently been making more use of off-heap 
> structures (for example, row caching via SerializingCacheProvider).  This is 
> done as a matter of efficiency - a serialized off-heap row is much smaller 
> than a classical object sitting in the JVM heap - so you can do more with 
> less.
> 
> Unfortunately, in my experience, it's not perfect.  They still have a cost, 
> in terms of on-heap usage, as well as off-heap growth over time.
> 
> Specifically, my experience with cassandra 1.1.0 showed that off-heap row 
> caches incurred a very high on-heap cost (ironic) - see my post at 
> http://mail-archives.apache.org/mod_mbox/cassandra-user/201206.mbox/%3c6feb097f-287b-471d-bea2-48862b30f...@bloomdigital.com%3E
>  - as documented in that email, I managed that with regularly scheduled full 
> GC runs via System.gc()
> 
> I have, since then, moved away from scheduled System.gc() to scheduled row 
> cache invalidations.  While this had the same effect as System.gc() I 
> described in my email, it eliminated the 20-30 second pause associated with 
> it.  It did however introduce (or maybe I never noticed it earlier) a slow creep 
> in memory usage outside of the heap.
> 
> It's typical in my case for example for a process configured with 6G of JVM 
> heap to start up, stabilize at 6.5 - 7GB RESident usage, then creep up slowly 
> throughout a week to 10-11GB range.  Depending on what else the box is doing, 
> I've experienced the linux OOM killer killing cassandra as you've described, 
> or heavy swap usage bringing everything down (we're latency-sensitive), etc..
> 
> And now for the good news.  Since I've upgraded to 1.1.2:
>   1. There's no more need for regularly scheduled System.gc()
>   2. There's no more need for regularly scheduled row cache invalidation
>   3. The HEAP usage within the JVM is stable over time
>   4. The RESident size of the process appears also stable over time
> 
> Point #4 above is still pending as I only have 3 day graphs since the 
> upgrade, but they show promising results compared to the slope of the same 
> graph before the upgrade to 1.1.2
> 
> So my advice is give 1.1.2 a shot - just be mindful of 
> https://issues.apache.org/jira/browse/CASSANDRA-4411
> 
> 
> On 2012-07-26, at 2:18 AM, Thomas Spengler wrote:
> 
>> I saw this.
>>
>> All works fine up to version 1.1.0:
>> the 0.8.x takes 5GB of memory on an 8GB machine,
>> the 1.0.x takes between 6 and 7 GB on an 8GB machine,
>> and
>> the 1.1.0 takes it all
>>
>> and it is a problem.
>> For me it is no solution to wait for the OOM killer from the linux kernel
>> and restart the cassandra process.
>>
>> When my machine has less than 100MB of RAM available, then I have a problem.
>>
>>
>>
>> On 07/25/2012 07:06 PM, Tyler Hobbs wrote:
>>> Are you actually seeing any problems from this? High virtual memory usage
>>> on its own really doesn't mean anything. See
>>> http://wiki.apache.org/cassandra/FAQ#mmap
>>>
>>> On Wed, Jul 25, 2012 at 1:21 AM, Thomas Spengler <
>>> thomas.speng...@toptarif.de> wrote:
>>>
 No one has any idea?

 we tried

 update to 1.1.2
 DiskAccessMode standard, indexAccessMode standard
 row_cache_size_in_mb: 0
 key_cache_size_in_mb: 0


 Our next try will to change

 SerializingCacheProvider to ConcurrentLinkedHashCacheProvider

 any other proposals are welcome

 On 07/04/2012 02:13 PM, Thomas Spengler wrote:
> Hi @all,
>

Re: Does Cassandra support operations in a transaction?

2012-08-01 Thread Greg Fausak
Hi Ivan,

No, Cassandra does not support transactions.

I believe each operation is atomic.  If that operation returns
a successful result, then it worked.  You can't do things like
bind two operations together and guarantee that if either fails, they both fail.

You will find that Cassandra doesn't do a lot of things compared to a sql db :-)

But, it does write a lot of data quickly.
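To make that concrete, a hedged Hector sketch; the keyspace, column family, and keys are invented, and the point is only that if the second insert fails, the first stays applied:

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;

public class NoTransactions {
    public static void transfer(Keyspace ks) {
        Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());
        // Two independent operations: there is no way to bind them so that
        // a failure of the second rolls back the first.
        m.insert("acctA", "Accounts",
                HFactory.createStringColumn("balance", "90"));
        // If this throws (node down, timeout), acctA above stays debited:
        m.insert("acctB", "Accounts",
                HFactory.createStringColumn("balance", "110"));
    }
}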

-g


On Wed, Aug 1, 2012 at 5:21 AM, Ivan Jiang  wrote:
> Hi,
> I am new to Cassandra, and I wonder whether it is possible to call Cassandra in
> one transaction, as in a relational DB.
>
> Thanks in advance.
>
> Best Regards,
> Ivan Jiang


Restore snapshot

2012-08-01 Thread Desimpel, Ignace
Hi,

Is it possible to restore a snapshot of a keyspace on a live cassandra cluster 
(I mean without restarting)?



Re: Unsuccessful attempt to add a second node to a ring.

2012-08-01 Thread Jakub Glapa
yes it's the same


--
regards,
pozdrawiam,
Jakub Glapa


On Wed, Aug 1, 2012 at 11:24 AM, Roshni Rajagopal <
roshni.rajago...@wal-mart.com> wrote:

> Ok, sorry it may not be required,
> I was thinking of a configuration I had done on my local laptop, where I
> had aliased my IP address.
> In that case the directories and jmx port needed to be different.
>
> Cluster name is same right?
>
>
> From: Jakub Glapa <jakub.gl...@gmail.com>
> Reply-To: user@cassandra.apache.org
> To: user@cassandra.apache.org
> Subject: Re: Unsuccessful attempt to add a second node to a ring.
>
> Hi Roshni,
> no they are the same, my changes in cassandra.yaml were only in the
> listen_address, rpc_address, seeds and initial_token field.
> The rest is exactly the same as on node1.
>
> That's how the file looks on node2:
>
>
>
> cluster_name: 'Test Cluster'
> initial_token: 85070591730234615865843651857942052864
> hinted_handoff_enabled: true
> hinted_handoff_throttle_delay_in_ms: 1
> authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
> authority: org.apache.cassandra.auth.AllowAllAuthority
> partitioner: org.apache.cassandra.dht.RandomPartitioner
> data_file_directories:
> - /data/servers/cassandra_sbe_edtool/cassandra_data/data
> commitlog_directory:
> /data/servers/cassandra_sbe_edtool/cassandra_data/commitlog
> saved_caches_directory:
> /data/servers/cassandra_sbe_edtool/cassandra_data/saved_caches
> commitlog_sync: periodic
> commitlog_sync_period_in_ms: 1
> seed_provider:
> - class_name: org.apache.cassandra.locator.SimpleSeedProvider
>   parameters:
>   - seeds: "NODE1"
> flush_largest_memtables_at: 0.75
> reduce_cache_sizes_at: 0.85
> reduce_cache_capacity_to: 0.6
> concurrent_reads: 32
> concurrent_writes: 32
> memtable_flush_queue_size: 4
> sliced_buffer_size_in_kb: 64
> storage_port: 7000
> ssl_storage_port: 7001
> listen_address: NODE2
> rpc_address: NODE2
> rpc_port: 9160
> rpc_keepalive: true
> rpc_server_type: sync
> thrift_framed_transport_size_in_mb: 15
> thrift_max_message_length_in_mb: 16
> incremental_backups: false
> snapshot_before_compaction: false
> column_index_size_in_kb: 64
> in_memory_compaction_limit_in_mb: 64
> multithreaded_compaction: false
> compaction_throughput_mb_per_sec: 16
> compaction_preheat_key_cache: true
> rpc_timeout_in_ms: 1
> endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch
> dynamic_snitch_update_interval_in_ms: 100
> dynamic_snitch_reset_interval_in_ms: 60
> dynamic_snitch_badness_threshold: 0.1
> request_scheduler: org.apache.cassandra.scheduler.NoScheduler
> index_interval: 128
> encryption_options:
> internode_encryption: none
> keystore: conf/.keystore
> keystore_password: cassandra
> truststore: conf/.truststore
> truststore_password: cassandra
>
>
>
>
> --
> regards,
> pozdrawiam,
> Jakub Glapa
>
>
> On Wed, Aug 1, 2012 at 10:29 AM, Roshni Rajagopal <
> roshni.rajago...@wal-mart.com>
> wrote:
> Jakub,
>
> Have you set the
> Data, commitlog, saved cache directories to different ones in each yaml
> file for each node?
>
> Regards,
> Roshni
>
>
> From: Jakub Glapa <jakub.gl...@gmail.com>
> Reply-To: user@cassandra.apache.org
> To: user@cassandra.apache.org
> Subject: Unsuccessful attempt to add a second node to a ring.
>
> Hi Everybody!
>
> I'm trying to add a second node to an already operating one node cluster.
>
> Some specs:
> - cassandra 1.0.7
> - both nodes have a routable listen_address and rpc_address.
> - Ports are open: (from node2) telnet node1 7000 is successful
> - Seeds parameter on node2 points to node 1.
>
> [node1] nodetool -h localhost ring
> Address DC  RackStatus State   Load
>  OwnsToken
> node1.ip datacenter1 rack1   Up Normal  74.33 KB
>  100.00% 0
>
> - initial token on node2 was specified
>
> I see something like that in the logs on node2:
>
> DEBUG [main] 2012-07-31 13:50:38,640 CollationController.java (line 76)
> collectTimeOrderedData
>  INFO [main] 2012-07-31 13:50:38,641 StorageService.java (line 667)
> JOINING: waiting for ring and schema information
> DEBUG [WRITE-NODE1/node1.ip] 2012-07-31 13:50:39,642
> OutboundTcpConnection.java (line 206) attempting to connect to
> NODE1/node1

Re: Unsuccessful attempt to add a second node to a ring.

2012-08-01 Thread Roshni Rajagopal
Ok, sorry it may not be required,
I was thinking of a configuration I had done on my local laptop, where I had 
aliased my IP address.
In that case the directories and jmx port needed to be different.

Cluster name is same right?


From: Jakub Glapa <jakub.gl...@gmail.com>
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Re: Unsuccessful attempt to add a second node to a ring.

Hi Roshni,
no they are the same, my changes in cassandra.yaml were only in the 
listen_address, rpc_address, seeds and initial_token field.
The rest is exactly the same as on node1.

That's how the file looks on node2:



cluster_name: 'Test Cluster'
initial_token: 85070591730234615865843651857942052864
hinted_handoff_enabled: true
hinted_handoff_throttle_delay_in_ms: 1
authenticator: org.apache.cassandra.auth.AllowAllAuthenticator
authority: org.apache.cassandra.auth.AllowAllAuthority
partitioner: org.apache.cassandra.dht.RandomPartitioner
data_file_directories:
- /data/servers/cassandra_sbe_edtool/cassandra_data/data
commitlog_directory: /data/servers/cassandra_sbe_edtool/cassandra_data/commitlog
saved_caches_directory: 
/data/servers/cassandra_sbe_edtool/cassandra_data/saved_caches
commitlog_sync: periodic
commitlog_sync_period_in_ms: 1
seed_provider:
- class_name: org.apache.cassandra.locator.SimpleSeedProvider
  parameters:
  - seeds: "NODE1"
flush_largest_memtables_at: 0.75
reduce_cache_sizes_at: 0.85
reduce_cache_capacity_to: 0.6
concurrent_reads: 32
concurrent_writes: 32
memtable_flush_queue_size: 4
sliced_buffer_size_in_kb: 64
storage_port: 7000
ssl_storage_port: 7001
listen_address: NODE2
rpc_address: NODE2
rpc_port: 9160
rpc_keepalive: true
rpc_server_type: sync
thrift_framed_transport_size_in_mb: 15
thrift_max_message_length_in_mb: 16
incremental_backups: false
snapshot_before_compaction: false
column_index_size_in_kb: 64
in_memory_compaction_limit_in_mb: 64
multithreaded_compaction: false
compaction_throughput_mb_per_sec: 16
compaction_preheat_key_cache: true
rpc_timeout_in_ms: 1
endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch
dynamic_snitch_update_interval_in_ms: 100
dynamic_snitch_reset_interval_in_ms: 60
dynamic_snitch_badness_threshold: 0.1
request_scheduler: org.apache.cassandra.scheduler.NoScheduler
index_interval: 128
encryption_options:
internode_encryption: none
keystore: conf/.keystore
keystore_password: cassandra
truststore: conf/.truststore
truststore_password: cassandra




--
regards,
pozdrawiam,
Jakub Glapa


On Wed, Aug 1, 2012 at 10:29 AM, Roshni Rajagopal <roshni.rajago...@wal-mart.com> wrote:
Jakub,

Have you set the
Data, commitlog, saved cache directories to different ones in each yaml file 
for each node?

Regards,
Roshni


From: Jakub Glapa <jakub.gl...@gmail.com>
Reply-To: user@cassandra.apache.org
To: user@cassandra.apache.org
Subject: Unsuccessful attempt to add a second node to a ring.

Hi Everybody!

I'm trying to add a second node to an already operating one node cluster.

Some specs:
- cassandra 1.0.7
- both nodes have a routable listen_address and rpc_address.
- Ports are open: (from node2) telnet node1 7000 is successful
- Seeds parameter on node2 points to node 1.

[node1] nodetool -h localhost ring
Address DC  RackStatus State   LoadOwns
Token
node1.ip datacenter1 rack1   Up Normal  74.33 KB100.00% 0

- initial token on node2 was specified

I see something like that in the logs on node2:

DEBUG [main] 2012-07-31 13:50:38,640 CollationController.java (line 76) 
collectTimeOrderedData
 INFO [main] 2012-07-31 13:50:38,641 StorageService.java (line 667) JOINING: 
waiting for ring and schema information
DEBUG [WRITE-NODE1/node1.ip] 2012-07-31 13:50:39,642 OutboundTcpConnection.java 
(line 206) attempting to connect to NODE1/node1.ip
DEBUG [ScheduledTasks:1] 2012-07-31 13:50:40,639 LoadBroadcaster.java (line 86) 
Disseminating load info ...
 INFO [main] 2012-07-31 13:51:08,641 StorageService.java (line 667) JOINING: 
schema complete, ready to bootstrap
DEBUG [main] 2012-07-31 13:51:08,642 StorageService.java (line 554) ... got 
ring + schema info
 INFO [main] 2012-07-31 13:51:08,642 StorageService.java (line 667) JOINING: 
getting bootstrap token
DEBUG [main] 2012-07-31 13:51:08,644 BootStrapper.java (line 138) token manually specified as 85070591730234615865843651857942052864


Re: Unsuccessful attempt to add a second node to a ring.

2012-08-01 Thread Roshni Rajagopal
Jakub,

Have you set the data, commitlog, and saved-caches directories to different
ones in each yaml file for each node?

Regards,
Roshni
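
Roshni's question is about instances sharing one host; a minimal sketch of what per-node directories would look like in that case, with hypothetical paths:

# node2's cassandra.yaml (node1 would point at its own /data/node1/... paths)
data_file_directories:
    - /data/node2/data
commitlog_directory: /data/node2/commitlog
saved_caches_directory: /data/node2/saved_caches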


From: Jakub Glapa <jakub.gl...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Unsuccessful attempt to add a second node to a ring.

Hi Everybody!

I'm trying to add a second node to an already operating one-node cluster.

Some specs:
- cassandra 1.0.7
- both nodes have a routable listen_address and rpc_address.
- Ports are open: (from node2) telnet node1 7000 is successful
- Seeds parameter on node2 points to node1.

[node1] nodetool -h localhost ring
Address     DC          Rack    Status  State   Load       Owns      Token
node1.ip    datacenter1 rack1   Up      Normal  74.33 KB   100.00%   0

- initial token on node2 was specified

I see something like this in the logs on node2:

DEBUG [main] 2012-07-31 13:50:38,640 CollationController.java (line 76) 
collectTimeOrderedData
 INFO [main] 2012-07-31 13:50:38,641 StorageService.java (line 667) JOINING: 
waiting for ring and schema information
DEBUG [WRITE-NODE1/node1.ip] 2012-07-31 13:50:39,642 OutboundTcpConnection.java 
(line 206) attempting to connect to NODE1/node1.ip
DEBUG [ScheduledTasks:1] 2012-07-31 13:50:40,639 LoadBroadcaster.java (line 86) 
Disseminating load info ...
 INFO [main] 2012-07-31 13:51:08,641 StorageService.java (line 667) JOINING: 
schema complete, ready to bootstrap
DEBUG [main] 2012-07-31 13:51:08,642 StorageService.java (line 554) ... got 
ring + schema info
 INFO [main] 2012-07-31 13:51:08,642 StorageService.java (line 667) JOINING: 
getting bootstrap token
DEBUG [main] 2012-07-31 13:51:08,644 BootStrapper.java (line 138) token 
manually specified as 85070591730234615865843651857942052864
DEBUG [main] 2012-07-31 13:51:08,645 Table.java (line 387) applying mutation of 
row 4c


but it doesn't join the ring:

[node2] nodetool -h localhost ring
Address     DC          Rack    Status  State   Load       Owns      Token
node2.ip    datacenter1 rack1   Up      Normal  13.49 KB   100.00%   85070591730234615865843651857942052864



I'm attaching the full log from node2 startup in debug mode.



PS.
When I didn't specify the initial token on node2 I ended up with an exception
like this:
"Exception encountered during startup: No other nodes seen!  Unable to 
bootstrap.If you intended to start a single-node cluster, you should make sure 
your broadcast_address (or listen_address) is listed as a seed.
Otherwise, you need to determine why the seed being contacted has no knowledge 
of the rest of the cluster.  Usually, this can be solved by giving all nodes 
the same seed list."


I'm not sure how to proceed now. I found a couple of posts about similar
problems, but they weren't very useful.

--
regards,
Jakub Glapa



Unsuccessful attempt to add a second node to a ring.

2012-08-01 Thread Jakub Glapa
Hi Everybody!

I'm trying to add a second node to an already operating one-node cluster.

Some specs:
- cassandra 1.0.7
- both nodes have a routable listen_address and rpc_address.
- Ports are open: (from node2) telnet node1 7000 is successful
- Seeds parameter on node2 points to node1.
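
Port 7000 was only checked one way; gossip also needs node1 to reach node2, so it's worth testing both directions. A hedged sketch using the same tool as above:

# from node2 (as already verified):
telnet node1 7000
# and the reverse, from node1 - bootstrap needs two-way connectivity:
telnet node2 7000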

[node1] nodetool -h localhost ring
Address     DC          Rack    Status  State   Load       Owns      Token
node1.ip    datacenter1 rack1   Up      Normal  74.33 KB   100.00%   0

- initial token on node2 was specified

I see something like this in the logs on node2:

DEBUG [main] 2012-07-31 13:50:38,640 CollationController.java (line 76)
collectTimeOrderedData
 INFO [main] 2012-07-31 13:50:38,641 StorageService.java (line 667)
JOINING: waiting for ring and schema information
DEBUG [WRITE-NODE1/node1.ip] 2012-07-31 13:50:39,642
OutboundTcpConnection.java (line 206) attempting to connect to
NODE1/node1.ip
DEBUG [ScheduledTasks:1] 2012-07-31 13:50:40,639 LoadBroadcaster.java (line
86) Disseminating load info ...
 INFO [main] 2012-07-31 13:51:08,641 StorageService.java (line 667)
JOINING: schema complete, ready to bootstrap
DEBUG [main] 2012-07-31 13:51:08,642 StorageService.java (line 554) ... got
ring + schema info
 INFO [main] 2012-07-31 13:51:08,642 StorageService.java (line 667)
JOINING: getting bootstrap token
DEBUG [main] 2012-07-31 13:51:08,644 BootStrapper.java (line 138) token
manually specified as 85070591730234615865843651857942052864
DEBUG [main] 2012-07-31 13:51:08,645 Table.java (line 387) applying
mutation of row 4c


but it doesn't join the ring:

[node2] nodetool -h localhost ring
Address     DC          Rack    Status  State   Load       Owns      Token
node2.ip    datacenter1 rack1   Up      Normal  13.49 KB   100.00%   85070591730234615865843651857942052864



I'm attaching the full log from node2 startup in debug mode.



PS.
When I didn't specify the initial token on node2 I ended up with an
exception like this:
"Exception encountered during startup: No other nodes seen!  Unable to
bootstrap.If you intended to start a single-node cluster, you should make
sure your broadcast_address (or listen_address) is listed as a seed.
Otherwise, you need to determine why the seed being contacted has no
knowledge of the rest of the cluster.  Usually, this can be solved by
giving all nodes the same seed list."


I'm not sure how to proceed now. I found a couple of posts about similar
problems, but they weren't very useful.

--
regards,
Jakub Glapa
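
The exception text quoted above already names the usual fix: give every node the same seed list. A minimal sketch of the relevant cassandra.yaml block, matching the SimpleSeedProvider syntax shown earlier in the thread:

# identical on BOTH nodes, so each can find the ring through node1
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "node1.ip"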


system.log
Description: Binary data


Re: Cassandra 1.0 hangs during GC

2012-08-01 Thread Nikolay Kоvshov
And the final solution:

http://unbxd.com/blog/2012/07/java-and-ksoftirqd-100-cpu-due-to-leap-second/

Doing $ date -s "`date`" solved the problem. 
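
For anyone hitting the same thing, the symptom check and the fix fit in a few lines; the grep targets come from the linked post, the rest is standard tooling:

# symptom of the June 2012 leap-second bug: ksoftirqd (and java) pinned at 100% CPU
top -b -n 1 | grep -E 'ksoftirqd|java'

# setting the clock to itself clears the kernel's stuck timer state - no reboot needed
date -s "`date`"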


30.07.2012, 16:09, "Nikolay Kоvshov" :

>  You mean using swap memory? I have a total of 48G of RAM and Cassandra never 
> used more than 2G; swap is disabled.
>
>  But as I have few other clues, I can give this a try. Is there any up-to-date 
> instruction on running Cassandra with JNA?
>
>  30.07.2012, 16:01, "Mateusz Korniak" :
>>   On Monday 30 of July 2012, Nikolay Kоvshov wrote:
>>>    -  JNA is not installed on either machine
>>   So your GC times may be strongly [1] affected by swapping.
>>   IIRC, also snapshotting is more expensive and may trigger more swapping.
>>   I would start with turning JNA mlockall on [2].
>>
>>   [1]:
>>   Not sure if it gets up to the numbers you presented (many seconds)...
>>
>>   [2]:
>>   INFO [main] 2012-07-27 12:18:27,135 CLibrary.java (line 109) JNA mlockall
>>   successful
>>
>>   --
>>   Mateusz Korniak
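
Enabling JNA is typically a matter of dropping jna.jar into Cassandra's lib directory and restarting; the checks below are suggestions, and the log path is an assumption:

# mlockall needs an adequate memlock limit for the user running cassandra
ulimit -l unlimited

# after restart, look for the 'JNA mlockall successful' line quoted in [2]
grep -i "mlockall" /var/log/cassandra/system.log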