Re: Null Pointer Exception / Secondary Indices

2010-10-11 Thread J T
Hi, Looks like I was premature in my response. I had cause today to wipe my datastore and restart cassandra and reload the .yaml containing the schema definition. After doing a restart of my app which essentially inserted into a CF with a 2ndary idx and then queried that CF I was left with log f

Re: getSchemaVersion

2010-10-11 Thread Jonathan Ellis
On Mon, Oct 11, 2010 at 9:48 PM, B. Todd Burruss wrote: > i was actually doing this to start with and was worried that i could have > two clients modifying schemas at the same time.  it seems this could cause > multiple valid versions and a race condition.  maybe it simply "works out" > that i wai

Re: getSchemaVersion

2010-10-11 Thread B. Todd Burruss
On 10/11/2010 06:14 PM, Jonathan Ellis wrote: On Mon, Oct 11, 2010 at 7:53 PM, B. Todd Burruss wrote: to determine if my programmatic schema changes have been distributed throughout the cluster, I am supposed to use getSchemaVersionMap, correct? my question is how do I properly use it? I h

Re: getSchemaVersion

2010-10-11 Thread Jonathan Ellis
On Mon, Oct 11, 2010 at 7:53 PM, B. Todd Burruss wrote: >  to determine if my programmatic schema changes have been distributed > throughout the cluster, I am supposed to use getSchemaVersionMap, correct? > > my question is how do I properly use it?  I have the schema version returned > from the t

getSchemaVersion

2010-10-11 Thread B. Todd Burruss
to determine if my programmatic schema changes have been distributed throughout the cluster, I am supposed to use getSchemaVersionMap, correct? my question is how do I properly use it? I have the schema version returned from the thrift method, and I can lookup in the schema map returned getS
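A minimal sketch of one way to interpret a schema version map when waiting for a change to propagate. It assumes the map associates each schema version string with the list of hosts reporting that version, and that hosts which did not answer are grouped under an "UNREACHABLE" key. Both points are assumptions about the map's shape and are not confirmed by the thread. The function name `schema_agreed` is hypothetical.

```python
from typing import Dict, List, Optional

# Assumed key under which non-responding hosts are grouped.
UNREACHABLE = "UNREACHABLE"


def schema_agreed(version_map: Dict[str, List[str]],
                  expected: Optional[str] = None) -> bool:
    """Return True when all reachable hosts report one schema version.

    If `expected` is given (for example the version returned by the
    schema-changing call), that single version must also match it.
    """
    # Ignore unreachable hosts and versions with no hosts attached.
    live = {version: hosts for version, hosts in version_map.items()
            if version != UNREACHABLE and hosts}
    if len(live) != 1:
        return False
    return expected is None or expected in live
```

A client would typically poll the map and call this check until it returns True or a timeout expires. Passing `expected` guards against the case where the cluster agrees on an older version because the change has not started propagating yet.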

Re: Exception in the tool

2010-10-11 Thread Aaron Morton
Sounds like you are getting this problem... http://www.mail-archive.com/user@cassandra.apache.org/msg06295.html Should be fixed in the nightly build. You can still get the stats via JConsole. Aaron On 12 Oct, 2010, at 01:14 PM, Dmitri Smirnov wrote: Is below a normal thing? I am a newbie, just unpacke

Exception in the tool

2010-10-11 Thread Dmitri Smirnov
Is below a normal thing? I am a newbie, just unpacked and started a single node. $ bin/nodetool -h localhost -p 8080 version ReleaseVersion: 0.7.0-beta2 $ bin/nodetool -h localhost -p 8080 tpstats Pool Name Active Pending Completed MIGRATION_STAGE 0

Re: Nodes getting slowed down after a few days of smooth operation

2010-10-11 Thread Ran Tavory
Peter, you're my JVM GC hero! Thank you! On Tue, Oct 12, 2010 at 12:38 AM, Peter Schuller < peter.schul...@infidyne.com> wrote: > > My motivation was that since I don't have too much data (10G each node) > then > > why don't I cache the hell out of it, so I started with a cache size of > 100% > >

Understanding Range queries with Random Partition

2010-10-11 Thread Rana Aich
Hi, I've used range queries with the Order Preserving Partitioner and got satisfactory results. For instance, I can find the first 1 million keys that start with key '2008010100' and end with '2008010200'. Now I'm trying to do the same with Random Partitioning. But here I find that for Range r
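A small sketch of why key ranges behave differently under random partitioning. It assumes rows are ordered by a token derived from the MD5 hash of the key rather than by the key itself. The exact derivation used here (absolute value of the digest read as a signed big-endian integer) is an assumption for illustration, and `md5_token` is a hypothetical helper.

```python
import hashlib


def md5_token(key: str) -> int:
    """Map a row key to a non-negative integer derived from its MD5 digest."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    # Assumed derivation: signed big-endian integer, then absolute value.
    return abs(int.from_bytes(digest, "big", signed=True))


keys = ["2008010100", "2008010101", "2008010102", "2008010103"]

# Order seen by an order-preserving partitioner: plain key order.
by_key = sorted(keys)

# Order seen when rows are placed by hashed token instead.
by_token = sorted(keys, key=md5_token)

print("key order:  ", by_key)
print("token order:", by_token)
```

Because adjacent keys hash to unrelated tokens, a start/end key pair does not bound a contiguous slice of rows in token order, which is why a key range that works with an order-preserving partitioner does not carry over directly.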

Re: Nodes getting slowed down after a few days of smooth operation

2010-10-11 Thread Peter Schuller
> My motivation was that since I don't have too much data (10G each node) then > why don't I cache the hell out of it, so I started with a cache size of 100% > and a much larger heap size (started with 12G out of the 16G ram). Over time > I've learned that too much heap for the JVM is like a kid in

Re: Nodes getting slowed down after a few days of smooth operation

2010-10-11 Thread Ran Tavory
Thanks Peter, Robert and Brandon. So it seems that the only suspect by now is my excessive caching ;) I'll get a better look at the GC activity next time shit starts to happen, but in the meantime, as for the cache size (cassandra's internal cache), its row cache capacity is set to 10,000,000. I

Re: Wide rows or tons of rows?

2010-10-11 Thread Aaron Morton
No idea about a partial row cache, but I would start with fat rows in your use case. If you find that performance is really a problem then you could add a second "recent / oldest" CF that you maintain with the most recent entries and use the row cache there. OR add more nodes. Aaron On 12 Oct, 2010

Re: Wide rows or tons of rows?

2010-10-11 Thread Jeremy Davis
Thanks for this reply. I'm wondering about the same issue... Should I bucket things into Wide rows (say 10M rows), or narrow (say 10K or 100K).. Of course it depends on my access patterns right... Does anyone know if a partial row cache is a feasible feature to implement? My use case is something

Re: Nodes getting slowed down after a few days of smooth operation

2010-10-11 Thread Peter Schuller
> 170141183460469231731687303715884105727 > 192.168.252.88 Up 10.07 GB Firstly, I second the point raised about the row cache size (very frequent concurrent GC:s is definitely an indicator that the JVM heap size is too small, and the row cache seems like a likely contender - especially give

Re: Nodes getting slowed down after a few days of smooth operation

2010-10-11 Thread Peter Schuller
> I have wondered before whether there is any technical reason why the commit > log replay should end with a flush, and from what I can tell, there isn't > one other than the general goal of not having a large commit log. My > personal feeling is that the last thing you want your production node do

Re: Nodes getting slowed down after a few days of smooth operation

2010-10-11 Thread Robert Coli
On 10/11/10 7:13 AM, Ran Tavory wrote: After a node gets restarted it compacts the sstable files on disk. I'm not sure whether compactions always take place after restart, maybe it's just minor compactions, I'm a little confused here, but my story would work best if (major) compactions were al

Re: Wide rows or tons of rows?

2010-10-11 Thread Héctor Izquierdo Seliva
On Mon, 11-10-2010 at 11:08 -0400, Edward Capriolo wrote: Inlined: > 2010/10/11 Héctor Izquierdo Seliva : > > Hi everyone. > > > > I'm sure this question or similar has come up before, but I can't find a > > clear answer. I have to store an unknown number of items in cassandra, > > which can

Re: Multi Data Center Strategy

2010-10-11 Thread Edward Capriolo
On Mon, Oct 11, 2010 at 9:53 AM, Henry Luo wrote: > We have an application that does a lot of updates to the rows. We use > replication factor of 3 and are moving to multiple data centers. We would > like to accomplish the following setup: > > > > Data are replicated to other data centers. RackAwa

Re: Wide rows or tons of rows?

2010-10-11 Thread Edward Capriolo
2010/10/11 Héctor Izquierdo Seliva : > Hi everyone. > > I'm sure this question or similar has come up before, but I can't find a > clear answer. I have to store an unknown number of items in cassandra, > which can vary from a few hundred to a few million per customer. > > I read that in cassandra

Re: Nodes getting slowed down after a few days of smooth operation

2010-10-11 Thread Brandon Williams
On Mon, Oct 11, 2010 at 9:13 AM, Ran Tavory wrote: > In my production cluster I've been seeing the following pattern. > When a node goes up it operates smoothly for a few days but then, after a > few days the node starts to show excessive CPU usage, I see GC activity (and > it may also be excessiv

Re: Problem Starting Cassandra

2010-10-11 Thread Eric Evans
On Fri, 2010-10-08 at 16:34 -0500, Michael Shuler wrote: > This looks like you haven't set up the system to use the Sun JRE, yet. > Debian/Ubuntu uses GCJ by default. OpenJDK works fine as well (package openjdk-6-jre). -- Eric Evans eev...@rackspace.com

Wide rows or tons of rows?

2010-10-11 Thread Héctor Izquierdo Seliva
Hi everyone. I'm sure this question or similar has come up before, but I can't find a clear answer. I have to store an unknown number of items in cassandra, which can vary from a few hundred to a few million per customer. I read that in cassandra wide rows are better than a lot of rows, but then

Multi Data Center Strategy

2010-10-11 Thread Henry Luo
We have an application that does a lot of updates to the rows. We use replication factor of 3 and are moving to multiple data centers. We would like to accomplish the following setup: Data are replicated to other data centers. RackAwareStrategy seems to be able to handle that, however 1)

Re: Cassandra newbie question

2010-10-11 Thread Gary Dusbabek
On Mon, Oct 11, 2010 at 04:01, Arijit Mukherjee wrote: > Hi All > > I've just started reading about Cassandra and writing simple tests > using Cassandra 0.6.5 to see if we can use it for our product. > > I have a data store with a set of columns, like C1, C2, C3, and C4, > but the columns aren't m

Re: replacing a dead node

2010-10-11 Thread Gary Dusbabek
On Mon, Oct 11, 2010 at 03:41, Chen Xinli wrote: > Hi, > > We have a cassandra cluster of 6 nodes with RF=3, read-repair enabled, > hinted handoff disabled, WRITE with QUORUM, READ with ONE. > we want to rely on read-repair totally for node failure, as returning > inconsistent result temporarily i

Re: Retaining commit logs

2010-10-11 Thread Oleg Anastasyev
Matthew Dennis writes: > Yes, please file it to Jira.  It seems like it would be pretty useful for various things and fairly easy to change the code to move it to another directory whenever C* thinks it should be deleted... Here it is for 0.6.4 version. Should work on a 0.6.5 as well

Re: Cassandra newbie question

2010-10-11 Thread Arijit Mukherjee
Just a follow on question to this - would PIG be a good fit for such questions? Arijit On 11 October 2010 14:31, Arijit Mukherjee wrote: > Hi All > > I've just started reading about Cassandra and writing simple tests > using Cassandra 0.6.5 to see if we can use it for our product. > > I have a d

Cassandra newbie question

2010-10-11 Thread Arijit Mukherjee
Hi All I've just started reading about Cassandra and writing simple tests using Cassandra 0.6.5 to see if we can use it for our product. I have a data store with a set of columns, like C1, C2, C3, and C4, but the columns aren't mandatory. For example, there can be a list of (k, v) pairs with only

replacing a dead node

2010-10-11 Thread Chen Xinli
Hi, We have a cassandra cluster of 6 nodes with RF=3, read-repair enabled, hinted handoff disabled, WRITE with QUORUM, READ with ONE. We want to rely on read-repair totally for node failure, as returning inconsistent result temporarily is ok for us. If a node is temporarily dead and returned to
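The consistency trade-off in the setup described (RF=3, QUORUM writes, ONE reads) can be checked with the usual replica-overlap rule: a read is guaranteed to touch at least one replica holding the latest write only when the write and read replica counts together exceed the replication factor. A minimal sketch, with hypothetical helper names `quorum` and `read_sees_latest_write`:

```python
def quorum(rf: int) -> int:
    """Number of replicas in a majority quorum for replication factor rf."""
    return rf // 2 + 1


def read_sees_latest_write(rf: int, write_replicas: int,
                           read_replicas: int) -> bool:
    """True when every read set must overlap every write set (W + R > RF)."""
    return write_replicas + read_replicas > rf


RF = 3
W = quorum(RF)   # QUORUM write: 2 of 3 replicas
R = 1            # ONE read: a single replica

print(read_sees_latest_write(RF, W, R))
print(read_sees_latest_write(RF, W, quorum(RF)))
```

With these settings W + R equals RF, so a read at ONE can land on the replica that missed the write. That matches the poster's stated acceptance of temporarily inconsistent results, with read-repair closing the gap afterwards.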