RE: error using get_range_slice with random partitioner

2010-08-09 Thread Adam Crain
Hi Thomas,

Can you share your client code for the iteration? It would probably help me 
catch my problem. Anyone know where in the cassandra source the integration 
tests are for this functionality on the random partitioner?

Note that I posted a specific example where the iteration failed, and I was not 
throwing out good keys, only duplicate ones. That means 1 of 2 things:

1) I'm somehow using the API incorrectly
2) I am the only one encountering a bug

My money is on 1) of course.  I can check the thrift API against what my Scala 
client is calling under the hood.

-Adam


-Original Message-
From: th.hel...@gmail.com on behalf of Thomas Heller
Sent: Fri 8/6/2010 7:17 PM
To: user@cassandra.apache.org
Subject: Re: error using get_range_slice with random partitioner
 
On Sat, Aug 7, 2010 at 1:05 AM, Adam Crain
adam.cr...@greenenergycorp.com wrote:
 I took this approach... reject the first result of subsequent get_range_slice 
 requests. If you look back at output I posted (below) you'll notice that not 
 all of the 30 keys [key1...key30] get listed! The iteration dies and can't 
 proceed past key2.

 1) 1st batch gets 10 unique keys.
 2) 2nd batch only gets 9 unique keys with the 1st being a repeat
 3) 3rd batch only gets 2 unique keys 

 That means the iteration didn't see 9 keys in the CF. Key7 and Key30 are 
 missing for example.


Remember the returned results are NOT sorted, so whenever you are
dropping the first result by default, you might be dropping a good one. At
least that would be my guess here.

I have iteration implemented in my client and everything is working as
expected and so far I never had duplicates (running 0.6.3). I'm using
tokens for range_slices tho, increment/decrement for get_slice only.

/thomas
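
(Not part of the original mail: a rough Java sketch of the token-based iteration Thomas
describes, assuming the 0.6-era Thrift signatures and that the RandomPartitioner token is
simply the MD5 of the key taken as a non-negative BigInteger. The keyspace/CF names and
batch size are made up; treat this as an illustration, not a tested client.)

import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.List;
import org.apache.cassandra.thrift.*;

public class TokenRangeIteration {
    // RandomPartitioner-style token for a key (MD5 digest as a positive BigInteger).
    static BigInteger tokenOf(String key) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5").digest(key.getBytes("UTF-8"));
        return new BigInteger(digest).abs();
    }

    static void iterate(Cassandra.Client client) throws Exception {
        SlicePredicate predicate = new SlicePredicate();
        predicate.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 1));
        ColumnParent parent = new ColumnParent("Standard1");

        String lastKey = null;
        String startToken = "0";   // start of the ring
        String endToken = "0";     // same token = wrap all the way around
        while (true) {
            KeyRange range = new KeyRange();
            range.setStart_token(startToken);
            range.setEnd_token(endToken);
            range.setCount(10);
            List<KeySlice> batch = client.get_range_slices("Keyspace1", parent, predicate,
                                                           range, ConsistencyLevel.ONE);
            for (KeySlice ks : batch) {
                if (ks.getKey().equals(lastKey))
                    continue;      // skip the overlap by key, not by position
                // ... process ks.getKey() ...
            }
            if (batch.size() < 10)
                break;             // short batch: we have gone all the way around
            lastKey = batch.get(batch.size() - 1).getKey();
            startToken = tokenOf(lastKey).toString();   // next range starts at the last key's token
        }
    }
}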





Re: TokenRange contains endpoints without any port information?

2010-08-09 Thread Gary Dusbabek
On Sun, Aug 8, 2010 at 07:21, Carsten Krebs carsten.kr...@gmx.net wrote:

 I'm wondering why a TokenRange returned by describe_ring(keyspace) of the 
 thrift API just returns endpoints consisting only of an address but omits any 
 port information?
 My first thought was, this method could be used to expose some information 
 about the ring structure to the client, i.e. to do some client side load 
 balancing. But now, I'm not sure about this anymore. Additionally, when 
 looking into the code, I guess the address returned as part of the TokenRange 
 is the address of the storage service which could differ from the thrift 
 address, which in turn would make the returned endpoint useless for the 
 client.
 What is the purpose of this method

To give a picture of the ring topology.

or respectively why is the port information omitted?

You already knew the thrift port to make the query connection.  The
only other port you *might* need to be concerned with is the storage
port, which is assumed to be constant across the cluster.  But really,
from a client perspective it does you no good to know this port, so
why bother exposing it?

Gary.


 TIA,

 Carsten
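
(Not part of the original mail: a minimal sketch of reading the ring topology via
describe_ring, assuming the 0.6 Thrift API; each TokenRange carries a start token, an end
token, and the endpoints as bare address strings with no port. The keyspace name is assumed.)

import java.util.List;
import org.apache.cassandra.thrift.Cassandra;
import org.apache.cassandra.thrift.TokenRange;

public class RingTopology {
    static void print(Cassandra.Client client) throws Exception {
        List<TokenRange> ranges = client.describe_ring("Keyspace1");
        for (TokenRange tr : ranges) {
            // endpoints are plain address strings; no thrift/storage port is attached
            System.out.println("(" + tr.getStart_token() + ", " + tr.getEnd_token()
                    + "] -> " + tr.getEndpoints());
        }
    }
}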




Question on nodetool ring

2010-08-09 Thread Mark
I'm running a 2 node cluster and when I run nodetool ring I get the 
following output


Address Status State   LoadToken
   
160032583171087979418578389981025646900
127.0.0.1  Up Normal  42.28 MB
42909338385373526599163667549814010691
127.0.0.2   Up Normal  42.26 MB
160032583171087979418578389981025646900


The columns/values are pretty much self explanatory except for the first 
line. What is this value?


Thanks


Re: Question on load balancing in a cluster

2010-08-09 Thread anand_s

Cool thanks, I think I will experiment with nodetool move.

Can somebody confirm the reason for decommissioning, instead of just
splitting the token on the fly? Yes, it does seem simpler to just
decommission and bootstrap, but that does mean a lot of data has to be moved
around to get a reasonable load distribution. Load distribution is a bigger
need when a new node is introduced and so load needs to be balanced.  This
also means that when a node is decommissioned, the load on its immediate
neighbor increases. 

In this example, A, B, C, E is a cluster with loads of 80, 78, 83, 84.
Now I add a new node D (its position will be before E), so eventually after all
the rebalance activity I want the load to be ~66 (245/5). Now is that
unreasonable to expect? Because if it is not, then each node will have to
be decommissioned and bootstrapped to get that distribution (almost the
entire dataset will need to be moved), and that is a lot of data movement!!
Unless I have got this wrong?
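
(Not part of the original mail: for the RandomPartitioner, balanced targets for nodetool
move are usually just tokens spaced evenly over the 0..2^127 ring, i.e. token(i) = i *
2^127 / N. A small sketch, with the node count as an assumption:)

import java.math.BigInteger;

public class BalancedTokens {
    public static void main(String[] args) {
        int n = 5; // e.g. A, B, C, D, E after adding the new node
        BigInteger ringSize = BigInteger.valueOf(2).pow(127);
        for (int i = 0; i < n; i++) {
            // evenly spaced token for node i
            BigInteger token = ringSize.multiply(BigInteger.valueOf(i))
                                       .divide(BigInteger.valueOf(n));
            System.out.println("node " + i + ": " + token);
        }
    }
}
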
-- 
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Question-on-load-balancing-in-a-cluster-tp5375140p5389827.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


Re: batch_mutate atomicity

2010-08-09 Thread Peter Schuller
 I am using the familiar meanings from ACID:

 atomic means either the entire update will succeed or none of it.

 isolated means other threads will not see partial updates while it is
 being applied.

A related concern is whether there is a write *ordering* guarantee for
mutations within a row key. Ensuring consistency in the face of
concurrent access can in some (probably several) cases become a lot
easier with an ordering guarantee which would otherwise necessitate an
RPC call between potentially every mutation (depending on where
dependencies between writes are).

Upon cursory inspection it *seems* to me that ordering is maintained,
but even if this is correct, can one consider Cassandra to have such
an ordering guarantee or is any such behavior an artifact of current
implementation details?

By ordering guarantee I am only thinking of single column reads or
distinct get_slice() calls; I am not expecting ordering guarantees
w.r.t. visibility within a single get_slice(). (Additionally I am
assuming QUOROM or RF=1, else it would not be useful to rely on
anyway.)

-- 
/ Peter Schuller
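
(Not part of the original mail: a small sketch of two writes to the same row key via
batch_mutate, the sort of dependent updates the ordering question is about, assuming the
0.6-era Thrift API. Keyspace, column family, and values are made up.)

import java.util.*;
import org.apache.cassandra.thrift.*;

public class BatchMutateSketch {
    static Mutation columnMutation(String name, String value, long ts) throws Exception {
        Column c = new Column(name.getBytes("UTF-8"), value.getBytes("UTF-8"), ts);
        ColumnOrSuperColumn cosc = new ColumnOrSuperColumn();
        cosc.setColumn(c);
        Mutation m = new Mutation();
        m.setColumn_or_supercolumn(cosc);
        return m;
    }

    static void writeRow(Cassandra.Client client) throws Exception {
        long ts = System.currentTimeMillis() * 1000;
        List<Mutation> mutations = Arrays.asList(
                columnMutation("state", "pending", ts),     // write 1
                columnMutation("owner", "worker-42", ts));  // write 2, same row key
        Map<String, List<Mutation>> byCf = Collections.singletonMap("Standard1", mutations);
        Map<String, Map<String, List<Mutation>>> byKey = Collections.singletonMap("row-1", byCf);
        // The ordering question: can a concurrent reader observe write 2
        // without write 1 (or vice versa) while this is being applied?
        client.batch_mutate("Keyspace1", byKey, ConsistencyLevel.QUORUM);
    }
}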


COMMIT-LOG_WRITER Assertion Error

2010-08-09 Thread Arya Goudarzi
Just throwing this out there as it could be a concern. I had a cluster of 3 
nodes running. Over the weekend I updated to trunk (Aug 9th @ 2pm). Today, I 
came to run my daily tests and my client kept giving me TSocket timeouts. 
Checking the error log of Cassandra servers, all 3 nodes had this and they all 
became unresponsive! Not sure how to reproduce this but a restart of all 3 
nodes fixed the issue:

ERROR [COMMIT-LOG-WRITER] 2010-08-09 11:30:27,722 CassandraDaemon.java (line 
82) Uncaught exception in thread Thread[COMMIT-LOG-WRITER,5,main]
java.lang.AssertionError
at 
org.apache.cassandra.db.commitlog.CommitLogHeader$CommitLogHeaderSerializer.serialize(CommitLogHeader.java:157)
at 
org.apache.cassandra.db.commitlog.CommitLogHeader.writeCommitLogHeader(CommitLogHeader.java:124)
at 
org.apache.cassandra.db.commitlog.CommitLogSegment.writeHeader(CommitLogSegment.java:70)
at 
org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:103)
at 
org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:521)
at 
org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:52)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
at java.lang.Thread.run(Thread.java:636)


-Arya


Re: Question on nodetool ring

2010-08-09 Thread S Ahmed
that's the token range

so node#1 is from 1600.. to 429..
node#2 is from 429... to 1600...

hopefully others can chime in to confirm.

On Mon, Aug 9, 2010 at 12:30 PM, Mark static.void@gmail.com wrote:

 I'm running a 2 node cluster and when I run nodetool ring I get the
 following output

 Address Status State   LoadToken

 160032583171087979418578389981025646900
 127.0.0.1  Up Normal  42.28 MB
  42909338385373526599163667549814010691
 127.0.0.2   Up Normal  42.26 MB
  160032583171087979418578389981025646900

 The columns/values are pretty much self explanatory except for the first
 line. What is this value?

 Thanks



Re: Question on nodetool ring

2010-08-09 Thread Mark

On 8/9/10 12:51 PM, S Ahmed wrote:

that's the token range

so node#1 is from 1600.. to 429..
node#2 is from 429... to 1600...

hopefully others can chime in to confirm.

On Mon, Aug 9, 2010 at 12:30 PM, Mark static.void@gmail.com 
mailto:static.void@gmail.com wrote:


I'm running a 2 node cluster and when I run nodetool ring I get
the following output

Address Status State   LoadToken
 
160032583171087979418578389981025646900
127.0.0.1  Up Normal  42.28 MB  
 42909338385373526599163667549814010691
127.0.0.2   Up Normal  42.26 MB  
 160032583171087979418578389981025646900


The columns/values are pretty much self explanatory except for the
first line. What is this value?

Thanks


I was just wondering why the 160032583171087979418578389981025646900 
token is on a line by itself and listed under 127.0.0.2.


Growing commit log directory.

2010-08-09 Thread Edward Capriolo
I have a 16 node 6.3 cluster and two nodes from my cluster are giving
me major headaches.

10.71.71.56   Up 58.19 GB
10827166220211678382926910108067277|   ^
10.71.71.61   Down   67.77 GB
123739042516704895804863493611552076888v   |
10.71.71.66   Up 43.51 GB
127605887595351923798765477786913079296|   ^
10.71.71.59   Down   90.22 GB
139206422831293007780471430312996086499v   |
10.71.71.65   Up 22.97 GB
148873535527910577765226390751398592512|   ^

The symptoms I am seeing are nodes 61 and nodes 59 have huge 6 GB +
commit log directories. They keep growing, along with memory usage,
eventually the logs start showing GCInspection errors and then the
nodes will go OOM

INFO 14:20:01,296 Creating new commitlog segment
/var/lib/cassandra/commitlog/CommitLog-1281378001296.log
 INFO 14:20:02,199 GC for ParNew: 327 ms, 57545496 reclaimed leaving
7955651792 used; max is 9773776896
 INFO 14:20:03,201 GC for ParNew: 443 ms, 45124504 reclaimed leaving
8137412920 used; max is 9773776896
 INFO 14:20:04,314 GC for ParNew: 438 ms, 54158832 reclaimed leaving
8310139720 used; max is 9773776896
 INFO 14:20:05,547 GC for ParNew: 409 ms, 56888760 reclaimed leaving
8480136592 used; max is 9773776896
 INFO 14:20:06,900 GC for ParNew: 441 ms, 58149704 reclaimed leaving
8648872520 used; max is 9773776896
 INFO 14:20:08,904 GC for ParNew: 462 ms, 59185992 reclaimed leaving
8816581312 used; max is 9773776896
 INFO 14:20:09,973 GC for ParNew: 460 ms, 57403840 reclaimed leaving
8986063136 used; max is 9773776896
 INFO 14:20:11,976 GC for ParNew: 447 ms, 59814376 reclaimed leaving
9153134392 used; max is 9773776896
 INFO 14:20:13,150 GC for ParNew: 441 ms, 61879728 reclaimed leaving
9318140296 used; max is 9773776896
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid10913.hprof ...
 INFO 14:22:30,620 InetAddress /10.71.71.66 is now dead.
 INFO 14:22:30,621 InetAddress /10.71.71.65 is now dead.
 INFO 14:22:30,621 GC for ConcurrentMarkSweep: 44862 ms, 261200
reclaimed leaving 9334753480 used; max is 9773776896
 INFO 14:22:30,621 InetAddress /10.71.71.64 is now dead.

Heap dump file created [12730501093 bytes in 253.445 secs]
ERROR 14:28:08,945 Uncaught exception in thread Thread[Thread-2288,5,main]
java.lang.OutOfMemoryError: Java heap space
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
ERROR 14:28:08,948 Uncaught exception in thread Thread[Thread-2281,5,main]
java.lang.OutOfMemoryError: Java heap space
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
 INFO 14:28:09,017 GC for ConcurrentMarkSweep: 33737 ms, 85880
reclaimed leaving 9335215296 used; max is 9773776896

Does anyone have any ideas what is going on?


Re: TokenRange contains endpoints without any port information?

2010-08-09 Thread Carsten Krebs

On 08.08.2010, at 14:47 aaron morton wrote:
 
  What sort of client side load balancing were you thinking of? I just use 
 round robin DNS to distribute clients around the cluster, and have them 
 recycle their connections every so often. 
 
I was thinking about using this method to give the client the ability to 
learn what nodes are part of the cluster, using this information to 
automatically adapt the set of nodes used by the client if a new node is added 
to or removed from the cluster.

Why do you prefer round robin DNS for load balancing? 
One advantage I see is that the client does not have to take care of the 
node set and especially the management of the node set. The reason why I was 
thinking about client side load balancing was to avoid the need to write 
additional tools to monitor all nodes in the cluster and change the DNS 
entry if any node fails - and this as fast as possible to prevent the clients 
from trying to use a dead node. But while writing this, I don't think 
anymore that this is a good point. This is just a matter of some sort of retry 
logic, which is needed in the client anyway.

Carsten



Re: Question on nodetool ring

2010-08-09 Thread S Ahmed
b/c node#1 has a start and end range, so you can see the boundaries for each
node by looking at the last column.

On Mon, Aug 9, 2010 at 4:12 PM, Mark static.void@gmail.com wrote:

  On 8/9/10 12:51 PM, S Ahmed wrote:

 that's the token range

  so node#1 is from 1600.. to 429..
 node#2 is from 429... to 1600...

  hopefully others can chime in to confirm.

 On Mon, Aug 9, 2010 at 12:30 PM, Mark static.void@gmail.com wrote:

 I'm running a 2 node cluster and when I run nodetool ring I get the
 following output

 Address Status State   LoadToken

 160032583171087979418578389981025646900
 127.0.0.1  Up Normal  42.28 MB
  42909338385373526599163667549814010691
 127.0.0.2   Up Normal  42.26 MB
  160032583171087979418578389981025646900

 The columns/values are pretty much self explanatory except for the first
 line. What is this value?

 Thanks


  I was just wondering why the 160032583171087979418578389981025646900
 token is on a line by itself and listed under 127.0.0.2.



Re: Growing commit log directory.

2010-08-09 Thread S Ahmed
if your commit logs are not getting cleared, doesn't that indicate your load
is more than your servers can handle?


On Mon, Aug 9, 2010 at 4:50 PM, Edward Capriolo edlinuxg...@gmail.comwrote:

 I have a 16 node 6.3 cluster and two nodes from my cluster are giving
 me major headaches.

 10.71.71.56   Up 58.19 GB
 10827166220211678382926910108067277|   ^
 10.71.71.61   Down   67.77 GB
 123739042516704895804863493611552076888v   |
 10.71.71.66   Up 43.51 GB
 127605887595351923798765477786913079296|   ^
 10.71.71.59   Down   90.22 GB
 139206422831293007780471430312996086499v   |
 10.71.71.65   Up 22.97 GB
 148873535527910577765226390751398592512|   ^

 The symptoms I am seeing are nodes 61 and nodes 59 have huge 6 GB +
 commit log directories. They keep growing, along with memory usage,
 eventually the logs start showing GCInspection errors and then the
 nodes will go OOM

 INFO 14:20:01,296 Creating new commitlog segment
 /var/lib/cassandra/commitlog/CommitLog-1281378001296.log
  INFO 14:20:02,199 GC for ParNew: 327 ms, 57545496 reclaimed leaving
 7955651792 used; max is 9773776896
  INFO 14:20:03,201 GC for ParNew: 443 ms, 45124504 reclaimed leaving
 8137412920 used; max is 9773776896
  INFO 14:20:04,314 GC for ParNew: 438 ms, 54158832 reclaimed leaving
 8310139720 used; max is 9773776896
  INFO 14:20:05,547 GC for ParNew: 409 ms, 56888760 reclaimed leaving
 8480136592 used; max is 9773776896
  INFO 14:20:06,900 GC for ParNew: 441 ms, 58149704 reclaimed leaving
 8648872520 used; max is 9773776896
  INFO 14:20:08,904 GC for ParNew: 462 ms, 59185992 reclaimed leaving
 8816581312 used; max is 9773776896
  INFO 14:20:09,973 GC for ParNew: 460 ms, 57403840 reclaimed leaving
 8986063136 used; max is 9773776896
  INFO 14:20:11,976 GC for ParNew: 447 ms, 59814376 reclaimed leaving
 9153134392 used; max is 9773776896
  INFO 14:20:13,150 GC for ParNew: 441 ms, 61879728 reclaimed leaving
 9318140296 used; max is 9773776896
 java.lang.OutOfMemoryError: Java heap space
 Dumping heap to java_pid10913.hprof ...
  INFO 14:22:30,620 InetAddress /10.71.71.66 is now dead.
  INFO 14:22:30,621 InetAddress /10.71.71.65 is now dead.
  INFO 14:22:30,621 GC for ConcurrentMarkSweep: 44862 ms, 261200
 reclaimed leaving 9334753480 used; max is 9773776896
  INFO 14:22:30,621 InetAddress /10.71.71.64 is now dead.

 Heap dump file created [12730501093 bytes in 253.445 secs]
 ERROR 14:28:08,945 Uncaught exception in thread Thread[Thread-2288,5,main]
 java.lang.OutOfMemoryError: Java heap space
at
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
 ERROR 14:28:08,948 Uncaught exception in thread Thread[Thread-2281,5,main]
 java.lang.OutOfMemoryError: Java heap space
at
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
  INFO 14:28:09,017 GC for ConcurrentMarkSweep: 33737 ms, 85880
 reclaimed leaving 9335215296 used; max is 9773776896

 Does anyone have any ideas what is going on?



Re: row cache during bootstrap

2010-08-09 Thread Artie Copeland
On Sun, Aug 8, 2010 at 5:24 AM, aaron morton aa...@thelastpickle.comwrote:

 Not sure how feasible it is or if it's planned. But it would probably
 require that the nodes are able to share the state of their row cache so as
 to know which parts to warm. Otherwise it sounds like you're assuming the
 node can hold the entire data set in memory.

 I'm not assuming the node can hold the entire data set in Cassandra in
memory, if that's what you meant. I was thinking of sharing the state of the
row cache, but only those keys that are being moved for the token; the
other keys can stay hidden from the node.


 If you know in your application when you would like data to be in the
 cache, you can send a query like get_range_slices to the cluster and ask for
 0 columns. That will warm the row cache for the keys it hits.


This is a tough one, as our row cache is over 20 million and takes a while to
get a large hit ratio, so while we try to preload it, it is taking requests. If
it were possible to bring up a node that doesn't announce its availability to
the cluster, that would help us manually warm the cache. I know this feature
is in the issue tracker currently, but it didn't look like it would come out
anytime before 0.8.
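
(Not part of the original mail: a rough sketch of the warm-up Aaron suggests above, paging
over keys with get_range_slices while asking for 0 columns, so rows are touched, and hence
cached, without pulling column data. Assumes the 0.6 Thrift API; names and the page size
are illustrative.)

import java.util.List;
import org.apache.cassandra.thrift.*;

public class RowCacheWarmer {
    static void warm(Cassandra.Client client, String keyspace, String cf,
                     String startKey, String endKey) throws Exception {
        SlicePredicate nothing = new SlicePredicate();
        // ask for 0 columns: the row is read (and cached) but no column data returned
        nothing.setSlice_range(new SliceRange(new byte[0], new byte[0], false, 0));
        ColumnParent parent = new ColumnParent(cf);

        String start = startKey;
        while (true) {
            KeyRange range = new KeyRange();
            range.setStart_key(start);
            range.setEnd_key(endKey);
            range.setCount(1000);
            List<KeySlice> rows = client.get_range_slices(keyspace, parent, nothing, range,
                                                          ConsistencyLevel.ONE);
            if (rows.size() <= 1)
                break;
            // the last key returned becomes the next start (it will be returned again)
            start = rows.get(rows.size() - 1).getKey();
        }
    }
}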


 I have heard it mentioned that the coordinator node will take action
 when one node is considered to be running slow. So it may be able to work
 around the new node until it gets warmed up.


That is interesting, I haven't heard that one.  I think with the parallel
reads that are happening it makes sense that it would be possible.  That is,
unless the data is local.  I believe in that case it always prefers to read
locally vs. over the network, so if the local machine is the slow node that
wouldn't help.


 Are you adding nodes often?

Currently not that often.  The main issue is we have very stringent latency
requirements, and for anything that would affect those we have to understand the
worst case cost to see if we can avoid it.


 Aaron

 On 7 Aug 2010, at 11:17, Artie Copeland wrote:

 the way I understand how row caches work is that each node has an
 independent cache, in that they do not share their cache contents with other
 nodes.  If that's the case, is it also true that when a new node is added to
 the cluster it has to build up its own cache? If that's the case, I see that
 as a possible performance bottleneck once the node starts to accept
 requests, since there is no way I know of to warm the cache without adding
 the node to the cluster.  Would it be infeasible to have part of the
 bootstrap process not only stream data from nodes but also cached rows that
 are associated with those same keys?  That would allow the new nodes to be
 able to provide the best performance once the bootstrap process finishes.

 --
 http://yeslinux.org
 http://yestech.org





-- 
http://yeslinux.org
http://yestech.org


Re: backport of pre cache load

2010-08-09 Thread Artie Copeland
No, we aren't caching 100%; we cache 20 - 30 million rows, which only starts
to get a high hit rate over time, so having a useful cache can take over a
week of running.  We would love to store the complete CF in memory but don't
know of a server that can hold that much data in memory while still being
commodity.  Our data set is currently over 100GB.

On Fri, Aug 6, 2010 at 5:54 PM, Jonathan Ellis jbel...@gmail.com wrote:

 are you caching 100% of the CF?

 if not this is not super useful.

 On Fri, Aug 6, 2010 at 7:10 PM, Artie Copeland yeslinux@gmail.com
 wrote:
  would it be possible to backport the 0.7 feature, the ability to save and
   preload row caches after a restart?  I think that is a very nice and
   important feature that would help users with very large caches, that take a
   long time to get the proper hot set.  For example we can get pretty good
   row cache hits if we run the servers for a month or more as the data
   tends to settle down.
  --
  http://yeslinux.org
  http://yestech.org
 



 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com




-- 
http://yeslinux.org
http://yestech.org


Re: TokenRange contains endpoints without any port information?

2010-08-09 Thread Aaron Morton
The FAQ lists Round-Robin as the recommended way to find a node to connect to...
http://wiki.apache.org/cassandra/FAQ#node_clients_connect_to

As you say, your clients need to retry anyway. I have them hold the connection for a while 
(on the scale of minutes), then hit the DNS again and acquire a new connection. This lets 
them pick up new nodes and (I think over time) helps with keeping connections balanced 
around the cluster. If a node goes down for a short time, it should not have too much of an 
effect on the clients. If you are taking a node out of the cluster you will need to update 
the DNS to remove it.

Aaron

On 10 Aug, 2010, at 08:51 AM, Carsten Krebs carsten.kr...@gmx.net wrote:
On 08.08.2010, at 14:47 aaron morton wrote:
 
 What sort of client side load balancing were you thinking of? I just use round robin DNS to distribute clients around the cluster, and have them recycle their connections every so often. 
 
I was thinking about using this method to give the client the ability to "learn" what nodes are part of the cluster, using this information to automatically adapt the set of nodes used by the client if a new node is added to or removed from the cluster.

Why do you prefer round robin DNS for load balancing? 
One advantage I see is that the client does not have to take care of the node set and especially the management of the node set. The reason why I was thinking about client side load balancing was to avoid the need to write additional tools to monitor all nodes in the cluster and change the DNS entry if any node fails - and this as fast as possible to prevent the clients from trying to use a dead node. But while writing this, I don't think anymore that this is a good point. This is just a matter of some sort of retry logic, which is needed in the client anyway.

Carsten
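
(Not part of the original mail: a rough Java sketch of the "recycle the connection every few
minutes" approach Aaron describes, relying on round-robin DNS being consulted again on each
reconnect. Class and method names are made up; error handling is omitted.)

import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TSocket;
import org.apache.cassandra.thrift.Cassandra;

class RecyclingClient {
    private final String host;        // DNS name with multiple A records
    private final int port;
    private final long maxAgeMillis;  // e.g. a few minutes
    private TSocket socket;
    private Cassandra.Client client;
    private long openedAt;

    RecyclingClient(String host, int port, long maxAgeMillis) {
        this.host = host; this.port = port; this.maxAgeMillis = maxAgeMillis;
    }

    synchronized Cassandra.Client get() throws Exception {
        if (client == null || System.currentTimeMillis() - openedAt > maxAgeMillis) {
            if (socket != null) socket.close();
            socket = new TSocket(host, port);   // DNS is resolved again here
            socket.open();
            client = new Cassandra.Client(new TBinaryProtocol(socket));
            openedAt = System.currentTimeMillis();
        }
        return client;
    }
}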



Re: 2 nodes on one machine

2010-08-09 Thread Aaron Morton
http://www.onemanclapping.org/2010/03/running-multiple-cassandra-nodes-on.html

Also some recent discussion on the users list.

Aaron

On 10 Aug, 2010, at 08:58 AM, Pavlo Baron p...@pbit.org wrote:

Hello users,

I'm a total Cassandra noob beside what I read about it, so please be 
patient :)

I want to set up a cluster with 2 nodes on one VirtualBox-ed CentOS. I 
don't really want to start with the single node example, but with my 
desired setup. Do I have to do much more than to have 2 network 
interfaces so I can configure Cassandra nodes to run on different 
subnets / IP addresses? Is it possible at all to have several instances 
run on the same machine?

Could you point me at a doc which describes a setup like that? The rest 
of the setup would even be a cluster of those 2 nodes, but even with 
nodes running on the same machine. Background: I want this setup to be 
frozen in the VirtualBox image.

many thx in advance and best rgds

Pavlo


Re: 2 nodes on one machine

2010-08-09 Thread Pavlo Baron
cool, thank you Aaron, I'll check it out through the next days and post 
the results


Pavlo

On 10.08.2010 00:11, Aaron Morton wrote:

http://www.onemanclapping.org/2010/03/running-multiple-cassandra-nodes-on.html

Also some recent discussion on the users list.

Aaron


On 10 Aug, 2010,at 08:58 AM, Pavlo Baron p...@pbit.org wrote:


Hello users,

I'm a total Cassandra noob beside what I read about it, so please be
patient :)

I want to set up a cluster with 2 nodes on one VirtualBox-ed CentOS. I
don't really want to start with the single node example, but with my
desired setup. Do I have to do much more than to have 2 network
interfaces so I can configure Cassandra nodes to run on different
subnets / IP addresses? Is it possible at all to have several instances
run on the same machine?

Could you point me at a doc which describes a setup like that? The rest
of the setup would even be a cluster of those 2 nodes, but even with
nodes running on the same machine. Background: I want this setup to be
frozen in the VirtualBox image.

many thx in advance and best rgds

Pavlo




Using a separate commit log drive was 4x slower

2010-08-09 Thread Jeremy Davis
I have a weird one to share with the list: using a separate commit log drive
dropped my performance a lot more than I would expect...

I'm doing perf tests on 3 identical machines but with 3 different drive
sets (SAS 15K, 10K, and SATA 7.5K).
Each system has a single system disk (same as the data set) and the data set
(a 5 drive RAID-0).

I'm using Cassandra 0.6.4 with Java 1.6_20.
This is all RF=1, CL=1

I inserted an initial data set of 100K keys (each with 1000 columns of
random data, 1000 bytes each), compacted, and restarted Cassandra.

Then I did a write baseline where I have 500 threads inserting a random 1000
bytes on a random key/column combination (always 1 column per request).
If my commit log is on my RAID'd Data drive I get about 19K Columns/Inserts
per second.

If I then add some random reads (30 threads doing a random key/column
read, always 1 column per read) I get ~8K reads/writes per second:

 Host      Write Baseline Columns/sec   Write Columns/sec   Read Columns/sec
 SAS15K    18800                        8700                8100
 SAS10K    15800                        7600                7300
 SATA      13200                        7300                8000

Now, if I do the same thing but with the commit log on the system disk, I
get:

 Host      Write Baseline Columns/sec   Write Columns/sec   Read Columns/sec
 SAS15K    12600                        2200                1600
 SAS10K    11400                        3000                1900
 SATA       9900                        3100                1800


I would think that the Write level would stay at about the baseline, and I
have no idea why the read level would be so low.

Any thoughts?


Some iostat (while separate commit log):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          38.33    0.00    4.72    2.48    0.00   54.47

Device:     rrqm/s  wrqm/s    r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
cciss/c0d0    0.00  908.50   0.00   110.50     0.00  8152.00    73.77     0.57   5.20   4.93  54.50
cciss/c0d1    0.00    0.00  16.50     0.00  1424.00     0.00    86.30     0.10   6.06   2.73   4.50
dm-0          0.00    0.00   0.00  1019.00     0.00  8152.00     8.00     6.25   6.13   0.53  54.50
dm-1          0.00    0.00   0.00     0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          37.42    0.00    2.37    3.54    0.00   56.68

Device:     rrqm/s  wrqm/s    r/s      w/s   rsec/s   wsec/s avgrq-sz avgqu-sz  await  svctm  %util
cciss/c0d0    0.00  854.50   0.00   124.50     0.00  7816.00    62.78     0.61   4.94   4.82  60.00
cciss/c0d1    0.00    0.00  32.00     0.00  4032.00     0.00   126.00     0.21   6.72   3.12  10.00
dm-0          0.00    0.00   0.00   979.50     0.00  7836.00     8.00     5.57   5.69   0.61  60.00
dm-1          0.00    0.00   0.00     0.00     0.00     0.00     0.00     0.00   0.00   0.00   0.00


some top (while separate commit log):

top - 15:56:38 up 6 days, 21:26,  9 users,  load average: 17.09, 7.92, 6.87
Tasks: 358 total,   1 running, 357 sleeping,   0 stopped,   0 zombie
Cpu(s): 35.4%us,  1.6%sy,  0.0%ni, 59.4%id,  3.2%wa,  0.0%hi,  0.5%si,
0.0%st
Mem:  24729068k total, 19789732k used,  4939336k free,   132056k buffers
Swap:  5849080k total,54976k used,  5794104k free, 14839884k cached

  PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND

19411 root  20   0  142g 8.2g 5.0g S  599 34.9 423:25.42 java




<Storage>
  <ClusterName>Test Cluster</ClusterName>
  <AutoBootstrap>false</AutoBootstrap>
  <HintedHandoffEnabled>true</HintedHandoffEnabled>

  <Keyspaces>
    <Keyspace Name="Keyspace1">
      <ColumnFamily Name="PerfTest" CompareWith="LongType"/>
      <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
      <ReplicationFactor>1</ReplicationFactor>
      <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
    </Keyspace>
  </Keyspaces>

  <Authenticator>org.apache.cassandra.auth.AllowAllAuthenticator</Authenticator>

  <Partitioner>org.apache.cassandra.dht.OrderPreservingPartitioner</Partitioner>
  <InitialToken></InitialToken>
  <CommitLogDirectory>/data/commitlog</CommitLogDirectory>
  <DataFileDirectories>
    <DataFileDirectory>/data/data</DataFileDirectory>
  </DataFileDirectories>
  <Seeds>
    <Seed>127.0.0.1</Seed>
  </Seeds>

  <RpcTimeoutInMillis>1</RpcTimeoutInMillis>
  <CommitLogRotationThresholdInMB>1024</CommitLogRotationThresholdInMB>
  <ListenAddress>10.2.60.20</ListenAddress>
  <StoragePort>7000</StoragePort>
  <ThriftAddress>10.2.60.20</ThriftAddress>
  <ThriftPort>9160</ThriftPort>
  <ThriftFramedTransport>false</ThriftFramedTransport>
  <DiskAccessMode>auto</DiskAccessMode>
  <RowWarningThresholdInMB>512</RowWarningThresholdInMB>
  <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>

  <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
  <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
  <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
  <MemtableThroughputInMB>512</MemtableThroughputInMB>
  <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
  <MemtableOperationsInMillions>1.2</MemtableOperationsInMillions>
  

Re: error using get_range_slice with random partitioner

2010-08-09 Thread Thomas Heller
Sure, but it's in my ruby client which currently has close to no
documentation. ;)

Client is here:
http://github.com/thheller/greek_architect

Relevant Row Spec:
http://bit.ly/9uS6Ba

Row-based iteration:
http://bit.ly/cRVSTc #each_slice

Currently uses a hack since I wasn't able to produce cassandra
BigInteger Tokens in Ruby. I'm a math noob and couldn't figure out why
some of the Tokens would differ. I just spawn a Java Process and use
that to generate the Tokens, insanely slow but I don't use that feature
anymore anyway. ;)

CF-Iteration:
http://bit.ly/bNgsRG #each

It's all a little abstracted away I guess, but I hope you can follow the
relevant thrift calls.

HTH,
/thomas


On Mon, Aug 9, 2010 at 3:55 PM, Adam Crain
adam.cr...@greenenergycorp.com wrote:
 Hi Thomas,

 Can you share your client code for the iteration? It would probably help me 
 catch my problem. Anyone know where in the cassandra source the integration 
 tests are for this functionality on the random partitioner?

 Note that I posted a specific example where the iteration failed, and I was 
 not throwing out good keys, only duplicate ones. That means 1 of 2 things:

 1) I'm somehow using the API incorrectly
 2) I am the only one encountering a bug

 My money is on 1) of course.  I can check the thrift API against what my 
 Scala client is calling under the hood.

 -Adam


 -Original Message-
 From: th.hel...@gmail.com on behalf of Thomas Heller
 Sent: Fri 8/6/2010 7:17 PM
 To: user@cassandra.apache.org
 Subject: Re: error using get_range_slice with random partitioner

 On Sat, Aug 7, 2010 at 1:05 AM, Adam Crain
 adam.cr...@greenenergycorp.com wrote:
 I took this approach... reject the first result of subsequent 
 get_range_slice requests. If you look back at output I posted (below) you'll 
 notice that not all of the 30 keys [key1...key30] get listed! The iteration 
 dies and can't proceed past key2.

 1) 1st batch gets 10 unique keys.
 2) 2nd batch only gets 9 unique keys with the 1st being a repeat
 3) 3rd batch only gets 2 unique keys 

 That means the iteration didn't see 9 keys in the CF. Key7 and Key30 are 
 missing for example.


 Remember the returned results are NOT sorted, so whenever you are
 dropping the first result by default, you might be dropping a good one. At
 least that would be my guess here.

 I have iteration implemented in my client and everything is working as
 expected and so far I never had duplicates (running 0.6.3). I'm using
 tokens for range_slices tho, increment/decrement for get_slice only.

 /thomas







Re: Growing commit log directory.

2010-08-09 Thread Benjamin Black
what does the io load look like on those nodes?

On Mon, Aug 9, 2010 at 1:50 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
 I have a 16 node 6.3 cluster and two nodes from my cluster are giving
 me major headaches.

 10.71.71.56   Up         58.19 GB
 10827166220211678382926910108067277    |   ^
 10.71.71.61   Down       67.77 GB
 123739042516704895804863493611552076888    v   |
 10.71.71.66   Up         43.51 GB
 127605887595351923798765477786913079296    |   ^
 10.71.71.59   Down       90.22 GB
 139206422831293007780471430312996086499    v   |
 10.71.71.65   Up         22.97 GB
 148873535527910577765226390751398592512    |   ^

 The symptoms I am seeing are nodes 61 and nodes 59 have huge 6 GB +
 commit log directories. They keep growing, along with memory usage,
 eventually the logs start showing GCInspection errors and then the
 nodes will go OOM

 INFO 14:20:01,296 Creating new commitlog segment
 /var/lib/cassandra/commitlog/CommitLog-1281378001296.log
  INFO 14:20:02,199 GC for ParNew: 327 ms, 57545496 reclaimed leaving
 7955651792 used; max is 9773776896
  INFO 14:20:03,201 GC for ParNew: 443 ms, 45124504 reclaimed leaving
 8137412920 used; max is 9773776896
  INFO 14:20:04,314 GC for ParNew: 438 ms, 54158832 reclaimed leaving
 8310139720 used; max is 9773776896
  INFO 14:20:05,547 GC for ParNew: 409 ms, 56888760 reclaimed leaving
 8480136592 used; max is 9773776896
  INFO 14:20:06,900 GC for ParNew: 441 ms, 58149704 reclaimed leaving
 8648872520 used; max is 9773776896
  INFO 14:20:08,904 GC for ParNew: 462 ms, 59185992 reclaimed leaving
 8816581312 used; max is 9773776896
  INFO 14:20:09,973 GC for ParNew: 460 ms, 57403840 reclaimed leaving
 8986063136 used; max is 9773776896
  INFO 14:20:11,976 GC for ParNew: 447 ms, 59814376 reclaimed leaving
 9153134392 used; max is 9773776896
  INFO 14:20:13,150 GC for ParNew: 441 ms, 61879728 reclaimed leaving
 9318140296 used; max is 9773776896
 java.lang.OutOfMemoryError: Java heap space
 Dumping heap to java_pid10913.hprof ...
  INFO 14:22:30,620 InetAddress /10.71.71.66 is now dead.
  INFO 14:22:30,621 InetAddress /10.71.71.65 is now dead.
  INFO 14:22:30,621 GC for ConcurrentMarkSweep: 44862 ms, 261200
 reclaimed leaving 9334753480 used; max is 9773776896
  INFO 14:22:30,621 InetAddress /10.71.71.64 is now dead.

 Heap dump file created [12730501093 bytes in 253.445 secs]
 ERROR 14:28:08,945 Uncaught exception in thread Thread[Thread-2288,5,main]
 java.lang.OutOfMemoryError: Java heap space
        at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
 ERROR 14:28:08,948 Uncaught exception in thread Thread[Thread-2281,5,main]
 java.lang.OutOfMemoryError: Java heap space
        at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
  INFO 14:28:09,017 GC for ConcurrentMarkSweep: 33737 ms, 85880
 reclaimed leaving 9335215296 used; max is 9773776896

 Does anyone have any ideas what is going on?



Re: COMMIT-LOG_WRITER Assertion Error

2010-08-09 Thread Jonathan Ellis
Sounds like you upgraded to trunk from 0.6 without draining your
commitlog first?

On Mon, Aug 9, 2010 at 3:30 PM, Arya Goudarzi agouda...@gaiaonline.com wrote:
 Just throwing this out there as it could be a concern. I had a cluster of 3 
 nodes running. Over the weekend I updated to trunk (Aug 9th @ 2pm). Today, I 
 came to run my daily tests and my client kept giving me TSocket timeouts. 
 Checking the error log of Cassandra servers, all 3 nodes had this and they 
 all became unresponsive! Not sure how to reproduce this but a restart of all 
 3 nodes fixed the issue:

 ERROR [COMMIT-LOG-WRITER] 2010-08-09 11:30:27,722 CassandraDaemon.java (line 
 82) Uncaught exception in thread Thread[COMMIT-LOG-WRITER,5,main]
 java.lang.AssertionError
        at 
 org.apache.cassandra.db.commitlog.CommitLogHeader$CommitLogHeaderSerializer.serialize(CommitLogHeader.java:157)
        at 
 org.apache.cassandra.db.commitlog.CommitLogHeader.writeCommitLogHeader(CommitLogHeader.java:124)
        at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.writeHeader(CommitLogSegment.java:70)
        at 
 org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:103)
        at 
 org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:521)
        at 
 org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:52)
        at 
 org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.lang.Thread.run(Thread.java:636)


 -Arya




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: Growing commit log directory.

2010-08-09 Thread Jonathan Ellis
what does tpstats or other JMX monitoring of the o.a.c.concurrent stages show?

On Mon, Aug 9, 2010 at 4:50 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
 I have a 16 node 6.3 cluster and two nodes from my cluster are giving
 me major headaches.

 10.71.71.56   Up         58.19 GB
 10827166220211678382926910108067277    |   ^
 10.71.71.61   Down       67.77 GB
 123739042516704895804863493611552076888    v   |
 10.71.71.66   Up         43.51 GB
 127605887595351923798765477786913079296    |   ^
 10.71.71.59   Down       90.22 GB
 139206422831293007780471430312996086499    v   |
 10.71.71.65   Up         22.97 GB
 148873535527910577765226390751398592512    |   ^

 The symptoms I am seeing are nodes 61 and nodes 59 have huge 6 GB +
 commit log directories. They keep growing, along with memory usage,
 eventually the logs start showing GCInspection errors and then the
 nodes will go OOM

 INFO 14:20:01,296 Creating new commitlog segment
 /var/lib/cassandra/commitlog/CommitLog-1281378001296.log
  INFO 14:20:02,199 GC for ParNew: 327 ms, 57545496 reclaimed leaving
 7955651792 used; max is 9773776896
  INFO 14:20:03,201 GC for ParNew: 443 ms, 45124504 reclaimed leaving
 8137412920 used; max is 9773776896
  INFO 14:20:04,314 GC for ParNew: 438 ms, 54158832 reclaimed leaving
 8310139720 used; max is 9773776896
  INFO 14:20:05,547 GC for ParNew: 409 ms, 56888760 reclaimed leaving
 8480136592 used; max is 9773776896
  INFO 14:20:06,900 GC for ParNew: 441 ms, 58149704 reclaimed leaving
 8648872520 used; max is 9773776896
  INFO 14:20:08,904 GC for ParNew: 462 ms, 59185992 reclaimed leaving
 8816581312 used; max is 9773776896
  INFO 14:20:09,973 GC for ParNew: 460 ms, 57403840 reclaimed leaving
 8986063136 used; max is 9773776896
  INFO 14:20:11,976 GC for ParNew: 447 ms, 59814376 reclaimed leaving
 9153134392 used; max is 9773776896
  INFO 14:20:13,150 GC for ParNew: 441 ms, 61879728 reclaimed leaving
 9318140296 used; max is 9773776896
 java.lang.OutOfMemoryError: Java heap space
 Dumping heap to java_pid10913.hprof ...
  INFO 14:22:30,620 InetAddress /10.71.71.66 is now dead.
  INFO 14:22:30,621 InetAddress /10.71.71.65 is now dead.
  INFO 14:22:30,621 GC for ConcurrentMarkSweep: 44862 ms, 261200
 reclaimed leaving 9334753480 used; max is 9773776896
  INFO 14:22:30,621 InetAddress /10.71.71.64 is now dead.

 Heap dump file created [12730501093 bytes in 253.445 secs]
 ERROR 14:28:08,945 Uncaught exception in thread Thread[Thread-2288,5,main]
 java.lang.OutOfMemoryError: Java heap space
        at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
 ERROR 14:28:08,948 Uncaught exception in thread Thread[Thread-2281,5,main]
 java.lang.OutOfMemoryError: Java heap space
        at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
  INFO 14:28:09,017 GC for ConcurrentMarkSweep: 33737 ms, 85880
 reclaimed leaving 9335215296 used; max is 9773776896

 Does anyone have any ideas what is going on?




-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


Re: COMMIT-LOG_WRITER Assertion Error

2010-08-09 Thread Arya Goudarzi
I've never run 0.6. I have been running off trunk with an automatic svn update and 
build every day at 2pm. One of my nodes got this error, which led to the same 
last error prior to today's build and restart. Hope this helps better:

java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
java.lang.RuntimeException: java.lang.RuntimeException: 
java.util.concurrent.ExecutionException: java.lang.AssertionError
at 
org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:549)
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:339)
at 
org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:174)
at 
org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:120)
at 
org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:90)
at 
org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
java.lang.AssertionError
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
at java.util.concurrent.FutureTask.get(FutureTask.java:111)
at 
org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:545)
... 5 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: 
java.util.concurrent.ExecutionException: java.lang.AssertionError
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:636)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
java.lang.AssertionError
at 
org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegments(CommitLog.java:408)
at 
org.apache.cassandra.db.ColumnFamilyStore$2.runMayThrow(ColumnFamilyStore.java:445)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 6 more
Caused by: java.util.concurrent.ExecutionException: java.lang.AssertionError
at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
at java.util.concurrent.FutureTask.get(FutureTask.java:111)
at 
org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegments(CommitLog.java:400)
... 8 more
Caused by: java.lang.AssertionError
at 
org.apache.cassandra.db.commitlog.CommitLogHeader$CommitLogHeaderSerializer.serialize(CommitLogHeader.java:157)
at 
org.apache.cassandra.db.commitlog.CommitLogHeader.writeCommitLogHeader(CommitLogHeader.java:124)
at 
org.apache.cassandra.db.commitlog.CommitLogSegment.writeHeader(CommitLogSegment.java:70)
at 
org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegmentsInternal(CommitLog.java:450)
at 
org.apache.cassandra.db.commitlog.CommitLog.access$300(CommitLog.java:75)
at 
org.apache.cassandra.db.commitlog.CommitLog$6.call(CommitLog.java:394)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:52)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
... 1 more 

- Original Message -
From: Jonathan Ellis jbel...@gmail.com
To: user@cassandra.apache.org
Sent: Monday, August 9, 2010 5:18:35 PM
Subject: Re: COMMIT-LOG_WRITER Assertion Error

Sounds like you upgraded to trunk from 0.6 without draining your
commitlog first?

On Mon, Aug 9, 2010 at 3:30 PM, Arya Goudarzi agouda...@gaiaonline.com wrote:
 Just throwing this out there as it could be a concern. I had a cluster of 3 
 nodes running. Over the weekend I updated to trunk (Aug 9th @ 2pm). Today, I 
 came to run my daily tests and my client kept giving me TSocket timeouts. 
 Checking the error log of Cassandra servers, all 3 nodes had this and they 
 all became unresponsive! Not sure how to reproduce this but a restart of all 
 3 nodes fixed the issue:

 ERROR [COMMIT-LOG-WRITER] 2010-08-09 11:30:27,722 CassandraDaemon.java (line 
 82) Uncaught exception in thread Thread[COMMIT-LOG-WRITER,5,main]
 java.lang.AssertionError
        at 
 org.apache.cassandra.db.commitlog.CommitLogHeader$CommitLogHeaderSerializer.serialize(CommitLogHeader.java:157)
        at 
 

Re: Growing commit log directory.

2010-08-09 Thread Edward Capriolo
On Mon, Aug 9, 2010 at 8:20 PM, Jonathan Ellis jbel...@gmail.com wrote:
 what does tpstats or other JMX monitoring of the o.a.c.concurrent stages show?

 On Mon, Aug 9, 2010 at 4:50 PM, Edward Capriolo edlinuxg...@gmail.com wrote:
 I have a 16 node 6.3 cluster and two nodes from my cluster are giving
 me major headaches.

 10.71.71.56   Up         58.19 GB
 10827166220211678382926910108067277    |   ^
 10.71.71.61   Down       67.77 GB
 123739042516704895804863493611552076888    v   |
 10.71.71.66   Up         43.51 GB
 127605887595351923798765477786913079296    |   ^
 10.71.71.59   Down       90.22 GB
 139206422831293007780471430312996086499    v   |
 10.71.71.65   Up         22.97 GB
 148873535527910577765226390751398592512    |   ^

 The symptoms I am seeing are nodes 61 and nodes 59 have huge 6 GB +
 commit log directories. They keep growing, along with memory usage,
 eventually the logs start showing GCInspection errors and then the
 nodes will go OOM

 INFO 14:20:01,296 Creating new commitlog segment
 /var/lib/cassandra/commitlog/CommitLog-1281378001296.log
  INFO 14:20:02,199 GC for ParNew: 327 ms, 57545496 reclaimed leaving
 7955651792 used; max is 9773776896
  INFO 14:20:03,201 GC for ParNew: 443 ms, 45124504 reclaimed leaving
 8137412920 used; max is 9773776896
  INFO 14:20:04,314 GC for ParNew: 438 ms, 54158832 reclaimed leaving
 8310139720 used; max is 9773776896
  INFO 14:20:05,547 GC for ParNew: 409 ms, 56888760 reclaimed leaving
 8480136592 used; max is 9773776896
  INFO 14:20:06,900 GC for ParNew: 441 ms, 58149704 reclaimed leaving
 8648872520 used; max is 9773776896
  INFO 14:20:08,904 GC for ParNew: 462 ms, 59185992 reclaimed leaving
 8816581312 used; max is 9773776896
  INFO 14:20:09,973 GC for ParNew: 460 ms, 57403840 reclaimed leaving
 8986063136 used; max is 9773776896
  INFO 14:20:11,976 GC for ParNew: 447 ms, 59814376 reclaimed leaving
 9153134392 used; max is 9773776896
  INFO 14:20:13,150 GC for ParNew: 441 ms, 61879728 reclaimed leaving
 9318140296 used; max is 9773776896
 java.lang.OutOfMemoryError: Java heap space
 Dumping heap to java_pid10913.hprof ...
  INFO 14:22:30,620 InetAddress /10.71.71.66 is now dead.
  INFO 14:22:30,621 InetAddress /10.71.71.65 is now dead.
  INFO 14:22:30,621 GC for ConcurrentMarkSweep: 44862 ms, 261200
 reclaimed leaving 9334753480 used; max is 9773776896
  INFO 14:22:30,621 InetAddress /10.71.71.64 is now dead.

 Heap dump file created [12730501093 bytes in 253.445 secs]
 ERROR 14:28:08,945 Uncaught exception in thread Thread[Thread-2288,5,main]
 java.lang.OutOfMemoryError: Java heap space
        at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
 ERROR 14:28:08,948 Uncaught exception in thread Thread[Thread-2281,5,main]
 java.lang.OutOfMemoryError: Java heap space
        at 
 org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
  INFO 14:28:09,017 GC for ConcurrentMarkSweep: 33737 ms, 85880
 reclaimed leaving 9335215296 used; max is 9773776896

 Does anyone have any ideas what is going on?




 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com


Hey guys, thanks for the help. I had lowered my Xmx from 12GB to 10GB
because I saw:

[r...@cdbsd09 ~]# /usr/local/cassandra/bin/nodetool --host 10.71.71.59
--port 8585 info
123739042516704895804863493611552076888
Load : 68.91 GB
Generation No: 1281407425
Uptime (seconds) : 1459
Heap Memory (MB) : 6476.70 / 12261.00

This was happening:
[r...@cdbsd11 ~]# /usr/local/cassandra/bin/nodetool --host
cdbsd09.hadoop.pvt --port 8585 tpstats
Pool Name                 Active   Pending  Completed
STREAM-STAGE  0 0  0
RESPONSE-STAGE0 0  16478
ROW-READ-STAGE   64  4014  18190
LB-OPERATIONS 0 0  0
MESSAGE-DESERIALIZER-POOL 0 0  60290
GMFD  0 0385
LB-TARGET 0 0  0
CONSISTENCY-MANAGER   0 0   7526
ROW-MUTATION-STAGE   64   908 182612
MESSAGE-STREAMING-POOL0 0  0
LOAD-BALANCER-STAGE   0 0  0
FLUSH-SORTER-POOL 0 0  0
MEMTABLE-POST-FLUSHER 0 0  8
FLUSH-WRITER-POOL 0 0  8
AE-SERVICE-STAGE  0 0  0
HINTED-HANDOFF-POOL   1 9  6

After raising the level I realized I was maxing out the heap. The
other nodes are running fine with Xmx 9GB, but I guess these nodes can
not.

Thanks again.
Edward


explanation of generated files and ops

2010-08-09 Thread S Ahmed
In /var/lib/cassandra there is:

/data/system
LocationInfo-4-Data.db
LocationInfo-4-Filter.db
LocationInfo-4-Index.db
..
..

/data/Keyspace1/
Standard2-2-Data.db
Standard2-2-Filter.db
Standard2-2-Index.db

/commitlog
CommitLog-timestamp.log

/var/log/cassandra
system.log



Is this pretty much all the files that Cassandra generates? (have I missed
any)

Are there any common administrative tasks that admins might need to perform
on these files at all?

What exactly is stored in the -Filter.db files?


Re: How to migrate any relational database to Cassandra

2010-08-09 Thread Peng Guo
Maybe you could integrate with Hadoop.

On Mon, Aug 9, 2010 at 1:15 PM, sonia gehlot sonia.geh...@gmail.com wrote:

 Hi Guys,

 Thanks for sharing your experiences and valuable links these are really
 helpful.

 But I want to do ETL and then load the data into Cassandra. I have around
 10-15 various source systems; presently daily ETL jobs run and load data into our
 database, which is Netezza. How can I do this in Cassandra, i.e. what if my
 target database and sources are the same (MySQL, Oracle, Netezza, etc.)?

 -Sonia

 On Sat, Aug 7, 2010 at 7:46 PM, Zhong Li z...@voxeo.com wrote:

 Yes, I use OrderPreservingPartitioner; the token considers
 datacenter+ip+function+timestamp+recordId+...


 On Aug 7, 2010, at 10:36 PM, Jonathan Ellis wrote:

  are you using OrderPreservingPartitioner then?

 On Sat, Aug 7, 2010 at 10:32 PM, Zhong Li z...@voxeo.com wrote:

 Here are just my personal experiences.

 I recently used Cassandra to implement a system across 5 datacenters,
 because it is impossible to do it in a SQL database at low cost; Cassandra helps.

 Cassandra is all about indexing; there is no relationship naturally, you
 have to use indexing to keep all relationships. This is fine, because
 you can add a new index when you want.

 The big pain is the token. You can choose only one token for a node, and all
 systems have to adopt the same rule to create indexes. It is a huge, huge pain.

 If Cassandra could implement tokens at the CF level, it would be much more natural
 and easy for us to implement a system.

 Best,

 Zhong


 On Aug 6, 2010, at 9:23 PM, Peter Harrison wrote:

  On Sat, Aug 7, 2010 at 6:00 AM, sonia gehlot sonia.geh...@gmail.com
 wrote:

  Can you please help me how to move forward? How should I do all the
 setup
 for this?


 My view is that Cassandra is fundamentally different from SQL
 databases.
 There
 may be artefact's which are superficially similar between the two
 systems,
 but
 I guess I'm thinking of a move to Cassandra like my move from dBase to
 Delphi;
 in other words there were concepts which modified how you write
 applications.

 Now, you can do something similar to a SQL database, but I don't think
 you
 would
 be leveraging the features of Cassandra. That said, I think there will
 be
 a new
 generation of abstraction tools that will make modeling easier.

 A perhaps more practical answer: there is no one to one mapping between
 SQL
 and Cassandra.






 --
 Jonathan Ellis
 Project Chair, Apache Cassandra
 co-founder of Riptano, the source for professional Cassandra support
 http://riptano.com






-- 
Regards
Peng Guo