RE: error using get_range_slice with random partitioner
Hi Thomas, Can you share your client code for the iteration? It would probably help me catch my problem. Anyone know where in the cassandra source the integration tests are for this functionality on the random partitioner? Note that I posted a specific example where the iteration failed and I was not throwing out good keys, only duplicate ones. That means 1 of 2 things: 1) I'm somehow using the API incorrectly 2) I am the only one encountering a bug. My money is on 1) of course. I can check the thrift API against what my Scala client is calling under the hood. -Adam

-----Original Message-----
From: th.hel...@gmail.com on behalf of Thomas Heller
Sent: Fri 8/6/2010 7:17 PM
To: user@cassandra.apache.org
Subject: Re: error using get_range_slice with random partitioner

On Sat, Aug 7, 2010 at 1:05 AM, Adam Crain adam.cr...@greenenergycorp.com wrote: I took this approach... reject the first result of subsequent get_range_slice requests. If you look back at output I posted (below) you'll notice that not all of the 30 keys [key1...key30] get listed! The iteration dies and can't proceed past key2. 1) 1st batch gets 10 unique keys. 2) 2nd batch only gets 9 unique keys, with the 1st being a repeat. 3) 3rd batch only gets 2 unique keys. That means the iteration didn't see 9 keys in the CF. Key7 and Key30 are missing, for example.

Remember the returned results are NOT sorted, so whenever you are dropping the first by default, you might be dropping a good one. At least that would be my guess here. I have iteration implemented in my client and everything is working as expected and so far I never had duplicates (running 0.6.3). I'm using tokens for range_slices tho, increment/decrement for get_slice only.

/thomas
Re: TokenRange contains endpoints without any port information?
On Sun, Aug 8, 2010 at 07:21, Carsten Krebs carsten.kr...@gmx.net wrote: I'm wondering why a TokenRange returned by describe_ring(keyspace) of the thrift API just returns endpoints consisting only of an address but omits any port information? My first thought was, this method could be used to expose some information about the ring structure to the client, i.e. to do some client side load balancing. But now, I'm not sure about this anymore. Additionally, when looking into the code, I guess the address returned as part of the TokenRange is the address of the storage service, which could differ from the thrift address, which in turn would make the returned endpoint useless for the client. What is the purpose of this method?

To give a picture of the ring topology.

or respectively why is the port information omitted?

You already knew the thrift port to make the query connection. The only other port you *might* need to be concerned with is the storage port, which is assumed to be constant across the cluster. But really, from a client perspective it does you no good to know this port, so why bother exposing it?

Gary.

TIA, Carsten
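Gary's point can be made concrete: since the client already knows the thrift port it connected on, it only needs the bare addresses from describe_ring. A minimal sketch (the `endpoints_to_hosts` helper and the literal ring data are hypothetical, not part of the Thrift API):

```python
# Hypothetical sketch: describe_ring returns TokenRange entries whose
# endpoints are bare addresses. A client that wants to use them for
# connection pooling can pair each address with the thrift port it is
# already configured with (assumed constant across the cluster).

THRIFT_PORT = 9160  # the port the client already connects on

def endpoints_to_hosts(token_ranges, port=THRIFT_PORT):
    """Flatten TokenRange endpoints into unique (host, port) pairs."""
    hosts = []
    for tr in token_ranges:
        for addr in tr["endpoints"]:
            hp = (addr, port)
            if hp not in hosts:
                hosts.append(hp)
    return hosts

# Stand-in for the result of describe_ring("Keyspace1"):
ring = [
    {"start_token": "0", "end_token": "42", "endpoints": ["10.0.0.1", "10.0.0.2"]},
    {"start_token": "42", "end_token": "0", "endpoints": ["10.0.0.2", "10.0.0.1"]},
]
print(endpoints_to_hosts(ring))
# [('10.0.0.1', 9160), ('10.0.0.2', 9160)]
```

The storage port is deliberately ignored: a client never talks to it.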
Question on nodetool ring
I'm running a 2 node cluster and when I run nodetool ring I get the following output:

Address       Status   State    Load       Token
                                           160032583171087979418578389981025646900
127.0.0.1     Up       Normal   42.28 MB   42909338385373526599163667549814010691
127.0.0.2     Up       Normal   42.26 MB   160032583171087979418578389981025646900

The columns/values are pretty much self explanatory except for the first line. What is this value? Thanks
Re: Question on load balancing in a cluster
Cool thanks, I think I will experiment with nodetool move. Can somebody confirm the reason for decommissioning, instead of just splitting the token on the fly? Yes it does seem simpler to just decommission and bootstrap, but that does mean a lot of data has to be moved around to get a reasonable load distribution. Load distribution is a bigger need when a new node is introduced and so load needs to be balanced. This also means that when a node is decommissioned the load on its immediate neighbor increases. In this example where A,B,C,E is a cluster with load being 80, 78, 83, 84. Now I add a new node D (position will be before E), so eventually after all the rebalance activity I want the load to be ~65 (325/5). Now is that unreasonable to expect, because if it is not then each node will have to be decommissioned and bootstrapped to get that distribution (almost the entire dataset will need to be moved), now that is a lot of data movement!! unless I have got this wrong?
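For the nodetool move experiment above, the usual starting point is evenly spaced tokens. A small sketch (assumes RandomPartitioner's 2^127 token space; the helper name is mine):

```python
# Sketch: evenly spaced RandomPartitioner tokens for an N-node cluster,
# the usual input to `nodetool move`. With balanced tokens each node
# owns 1/N of the [0, 2**127) ring, so after moving existing nodes
# (rather than decommissioning them) load should converge toward
# total/N once old replicas are cleaned up.

RING_SIZE = 2 ** 127  # RandomPartitioner token space

def balanced_tokens(n):
    return [i * RING_SIZE // n for i in range(n)]

for i, t in enumerate(balanced_tokens(5)):
    print("node %d: token %d" % (i, t))
```

Moving nodes to these positions still streams data, but only the slices that change ownership, not the whole dataset.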
Re: batch_mutate atomicity
I am using the familiar meanings from ACID: atomic means either the entire update will succeed or none of it. isolated means other threads will not see partial updates while it is being applied.

A related concern is whether there is a write *ordering* guarantee for mutations within a row key. Ensuring consistency in the face of concurrent access can in some (probably several) cases become a lot easier with an ordering guarantee which would otherwise necessitate an RPC call between potentially every mutation (depending on where dependencies between writes are). Upon cursory inspection it *seems* to me that ordering is maintained, but even if this is correct, can one consider Cassandra to have such an ordering guarantee or is any such behavior an artifact of current implementation details? By ordering guarantee I am only thinking of single column reads or distinct get_slice() calls; I am not expecting ordering guarantees w.r.t. visibility within a single get_slice(). (Additionally I am assuming QUORUM or RF=1, else it would not be useful to rely on anyway.)

-- / Peter Schuller
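One way to reason about the ordering question: per-column conflict resolution in Cassandra is driven by the client-supplied timestamp, so any ordering between writes to the same column ultimately reduces to the timestamps the client assigns, not RPC arrival order. An illustrative sketch of last-write-wins reconciliation (my own toy model, not Cassandra source):

```python
# Illustrative sketch (not Cassandra internals): per-column
# reconciliation is last-write-wins on the client-supplied timestamp.

def reconcile(a, b):
    """Return the winning (value, timestamp) pair for one column."""
    return a if a[1] >= b[1] else b

col_v1 = ("old", 100)
col_v2 = ("new", 200)
print(reconcile(col_v1, col_v2))  # ('new', 200)
# Even if the older write arrives later, it still loses:
print(reconcile(col_v2, col_v1))  # ('new', 200)
```

Under this model a client that wants deterministic ordering between its own dependent writes only needs monotonically increasing timestamps, which is cheaper than an RPC round trip between mutations.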
COMMIT-LOG-WRITER Assertion Error
Just throwing this out there as it could be a concern. I had a cluster of 3 nodes running. Over the weekend I updated to trunk (Aug 9th @ 2pm). Today, I came to run my daily tests and my client kept giving me TSocket timeouts. Checking the error log of the Cassandra servers, all 3 nodes had this and they all became unresponsive! Not sure how to reproduce this but a restart of all 3 nodes fixed the issue:

ERROR [COMMIT-LOG-WRITER] 2010-08-09 11:30:27,722 CassandraDaemon.java (line 82) Uncaught exception in thread Thread[COMMIT-LOG-WRITER,5,main]
java.lang.AssertionError
        at org.apache.cassandra.db.commitlog.CommitLogHeader$CommitLogHeaderSerializer.serialize(CommitLogHeader.java:157)
        at org.apache.cassandra.db.commitlog.CommitLogHeader.writeCommitLogHeader(CommitLogHeader.java:124)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.writeHeader(CommitLogSegment.java:70)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:103)
        at org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:521)
        at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:52)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.lang.Thread.run(Thread.java:636)

-Arya
Re: Question on nodetool ring
that's the token range so node#1 is from 1600.. to 429.. node#2 is from 429... to 1600... hopefully others can chime in to confirm.

On Mon, Aug 9, 2010 at 12:30 PM, Mark static.void@gmail.com wrote: I'm running a 2 node cluster and when I run nodetool ring I get the following output:

Address       Status   State    Load       Token
                                           160032583171087979418578389981025646900
127.0.0.1     Up       Normal   42.28 MB   42909338385373526599163667549814010691
127.0.0.2     Up       Normal   42.26 MB   160032583171087979418578389981025646900

The columns/values are pretty much self explanatory except for the first line. What is this value? Thanks
Re: Question on nodetool ring
On 8/9/10 12:51 PM, S Ahmed wrote: that's the token range so node#1 is from 1600.. to 429.. node#2 is from 429... to 1600... hopefully others can chime into confirm. On Mon, Aug 9, 2010 at 12:30 PM, Mark static.void@gmail.com wrote: I'm running a 2 node cluster and when I run nodetool ring I get the following output Address Status State LoadToken 160032583171087979418578389981025646900 127.0.0.1 Up Normal 42.28 MB 42909338385373526599163667549814010691 127.0.0.2 Up Normal 42.26 MB 160032583171087979418578389981025646900 The columns/values are pretty much self explanatory except for the first line. What is this value? Thanks I was just wondering why the 160032583171087979418578389981025646900 token is on a line by itself and listed under 127.0.0.2.
Growing commit log directory.
I have a 16 node 0.6.3 cluster and two nodes from my cluster are giving me major headaches.

10.71.71.56 Up   58.19 GB 10827166220211678382926910108067277         | ^
10.71.71.61 Down 67.77 GB 123739042516704895804863493611552076888     v |
10.71.71.66 Up   43.51 GB 127605887595351923798765477786913079296     | ^
10.71.71.59 Down 90.22 GB 139206422831293007780471430312996086499     v |
10.71.71.65 Up   22.97 GB 148873535527910577765226390751398592512     | ^

The symptoms I am seeing are nodes 61 and 59 have huge 6 GB+ commit log directories. They keep growing, along with memory usage; eventually the logs start showing GCInspection errors and then the nodes go OOM:

INFO 14:20:01,296 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1281378001296.log
INFO 14:20:02,199 GC for ParNew: 327 ms, 57545496 reclaimed leaving 7955651792 used; max is 9773776896
INFO 14:20:03,201 GC for ParNew: 443 ms, 45124504 reclaimed leaving 8137412920 used; max is 9773776896
INFO 14:20:04,314 GC for ParNew: 438 ms, 54158832 reclaimed leaving 8310139720 used; max is 9773776896
INFO 14:20:05,547 GC for ParNew: 409 ms, 56888760 reclaimed leaving 8480136592 used; max is 9773776896
INFO 14:20:06,900 GC for ParNew: 441 ms, 58149704 reclaimed leaving 8648872520 used; max is 9773776896
INFO 14:20:08,904 GC for ParNew: 462 ms, 59185992 reclaimed leaving 8816581312 used; max is 9773776896
INFO 14:20:09,973 GC for ParNew: 460 ms, 57403840 reclaimed leaving 8986063136 used; max is 9773776896
INFO 14:20:11,976 GC for ParNew: 447 ms, 59814376 reclaimed leaving 9153134392 used; max is 9773776896
INFO 14:20:13,150 GC for ParNew: 441 ms, 61879728 reclaimed leaving 9318140296 used; max is 9773776896
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid10913.hprof ...
INFO 14:22:30,620 InetAddress /10.71.71.66 is now dead.
INFO 14:22:30,621 InetAddress /10.71.71.65 is now dead.
INFO 14:22:30,621 GC for ConcurrentMarkSweep: 44862 ms, 261200 reclaimed leaving 9334753480 used; max is 9773776896
INFO 14:22:30,621 InetAddress /10.71.71.64 is now dead.
Heap dump file created [12730501093 bytes in 253.445 secs]
ERROR 14:28:08,945 Uncaught exception in thread Thread[Thread-2288,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
ERROR 14:28:08,948 Uncaught exception in thread Thread[Thread-2281,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
INFO 14:28:09,017 GC for ConcurrentMarkSweep: 33737 ms, 85880 reclaimed leaving 9335215296 used; max is 9773776896

Does anyone have any ideas what is going on?
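Background on why a commit log directory grows: a segment can only be deleted once every column family with unflushed data in it has been flushed, so if memtable flushes fall behind the write rate, segments accumulate. A toy model of that bookkeeping (my own simplification, not Cassandra code):

```python
# Toy model (an assumption-laden simplification, not Cassandra source)
# of commit log segment retention: each segment tracks which column
# families have dirty (unflushed) data in it. A segment can be dropped
# only when no CF is dirty in it; flushes falling behind => growth.

class CommitLog:
    def __init__(self):
        self.segments = []  # each segment: set of CFs with dirty data

    def write(self, cf):
        if not self.segments:
            self.segments.append(set())
        self.segments[-1].add(cf)

    def new_segment(self):
        self.segments.append(set())

    def flush(self, cf):
        # Flushing a CF marks it clean in every segment; fully clean
        # segments (other than the active one) can be discarded.
        for seg in self.segments:
            seg.discard(cf)
        self.segments = [s for s in self.segments[:-1] if s] + self.segments[-1:]

log = CommitLog()
log.write("Standard1"); log.new_segment()
log.write("Standard1"); log.new_segment()
log.write("Standard2")
print(len(log.segments))  # 3 segments retained while CFs are dirty
log.flush("Standard1")
print(len(log.segments))  # 1: old segments dropped once their CFs flushed
```

In this model a single CF that never manages to flush pins every segment it has touched, which matches the "directory keeps growing along with memory usage" symptom.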
Re: TokenRange contains endpoints without any port information?
On 08.08.2010, at 14:47 aaron morton wrote:

What sort of client side load balancing were you thinking of? I just use round robin DNS to distribute clients around the cluster, and have them recycle their connections every so often.

I was thinking about using this method to give the client the ability to learn what nodes are part of the cluster, using this information to automatically adapt the set of nodes used by the client if a new node is added to, or respectively removed from, the cluster. Why do you prefer round robin DNS for load balancing? One advantage I see is that the client does not have to take care of the node set, and especially the management of the node set. The reason why I was thinking about client side load balancing was to avoid the need to write additional tools to monitor all nodes in the cluster and change the DNS entry if any node fails - and this as fast as possible, to prevent the clients from trying to use a dead node. But by the time of writing this, I don't think anymore that this is a good point. This is just a matter of some sort of retry logic, which is needed anyway in the client.

Carsten
Re: Question on nodetool ring
b/c node#1 has a start and end range, so you can see the boundaries for each node by looking at the last column. On Mon, Aug 9, 2010 at 4:12 PM, Mark static.void@gmail.com wrote: On 8/9/10 12:51 PM, S Ahmed wrote: that's the token range so node#1 is from 1600.. to 429.. node#2 is from 429... to 1600... hopefully others can chime into confirm. On Mon, Aug 9, 2010 at 12:30 PM, Mark static.void@gmail.com wrote: I'm running a 2 node cluster and when I run nodetool ring I get the following output Address Status State LoadToken 160032583171087979418578389981025646900 127.0.0.1 Up Normal 42.28 MB 42909338385373526599163667549814010691 127.0.0.2 Up Normal 42.26 MB 160032583171087979418578389981025646900 The columns/values are pretty much self explanatory except for the first line. What is this value? Thanks I was just wondering why the 160032583171087979418578389981025646900 token is on a line by itself and listed under 127.0.0.2.
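To make the wrap-around explicit: a key belongs to the first node whose token is greater than or equal to the key's token, wrapping past the highest token back to the lowest. The token printed on a line by itself in nodetool ring is the highest token, shown first so each row reads "previous token (exclusive) .. this token (inclusive)". A sketch (assumes md5-derived tokens in the style of RandomPartitioner; the `owner` helper is mine):

```python
import hashlib

# Sketch of ring ownership (assumed md5-style tokens, as in
# RandomPartitioner): a key is owned by the first node token >= the
# key's token, wrapping around the ring past the highest token.

def key_token(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % (2 ** 127)

def owner(tokens_to_nodes, key):
    t = key_token(key)
    for node_token in sorted(tokens_to_nodes):
        if t <= node_token:
            return tokens_to_nodes[node_token]
    # Wrapped past the highest token: the lowest-token node owns it.
    return tokens_to_nodes[min(tokens_to_nodes)]

# The two-node ring from the nodetool output above:
ring = {
    42909338385373526599163667549814010691: "127.0.0.1",
    160032583171087979418578389981025646900: "127.0.0.2",
}
print(owner(ring, "somekey"))
```

Keys whose tokens fall above 1600... wrap around and land on 127.0.0.1, which is the range the lone first line is labeling.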
Re: Growing commit log directory.
if your commit logs are not getting cleared, doesn't that indicate your load is more than your servers can handle?

On Mon, Aug 9, 2010 at 4:50 PM, Edward Capriolo edlinuxg...@gmail.com wrote: I have a 16 node 0.6.3 cluster and two nodes from my cluster are giving me major headaches.

10.71.71.56 Up   58.19 GB 10827166220211678382926910108067277         | ^
10.71.71.61 Down 67.77 GB 123739042516704895804863493611552076888     v |
10.71.71.66 Up   43.51 GB 127605887595351923798765477786913079296     | ^
10.71.71.59 Down 90.22 GB 139206422831293007780471430312996086499     v |
10.71.71.65 Up   22.97 GB 148873535527910577765226390751398592512     | ^

The symptoms I am seeing are nodes 61 and nodes 59 have huge 6 GB + commit log directories. They keep growing, along with memory usage, eventually the logs start showing GCInspection errors and then the nodes will go OOM

INFO 14:20:01,296 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1281378001296.log
INFO 14:20:02,199 GC for ParNew: 327 ms, 57545496 reclaimed leaving 7955651792 used; max is 9773776896
INFO 14:20:03,201 GC for ParNew: 443 ms, 45124504 reclaimed leaving 8137412920 used; max is 9773776896
INFO 14:20:04,314 GC for ParNew: 438 ms, 54158832 reclaimed leaving 8310139720 used; max is 9773776896
INFO 14:20:05,547 GC for ParNew: 409 ms, 56888760 reclaimed leaving 8480136592 used; max is 9773776896
INFO 14:20:06,900 GC for ParNew: 441 ms, 58149704 reclaimed leaving 8648872520 used; max is 9773776896
INFO 14:20:08,904 GC for ParNew: 462 ms, 59185992 reclaimed leaving 8816581312 used; max is 9773776896
INFO 14:20:09,973 GC for ParNew: 460 ms, 57403840 reclaimed leaving 8986063136 used; max is 9773776896
INFO 14:20:11,976 GC for ParNew: 447 ms, 59814376 reclaimed leaving 9153134392 used; max is 9773776896
INFO 14:20:13,150 GC for ParNew: 441 ms, 61879728 reclaimed leaving 9318140296 used; max is 9773776896
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid10913.hprof ...
INFO 14:22:30,620 InetAddress /10.71.71.66 is now dead.
INFO 14:22:30,621 InetAddress /10.71.71.65 is now dead.
INFO 14:22:30,621 GC for ConcurrentMarkSweep: 44862 ms, 261200 reclaimed leaving 9334753480 used; max is 9773776896
INFO 14:22:30,621 InetAddress /10.71.71.64 is now dead.
Heap dump file created [12730501093 bytes in 253.445 secs]
ERROR 14:28:08,945 Uncaught exception in thread Thread[Thread-2288,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
ERROR 14:28:08,948 Uncaught exception in thread Thread[Thread-2281,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
INFO 14:28:09,017 GC for ConcurrentMarkSweep: 33737 ms, 85880 reclaimed leaving 9335215296 used; max is 9773776896

Does anyone have any ideas what is going on?
Re: row cache during bootstrap
On Sun, Aug 8, 2010 at 5:24 AM, aaron morton aa...@thelastpickle.com wrote: Not sure how feasible it is or if it's planned. But it would probably require that the nodes are able to share the state of their row cache so as to know which parts to warm. Otherwise it sounds like you're assuming the node can hold the entire data set in memory.

I'm not assuming the node can hold the entire data set in cassandra in memory, if that's what you meant. I was thinking of sharing the state of the row cache, but only those keys that are being moved for the token. The other keys can stay hidden to the node.

If you know in your application when you would like data to be in the cache, you can send a query like get_range_slices to the cluster and ask for 0 columns. That will warm the row cache for the keys it hits.

This is a tough one as our row cache is over 20 million and takes a while to get a large hit ratio, so while we try to preload it, it is taking requests. If it were possible to bring up a node that doesn't announce its availability to the cluster, that would help us manually warm the cache. I know this feature is in the issue tracker currently, but it didn't look like it would come out anytime before 0.8.

I have heard it mentioned that the coordinator node will take action when one node is considered to be running slow. So it may be able to work around the new node until it gets warmed up.

That is interesting, I haven't heard that one. I think with the parallel reads that are happening it makes sense that it would be possible. That is unless the data is local. I believe in that case it always prefers to read local vs over the network, so if the local machine is the slow node that wouldn't help.

Are you adding nodes often?

Currently not that often. The main issue is we have very stringent latency requirements and anything that would affect those we have to understand the worst case cost to see if we can avoid them.
Aaron On 7 Aug 2010, at 11:17, Artie Copeland wrote: the way i understand how row caches work is that each node has an independent cache, in that they do not push there cache contents with other nodes. if that the case is it also true that when a new node is added to the cluster it has to build up its own cache. if thats the case i see that as a possible performance bottle neck once the node starts to accept requests. since there is no way i know of to warm the cache without adding the node to the cluster. would it be infeasible to have part of the bootstrap process not only stream data from nodes but also cached rows that are associated with those same keys? that would allow the new nodes to be able to provide the best performance once the bootstrap process finishes. -- http://yeslinux.org http://yestech.org -- http://yeslinux.org http://yestech.org
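Aaron's zero-column warming tip works because a row cache is keyed by row, so any read of a row populates it, even one that returns no columns. A toy illustration (my own model, not Cassandra internals):

```python
# Toy illustration of zero-column cache warming (an assumption-laden
# model, not Cassandra internals): a row cache keyed by row key is
# populated on any read of the row, even one asking for 0 columns, so
# a cheap get_range_slices-style scan with count=0 warms it.

class RowCacheStore:
    def __init__(self, rows):
        self.rows = rows          # backing "sstable" data
        self.cache = {}           # row cache
        self.disk_reads = 0

    def get_slice(self, key, count):
        if key not in self.cache:
            self.disk_reads += 1  # cache miss hits disk
            self.cache[key] = self.rows.get(key, {})
        row = self.cache[key]
        return dict(list(row.items())[:count])

store = RowCacheStore({"k1": {"c": "v"}, "k2": {"c": "v"}})
# Warm the cache with zero-column reads:
for key in ("k1", "k2"):
    store.get_slice(key, 0)
print(store.disk_reads)  # 2: the warming pass paid the disk cost
# Real reads now hit the cache:
store.get_slice("k1", 1)
print(store.disk_reads)  # still 2
```

The catch Artie raises still applies: with 20M+ rows the warming pass itself takes a long time, and the node is serving traffic while it runs.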
Re: backport of pre cache load
No we aren't caching 100%, we cache over 20 - 30 million which only starts to get a high hit rate over time, so to have a useful cache can take over a week of running. We would love to store the complete CF in memory but don't know of a server that can hold that much data in memory while still being commodity. Our data set is currently over 100GB.

On Fri, Aug 6, 2010 at 5:54 PM, Jonathan Ellis jbel...@gmail.com wrote: are you caching 100% of the CF? if not this is not super useful.

On Fri, Aug 6, 2010 at 7:10 PM, Artie Copeland yeslinux@gmail.com wrote: would it be possible to backport the 0.7 feature, the ability to save and preload row caches after a restart. i think that is a very nice and important feature that would help users with very large caches, that take a long time to get the proper hot set. for example we can get pretty good row cache hits if we run the servers for a month or more as the data tends to settle down.

-- http://yeslinux.org http://yestech.org

-- Jonathan Ellis Project Chair, Apache Cassandra co-founder of Riptano, the source for professional Cassandra support http://riptano.com

-- http://yeslinux.org http://yestech.org
Re: TokenRange contains endpoints without any port information?
The FAQ lists Round-Robin as the recommended way to find a node to connect to: http://wiki.apache.org/cassandra/FAQ#node_clients_connect_to

As you say, your clients need to retry anyway. I have them hold the connection for a while (on the scale of minutes), then hit the DNS again and acquire a new connection. This lets them pick up new nodes and (I think over time) helps with keeping connections balanced around the cluster. If a node goes down for a short time, it should not have too much of an effect on the clients. If you are taking a node out of the cluster you will need to update the DNS to remove it.

Aaron

On 10 Aug, 2010, at 08:51 AM, Carsten Krebs carsten.kr...@gmx.net wrote: On 08.08.2010, at 14:47 aaron morton wrote: What sort of client side load balancing where you thinking of? I just use round robin DNS to distribute clients around the cluster, and have them recycle their connections every so often. I was thinking about to use this method to give the client to the ability to "learn" what nodes are part of the cluster. Using this information to automatically adapt the set of nodes used by the client if a new node is added to or respectively removed from the cluster. Why do you prefer round robin DNS for load balancing? One advantage I see is, that the client does not has to take care about the node set and especially the management of the node set. The reason why I was thinking about a client side load balancing was to avoid the need to write additional tools, to monitor all nodes in the cluster and changing the DNS entry if any node fails - and this as fast as possible to prevent the clients from trying to use a dead node. But the time writing this, I doesn't think anymore, that this is good point. This is just a point of some sort of retry logic, which is needed anyway in the client. Carsten
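The hold-then-recycle behaviour Aaron describes can be sketched as follows (everything here is hypothetical: the class, the DNS name, and a request-count stand-in for the minutes-scale recycle timer):

```python
import itertools

# Sketch of the client behaviour described above (hypothetical helper,
# not a real library): resolve the cluster's round-robin DNS name,
# hold each connection for a bounded number of requests (a stand-in
# for "on the scale of minutes"), then re-resolve and reconnect so new
# nodes get picked up and connections stay spread around the cluster.

RECYCLE_AFTER = 100  # requests per connection

class RotatingClient:
    def __init__(self, resolve):
        self.resolve = resolve        # callable: name -> list of hosts
        self.requests_on_conn = 0
        self.host = None

    def _connect(self):
        hosts = self.resolve("cassandra.example.com")  # assumed DNS name
        # Round-robin DNS rotates the answer list; take the first entry.
        self.host = hosts[0]
        self.requests_on_conn = 0

    def request(self):
        if self.host is None or self.requests_on_conn >= RECYCLE_AFTER:
            self._connect()
        self.requests_on_conn += 1
        return self.host

# Simulated DNS that rotates its answers on every lookup:
rotation = itertools.cycle([["10.0.0.1", "10.0.0.2"], ["10.0.0.2", "10.0.0.1"]])
client = RotatingClient(lambda name: next(rotation))
used = {client.request() for _ in range(300)}
print(sorted(used))
```

Over enough recycles every node in the DNS entry gets used, which is the balancing effect Aaron relies on; retry logic on connection failure would sit on top of this.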
Re: 2 nodes on one machine
http://www.onemanclapping.org/2010/03/running-multiple-cassandra-nodes-on.html

Also some recent discussion on the users list.

Aaron

On 10 Aug, 2010, at 08:58 AM, Pavlo Baron p...@pbit.org wrote: Hello users, I'm a total Cassandra noob beside what I read about it, so please be patient :) I want to setup a cluster with 2 nodes on one VirtualBox-ed CentOS. I don't really want to start with the single node example, but with my desired setup. Do I have to do much more than to have 2 network interfaces so I can configure Cassandra nodes to run on different subnets / IP addresses? Is it possible at all to have several instances run on the same machine? Could you point me at a doc which describes a setup like that? The rest of the setup would even be a cluster of those 2 nodes, but even with nodes running on the same machine. Background: I want this setup to be frozen in the VirtualBox image. many thx in advance and best rgds Pavlo
Re: 2 nodes on one machine
cool, thank you Aaron, I'll check it out over the next few days and post the results

Pavlo

On 10.08.2010 00:11, Aaron Morton wrote: http://www.onemanclapping.org/2010/03/running-multiple-cassandra-nodes-on.html Also some recent discussion on the users list. Aaron On 10 Aug, 2010, at 08:58 AM, Pavlo Baron p...@pbit.org wrote: Hello users, I'm a total Cassandra noob beside what I read about it, so please be patient :) I want to setup a cluster with 2 nodes on one VirtualBox-ed CentOS. I don't really want to start with the single node example, but with my desired setup. Do I have to do much more than to have 2 network interfaces so I can configure Cassandra nodes to run on different subnets / IP-Adresses? Is it possible at all to have several instances run on the same machine? Could you point me at a doc which describes a setup like that? The rest of the setup would even be a cluster of those 2 nodes, but even with nodes running on the same machine. Background: I want this setup to be frozen in the VirtualBox image. many thx in advance and best rgds Pavlo
Using a separate commit log drive was 4x slower
I have a weird one to share with the list. Using a separate commit log drive dropped my performance a lot more than I would expect... I'm doing perf tests on 3 identical machines but with 3 different drive sets (SAS 15K, 10K, and SATA 7.5K). Each system has a single system disk (same as the data set) and the data set (a 5 drive RAID-0). I'm using Cassandra 0.6.4 with Java 1.6_20. This is all RF=1, CL=1. I inserted an initial data set of 100K keys, each with 1000 columns of random data (1000 bytes). Compacted and restarted Cassandra. Then I did a write baseline where I have 500 threads inserting a random 1000 bytes on a random key/column combination (always 1 column per request). If my commit log is on my RAID'd data drive I get about 19K columns/inserts per second. If I then add some random reads (30 threads doing a random Key/Column read - always 1 column per read) I get ~8K reads/writes per second.

Host     Write Baseline (cols/s)   Write (cols/s)   Read (cols/s)
SAS15K   18800                     8700             8100
SAS10K   15800                     7600             7300
SATA     13200                     7300             8000

Now, if I do the same thing but with the commit log on the system disk, I get:

Host     Write Baseline (cols/s)   Write (cols/s)   Read (cols/s)
SAS15K   12600                     2200             1600
SAS10K   11400                     3000             1900
SATA      9900                     3100             1800

I would think that the write level would stay at about the baseline, and I have no idea why the read level would be so low. Any thoughts?
Some iostat (while separate commit log):

avg-cpu:  %user %nice %system %iowait %steal %idle
          38.33  0.00    4.72    2.48   0.00 54.47

Device:     rrqm/s  wrqm/s   r/s     w/s    rsec/s  wsec/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0    0.00  908.50   0.00  110.50     0.00 8152.00    73.77     0.57  5.20  4.93 54.50
cciss/c0d1    0.00    0.00  16.50    0.00  1424.00    0.00    86.30     0.10  6.06  2.73  4.50
dm-0          0.00    0.00   0.00 1019.00     0.00 8152.00     8.00     6.25  6.13  0.53 54.50
dm-1          0.00    0.00   0.00    0.00     0.00    0.00     0.00     0.00  0.00  0.00  0.00

avg-cpu:  %user %nice %system %iowait %steal %idle
          37.42  0.00    2.37    3.54   0.00 56.68

Device:     rrqm/s  wrqm/s   r/s     w/s    rsec/s  wsec/s avgrq-sz avgqu-sz await svctm %util
cciss/c0d0    0.00  854.50   0.00  124.50     0.00 7816.00    62.78     0.61  4.94  4.82 60.00
cciss/c0d1    0.00    0.00  32.00    0.00  4032.00    0.00   126.00     0.21  6.72  3.12 10.00
dm-0          0.00    0.00   0.00  979.50     0.00 7836.00     8.00     5.57  5.69  0.61 60.00
dm-1          0.00    0.00   0.00    0.00     0.00    0.00     0.00     0.00  0.00  0.00  0.00

some top (while separate commit log):

top - 15:56:38 up 6 days, 21:26, 9 users, load average: 17.09, 7.92, 6.87
Tasks: 358 total, 1 running, 357 sleeping, 0 stopped, 0 zombie
Cpu(s): 35.4%us, 1.6%sy, 0.0%ni, 59.4%id, 3.2%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 24729068k total, 19789732k used, 4939336k free, 132056k buffers
Swap: 5849080k total, 54976k used, 5794104k free, 14839884k cached

  PID USER PR NI VIRT RES  SHR  S %CPU %MEM  TIME+     COMMAND
19411 root 20  0 142g 8.2g 5.0g S  599 34.9 423:25.42 java

<Storage>
  <ClusterName>Test Cluster</ClusterName>
  <AutoBootstrap>false</AutoBootstrap>
  <HintedHandoffEnabled>true</HintedHandoffEnabled>
  <Keyspaces>
    <Keyspace Name="Keyspace1">
      <ColumnFamily Name="PerfTest" CompareWith="LongType"/>
      <ReplicaPlacementStrategy>org.apache.cassandra.locator.RackUnawareStrategy</ReplicaPlacementStrategy>
      <ReplicationFactor>1</ReplicationFactor>
      <EndPointSnitch>org.apache.cassandra.locator.EndPointSnitch</EndPointSnitch>
    </Keyspace>
  </Keyspaces>
  <Authenticator>org.apache.cassandra.auth.AllowAllAuthenticator</Authenticator>
  <Partitioner>org.apache.cassandra.dht.OrderPreservingPartitioner</Partitioner>
  <InitialToken></InitialToken>
  <CommitLogDirectory>/data/commitlog</CommitLogDirectory>
  <DataFileDirectories>
    <DataFileDirectory>/data/data</DataFileDirectory>
  </DataFileDirectories>
  <Seeds>
    <Seed>127.0.0.1</Seed>
  </Seeds>
  <RpcTimeoutInMillis>1</RpcTimeoutInMillis>
  <CommitLogRotationThresholdInMB>1024</CommitLogRotationThresholdInMB>
  <ListenAddress>10.2.60.20</ListenAddress>
  <StoragePort>7000</StoragePort>
  <ThriftAddress>10.2.60.20</ThriftAddress>
  <ThriftPort>9160</ThriftPort>
  <ThriftFramedTransport>false</ThriftFramedTransport>
  <DiskAccessMode>auto</DiskAccessMode>
  <RowWarningThresholdInMB>512</RowWarningThresholdInMB>
  <SlicedBufferSizeInKB>64</SlicedBufferSizeInKB>
  <FlushDataBufferSizeInMB>32</FlushDataBufferSizeInMB>
  <FlushIndexBufferSizeInMB>8</FlushIndexBufferSizeInMB>
  <ColumnIndexSizeInKB>64</ColumnIndexSizeInKB>
  <MemtableThroughputInMB>512</MemtableThroughputInMB>
  <BinaryMemtableThroughputInMB>256</BinaryMemtableThroughputInMB>
  <MemtableOperationsInMillions>1.2</MemtableOperationsInMillions>
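One setting worth checking when commit log placement changes throughput this much is the commit log sync mode, which the posted config excerpt does not show. In a 0.6 storage-conf.xml it looks like this (illustrative values, not the poster's actual settings):

```xml
<!-- Illustrative fragment, not the poster's actual settings. In 0.6,
     CommitLogSync controls durability/latency: "periodic" fsyncs the
     log on a timer and acks writes immediately, while "batch" groups
     writes and fsyncs before acknowledging them. -->
<CommitLogSync>periodic</CommitLogSync>
<CommitLogSyncPeriodInMS>10000</CommitLogSyncPeriodInMS>
```

With batch sync, commit log device latency sits directly on the write path, so a slower single system disk would hurt far more than with periodic sync.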
Re: error using get_range_slice with random partitioner
Sure, but it's in my ruby client which currently has close to no documentation. ;)

Client is here: http://github.com/thheller/greek_architect
Relevant Row Spec: http://bit.ly/9uS6Ba
Row-based iteration: http://bit.ly/cRVSTc #each_slice

Currently uses a hack since I wasn't able to produce cassandra BigInteger Tokens in Ruby. I'm a math noob and couldn't figure out why some of the Tokens would differ. I just spawn a Java Process and use that to generate the Tokens, insanely slow but I don't use that feature anymore anyways. ;)

CF-Iteration: http://bit.ly/bNgsRG #each

It's all a little abstracted away I guess but I hope you can follow the relevant thrift calls.

HTH, /thomas

On Mon, Aug 9, 2010 at 3:55 PM, Adam Crain adam.cr...@greenenergycorp.com wrote: Hi Thomas, Can you share your client code for the iteration? It would probably help me catch my problem. Anyone know where in the cassandra source the integration tests are for this functionality on the random partitioner? Note that I posted a specific example where the iteration failed and I was not throwing out good keys only duplicate ones. That means 1 of 2 things: 1) I'm somehow using the API incorrectly 2) I am the only one encountering a bug My money is on 1) of course. I can check the thrift API against what my Scala client is calling under the hood. -Adam

-----Original Message-----
From: th.hel...@gmail.com on behalf of Thomas Heller
Sent: Fri 8/6/2010 7:17 PM
To: user@cassandra.apache.org
Subject: Re: error using get_range_slice with random partitioner

On Sat, Aug 7, 2010 at 1:05 AM, Adam Crain adam.cr...@greenenergycorp.com wrote: I took this approach... reject the first result of subsequent get_range_slice requests. If you look back at output I posted (below) you'll notice that not all of the 30 keys [key1...key30] get listed! The iteration dies and can't proceed past key2. 1) 1st batch gets 10 unique keys.
2) 2nd batch only gets 9 unique keys with the 1st being a repeat 3) 3rd batch only gets 2 unique keys. That means the iteration didn't see 9 keys in the CF. Key7 and Key30 are missing for example.

Remember the returned results are NOT sorted, so whenever you are dropping the first by default, you might be dropping a good one. At least that would be my guess here. I have iteration implemented in my client and everything is working as expected and so far I never had duplicates (running 0.6.3). I'm using tokens for range_slices tho, increment/decrement for get_slice only.

/thomas
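The token-based iteration Thomas describes can be sketched like this (my own simulation, assuming md5-style tokens; the in-memory `get_range_slices` is a stand-in for the Thrift call): page with a start token rather than a start key, using token(last_key) as the next start, so no duplicate-rejection heuristic is needed and no key is skipped.

```python
import hashlib

# Simulation of token-based range iteration under a random-partitioner
# style layout (assumed md5 tokens; not real Thrift calls). Rows are
# ordered by token, not by key, which is why key-based "drop the first
# result" paging loses keys.

RING = 2 ** 127

def token(key):
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % RING

DATA = {"key%d" % i: None for i in range(1, 31)}  # key1..key30

def get_range_slices(start_token, count):
    # Stand-in for the Thrift call: rows in token order, starting
    # strictly after start_token.
    ordered = sorted(DATA, key=token)
    return [k for k in ordered if token(k) > start_token][:count]

def iterate_all(batch=10):
    seen, start = [], -1
    while True:
        page = get_range_slices(start, batch)
        if not page:
            return seen
        seen.extend(page)
        start = token(page[-1])  # resume strictly after the last row seen

keys = iterate_all()
print(len(keys), len(set(keys)))  # 30 30: every key exactly once
```

Because paging advances by token rather than by key, the "results are NOT sorted (by key)" problem disappears: the pages are contiguous slices of the token ring.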
Re: Growing commit log directory.
what does the io load look like on those nodes?

On Mon, Aug 9, 2010 at 1:50 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

I have a 16 node 0.6.3 cluster and two nodes from my cluster are giving me major headaches.

10.71.71.56  Up    58.19 GB  10827166220211678382926910108067277      |   ^
10.71.71.61  Down  67.77 GB  123739042516704895804863493611552076888  v   |
10.71.71.66  Up    43.51 GB  127605887595351923798765477786913079296  |   ^
10.71.71.59  Down  90.22 GB  139206422831293007780471430312996086499  v   |
10.71.71.65  Up    22.97 GB  148873535527910577765226390751398592512  |   ^

The symptoms I am seeing are that nodes 61 and 59 have huge 6 GB+ commit log directories. They keep growing, along with memory usage; eventually the logs start showing GCInspection errors and then the nodes go OOM.

 INFO 14:20:01,296 Creating new commitlog segment /var/lib/cassandra/commitlog/CommitLog-1281378001296.log
 INFO 14:20:02,199 GC for ParNew: 327 ms, 57545496 reclaimed leaving 7955651792 used; max is 9773776896
 INFO 14:20:03,201 GC for ParNew: 443 ms, 45124504 reclaimed leaving 8137412920 used; max is 9773776896
 INFO 14:20:04,314 GC for ParNew: 438 ms, 54158832 reclaimed leaving 8310139720 used; max is 9773776896
 INFO 14:20:05,547 GC for ParNew: 409 ms, 56888760 reclaimed leaving 8480136592 used; max is 9773776896
 INFO 14:20:06,900 GC for ParNew: 441 ms, 58149704 reclaimed leaving 8648872520 used; max is 9773776896
 INFO 14:20:08,904 GC for ParNew: 462 ms, 59185992 reclaimed leaving 8816581312 used; max is 9773776896
 INFO 14:20:09,973 GC for ParNew: 460 ms, 57403840 reclaimed leaving 8986063136 used; max is 9773776896
 INFO 14:20:11,976 GC for ParNew: 447 ms, 59814376 reclaimed leaving 9153134392 used; max is 9773776896
 INFO 14:20:13,150 GC for ParNew: 441 ms, 61879728 reclaimed leaving 9318140296 used; max is 9773776896
java.lang.OutOfMemoryError: Java heap space
Dumping heap to java_pid10913.hprof ...
 INFO 14:22:30,620 InetAddress /10.71.71.66 is now dead.
 INFO 14:22:30,621 InetAddress /10.71.71.65 is now dead.
 INFO 14:22:30,621 GC for ConcurrentMarkSweep: 44862 ms, 261200 reclaimed leaving 9334753480 used; max is 9773776896
 INFO 14:22:30,621 InetAddress /10.71.71.64 is now dead.
Heap dump file created [12730501093 bytes in 253.445 secs]
ERROR 14:28:08,945 Uncaught exception in thread Thread[Thread-2288,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
ERROR 14:28:08,948 Uncaught exception in thread Thread[Thread-2281,5,main]
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:71)
 INFO 14:28:09,017 GC for ConcurrentMarkSweep: 33737 ms, 85880 reclaimed leaving 9335215296 used; max is 9773776896

Does anyone have any ideas what is going on?
Re: COMMIT-LOG_WRITER Assertion Error
Sounds like you upgraded to trunk from 0.6 without draining your commitlog first?

On Mon, Aug 9, 2010 at 3:30 PM, Arya Goudarzi agouda...@gaiaonline.com wrote:

Just throwing this out there as it could be a concern. I had a cluster of 3 nodes running. Over the weekend I updated to trunk (Aug 9th @ 2pm). Today, I came to run my daily tests and my client kept giving me TSocket timeouts. Checking the error logs of the Cassandra servers, all 3 nodes had this and they all became unresponsive! Not sure how to reproduce this, but a restart of all 3 nodes fixed the issue:

ERROR [COMMIT-LOG-WRITER] 2010-08-09 11:30:27,722 CassandraDaemon.java (line 82) Uncaught exception in thread Thread[COMMIT-LOG-WRITER,5,main]
java.lang.AssertionError
        at org.apache.cassandra.db.commitlog.CommitLogHeader$CommitLogHeaderSerializer.serialize(CommitLogHeader.java:157)
        at org.apache.cassandra.db.commitlog.CommitLogHeader.writeCommitLogHeader(CommitLogHeader.java:124)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.writeHeader(CommitLogSegment.java:70)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.write(CommitLogSegment.java:103)
        at org.apache.cassandra.db.commitlog.CommitLog$LogRecordAdder.run(CommitLog.java:521)
        at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:52)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        at java.lang.Thread.run(Thread.java:636)

-Arya

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: Growing commit log directory.
what does tpstats or other JMX monitoring of the o.a.c.concurrent stages show?

On Mon, Aug 9, 2010 at 4:50 PM, Edward Capriolo edlinuxg...@gmail.com wrote:

[ring listing and GC/OOM log quoted verbatim; snipped — see the original message above]

Does anyone have any ideas what is going on?

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com
Re: COMMIT-LOG_WRITER Assertion Error
I've never run 0.6. I have been running off trunk with an automatic svn update and build every day at 2pm. One of my nodes got this error, which led to the same last error, prior to today's build and restart. Hope this helps better:

java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError
        at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:549)
        at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:339)
        at org.apache.cassandra.db.commitlog.CommitLog.recover(CommitLog.java:174)
        at org.apache.cassandra.thrift.CassandraDaemon.setup(CassandraDaemon.java:120)
        at org.apache.cassandra.service.AbstractCassandraDaemon.activate(AbstractCassandraDaemon.java:90)
        at org.apache.cassandra.thrift.CassandraDaemon.main(CassandraDaemon.java:224)
Caused by: java.util.concurrent.ExecutionException: java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
        at java.util.concurrent.FutureTask.get(FutureTask.java:111)
        at org.apache.cassandra.utils.FBUtilities.waitOnFutures(FBUtilities.java:545)
        ... 5 more
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:34)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:636)
Caused by: java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.AssertionError
        at org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegments(CommitLog.java:408)
        at org.apache.cassandra.db.ColumnFamilyStore$2.runMayThrow(ColumnFamilyStore.java:445)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 6 more
Caused by: java.util.concurrent.ExecutionException: java.lang.AssertionError
        at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
        at java.util.concurrent.FutureTask.get(FutureTask.java:111)
        at org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegments(CommitLog.java:400)
        ... 8 more
Caused by: java.lang.AssertionError
        at org.apache.cassandra.db.commitlog.CommitLogHeader$CommitLogHeaderSerializer.serialize(CommitLogHeader.java:157)
        at org.apache.cassandra.db.commitlog.CommitLogHeader.writeCommitLogHeader(CommitLogHeader.java:124)
        at org.apache.cassandra.db.commitlog.CommitLogSegment.writeHeader(CommitLogSegment.java:70)
        at org.apache.cassandra.db.commitlog.CommitLog.discardCompletedSegmentsInternal(CommitLog.java:450)
        at org.apache.cassandra.db.commitlog.CommitLog.access$300(CommitLog.java:75)
        at org.apache.cassandra.db.commitlog.CommitLog$6.call(CommitLog.java:394)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at org.apache.cassandra.db.commitlog.PeriodicCommitLogExecutorService$1.runMayThrow(PeriodicCommitLogExecutorService.java:52)
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
        ... 1 more

- Original Message -
From: Jonathan Ellis jbel...@gmail.com
To: user@cassandra.apache.org
Sent: Monday, August 9, 2010 5:18:35 PM
Subject: Re: COMMIT-LOG_WRITER Assertion Error

[quoted message snipped; see Jonathan's reply above]
Re: Growing commit log directory.
On Mon, Aug 9, 2010 at 8:20 PM, Jonathan Ellis jbel...@gmail.com wrote:

what does tpstats or other JMX monitoring of the o.a.c.concurrent stages show?

[quoted ring listing and GC/OOM log snipped; see the original message above]

Hey guys, thanks for the help. I had lowered my Xmx from 12 GB to 10 GB because I saw:

[r...@cdbsd09 ~]# /usr/local/cassandra/bin/nodetool --host 10.71.71.59 --port 8585 info
123739042516704895804863493611552076888
Load             : 68.91 GB
Generation No    : 1281407425
Uptime (seconds) : 1459
Heap Memory (MB) : 6476.70 / 12261.00

This was happening:

[r...@cdbsd11 ~]# /usr/local/cassandra/bin/nodetool --host cdbsd09.hadoop.pvt --port 8585 tpstats
Pool Name                    Active   Pending   Completed
STREAM-STAGE                      0         0           0
RESPONSE-STAGE                    0         0       16478
ROW-READ-STAGE                   64      4014       18190
LB-OPERATIONS                     0         0           0
MESSAGE-DESERIALIZER-POOL         0         0       60290
GMFD                              0         0         385
LB-TARGET                         0         0           0
CONSISTENCY-MANAGER               0         0        7526
ROW-MUTATION-STAGE               64       908      182612
MESSAGE-STREAMING-POOL            0         0           0
LOAD-BALANCER-STAGE               0         0           0
FLUSH-SORTER-POOL                 0         0           0
MEMTABLE-POST-FLUSHER             0         0           8
FLUSH-WRITER-POOL                 0         0           8
AE-SERVICE-STAGE                  0         0           0
HINTED-HANDOFF-POOL               1         9           6

After raising the level I realized I was maxing out the heap.
The other nodes are running fine with Xmx 9 GB, but I guess these nodes cannot. Thanks again.

Edward
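Edward's tpstats paste can be read mechanically: any stage with a large Pending count is backed up, and Active pegged at the pool's maximum means the stage is saturated. A small sketch of scanning such output for backlogs (the threshold, the sample text, and the `backlogged` helper are illustrative, not part of nodetool):

```python
# Flag thread-pool stages whose Pending count suggests a backlog, given the
# plain-text output of `nodetool tpstats` (abbreviated sample from the thread).
TPSTATS = """\
Pool Name                  Active  Pending  Completed
ROW-READ-STAGE                 64     4014      18190
ROW-MUTATION-STAGE             64      908     182612
HINTED-HANDOFF-POOL             1        9          6
MEMTABLE-POST-FLUSHER           0        0          8
"""

def backlogged(text, threshold=100):
    stages = []
    for line in text.splitlines()[1:]:  # skip the header row
        parts = line.split()
        name, active, pending = parts[0], int(parts[1]), int(parts[2])
        if pending > threshold:
            stages.append((name, pending))
    return stages

print(backlogged(TPSTATS))
# -> [('ROW-READ-STAGE', 4014), ('ROW-MUTATION-STAGE', 908)]
```

Here both the read and the mutation stages are saturated with thousands of pending reads queued behind them, which fits Jonathan's earlier question about io load: a node that cannot keep up with reads and writes will also fall behind on flushing, so the commit log keeps growing.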
explanation of generated files and ops
In /var/lib/cassandra there is:

/data/system
    LocationInfo-4-Data.db
    LocationInfo-4-Filter.db
    LocationInfo-4-Index.db
    ..
    ..
/data/Keyspace1/
    Standard2-2-Data.db
    Standard2-2-Filter.db
    Standard2-2-Index.db
/commitlog
    CommitLog-timestamp.log

and in /var/log/cassandra:

system.log

Is this pretty much all the files that Cassandra generates? (Have I missed any?) Are there any common administrative tasks that admins might need to perform on these files? What exactly is stored in the -Filter.db files?
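On the last question: each SSTable's -Filter.db component holds a Bloom filter over that file's row keys, so a read can skip the SSTable entirely when the key cannot be present. A toy sketch of the idea follows; the class, its hashing scheme, and the sizes are illustrative only and do not match Cassandra's on-disk format:

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: k hash functions set bits in an m-bit array.
    Membership tests can give false positives but never false negatives."""

    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k
        self.bits = bytearray(m)

    def _positions(self, key):
        # Derive k bit positions from MD5 of a salted key (illustrative).
        for i in range(self.k):
            h = hashlib.md5(("%d:%s" % (i, key)).encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key):
        return all(self.bits[p] for p in self._positions(key))

bf = BloomFilter()
for row_key in ("key1", "key2", "key3"):
    bf.add(row_key)

assert bf.might_contain("key2")  # always True for keys that were added
# A negative answer lets the read path skip this SSTable without
# touching its -Index.db or -Data.db files at all.
```

This is why the filter lives in its own small file: it can be kept in memory and consulted before any disk seek into the data or index files.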
Re: How to migrate any relational database to Cassandra
Maybe you could integrate with Hadoop.

On Mon, Aug 9, 2010 at 1:15 PM, sonia gehlot sonia.geh...@gmail.com wrote:

Hi Guys, thanks for sharing your experiences and valuable links; these are really helpful. But I want to do ETL and then load the data into Cassandra. I have 10-15 various source systems; presently, daily ETL jobs run and load data into our database, which is Netezza. How can I do this in Cassandra — for example, what if my target database and sources are the same (MySQL, Oracle, Netezza, etc.)?

-Sonia

On Sat, Aug 7, 2010 at 7:46 PM, Zhong Li z...@voxeo.com wrote:

Yes, I use OrderPreservingPartitioner; the token considers datacenter+ip+function+timestamp+recordId+...

On Aug 7, 2010, at 10:36 PM, Jonathan Ellis wrote:

are you using OrderPreservingPartitioner then?

On Sat, Aug 7, 2010 at 10:32 PM, Zhong Li z...@voxeo.com wrote:

These are just my personal experiences. I recently used Cassandra to implement a system across 5 datacenters. Because it is impossible to do this in a SQL database at low cost, Cassandra helps. Cassandra is all about indexing; there are no relationships natively, so you have to use indexing to maintain all relationships. This is fine, because you can add a new index when you want. The big pain is the token. You can choose only one token per node, and all systems have to adopt the same rule to create indexes. It is a huge pain. If Cassandra implemented tokens at the CF level, it would be much more natural and easier to implement a system.

Best,
Zhong

On Aug 6, 2010, at 9:23 PM, Peter Harrison wrote:

On Sat, Aug 7, 2010 at 6:00 AM, sonia gehlot sonia.geh...@gmail.com wrote:

Can you please help me how to move forward? How should I do all the setup for this?

My view is that Cassandra is fundamentally different from SQL databases. There may be artefacts which are superficially similar between the two systems, but I'm thinking of a move to Cassandra like my move from dBase to Delphi; in other words, there were concepts which changed how you write applications. Now, you can do something similar to a SQL database, but I don't think you would be leveraging the features of Cassandra. That said, I think there will be a new generation of abstraction tools that will make modeling easier. A perhaps more practical answer: there is no one-to-one mapping between SQL and Cassandra.

--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

--
Regards
Peng Guo