sometimes get timeout while batch inserting. (using pycassa)

2012-09-20 Thread Yan Chunlu
I am testing the performance of one Cassandra node on a production server. I wrote a script to insert 1 million items into Cassandra. The data looks like this:

    prefix = 'benchmark_'
    dct = {}
    for i in range(0, 100):
        key = '%s%d' % (prefix, i)
        dct[key] = 'abc' * 200

and the inserting code
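A rough way to see how much data the script above pushes per row, assuming each item is one row holding the 100 columns built in the snippet (the quotes around the string literals are restored by assumption). `build_row` and `payload_bytes` are hypothetical helper names used only for this illustration; no pycassa call is made.

```python
def build_row(prefix="benchmark_", columns=100, value="abc" * 200):
    """Build one row shaped like the dict in the post."""
    return {"%s%d" % (prefix, i): value for i in range(columns)}


def payload_bytes(row):
    """Sum of column-name and column-value sizes, ignoring protocol overhead."""
    return sum(len(name) + len(value) for name, value in row.items())


row = build_row()
print(len(row), "columns,", payload_bytes(row), "bytes per row before overhead")
```

Multiplying the per-row figure by the number of rows in one batch gives a lower bound on what a single request has to carry before the RPC timeout expires.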

Re: sometimes get timeout while batch inserting. (using pycassa)

2012-09-20 Thread Yan Chunlu
forgot to mention the rpc configuration in cassandra.yaml is: rpc_timeout_in_ms: 2 and the cassandra version on production server is: 1.1.3 the cassandra version I am using on my macbook is: 1.0.10 On Thu, Sep 20, 2012 at 6:07 PM, Yan Chunlu springri...@gmail.com wrote: I am testing

Re: increased RF and repair, not working?

2012-07-27 Thread Yan Chunlu
...@mebigfatguy.com wrote: Quorum is defined as (replication_factor / 2) + 1, therefore quorum when rf = 2 is 2! So in your case, both nodes must be up. Really, using Quorum only starts making sense as a 'quorum' when RF=3. On 07/26/2012 10:38 PM, Yan Chunlu wrote: I am using Cassandra 1.0.2, have a 3
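The quorum formula quoted above can be checked with a few lines of Python. This is only a sketch of the arithmetic, using integer division; the function names are made up for the illustration.

```python
def quorum(replication_factor):
    """Replicas that must respond at QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor):
    """Replicas that can be down while QUORUM can still be reached."""
    return replication_factor - quorum(replication_factor)


for rf in (1, 2, 3):
    print("RF=%d quorum=%d tolerated failures=%d"
          % (rf, quorum(rf), tolerated_failures(rf)))
```

With RF=2 the quorum is 2, so no replica may be down; RF=3 is the smallest value where one replica can fail.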

increased RF and repair, not working?

2012-07-26 Thread Yan Chunlu
I am using Cassandra 1.0.2, have a 3 nodes cluster. the consistency level of read write are both QUORUM. At first the RF=1, and I figured that one node down will cause the cluster unusable. so I changed RF to 2, and run nodetool repair on every node(actually I did it twice). After the

cassandra halt after started minutes later

2012-07-01 Thread Yan Chunlu
I have a three-node cluster running 1.0.2. Today there was a very strange problem: suddenly two of the cassandra nodes (let's say B and C) were using a lot of CPU. It turned out that for some reason the java binary just doesn't run. I am using OpenJDK 1.6.0_18, so I switched to the Sun JDK, which works okay.

Re: cassandra halt after started minutes later

2012-07-01 Thread Yan Chunlu
@/192.168.1.40 DEBUG [Thread-11] 2012-07-01 23:29:42,939 IncomingTcpConnection.java (line 116) Version is now 3 On Sun, Jul 1, 2012 at 11:14 PM, Yan Chunlu springri...@gmail.com wrote: I have a three node cluster running 1.0.2, today there's a very strange problem that suddenly two of cassandra

Re: cassandra halt after started minutes later

2012-07-01 Thread Yan Chunlu
+%m%d%H%M%C%y.%S`; date; In a terminal and see if everything starts working again. I hope this helps. -- David Daeschler On Sun, Jul 1, 2012 at 11:33 AM, Yan Chunlu springri...@gmail.com wrote: adjust the timezone of java by -Duser.timezone and the timezone of cassandra is the same

how to reduce latency?

2012-06-13 Thread Yan Chunlu
I have had three nodes running cassandra 0.7.4 for about two years, as shown below:
10.x.x.x Up Normal 138.07 GB 33.33% 0
10.x.x.x Up Normal 143.97 GB 33.33% 56713727820156410577229101238628035242
10.x.x.x Up Normal 137.33 GB 33.33%

Re: is that possible to add more data structure(key-list) in cassandra?

2011-11-11 Thread Yan Chunlu
I thought no one is currently maintaining the supercolumn-related code, and also it is not quite efficient. On Fri, Nov 11, 2011 at 2:46 PM, Radim Kolar h...@sendmail.cz wrote: On 11.11.2011 5:58, Yan Chunlu wrote: I think cassandra is doing great job on key-value data store, it saved me

is that possible to add more data structure(key-list) in cassandra?

2011-11-10 Thread Yan Chunlu
I think cassandra is doing a great job as a key-value data store; it saved me tremendous work on maintaining data consistency and service availability. But I think it would be great if it could support more data structures such as key-list. Currently I am using key-value to save the list, it seems

Re: anyway to throttle nodetool repair?

2011-10-11 Thread Yan Chunlu
as I asked earlier: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/how-does-compaction-throughput-kb-per-sec-affect-disk-io-td6831711.html might not directly throttle the disk I/O? it would be easy if ionice could work with cassandra. not sure it is because of jvm or something

Re: anyway to throttle nodetool repair?

2011-10-10 Thread Yan Chunlu
so how about disk io? is there anyway to use ionice to control it? I have tried to adjust the priority by ionice -c3 -p [cassandra pid]. seems not working... On Wed, Sep 28, 2011 at 12:02 AM, Peter Schuller peter.schul...@infidyne.com wrote: I saw the ticket about compaction throttling,

Re: anyway to throttle nodetool repair?

2011-10-10 Thread Yan Chunlu
I am using commodity hardware, so even a minor compaction makes disk I/O go to 100% and the server load gets very high. On Tue, Oct 11, 2011 at 11:19 AM, Yan Chunlu springri...@gmail.com wrote: so how about disk io? is there anyway to use ionice to control it? I have tried to adjust the priority by ionice

anyway to disable row/key cache on single node while starting it?

2011-09-27 Thread Yan Chunlu
Again I was doing repair on a single CF and it crashed because of OOM, leaving 286GB of data (should be 40GB). The problem here is that it takes very long to bring the node back alive, seemingly because it was loading the row cache. the last time I encountered this, I did people suggested that

anyway to throttle nodetool repair?

2011-09-27 Thread Yan Chunlu
I saw the ticket about compaction throttling and wonder whether it is necessary to add an option, or whether there is any way to throttle repair. Every time I run nodetool repair, it uses all the disk I/O and the server load goes up quickly; is there any way to make it smoother?

how does compaction_throughput_kb_per_sec affect disk io?

2011-09-26 Thread Yan Chunlu
I am using the default 16MB when running repair, but the disk I/O is still quite high:
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sdb 136.00 0.00 506.00 26.00 63430.00 5880.00 260.56 101.73 224.38

Re: how does compaction_throughput_kb_per_sec affect disk io?

2011-09-26 Thread Yan Chunlu
okay, thanks! On Mon, Sep 26, 2011 at 10:38 PM, Jonathan Ellis jbel...@gmail.com wrote: compaction throughput doesn't affect flushing or reads On Mon, Sep 26, 2011 at 7:40 AM, Yan Chunlu springri...@gmail.com wrote: I am using the default 16MB when running repair. but the disk io is still

Re: progress of sstableloader keeps 0?

2011-09-25 Thread Yan Chunlu
. - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 25/09/2011, at 6:07 PM, Yan Chunlu wrote: yes, I did. thought 0.8 is downward compatible. is there other ways to load 0.7's data into 0.8? will copy the data dir directly will work? I would like

Re: progress of sstableloader keeps 0?

2011-09-25 Thread Yan Chunlu
is not receiving writes. If you want to merge the data from 3 nodes rename the files AFAIK they do not have to have contiguous file numbers. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 25/09/2011, at 10:45 PM, Yan Chunlu

Re: progress of sstableloader keeps 0?

2011-09-24 Thread Yan Chunlu
like it is complaining that you are trying to load a 0.7 SSTable in 0.8. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 23/09/2011, at 5:23 PM, Yan Chunlu wrote: sorry I did not look into it after check it I found

Re: Moving to a new cluster

2011-09-24 Thread Yan Chunlu
will be related to how out of sync things are, so once you get repair working smoothly it will be less of problem. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 23/09/2011, at 3:04 AM, Yan Chunlu wrote: hi Aaron: could you

progress of sstableloader keeps 0?

2011-09-22 Thread Yan Chunlu
I took a snapshot of one of my nodes in a 0.7.4 cluster (N=RF=3), then used sstableloader to load the snapshot data into another 1-node cluster (N=RF=1). After executing bin/sstableloader /disk2/mykeyspace/ it says: Starting client (and waiting 30 seconds for gossip) ... Streaming revelant part of

Re: progress of sstableloader keeps 0?

2011-09-22 Thread Yan Chunlu
, 2011 at 2:16 AM, Jonathan Ellis jbel...@gmail.com wrote: Did you check for errors in logs on both loader + target? On Thu, Sep 22, 2011 at 10:52 AM, Yan Chunlu springri...@gmail.com wrote: I took a snapshot of one of my node in a cluster 0.7.4(N=RF=3). use sstableloader to load

Re: Ignorning message. showing in the log while upgrade to 0.8

2011-09-20 Thread Yan Chunlu
, Yan Chunlu wrote: any help on this? thanks! On Sun, Sep 18, 2011 at 5:04 PM, Yan Chunlu springri...@gmail.com wrote: thanks! is the load info also a bug? node1 supposed to have 80MB. bash-3.2$ bin/nodetool -h localhost ring Address DC Rack Status State Load

Re: [RELEASE] Apache Cassandra 0.8.6 released

2011-09-20 Thread Yan Chunlu
Great! just waiting for it. On Tue, Sep 20, 2011 at 6:12 PM, Sylvain Lebresne sylv...@datastax.com wrote: The Cassandra team is pleased to announce the release of Apache Cassandra version 0.8.6. Cassandra is a highly scalable second-generation distributed database, bringing together

Re: cassandra crashed while repairing, leave node size X3

2011-09-19 Thread Yan Chunlu
I am using 0.7.4 too, and am waiting for the 0.8.6 stable release because of CASSANDRA-3166. Are you already using 0.8.6 in production? 2011/9/19 Jonas Borgström jonas.borgst...@trioptima.com On 09/19/2011 04:26 AM, Anand Somani wrote: In my tests I have seen repair sometimes take a lot of

Re: Ignorning message. showing in the log while upgrade to 0.8

2011-09-19 Thread Yan Chunlu
any help on this? thanks! On Sun, Sep 18, 2011 at 5:04 PM, Yan Chunlu springri...@gmail.com wrote: thanks! is the load info also a bug? node1 supposed to have 80MB. bash-3.2$ bin/nodetool -h localhost ring Address DC Rack Status State Load Owns

Re: cassandra crashed while repairing, leave node size X3

2011-09-19 Thread Yan Chunlu
got it, thanks! On Tue, Sep 20, 2011 at 12:27 AM, Peter Schuller peter.schul...@infidyne.com wrote: In my tests I have seen repair sometimes take a lot of space (2-3 times), cleanup did not clean it, the only way I could clean that was using major compaction.

Re: Ignorning message. showing in the log while upgrade to 0.8

2011-09-18 Thread Yan Chunlu
thanks! is the load info also a bug? node1 is supposed to have 80MB.
bash-3.2$ bin/nodetool -h localhost ring
Address DC Rack Status State Load Owns Token
93798607613553124915572813490354413064
node2 datacenter1 rack1 Up Normal 86.03 MB

cassandra crashed while repairing, leave node size X3

2011-09-18 Thread Yan Chunlu
While doing repair on node3, the Load kept increasing; suddenly cassandra hit an OOM and the Load stopped at 140GB. After cassandra came back, I tried node cleanup, but it does not seem to work. Does node repair generate many temp sstables? How do I get rid of them? thanks! Address

Re: cassandra crashed while repairing, leave node size X3

2011-09-18 Thread Yan Chunlu
could clean that was using major compaction. On Sun, Sep 18, 2011 at 6:51 PM, Yan Chunlu springri...@gmail.com wrote: while doing repair on node3, the Load keep increasing, suddenly cassandra has encountered OOM, and the Load stopped at 140GB, after cassandra came back, I tried node cleanup

Re: Ignorning message. showing in the log while upgrade to 0.8

2011-09-17 Thread Yan Chunlu
also not fixed in 0.8.5? I am using the binary version of 0.8.5. Applying the fix might need to compile it from the source? On Sun, Sep 18, 2011 at 3:28 AM, Peter Schuller peter.schul...@infidyne.com wrote: I am running local tests about upgrade cassandra. upgrade from 0.7.4 to 0.8.5

Ignorning message. showing in the log while upgrade to 0.8

2011-09-16 Thread Yan Chunlu
I am running local tests of a cassandra upgrade from 0.7.4 to 0.8.5. After upgrading one node (node1), two problems happened: 1. node2 keeps saying: Received connection from newer protocol version. Ignorning message. Is that normal behaviour? 2. While running describe cluster on node1, it

Re: Ignorning message. showing in the log while upgrade to 0.8

2011-09-16 Thread Yan Chunlu
after kill node1 and start it again, node 3 has the same problems with node2... On Fri, Sep 16, 2011 at 10:42 PM, Yan Chunlu springri...@gmail.com wrote: I am running local tests about upgrade cassandra. upgrade from 0.7.4 to 0.8.5 after upgrade one node1, two problem happened: 1, node2

Re: Ignorning message. showing in the log while upgrade to 0.8

2011-09-16 Thread Yan Chunlu
:48 PM, Yan Chunlu springri...@gmail.com wrote: after kill node1 and start it again, node 3 has the same problems with node2... On Fri, Sep 16, 2011 at 10:42 PM, Yan Chunlu springri...@gmail.com wrote: I am running local tests about upgrade cassandra. upgrade from 0.7.4 to 0.8.5 after

how did hprof file generated?

2011-09-15 Thread Yan Chunlu
On one of my nodes, I found many hprof files in the cassandra installation directory; they are using as much as 200GB of disk space. The other nodes don't have those files. It turns out that those files are used for memory analysis, but I am not sure how they are generated. They look like this: java_pid10626.hprof
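Files named java_pid<pid>.hprof are the heap dumps a HotSpot JVM writes when it is started with -XX:+HeapDumpOnOutOfMemoryError and runs out of memory. A minimal sketch for listing such files and adding up their size follows; the function names are hypothetical, and the file pattern is assumed from the name quoted above.

```python
import glob
import os


def find_heap_dumps(directory):
    """Return (path, size_in_bytes) for every java_pid*.hprof file in directory."""
    pattern = os.path.join(directory, "java_pid*.hprof")
    return sorted((path, os.path.getsize(path)) for path in glob.glob(pattern))


def total_size(dumps):
    """Total bytes used by the listed dump files."""
    return sum(size for _, size in dumps)


if __name__ == "__main__":
    # Scans the current directory; pass the installation directory instead.
    dumps = find_heap_dumps(".")
    for path, size in dumps:
        print("%12d  %s" % (size, path))
    print("total bytes:", total_size(dumps))
```

The sketch only reports the files; whether they can be deleted depends on whether anyone still wants to analyze the out-of-memory crashes they record.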

Re: how did hprof file generated?

2011-09-15 Thread Yan Chunlu
got it! thanks! On Thu, Sep 15, 2011 at 4:10 PM, Peter Schuller peter.schul...@infidyne.com wrote: in one of my node, I found many hprof files in the cassandra installation directory, they are using as much as 200GB disk space. other nodes didn't have those files. turns out that those

Re: what's the difference between repair CF separately and repair the entire node?

2011-09-14 Thread Yan Chunlu
Is 0.8 ready for production use? As far as I know, many companies including reddit.com are currently using 0.7; how do they get around the repair problem? On Wed, Sep 14, 2011 at 2:47 PM, Sylvain Lebresne sylv...@datastax.com wrote: On Wed, Sep 14, 2011 at 2:38 AM, Yan Chunlu springri...@gmail.com

Re: what's the difference between repair CF separately and repair the entire node?

2011-09-14 Thread Yan Chunlu
thanks a lot for the help! I have read the post and think 0.8 might be good enough for me, especially 0.8.5. Also, changing gc_grace_seconds is an acceptable solution. On Wed, Sep 14, 2011 at 4:03 PM, Sylvain Lebresne sylv...@datastax.com wrote: On Wed, Sep 14, 2011 at 9:27 AM, Yan Chunlu

segment fault with 0.8.5

2011-09-14 Thread Yan Chunlu
just tried the cassandra 0.8.5 binary version, and got a segmentation fault. I am using the Sun JDK, so this is not CASSANDRA-2441. OS is Debian 5.0.
java -version
java version 1.6.0_04
Java(TM) SE Runtime Environment (build 1.6.0_04-b12)
Java HotSpot(TM) Server VM (build 10.0-b19, mixed mode)
uname -a

Re: what's the difference between repair CF separately and repair the entire node?

2011-09-13 Thread Yan Chunlu
I don't want to repair one CF at a time either. The node repair ran for a week and was still running; compactionstats and netstats showed nothing running on any node, with no error message and no exception. I really have no idea what it was doing, so I stopped it yesterday. maybe I should run repair

Re: what's the difference between repair CF separately and repair the entire node?

2011-09-12 Thread Yan Chunlu
I think it is a serious problem, since I cannot repair and I am using cassandra on production servers. Is there some way to fix it without upgrading? I heard that 0.8.x is still not quite ready for production environments. thanks! On Tue, Sep 13, 2011 at 1:44 AM, Peter Schuller

what's the difference between repair CF separately and repair the entire node?

2011-09-08 Thread Yan Chunlu
I have 3 nodes and RF=3. I tried to repair every node in the cluster by using nodetool repair mykeyspace mycf on every column family. it finished within 3 hours, the data size is no more than 50GB. after the repair, I have tried using nodetool repair immediately to repair the entire node, but

cassandra auto create snapshots?

2011-08-29 Thread Yan Chunlu
I just found that the data dir consumes a lot of space because there are many snapshots in it, but I have set snapshot_before_compaction: false. Is it possible that cassandra creates those snapshots automatically? Could I delete them? The dir names are strange (normally it should contain date

Re: cassandra auto create snapshots?

2011-08-29 Thread Yan Chunlu
at 4:19 PM, Yan Chunlu springri...@gmail.com wrote: just found the data dir consume a lot of space, which is because there was many snapshots in it. but I have set snapshot_before_compaction: false. is that possible that cassandra create those snapshot automatically? could I delete them

Re: cassandra auto create snapshots?

2011-08-29 Thread Yan Chunlu
wrote: No. On Mon, Aug 29, 2011 at 8:15 PM, Yan Chunlu springri...@gmail.com wrote: so it was useless? I didn't drop any CF/KS, could nodetool move, nodetool repair cause the problem? On Tue, Aug 30, 2011 at 5:23 AM, Jonathan Ellis jbel...@gmail.com wrote: Perhaps you are seeing

Re: how to know if nodetool cleanup is safe?

2011-08-24 Thread Yan Chunlu
got it! thanks a lot for the explanation! On Wed, Aug 24, 2011 at 1:06 AM, Edward Capriolo edlinuxg...@gmail.com wrote: On Tue, Aug 23, 2011 at 11:56 AM, Sam Overton sover...@acunu.com wrote: On 21 August 2011 12:34, Yan Chunlu springri...@gmail.com wrote: since nodetool cleanup could

Re: get mycf['rowkey']['column_name'] return 'Value was not found' in cassandra-cli

2011-08-22 Thread Yan Chunlu
that isn't dealing with bytestype column names correctly On Mon, Aug 22, 2011 at 12:08 AM, Yan Chunlu springri...@gmail.com wrote: connect to cassandra-cli and issue the list my cf I got RowKey: comments_62559 = (column=76616c7565, value
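The column name 76616c7565 in the quoted output is the hex form of the raw bytes stored as the column name. A small sketch of converting between the two forms, assuming the name is plain ASCII (the helper names are made up for the illustration):

```python
def decode_column_name(hex_name):
    """Decode a hex-encoded column name, as printed by the CLI, into text."""
    return bytes.fromhex(hex_name).decode("ascii")


def encode_column_name(name):
    """Inverse operation: the hex form of an ASCII column name."""
    return name.encode("ascii").hex()


print(decode_column_name("76616c7565"))
```

Here the bytes 76 61 6c 75 65 spell "value", which is why a lookup by the readable name fails when the CLI treats the column name as bytes.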

Re: get mycf['rowkey']['column_name'] return 'Value was not found' in cassandra-cli

2011-08-22 Thread Yan Chunlu
thanks a lot! On Mon, Aug 22, 2011 at 10:14 PM, Edward Capriolo edlinuxg...@gmail.com wrote: On Mon, Aug 22, 2011 at 1:08 AM, Yan Chunlu springri...@gmail.com wrote: connect to cassandra-cli and issue the list my cf I got RowKey: comments_62559 = (column=76616c7565, value

how to know if nodetool cleanup is safe?

2011-08-21 Thread Yan Chunlu
since nodetool cleanup could remove hinted handoff, will it cause the data loss?

would it possible for this kind of data loss?

2011-08-21 Thread Yan Chunlu
I was aware that deleted items might come back alive without proper node repair. How about modified items? For example 'A'={1,2,3}, then 'A'={4,5}: is it possible for 'A' to change back to {1,2,3}? I have encountered this mysterious problem after going through a messy procedure with cassandra nodes,

The schema has not settled in 10 seconds; further migrations are ill-advised until it does.?

2011-08-21 Thread Yan Chunlu
I encountered this problem while updating the key cache and row cache. I once updated them to 0 (disabled) while node2 was not available; when it came back, the nodes eventually had the same schema version. [default@prjspace] describe cluster; Cluster Information: Snitch:

Re: Cassandra Cluster Admin - phpMyAdmin for Cassandra

2011-08-21 Thread Yan Chunlu
just tried it and it works like a charm! thanks a lot for the great work! On Mon, Aug 22, 2011 at 9:47 AM, SebWajam sebast...@wajam.com wrote: Hi, I'm working on this project for a few months now and I think it's mature enough to post it here: Cassandra Cluster Admin on

Re: The schema has not settled in 10 seconds; further migrations are ill-advised until it does.?

2011-08-21 Thread Yan Chunlu
was interrupted? it happens EVERY time I update the schema, that's the part I was worrying about, but after the error describe cluster didn't show anything wrong On Mon, Aug 22, 2011 at 10:19 AM, Edward Capriolo edlinuxg...@gmail.com wrote: On Sun, Aug 21, 2011 at 10:09 PM, Yan Chunlu

Re: node restart taking too long

2011-08-20 Thread Yan Chunlu
any suggestion? thanks! On Fri, Aug 19, 2011 at 10:26 PM, Yan Chunlu springri...@gmail.com wrote: the log file shows as follows, not sure what does 'Couldn't find cfId=1000' means(google just returned useless results): INFO [main] 2011-08-18 07:23:17,688 DatabaseDescriptor.java (line 453

Re: node restart taking too long

2011-08-20 Thread Yan Chunlu
? On Sun, Aug 21, 2011 at 5:56 AM, Jonathan Ellis jbel...@gmail.com wrote: This means you should upgrade, because we've fixed bugs about ignoring deleted CFs since 0.7.4. On Fri, Aug 19, 2011 at 9:26 AM, Yan Chunlu springri...@gmail.com wrote: the log file shows as follows, not sure what does

Re: node restart taking too long

2011-08-19 Thread Yan Chunlu
-3bd951658d61: [node1, node3] 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2] is that enough delete Schema* Migrations* sstables and restart the node? On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu springri...@gmail.com wrote: thanks a lot for all the help! I have gone through the steps

Re: node restart taking too long

2011-08-18 Thread Yan Chunlu
into cassandra during the startup of cassandra. On Wed, Aug 17, 2011 at 5:52 PM, Yan Chunlu springri...@gmail.com wrote: but the data size in the saved_cache are relatively small: will that cause the load problem? ls -lh /cassandra/saved_caches/ total 32M -rw-r--r-- 1 cass cass 2.9M 2011-08

Re: node restart taking too long

2011-08-18 Thread Yan Chunlu
, node3] 2127b2ef-6998-11e0-b45b-3bd951658d61: [node2] is that enough delete Schema* Migrations* sstables and restart the node? On Thu, Aug 18, 2011 at 5:13 PM, Yan Chunlu springri...@gmail.com wrote: thanks a lot for all the help! I have gone through the steps and successfully brought up

Re: node restart taking too long

2011-08-17 Thread Yan Chunlu
Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 17/08/2011, at 2:59 PM, Yan Chunlu wrote: does this need to be cluster wide? or could I just modify the caches on one node? since I could not connect to the node with cassandra-cli, it says connection refused

Re: node restart taking too long

2011-08-16 Thread Yan Chunlu
, at 04:23, Yan Chunlu wrote: I have 3 nodes and RF=3. While I was repairing node3, a lot of data seems to have been generated; the server could not handle the load and crashed. After coming back, node3 has not been able to rejoin for more than 96 hours with 34GB of data, whereas node2 could restart and be back online within 1

Re: node restart taking too long

2011-08-16 Thread Yan Chunlu
:false:621@1313192538616112 On Tue, Aug 16, 2011 at 7:32 PM, Yan Chunlu springri...@gmail.com wrote: but it seems the row cache is cluster wide, how will the change of row cache affect the read speed? On Mon, Aug 15, 2011 at 7:33 AM, Jonathan Ellis jbel...@gmail.com wrote: Or leave row cache

Re: node restart taking too long

2011-08-16 Thread Yan Chunlu
the following:
* Set all row key caches in your CFs to 0 via cassandra-cli
* Kill Cassandra
* Remove all files in the saved_caches directory
* Start Cassandra
* Slowly bring back row key caches (if desired, we left them off)
Cheers, T. On 16/08/11 23:35, Yan Chunlu wrote: I saw a lot
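The "remove all files in the saved_caches directory" step from the list above could be scripted roughly as follows. This is a sketch only: `clear_saved_caches` is a hypothetical helper, the directory path is whatever the node is configured with, and it should only be run while Cassandra is stopped.

```python
import os


def clear_saved_caches(saved_caches_dir):
    """Delete the regular files directly inside saved_caches_dir.

    Returns the names of the removed files; subdirectories are left alone.
    """
    removed = []
    for name in sorted(os.listdir(saved_caches_dir)):
        path = os.path.join(saved_caches_dir, name)
        if os.path.isfile(path):
            os.remove(path)
            removed.append(name)
    return removed
```

Returning the removed names makes it easy to log what was deleted before starting the node again.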

node restart taking too long

2011-08-14 Thread Yan Chunlu
I have 3 nodes and RF=3. While I was repairing node3, a lot of data seems to have been generated; the server could not handle the load and crashed. After coming back, node3 has not been able to rejoin for more than 96 hours with 34GB of data, whereas node2 could restart and be back online within 1 hour. I am not sure what's wrong with

Re: move one node for load re-balancing then it status stuck at Leaving

2011-08-07 Thread Yan Chunlu
, Yan Chunlu springri...@gmail.com wrote: nothing... nodetool -h node3 netstats Mode: Normal Not sending any streams. Nothing streaming from /10.28.53.11 Pool Name Active Pending Completed Commands n/a 0 186669475 Responses

Re: move one node for load re-balancing then it status stuck at Leaving

2011-08-07 Thread Yan Chunlu
the exception. On Sun, Aug 7, 2011 at 2:03 PM, Yan Chunlu springri...@gmail.com wrote: is that possible that the implements of cassandra only calculate live nodes? for example: node move node3 cause node3 Leaving, then cassandra iterate over the endpoints and found node1 and node2. so

Re: how to solve one node is in heavy load in unbalanced cluster

2011-08-07 Thread Yan Chunlu
Cassandra Developer @aaronmorton http://www.thelastpickle.com On 5 Aug 2011, at 03:39, Yan Chunlu wrote: hi, any help? thanks! On Thu, Aug 4, 2011 at 5:02 AM, Yan Chunlu springri...@gmail.com wrote: forgot to mention I am using cassandra 0.7.4 On Thu, Aug 4, 2011 at 5:00 PM, Yan Chunlu

Re: move one node for load re-balancing then it status stuck at Leaving

2011-08-05 Thread Yan Chunlu
nothing...
nodetool -h node3 netstats
Mode: Normal
Not sending any streams.
Nothing streaming from /10.28.53.11
Pool Name Active Pending Completed
Commands n/a 0 186669475
Responses n/a 0 117986130

Re: how to solve one node is in heavy load in unbalanced cluster

2011-08-04 Thread Yan Chunlu
cassandra-cli. On Mon, Aug 1, 2011 at 10:22 AM, Yan Chunlu springri...@gmail.com wrote: thanks a lot! I will try the move. On Mon, Aug 1, 2011 at 7:07 AM, mcasandra mohitanch...@gmail.com wrote: springrider wrote: is that okay to do nodetool move before a completely repair? using

Re: how to solve one node is in heavy load in unbalanced cluster

2011-08-04 Thread Yan Chunlu
n/a 0 99372520 On Thu, Aug 4, 2011 at 4:56 PM, Yan Chunlu springri...@gmail.com wrote: sorry the ring info should be this: nodetool -h node3 ring Address Status State Load Owns Token 84944475733633104818662955375549269696 node1 Up Normal

Re: how to solve one node is in heavy load in unbalanced cluster

2011-08-04 Thread Yan Chunlu
hi, any help? thanks! On Thu, Aug 4, 2011 at 5:02 AM, Yan Chunlu springri...@gmail.com wrote: forgot to mention I am using cassandra 0.7.4 On Thu, Aug 4, 2011 at 5:00 PM, Yan Chunlu springri...@gmail.com wrote: also nothing happens about the streaming: nodetool -h node3 netstats Mode

move one node for load re-balancing then it status stuck at Leaving

2011-08-04 Thread Yan Chunlu
I have 3 nodes, and the RF used to be 2; after a while I changed it to 3. I am using Cassandra 0.7.4. I tried nodetool move but got the following error: node3:~# nodetool -h node3 move 0 Exception in thread main java.lang.IllegalStateException: replication factor (3) exceeds number of

Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread Yan Chunlu
any help? thanks! On Fri, Jul 29, 2011 at 12:05 PM, Yan Chunlu springri...@gmail.com wrote: And by the way, my RF=3 and the other two nodes have much more capacity, so why are requests always routed to node3? Could I do a rebalance now, before node repair? On Fri, Jul 29, 2011 at 12

Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread Yan Chunlu
Is it okay to do nodetool move before a complete repair, using this equation?

    def tokens(nodes):
        for x in xrange(nodes):
            print 2 ** 127 / nodes * x

On Mon, Aug 1, 2011 at 1:17 AM, mcasandra mohitanch...@gmail.com wrote: First run nodetool move and then you can run nodetool
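The token equation in the message above is Python 2. A Python 3 rendering of the same arithmetic is sketched here, returning the values instead of printing them; it assumes a token space of 0 to 2**127 and uses `//` where Python 2 relied on integer `/`.

```python
def tokens(nodes):
    """Evenly spaced initial tokens over a 0 .. 2**127 ring."""
    return [2 ** 127 // nodes * x for x in range(nodes)]


for token in tokens(3):
    print(token)
```

For three nodes the second token comes out as 56713727820156410577229101238628035242, one third of the way around the ring.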

Could I run node repair when disable gossip and thrift?

2011-07-31 Thread Yan Chunlu
I am running 3 nodes with RF=3 on cassandra v0.7.4. It seems that disablegossip and disablethrift can keep a node at pretty low load. Sometimes, when node repair is rebuilding sstables, I disable gossip and thrift to lower the load. I am not sure if I could keep them disabled for the whole procedure.

Re: Could I run node repair when disable gossip and thrift?

2011-07-31 Thread Yan Chunlu
- Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 1 Aug 2011, at 06:09, Yan Chunlu wrote: I am running 3 nodes and RF=3, cassandra v0.7.4 seems when disablegossip and disablethrift could keep node in pretty low load. sometimes when the node repair

Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread Yan Chunlu
. Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 1 Aug 2011, at 05:48, Yan Chunlu wrote: is that okay to do nodetool move before a completely repair? using this equation? def tokens(nodes): - for x in xrange(nodes

Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-31 Thread Yan Chunlu
thanks a lot! I will try the move. On Mon, Aug 1, 2011 at 7:07 AM, mcasandra mohitanch...@gmail.com wrote: springrider wrote: is that okay to do nodetool move before a completely repair? using this equation? def tokens(nodes): - for x in xrange(nodes): - print 2 ** 127

how to solve one node is in heavy load in unbalanced cluster

2011-07-28 Thread Yan Chunlu
I have three nodes and RF=3. Here is the current ring:
Address Status State Load Owns Token
84944475733633104818662955375549269696
node1 Up Normal 15.32 GB 81.09% 52773518586096316348543097376923124102
node2 Up Normal 22.51 GB 10.48% 70597222385644499881390884416714081360
node3 Up Normal 56.1 GB
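The Owns percentages in this ring follow directly from the tokens. The sketch below recomputes them, assuming a 2**127 token space and assuming that the first token printed (84944475733633104818662955375549269696) belongs to node3, whose row is cut off in the listing. `ownership` is a hypothetical helper, not a nodetool feature.

```python
RING_SIZE = 2 ** 127  # assumed token space


def ownership(tokens_by_node):
    """Percentage of the ring owned by each node.

    A node owns the range from the previous node's token up to its own
    token, wrapping around at the end of the ring.
    """
    ordered = sorted(tokens_by_node.items(), key=lambda item: item[1])
    result = {}
    for index, (node, token) in enumerate(ordered):
        previous = ordered[index - 1][1]  # index -1 wraps to the highest token
        # A single node owns the whole ring, hence the fallback to RING_SIZE.
        width = (token - previous) % RING_SIZE or RING_SIZE
        result[node] = 100.0 * width / RING_SIZE
    return result


ring = {
    "node1": 52773518586096316348543097376923124102,
    "node2": 70597222385644499881390884416714081360,
    "node3": 84944475733633104818662955375549269696,  # assumed, row truncated
}
for node, percent in sorted(ownership(ring).items()):
    print("%s %.2f%%" % (node, percent))
```

The tokens sit close together on the ring, so node1 ends up owning the long wrap-around range while the other two own narrow slices.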

Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-28 Thread Yan Chunlu
nodes and physically rebooted the offending node(s). The entire cluster then calmed down. On Thu, Jul 28, 2011 at 2:24 PM, Yan Chunlu springri...@gmail.com wrote: I have three nodes and RF=3. here is the current ring: Address Status State Load Owns Token

Re: how to solve one node is in heavy load in unbalanced cluster

2011-07-28 Thread Yan Chunlu
And by the way, my RF=3 and the other two nodes have much more capacity, so why are requests always routed to node3? Could I do a rebalance now, before node repair? On Fri, Jul 29, 2011 at 12:01 PM, Yan Chunlu springri...@gmail.com wrote: add new nodes seems added more pressure

Re: do I need to add more nodes? minor compaction eat all IO

2011-07-25 Thread Yan Chunlu
I am using a normal SATA disk. Actually, I was worried about whether it is okay for cassandra to use all the I/O resources every time. Furthermore, when is a good time to add more nodes, given that I am just using a normal SATA disk and it can reach 100 %util at 100 r/s? how large the data size it

Re: do I need to add more nodes? minor compaction eat all IO

2011-07-25 Thread Yan Chunlu
the status of my cluster, if it is normal On Mon, Jul 25, 2011 at 8:59 PM, Yan Chunlu springri...@gmail.com wrote: I am using normal SATA disk, actually I was worrying about whether it is okay if every time cassandra using all the io resources? further more when is the good time to add more nodes

do I need to add more nodes? minor compaction eat all IO

2011-07-23 Thread Yan Chunlu
I have three nodes and RF=3. Every time it does a minor compaction, the CPU load (8 cores) gets to 30 and iostat -x 2 shows %util at 100%. Does that mean I need more nodes? The total data size is 60G. Thanks!

Re: node repair eat up all disk io and slow down entire cluster(3 nodes)

2011-07-21 Thread Yan Chunlu
Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 21 Jul 2011, at 23:17, Yan Chunlu wrote: after tried nodetool -h reagon repair key cf, I found that even repair single CF, it involves rebuild all sstables(using nodetool compactionstats), is that normal

node repair eat up all disk io and slow down entire cluster(3 nodes)

2011-07-20 Thread Yan Chunlu
at the beginning of using cassandra, I have no idea that I should run node repair frequently, so basically, I have 3 nodes with RF=3 and have not run node repair for months, the data size is 20G. the problem is when I start running node repair now, it eat up all disk io and the server load became

Re: node repair eat up all disk io and slow down entire cluster(3 nodes)

2011-07-20 Thread Yan Chunlu
at 4:44 PM, Yan Chunlu springri...@gmail.com wrote: at the beginning of using cassandra, I have no idea that I should run node repair frequently, so basically, I have 3 nodes with RF=3 and have not run node repair for months, the data size is 20G. the problem is when I start running node repair

with proof Re: cassandra goes infinite loop and data lost.....

2011-07-20 Thread Yan Chunlu
doing slices with a much larger limit than is advisable (good way to OOM the way you already did). On Wed, Jul 13, 2011 at 8:27 PM, Yan Chunlu springri...@gmail.com wrote: I gave cassandra 8GB heap size and somehow it run out of memory and crashed. after I start it, it just runs

Re: node repair eat up all disk io and slow down entire cluster(3 nodes)

2011-07-20 Thread Yan Chunlu
Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 20/07/2011, at 11:46 PM, Yan Chunlu springri...@gmail.com wrote: just found this: https://issues.apache.org/jira/browse/CASSANDRA-2156 but it seems to be available only in 0.8

Re: with proof Re: cassandra goes infinite loop and data lost.....

2011-07-20 Thread Yan Chunlu
see the column objects being iterated over are different. Like I said last time, I do see that it's saying N of 2147483647 which looks like you're doing slices with a much larger limit than is advisable. On Wed, Jul 20, 2011 at 9:00 PM, Yan Chunlu springri...@gmail.com wrote: this time

Re: with proof Re: cassandra goes infinite loop and data lost.....

2011-07-20 Thread Yan Chunlu
On 21 Jul 2011, at 15:14, Yan Chunlu wrote: sorry for the misunderstanding. I saw many "N of 2147483647" lines where N=0 and thought it was not doing anything. My node was very unbalanced and I intended to rebalance it with nodetool move after a node repair. Does that cause the slices to be much larger

Re: cassandra goes infinite loop and data lost.....

2011-07-14 Thread Yan Chunlu
anything here that indicates an infinite loop. I do see that it's saying N of 2147483647 which looks like you're doing slices with a much larger limit than is advisable (good way to OOM the way you already did). On Wed, Jul 13, 2011 at 8:27 PM, Yan Chunlu springri...@gmail.com wrote: I gave

cassandra goes infinite loop and data lost.....

2011-07-13 Thread Yan Chunlu
DEBUG [main] 2011-07-13 22:19:00,586 SliceQueryFilter.java (line 123) collecting 0 of 2147483647: 100zs:false:14@1310168625866434

Re: cassandra goes infinite loop and data lost.....

2011-07-13 Thread Yan Chunlu
(line 123) collecting 2 of 2147483647: 1018e:false:14@1310168759614715 DEBUG [main] 2011-07-13 22:19:00,587 SliceQueryFilter.java (line 123) collecting 3 of 2147483647: 101dd:false:14@1310169260225339 On Thu, Jul 14, 2011 at 11:27 AM, Yan Chunlu springri...@gmail.com wrote: DEBUG [main] 2011-07-13

Re: cassandra goes infinite loop and data lost.....

2011-07-13 Thread Yan Chunlu
16GB On Thu, Jul 14, 2011 at 11:29 AM, Bret Palsson b...@getjive.com wrote: How much total memory does your machine have? -- Bret On Wednesday, July 13, 2011 at 9:27 PM, Yan Chunlu wrote: I gave Cassandra an 8GB heap and somehow it ran out of memory and crashed. after I start

Re: cassandra goes infinite loop and data lost.....

2011-07-13 Thread Yan Chunlu
problem is I can't bring Cassandra back up. Is that because there is not enough memory for Cassandra? On Thu, Jul 14, 2011 at 11:29 AM, Bret Palsson b...@getjive.com wrote: How much total memory does your machine have? -- Bret On Wednesday, July 13, 2011 at 9:27 PM, Yan Chunlu wrote: I gave

Re: how large cassandra could scale when it need to do manual operation?

2011-07-10 Thread Yan Chunlu
/clusters that could fail right away. On Sat, Jul 9, 2011 at 12:17 AM, Yan Chunlu springri...@gmail.com wrote: thank you very much for the reply, which gives me more confidence in Cassandra. I will try the automation tools; the examples you've listed seem quite promising! about the decommission

Re: how large cassandra could scale when it need to do manual operation?

2011-07-10 Thread Yan Chunlu
sure machines get serviced in certain time windows and have an extensive automated burn-in process of (disk, memory, drives) to not roll out nodes/clusters that could fail right away. On Sat, Jul 9, 2011 at 12:17 AM, Yan Chunlu springri...@gmail.com wrote: thank you very much for the reply

Re: Corrupted data

2011-07-10 Thread Yan Chunlu
I am running RF=2 (I changed it from 2 to 3 and back to 2) on 3 nodes and had not run node repair for more than 10 days; I was not aware that this is critical. I ran node repair recently and one of the nodes always hung... from the log it seems to be doing nothing related to the repair. So I have two problems:

Re: Corrupted data

2011-07-10 Thread Yan Chunlu
Developer @aaronmorton http://www.thelastpickle.com On 10 Jul 2011, at 03:26, Yan Chunlu wrote: I am running RF=2 (I changed it from 2 to 3 and back to 2) on 3 nodes and had not run node repair for more than 10 days; I was not aware that this is critical. I ran node repair recently and one
