Adding and removing node procedures
I just wanted to verify the procedures to add and remove nodes in my environment; please feel free to comment or advise. I have a 3-node cluster (N1, N2, N3) with vnodes configured (num_tokens: 256) on each node. All are in one data center.

1. Procedure to change node hardware or replace with new machines: (N1, N2, N3) to (N11, N21, N31)
nodetool -h node2 decommission
Bootstrap N21
nodetool repair
nodetool -h node1 decommission
Bootstrap N11
nodetool repair
nodetool -h node3 decommission
Bootstrap N31
nodetool repair
---
2. Procedure for shrinking a 3-node cluster to 2 nodes: (N1, N2, N3) to (N1, N3)
nodetool -h node2 decommission
Physically retire node2
---
3. Procedure for adding a new node: (N1, N2, N3) to (N1, N2, N3, N4)
Bootstrap N4
nodetool repair
---
4. Procedure to remove a dead/crashed node (N2 unable to start): (N1, N2, N3) to (N1, N3)
Shut down N2 if possible
nodetool removenode xx_hostid_Of_N2_xx
nodetool repair
---
5. Procedure to remove a dead/crashed node and replace it with N21 (N2 unable to start): (N1, N2, N3) to (N1, N3, N21)
Shut down N2 if possible
nodetool removenode xx_hostid_Of_N2_xx
Bootstrap N21
nodetool repair
---
Thanks in advance for pointing out any mistakes or giving advice.
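For what it's worth, a single node swap from procedure 1 (node2 replaced by N21) might be scripted roughly like this. All hostnames are placeholders, and this sketch assumes the new node has auto_bootstrap enabled (the default) and the cluster's seed list configured before it starts:

```shell
# Sketch of one hardware swap (node2 -> N21); hostnames are placeholders.
# Run from an admin host with nodetool on PATH.

# Remove the old node; it streams its data to the remaining replicas.
nodetool -h node2 decommission

# Start Cassandra on the new machine. With auto_bootstrap: true (the
# default) it joins the ring and streams in its token ranges.
ssh n21 'service cassandra start'

# Wait until N21 shows as UN (Up/Normal) before repairing.
nodetool -h n21 status

# Repair to bring replicas back in sync after the move.
nodetool -h n21 repair
```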
Snapshot the data with 3 node and replicationfactor=3
Is there any reason you would want to take a snapshot of a column family on each node when the cluster consists of 3 nodes with the keyspace at replication factor = 3? I am thinking of taking a snapshot of the CF on only one node. For restore, I will follow the steps below:
1. Drop and recreate the CF on node1.
2. Copy the snapshotted files to node1's data directory for the CF.
3. Perform nodetool refresh on node1.
Any suggestions/advice? ng
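The restore steps above might look like the following in shell. The keyspace/CF names, schema file, and backup paths are all placeholders, and this assumes the default data directory layout:

```shell
# Restore a snapshotted column family onto node1.
# All names and paths below are placeholders for illustration.
KS=mykeyspace
CF=mycf
DATA_DIR=/var/lib/cassandra/data/$KS/$CF

# 1. Drop and recreate the CF (schema DDL kept in a separate file).
echo "DROP TABLE $KS.$CF;" | cqlsh node1
cqlsh node1 -f /backups/schema_$CF.cql

# 2. Copy the snapshot files into the CF's data directory.
cp /backups/snapshots/$CF/* "$DATA_DIR/"

# 3. Load the newly placed sstables without restarting the node.
nodetool -h node1 refresh $KS $CF
```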
Re: Snapshot the data with 3 node and replicationfactor=3
I am not worried about eventually consistent data. I just wanted to get rough data, a close approximation. ng

On Wed, Jun 4, 2014 at 2:49 PM, Robert Coli rc...@eventbrite.com wrote:
On Wed, Jun 4, 2014 at 1:26 PM, ng pipeli...@gmail.com wrote: Is there any reason you would want to take a snapshot of a column family on each node when the cluster consists of 3 nodes with the keyspace at replication factor = 3?
Unless all reads/writes occur with CL.ALL (which is an availability problem), there is a nonzero chance of any given write not being on any given node at any given time. =Rob
Cassandra snapshot
I need to make sure that all the data is in sstables before taking the snapshot. I am thinking of:
nodetool cleanup
nodetool repair
nodetool flush
nodetool snapshot
Am I missing anything else? Thanks in advance for the responses/suggestions. ng
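As a shell sequence, the order above might look like this (the keyspace name and snapshot tag are placeholders):

```shell
KS=mykeyspace   # placeholder keyspace name

# Drop data this node no longer owns, then repair inconsistencies.
nodetool cleanup $KS
nodetool repair $KS

# Flush memtables to sstables so the snapshot captures everything.
nodetool flush $KS

# Hard-link the current sstables under a named snapshot tag.
nodetool snapshot -t pre_backup $KS
```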
Backup Solution
I want to discuss the question asked by Rene last year again: http://www.mail-archive.com/user%40cassandra.apache.org/msg28465.html Is the following a good backup solution? Create two data centers:
- A live data center with multiple nodes (commodity hardware): 6 nodes with a replication factor of 3. Clients connect to this cluster with LOCAL_QUORUM.
- A backup data center with 1 node (with fast SSDs). Clients do not connect to this cluster; it is only used for creating and storing snapshots.
Advantages:
- No snapshots or bulk network I/O (transferring snapshots) needed on the live cluster. Also no need to take a snapshot on each node.
- Clients are not slowed down, because writes to the backup data center are async.
- On the backup cluster, snapshots are made on a regular basis. Again, this does not affect the live cluster.
- The backup cluster does not need to process client requests/reads, so we need fewer machines for the backup cluster than for the live cluster.
Are there any disadvantages to this approach? I don't see any issue with it. It is a backup solution, not a replication solution. Both DCs can be in physically the same location/network. Copies of the snapshots can be placed in a separate shared location on a daily basis from the backup DC node. I must be missing something... please advise.
Datacenter understanding question
If I have a configuration of two data centers with one node each, and the replication factor is also 1, will these 2 nodes be mirrored/replicated?
Row_key from sstable2json to actual value of the key
sstable2json tomcat-t5-ic-1-Data.db -e gives me:
0021 001f 0020
How do I convert this (hex) to the actual value of the key, so I can do the below:
select * from tomcat.t5 where c1='concerted value';
Thanks in advance for the help.
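For a plain text key, the hex bytes can be decoded back with standard shell tools. A sketch using a hypothetical hex string (note that for composite keys, sstable2json's output also contains 2-byte length prefixes and separator bytes around each component, which have to be stripped first):

```shell
# Hypothetical hex-encoded key as printed by sstable2json -e.
hex="636f6e6365727465642076616c7565"

# Turn each hex byte pair into a \xNN escape and let printf decode it.
decoded=$(printf '%b' "$(echo "$hex" | sed 's/../\\x&/g')")
echo "$decoded"   # prints: concerted value
```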
Exporting column family data to csv
I want to export all the data of a particular column family to a text file from the Cassandra cluster. I tried:
copy keyspace.mycolumnfamily to '/root/ddd/xx.csv';
It gave me a timeout error. I tried the below in cassandra.yaml:
request_timeout_in_ms: 1000
read_request_timeout_in_ms: 1000
range_request_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 1000
I still have no luck. Any advice on how to achieve this? I am NOT limited to the copy command. What is the best way to achieve this? Thanks in advance for the help. ng
Re: Exporting column family data to csv
Thanks for the reply. Most of the solutions provided on the web involve some kind of 'where' clause in the data extract, exporting the next set until done. I have a column family with no timestamp and no other column I can use to filter the data. One other suggested solution was to use pagination, but I could not find any example anywhere on the web that achieves this. This cannot be that hard! I must be missing something. Please advise.

On Wednesday, April 2, 2014, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote:
http://mail-archives.apache.org/mod_mbox/cassandra-user/201309.mbox/%3C9AF3ADEDDFED4DDEA840B8F5C6286BBA@vig.local%3E
http://stackoverflow.com/questions/18872422/rpc-timeout-error-while-exporting-data-from-cql
Google for more.
Best regards / Pagarbiai
Viktor Jevdokimov
Senior Developer
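For what it's worth, the usual workaround is token-based pagination: issue bounded SELECTs and resume each page from the last partition key seen, so no single query has to scan the whole CF within the RPC timeout. A rough sketch — the keyspace/table/column names, page size, and '<last-k-seen>' marker are all placeholders, with 'k' standing for the partition-key column:

```shell
# Page through mykeyspace.mycolumnfamily by partition key (names are
# placeholders). Each bounded query is cheap, so it stays under the
# RPC timeout; repeat the second query until a page comes back short.
echo "SELECT k, v FROM mykeyspace.mycolumnfamily LIMIT 10000;" \
    | cqlsh node1 >> /root/ddd/xx.csv

# Resume from the last k of the previous page:
echo "SELECT k, v FROM mykeyspace.mycolumnfamily
      WHERE token(k) > token('<last-k-seen>') LIMIT 10000;" \
    | cqlsh node1 >> /root/ddd/xx.csv
```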
Re: Intermittent long application pauses on nodes
: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 37748736 bytes, new threshold 15 (max 15)
- age 1: 17213632 bytes, 17213632 total
- age 2: 19391208 bytes, 36604840 total
, 0.1664300 secs]
[Parallel Time: 163.9 ms, GC Workers: 2]
[GC Worker Start (ms): Min: 222346218.3, Avg: 222346218.3, Max: 222346218.3, Diff: 0.0]
[Ext Root Scanning (ms): Min: 6.0, Avg: 6.9, Max: 7.7, Diff: 1.7, Sum: 13.7]
[Update RS (ms): Min: 20.4, Avg: 21.3, Max: 22.1, Diff: 1.7, Sum: 42.6]
[Processed Buffers: Min: 49, Avg: 60.0, Max: 71, Diff: 22, Sum: 120]
[Scan RS (ms): Min: 23.2, Avg: 23.2, Max: 23.3, Diff: 0.1, Sum: 46.5]
[Object Copy (ms): Min: 112.3, Avg: 112.3, Max: 112.4, Diff: 0.1, Sum: 224.6]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.1, Diff: 0.0, Sum: 0.1]
[GC Worker Other (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[GC Worker Total (ms): Min: 163.8, Avg: 163.8, Max: 163.8, Diff: 0.0, Sum: 327.6]
[GC Worker End (ms): Min: 222346382.1, Avg: 222346382.1, Max: 222346382.1, Diff: 0.0]
[Code Root Fixup: 0.0 ms]
[Clear CT: 0.4 ms]
[Other: 2.1 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 1.1 ms]
[Ref Enq: 0.0 ms]
[Free CSet: 0.4 ms]
[Eden: 524.0M(524.0M)->0.0B(476.0M) Survivors: 44.0M->68.0M Heap: 3518.5M(8192.0M)->3018.5M(8192.0M)]
Heap after GC invocations=4074 (full 1):
garbage-first heap total 8388608K, used 3090914K [0x0005f5c0, 0x0007f5c0, 0x0007f5c0)
region size 4096K, 17 young (69632K), 17 survivors (69632K)
compacting perm gen total 28672K, used 27428K [0x0007f5c0, 0x0007f780, 0x0008)
the space 28672K, 95% used [0x0007f5c0, 0x0007f76c9108, 0x0007f76c9200, 0x0007f780)
No shared spaces configured.
}
[Times: user=0.35 sys=0.00, real=27.58 secs]
222346.219: G1IncCollectionPause [ 111 0 0] [ 0 0 0 0 27586] 0
And the total time for which application threads were stopped is 27.58 seconds. CMS behaves in a similar manner.
We thought it might be GC waiting for mmapped files being read from disk (the thread cannot reach a safepoint during this operation), but that doesn't explain the huge time. We'll try jhiccup to see if it provides any additional information. The test was done on a mixed aws/openstack environment, openjdk 1.7.0_45, cassandra 1.2.11. Upgrading to 2.0.x is not an option for us. regards, ondrej cernos On Fri, Feb 14, 2014 at 8:53 PM, Frank Ng fnt...@gmail.com wrote: Sorry, I have not had a chance to file a JIRA ticket. We have not been able to resolve the issue. But since Joel mentioned that upgrading to Cassandra 2.0.X solved it for them, we may need to upgrade. We are currently on Java 1.7 and Cassandra 1.2.8 On Thu, Feb 13, 2014 at 12:40 PM, Keith Wright kwri...@nanigans.com wrote: You're running 2.0.* in production? May I ask what C* version and OS? Any hardware details would be appreciated as well. Thx! From: Joel Samuelsson samuelsson.j...@gmail.com Reply-To: user@cassandra.apache.org user@cassandra.apache.org Date: Thursday, February 13, 2014 at 11:39 AM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: Intermittent long application pauses on nodes We have had similar issues and upgrading C* to 2.0.x and Java to 1.7 seems to have helped our issues. 2014-02-13 Keith Wright kwri...@nanigans.com: Frank did you ever file a ticket for this issue or find the root cause? I believe we are seeing the same issues when attempting to bootstrap. Thanks From: Robert Coli rc...@eventbrite.com Reply-To: user@cassandra.apache.org user@cassandra.apache.org Date: Monday, February 3, 2014 at 6:10 PM To: user@cassandra.apache.org user@cassandra.apache.org Subject: Re: Intermittent long application pauses on nodes On Mon, Feb 3, 2014 at 8:52 AM, Benedict Elliott Smith belliottsm...@datastax.com wrote: It's possible that this is a JVM issue, but if so there may be some remedial action we can take anyway.
There are some more flags we should add, but we can discuss that once you open a ticket. If you could include the strange JMX error as well, that might be helpful. It would be appreciated if you could inform this thread of the JIRA ticket number, for the benefit of the community and google searchers. :) =Rob
Re: Intermittent long application pauses on nodes
We have swap disabled. Can death by paging still happen? On Thu, Feb 27, 2014 at 11:32 AM, Benedict Elliott Smith belliottsm...@datastax.com wrote: That sounds a lot like death by paging. On 27 February 2014 16:29, Frank Ng fnt...@gmail.com wrote: I just caught that a node was down based on running nodetool status on a different node. I tried to ssh into the downed node at that time and it was very slow logging on. Looking at the gc.log file, there was a ParNew that only took 0.09 secs. Yet the overall application threads stop time is 315 seconds (over 5 minutes). Our cluster is handling a lot of read requests. If there were network hiccups, would that cause a delay in the Cassandra process when it tries to get to a safepoint? I assume Cassandra has threads running with lots of network activity, maybe taking a long time to reach a safepoint. thanks, Frank On Fri, Feb 21, 2014 at 4:24 AM, Joel Samuelsson samuelsson.j...@gmail.com wrote: What happens if a ParNew is triggered while CMS is running? Will it wait for the CMS to finish? If so, that would be the explanation of our long ParNew above. Regards, Joel 2014-02-20 16:29 GMT+01:00 Joel Samuelsson samuelsson.j...@gmail.com: Hi Frank, We got a (quite) long GC pause today on 2.0.5:
INFO [ScheduledTasks:1] 2014-02-20 13:51:14,528 GCInspector.java (line 116) GC for ParNew: 1627 ms for 1 collections, 425562984 used; max is 4253024256
INFO [ScheduledTasks:1] 2014-02-20 13:51:14,542 GCInspector.java (line 116) GC for ConcurrentMarkSweep: 3703 ms for 2 collections, 434394920 used; max is 4253024256
Unfortunately it's a production cluster so I have no additional GC-logging enabled. This may be an indication that upgrading is not the (complete) solution. Regards, Joel 2014-02-17 13:41 GMT+01:00 Benedict Elliott Smith belliottsm...@datastax.com: Hi Ondrej, It's possible you were hit by the problems in this thread before, but it looks potentially like you may have other issues.
Of course it may be that on G1 you have one issue and CMS another, but 27s is extreme even for G1, so it seems unlikely. If you're hitting these pause times in CMS and you get some more output from the safepoint tracing, please do contribute as I would love to get to the bottom of that; however, is it possible you're experiencing paging activity? Have you made certain the VM memory is locked (and preferably that paging is entirely disabled, as the bloom filters and other memory won't be locked, although that shouldn't cause pauses during GC)? Note that mmapped file accesses and other native work shouldn't in any way inhibit GC activity or other safepoint pause times, unless there's a bug in the VM. These threads will simply enter a safepoint as they return to the VM execution context, and are considered safe for the duration they are outside. On 17 February 2014 12:30, Ondřej Černoš cern...@gmail.com wrote: Hi, we tried to switch to G1 because we observed this behaviour on CMS too (a 27-second pause in G1 is quite an argument against using it). Pauses with CMS were not easily traceable - the JVM stopped even without a stop-the-world pause scheduled (defragmentation, remarking). We thought the go-to-safepoint waiting time might have been involved (we saw waiting for safepoint resolution) - especially because access to mmapped files is not preemptive, afaik, but it doesn't explain tens of seconds of waiting; even slow IO should read our sstables into memory in much less time. We switched to G1 out of desperation - and to try different code paths - not that we'd thought it was a great idea. So I think we were hit by the problem discussed in this thread, just the G1 report wasn't very clear, sorry. regards, ondrej On Mon, Feb 17, 2014 at 11:45 AM, Benedict Elliott Smith belliottsm...@datastax.com wrote: Ondrej, It seems like your issue is much less difficult to diagnose: your collection times are long.
At least, the pause you printed the time for is all attributable to the G1 pause. Note that G1 has not generally performed well with Cassandra in our testing. There are a number of changes going in soon that may change that, but for the time being it is advisable to stick with CMS. With tuning you can no doubt bring your pauses down considerably. On 17 February 2014 10:17, Ondřej Černoš cern...@gmail.com wrote: Hi all, we are seeing the same kind of long pauses in Cassandra. We tried to switch CMS to G1 without positive result. The stress test is read heavy, 2 datacenters, 6 nodes, 400 reqs/sec on one datacenter. We see spikes in latency on the 99.99 percentile and higher, caused by threads being stopped in the JVM. The GC in G1 looks like this: {Heap before GC invocations=4073 (full 1): garbage-first heap total 8388608K, used 3602914K [0x0005f5c0, 0x0007f5c0, 0x0007f5c0) region size 4096K, 142 young (581632K), 11 survivors (45056K
Intermittent long application pauses on nodes
All, We've been having intermittent long application pauses (version 1.2.8) and are not sure if it's a cassandra bug. During these pauses, there are dropped messages in the cassandra log file, along with the node seeing other nodes as down. We've turned on gc logging, and the following is an example of a long stopped/pause event in the gc.log file:
2014-01-28T23:11:12.183-0500: 1337654.424: Total time for which application threads were stopped: 0.091450 seconds
2014-01-28T23:14:11.161-0500: 1337833.401: Total time for which application threads were stopped: 51.8190260 seconds
2014-01-28T23:14:19.870-0500: 1337842.111: Total time for which application threads were stopped: 0.005470 seconds
As seen above, there was a 0.091450 sec pause, then a 51.8190260 sec pause. There were no GC log events between those 2 log statements. Since there are no GC logs in between, something else must be causing the long stop time to reach a safepoint. Could there be a Cassandra thread that is taking a long time to reach a safepoint, and what is it trying to do? Along with the node seeing other nodes as down in the cassandra log file, the StatusLogger shows 1599 Pending in ReadStage and 9 Pending in MutationStage. There is mention of cassandra batch revoke bias locks as a possible cause (not GC) via: http://www.mail-archive.com/user@cassandra.apache.org/msg34401.html We have JNA, no swap, and the cluster runs fine aside from these intermittent long pauses that can cause a node to appear down to other nodes. Any ideas as to the cause of the long pause above? It seems not related to GC. thanks.
Re: Intermittent long application pauses on nodes
Thanks for the update. Our logs indicated that there were 0 pending for CompactionManager at that time. Also, there were no nodetool repairs running at that time. The log statements above state that the application had to stop to reach a safepoint. Yet, it doesn't say what is requesting the safepoint.

On Wed, Jan 29, 2014 at 1:20 PM, Shao-Chuan Wang shaochuan.w...@bloomreach.com wrote: We had similar latency spikes when pending compactions can't keep up or repair/streaming takes too many cycles.
Re: Intermittent long application pauses on nodes
Benedict, Thanks for the advice. I've tried turning on PrintSafepointStatistics. However, that info is only sent to STDOUT. The cassandra startup script closes STDOUT when it finishes, so nothing is shown for safepoint statistics once startup is done. Do you know how to start cassandra and send all stdout to a log file, telling cassandra not to close stdout? Also, we have swap turned off as recommended. thanks

On Wed, Jan 29, 2014 at 3:39 PM, Benedict Elliott Smith belliottsm...@datastax.com wrote: Frank, The same advice for investigating holds: add the VM flags -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 (you could put something above 1 there, to reduce the amount of logging, since a pause of 52s will be pretty obvious even if aggregated with lots of other safepoints; the count is the number of safepoints to aggregate into one log message). 52s is a very extreme pause, and I would be surprised if revoke bias could cause this. I wonder if the VM is swapping out.
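One way to capture the safepoint statistics despite the detached stdout is to have the JVM write its own output to a file. A sketch of the relevant HotSpot 7 flags added to cassandra-env.sh (the log path is a placeholder):

```shell
# Added to cassandra-env.sh: trace safepoints and redirect HotSpot's
# own output (which normally goes to stdout) to a file of its own,
# so it survives the startup script closing stdout.
JVM_OPTS="$JVM_OPTS -XX:+PrintSafepointStatistics"
JVM_OPTS="$JVM_OPTS -XX:PrintSafepointStatisticsCount=1"
JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions"
JVM_OPTS="$JVM_OPTS -XX:+LogVMOutput"
JVM_OPTS="$JVM_OPTS -XX:LogFile=/var/log/cassandra/vm.log"
```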
Fat Client Commit Log
Hi All, We are using the Fat Client and noticed that there are files written to the commit log directory on the Fat Client. Does anyone know what these files store? Is this hinted handoff data? The Fat Client has no files in the data directory, as expected. thanks
Re: user Digest of: get.23021
I am having the same issue in 1.0.7 with leveled compaction. It seems that the repair is flaky. It either completes relatively fast in a TEST environment (7 minutes) or gets stuck trying to receive a merkle tree from a peer that is already sending it the merkle tree. The only solution is to restart cassandra, but that's not good.

On Thu, Apr 26, 2012 at 2:12 PM, user-h...@cassandra.apache.org wrote:
user Digest of: get.23021
Topics (messages 23021 through 23021): repair waiting for something, 23021 by: Igor
--
Hi, 10 nodes, cassandra 1.0.3, several DCs. Weekly nodetool repair stuck for an unusually long time for node 10.254.237.2. Output log on this node:
INFO 11:19:42,045 Starting repair command #1, repairing 5 ranges.
INFO 11:19:42,053 [repair #040aae00-28a1-11e1--e378018944ff] new session: will sync localhost/10.254.237.2, /10.254.221.2, /10.253.2.2, /10.254.217.2, /10.254.94.2 on range (85070591730234615865843651857942052864,85070591730234615865843651857942052865] for meter.[eventschema, schema, ids, transaction]
INFO 11:19:42,055 [repair #040aae00-28a1-11e1--e378018944ff] requests for merkle tree sent for eventschema (to [/10.253.2.2, /10.254.221.2, localhost/10.254.237.2, /10.254.217.2, /10.254.94.2])
INFO 11:19:42,063 Enqueuing flush of Memtable-eventschema@1509399856(18748/23435 serialized/live bytes, 4 ops)
INFO 11:19:42,063 Writing Memtable-eventschema@1509399856(18748/23435 serialized/live bytes, 4 ops)
INFO 11:19:42,072 Completed flushing /spool1/cassandra/data/meter/eventschema-hb-40-Data.db (4745 bytes)
INFO 11:19:42,073 Discarding obsolete commit log: CommitLogSegment(/var/lib/cassandra/commitlog/CommitLog-1324019623060.log)
INFO 11:19:42,076 [repair #040aae00-28a1-11e1--e378018944ff] Received merkle tree for eventschema from localhost/10.254.237.2
INFO 11:19:42,102 [repair #040aae00-28a1-11e1--e378018944ff] Received merkle tree for eventschema from /10.254.221.2
INFO 11:19:42,128 [repair #040aae00-28a1-11e1--e378018944ff] Received merkle tree for eventschema from /10.254.217.2
INFO 11:19:42,228 [repair #040aae00-28a1-11e1--e378018944ff] Received merkle tree for eventschema from /10.253.2.2
And nothing after that for a long time.
So the node sent requests for trees to the other nodes and received all of them except the one from 10.254.94.2. On that 10.254.94.2 node:
INFO 11:19:42,083 [repair #040aae00-28a1-11e1--e378018944ff] Sending completed merkle tree to /10.254.237.2 for (meter,eventschema)
So the merkle tree was lost somewhere. Will this waiting break somehow, or do I need to restart the node?
Re: Repair Process Taking too long
I also noticed that if I use the -pr option, the repair process went down from 30 hours to 9 hours. Is the -pr option safe to use if I want to run repair processes in parallel on nodes that are not replication peers? thanks On Thu, Apr 12, 2012 at 12:06 AM, Frank Ng berryt...@gmail.com wrote: Thank you for confirming that the per node data size is most likely causing the long repair process. I have tried a repair on smaller column families and it was significantly faster. On Wed, Apr 11, 2012 at 9:55 PM, aaron morton aa...@thelastpickle.com wrote: If you have 1TB of data it will take a long time to repair. Every bit of data has to be read and a hash generated. This is one of the reasons we often suggest that around 300 to 400Gb per node is a good load in the general case. Look at nodetool compactionstats. Is there a validation compaction running? If so it is still building the merkle hash tree. Look at nodetool netstats. Is it streaming data? If so all hash trees have been calculated. Cheers - Aaron Morton Freelance Developer @aaronmorton http://www.thelastpickle.com On 12/04/2012, at 2:16 AM, Frank Ng wrote: Can you expand further on your issue? Were you using Random Partitioner? thanks On Tue, Apr 10, 2012 at 5:35 PM, David Leimbach leim...@gmail.com wrote: I had this happen when I had really poorly generated tokens for the ring. Cassandra seems to accept numbers that are too big. You get hot spots when you think you should be balanced and repair never ends (I think there is a 48 hour timeout). On Tuesday, April 10, 2012, Frank Ng wrote: I am not using size-tiered compaction. On Tue, Apr 10, 2012 at 12:56 PM, Jonathan Rhone rh...@tinyco.com wrote: Data size, number of nodes, RF? Are you using size-tiered compaction on any of the column families that hold a lot of your data? Do your cassandra logs say you are streaming a lot of ranges?
zgrep -E (Performing streaming repair|out of sync) On Tue, Apr 10, 2012 at 9:45 AM, Igor i...@4friends.od.ua wrote: On 04/10/2012 07:16 PM, Frank Ng wrote: Short answer - yes. But you are asking wrong question. I think both processes are taking a while. When it starts up, netstats and compactionstats show nothing. Anyone out there successfully using ext3 and their repair processes are faster than this? On Tue, Apr 10, 2012 at 10:42 AM, Igor i...@4friends.od.ua wrote: Hi You can check with nodetool which part of repair process is slow - network streams or verify compactions. use nodetool netstats or compactionstats. On 04/10/2012 05:16 PM, Frank Ng wrote: Hello, I am on Cassandra 1.0.7. My repair processes are taking over 30 hours to complete. Is it normal for the repair process to take this long? I wonder if it's because I am using the ext3 file system. thanks -- Jonathan Rhone Software Engineer *TinyCo* 800 Market St., Fl 6 San Francisco, CA 94102 www.tinyco.com
Re: Repair Process Taking too long
Thanks for the clarification. I'm running repairs as in case 2 (to avoid deleted data coming back).

On Thu, Apr 12, 2012 at 10:59 AM, Sylvain Lebresne sylv...@datastax.com wrote:

On Thu, Apr 12, 2012 at 4:06 PM, Frank Ng buzzt...@gmail.com wrote:

I also noticed that if I use the -pr option, the repair process went down from 30 hours to 9 hours. Is the -pr option safe to use if I want to run repair processes in parallel on nodes that are not replication peers?

There are pretty much two use cases for repair:
1) to rebuild a node: if, say, a node has lost some data due to a hard drive corruption or the like and you want to rebuild what's missing
2) the periodic repairs to avoid problems with deleted data coming back from the dead (basically: http://wiki.apache.org/cassandra/Operations#Frequency_of_nodetool_repair)

In case 1) you want to run 'nodetool repair' (without -pr) against the node to rebuild. In case 2) (which I suspect is the case you're talking about now), you *want* to use 'nodetool repair -pr' on *every* node of the cluster; that's the most efficient way to do it. The only reason not to use -pr in this case would be that it's not available because you're using an old version of Cassandra. And yes, it is safe to run with -pr in parallel on nodes that are not replication peers.

--
Sylvain
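The periodic (case 2) repair described above amounts to a loop over every node in the cluster. A minimal sketch, with placeholder hostnames (the node names are assumptions, not from the thread); with DRY_RUN=1 it only prints the commands it would run:

```shell
#!/bin/sh
# Sketch of case 2: run `nodetool repair -pr` against every node so each node
# repairs only its primary ranges and the whole ring is covered exactly once.
# NODES holds placeholder hostnames; DRY_RUN=1 (the default here) makes the
# script print the commands instead of executing them.
NODES="node1 node2 node3"
DRY_RUN=${DRY_RUN:-1}

for host in $NODES; do
    cmd="nodetool -h $host repair -pr"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$cmd"
    else
        $cmd || { echo "repair failed on $host" >&2; exit 1; }
    fi
done
```

Per Sylvain's note, nodes that are not replication peers could be repaired in parallel instead of serially, but a serial loop is the conservative default.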
Re: Repair Process Taking too long
Can you expand further on your issue? Were you using the Random Partitioner?

thanks

On Tue, Apr 10, 2012 at 5:35 PM, David Leimbach leim...@gmail.com wrote:

I had this happen when I had really poorly generated tokens for the ring. Cassandra seems to accept numbers that are too big. You get hot spots when you think you should be balanced and repair never ends (I think there is a 48 hour timeout).
Re: Repair Process Taking too long
Thank you for confirming that the per node data size is most likely causing the long repair process. I have tried a repair on smaller column families and it was significantly faster.

On Wed, Apr 11, 2012 at 9:55 PM, aaron morton aa...@thelastpickle.com wrote:

If you have 1TB of data it will take a long time to repair. Every bit of data has to be read and a hash generated. This is one of the reasons we often suggest that around 300 to 400GB per node is a good load in the general case.

Look at nodetool compactionstats. Is there a validation compaction running? If so it is still building the merkle hash tree.

Look at nodetool netstats. Is it streaming data? If so all hash trees have been calculated.

Cheers
- Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com
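Aaron's two checks can be folded into one small helper. This is only a sketch: the markers it greps for ("Validation" in compactionstats output while a merkle tree is being built, "Streaming" in netstats output while data moves) are assumptions about the wording nodetool prints and may differ between Cassandra versions.

```shell
#!/bin/sh
# Rough helper for the two diagnostic checks above: given captured output of
# `nodetool compactionstats` and `nodetool netstats`, guess which phase a
# long-running repair is in. The "Validation" and "Streaming" markers are
# assumptions about nodetool's output wording.
repair_phase() {
    compactionstats_out=$1
    netstats_out=$2
    if printf '%s\n' "$compactionstats_out" | grep -qi 'Validation'; then
        echo "building merkle hash trees (validation compaction running)"
    elif printf '%s\n' "$netstats_out" | grep -qi 'Streaming'; then
        echo "hash trees done; streaming data"
    else
        echo "no repair activity visible"
    fi
}
```

On a live node this would be invoked as `repair_phase "$(nodetool compactionstats)" "$(nodetool netstats)"`.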
Repair Process Taking too long
Hello, I am on Cassandra 1.0.7. My repair processes are taking over 30 hours to complete. Is it normal for the repair process to take this long? I wonder if it's because I am using the ext3 file system. thanks
Re: Repair Process Taking too long
I think both processes are taking a while. When it starts up, netstats and compactionstats show nothing. Is anyone out there successfully using ext3 with repair processes faster than this?

On Tue, Apr 10, 2012 at 10:42 AM, Igor i...@4friends.od.ua wrote:

Hi. You can check with nodetool which part of the repair process is slow - network streams or validation compactions. Use nodetool netstats or compactionstats.
Re: Repair Process Taking too long
I have 12 nodes with approximately 1TB load per node. The RF is 3. I am considering moving to ext4. I checked the ranges and the numbers go from 1 to the 9000s.

On Tue, Apr 10, 2012 at 12:56 PM, Jonathan Rhone rh...@tinyco.com wrote:

Data size, number of nodes, RF? Are you using size-tiered compaction on any of the column families that hold a lot of your data? Do your cassandra logs say you are streaming a lot of ranges?

zgrep -E 'Performing streaming repair|out of sync'

--
Jonathan Rhone
Software Engineer
TinyCo
800 Market St., Fl 6
San Francisco, CA 94102
www.tinyco.com
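As quoted in the thread, the zgrep lacks quoting around the alternation and a file argument, so it won't run as-is. A self-contained runnable form is below; the sample log lines and the note about the real log path (typically something like /var/log/cassandra/system.log*) are assumptions for illustration.

```shell
#!/bin/sh
# Runnable form of the zgrep suggestion: quote the alternation and point it at
# the logs. A stand-in log file is created here so the example is self-contained;
# on a real node you would grep (or zgrep, for rotated gzipped logs) the actual
# system logs instead.
LOG=./system.log.sample
cat > "$LOG" <<'EOF'
 INFO [AntiEntropyStage:1] Performing streaming repair of 12 ranges
 INFO [AntiEntropyStage:1] Endpoints /10.0.0.1 and /10.0.0.2 are out of sync
 INFO [CompactionExecutor:3] Compacted to 1 sstable
EOF

# Count how many repair-related lines the log contains.
grep -cE 'Performing streaming repair|out of sync' "$LOG"   # prints 2
```

A high and growing count suggests repair is streaming many out-of-sync ranges, which matches the long run times discussed above.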
Re: Repair Process Taking too long
I am not using size-tiered compaction.

On Tue, Apr 10, 2012 at 12:56 PM, Jonathan Rhone rh...@tinyco.com wrote:

Data size, number of nodes, RF? Are you using size-tiered compaction on any of the column families that hold a lot of your data? Do your cassandra logs say you are streaming a lot of ranges?