Long running nodetool repair
Hi, I am new to Cassandra and I am not sure if this is normal behavior, but nodetool repair runs for a very long time even with a small dataset per node. As I am writing this, I started a nodetool repair last night at 18:41; it is now 9:18 and it is still running, although the size of my data is only ~500MB per node. We have a 3-node cluster in DC1 with RF 3, a 1-node cluster in DC2 with RF 1, and a 1-node cluster in DC3 with RF 1, running Cassandra v1.2.1 with 256 vnodes. In the Cassandra logs I no longer see AntiEntropy entries, only CompactionTask and FlushWriter. Is this normal behaviour for nodetool repair? Does the running time grow linearly with the size of the data? Any help or direction will be much appreciated. Thanks, H
Testing compaction strategies on a single production server?
Hey, Version 1.1 of Cassandra introduced live traffic sampling, which allows you to measure the performance of a node without it really joining the cluster: http://www.datastax.com/dev/blog/whats-new-in-cassandra-1-1-live-traffic-sampling That page mentions that you can change the compaction strategy through jmx if you want to test out a different strategy on your survey node. That's great, but it doesn't give you a complete view of how your performance would change, since you're not doing reads from the survey node. But what would happen if you used jmx to change the compaction strategy of a column family on a single *production* node? Would that be a safe way to test it out or are there side-effects of doing that live? And if you do that, would running a major compaction transform the entire column family to the new format? Finally, if the test was a success, how do you proceed from there? Just change the schema? /Henrik
RE: Testing compaction strategies on a single production server?
Just turn off dynamic snitch on the survey node and make read requests to it directly with CL.ONE, watch the histograms, and compare. Regarding switching compaction strategy, there's already a lot of info out there. Best regards / Pagarbiai, Viktor Jevdokimov, Senior Developer, Adform From: Henrik Schröder [mailto:skro...@gmail.com] Sent: Tuesday, February 19, 2013 15:57 To: user Subject: Testing compaction strategies on a single production server?
Re: cassandra vs. mongodb quick question(good additional info)
The 40 TB use case you heard about is probably one 40TB mysql machine that someone migrated to mongo so it would be web scale. Cassandra is NOT good with drives that big; get a blade center or a high density chassis. On Mon, Feb 18, 2013 at 8:00 PM, Hiller, Dean dean.hil...@nrel.gov wrote: I thought about this more, and even with a 10Gbit network, it would take 40 days to bring up a replacement node if mongodb did truly have 42T / node like I had heard. I wrote the below email to the person I heard this from, going back to basics, which really puts some perspective on it… (and a lot of people don't even have a 10Gbit network like we do). Nodes are hooked up by a 10G network at most right now, where that is 10 gigabit. We are talking about 10 Terabytes on disk per node recently. Googling 10 gigabit in gigabytes gives me 1.25 gigabytes/second (yes, I could have divided by 8 in my head, but eh… of course when I saw the number, I went duh). So trying to transfer 10 Terabytes, or 10,000 Gigabytes, to a node that we are bringing online to replace a dead node would take approximately 5 days??? This assumes no one else is using the bandwidth too ;). 10,000 Gigabytes * 1 second/1.25 * 1 hr/60 secs * 1 day/24 hrs = 5.55 days. This is more likely 11 days if we only use 50% of the network. So bringing a new node up to speed is more like 11 days once it has crashed. I think this is the main reason the 1 Terabyte limit exists to begin with, right? From an ops perspective, this could sound like a nightmare scenario of waiting 10 days… maybe it is livable though. Either way, I thought it would be good to share the numbers. ALSO, that is assuming the bus with its 10 disks can keep up with 10G. Can it? What is the limit of throughput per second on a bus on the computers we have, as on wikipedia there is a huge variance? What is the rate of the disks too (multiplied by 10 of course)? Will they keep up with a 10G rate for bringing a new node online?
This all comes into play even more so when you want to double the size of your cluster, of course, as all nodes have to transfer half of what they have to the new nodes that come online (Cassandra actually has a very data center/rack aware topology to transfer data correctly and not use up all bandwidth unnecessarily… I am not sure mongodb has that). Anyways, just food for thought. From: aaron morton aa...@thelastpickle.com Reply-To: user@cassandra.apache.org Date: Monday, February 18, 2013 1:39 PM To: user@cassandra.apache.org, Vegard Berget p...@fantasista.no Subject: Re: cassandra vs. mongodb quick question My experience is that repair of 300GB of compressed data takes longer than 300GB of uncompressed, but I cannot point to an exact number. Calculating the differences is mostly CPU bound and works on the non-compressed data. Streaming uses compression (after uncompressing the on-disk data). So if you have 300GB of compressed data, take a look at how long repair takes and see if you are comfortable with that. You may also want to test replacing a node so you can get the procedure documented and understand how long it takes. The idea of the soft 300GB to 500GB limit came about because of a number of cases where people had 1 TB on a single node and they were surprised it took days to repair or replace. If you know how long things may take, and that fits in your operations, then go with it. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 18/02/2013, at 10:08 PM, Vegard Berget p...@fantasista.no wrote: Just out of curiosity: when using compression, does this affect this one way or another? Is 300G the (compressed) SSTable size, or the total size of data?
.vegard, - Original Message - From: user@cassandra.apache.org To: user@cassandra.apache.org Cc: Sent: Mon, 18 Feb 2013 08:41:25 +1300 Subject: Re: cassandra vs. mongodb quick question If you have spinning disk and 1G networking and no virtual nodes, I would still say 300G to 500G is a soft limit. If you are using virtual nodes, SSD, JBOD disk configuration or faster networking you may go higher. The limiting factors are the time it takes to repair, the time it takes to replace a node, and the memory considerations for 100's of millions of rows. If the performance of those operations is acceptable to you, then go crazy. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 16/02/2013, at 9:05 AM,
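As a sanity check on the back-of-envelope transfer figures in this thread: the conversion above uses 60 seconds per hour where it should use 3,600, so the idealized line-rate figure comes out in hours, not days (real-world repair/bootstrap streaming runs far below line rate, which is where the days-long times people observe come from). A small illustrative calculation:

```python
# Back-of-envelope: time to stream a full node's data over a given link,
# assuming the link itself is the only bottleneck (it rarely is in practice).

def transfer_time_hours(data_gb: float, link_gbit_per_s: float,
                        utilization: float = 1.0) -> float:
    """Hours to move `data_gb` gigabytes over a `link_gbit_per_s` gigabit link."""
    gb_per_s = link_gbit_per_s / 8.0           # 10 Gbit/s -> 1.25 GB/s
    seconds = data_gb / (gb_per_s * utilization)
    return seconds / 3600.0                    # note: 3,600 seconds per hour

# 10 TB over a fully dedicated 10 GbE link:
print(round(transfer_time_hours(10_000, 10), 2))       # ~2.22 hours
# Same transfer using only 50% of the link:
print(round(transfer_time_hours(10_000, 10, 0.5), 2))  # ~4.44 hours
```

The gap between these idealized hours and the observed days is disk and CPU bound streaming, not the network.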
Re: JNA not found.
Hey Guys, I just wanted to follow up on this thread on how I got JNA to work with the cassandra 1.2.1 tarball I downloaded. On CentOS I went: [root@cassandra-node01 ~]# yum provides */jna.jar ... jna-3.4.0-4.el5.x86_64 : Pure Java access to native libraries Repo: epel Matched from: Filename: /usr/share/java/jna.jar [root@cassandra-node01 ~]# yum install jna-3.4.0-4.el5.x86_64 [root@cassandra-node01 ~]# ln -s /usr/share/java/jna.jar /usr/local/apache-cassandra-1.2.1/lib/jna.jar Now when I start Cassandra I see this message: INFO 10:11:11,852 JNA mlockall successful That's a win! Can't for the life of me figure out why Cassandra 1.2 was refusing to recognize the downloaded jna.jar file in its lib directory, but the above trick seems to work every time. Thanks for all your input. Tim On Tue, Jan 29, 2013 at 10:18 PM, Tim Dunphy bluethu...@gmail.com wrote: Hi Chandra, I'm using Cassandra 1.2.1 and jna/platform 3.5.1. One thing I should mention is that I tried putting the jar files into my java jre/lib directory, the theory being those jars would be available to all java apps. In that case Cassandra will start but still not recognize JNA. If I copy the jars to the cassandra/lib directory, I have the same crashing issue. Even if I symlink from the jre/lib directory to the cassandra/lib directory, the same issue occurs. It's like this version of Cassandra can't stand having the jna jar in its lib directory. I'm beginning to wonder if anyone has gotten JNA to work with this version of cassandra, and if so, how. I've only tried a tarball install so far; I can't say about the package install, which may well work. Thanks Tim On Tue, Jan 29, 2013 at 10:07 PM, chandra Varahala hadoopandcassan...@gmail.com wrote: We had this issue before, but after adding those two jars the error was gone. We used Cassandra 1.0.8 (JNA 3.3.0, JNA platform 3.3.0). What version of Cassandra are you using?
-chandra On Tue, Jan 29, 2013 at 12:19 PM, Tim Dunphy bluethu...@gmail.comwrote: Hi Chandra, Thanks for your reply. Well I have added both jna.jar and platform.jar to my lib directory (jna 3.3.0): [root@cassandra-node01 cassandrahome]# ls -l lib/jna.jar lib/platform.jar -rw-r--r-- 1 root root 865400 Jan 29 12:14 lib/jna.jar -rw-r--r-- 1 root root 841291 Jan 29 12:14 lib/platform.jar But sadly I get the same result: [root@cassandra-node01 cassandrahome]# ./bin/cassandra -f xss = -ea -javaagent:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms295M -Xmx295M -Xmn73M -XX:+HeapDumpOnOutOfMemoryError -Xss180k INFO 12:14:52,493 Logging initialized INFO 12:14:52,507 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_34 INFO 12:14:52,507 Heap size: 301727744/302776320 INFO 12:14:52,508 Classpath: /etc/alternatives/cassandrahome/conf:/etc/alternatives/cassandrahome/build/classes/main:/etc/alternatives/cassandrahome/build/classes/thrift:/etc/alternatives/cassandrahome/lib/antlr-3.2.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-clientutil-1.2.1.jar:/etc/alternatives/cassandrahome/lib/apache-cassandra-thrift-1.2.1.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-fixes.jar:/etc/alternatives/cassandrahome/lib/avro-1.4.0-sources-fixes.jar:/etc/alternatives/cassandrahome/lib/commons-cli-1.1.jar:/etc/alternatives/cassandrahome/lib/commons-codec-1.2.jar:/etc/alternatives/cassandrahome/lib/commons-lang-2.6.jar:/etc/alternatives/cassandrahome/lib/compress-lzf-0.8.4.jar:/etc/alternatives/cassandrahome/lib/concurrentlinkedhashmap-lru-1.3.jar:/etc/alternatives/cassandrahome/lib/guava-13.0.1.jar:/etc/alternatives/cassandrahome/lib/high-scale-lib-1.1.2.jar:/etc/alternatives/cassandrahome/lib/jackson-core-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jackson-mapper-asl-1.9.2.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar:/etc/alternatives/c
assandrahome/lib/jline-1.0.jar:/etc/alternatives/cassandrahome/lib/jna.jar:/etc/alternatives/cassandrahome/lib/json-simple-1.1.jar:/etc/alternatives/cassandrahome/lib/libthrift-0.7.0.jar:/etc/alternatives/cassandrahome/lib/log4j-1.2.16.jar:/etc/alternatives/cassandrahome/lib/metrics-core-2.0.3.jar:/etc/alternatives/cassandrahome/lib/netty-3.5.9.Final.jar:/etc/alternatives/cassandrahome/lib/platform.jar:/etc/alternatives/cassandrahome/lib/servlet-api-2.5-20081211.jar:/etc/alternatives/cassandrahome/lib/slf4j-api-1.7.2.jar:/etc/alternatives/cassandrahome/lib/slf4j-log4j12-1.7.2.jar:/etc/alternatives/cassandrahome/lib/snakeyaml-1.6.jar:/etc/alternatives/cassandrahome/lib/snappy-java-1.0.4.1.jar:/etc/alternatives/cassandrahome/lib/snaptree-0.1.jar:/etc/alternatives/cassandrahome/lib/jamm-0.2.5.jar Killed And still when I remove those library files cassandra starts without a problem exception the fact that it is not able to use JNA. I'd appreciate any input
Re: JNA not found.
Based on your mount/SELinux settings, sometimes the OS is unwilling to tolerate .so files outside certain directories. Edward On Tue, Feb 19, 2013 at 10:13 AM, Tim Dunphy bluethu...@gmail.com wrote:
Re: Long running nodetool repair
This is very normal (unfortunately). Are you doing a repair -pr or a straight-up repair? Does nodetool netstats show anything? I frequently see repair hang in 1.2.1, and I haven't been able to figure out why yet. Feel free to take a stack dump with jstack on the node doing the repair and see if there are any deadlocks potentially occurring after the Merkle trees are received. And to help more, do you have the last logs after AntiEntropy? Any streaming sessions from other nodes? The bug is being tracked here: https://issues.apache.org/jira/browse/CASSANDRA-5146 Best, Michael From: Haithem Jarraya haithem.jarr...@struq.com Date: Tuesday, February 19, 2013 1:29 AM Subject: Long running nodetool repair
Re: Testing compaction strategies on a single production server?
Well, that answer didn't really help. I know how to make a survey node, and I know how to simulate reads to it; it's just that that's a lot of work, and I wouldn't be sure that the simulated load is the same as the production load. We gather a lot of metrics from our production servers, so we know exactly how they perform over long periods of time. Changing a single server to run a different compaction strategy would allow us to know in detail how a different strategy would impact the cluster. So, is it possible to modify org.apache.cassandra.db.[keyspace].[column family].CompactionStrategyClass through jmx on a production server without any ill effects? Or is this only possible to do on a survey node while it is in a specific state? /Henrik On Tue, Feb 19, 2013 at 3:09 PM, Viktor Jevdokimov viktor.jevdoki...@adform.com wrote: Just turn off dynamic snitch on survey node and make read requests from it directly with CL.ONE, watch histograms, compare. Regarding switching compaction strategy there's a lot of info already.
Re: cassandra vs. mongodb quick question(good additional info)
From my limited experience with Mongo, it seems that Mongo only performs when the whole data set is in memory, which makes me wonder how the 40TB case works. - Original Message - From: Edward Capriolo edlinuxg...@gmail.com To: user@cassandra.apache.org Sent: Tuesday, February 19, 2013 7:02:56 AM Subject: Re: cassandra vs. mongodb quick question(good additional info)
Re: Long running nodetool repair
It should not take that long. For my 200G node, it takes about an hour to calculate the Merkle tree, and then data streaming begins. By the way, how do you know the repair is not done? If you run nodetool tpstats, it should give you the AntiEntropy session info: active/pending/completed etc. While calculating the Merkle tree, you can see the progress from nodetool compactionstats. While streaming data, you can see the progress from nodetool netstats. You can also grep the log for Merkle and repair. - Original Message - From: Haithem Jarraya haithem.jarr...@struq.com To: user@cassandra.apache.org Sent: Tuesday, February 19, 2013 1:29:19 AM Subject: Long running nodetool repair
unsubscribe
Unsubscribe me please. Thanks A
Re: unsubscribe
Read the message you answered to, and help yourself ! Alain 2013/2/19 Anurag Gujral anurag.guj...@gmail.com Unsubscribe me please. Thanks A
RE: Question on Cassandra Snapshot
Thank you Aaron. From: aa...@thelastpickle.com Subject: Re: Question on Cassandra Snapshot Date: Mon, 18 Feb 2013 06:37:34 +1300 To: user@cassandra.apache.org > With incremental_backup turned OFF in cassandra.yaml - are all SSTables under /data/TestKeySpace/ColumnFamily at all times? No. They are deleted when they are compacted and no internal operations are referencing them. > With incremental_backup turned ON in cassandra.yaml - are current SSTables under /data/TestKeySpace/ColumnFamily/ with a hardlink to /data/TestKeySpace/ColumnFamily/backups? Yes, sort of. *All* SSTables ever created are in the backups directory, not just the ones currently live. > Let's say I have taken a snapshot and moved /data/TestKeySpace/ColumnFamily/snapshots/snapshot-name/*.db to tape; at what point should I be backing up *.db files from the /data/TestKeySpace/ColumnFamily/backups directory? Also, should I be deleting the *.db files whose inode matches the files in the snapshot? Is that a correct approach? Backup all files in the snapshots. There may be files with non-.db extensions if you use levelled compaction. When you are finished with the snapshot, delete it. If the inode is no longer referenced from the live data dir it will be deleted. > I noticed /data/TestKeySpace/ColumnFamily/snapshots/timestamp-ColumnFamily/ - what are these timestamp directories? Probably automatic snapshots from dropping KS or CFs. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 16/02/2013, at 4:41 AM, S C as...@outlook.com wrote: I appreciate any advice or pointers on this. Thanks in advance. From: as...@outlook.com To: user@cassandra.apache.org Subject: Question on Cassandra Snapshot Date: Thu, 14 Feb 2013 20:47:14 -0600 I have been looking at incremental backups and snapshots. I have done some experimentation but could not come to a conclusion. Can somebody please help me understand it right?
/data is my data partition. With incremental_backup turned OFF in cassandra.yaml - are all SSTables under /data/TestKeySpace/ColumnFamily at all times? With incremental_backup turned ON in cassandra.yaml - are current SSTables under /data/TestKeySpace/ColumnFamily/ with a hardlink to /data/TestKeySpace/ColumnFamily/backups? Let's say I have taken a snapshot and moved /data/TestKeySpace/ColumnFamily/snapshots/snapshot-name/*.db to tape; at what point should I be backing up *.db files from the /data/TestKeySpace/ColumnFamily/backups directory? Also, should I be deleting the *.db files whose inode matches the files in the snapshot? Is that a correct approach? I noticed /data/TestKeySpace/ColumnFamily/snapshots/timestamp-ColumnFamily/ - what are these timestamp directories? Thanks in advance. SC
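Since snapshots and incremental backups are hardlinks into the same SSTable files, two directory entries for the same on-disk file share an inode. A small sketch of the inode check the question describes (the function and any paths you pass it are illustrative; this is not a Cassandra tool):

```python
import os

def shared_inode_files(backup_dir: str, snapshot_dir: str) -> set:
    """Names of files in backup_dir that are hardlinks to (i.e. share an
    inode with) some file in snapshot_dir on the same filesystem."""
    snap_inodes = {os.stat(os.path.join(snapshot_dir, f)).st_ino
                   for f in os.listdir(snapshot_dir)}
    return {f for f in os.listdir(backup_dir)
            if os.stat(os.path.join(backup_dir, f)).st_ino in snap_inodes}
```

Files reported here are already captured by the snapshot you shipped to tape; per Aaron's answer, deleting the snapshot lets Cassandra reclaim any inode no longer referenced from the live data dir.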
Cassandra network latency tuning
I have a 5 node cluster currently running ver 1.2. Prior to full scale deployment, I'm running some benchmarks using YCSB. From a hadoop cluster deployment we saw an excellent improvement using higher speed networks. However, Cassandra's metrics do not include network latencies, and I would like to understand how we can capture the network latency difference between, for example, 1GbE and 10GbE. As of now all the graphs look the same. We will soon be adding SSDs, and I was wondering how Cassandra can utilize the 10GbE and the SSDs and whether specific tuning is required.
How to limit query results like from row 50 to 100
With CQL or an API.
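As context for the question (an illustrative aside, not an answer from the thread): CQL at this point supports LIMIT but no OFFSET, so "rows 50 to 100" is usually handled client-side, either by fetching the first 100 rows (e.g. with LIMIT 100) and skipping the first 50, or, for deep pages, by filtering on the last key seen on the previous page. A toy sketch of the skip approach:

```python
def page(rows, start, stop):
    """Client-side paging: given up to `stop` rows already fetched (e.g. via
    a CQL LIMIT), keep only [start, stop). Fine for small offsets; for deep
    paging, restrict the query by the last key of the previous page instead."""
    return rows[start:stop]

rows = [f"row{i}" for i in range(100)]   # stand-in for a LIMIT 100 result set
print(page(rows, 50, 100)[0], page(rows, 50, 100)[-1])   # row50 row99
```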
Re: Mutation dropped
Does the rpc_timeout not control the client timeout? No, it is how long a node will wait for a response from other nodes before raising a TimedOutException if fewer than CL nodes have responded. Set the client-side socket timeout using your preferred client. Is there any param which is configurable to control the replication timeout between nodes? There is no such thing. rpc_timeout is roughly like that, but it's not right to think about it that way; i.e. if a message to a replica times out but CL nodes have already responded, then we are happy to call the request complete. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 19/02/2013, at 1:48 AM, Kanwar Sangha kan...@mavenir.com wrote: Thanks Aaron. Does the rpc_timeout not control the client timeout? Is there any param which is configurable to control the replication timeout between nodes? Or is the same param used to control that, since the other node is also like a client? From: aaron morton [mailto:aa...@thelastpickle.com] Sent: 17 February 2013 11:26 To: user@cassandra.apache.org Subject: Re: Mutation dropped You are hitting the maximum throughput on the cluster. The messages are dropped because the node fails to start processing them before rpc_timeout. However the request is still a success because the client-requested CL was achieved. Testing with RF 2 and CL 1 really just tests the disks on one local machine. Both nodes replicate each row, and writes are sent to each replica, so the only thing the client is waiting on is the local node to write to its commit log. Testing with (and running in prod) RF 3 and CL QUORUM is a more real-world scenario. Cheers - Aaron Morton Freelance Cassandra Developer New Zealand @aaronmorton http://www.thelastpickle.com On 15/02/2013, at 9:42 AM, Kanwar Sangha kan...@mavenir.com wrote: Hi – Is there a parameter which can be tuned to prevent the mutations from being dropped? Is this logic correct?
Node A and B with RF=2, CL=1, load balanced between the two. -- Address Load Tokens Owns (effective) Host ID Rack UN 10.x.x.x 746.78 GB 256 100.0% dbc9e539-f735-4b0b-8067-b97a85522a1a rack1 UN 10.x.x.x 880.77 GB 256 100.0% 95d59054-be99-455f-90d1-f43981d3d778 rack1 Once we hit a very high TPS (around 50k/sec of inserts), the nodes start falling behind and we see the mutation dropped messages, but there are no failures on the client. Does that mean the other node is not able to persist the replicated data? Is there some timeout associated with replicated data persistence? Thanks, Kanwar From: Kanwar Sangha [mailto:kan...@mavenir.com] Sent: 14 February 2013 09:08 To: user@cassandra.apache.org Subject: Mutation dropped Hi – I am doing a load test using YCSB across 2 nodes in a cluster and am seeing a lot of mutation dropped messages. I understand that this is due to the replica not being written to the other node? RF = 2, CL = 1. From the wiki: For MUTATION messages this means that the mutation was not applied to all replicas it was sent to. The inconsistency will be repaired by Read Repair or Anti Entropy Repair. Thanks, Kanwar
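The success semantics Aaron describes (the coordinator calls a write successful once CL replicas acknowledge, even if another replica later drops the mutation) can be sketched as a toy model; the function and parameter names here are illustrative, not Cassandra internals:

```python
def write_succeeds(acks: int, consistency_level: int) -> bool:
    """A coordinator calls the write successful once at least CL replicas
    have acknowledged before rpc_timeout; drops at the remaining replicas
    don't fail the request (they're healed by read repair / anti-entropy)."""
    return acks >= consistency_level

# RF=2, CL=1: one replica acks in time, the other drops the mutation.
print(write_succeeds(acks=1, consistency_level=1))   # True: client sees success
# CL=QUORUM of RF=3 (i.e. 2 acks needed) but only 1 ack before rpc_timeout:
print(write_succeeds(acks=1, consistency_level=2))   # False: TimedOutException
```

This is why the YCSB client sees no failures even while mutation-dropped counts climb: the dropped replica writes are on nodes beyond the CL=1 requirement.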