Re: Disabling auto snapshots
To disable auto snapshots, set the property auto_snapshot: false in your cassandra.yaml file.

Mark

On 21 May 2015 at 08:30, Ali Akhtar ali.rac...@gmail.com wrote:
> Is there a config setting where automatic snapshots can be disabled? I have a use case where a table is truncated quite often, and would like to not have snapshots. I can't find anything on Google. Thanks.
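For reference, the change Mark describes is a one-line edit; a minimal sketch of the relevant cassandra.yaml fragment (the shipped default is true, and note that disabling it also removes the safety snapshot normally taken on TRUNCATE and DROP):

```yaml
# cassandra.yaml - disable the automatic snapshot taken before a
# keyspace or table is truncated or dropped.
# Default is true; with false, truncated data cannot be recovered
# from an auto snapshot.
auto_snapshot: false
```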
Re: Disabling auto snapshots
Thanks!

On Thu, May 21, 2015 at 12:34 PM, Mark Reddy mark.l.re...@gmail.com wrote:
> To disable auto snapshots, set the property auto_snapshot: false in your cassandra.yaml file.
Re: amd64 binaries gone from Debian apt repository (20x series)?
They seem to be in place now. Thanks!

On Wed, May 20, 2015 at 5:46 PM, Michael Shuler mich...@pbandjelly.org wrote:
> On 05/20/2015 03:19 AM, Andrej Pregl wrote:
>> So I was trying to install Cassandra via the Debian repository at http://www.apache.org/dist/cassandra/debian and for some reason the 20x series doesn't have the amd64 binaries. They have previously been available there, so I'm wondering why they are gone?
>
> This should be corrected now - I asked the devs to resync apache-mirror. Thanks!
>
> $ apt-cache policy cassandra
> cassandra:
>   Installed: (none)
>   Candidate: 2.0.15
>   Version table:
>      2.0.15 0
>         500 http://www.apache.org/dist/cassandra/debian/ 20x/main amd64 Packages
Disabling auto snapshots
Is there a config setting where automatic snapshots can be disabled? I have a use case where a table is truncated quite often, and I would like not to have snapshots. I can't find anything on Google. Thanks.
Re: Multiple cassandra instances per physical node
Hi,

I also advise against multiple instances on the same hardware. If you have really big boxes, why not virtualize? Another option is to experiment with CCM, although CCM has some limitations (e.g. JNA is disabled). If you follow up on this I would love to hear how it went.

On 21/05/2015 19:33, Dan Kinder dkin...@turnitin.com wrote:
> Hi, I'd just like some clarity and advice regarding running multiple Cassandra instances on a single large machine (big JBOD array, plenty of CPU/RAM). First, I am aware this was not Cassandra's original design, and doing this seems to unreasonably go against the commodity-hardware intentions of Cassandra's design. In general it seems to be recommended against (at least as far as I've heard from Rob Coli and others). However, maybe this term "commodity" is changing... my hardware/ops team argues that due to cooling, power, and other datacenter costs, having slightly larger nodes (>=32G RAM, >=24 CPU, >=8 disks JBOD) is actually a better price point. Now, I am not a hardware guy, so if this is not actually true I'd love to hear why; otherwise I pretty much need to take them at their word.
> Cassandra's features seem to have improved such that JBOD works fairly well, but especially with memory/GC this seems to be reaching its limit. One Cassandra instance can only scale up so much. So my question is: suppose I take a 12-disk JBOD and run 2 Cassandra nodes (each with 5 data disks, 1 commit log disk) and either give each its own container IP or change the listen ports. Will this work? What are the risks? Will/should Cassandra support this better in the future?
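For anyone attempting the change-the-ports route Dan mentions, a sketch of the cassandra.yaml settings that would have to differ between two instances on one host (the paths, addresses, and port numbers below are illustrative assumptions, not a tested layout):

```yaml
# cassandra.yaml overrides for a SECOND instance on the same host.
# With a dedicated container IP per instance, the commented-out port
# changes are unnecessary; with a shared IP, they are required.
listen_address: 192.168.1.11
rpc_address: 192.168.1.11
# storage_port: 7001            # inter-node; default 7000
# native_transport_port: 9043   # CQL; default 9042
# rpc_port: 9161                # Thrift; default 9160
data_file_directories:
    - /data/cassandra2/disk1
    - /data/cassandra2/disk2
commitlog_directory: /data/cassandra2/commitlog
saved_caches_directory: /data/cassandra2/saved_caches
# Note: the JMX port (default 7199) is set in cassandra-env.sh, not
# here, and must also differ per instance when sharing an IP.
```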
Re: Disabling auto snapshots
Is there any method to disable this programmatically, on a table-by-table basis? I'm running into an issue regarding drop table which I'll post in a separate thread.

On Thu, May 21, 2015 at 3:34 AM, Mark Reddy mark.l.re...@gmail.com wrote:
> To disable auto snapshots, set the property auto_snapshot: false in your cassandra.yaml file.
Re: Does Cassandra support delayed cross-datacenter replication?
No.

On Thu, May 21, 2015 at 7:07 AM, Eax Melanhovich m...@eax.me wrote:
> Say I would like to have a replica cluster whose state is the state of the real cluster 12 hours ago. Does Cassandra support such a feature?
Re: Multiple cassandra instances per physical node
JBOD -- just a bunch of disks, no RAID.

All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Thu, May 21, 2015 at 4:00 PM, James Rothering jrother...@codojo.me wrote:
> Hmmm ... Not familiar with JBOD. Is that just RAID-0? Also ... wrt the container talk, is that a Docker container you're talking about?

On Thu, May 21, 2015 at 12:48 PM, Jonathan Haddad j...@jonhaddad.com wrote:
> If you run it in a container with dedicated IPs it'll work just fine. Just be sure you aren't using the same machine to replicate its own data.

On Thu, May 21, 2015 at 12:43 PM, Manoj Khangaonkar khangaon...@gmail.com wrote:
> +1. I agree we need to be able to run multiple server instances on one physical machine. This is especially necessary in development and test environments, where one is experimenting and needs a cluster but does not have access to multiple physical machines. If you google, you can find a few blogs that talk about how to do this, but it is less than ideal. We need to be able to do it by changing ports in cassandra.yaml, the way it is done easily with Hadoop, Apache Kafka, Redis, and many other distributed systems.
> regards
> http://khangaonkar.blogspot.com/
Re: Multiple cassandra instances per physical node
@James Rothering yeah, I was thinking of "container" in a broad sense: either full virtual machines, Docker containers, straight LXC, or whatever else would allow the Cassandra nodes to have their own IPs and bind to default ports.

@Jonathan Haddad thanks for the blog post. To ensure the same host does not replicate its own data, would I basically need the nodes on a single host to be labeled as one rack? (Assuming I use vnodes.)

On Thu, May 21, 2015 at 1:02 PM, Sebastian Estevez sebastian.este...@datastax.com wrote:
> JBOD -- just a bunch of disks, no raid.

--
Dan Kinder
Senior Software Engineer
Turnitin – www.turnitin.com
dkin...@turnitin.com
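On Dan's rack question: with GossipingPropertyFileSnitch, one way to label all nodes on one physical host as a single rack is each instance's cassandra-rackdc.properties (a sketch; the dc/rack names below are illustrative assumptions). NetworkTopologyStrategy tries to spread replicas across distinct racks, so two co-located nodes sharing a rack label should not both hold the same replica:

```properties
# cassandra-rackdc.properties, identical for BOTH instances on
# physical host 1 (names illustrative):
dc=DC1
rack=host1
```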
Re: Multiple cassandra instances per physical node
You could use Docker, but it's not required. You could use LXC if you wanted. Shameless self-promo: http://rustyrazorblade.com/2013/08/advanced-devops-with-vagrant-and-lxc/

On Thu, May 21, 2015 at 1:00 PM, James Rothering jrother...@codojo.me wrote:
> Hmmm ... Not familiar with JBOD. Is that just RAID-0? Also ... wrt the container talk, is that a Docker container you're talking about?
Re: Multiple cassandra instances per physical node
Hi,

we operate multiple instances (of possibly different versions) of Cassandra on rather thick nodes. The only problem we have encountered so far was sharing the same physical data disk among multiple instances - that proved not to be the best idea. Sharing commitlog disks has caused no trouble so far. Other than that, it works without any problems. We manage the instances with a set of helper scripts (which change the env variables, so nodetool and such operate on the right instance) and Puppet templates.

Jiri Horky

On Thu, May 21, 2015 at 11:06 PM, Dan Kinder dkin...@turnitin.com wrote:
> @James Rothering yeah I was thinking of container in a broad sense: either full virtual machines, docker containers, straight LXC, or whatever else would allow the Cassandra nodes to have their own IPs and bind to default ports.
> @Jonathan Haddad thanks for the blog post. To ensure the same host does not replicate its own data, would I basically need the nodes on a single host to be labeled as one rack? (Assuming I use vnodes)
Re: Multiple cassandra instances per physical node
Hmmm ... Not familiar with JBOD. Is that just RAID-0? Also ... wrt the container talk, is that a Docker container you're talking about?

On Thu, May 21, 2015 at 12:48 PM, Jonathan Haddad j...@jonhaddad.com wrote:
> If you run it in a container with dedicated IPs it'll work just fine. Just be sure you aren't using the same machine to replicate its own data.
Re: Performance penalty of multiple UPDATEs of non-pk columns
Counters differ significantly between 2.0 and 2.1 (https://issues.apache.org/jira/browse/CASSANDRA-6405 among others), but in both scenarios you will pay more for counter reconciles and compactions vs. regular updates. The final counter performance fix will come with CASSANDRA-6506. For details, read Aleksey's post: http://www.datastax.com/dev/blog/whats-new-in-cassandra-2-1-a-better-implementation-of-counters

All the best,
Sebastián Estévez
Solutions Architect | 954 905 8615 | sebastian.este...@datastax.com

On Thu, May 21, 2015 at 5:48 AM, Jens Rantil jens.ran...@tink.se wrote:
> Artur, that's not entirely true. Writes to Cassandra are first written to a memtable (an in-memory table) which is periodically flushed to disk. If multiple writes come in before the flush, only a single record will be written to the sstable on disk. Writes that don't land within the same flush will get merged away when you compact, just as you say. Unfortunately I can't answer this regarding counters, as I haven't worked with them.
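The read-path behavior Artur asks about can be illustrated with a toy last-write-wins merge (a sketch only; `sstables`, `read_f3`, and `compact` are illustrative names, not Cassandra APIs): before compaction, a read may have to examine a candidate cell from every sstable that still holds a version of the column, while compaction collapses them down to the newest one.

```python
# Toy model of Cassandra's last-write-wins cell merge (illustrative only).
# Each "sstable" maps a (partition, clustering) key to (timestamp, value).

def read_f3(sstables, key):
    """Merge versions of a cell across sstables: newest timestamp wins.
    Pre-compaction, this must inspect one candidate per sstable holding
    the key -- roughly the O(n)-ish cost the thread discusses."""
    candidates = [t[key] for t in sstables if key in t]
    return max(candidates)[1] if candidates else None

def compact(sstables):
    """Compaction keeps only the newest version of each cell."""
    merged = {}
    for t in sstables:
        for key, (ts, val) in t.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, val)
    return [merged]

# Repeated updates of the same cell, flushed into 4 separate sstables:
sstables = [{("k1", "k2"): (ts, "val%d" % ts)} for ts in range(4)]
print(read_f3(sstables, ("k1", "k2")))  # the newest value wins
sstables = compact(sstables)
print(len(sstables[0]))                 # a single cell remains
```

Updates that land in the same memtable flush never reach disk separately at all, which is Jens's point above.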
Re: Multiple cassandra instances per physical node
Yep, that would be one way to handle it.

On Thu, May 21, 2015 at 2:07 PM, Dan Kinder dkin...@turnitin.com wrote:
> @Jonathan Haddad thanks for the blog post. To ensure the same host does not replicate its own data, would I basically need the nodes on a single host to be labeled as one rack? (Assuming I use vnodes)
Re: Performance penalty of multiple UPDATEs of non-pk columns
Artur,

That's not entirely true. Writes to Cassandra are first written to a memtable (an in-memory table) which is periodically flushed to disk. If multiple writes come in before the flush, only a single record will be written to the sstable on disk. Writes that don't land within the same flush will get merged away when you compact, just as you say. Unfortunately I can't answer this regarding counters, as I haven't worked with them. Hope this helped at least.

Cheers,
Jens

On Fri, May 15, 2015 at 11:16 AM, Artur Siekielski a...@vhex.net wrote:
> I've seen some discussions about this topic on the list recently, but I would like to get clearer answers. Given the table:
>
>   CREATE TABLE t1 (
>     f1 text,
>     f2 text,
>     f3 text,
>     PRIMARY KEY (f1, f2)
>   );
>
> and assuming I will execute an UPDATE of f3 multiple times (say, 1000) for the same key values k1, k2 and different values of 'newval':
>
>   UPDATE t1 SET f3 = 'newval' WHERE f1 = 'k1' AND f2 = 'k2';
>
> how will the performance of selecting the current f3 value be affected?
>
>   SELECT f3 FROM t1 WHERE f1 = 'k1' AND f2 = 'k2';
>
> It looks like all the previous values are preserved until compaction, but does executing the SELECT read all the values (O(n), n = number of updates) or only the current one (O(1))? How does the situation look for counter types?

--
Jens Rantil
Backend engineer
Tink AB
Email: jens.ran...@tink.se
Phone: +46 708 84 18 32
Web: www.tink.se
Re: Drop/Create table with same CF Name
Yes, it's a known issue. For more information on the topic, see this support post from DataStax: https://support.datastax.com/hc/en-us/articles/204226339-How-to-drop-and-recreate-a-table-in-Cassandra-versions-older-than-2-1

Mark

On 21 May 2015 at 15:31, Ken Hancock ken.hanc...@schange.com wrote:
> At this point, is it a known issue that one CANNOT reuse a table name prior to Cassandra 2.1?
Drop/Create table with same CF Name
We've been running into the reused key cache issue (CASSANDRA-5202) when dropping and recreating the same table in Cassandra 1.2.18, so we've been testing with key caches disabled, which does not seem to solve the issue. In the latest logs it appears that old SSTable metadata gets read after the tables have been deleted by the previous drop, eventually causing an exception and shutting down the Thrift interface. At this point, is it a known issue that one CANNOT reuse a table name prior to Cassandra 2.1?
Does Cassandra support delayed cross-datacenter replication?
Say I would like to have a replica cluster whose state is the state of the real cluster 12 hours ago. Does Cassandra support such a feature?

--
Best regards,
Eax Melanhovich
http://eax.me/
Re: Nodetool on 2.1.5
Yeah, you can confirm it in the log, e.g. the entry below:

WARN [main] 2015-05-22 11:23:25,584 CassandraDaemon.java:81 - JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.

We are running C* with IPv6; cqlsh works superbly, but nodetool does not on a link-local address:

$ nodetool -h fe80::224:1ff:fed7:82ea cfstats system.hints;
nodetool: Failed to connect to 'fe80::224:1ff:fed7:82ea:7199' - ConnectException: 'Connection refused'.

On Fri, May 22, 2015 at 12:39 AM, Yuki Morishita mor.y...@gmail.com wrote:
For security reasons, Cassandra changed JMX to listen on localhost only since version 2.0.14/2.1.4. From NEWS.txt: "The default JMX config now listens to localhost only. You must enable the other JMX flags in cassandra-env.sh manually."

On Thu, May 21, 2015 at 11:05 AM, Walsh, Stephen stephen.wa...@aspect.com wrote:
Just wondering if anyone else is seeing this issue with nodetool after installing 2.1.5.

This works: nodetool -h 127.0.0.1 cfstats keyspace.table
This works: nodetool -h localhost cfstats keyspace.table
This works: nodetool cfstats keyspace.table
This doesn't work: nodetool -h 192.168.1.10 cfstats keyspace.table
nodetool: Failed to connect to '192.168.1.10:7199' - ConnectException: 'Connection refused'.

Where 192.168.1.10 is the machine IP. All firewalls are disabled, and it worked fine on version 2.0.13. This has happened on both of our upgraded clusters. We are also no longer able to view the "CF: Total MemTable Size" / "flushes pending" metrics in OpsCenter 5.1.1 - related issue?

This email (including any attachments) is proprietary to Aspect Software, Inc. and may contain information that is confidential. If you have received this message in error, please do not read, copy or forward this message. Please notify the sender immediately, delete it from your system and destroy any copies. You may not further disclose or distribute this email or its attachments.

--
Yuki Morishita
t:yukim (http://twitter.com/yukim)
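For anyone hitting this after upgrading, the relevant knobs live in cassandra-env.sh. Below is a minimal sketch (variable names as found in the 2.1-era packaging; verify against your own cassandra-env.sh, since details vary by version and distribution):

```shell
# cassandra-env.sh -- enable remote JMX (sketch; check your version's script)
LOCAL_JMX=no          # default is yes since 2.0.14/2.1.4, i.e. localhost-only
JMX_PORT="7199"

# With LOCAL_JMX=no the stock script enables JMX authentication, so either
# populate the jmxremote.password file it points at, or (insecure, lab only)
# pass -Dcom.sun.management.jmxremote.authenticate=false in JVM_OPTS.
```

After restarting the node, `nodetool -h <node-ip> cfstats keyspace.table` should be able to connect again.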
Re: Multiple cassandra instances per physical node
If you run it in a container with dedicated IPs, it'll work just fine. Just be sure you aren't using the same machine to replicate its own data.

On Thu, May 21, 2015 at 12:43 PM Manoj Khangaonkar khangaon...@gmail.com wrote:
+1. I agree we need to be able to run multiple server instances on one physical machine. This is especially necessary in development and test environments, where one is experimenting and needs a cluster but does not have access to multiple physical machines. If you Google, you can find a few blogs that talk about how to do this, but it is less than ideal. We need to be able to do it by changing ports in cassandra.yaml (the way it is done easily with Hadoop, Apache Kafka, Redis and many other distributed systems).

regards

On Thu, May 21, 2015 at 10:32 AM, Dan Kinder dkin...@turnitin.com wrote:
Hi,

I'd just like some clarity and advice regarding running multiple Cassandra instances on a single large machine (big JBOD array, plenty of CPU/RAM).

First, I am aware this was not Cassandra's original design, and doing this seems to unreasonably go against the commodity-hardware intentions of Cassandra's design. In general it seems to be recommended against (at least as far as I've heard from @Rob Coli and others). However, maybe this term "commodity" is changing... my hardware/ops team argues that due to cooling, power, and other datacenter costs, having slightly larger nodes (>=32G RAM, >=24 CPU, >=8 disks JBOD) is actually a better price point. Now, I am not a hardware guy, so if this is not actually true I'd love to hear why; otherwise I pretty much need to take them at their word.

Now, Cassandra features seem to have improved such that JBOD works fairly well, but especially with memory/GC this seems to be reaching its limit: one Cassandra instance can only scale up so much.

So my question is: suppose I take a 12-disk JBOD and run 2 Cassandra nodes (each with 5 data disks and 1 commit log disk) and either give each its own container IP or change the listen ports. Will this work? What are the risks? Will/should Cassandra support this better in the future?

--
http://khangaonkar.blogspot.com/
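To make the two-instances experiment concrete, the settings that must not collide between instances are roughly the ones below. This is a hedged sketch: the keys are standard cassandra.yaml settings, but the exact list depends on your version, and with per-instance container IPs (the approach suggested above) you can keep the default ports and vary only the addresses and directories.

```yaml
# instance-2/cassandra.yaml -- per-instance settings that must differ (sketch)
listen_address: 192.168.1.11        # or keep one IP and change the ports below
rpc_address: 192.168.1.11
storage_port: 7010                  # default 7000
ssl_storage_port: 7011              # default 7001
native_transport_port: 9043        # default 9042
rpc_port: 9161                      # default 9160 (Thrift)

data_file_directories:              # give each instance its own data disks
    - /data/disk7
    - /data/disk8
commitlog_directory: /commitlog2    # and its own commit log disk

# Each instance's cassandra-env.sh also needs its own JMX_PORT (default 7199).
```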
Re: Multiple cassandra instances per physical node
+1. I agree we need to be able to run multiple server instances on one physical machine. This is especially necessary in development and test environments, where one is experimenting and needs a cluster but does not have access to multiple physical machines. If you Google, you can find a few blogs that talk about how to do this, but it is less than ideal. We need to be able to do it by changing ports in cassandra.yaml (the way it is done easily with Hadoop, Apache Kafka, Redis and many other distributed systems).

regards

On Thu, May 21, 2015 at 10:32 AM, Dan Kinder dkin...@turnitin.com wrote:
Hi,

I'd just like some clarity and advice regarding running multiple Cassandra instances on a single large machine (big JBOD array, plenty of CPU/RAM).

First, I am aware this was not Cassandra's original design, and doing this seems to unreasonably go against the commodity-hardware intentions of Cassandra's design. In general it seems to be recommended against (at least as far as I've heard from @Rob Coli and others). However, maybe this term "commodity" is changing... my hardware/ops team argues that due to cooling, power, and other datacenter costs, having slightly larger nodes (>=32G RAM, >=24 CPU, >=8 disks JBOD) is actually a better price point. Now, I am not a hardware guy, so if this is not actually true I'd love to hear why; otherwise I pretty much need to take them at their word.

Now, Cassandra features seem to have improved such that JBOD works fairly well, but especially with memory/GC this seems to be reaching its limit: one Cassandra instance can only scale up so much.

So my question is: suppose I take a 12-disk JBOD and run 2 Cassandra nodes (each with 5 data disks and 1 commit log disk) and either give each its own container IP or change the listen ports. Will this work? What are the risks? Will/should Cassandra support this better in the future?

--
http://khangaonkar.blogspot.com/
Nodetool on 2.1.5
Just wondering if anyone else is seeing this issue with nodetool after installing 2.1.5.

This works: nodetool -h 127.0.0.1 cfstats keyspace.table
This works: nodetool -h localhost cfstats keyspace.table
This works: nodetool cfstats keyspace.table
This doesn't work: nodetool -h 192.168.1.10 cfstats keyspace.table
nodetool: Failed to connect to '192.168.1.10:7199' - ConnectException: 'Connection refused'.

Where 192.168.1.10 is the machine IP. All firewalls are disabled, and it worked fine on version 2.0.13. This has happened on both of our upgraded clusters. We are also no longer able to view the "CF: Total MemTable Size" / "flushes pending" metrics in OpsCenter 5.1.1 - related issue?
Re: Drop/Create table with same CF Name
Thanks Mark (though that article doesn't appear to be publicly accessible for others). Truncate would have been the tool of choice; however, my understanding is that truncate fails unless all nodes are up and running, which makes it a non-workable choice since we can't determine when failures will occur.

Ken

On Thu, May 21, 2015 at 11:00 AM, Mark Reddy mark.l.re...@gmail.com wrote:
Yes, it's a known issue. For more information on the topic, see this support post from DataStax: https://support.datastax.com/hc/en-us/articles/204226339-How-to-drop-and-recreate-a-table-in-Cassandra-versions-older-than-2-1

Mark

On 21 May 2015 at 15:31, Ken Hancock ken.hanc...@schange.com wrote:
We've been running into the reused key cache issue (CASSANDRA-5202) with dropping and recreating the same table in Cassandra 1.2.18, so we've been testing with key caches disabled, which does not seem to solve the issue. In the latest logs it seems that old SSTable metadata gets read after the tables have been deleted by the previous drop, eventually causing an exception and shutting down the Thrift interface. At this point, is it a known issue that one CANNOT reuse a table name prior to Cassandra 2.1?
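Since TRUNCATE is only safe when every node is reachable, one pragmatic guard is to check `nodetool status` output first and refuse to truncate when any node is not Up/Normal. A minimal sketch in plain Python (the two-letter `UN`/`DN` status prefix is the standard nodetool status format; the sample addresses are made up):

```python
def all_nodes_up(status_output: str) -> bool:
    """Return True iff every node line in `nodetool status` output is UN (Up/Normal)."""
    node_states = [line.split()[0]
                   for line in status_output.splitlines()
                   if line[:2] in ("UN", "DN", "UL", "DL", "UJ", "DJ", "UM", "DM")]
    return bool(node_states) and all(s == "UN" for s in node_states)

# Hypothetical sample output, trimmed to the node lines:
healthy = "UN  10.0.0.1  101 GB  256  33%\nUN  10.0.0.2  99 GB  256  33%\n"
degraded = healthy + "DN  10.0.0.3  100 GB  256  34%\n"

print(all_nodes_up(healthy))    # True  -> safe to issue TRUNCATE
print(all_nodes_up(degraded))   # False -> truncate would fail; skip or retry
```

In practice you would feed this the captured stdout of `nodetool status` and only issue the TRUNCATE when it returns True; this doesn't remove the race with a node failing mid-truncate, but it avoids the predictable failure case Ken describes.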
Re: Nodetool on 2.1.5
For security reasons, Cassandra changed JMX to listen on localhost only since version 2.0.14/2.1.4. From NEWS.txt: "The default JMX config now listens to localhost only. You must enable the other JMX flags in cassandra-env.sh manually."

On Thu, May 21, 2015 at 11:05 AM, Walsh, Stephen stephen.wa...@aspect.com wrote:
Just wondering if anyone else is seeing this issue with nodetool after installing 2.1.5.

This works: nodetool -h 127.0.0.1 cfstats keyspace.table
This works: nodetool -h localhost cfstats keyspace.table
This works: nodetool cfstats keyspace.table
This doesn't work: nodetool -h 192.168.1.10 cfstats keyspace.table
nodetool: Failed to connect to '192.168.1.10:7199' - ConnectException: 'Connection refused'.

Where 192.168.1.10 is the machine IP. All firewalls are disabled, and it worked fine on version 2.0.13. This has happened on both of our upgraded clusters. We are also no longer able to view the "CF: Total MemTable Size" / "flushes pending" metrics in OpsCenter 5.1.1 - related issue?

--
Yuki Morishita
t:yukim (http://twitter.com/yukim)
Multiple cassandra instances per physical node
Hi,

I'd just like some clarity and advice regarding running multiple Cassandra instances on a single large machine (big JBOD array, plenty of CPU/RAM).

First, I am aware this was not Cassandra's original design, and doing this seems to unreasonably go against the commodity-hardware intentions of Cassandra's design. In general it seems to be recommended against (at least as far as I've heard from @Rob Coli and others). However, maybe this term "commodity" is changing... my hardware/ops team argues that due to cooling, power, and other datacenter costs, having slightly larger nodes (>=32G RAM, >=24 CPU, >=8 disks JBOD) is actually a better price point. Now, I am not a hardware guy, so if this is not actually true I'd love to hear why; otherwise I pretty much need to take them at their word.

Now, Cassandra features seem to have improved such that JBOD works fairly well, but especially with memory/GC this seems to be reaching its limit: one Cassandra instance can only scale up so much.

So my question is: suppose I take a 12-disk JBOD and run 2 Cassandra nodes (each with 5 data disks and 1 commit log disk) and either give each its own container IP or change the listen ports. Will this work? What are the risks? Will/should Cassandra support this better in the future?