Re: Setting up Cassandra to store on a specific node and not replicate
Probably yes, if you also disabled any sort of failover from the token-aware client…

(Talking about this makes you realize how many failsafes Cassandra has. And still you can lose data… :-P)

/Janne

On 18 Dec 2013, at 20:31, Robert Coli rc...@eventbrite.com wrote:

> On Wed, Dec 18, 2013 at 2:44 AM, Sylvain Lebresne sylv...@datastax.com wrote:
>
>> As Janne said, you could still have hints being written by other nodes if the one storage node is dead, but you can set the system property cassandra.maxHintTTL to 0 to disable hints.
>
> If one uses a Token Aware client with RF=1, that would seem to preclude hinting even without disabling HH for the entire system; if the coordinator is always the single replica, why would it send a copy anywhere else?
>
> =Rob
Re: Setting up Cassandra to store on a specific node and not replicate
On Wed, Dec 18, 2013 at 7:31 PM, Robert Coli rc...@eventbrite.com wrote:

> On Wed, Dec 18, 2013 at 2:44 AM, Sylvain Lebresne sylv...@datastax.com wrote:
>
>> As Janne said, you could still have hints being written by other nodes if the one storage node is dead, but you can set the system property cassandra.maxHintTTL to 0 to disable hints.
>
> If one uses a Token Aware client with RF=1, that would seem to preclude hinting even without disabling HH for the entire system; if the coordinator is always the single replica, why would it send a copy anywhere else?

Colin explicitly said that he would have several nodes, and I said I wasn't going to judge, so I implicitly assumed there was a reason for having multiple nodes. If you're only ever going to hit one node, then using a token-aware client is over-complicating it. Just use a one-node cluster and you'll have nothing to worry about or to configure.

That being said, Colin, do be aware that as far as I can tell there is indeed relatively little benefit to having a multi-node cluster in which all data is on one node. In particular, there is no cache at the coordinator level, so even if your client hits other nodes, everything will still be forwarded to the one node that stores it all; the other nodes won't really store anything, not even in memory.

-- Sylvain
Re: Setting up Cassandra to store on a specific node and not replicate
This may be hard because the coordinator could store hinted handoff (HH) data on disk. You could turn HH off and have RF=1 to keep data on a single instance, but you would be likely to lose data if you had any problems with your instances… Also you would need to tweak the memtable flushing so that it goes to disk more often than the ten seconds which is the default. Or lose data.

You will also have an interesting time scaling your cluster and would have to plan for that in your custom database.

Essentially you want to turn off all the features which make Cassandra a robust product ;-).

Without knowing your requirements more precisely, I'd be inclined to recommend manually sharding on MariaDB or Postgres instances instead, or using their underlying storage engines directly (e.g. InnoDB), if you're just looking for a key-value store.

/Janne

On 18 Dec 2013, at 11:20, Colin MacDonald colin.macdon...@sas.com wrote:

> Ahoy the list. I am evaluating Cassandra in the context of using it as a storage back end for the Titan graph database.
>
> We’ll have several nodes in the cluster. However, one of our requirements is that data has to be loaded into and stored on a specific node and only on that node. Also, it cannot be replicated around the system, at least not stored persistently on disk – we will of course make copies in memory and on the wire as we access remote nodes. These requirements are non-negotiable.
>
> We understand that this is essentially the opposite of what Cassandra is designed for, and that we’re missing all the scalability and robustness, but is it technically possible?
>
> First, I would need to create a custom partitioner – is there any tutorial on that? I see a few “you don’t need to” threads, but I do.
>
> Second, how easy is it to have Cassandra not replicate data between nodes in a cluster? I’m not seeing an obvious configuration option for that, presumably because it obviates much of the point of using Cassandra, but again, we’re working within some rather unfortunate constraints.
>
> Any hints or suggestions would be most gratefully received.
>
> Kind regards,
> -Colin MacDonald-
RE: Setting up Cassandra to store on a specific node and not replicate
-Original Message-
From: Janne Jalkanen [mailto:janne.jalka...@ecyrd.com]

> Essentially you want to turn off all the features which make Cassandra a robust product ;-).

Oh, I don't want to, but sadly those are the requirements that I have to work with.

Again, the context is using it as the storage back end for a graph database. I'm currently looking at the Titan graph DBMS, which supports the use of Cassandra or HBase for a distributed graph, both of which will need to be hobbled to prevent them working the way they're designed.

So it really is a question of: *can* I cripple Cassandra in this way, and if so, how?

Thanks for the response.

-Colin MacDonald-
Re: Setting up Cassandra to store on a specific node and not replicate
You seem to be well aware that you're not looking at using Cassandra for what it is designed for (which obviously implies you'll need to expect suboptimal behavior), so I'm not going to insist on it.

As to how you could achieve that, a relatively simple solution (one that does not require writing your own partitioner) would consist of using 2 datacenters (which obviously don't have to be real physical datacenters): put the node that should have it all in one datacenter with RF=1, and put all other nodes in the other datacenter with RF=0.

As Janne said, you could still have hints being written by other nodes if the one storage node is dead, but you can set the system property cassandra.maxHintTTL to 0 to disable hints.

-- Sylvain

On Wed, Dec 18, 2013 at 10:20 AM, Colin MacDonald colin.macdon...@sas.com wrote:

> Ahoy the list. I am evaluating Cassandra in the context of using it as a storage back end for the Titan graph database.
>
> We’ll have several nodes in the cluster. However, one of our requirements is that data has to be loaded into and stored on a specific node and only on that node. Also, it cannot be replicated around the system, at least not stored persistently on disk – we will of course make copies in memory and on the wire as we access remote nodes. These requirements are non-negotiable.
>
> We understand that this is essentially the opposite of what Cassandra is designed for, and that we’re missing all the scalability and robustness, but is it technically possible?
>
> First, I would need to create a custom partitioner – is there any tutorial on that? I see a few “you don’t need to” threads, but I do.
>
> Second, how easy is it to have Cassandra not replicate data between nodes in a cluster? I’m not seeing an obvious configuration option for that, presumably because it obviates much of the point of using Cassandra, but again, we’re working within some rather unfortunate constraints.
>
> Any hints or suggestions would be most gratefully received.
>
> Kind regards,
> -Colin MacDonald-
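As a rough sketch of the two-datacenter idea above, the snippet below only builds the CQL text for such a keyspace; it does not talk to a cluster. The keyspace name `titan` and the datacenter names `storage_dc` and `other_dc` are made up for illustration and would have to match whatever names the cluster's snitch actually reports.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    replicas_per_dc maps a datacenter name to its replication factor.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc_name, rf in replicas_per_dc.items():
        # One replication option per datacenter, e.g. 'storage_dc': 1
        options.append("'" + dc_name + "': " + str(int(rf)))
    body = ", ".join(options)
    return "CREATE KEYSPACE " + keyspace + " WITH replication = {" + body + "};"


# Hypothetical names: one datacenter holding the single storage node (RF=1)
# and one datacenter for every other node (RF=0, so no data is stored there).
statement = create_keyspace_cql("titan", {"storage_dc": 1, "other_dc": 0})
print(statement)
```

Hints would be disabled separately, for example by passing `-Dcassandra.maxHintTTL=0` as a JVM option when starting each node, following the system property mentioned above.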
RE: Setting up Cassandra to store on a specific node and not replicate
-Original Message-
From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: 18 December 2013 10:45

> You seem to be well aware that you're not looking at using Cassandra for what it is designed for (which obviously implies you'll need to expect suboptimal behavior), so I'm not going to insist on it.

Very kind of you. ;) I suspect that this requirement is viscerally horrifying, but as I said, it's idiosyncratic, specified by an... idiosyncrat.

It's a pragmatic solution that I'm looking for, just to get a proof of concept going; it doesn't have to be elegant at this stage.

> As to how you could achieve that, a relatively simple solution (one that does not require writing your own partitioner) would consist of using 2 datacenters (which obviously don't have to be real physical datacenters): put the node that should have it all in one datacenter with RF=1, and put all other nodes in the other datacenter with RF=0.
>
> As Janne said, you could still have hints being written by other nodes if the one storage node is dead, but you can set the system property cassandra.maxHintTTL to 0 to disable hints.

Thanks Sylvain, I'll look into that. I'm coming to Cassandra cold; I hadn't even spotted that the replication factor was configurable. I don't see an option for it in the cassandra.yaml that came with 2.0.2. I should be able to figure it out though, and that's great news: it looks like it takes care of one issue.

However, I'm not immediately seeing how to control which node will get the single copy of the data. Won't the partitioner still allocate data around the cluster?

Ah, is a datacentre a logical group *within* an overall cluster? So I can create a separate datacentre for each node, and if I write to that node the data will be forced to stay in that datacentre, i.e. that node?

I do apologise for the noobish questions; my attention is currently split between investigating several possible solutions. I rather favour Cassandra though, if I can hobble it appropriately.

Kind regards,
-Colin MacDonald-
RE: Setting up Cassandra to store on a specific node and not replicate
-Original Message-
From: Sylvain Lebresne [mailto:sylv...@datastax.com]
Sent: 18 December 2013 12:46

> Google up NetworkTopologyStrategy. This is what you want to use, and it's not configured in cassandra.yaml but when you create the keyspace.
>
> Basically, you define your topology in cassandra-topology.yaml (where you manually set which node is in which DC, which you can really just see as assigning nodes to named groups) and then you can define the replication factor for each DC (so if RF=1 on the one-node group and 0 on the group of other nodes, C* will gladly honor it and store no data on the nodes of that other group).
>
> -- Sylvain

Thank you so much, that's clear and helpful. I appreciate you taking the time to explain it.

-Colin-
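To make "assigning nodes to named groups" a little more concrete, here is a small sketch that renders such a mapping as text. It assumes the PropertyFileSnitch, which reads `address=DC:RACK` lines from a `cassandra-topology.properties` file; that is a different file from the cassandra-topology.yaml named in the quoted message, and which one applies depends on the snitch the cluster is configured with. The IP addresses, group names and rack name are invented for the example.

```python
def render_topology(node_groups, default_group, rack="RAC1"):
    """Render address=DC:RACK lines, one per node, plus a default entry.

    node_groups maps a datacenter (group) name to a list of node addresses.
    """
    lines = []
    for group, addresses in node_groups.items():
        for address in addresses:
            lines.append(address + "=" + group + ":" + rack)
    # Fallback group for any node that is not listed explicitly
    lines.append("default=" + default_group + ":" + rack)
    return "\n".join(lines)


# Hypothetical layout: one storage node in its own group, two nodes elsewhere.
topology = render_topology(
    {"storage_dc": ["10.0.0.1"], "other_dc": ["10.0.0.2", "10.0.0.3"]},
    default_group="other_dc",
)
print(topology)
```

The group names used here are the ones a keyspace's per-DC replication factors would then refer to.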
Re: Setting up Cassandra to store on a specific node and not replicate
On Wed, Dec 18, 2013 at 2:44 AM, Sylvain Lebresne sylv...@datastax.com wrote:

> As Janne said, you could still have hints being written by other nodes if the one storage node is dead, but you can set the system property cassandra.maxHintTTL to 0 to disable hints.

If one uses a Token Aware client with RF=1, that would seem to preclude hinting even without disabling HH for the entire system; if the coordinator is always the single replica, why would it send a copy anywhere else?

=Rob
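The reasoning here can be illustrated with a toy model of a token ring. This is only a sketch under simplifying assumptions: it hashes keys with MD5 rather than Cassandra's real partitioner, uses RF=1, and the node names, tokens and key are invented. It shows that a client which always picks the owning node as coordinator never needs the write forwarded, whereas any other coordinator does.

```python
import bisect
import hashlib


class ToyRing:
    """Toy token ring with RF=1: each key is owned by exactly one node."""

    def __init__(self, node_tokens):
        # node_tokens maps a node name to the token it owns on the ring
        self._ring = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._ring]

    def token_for(self, key):
        # Stand-in hash; a real cluster uses its configured partitioner
        digest = hashlib.md5(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def replica_for(self, key):
        # Owner is the first node whose token is >= the key's token (wrapping)
        index = bisect.bisect_left(self._tokens, self.token_for(key))
        return self._ring[index % len(self._ring)][1]


def needs_forwarding(ring, coordinator, key):
    """True when the coordinator is not the single replica for the key."""
    return coordinator != ring.replica_for(key)


ring = ToyRing({"node_a": 2**62, "node_b": 2**63, "node_c": 2**64 - 1})
key = "vertex:42"
owner = ring.replica_for(key)

# A token-aware client sends the request straight to the owner.
token_aware_forwards = needs_forwarding(ring, owner, key)
print(owner, token_aware_forwards)
```

The model leaves out the failure case raised elsewhere in the thread: if the owning node is down and the client fails over to another coordinator, that coordinator could still write a hint unless hints are disabled.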