Setting up Cassandra to store on a specific node and not replicate

2013-12-18 Thread Colin MacDonald
Ahoy the list.  I am evaluating Cassandra in the context of using it as a 
storage back end for the Titan graph database.

We’ll have several nodes in the cluster.  However, one of our requirements is 
that data has to be loaded into and stored on a specific node and only on that 
node.  Also, it cannot be replicated around the system, at least not stored 
persistently on disk – we will of course make copies in memory and on the wire 
as we access remote nodes.  These requirements are non-negotiable.

We understand that this is essentially the opposite of what Cassandra is 
designed for, and that we’re missing all the scalability and robustness, but is 
it technically possible?

First, I would need to create a custom partitioner – is there any tutorial on 
that?  I see a few “you don’t need to” threads, but I do.

Second, how easy is it to have Cassandra not replicate data between nodes in a 
cluster?  I’m not seeing an obvious configuration option for that, presumably 
because it obviates much of the point of using Cassandra, but again, we’re 
working within some rather unfortunate constraints.

Any hints or suggestions would be most gratefully received.

Kind regards,

-Colin MacDonald-



RE: Setting up Cassandra to store on a specific node and not replicate

2013-12-18 Thread Colin MacDonald
 -Original Message-
 From: Janne Jalkanen [mailto:janne.jalka...@ecyrd.com]
 
 Essentially you want to turn off all the features which make Cassandra a
 robust product ;-).

Oh, I don't want to, but sadly those are the requirements that I have to work 
with.

Again, the context is using it as the storage back end for a graph database.  I'm 
currently looking at the Titan graph DBMS, which supports the use of Cassandra 
or HBase for a distributed graph, both of which will need to be hobbled to 
prevent them working the way they're designed.

So it really is a question of: *can* I cripple Cassandra in this way, and if so 
how?

Thanks for the response.

-Colin MacDonald- 



RE: Setting up Cassandra to store on a specific node and not replicate

2013-12-18 Thread Colin MacDonald
 -Original Message-
 From: Sylvain Lebresne [mailto:sylv...@datastax.com]
 Sent: 18 December 2013 10:45
 
 You seem to be well aware that you're not looking at using Cassandra for
 what it is designed for (which obviously imply you'll need to expect under-
 optimal behavior), so I'm not going to insist on it.

Very kind of you. ;)

I suspect that this requirement is viscerally horrifying, but as I said, 
it's idiosyncratic, specified by an... idiosyncrat.

It's a pragmatic solution that I'm looking for, just to get a proof of concept 
going; it doesn't have to be elegant at this stage.

 As to how you could achieve that, a relatively simple solution (that do not
 require writing your own partitioner) would consist in using 2 datacenters
 (that obviously don't have to be real physical datacenter), to put the one that
 should have it all in one datacenter with RF=1 and to pull all other nodes in
 the other datacenter with RF=0.
 
 As Janne said, you could still have hint being written by other nodes if the
 one storage node is dead, but you can set the system property
 cassandra.maxHintTTL to 0 to disable hints.

Thanks Sylvain, I'll look into that.  I'm coming to Cassandra cold, I hadn't 
even spotted that the replication factor was configurable - I don't see an 
option for it in the cassandra.yaml that came with 2.0.2.  I should be able to 
figure it out though, and that's great news, it looks like it takes care of one 
issue.

However, I'm not immediately seeing how to control which node will get the 
single copy of the data.  Won't the partitioner still allocate data around the 
cluster?

Ah, is a datacentre a logical group *within* an overall cluster?  So I can 
create a separate datacentre for each node, and if I write to that node the 
data will be forced to stay in that datacentre, i.e. that node?

I do apologise for the noobish questions, my attention is currently split 
between investigating several possible solutions.  I rather favour Cassandra 
though, if I can hobble it appropriately.

Kind regards,

-Colin MacDonald- 



RE: Setting up Cassandra to store on a specific node and not replicate

2013-12-18 Thread Colin MacDonald
 -Original Message-
 From: Sylvain Lebresne [mailto:sylv...@datastax.com]
 Sent: 18 December 2013 12:46
 Google up NetworkTopologyStrategy. This is what you want to use and it's
 not configured in cassandra.yaml but when you create the keyspace.
 
 Basically, you define your topology in cassandra-topology.yaml (where you
 basically manually set which node is in which DC, which you can really just
 see as assigning nodes to named groups) and then you can define the
 replication factor for each DC (so if RF=1 on the 1 node group and 0 on the
 other nodes group, C* will gladly honor it and store no data on node of the
 other nodes group).
 
 --
 Sylvain
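
To illustrate the per-datacenter replication described above, here is a minimal 
sketch that builds the CREATE KEYSPACE statement for NetworkTopologyStrategy.  
The keyspace name "titan" and the datacenter names "storage_dc" and "other_dc" 
are assumptions made for illustration; they must match whatever names your 
snitch or topology file assigns.  The helper keyspace_cql is hypothetical and is 
not part of Cassandra or Titan.  It only produces the CQL text, so the result 
still has to be run through cqlsh or a driver.

```python
def keyspace_cql(name, rf_by_dc):
    """Build a CREATE KEYSPACE statement using NetworkTopologyStrategy.

    rf_by_dc maps a datacenter name (as reported by the snitch) to the
    replication factor wanted in that datacenter.
    """
    for dc, rf in rf_by_dc.items():
        if not isinstance(rf, int) or rf < 0:
            raise ValueError("replication factor for %s must be an integer >= 0" % dc)

    # The strategy class comes first, then one entry per datacenter.
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += ["'%s': %d" % (dc, rf) for dc, rf in sorted(rf_by_dc.items())]
    return "CREATE KEYSPACE %s WITH replication = {%s};" % (name, ", ".join(options))


# Assumed layout: the single storage node sits alone in "storage_dc" (RF=1);
# every other node sits in "other_dc" (RF=0), so it holds no replicas.
statement = keyspace_cql("titan", {"storage_dc": 1, "other_dc": 0})
print(statement)
```

With RF=1 in a one-node datacenter there is only one possible replica for every 
key, so the partitioner has nothing left to distribute.  Leaving "other_dc" out 
of the map altogether should have the same effect as listing it with 0.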

Thank you so much, that's clear and helpful.  I appreciate you taking the time 
to explain it.

-Colin-