Re: Setting up Cassandra to store on a specific node and not replicate

2013-12-19 Thread Janne Jalkanen

Probably yes, if you also disabled any sort of failovers from the token-aware 
client…

(Talking about this makes you realize how many failsafes Cassandra has. And 
still you can lose data… :-P)

/Janne

On 18 Dec 2013, at 20:31, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Dec 18, 2013 at 2:44 AM, Sylvain Lebresne sylv...@datastax.com 
 wrote:
 As Janne said, you could still have hints being written by other nodes if the
 one storage node is dead, but you can set the system property
 cassandra.maxHintTTL to 0 to disable hints.
 
 If one uses a Token Aware client with RF=1, that would seem to preclude 
 hinting even without disabling HH for the entire system; if the coordinator 
 is always the single replica, why would it send a copy anywhere else?
 
 =Rob



Re: Setting up Cassandra to store on a specific node and not replicate

2013-12-19 Thread Sylvain Lebresne
On Wed, Dec 18, 2013 at 7:31 PM, Robert Coli rc...@eventbrite.com wrote:

 On Wed, Dec 18, 2013 at 2:44 AM, Sylvain Lebresne sylv...@datastax.com wrote:

 As Janne said, you could still have hints being written by other nodes if
 the one storage node is dead, but you can set the system
 property cassandra.maxHintTTL to 0 to disable hints.


 If one uses a Token Aware client with RF=1, that would seem to preclude
 hinting even without disabling HH for the entire system; if the coordinator
 is always the single replica, why would it send a copy anywhere else?


Colin explicitly said that he would have several nodes, and I said I wasn't
going to judge, so I implicitly assumed there was a reason for having
multiple nodes.

If you're only ever going to hit one node, then using a token-aware
client is over-complicating it. Just use a one-node cluster and you'll have
nothing to worry about or to configure.

That being said, Colin, do be aware that as far as I can tell there is
indeed relatively little benefit to having a multi-node cluster in which
all data is on one node. In particular, there is no cache at the
coordinator level, so even if your client hits other nodes, everything
will still be forwarded to the one node that stores it all; the other nodes
won't really store anything, not even in memory.

--
Sylvain


Re: Setting up Cassandra to store on a specific node and not replicate

2013-12-18 Thread Janne Jalkanen

This may be hard because the coordinator could store hinted handoff (HH) data 
on disk. You could turn HH off and have RF=1 to keep data on a single instance, 
but you would be likely to lose data if you had any problems with your 
instances… Also you would need to tweak the memtable flushing so that it goes 
to disk more often than the ten seconds which is the default. Or lose data. You 
will also have an interesting time scaling your cluster and would have to 
plan for that in your custom database.

Essentially you want to turn off all the features which make Cassandra a robust
product ;-). Without knowing your requirements more precisely, I'd be inclined
to recommend manually sharding on MariaDB or Postgres instances instead, or
using their underlying storage engines directly (e.g. InnoDB), if you're just
looking for a key-value store.

/Janne

On 18 Dec 2013, at 11:20, Colin MacDonald colin.macdon...@sas.com wrote:

 Ahoy the list.  I am evaluating Cassandra in the context of using it as a 
 storage back end for the Titan graph database.
  
 We’ll have several nodes in the cluster.  However, one of our requirements is 
 that data has to be loaded into and stored on a specific node and only on 
 that node.  Also, it cannot be replicated around the system, at least not 
 stored persistently on disk – we will of course make copies in memory and on 
 the wire as we access remote nodes.  These requirements are non-negotiable.
  
 We understand that this is essentially the opposite of what Cassandra is 
 designed for, and that we’re missing all the scalability and robustness, but 
 is it technically possible?
  
 First, I would need to create a custom partitioner – is there any tutorial on 
 that?  I see a few “you don’t need to” threads, but I do.
  
 Second, how easy is it to have Cassandra not replicate data between nodes in 
 a cluster?  I’m not seeing an obvious configuration option for that, 
 presumably because it obviates much of the point of using Cassandra, but 
 again, we’re working within some rather unfortunate constraints.
  
 Any hints or suggestions would be most gratefully received.
  
 Kind regards,
  
 -Colin MacDonald-
  



RE: Setting up Cassandra to store on a specific node and not replicate

2013-12-18 Thread Colin MacDonald
 -Original Message-
 From: Janne Jalkanen [mailto:janne.jalka...@ecyrd.com]
 
 Essentially you want to turn off all the features which make Cassandra a
 robust product ;-).

Oh, I don't want to, but sadly those are the requirements that I have to work 
with.

Again, the context is using it as the storage back end for a graph database.  I'm
currently looking at the Titan graph DBMS, which supports the use of Cassandra 
or HBase for a distributed graph, both of which will need to be hobbled to 
prevent them working the way they're designed.

So it really is a question of: *can* I cripple Cassandra in this way, and if so 
how?

Thanks for the response.

-Colin MacDonald- 



Re: Setting up Cassandra to store on a specific node and not replicate

2013-12-18 Thread Sylvain Lebresne
You seem to be well aware that you're not looking at using Cassandra for
what it is designed for (which obviously implies you'll need to expect
suboptimal behavior), so I'm not going to insist on it.

As to how you could achieve that, a relatively simple solution (one that
does not require writing your own partitioner) would be to use 2 datacenters
(which obviously don't have to be real physical datacenters): put the one
node that should have it all in one datacenter with RF=1, and put all the
other nodes in the other datacenter with RF=0.

As Janne said, you could still have hints being written by other nodes if
the one storage node is dead, but you can set the system
property cassandra.maxHintTTL to 0 to disable hints.

--
Sylvain
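
A minimal sketch of the keyspace definition that the two-datacenter
suggestion above seems to imply, written in Python only to assemble the CQL
text. The keyspace name "titan" and the datacenter names "storage" and
"others" are assumptions made for illustration; the real datacenter names
would have to match whatever the cluster's snitch reports. The helper
create_keyspace_cql is hypothetical and not part of any Cassandra client.

```python
def create_keyspace_cql(name, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that uses NetworkTopologyStrategy.

    replicas_per_dc maps a datacenter name to its replication factor,
    e.g. {"storage": 1, "others": 0}. All names here are illustrative.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    for dc, rf in replicas_per_dc.items():
        if rf < 0:
            raise ValueError("replication factor must not be negative")
        options.append("'%s': %d" % (dc, rf))
    return "CREATE KEYSPACE %s WITH replication = {%s};" % (
        name,
        ", ".join(options),
    )


if __name__ == "__main__":
    # One replica in the single-node "storage" DC, none anywhere else.
    print(create_keyspace_cql("titan", {"storage": 1, "others": 0}))
```

The resulting statement would be run once, for instance from cqlsh, when
the keyspace is created. Hints are a separate concern and would still need
to be disabled as described above.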



RE: Setting up Cassandra to store on a specific node and not replicate

2013-12-18 Thread Colin MacDonald
 -Original Message-
 From: Sylvain Lebresne [mailto:sylv...@datastax.com]
 Sent: 18 December 2013 10:45
 
 You seem to be well aware that you're not looking at using Cassandra for
 what it is designed for (which obviously implies you'll need to expect
 suboptimal behavior), so I'm not going to insist on it.

Very kind of you. ;)

I suspect that this requirement is viscerally horrifying, but as I said,
it's idiosyncratic, specified by an... idiosyncrat.

It's a pragmatic solution that I'm looking for, just to get a proof of concept
going; it doesn't have to be elegant at this stage.

 As to how you could achieve that, a relatively simple solution (one that does
 not require writing your own partitioner) would be to use 2 datacenters (which
 obviously don't have to be real physical datacenters): put the one node that
 should have it all in one datacenter with RF=1, and put all the other nodes in
 the other datacenter with RF=0.
 
 As Janne said, you could still have hints being written by other nodes if the
 one storage node is dead, but you can set the system property
 cassandra.maxHintTTL to 0 to disable hints.

Thanks Sylvain, I'll look into that.  I'm coming to Cassandra cold; I hadn't
even spotted that the replication factor was configurable - I don't see an
option for it in the cassandra.yaml that came with 2.0.2.  I should be able to
figure it out though, and that's great news; it looks like it takes care of one
issue.

However, I'm not immediately seeing how to control which node will get the 
single copy of the data.  Won't the partitioner still allocate data around the 
cluster?

Ah, is a datacentre a logical group *within* an overall cluster?  So I can 
create a separate datacentre for each node, and if I write to that node the 
data will be forced to stay in that datacentre, i.e. that node?

I do apologise for the noobish questions, my attention is currently split 
between investigating several possible solutions.  I rather favour Cassandra 
though, if I can hobble it appropriately.

Kind regards,

-Colin MacDonald- 



RE: Setting up Cassandra to store on a specific node and not replicate

2013-12-18 Thread Colin MacDonald
 -Original Message-
 From: Sylvain Lebresne [mailto:sylv...@datastax.com]
 Sent: 18 December 2013 12:46
 Google up NetworkTopologyStrategy. This is what you want to use, and it's
 configured not in cassandra.yaml but when you create the keyspace.
 
 Basically, you define your topology in cassandra-topology.yaml (where you
 manually set which node is in which DC, which you can really just see as
 assigning nodes to named groups) and then you can define the replication
 factor for each DC (so if RF=1 on the one-node group and 0 on the group of
 other nodes, C* will gladly honor it and store no data on the nodes of the
 other group).
 
 --
 Sylvain

Thank you so much, that's clear and helpful.  I appreciate you taking the time 
to explain it.

-Colin-
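
A rough sketch of the "one datacentre per node" idea raised earlier in this
exchange, assuming it works the way Sylvain's description of named groups
suggests. It is written in Python only to generate text. Two things here
are assumptions and not taken from the thread: the topology lines use the
"ip=DC:RACK" format of PropertyFileSnitch's cassandra-topology.properties
(not the YAML file named above), and the names dc_nodeN, data_nodeN and
rack1 are invented for illustration.

```python
def one_dc_per_node(node_ips):
    """Return (topology_text, keyspace_statements) for a one-DC-per-node layout.

    Each node gets its own datacenter name and its own keyspace whose only
    replica lives in that datacenter. All names are hypothetical.
    """
    topology_lines = []
    statements = []
    for index, ip in enumerate(node_ips, start=1):
        dc = "dc_node%d" % index
        # PropertyFileSnitch-style entry: <ip>=<datacenter>:<rack>
        topology_lines.append("%s=%s:rack1" % (ip, dc))
        # A datacenter that is not listed receives no replicas at all.
        statements.append(
            "CREATE KEYSPACE data_node%d WITH replication = "
            "{'class': 'NetworkTopologyStrategy', '%s': 1};" % (index, dc)
        )
    return "\n".join(topology_lines), statements


if __name__ == "__main__":
    topology, statements = one_dc_per_node(["10.0.0.1", "10.0.0.2"])
    print(topology)
    for statement in statements:
        print(statement)
```

Under these assumptions, which node holds a piece of data is decided by
which keyspace it is written to, so no custom partitioner is involved.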


Re: Setting up Cassandra to store on a specific node and not replicate

2013-12-18 Thread Robert Coli
On Wed, Dec 18, 2013 at 2:44 AM, Sylvain Lebresne sylv...@datastax.com wrote:

 As Janne said, you could still have hints being written by other nodes if
 the one storage node is dead, but you can set the system
 property cassandra.maxHintTTL to 0 to disable hints.


If one uses a Token Aware client with RF=1, that would seem to preclude
hinting even without disabling HH for the entire system; if the coordinator
is always the single replica, why would it send a copy anywhere else?

=Rob
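
The reasoning in this exchange can be illustrated with a toy model in
Python. It is a deliberate simplification and not Cassandra's actual
implementation: it assumes only that a live coordinator stores a hint for
each replica it cannot reach, and only while hinted handoff is enabled. The
function name hints_written and the node names "A" and "B" are
hypothetical.

```python
def hints_written(coordinator, replicas, live_nodes, hints_enabled=True):
    """Return (hint_holder, target_replica) pairs for one write.

    Toy model: a live coordinator keeps a hint for every replica that is
    down, provided hinted handoff is enabled. A dead coordinator cannot
    accept the write at all, so it produces no hints.
    """
    if not hints_enabled or coordinator not in live_nodes:
        return []
    return [(coordinator, r) for r in replicas if r not in live_nodes]


if __name__ == "__main__":
    # RF=1, token-aware client: the coordinator is the single replica.
    print(hints_written("A", ["A"], {"A", "B"}))
    # Replica A is down and the client does not fail over: the write fails.
    print(hints_written("A", ["A"], {"B"}))
    # Replica A is down and the client fails over to B: B holds a hint.
    print(hints_written("B", ["A"], {"B"}))
    # Same failover, but with hints disabled.
    print(hints_written("B", ["A"], {"B"}, hints_enabled=False))
```

In this model a hint only appears in the third case, which matches the
caveat about client failover raised in the replies.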