Re: two dimensional slicing

aaron morton Mon, 30 Jan 2012 10:34:52 -0800

(Not trolling) but do you have any ideas on how?

The token produced by the partitioner is used as the key in the distributed 
hash table so we can map keys to nodes and evenly distribute load. If the 
range of tokens for the DHT is infinite, it's difficult to evenly map them to a 
finite set of nodes.
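
To make the token-to-node mapping concrete, here is a minimal Python sketch. It
is not Cassandra's code: the 2**127 token space, the MD5-based token, the
16-byte order-preserving token, and the names hashed_token, ordered_token,
node_tokens, owner and load_per_node are all assumptions made for illustration.
It places four nodes at evenly spaced tokens on a ring and counts how many
sequential keys each node would own under a hashing partitioner versus an
order-preserving one.

```python
import bisect
import hashlib
from collections import Counter

RING_SIZE = 2 ** 127  # assumed finite token space for this sketch


def hashed_token(key):
    """Hash-style token: MD5 of the key, reduced into the ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def ordered_token(key):
    """Order-preserving token: the first 16 key bytes read as an integer."""
    raw = key.encode("utf-8")[:16].ljust(16, b"\x00")
    return int.from_bytes(raw, "big") >> 1  # shift so it fits in 127 bits


def node_tokens(node_count):
    """Evenly spaced tokens, one per node."""
    return [(i * RING_SIZE) // node_count for i in range(node_count)]


def owner(token, tokens):
    """Index of the first node whose token is >= the key token, wrapping."""
    return bisect.bisect_left(tokens, token) % len(tokens)


def load_per_node(keys, token_fn, node_count=4):
    """Count how many keys each node owns for a given token function."""
    tokens = node_tokens(node_count)
    return Counter(owner(token_fn(k), tokens) for k in keys)


# Sequential keys, like an increasing version number.
keys = [f"version-{i:08d}" for i in range(10000)]
hashed_load = load_per_node(keys, hashed_token)
ordered_load = load_per_node(keys, ordered_token)
print("hashed :", sorted(hashed_load.items()))
print("ordered:", sorted(ordered_load.items()))
```

In this sketch the hashed tokens spread the keys over all four nodes, while the
order-preserving tokens put every key on one node because the keys share a
prefix. That is the hot spot discussed further down the thread.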


So…

If you know that the number of DHT keys (and so row keys) is finite then it is 
easier to use the BOP. 

Or, if you know that the row keys are something like a time series, you could 
use the sort of approach used with horizontal partitioning in an RDBMS and run 
a sliding window of nodes. Every month, drop the oldest partition / node off 
the end and add a new one for the next month. 
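
A rough Python sketch of that sliding window, just to show the bookkeeping. The
SlidingWindow class, the "YYYY-MM" partition labels and the helper names are
hypothetical; actually moving data and adding or decommissioning nodes is left
out entirely.

```python
from collections import deque
from datetime import date


def month_label(day):
    """Partition label for a date, e.g. '2012-01'."""
    return f"{day.year:04d}-{day.month:02d}"


def next_month(label):
    """Label of the month following the given 'YYYY-MM' label."""
    year, month = map(int, label.split("-"))
    if month == 12:
        return f"{year + 1:04d}-01"
    return f"{year:04d}-{month + 1:02d}"


class SlidingWindow:
    """Fixed number of monthly partitions, oldest first (hypothetical)."""

    def __init__(self, oldest, size):
        self.partitions = deque([oldest])
        while len(self.partitions) < size:
            self.partitions.append(next_month(self.partitions[-1]))

    def partition_for(self, day):
        """Partition holding this date, or None if it is outside the window."""
        label = month_label(day)
        return label if label in self.partitions else None

    def roll(self):
        """Drop the oldest partition and add one for the next month."""
        added = next_month(self.partitions[-1])
        dropped = self.partitions.popleft()
        self.partitions.append(added)
        return dropped, added


window = SlidingWindow("2011-11", size=3)
print(list(window.partitions))
print(window.partition_for(date(2012, 1, 30)))
print(window.roll())
print(list(window.partitions))
```

Each call to roll() is the monthly step: the partition that falls off the front
is the one whose node can be retired, and the new label is the one the next node
would take.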

Just some thoughts.
A

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/01/2012, at 7:19 PM, Terje Marthinussen wrote:

> 
> 
> On Sun, Jan 29, 2012 at 7:26 PM, aaron morton <aa...@thelastpickle.com> wrote:
>> and compare them, but at this point I need to focus on one to get
>> things working, so I'm trying to make a best initial guess.
> I would go for RP then. BOP may look like less work to start with but it 
> *will* bite you later. If you use an increasing version number as a key you 
> will get a hot spot. Get it working with RP and Standard CFs, accept the 
> extra lookups, and then see where you are performance / complexity wise. 
> Cassandra can be pretty fast.
> 
> Of course, there is no guarantee that it will bite you.
> 
> Whatever data hotspot you may get may very well be minor vs. the advantage of 
> slicing continuous blocks of data on a single server vs. random bits and 
> pieces all over the place.
> 
> For instance, there are many large data repositories out there of analytic 
> data which only have a few queries per hour. BOP will most likely have no 
> performance problem at all for many of these. Indeed, it may be much faster 
> than the alternatives.
> 
> BOP is very useful and powerful for many things and saves a fair chunk of 
> development time vs. the alternatives when you can use it.
> 
> If we really want everybody to stop using it, we should change Cassandra so 
> that by default it can provide the same function in some other way without 
> adding days and maybe weeks of development and extra complexity to your project.
>  
> Terje
> 
>
