I believe ByteOrderedPartitioner is being deprecated, and for good reason.  I 
would look at what you could achieve with wide rows and the Murmur3Partitioner.
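The wide-row alternative could look roughly like the sketch below: partition rows by a coarse time bucket so Murmur3Partitioner spreads partitions evenly across the ring, while a clustering column keeps events time-ordered within each partition. The table layout, bucket size, and names here are illustrative assumptions, not something proposed in the thread.

```python
# Hypothetical wide-row layout (shown as a CQL comment, logic in Python):
#
# CREATE TABLE events (
#     bucket   bigint,     -- coarse time bucket, e.g. one hour
#     ts       timeuuid,   -- clustering column: ordered within the bucket
#     payload  blob,
#     PRIMARY KEY (bucket, ts)
# );
#
# With Murmur3Partitioner the bucket value is hashed, so consecutive
# hours land on (potentially) different nodes instead of piling onto one.

BUCKET_MS = 3600 * 1000  # one-hour buckets -- an assumed granularity

def bucket_for(ts_ms: int) -> int:
    """Map an event timestamp (milliseconds) to its partition bucket."""
    return ts_ms // BUCKET_MS

print(bucket_for(0))          # 0
print(bucket_for(3_600_000))  # 1  (next hour, different partition)
```

Readers can then scan a bucket in clustering order and resume from the last `ts` they saw, without needing ordered partitioning.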



--
Colin
320-221-9531


> On Jun 6, 2014, at 5:27 PM, Kevin Burton <bur...@spinn3r.com> wrote:
> 
> We have the requirement to have clients read from our tables while they're 
> being written.
> 
> Basically, any write that we make to cassandra needs to be sent out over the 
> Internet to our customers.
> 
> We also need them to resume so if they go offline, they can just pick up 
> where they left off.
> 
> They need to do this in parallel, so if we have 20 cassandra nodes, they can 
> have 20 readers each efficiently (and without coordination) reading from our 
> tables.
> 
> Here's how we're planning on doing it.
> 
> We're going to use the ByteOrderedPartitioner .
> 
> I'm writing with the timestamp as the primary key; in practice, however, this 
> would yield hotspots.
> 
> (I'm also aware that time isn't a very good primary key in a distributed 
> system, since I can easily have collisions, so we're going to use a scheme 
> similar to a UUID to make it unique per writer.)
> 
> One node would take all the load, followed by the next node, etc.
> 
> So my plan to avoid this is to prefix a slice ID to the timestamp.  This way 
> each piece of content has a unique ID, but the prefix will place it on a node.
> 
> The slice ID is just a byte… so this means there are 256 buckets in which I 
> can place data.  
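The key layout being described could be sketched as follows: a one-byte slice prefix, then a big-endian timestamp, then a per-writer suffix for uniqueness. The function name and the two-byte writer ID are illustrative assumptions; the thread only specifies the general shape.

```python
import struct

def make_key(slice_id: int, ts_ms: int, writer_id: bytes) -> bytes:
    """Build a ByteOrderedPartitioner-friendly key:
    [1-byte slice][8-byte big-endian ms timestamp][writer suffix].

    Big-endian packing means byte order equals chronological order
    within a slice, which is what ordered range scans rely on.
    """
    assert 0 <= slice_id <= 255
    return struct.pack(">BQ", slice_id, ts_ms) + writer_id

k1 = make_key(7, 1_000, b"wA")
k2 = make_key(7, 2_000, b"wA")
assert k1 < k2  # within a slice, keys sort by timestamp
```

Two writers emitting the same timestamp in the same slice still produce distinct keys because the writer suffix differs.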
> 
> This means I can have clients each start with a slice, and a timestamp, and 
> page through the data with tokens.
> 
> This way I can have a client reading with 256 threads from 256 regions in the 
> cluster, in parallel, without any hot spots.
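The resume step above can be sketched as a key-range computation: each reader owns one slice byte and scans from just past its checkpoint timestamp to the end of that slice. This is a simplified key-range view of the token paging the thread proposes; the function name and range semantics are illustrative assumptions.

```python
import struct

def resume_range(slice_id: int, last_ts_ms: int) -> tuple[bytes, bytes]:
    """Return (start, end) key bounds for one reader's next scan,
    assuming the [slice byte][big-endian timestamp] key layout.

    start: first key strictly after the checkpoint timestamp.
    end:   last possible key in this slice (max 64-bit timestamp).
    """
    start = struct.pack(">BQ", slice_id, last_ts_ms + 1)
    end = struct.pack(">BQ", slice_id, 0xFFFFFFFFFFFFFFFF)
    return start, end

# A reader that went offline after seeing ts=1000 in slice 3
# resumes here, without coordinating with any other reader.
start, end = resume_range(3, 1_000)
assert start < end
```

Because each slice maps to a disjoint key range, the 256 readers never overlap, which is what makes the coordination-free parallelism work.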
> 
> Thoughts on this strategy?  
> 
> -- 
> Founder/CEO Spinn3r.com
> Location: San Francisco, CA
> Skype: burtonator
> blog: http://burtonator.wordpress.com
> … or check out my Google+ profile
> 
> War is peace. Freedom is slavery. Ignorance is strength. Corporations are 
> people.
