The access pattern is:
SELECT * FROM datacenter WHERE datacentername = '' AND publish > $time AND publish < $time
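
To make the trade-off concrete, here is a rough sketch of how that range query might look if the partition key carried a hash-derived bucket instead of a time prefix, i.e. PRIMARY KEY ((datacentername, bucket), publish). The `bucket` column, `NUM_BUCKETS`, and all helper names are hypothetical and not part of the schema in this thread. Writes would spread over the buckets at any moment, but every range read has to visit each bucket and merge the results, and partitions still grow without bound over time.

```python
import heapq
import zlib
from datetime import datetime, timezone

NUM_BUCKETS = 16  # assumption: would need tuning to cluster size and write rate

# Hypothetical read statement against a table with an extra `bucket` column.
READ_CQL = (
    "SELECT * FROM datacenter "
    "WHERE datacentername = ? AND bucket = ? "
    "AND publish > ? AND publish < ?"
)


def bucket_for(publish: datetime) -> int:
    """Derive a stable bucket from the publish timestamp (illustrative scheme)."""
    millis = int(publish.timestamp() * 1000)
    return zlib.crc32(str(millis).encode("ascii")) % NUM_BUCKETS


def read_statements(name: str, start: datetime, end: datetime):
    """Build one (cql, params) pair per bucket; a range read must hit them all."""
    return [(READ_CQL, (name, b, start, end)) for b in range(NUM_BUCKETS)]


def merge_by_publish(per_bucket_rows):
    """Merge per-bucket results, each already sorted by publish, into one list."""
    return list(heapq.merge(*per_bucket_rows, key=lambda row: row["publish"]))


if __name__ == "__main__":
    start = datetime(2016, 8, 31, 0, 0, tzinfo=timezone.utc)
    end = datetime(2016, 8, 31, 12, 0, tzinfo=timezone.utc)
    for cql, params in read_statements("dc1", start, end)[:2]:
        print(cql, params)
```

The statements are only built here, not executed; running them and feeding the per-bucket rows to merge_by_publish is left to whichever driver is in use.
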
On Wed, Aug 31, 2016 at 8:37 PM, Carlos Alonso <i...@mrcalonso.com> wrote:

> Maybe a good question could be:
>
> Which is your access pattern to this data?
>
> Carlos Alonso | Software Engineer | @calonso <https://twitter.com/calonso>
>
> On 31 August 2016 at 11:47, Stone Fang <cnstonef...@gmail.com> wrote:
>
>> Hi all,
>> I have some questions on how to define a clustering key.
>>
>> I have a table like this:
>>
>> CREATE TABLE datacenter (
>>     datacentername varchar,
>>     publish timestamp,
>>     value varchar,
>>     PRIMARY KEY (datacentername, publish)
>> );
>>
>> *Issues:*
>> There are only two datacenters, so the data would only have two
>> partitions and be stored on two nodes. I want to spread the data
>> evenly around the cluster.
>>
>> Taking this post for reference:
>> http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling
>>
>> CREATE TABLE datacenter (
>>     datacentername varchar,
>>     publish_pre text,
>>     publish timestamp,
>>     value varchar,
>>     PRIMARY KEY ((datacentername, publish_pre), publish)
>> );
>>
>> publish_pre is from 1~12 hours. *But the workload is high. I don't want
>> all of the workload to be inserted into one node within an hour.*
>>
>> I have no idea how to define the partition key so that the data is spread
>> evenly around the cluster and the partitions are not split by time, which
>> means that data should not all be inserted into one node during a certain
>> time window.
>>
>> Thanks,
>> stone