Jon,

Great article. Thank you. (I have nothing to do with this issue, but I appreciate the nuggets of information I glean from the list.)
Regards,
Eric

On Tue, Apr 17, 2018 at 10:57 PM Jonathan Haddad <j...@jonhaddad.com> wrote:

> To add to what Nate suggested, we have an entire blog post on scaling time
> series data models:
>
> http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html
>
> Jon
>
> On Tue, Apr 17, 2018 at 7:39 PM Nate McCall <n...@thelastpickle.com> wrote:
>
>> I disagree. Create date as a raw integer is an excellent surrogate for
>> controlling time series "buckets" as it gives you complete control over
>> the granularity. You can even have multiple granularities in the same
>> table. Remember that partition key "misses" in Cassandra are pretty
>> lightweight, as they won't make it past the bloom filter on the read path.
>>
>> On Wed, Apr 18, 2018 at 10:00 AM, Javier Pareja <pareja.jav...@gmail.com>
>> wrote:
>>
>>> Hi David,
>>>
>>> Could you describe why you chose to include the create date in the
>>> partition key? If the vin is enough "partitioning", meaning that the
>>> size (number of rows x size of row) of each partition is less than
>>> 100MB, then remove the date and just use the create_time, because the
>>> date is already included in that column anyway.
>>>
>>> For example, if columns "a" and "b" (from your table) are of max 256
>>> UTF8 characters, then you can have approx 100MB / (2*256*2 bytes) =
>>> 100,000 rows per partition. You can actually have many more, but you
>>> don't want to go much higher for performance reasons.
>>>
>>> If this is not enough you could use create_month instead of
>>> create_date, for example, to reduce the partition size while not being
>>> too granular.
>>>
>>> On Tue, 17 Apr 2018, 22:17 Nate McCall, <n...@thelastpickle.com> wrote:
>>>
>>>> Your table design will work fine as you have appropriately bucketed
>>>> by an integer-based 'create_date' field.
>>>>
>>>> Your goal for this refactor should be to remove the "IN" clause from
>>>> your code. This will move the rollup of multiple partition keys being
>>>> retrieved into the client instead of relying on the coordinator
>>>> assembling the results. You have to do more work and add some
>>>> complexity, but the trade-off will be much higher performance, as you
>>>> are removing the single coordinator as the bottleneck.
>>>>
>>>> On Tue, Apr 17, 2018 at 10:05 PM, Xiangfei Ni <xiangfei...@cm-dt.com>
>>>> wrote:
>>>>
>>>>> Hi Nate,
>>>>>
>>>>> Thanks for your reply!
>>>>>
>>>>> Is there another way to design this table to meet this requirement?
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> 倪项菲 / David Ni
>>>>> 中移德电网络科技有限公司
>>>>> Virtue Intelligent Network Ltd, co.
>>>>> Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
>>>>> Mob: +86 13797007811 | Tel: +86 27 5024 2516
>>>>>
>>>>> From: Nate McCall <n...@thelastpickle.com>
>>>>> Sent: April 17, 2018 7:12
>>>>> To: Cassandra Users <user@cassandra.apache.org>
>>>>> Subject: Re: Time serial column family design
>>>>>
>>>>> Select * from test where vin = 'ZD41578123DSAFWE12313' and create_date
>>>>> in (20180416, 20180415, 20180414, 20180413, 20180412, ...);
>>>>>
>>>>> But this makes the CQL query very long, and I don't know whether
>>>>> there is a limit on the length of a CQL statement.
>>>>>
>>>>> Please give me some advice, thanks in advance.
>>>>>
>>>>> Using the SELECT ... IN syntax means that:
>>>>>
>>>>> - the driver will not be able to route the queries to the nodes which
>>>>>   have the partition
>>>>>
>>>>> - a single coordinator must scatter-gather the query and results
>>>>>
>>>>> Break this up into a series of single statements using the
>>>>> executeAsync method and gather the results via something like Futures
>>>>> in Guava or similar.
>>>>
>>>> --
>>>> -----------------
>>>> Nate McCall
>>>> Wellington, NZ
>>>> @zznate
>>>>
>>>> CTO
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
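
For anyone following along, Nate's suggestion in the thread above (one single-partition statement per create_date bucket, issued asynchronously, with the results gathered on the client) could be sketched roughly as below. This is only a sketch under assumptions, not the driver's API: BucketFanOut, dateBuckets, fetchAll and the queryOneBucket callback are hypothetical names, the callback stands in for a driver call such as executeAsync, and the JDK's CompletableFuture is used in place of Guava Futures so the example runs without extra dependencies.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.BiFunction;
import java.util.stream.Collectors;

final class BucketFanOut {

    // Builds integer day buckets in the thread's yyyyMMdd form, newest first.
    static List<Integer> dateBuckets(LocalDate newest, int days) {
        List<Integer> buckets = new ArrayList<>();
        for (int i = 0; i < days; i++) {
            String text = newest.minusDays(i).format(DateTimeFormatter.BASIC_ISO_DATE);
            buckets.add(Integer.parseInt(text));
        }
        return buckets;
    }

    // Starts one query per bucket, then merges the rows on the client.
    // queryOneBucket is a hypothetical stand-in for an async driver call that
    // runs "... WHERE vin = ? AND create_date = ?" for a single partition.
    static <R> CompletableFuture<List<R>> fetchAll(
            String vin,
            List<Integer> buckets,
            BiFunction<String, Integer, CompletableFuture<List<R>>> queryOneBucket) {
        // All queries are in flight before we wait on any of them.
        List<CompletableFuture<List<R>>> inFlight = buckets.stream()
                .map(bucket -> queryOneBucket.apply(vin, bucket))
                .collect(Collectors.toList());
        return CompletableFuture
                .allOf(inFlight.toArray(new CompletableFuture[0]))
                // Every future is complete here, so join() does not block.
                .thenApply(ignored -> inFlight.stream()
                        .flatMap(f -> f.join().stream())
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        // Fake backend so the sketch runs without a cluster: one row per bucket.
        BiFunction<String, Integer, CompletableFuture<List<String>>> fake =
                (vin, bucket) -> CompletableFuture.supplyAsync(
                        () -> List.of(vin + "@" + bucket));
        List<Integer> buckets = dateBuckets(LocalDate.of(2018, 4, 16), 5);
        List<String> rows = fetchAll("ZD41578123DSAFWE12313", buckets, fake).join();
        rows.forEach(System.out::println);
    }
}
```

With a real session, the callback would bind vin and create_date to a prepared single-partition SELECT, so a token-aware driver can route each statement to a replica that owns that partition instead of funnelling everything through one coordinator. For long date ranges it is probably worth capping the number of queries in flight at once.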