Re: Reply: Time serial column family design

Eric Plowe Tue, 17 Apr 2018 21:05:40 -0700

Jon,

Great article. Thank you. (I have nothing to do with this issue, but I
appreciate nuggets of information I glean from the list)


Regards,

Eric
On Tue, Apr 17, 2018 at 10:57 PM Jonathan Haddad <j...@jonhaddad.com> wrote:

> To add to what Nate suggested, we have an entire blog post on scaling time
> series data models:
>
>
> http://thelastpickle.com/blog/2017/08/02/time-series-data-modeling-massive-scale.html
>
> Jon
>
>
> On Tue, Apr 17, 2018 at 7:39 PM Nate McCall <n...@thelastpickle.com>
> wrote:
>
>> I disagree. Create date as a raw integer is an excellent surrogate for
>> controlling time series "buckets" as it gives you complete control over the
>> granularity. You can even have multiple granularities in the same table -
>> remember that partition key "misses" in Cassandra are pretty lightweight as
>> they won't make it past the bloom filter on the read path.
>>
>> On Wed, Apr 18, 2018 at 10:00 AM, Javier Pareja <pareja.jav...@gmail.com>
>> wrote:
>>
>>> Hi David,
>>>
>>> Could you describe why you chose to include the create date in the
>>> partition key? If the vin is enough "partitioning", meaning that the size
>>> (number of rows x size of row) of each partition is less than 100MB, then
>>> remove the date and just use the create_time, because the date is already
>>> included in that column anyway.
>>>
>>> For example, if columns "a" and "b" (from your table) hold at most 256
>>> UTF-8 characters each, then you can have approx
>>> 100MB / (2 columns * 256 chars * 2 bytes) = 100,000 rows per partition.
>>> You can actually have many more, but you don't want to go much higher for
>>> performance reasons.
>>>
>>> If this is not enough you could use create_month instead of create_date,
>>> for example, to reduce the partition size while not being too granular.
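
As a rough check of the estimate above, here is a minimal sketch of the arithmetic. It takes 100MB as decimal megabytes and keeps the message's figure of 2 bytes per character, which is an assumption (UTF-8 uses 1 to 4 bytes per character). `month_bucket` is a hypothetical helper for illustration, not part of the original table.

```python
# Rough partition sizing, following the estimate in the message above.
TARGET_PARTITION_BYTES = 100 * 1000 * 1000  # 100MB target, decimal megabytes (assumption)
TEXT_COLUMNS = 2      # columns "a" and "b"
MAX_CHARS = 256       # maximum characters per column
BYTES_PER_CHAR = 2    # the message's figure; real UTF-8 width varies per character

row_bytes = TEXT_COLUMNS * MAX_CHARS * BYTES_PER_CHAR
rows_per_partition = TARGET_PARTITION_BYTES // row_bytes


def month_bucket(create_date):
    """Turn an integer yyyymmdd bucket into a coarser yyyymm bucket."""
    return create_date // 100


print(row_bytes, rows_per_partition, month_bucket(20180416))
```

This ignores per-row and per-cell storage overhead, so it is only an order-of-magnitude guide to when a coarser or finer bucket is needed.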
>>>
>>>
>>> On Tue, 17 Apr 2018, 22:17 Nate McCall, <n...@thelastpickle.com> wrote:
>>>
>>>> Your table design will work fine as you have appropriately bucketed by
>>>> an integer-based 'create_date' field.
>>>>
>>>> Your goal for this refactor should be to remove the "IN" clause from
>>>> your code. This will move the rollup of multiple partition keys being
>>>> retrieved into the client instead of relying on the coordinator assembling
>>>> the results. You have to do more work and add some complexity, but the
>>>> trade off will be much higher performance as you are removing the single
>>>> coordinator as the bottleneck.
>>>>
>>>> On Tue, Apr 17, 2018 at 10:05 PM, Xiangfei Ni <xiangfei...@cm-dt.com>
>>>> wrote:
>>>>
>>>>> Hi Nate,
>>>>>
>>>>>     Thanks for your reply!
>>>>>
>>>>>     Is there another way to design this table to meet this requirement?
>>>>>
>>>>>
>>>>>
>>>>> Best Regards,
>>>>>
>>>>>
>>>>>
>>>>> 倪项菲 / David Ni
>>>>>
>>>>> 中移德电网络科技有限公司
>>>>>
>>>>> Virtue Intelligent Network Ltd, co.
>>>>>
>>>>> Add: 2003,20F No.35 Luojia creative city,Luoyu Road,Wuhan,HuBei
>>>>>
>>>>> Mob: +86 13797007811|Tel: + 86 27 5024 2516
>>>>>
>>>>>
>>>>>
>>>>> From: Nate McCall <n...@thelastpickle.com>
>>>>> Sent: April 17, 2018 7:12
>>>>> To: Cassandra Users <user@cassandra.apache.org>
>>>>> Subject: Re: Time serial column family design
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> Select * from test where vin = 'ZD41578123DSAFWE12313' and create_date
>>>>> in (20180416, 20180415, 20180414, 20180413, 20180412, ...);
>>>>>
>>>>> But this makes the CQL query very long, and I don't know whether
>>>>> there is a limit on the length of a CQL statement.
>>>>>
>>>>> Please give me some advice. Thanks in advance.
>>>>>
>>>>>
>>>>>
>>>>> Using the SELECT ... IN syntax means that:
>>>>>
>>>>> - the driver will not be able to route the queries to the nodes which
>>>>> have the partition
>>>>>
>>>>> - a single coordinator must scatter-gather the query and results
>>>>>
>>>>>
>>>>>
>>>>> Break this up into a series of single statements using the
>>>>> executeAsync method and gather the results via something like Futures in
>>>>> Guava or similar.
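
A minimal sketch of that fan-out, written against the DataStax Python driver's `session.prepare` and `session.execute_async` (the Python counterpart of the Java `executeAsync` mentioned above). The table and column names come from the quoted query. `date_buckets`, `fetch_range`, `FakeSession` and `FakeFuture` are hypothetical names; the two fake classes are stand-ins so the sketch runs without a cluster and are not driver classes.

```python
from datetime import date, timedelta

# Table and column names come from the query quoted in the thread.
CQL = "SELECT * FROM test WHERE vin = ? AND create_date = ?"


def date_buckets(start, end):
    """Return integer yyyymmdd buckets from end back to start, inclusive."""
    days = (end - start).days
    return [int((end - timedelta(days=i)).strftime("%Y%m%d")) for i in range(days + 1)]


def fetch_range(session, vin, start, end):
    """Issue one single-partition query per bucket and merge the rows client-side."""
    prepared = session.prepare(CQL)
    # Send every query before waiting on any of them.
    futures = [session.execute_async(prepared, (vin, b)) for b in date_buckets(start, end)]
    rows = []
    for future in futures:
        rows.extend(future.result())  # blocks until that query completes
    return rows


# Stand-in objects so the sketch runs without a cluster (hypothetical, not driver classes).
class FakeFuture:
    def __init__(self, rows):
        self._rows = rows

    def result(self):
        return self._rows


class FakeSession:
    def prepare(self, cql):
        return cql

    def execute_async(self, statement, params):
        vin, bucket = params
        return FakeFuture([(vin, bucket)])


rows = fetch_range(FakeSession(), "ZD41578123DSAFWE12313", date(2018, 4, 12), date(2018, 4, 16))
print([bucket for _, bucket in rows])
```

With a real session, each statement targets a single partition, so a token-aware load balancing policy can send it straight to a replica rather than through one coordinator. For very long date ranges it would be sensible to cap the number of in-flight requests.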
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> -----------------
>>>> Nate McCall
>>>> Wellington, NZ
>>>> @zznate
>>>>
>>>> CTO
>>>> Apache Cassandra Consulting
>>>> http://www.thelastpickle.com
>>>>
>>>
>>
>
