B would work better in the case where you need to do sequential or ranged
style reads on the id, particularly if the id has any significant sparseness
(e.g., id is a timeuuid).  You can compute the buckets and read entire
buckets within your range.  However, if you're doing random access by id,
you'll get a lot of Bloom filter true positives (on the partition key) for
partitions in which the requested clustering key still doesn't exist.
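As a minimal sketch of that bucketed ranged read (the bucket width and helper names here are my own illustrative assumptions, with buckets derived from a UTC timestamp):

```python
from datetime import datetime, timedelta, timezone

BUCKET_WIDTH = timedelta(hours=1)  # assumed bucket granularity

def bucket_for(ts: datetime) -> int:
    """Map a UTC timestamp to its time-bound bucket number."""
    return int(ts.timestamp()) // int(BUCKET_WIDTH.total_seconds())

def buckets_in_range(start: datetime, end: datetime) -> list[int]:
    """All partition buckets covering [start, end]; a ranged read
    fetches each of these buckets in full."""
    return list(range(bucket_for(start), bucket_for(end) + 1))
```

A query for everything between 10:00 and 12:30, say, would then read three hour-buckets in full rather than probing individual ids.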

We use both types of model for different situations.  In one, our reads are
totally random access, and we just use id as the sole key; in the other, we
need to reassemble all objects that fall within a range, but the object IDs
are reasonably sparse, so we have a time-bound bucket as the partition key
and the id as the clustering key.

The appropriate density of rows in your partition/bucket will depend on
your typical read patterns; aim for a bucket that is some fraction of your
typical read range (e.g., if you typically query for all objects within a
day, the bucket might be 1 or 2 hours; if you typically query by hour,
perhaps the bucket is 10 minutes, etc.).  Practically speaking, depending
on your hardware, you'll want to keep your partitions under anywhere from
a few hundred KB to a MB if possible, just to reduce GC pressure and
improve other operations like repair.
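To make that sizing concrete, here's a back-of-the-envelope sketch (all rates and sizes below are illustrative assumptions, not measurements from any real table):

```python
def partition_size_bytes(bucket_seconds: int, rows_per_minute: float,
                         row_bytes: int) -> int:
    """Rough on-disk size estimate for one time bucket."""
    return int(bucket_seconds / 60 * rows_per_minute * row_bytes)

# Assumptions: callers usually query a day at a time, we want roughly a
# dozen buckets per typical query, and the table sees ~2 rows/minute at
# ~100 bytes per row.
bucket_seconds = 24 * 3600 // 12                        # 2-hour buckets
estimate = partition_size_bytes(bucket_seconds, 2, 100)
print(estimate)  # 24000 bytes, comfortably under the ~1 MB guideline
```

If the estimate came out near or above 1 MB, that would be the signal to shrink the bucket width.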

On Fri Dec 05 2014 at 11:04:22 AM DuyHai Doan <doanduy...@gmail.com> wrote:

> Another argument for table A is that it leverages the Bloom filter heavily
> for fast lookups. If the result is negative, there's no disk hit; otherwise
> at most 1 or 2 disk hits, depending on the fp chance.
>
> Compaction also works better on skinny partitions.
>
> On Fri, Dec 5, 2014 at 6:33 PM, Tyler Hobbs <ty...@datastax.com> wrote:
>
>>
>> On Fri, Dec 5, 2014 at 11:14 AM, Robert Wille <rwi...@fold3.com> wrote:
>>
>>>
>>>  And let's say that bucket is computed as id / N. For analysis purposes,
>>> let's assume I have 100 million ids to store.
>>>
>>>  Table a is obviously going to have a larger bloom filter. That’s a
>>> clear negative.
>>>
>>
>> That's true, *but*, if you frequently ask for rows that do not exist,
>> Table A will have few BF false positives, while Table B will almost always
>> get a "hit" from the BF and have to look into the SSTable to see that the
>> requested row doesn't actually exist.
>>
>>
>>>
>>>  When I request a record, table a will have less data to load from
>>> disk, so that seems like a positive.
>>>
>>
>> Correct.
>>
>>
>>>
>>>  Table a will never have its columns scattered across multiple
>>> SSTables, but table b might. If I only want one row from a partition in
>>> table b, does fragmentation matter (I think probably not, but I’m not sure)?
>>>
>>
>> Yes, fragmentation can matter.  Cassandra knows the min and max
>> clustering column values for each SSTable, so it can use those to narrow
>> down the set of SSTables it needs to read if you request a specific
>> clustering column value.  However, in your example, this isn't likely to
>> narrow things down much, so it will have to check many more SSTables.
>>
>>
>>>
>>>  It’s not clear to me which will fit more efficiently on disk, but I
>>> would guess that table a wins.
>>>
>>
>> They're probably close enough not to matter very much.
>>
>>
>>>
>>>  Smaller partitions mean sending less data during repair, but I
>>> suspect that computing the Merkle tree for a table with more partitions
>>> might mean more overhead; that's only a guess. Which one repairs more
>>> efficiently?
>>>
>>
>> Table A repairs more efficiently by far.  Currently repair must repair
>> entire partitions when they differ.  It cannot repair individual rows
>> within a partition.
>>
>>
>>>
>>>  In your opinion, which one is best and why? If you think table b is
>>> best, what would you choose N to be?
>>>
>>
>> Table A, hands down.  Here's why: you should model your tables to fit
>> your queries.  If you're doing a basic K/V lookup, model it like table A.
>> People recommend wide partitions because many (if not most) queries are
>> best served by that type of model, so if you're not using wide partitions,
>> it's a sign that something might be wrong.  However, there are certainly
>> plenty of use cases where single-row partitions are fine.
>>
>>
>> --
>> Tyler Hobbs
>> DataStax <http://datastax.com/>
>>
>
>
