Re: Data model question, storing Queue Message

Morgan Segalis Mon, 30 Apr 2012 05:23:08 -0700

Hi Samal,

Thanks for pointing out the TTL feature; I wasn't aware it existed.


Daily partitions will be narrower than monthly partitions (about 30 times 
narrower, give or take ;-) ).
Each day should hold something like 100 000 messages, and most of them will be 
retrieved, and therefore deleted, before the TTL kicks in.

On 30 Apr 2012, at 13:16, samal wrote:

> 
> 
> On Mon, Apr 30, 2012 at 4:25 PM, Morgan Segalis <msega...@gmail.com> wrote:
> Hi Aaron,
> 
> Thank you for your answer; I was beginning to think that my question would 
> never be answered ;-)
> 
> Actually, this is what I was going for, except for one thing: instead of 
> partitioning rows per month, I thought about partitioning per day. That way, 
> every day I launch the cleaning tool and it deletes the day from X months 
> earlier.
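A minimal sketch of the daily cleaning step described above, assuming the daily rows end in a YYYYMMDD suffix (an assumption; the thread does not fix a key format). The hypothetical `months_earlier` helper finds the same calendar day X months back, clamped to the end of shorter months, and `day_key_to_purge` formats it as the suffix the job would delete.

```python
import calendar
from datetime import date


def months_earlier(today: date, months: int) -> date:
    """Same calendar day `months` months before `today`, clamped to month end."""
    # Work in a single month count so year boundaries need no special case.
    total = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(today.day, last_day))


def day_key_to_purge(today: date, months: int) -> str:
    """Hypothetical YYYYMMDD suffix of the daily row the cleaning job deletes."""
    return months_earlier(today, months).strftime("%Y%m%d")


if __name__ == "__main__":
    # Run on 30 Apr 2012 with a 3-month retention window.
    print(day_key_to_purge(date(2012, 4, 30), 3))  # 20120130
```

One caveat with month arithmetic: run daily in February 2013 with months=1, it never selects 29 to 31 January. Subtracting a fixed number of days instead (for example `today - timedelta(days=30 * months)`) selects every day exactly once.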
> 
> Use the column TTL feature: it removes the column once the TTL has expired 
> (no need for a manual job). 
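To illustrate the TTL suggestion, here is an in-memory stand-in (not a Cassandra client; the `TtlRow` class is hypothetical) for a row whose columns carry a TTL in seconds, set at write time. Once a column's TTL has passed, reads stop returning it, so nothing has to delete it explicitly. Cassandra also removes the expired data itself during compaction, which this sketch does not model.

```python
import time
from typing import Dict, Optional, Tuple


class TtlRow:
    """Hypothetical in-memory stand-in for one row whose columns may carry a TTL."""

    def __init__(self) -> None:
        # column name -> (value, absolute expiry time in seconds, or None)
        self._columns: Dict[str, Tuple[str, Optional[float]]] = {}

    def insert(self, name: str, value: str, ttl: Optional[int] = None,
               now: Optional[float] = None) -> None:
        """Write a column; `ttl` is in seconds, counted from the write."""
        now = time.time() if now is None else now
        expires = None if ttl is None else now + ttl
        self._columns[name] = (value, expires)

    def get_all(self, now: Optional[float] = None) -> Dict[str, str]:
        """Return only the columns whose TTL has not yet passed."""
        now = time.time() if now is None else now
        return {name: value
                for name, (value, expires) in sorted(self._columns.items())
                if expires is None or now < expires}
```

For a 30-day expiry you would pass `ttl=30 * 24 * 3600`. Messages a user fetches are still deleted explicitly; the TTL only catches the ones nobody retrieves.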
> 
> I guess that will reduce the workload drastically. Does it have any downside 
> compared to monthly partitioning?
> 
> Each row key belongs to a particular node, so depending on the size of your 
> data, partitioning by day or by month matters. Otherwise it can lead to fat 
> rows, which will cause problems for the system. 
> 
>  
> At one point I was going to do something like the Twissandra example: a CF 
> per user's queue, and another CF per day storing the ID of every message of 
> that day. That way, if I want to delete them, I only look into this row and 
> use the IDs to delete them from the user's queue CF… Is that a good way to do 
> it? Or should I stick with the first implementation?
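The two-CF idea above can be sketched in memory like this (hypothetical class and names, not Twissandra's actual code): every stored message is also recorded in its day's index row, and the daily purge reads that one row to find which user-queue columns to delete.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Tuple


class DayIndexedQueues:
    """Hypothetical in-memory sketch of the two-CF idea (not a Cassandra client).

    user_queue: row key = receiver ID, columns = message ID -> body
    day_index:  row key = YYYYMMDD,    columns = (receiver ID, message ID)
    """

    def __init__(self) -> None:
        self.user_queue: Dict[int, Dict[int, str]] = {}
        self.day_index: DefaultDict[str, List[Tuple[int, int]]] = defaultdict(list)

    def store(self, receiver_id: int, message_id: int, body: str, day: str) -> None:
        # Two writes per message: the queue column plus its index entry.
        self.user_queue.setdefault(receiver_id, {})[message_id] = body
        self.day_index[day].append((receiver_id, message_id))

    def fetch_and_delete(self, receiver_id: int) -> Dict[int, str]:
        # The user connects: hand over the whole queue and drop it.
        # Matching day_index entries are left behind (stale but harmless).
        return self.user_queue.pop(receiver_id, {})

    def purge_day(self, day: str) -> int:
        """Delete whatever is still queued for `day`, then drop the index row."""
        deleted = 0
        for receiver_id, message_id in self.day_index.pop(day, []):
            queue = self.user_queue.get(receiver_id)
            if queue is not None and queue.pop(message_id, None) is not None:
                deleted += 1
        return deleted
```

The cost is a second write per message, and index entries for messages the user has already fetched go stale; `purge_day` simply skips those. The column TTL approach avoids both.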
> 
> Best regards,
> 
> Morgan.
> 
> On 30 Apr 2012, at 05:52, aaron morton wrote:
> 
>> A message queue is often not a great use case for Cassandra. For information 
>> on how to handle heavy delete workloads, see 
>> http://www.datastax.com/dev/blog/leveled-compaction-in-apache-cassandra
>> 
>> It's hard to create a model without some idea of the data load, but I would 
>> suggest you start with:
>> 
>> CF: UserMessages
>> Key: ReceiverID
>> Columns : column name = TimeUUID ; column value = message ID and Body
>> 
>> That will order the messages by time. 
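A minimal in-memory sketch of the UserMessages layout above, assuming message times are given in Unix microseconds. The hypothetical `time_uuid` helper builds a version-1 (time-based) UUID carrying that timestamp, and sorting a row's columns by the embedded timestamp mimics how a TimeUUID comparator orders them. `UserMessages` is an illustrative name, not a client API.

```python
import random
import uuid
from typing import Dict, List, Tuple

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def time_uuid(unix_micros: int) -> uuid.UUID:
    """Version-1 UUID for a Unix timestamp in microseconds.

    Clock sequence and node are random, so two messages with the same
    timestamp still get distinct column names.
    """
    ts = unix_micros * 10 + UUID_EPOCH_OFFSET  # 100-ns ticks since 1582
    clock_seq = random.getrandbits(14)
    node = random.getrandbits(48) | 0x010000000000  # multicast bit: not a real MAC
    return uuid.UUID(fields=(
        ts & 0xFFFFFFFF,                   # time_low
        (ts >> 32) & 0xFFFF,               # time_mid
        ((ts >> 48) & 0x0FFF) | 0x1000,    # time_hi plus version 1
        ((clock_seq >> 8) & 0x3F) | 0x80,  # clock_seq_hi plus RFC 4122 variant
        clock_seq & 0xFF,                  # clock_seq_low
        node,
    ))


class UserMessages:
    """Row key = receiver ID; column name = TimeUUID; value = (message ID, body)."""

    def __init__(self) -> None:
        self.rows: Dict[int, Dict[uuid.UUID, Tuple[int, str]]] = {}

    def insert(self, receiver_id: int, message_id: int, body: str,
               unix_micros: int) -> None:
        row = self.rows.setdefault(receiver_id, {})
        row[time_uuid(unix_micros)] = (message_id, body)

    def get_all(self, receiver_id: int) -> List[Tuple[int, str]]:
        # Order columns by their embedded timestamp, oldest first.
        row = self.rows.get(receiver_id, {})
        return [row[name] for name in sorted(row, key=lambda u: u.time)]
```

Inserting out of order still reads back in time order, which is the property the column-name choice buys.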
>> 
>> Depending on load (and to support deleting a previous month's messages), you 
>> may want to partition the rows by month:
>> 
>> CF: UserMessagesMonth
>> Key: ReceiverID+YYYYMM
>> Columns : column name = TimeUUID ; column value = message ID and Body
>> 
>> Everything is the same as before, but now a user has a row for each month, 
>> which you can delete as a whole. This also helps avoid very big rows. 
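Month partitioning only changes the row key. A sketch, assuming a ':' separator between ReceiverID and YYYYMM (the thread only writes `ReceiverID+YYYYMM`) and using the send time as a stand-in for the TimeUUID column name; the names are hypothetical. Removing a month is then one row deletion per user rather than one deletion per message.

```python
from datetime import datetime
from typing import Dict, Iterable, Tuple


def row_key(receiver_id: int, yyyymm: str) -> str:
    """Row key for UserMessagesMonth: ReceiverID + YYYYMM.

    The ':' separator is an assumption; any delimiter that cannot appear
    in an ID keeps keys unambiguous.
    """
    return f"{receiver_id}:{yyyymm}"


class UserMessagesMonth:
    """Hypothetical in-memory sketch of month-partitioned rows."""

    def __init__(self) -> None:
        # row key -> {send time: (message ID, body)}; the send time stands
        # in for the TimeUUID column name.
        self.rows: Dict[str, Dict[datetime, Tuple[int, str]]] = {}

    def insert(self, receiver_id: int, sent: datetime,
               message_id: int, body: str) -> None:
        key = row_key(receiver_id, sent.strftime("%Y%m"))
        self.rows.setdefault(key, {})[sent] = (message_id, body)

    def delete_month(self, receiver_ids: Iterable[int], yyyymm: str) -> int:
        """Drop each listed user's whole row for one month; return rows removed."""
        removed = 0
        for receiver_id in receiver_ids:
            if self.rows.pop(row_key(receiver_id, yyyymm), None) is not None:
                removed += 1
        return removed
```

A job that removes a month for everyone still needs the list of receiver IDs, because each user's month lives in its own row.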
>> 
>>> I really don't think that storage will be an issue: I have 2TB per node, 
>>> and messages are limited to 1KB.
>> I would suggest you keep the data on each node to 300 to 400 GB. It can take 
>> a long time to compact, repair and move the data when it grows above 400GB. 
>> 
>> Hope that helps. 
>> 
>> -----------------
>> Aaron Morton
>> Freelance Developer
>> @aaronmorton
>> http://www.thelastpickle.com
>> 
>> On 27/04/2012, at 1:30 AM, Morgan Segalis wrote:
>> 
>>> Hi everyone!
>>> 
>>> I'm fairly new to Cassandra and not yet familiar with the column-oriented 
>>> NoSQL model.
>>> I have worked on it for a while, but I can't seem to find the best model for 
>>> what I'm looking for.
>>> 
>>> I have Erlang software that lets users connect and communicate with each 
>>> other. When a user (A) sends a message to a disconnected user (B), the 
>>> software stores it in the database and waits for user (B) to connect and 
>>> retrieve the message queue, and then deletes it. 
>>> 
>>> Here are some key points: 
>>> - Users are identified by integer IDs
>>> - Each message is uniquely identified by the combination of: Sender ID - 
>>> Receiver ID - Message ID - time
>>> 
>>> I have a message queue, and here are the operations I need to do as fast as 
>>> possible: 
>>> 
>>> - Store from 1 to X messages per registered user
>>> - Get the number of stored messages per user (can be an incremental counter 
>>> updated at each store; this is retrieved often)
>>> - Retrieve all messages for a user at once.
>>> - Delete all messages for a user at once.
>>> - Delete all messages that are older than Y months (for all users).
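The five operations listed above map onto a per-receiver row plus a separate message count. A hypothetical in-memory sketch (not a Cassandra client), keying each column by (time, sender ID, message ID) to match the stated uniqueness rule. In Cassandra the count could live in a counter column; here it is just a dict of ints.

```python
from datetime import datetime
from typing import Dict, List, Tuple

# Column name within a receiver's row: (sent time, sender ID, message ID).
ColumnKey = Tuple[datetime, int, int]


class MessageQueue:
    """Hypothetical in-memory sketch of the five operations (not a client).

    messages: receiver ID -> {(sent, sender ID, message ID): body}
    counts:   receiver ID -> number of stored messages, kept beside the data
              so the frequent "how many?" read never scans a row.
    """

    def __init__(self) -> None:
        self.messages: Dict[int, Dict[ColumnKey, str]] = {}
        self.counts: Dict[int, int] = {}

    def store(self, sender_id: int, receiver_id: int, message_id: int,
              sent: datetime, body: str) -> None:
        row = self.messages.setdefault(receiver_id, {})
        key = (sent, sender_id, message_id)
        if key not in row:  # re-storing the same message does not recount it
            self.counts[receiver_id] = self.counts.get(receiver_id, 0) + 1
        row[key] = body

    def count(self, receiver_id: int) -> int:
        return self.counts.get(receiver_id, 0)

    def retrieve_all(self, receiver_id: int) -> List[Tuple[int, int, str]]:
        # Oldest first: (sender ID, message ID, body).
        row = self.messages.get(receiver_id, {})
        return [(sender, msg_id, row[(sent, sender, msg_id)])
                for sent, sender, msg_id in sorted(row)]

    def delete_all(self, receiver_id: int) -> None:
        self.messages.pop(receiver_id, None)
        self.counts.pop(receiver_id, None)

    def delete_older_than(self, cutoff: datetime) -> int:
        # Must visit every receiver's row: the expensive, cluster-wide part.
        removed = 0
        for receiver_id, row in self.messages.items():
            old = [key for key in row if key[0] < cutoff]
            for key in old:
                del row[key]
            self.counts[receiver_id] -= len(old)
            removed += len(old)
        return removed
```

`delete_older_than` has to visit every user's row, which is the costly step; the month or day partitioning and the TTL suggestions in the replies are ways to avoid that scan.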
>>> 
>>> I really don't think that storage will be an issue: I have 2TB per node, 
>>> and messages are limited to 1KB.
>>> I'm really looking for speed rather than storage optimization.
>>> 
>>> My configuration is 2 dedicated servers, each with:
>>> - 4 x Intel i7 2.66 GHz
>>> - 64-bit
>>> - 24 GB
>>> - 2 TB
>>> 
>>> Thank you all.
>> 
> 
>
