"The problem is whether I should denormalize details of pins into the board
table or just retrieve pins by page (page size can be 10~20) and then
multi-get by pin_ids to obtain details"

--> Denormalizing is the best way to go in your case. Otherwise, for 1 board
read you'll have 10-20 subsequent reads to load the pins. Multiply that by
the number of users listing boards and you'll quickly be in trouble...

 For updating a pin's like count, you'll need to use the counter type.
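Something like this, with the caveat that in C* counter columns must live in
a table of their own (they can't be mixed with regular columns), so the like
count stays in a single place keyed by pin_id (names are just an example):

    CREATE TABLE pin_like_counts (
        pin_id     uuid PRIMARY KEY,
        like_count counter
    );

    -- each like is one counter increment, no read-before-write
    UPDATE pin_like_counts SET like_count = like_count + 1
    WHERE pin_id = ?;

A nice side effect: frequent like-count updates hit only this one row and
don't fan out to every board row containing the pin.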

 "denormalization seems to bring a lot of load on the write side as well as
application code complexity" --> first C* copes quite well with write load.
Second, you should ask yourself: "how often is the update scenario vs the
read scenario" ? Usually the read pattern is predominant.

 As for update code complexity, it's the price to pay for read performance.
The CQRS pattern will help you separate the write and read stages, along
with heavy unit and integration testing.
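To illustrate what the write side looks like: propagating a pin update to
the denormalized board rows needs some way to find the boards containing
that pin, e.g. a reverse-index table (names here are hypothetical, and you
need the full primary key of each board row to update it):

    CREATE TABLE pin_boards (
        pin_id     uuid,
        board_id   uuid,
        created_at timeuuid,   -- needed to address the board row
        PRIMARY KEY (pin_id, board_id)
    );

    -- for each (board_id, created_at) returned by
    --   SELECT board_id, created_at FROM pin_boards WHERE pin_id = ?;
    UPDATE board_pins SET description = ?
    WHERE board_id = ? AND created_at = ? AND pin_id = ?;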



On Fri, May 16, 2014 at 5:14 AM, ziju feng <pkdog...@gmail.com> wrote:

> Hello,
>
> I'm working on data modeling for a Pinterest-like project. There are
> basically two main concepts: Pin and Board, just like Pinterest, where a
> pin is an item containing an image, a description and some other information
> such as a like count, and each board contains a sorted list of pins.
>
> The board can be modeled with primary key (board_id, created_at, pin_id),
> where created_at is used to sort the pins of the board by date. The problem
> is whether I should denormalize details of pins into the board table or
> just retrieve pins by page (page size can be 10-20) and then multi-get by
> pin_ids to obtain details.
>
> Since there are some boards that are accessed very often (like the home
> board), denormalization seems to be a reasonable choice to enhance read
> performance. However, we then have to update not only the pin table but also
> each row in the board table that contains the pin whenever a pin is
> updated, which sometimes could be quite frequent (such as updating the like
> count). Since a pin may be contained in many boards (could be thousands),
> denormalization seems to bring a lot of load on the write side as well as
> application code complexity.
>
> Any suggestions on whether our data model should go the denormalized way or
> the normalized/multi-get way, which would then perhaps need a separate
> caching layer for reads?
>
> Thanks,
>
> Ziju
>
