On Thu, May 1, 2014 at 4:26 AM, Andreas Joseph Krogh <andr...@visena.com>wrote:
> I have a schema where I have lots of messages and some users who might > have read some of them. When a message is read by a user I create an entry > i a table message_property holding the property (is_read) for that user. > > The schema is as follows: > [...] > > create table person( > id serial primary key, > username varchar not null unique > ); > > create table message( > id serial primary key, > subject varchar > ); > > create table message_property( > message_id integer not null references message(id), > person_id integer not null references person(id), > is_read boolean not null default false, > unique(message_id, person_id) > ); > [...] > So, for person 1 there are 10 unread messages, out of a total 1mill. 5 of > those unread does not have an entry in message_property and 5 have an entry > and is_read set to FALSE. > Here's a possible enhancement: add two columns, an indexed timestamp to the message table, and a "timestamp of the oldest message this user has NOT read" on the person table. If most users read messages in a timely fashion, this would (in most cases) narrow down the portion of the messages table to a tiny fraction of the total -- just those messages newer than the oldest message this user has not read. When you sign up a new user, you can set his timestamp to the time the account was created, since presumably messages before that time don't apply. Whether this will help depends a lot on actual use patterns, i.e. do users typically read all messages or do they leave a bunch of unread messages sitting around forever? Craig