Re: [PERFORM] Optimize query for listing un-read messages

2014-05-02 Thread Andreas Joseph Krogh
På fredag 02. mai 2014 kl. 02:17:58, skrev Craig James cja...@emolecules.com 
mailto:cja...@emolecules.com: On Thu, May 1, 2014 at 4:26 AM, Andreas Joseph 
Kroghandr...@visena.com mailto:andr...@visena.com wrote: I have a schema 
where I have lots of messages and some users who might have read some of them. 
When a message is read by a user I create an entry i a table message_property 
holding the property (is_read) for that user.  The schema is as follows: [...]  
 create table person(    id serial primary key,
     username varchar not null unique
 );   create table message(
     id serial primary key,
     subject varchar
 );   create table message_property(
     message_id integer not null references message(id),
     person_id integer not null references person(id),
     is_read boolean not null default false,
     unique(message_id, person_id)
 );   [...]  So, for person 1 there are 10 unread messages, out of a total 
1mill. 5 of those unread does not have an entry in message_property and 5 have 
an entry and is_read set to FALSE.   Here's a possible enhancement: add two 
columns, an indexed timestamp to the message table, and a timestamp of the 
oldest message this user has NOT read on the person table. If most users read 
messages in a timely fashion, this would (in most cases) narrow down the 
portion of the messages table to a tiny fraction of the total -- just those 
messages newer than the oldest message this user has not read.
   When you sign up a new user, you can set his timestamp to the time the 
account was created, since presumably messages before that time don't apply.
   Whether this will help depends a lot on actual use patterns, i.e. do users 
typically read all messages or do they leave a bunch of unread messages sitting 
around forever?   Thanks fort the suggestion. A user must be able to read 
arbitrary old messages, and messages don't expire.   -- Andreas Jospeh Krogh 
CTO / Partner - Visena AS Mobile: +47 909 56 963 andr...@visena.com 
mailto:andr...@visena.com www.visena.com https://www.visena.com  
https://www.visena.com  

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-02 Thread Vitalii Tymchyshyn
What statistics do you have on the data? I suppose most messages are read
by low number of users, mostly 0 or one.
I can see two options to consider:
1) Use arrays to store information on which users have already read the
message. You may need GIN/GIST index to search fast.
2) Introduce some kind of special column(s) for the cases when the message
is unread by everybody or was read by at most one user. E.g. read_by
columns with null value for unread, special value for read by many and real
user if read by only one.
in this case your condition would be (read_by is null or read_by not in
(current_user or special_value) or (read_by = special_value and not
exists()). Note that optimizer may have problems with such a complex
expression nd you may need to use union all instead on or. Partial
index(es) for null/special value may help.

Best regards, Vitalii Tymchyshyn


2014-05-02 10:20 GMT+03:00 Andreas Joseph Krogh andr...@visena.com:

 På fredag 02. mai 2014 kl. 02:17:58, skrev Craig James 
 cja...@emolecules.com:

 On Thu, May 1, 2014 at 4:26 AM, Andreas Joseph Krogh 
 andr...@visena.comwrote:

 I have a schema where I have lots of messages and some users who might
 have read some of them. When a message is read by a user I create an entry
 i a table message_property holding the property (is_read) for that user.

 The schema is as follows:

 [...]


 create table person(
 id serial primary key,
 username varchar not null unique
 );

 create table message(
 id serial primary key,
 subject varchar
 );

 create table message_property(
 message_id integer not null references message(id),
 person_id integer not null references person(id),
 is_read boolean not null default false,
 unique(message_id, person_id)
 );


 [...]

  So, for person 1 there are 10 unread messages, out of a total 1mill. 5
 of those unread does not have an entry in message_property and 5 have an
 entry and is_read set to FALSE.


 Here's a possible enhancement: add two columns, an indexed timestamp to
 the message table, and a timestamp of the oldest message this user has NOT
 read on the person table. If most users read messages in a timely fashion,
 this would (in most cases) narrow down the portion of the messages table to
 a tiny fraction of the total -- just those messages newer than the oldest
 message this user has not read.

 When you sign up a new user, you can set his timestamp to the time the
 account was created, since presumably messages before that time don't apply.

 Whether this will help depends a lot on actual use patterns, i.e. do users
 typically read all messages or do they leave a bunch of unread messages
 sitting around forever?


 Thanks fort the suggestion. A user must be able to read arbitrary old
 messages, and messages don't expire.

  --
 *Andreas Jospeh Krogh*
 CTO / Partner - Visena AS
 Mobile: +47 909 56 963
 andr...@visena.com
 www.visena.com
  https://www.visena.com