Hi all,   I'm using PostgreSQL 9.3.2 on x86_64-unknown-linux-gnu 
 I have a schema where I have lots of messages and some users who might have 
read some of them. When a message is read by a user I create an entry i a table 
message_property holding the property (is_read) for that user.   The schema is 
as follows:   drop table if exists message_property;
 drop table if exists message;
 drop table if exists person;   create table person(
     id serial primary key,
     username varchar not null unique
 );   create table message(
     id serial primary key,
     subject varchar
 );   create table message_property(
     message_id integer not null references message(id),
     person_id integer not null references person(id),
     is_read boolean not null default false,
     unique(message_id, person_id)
 );   insert into person(username) values('user_' || generate_series(0, 999));
 insert into message(subject) values('Subject ' || random() || 
generate_series(0, 999999));
 insert into message_property(message_id, person_id, is_read) select id, 1, 
true from message order by id limit 999990;
 insert into message_property(message_id, person_id, is_read) select id, 1, 
false from message order by id limit 5 offset 999990; analyze;   So, for person 
1 there are 10 unread messages, out of a total 1mill. 5 of those unread does 
not have an entry in message_property and 5 have an entry and is_read set to 
FALSE.   I have the following query to list all un-read messages for person 
with id=1:   SELECT
     m.id                          AS message_id,
     prop.person_id,
     coalesce(prop.is_read, FALSE) AS is_read,
     m.subject
 FROM message m
     LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND 
prop.person_id = 1
 WHERE 1 = 1
       AND NOT EXISTS(SELECT
                          *
                      FROM message_property pr
                      WHERE pr.message_id = m.id AND pr.person_id = 
prop.person_id AND prop.is_read = TRUE)
     ; 
 The problem is that it's not quite efficient and performs badly, explain 
analyze shows: 
                                                                                
         
QUERY PLAN
 
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Merge Anti Join  (cost=1.27..148784.09 rows=5 width=40) (actual 
time=918.906..918.913 rows=10 loops=1)
    Merge Cond: (m.id = pr.message_id)
    Join Filter: (prop.is_read AND (pr.person_id = prop.person_id))
    Rows Removed by Join Filter: 5
    ->  Merge Left Join  (cost=0.85..90300.76 rows=1000000 width=40) (actual 
time=0.040..530.748 rows=1000000 loops=1)
          Merge Cond: (m.id = prop.message_id)
          ->  Index Scan using message_pkey on message m  (cost=0.42..34317.43 
rows=1000000 width=35) (actual time=0.014..115.829 rows=1000000 loops=1)
          ->  Index Scan using message_property_message_id_person_id_key on 
message_property prop  (cost=0.42..40983.40 rows=999995 width=9) (actual 
time=0.020..130.728 rows=999995 loops=1)
                Index Cond: (person_id = 1)
    ->  Index Only Scan using message_property_message_id_person_id_key on 
message_property pr  (cost=0.42..40983.40 rows=999995 width=8) (actual 
time=0.024..140.349 rows=999995 loops=1)
          Index Cond: (person_id = 1)
          Heap Fetches: 999995
  Total runtime: 918.975 ms
 (13 rows)   
 Does anyone have suggestions on how to optimize the query or schema? It's 
important that any message not having an entry in message_property for a user 
is considered un-read. 
 Thanks!   -- Andreas Jospeh Krogh CTO / Partner - Visena AS Mobile: +47 909 
56 963 andr...@visena.com <mailto:andr...@visena.com> www.visena.com 
<https://www.visena.com>  <https://www.visena.com>

Reply via email to