Re: [PERFORM] Optimize query for listing un-read messages

2014-05-02 Thread Andreas Joseph Krogh
På fredag 02. mai 2014 kl. 02:17:58, skrev Craig James cja...@emolecules.com 
mailto:cja...@emolecules.com: On Thu, May 1, 2014 at 4:26 AM, Andreas Joseph 
Kroghandr...@visena.com mailto:andr...@visena.com wrote: I have a schema 
where I have lots of messages and some users who might have read some of them. 
When a message is read by a user I create an entry i a table message_property 
holding the property (is_read) for that user.  The schema is as follows: [...]  
 create table person(    id serial primary key,
     username varchar not null unique
 );   create table message(
     id serial primary key,
     subject varchar
 );   create table message_property(
     message_id integer not null references message(id),
     person_id integer not null references person(id),
     is_read boolean not null default false,
     unique(message_id, person_id)
 );   [...]  So, for person 1 there are 10 unread messages, out of a total 
1mill. 5 of those unread does not have an entry in message_property and 5 have 
an entry and is_read set to FALSE.   Here's a possible enhancement: add two 
columns, an indexed timestamp to the message table, and a timestamp of the 
oldest message this user has NOT read on the person table. If most users read 
messages in a timely fashion, this would (in most cases) narrow down the 
portion of the messages table to a tiny fraction of the total -- just those 
messages newer than the oldest message this user has not read.
   When you sign up a new user, you can set his timestamp to the time the 
account was created, since presumably messages before that time don't apply.
   Whether this will help depends a lot on actual use patterns, i.e. do users 
typically read all messages or do they leave a bunch of unread messages sitting 
around forever?   Thanks fort the suggestion. A user must be able to read 
arbitrary old messages, and messages don't expire.   -- Andreas Jospeh Krogh 
CTO / Partner - Visena AS Mobile: +47 909 56 963 andr...@visena.com 
mailto:andr...@visena.com www.visena.com https://www.visena.com  
https://www.visena.com  

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-02 Thread Vitalii Tymchyshyn
What statistics do you have on the data? I suppose most messages are read
by low number of users, mostly 0 or one.
I can see two options to consider:
1) Use arrays to store information on which users have already read the
message. You may need GIN/GIST index to search fast.
2) Introduce some kind of special column(s) for the cases when the message
is unread by everybody or was read by at most one user. E.g. read_by
columns with null value for unread, special value for read by many and real
user if read by only one.
in this case your condition would be (read_by is null or read_by not in
(current_user or special_value) or (read_by = special_value and not
exists()). Note that optimizer may have problems with such a complex
expression nd you may need to use union all instead on or. Partial
index(es) for null/special value may help.

Best regards, Vitalii Tymchyshyn


2014-05-02 10:20 GMT+03:00 Andreas Joseph Krogh andr...@visena.com:

 På fredag 02. mai 2014 kl. 02:17:58, skrev Craig James 
 cja...@emolecules.com:

 On Thu, May 1, 2014 at 4:26 AM, Andreas Joseph Krogh 
 andr...@visena.comwrote:

 I have a schema where I have lots of messages and some users who might
 have read some of them. When a message is read by a user I create an entry
 i a table message_property holding the property (is_read) for that user.

 The schema is as follows:

 [...]


 create table person(
 id serial primary key,
 username varchar not null unique
 );

 create table message(
 id serial primary key,
 subject varchar
 );

 create table message_property(
 message_id integer not null references message(id),
 person_id integer not null references person(id),
 is_read boolean not null default false,
 unique(message_id, person_id)
 );


 [...]

  So, for person 1 there are 10 unread messages, out of a total 1mill. 5
 of those unread does not have an entry in message_property and 5 have an
 entry and is_read set to FALSE.


 Here's a possible enhancement: add two columns, an indexed timestamp to
 the message table, and a timestamp of the oldest message this user has NOT
 read on the person table. If most users read messages in a timely fashion,
 this would (in most cases) narrow down the portion of the messages table to
 a tiny fraction of the total -- just those messages newer than the oldest
 message this user has not read.

 When you sign up a new user, you can set his timestamp to the time the
 account was created, since presumably messages before that time don't apply.

 Whether this will help depends a lot on actual use patterns, i.e. do users
 typically read all messages or do they leave a bunch of unread messages
 sitting around forever?


 Thanks fort the suggestion. A user must be able to read arbitrary old
 messages, and messages don't expire.

  --
 *Andreas Jospeh Krogh*
 CTO / Partner - Visena AS
 Mobile: +47 909 56 963
 andr...@visena.com
 www.visena.com
  https://www.visena.com




[PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
Hi all,   I'm using PostgreSQL 9.3.2 on x86_64-unknown-linux-gnu 
 I have a schema where I have lots of messages and some users who might have 
read some of them. When a message is read by a user I create an entry i a table 
message_property holding the property (is_read) for that user.   The schema is 
as follows:   drop table if exists message_property;
 drop table if exists message;
 drop table if exists person;   create table person(
     id serial primary key,
     username varchar not null unique
 );   create table message(
     id serial primary key,
     subject varchar
 );   create table message_property(
     message_id integer not null references message(id),
     person_id integer not null references person(id),
     is_read boolean not null default false,
     unique(message_id, person_id)
 );   insert into person(username) values('user_' || generate_series(0, 999));
 insert into message(subject) values('Subject ' || random() || 
generate_series(0, 99));
 insert into message_property(message_id, person_id, is_read) select id, 1, 
true from message order by id limit 90;
 insert into message_property(message_id, person_id, is_read) select id, 1, 
false from message order by id limit 5 offset 90; analyze;   So, for person 
1 there are 10 unread messages, out of a total 1mill. 5 of those unread does 
not have an entry in message_property and 5 have an entry and is_read set to 
FALSE.   I have the following query to list all un-read messages for person 
with id=1:   SELECT
     m.id  AS message_id,
     prop.person_id,
     coalesce(prop.is_read, FALSE) AS is_read,
     m.subject
 FROM message m
     LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND 
prop.person_id = 1
 WHERE 1 = 1
   AND NOT EXISTS(SELECT
  *
  FROM message_property pr
  WHERE pr.message_id = m.id AND pr.person_id = 
prop.person_id AND prop.is_read = TRUE)
     ; 
 The problem is that it's not quite efficient and performs badly, explain 
analyze shows: 

 
QUERY PLAN
 
-
  Merge Anti Join  (cost=1.27..148784.09 rows=5 width=40) (actual 
time=918.906..918.913 rows=10 loops=1)
    Merge Cond: (m.id = pr.message_id)
    Join Filter: (prop.is_read AND (pr.person_id = prop.person_id))
    Rows Removed by Join Filter: 5
    -  Merge Left Join  (cost=0.85..90300.76 rows=100 width=40) (actual 
time=0.040..530.748 rows=100 loops=1)
  Merge Cond: (m.id = prop.message_id)
  -  Index Scan using message_pkey on message m  (cost=0.42..34317.43 
rows=100 width=35) (actual time=0.014..115.829 rows=100 loops=1)
  -  Index Scan using message_property_message_id_person_id_key on 
message_property prop  (cost=0.42..40983.40 rows=95 width=9) (actual 
time=0.020..130.728 rows=95 loops=1)
    Index Cond: (person_id = 1)
    -  Index Only Scan using message_property_message_id_person_id_key on 
message_property pr  (cost=0.42..40983.40 rows=95 width=8) (actual 
time=0.024..140.349 rows=95 loops=1)
  Index Cond: (person_id = 1)
  Heap Fetches: 95
  Total runtime: 918.975 ms
 (13 rows)   
 Does anyone have suggestions on how to optimize the query or schema? It's 
important that any message not having an entry in message_property for a user 
is considered un-read. 
 Thanks!   -- Andreas Jospeh Krogh CTO / Partner - Visena AS Mobile: +47 909 
56 963 andr...@visena.com mailto:andr...@visena.com www.visena.com 
https://www.visena.com  https://www.visena.com

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Jochem Berndsen


Hi Andreas,

[New to this list, forgive my ignorance.]

On 05/01/2014 01:26 PM, Andreas Joseph Krogh wrote:
I'm using PostgreSQL 9.3.2 on x86_64-unknown-linux-gnu 

My machine has PostgreSQL 9.1.13 on x86_64-unknown-linux-gnu.
I have a schema where I have lots of messages and some users who might 
have read some of them. When a message is read by a user I create an 
entry i a table message_property holding the property (is_read) for 
that user.

The schema is as follows:
drop table if exists message_property;
drop table if exists message;
drop table if exists person;
create table person(
id serial primary key,
username varchar not null unique
);
create table message(
id serial primary key,
subject varchar
);
create table message_property(
message_id integer not null references message(id),
person_id integer not null references person(id),
is_read boolean not null default false,
unique(message_id, person_id)
);

[snip]
So, for person 1 there are 10 unread messages, out of a total 1mill. 5 
of those unread does not have an entry in message_property and 5 have 
an entry and is_read set to FALSE.
I have the following query to list all un-read messages for person 
with id=1:

SELECT
m.id  AS message_id,
prop.person_id,
coalesce(prop.is_read, FALSE) AS is_read,
m.subject
FROM message m
LEFT OUTER JOIN message_property prop ON prop.message_id = m.id 
AND prop.person_id = 1

WHERE 1 = 1
  AND NOT EXISTS(SELECT
 *
 FROM message_property pr
 WHERE pr.message_id = m.id AND pr.person_id = 
prop.person_id AND prop.is_read = TRUE)

;

The problem is that it's not quite efficient and performs badly, 
explain analyze shows:

[snip]


Does anyone have suggestions on how to optimize the query or schema?


I'm getting better performance with:

SELECT
m.id AS message_id,
1 AS person_id,
FALSE AS is_read,
m.subject
FROM message m
WHERE 1 = 1
AND NOT EXISTS(SELECT
*
FROM message_property pr
WHERE pr.message_id = m.id AND pr.person_id = 1 AND pr.is_read);

You then lose the distinction between message_property with is_read = 
FALSE, and nonexistent message_property for the message row.


If that is essential, I'm getting a roughly 2x speedup on my non-tuned 
PostgreSQL with:

 SELECT
m.id  AS message_id,
prop.person_id,
coalesce(prop.is_read, FALSE) AS is_read,
m.subject
FROM message m
LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND 
prop.person_id = 1

WHERE not coalesce(prop.is_read, false);

HTH,
Jochem

--
Jochem Berndsen | joc...@functor.nl



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
På torsdag 01. mai 2014 kl. 20:35:07, skrev Jochem Berndsen joc...@functor.nl 
mailto:joc...@functor.nl: 
 Hi Andreas,

 [New to this list, forgive my ignorance.]
 [snip]
 I'm getting better performance with:

 SELECT
 m.id AS message_id,
 1 AS person_id,
 FALSE AS is_read,
 m.subject
 FROM message m
 WHERE 1 = 1
 AND NOT EXISTS(SELECT
      *
      FROM message_property pr
      WHERE pr.message_id = m.id AND pr.person_id = 1 AND pr.is_read);

 You then lose the distinction between message_property with is_read =
 FALSE, and nonexistent message_property for the message row.

 If that is essential, I'm getting a roughly 2x speedup on my non-tuned
 PostgreSQL with:
   SELECT
      m.id                          AS message_id,
      prop.person_id,
      coalesce(prop.is_read, FALSE) AS is_read,
      m.subject
 FROM message m
      LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND
 prop.person_id = 1
 WHERE not coalesce(prop.is_read, false);     Hi Jochem,   Thansk for looking 
at it. I'm still seing ~500ms being spent and I was hoping for a way to do this 
using index so one could achieve 1-10ms, but maybe that's impossible given the 
schema?   Is there a way to design an equivalent  schema to achieve 10ms 
execution-time?   -- Andreas Jospeh Krogh CTO / Partner - Visena AS Mobile: +47 
909 56 963 andr...@visena.com mailto:andr...@visena.com www.visena.com 
https://www.visena.com  https://www.visena.com  

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Pavel Stehule
Hello


2014-05-01 21:17 GMT+02:00 Andreas Joseph Krogh andr...@visena.com:

 På torsdag 01. mai 2014 kl. 20:35:07, skrev Jochem Berndsen 
 joc...@functor.nl:


 Hi Andreas,

 [New to this list, forgive my ignorance.]
 [snip]
 I'm getting better performance with:

 SELECT
 m.id AS message_id,
 1 AS person_id,
 FALSE AS is_read,
 m.subject
 FROM message m
 WHERE 1 = 1
 AND NOT EXISTS(SELECT
  *
  FROM message_property pr
  WHERE pr.message_id = m.id AND pr.person_id = 1 AND pr.is_read);

 You then lose the distinction between message_property with is_read =
 FALSE, and nonexistent message_property for the message row.

 If that is essential, I'm getting a roughly 2x speedup on my non-tuned
 PostgreSQL with:
   SELECT
  m.id  AS message_id,
  prop.person_id,
  coalesce(prop.is_read, FALSE) AS is_read,
  m.subject
 FROM message m
  LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND
 prop.person_id = 1
 WHERE not coalesce(prop.is_read, false);



 Hi Jochem,

 Thansk for looking at it. I'm still seing ~500ms being spent and I was
 hoping for a way to do this using index so one could achieve 1-10ms, but
 maybe that's impossible given the schema?

 Is there a way to design an equivalent  schema to achieve 10ms
 execution-time?


I had a perfect success on similar use case with descent ordered partial
index

http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html

Regards

Pavel



  --
 *Andreas Jospeh Krogh*
 CTO / Partner - Visena AS
 Mobile: +47 909 56 963
 andr...@visena.com
 www.visena.com
  https://www.visena.com




Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
På torsdag 01. mai 2014 kl. 21:30:39, skrev Pavel Stehule 
pavel.steh...@gmail.com mailto:pavel.steh...@gmail.com: Hello [snip]   I had 
a perfect success on similar use case with descent ordered partial index

http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html 
http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html   I'm not 
getting good performance. Are you able to craft an example using my schema and 
partial index?   Thanks.   -- Andreas Jospeh Krogh CTO / Partner - Visena AS 
Mobile: +47 909 56 963 andr...@visena.com mailto:andr...@visena.com 
www.visena.com https://www.visena.com  https://www.visena.com  

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Pavel Stehule
2014-05-01 21:39 GMT+02:00 Andreas Joseph Krogh andr...@visena.com:

 På torsdag 01. mai 2014 kl. 21:30:39, skrev Pavel Stehule 
 pavel.steh...@gmail.com:

 Hello
 [snip]

 I had a perfect success on similar use case with descent ordered partial
 index

 http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html


 I'm not getting good performance. Are you able to craft an example using
 my schema and partial index?


maybe some like

CREATE INDEX ON message_property (person_id, message_id) WHERE pr.is_read

When I am thinking about your schema, it is designed well, but it is not
index friendly, so for some fast access you should to hold a cache (table)
of unread messages.

Regards

Pavel



 Thanks.

  --
 *Andreas Jospeh Krogh*
 CTO / Partner - Visena AS
 Mobile: +47 909 56 963
 andr...@visena.com
 www.visena.com
  https://www.visena.com




Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
På torsdag 01. mai 2014 kl. 21:53:32, skrev Pavel Stehule 
pavel.steh...@gmail.com mailto:pavel.steh...@gmail.com:     2014-05-01 21:39 
GMT+02:00 Andreas Joseph Kroghandr...@visena.com mailto:andr...@visena.com: 
På torsdag 01. mai 2014 kl. 21:30:39, skrev Pavel Stehule 
pavel.steh...@gmail.com mailto:pavel.steh...@gmail.com: Hello [snip]   I had 
a perfect success on similar use case with descent ordered partial index

http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html 
http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html   I'm not 
getting good performance. Are you able to craft an example using my schema and 
partial index?   maybe some like
   CREATE INDEX ON message_property (person_id, message_id) WHERE pr.is_read
   When I am thinking about your schema, it is designed well, but it is not 
index friendly, so for some fast access you should to hold a cache (table) of 
unread messages   Ah, that's what I was hoping to not having to do. In my 
system, messages arrive all the time and having to update a cache for all new 
messages for all users seems messy... Seems I could just as well create a 
message_property for all users when a new message arrives, so I can INNER JOIN 
it and get good performance. But that table will quickly grow *very* large...   
--Andreas Jospeh Krogh CTO / Partner - Visena AS Mobile: +47 909 56 963 
andr...@visena.com mailto:andr...@visena.com www.visena.com 
https://www.visena.com  https://www.visena.com  

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Pavel Stehule
2014-05-01 22:30 GMT+02:00 Andreas Joseph Krogh andr...@visena.com:

 På torsdag 01. mai 2014 kl. 21:53:32, skrev Pavel Stehule 
 pavel.steh...@gmail.com:



 2014-05-01 21:39 GMT+02:00 Andreas Joseph Krogh andr...@visena.com:

 På torsdag 01. mai 2014 kl. 21:30:39, skrev Pavel Stehule 
 pavel.steh...@gmail.com:

 Hello
 [snip]

 I had a perfect success on similar use case with descent ordered partial
 index

 http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html


 I'm not getting good performance. Are you able to craft an example using
 my schema and partial index?


 maybe some like

 CREATE INDEX ON message_property (person_id, message_id) WHERE pr.is_read

 When I am thinking about your schema, it is designed well, but it is not
 index friendly, so for some fast access you should to hold a cache (table)
 of unread messages


 Ah, that's what I was hoping to not having to do. In my system, messages
 arrive all the time and having to update a cache for all new messages for
 all users seems messy... Seems I could just as well create a
 message_property for all users when a new message arrives, so I can INNER
 JOIN it and get good performance. But that table will quickly grow *very*
 large...


What you need is a JOIN index, that is not possible in Postgres.

I afraid so some ugly solutions is necessary (when you require extra fast
access). You need a index (small index) and it require some existing set -
you cannot do index on the difference two sets.

I expect so some flag on the relation message - like it should not be
not read can helps little bit - and can be used in partial index as
conditions. Other possibility is some variant of partitioning - you can
divide a messages and users to distinct sets and then you decrease a number
of possible combinations.

Regards

Pavel



  --
 *Andreas Jospeh Krogh*
 CTO / Partner - Visena AS
 Mobile: +47 909 56 963
 andr...@visena.com
 www.visena.com
  https://www.visena.com




Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread David G Johnston
How does something like:

WITH unreads AS (
SELECT messageid FROM message
EXCEPT
SELECT messageid FROM message_property WHERE personid=1 AND has_read
)
SELECT ...
FROM unreads
JOIN messages USING (messageid)
;

perform?

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Optimize-query-for-listing-un-read-messages-tp5802097p5802157.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
På torsdag 01. mai 2014 kl. 23:19:55, skrev David G Johnston 
david.g.johns...@gmail.com mailto:david.g.johns...@gmail.com: How does 
something like:

 WITH unreads AS (
 SELECT messageid FROM message
 EXCEPT
 SELECT messageid FROM message_property WHERE personid=1 AND has_read
 )
 SELECT ...
 FROM unreads
 JOIN messages USING (messageid)
 ;

 perform?   It actually performs worse.   The best query so far is:   SELECT
     m.id  AS message_id,
     prop.person_id,
     coalesce(prop.is_read, FALSE) AS is_read,
     m.subject
 FROM message m
     LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND
  prop.person_id = 1
 WHERE coalesce(prop.is_read, false) = false;   Giving the plan: 

  
QUERY PLAN
 
---
  Merge Left Join  (cost=4.20..90300.76 rows=50 width=40) (actual 
time=445.021..445.025 rows=10 loops=1)
    Merge Cond: (m.id = prop.message_id)
    Filter: (NOT COALESCE(prop.is_read, false))
    Rows Removed by Filter: 90
    -  Index Scan using message_pkey on message m  (cost=0.42..34317.43 
rows=100 width=35) (actual time=0.014..113.314 rows=100 loops=1)
    -  Index Scan using message_property_message_id_person_id_key on 
message_property prop  (cost=0.42..40983.40 rows=95 width=9) (actual 
time=0.018..115.019 rows=95 loops=1)
  Index Cond: (person_id = 1)
  Total runtime: 445.076 ms
 (8 rows)   -- Andreas Jospeh Krogh CTO / Partner - Visena AS Mobile: +47 909 
56 963 andr...@visena.com mailto:andr...@visena.com www.visena.com 
https://www.visena.com  https://www.visena.com  

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Tomas Vondra
On 1.5.2014 23:19, Andreas Joseph Krogh wrote:
 På torsdag 01. mai 2014 kl. 23:02:13, skrev Pavel Stehule
 pavel.steh...@gmail.com mailto:pavel.steh...@gmail.com:
 
  
  
 2014-05-01 22:30 GMT+02:00 Andreas Joseph Krogh andr...@visena.com
 mailto:andr...@visena.com:
 
 På torsdag 01. mai 2014 kl. 21:53:32, skrev Pavel Stehule
 pavel.steh...@gmail.com mailto:pavel.steh...@gmail.com:
 
  
  
 2014-05-01 21:39 GMT+02:00 Andreas Joseph Krogh
 andr...@visena.com mailto:andr...@visena.com:
 
 På torsdag 01. mai 2014 kl. 21:30:39, skrev Pavel
 Stehule pavel.steh...@gmail.com
 mailto:pavel.steh...@gmail.com:
 
 Hello
 [snip]
  
 I had a perfect success on similar use case with
 descent ordered partial index
 
 
 http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html
 
  
 I'm not getting good performance. Are you able to craft
 an example using my schema and partial index?
 
  
 maybe some like
  
 CREATE INDEX ON message_property (person_id, message_id)
 WHERE pr.is_read
  
 When I am thinking about your schema, it is designed well,
 but it is not index friendly, so for some fast access you
 should to hold a cache (table) of unread messages
 
  
 Ah, that's what I was hoping to not having to do. In my system,
 messages arrive all the time and having to update a cache for
 all new messages for all users seems messy... Seems I could just
 as well create a message_property for all users when a new
 message arrives, so I can INNER JOIN it and get good
 performance. But that table will quickly grow *very* large...
 
  
 What you need is a JOIN index, that is not possible in Postgres.
  
 I afraid so some ugly solutions is necessary (when you require
 extra fast access). You need a index (small index) and it require
 some existing set - you cannot do index on the difference two sets.
  
 I expect so some flag on the relation message - like it should
 not be not read can helps little bit - and can be used in partial
 index as conditions. Other possibility is some variant of
 partitioning - you can divide a messages and users to distinct sets
 and then you decrease a number of possible combinations.
 
  
 Just curious:
 Is such a JOIN index possible in other DBs, if so - which?
 Can other DBs do index on difference between two sets?
 Will PG ever have this, is it even possible?

I'm not aware of such database, but maybe it's possible at least for
some cases. But I'd expect that to significantly depend on the schema.
And I'm not aware of any such effort in case of PostgreSQL, do don't
hold your breath.

IMHO the problem with your schema is that while each 'read' message has
a matching row in message_property, 'undread' messages may or may not
have a matching row. Is there a particular reason for that?

If you could get rid of this, i.e. require that each pair (message,
recipient) has a row in message_propery (irrespectedly whether the
message was read or not), you can do this:

CREATE INDEX message_unread_idx
ON message_property(message_id, person_id) WHERE (NOT is_read)

and then simply do the query like this:

SELECT
m.id,
prop.person_id,
prop.is_read,
m.subject
FROM messages m JOIN message_property p ON (m.id = p.message_id)
WHERE (NOT is_read) AND person_id = :pid

and I'd expect this to use the partial index, and being much faster.

regards
Tomas



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
På torsdag 01. mai 2014 kl. 23:45:49, skrev Tomas Vondra t...@fuzzy.cz 
mailto:t...@fuzzy.cz: On 1.5.2014 23:19, Andreas Joseph Krogh wrote:
  På torsdag 01. mai 2014 kl. 23:02:13, skrev Pavel Stehule
  pavel.steh...@gmail.com mailto:pavel.steh...@gmail.com:
 
      
      
      2014-05-01 22:30 GMT+02:00 Andreas Joseph Krogh andr...@visena.com
      mailto:andr...@visena.com:
 
          På torsdag 01. mai 2014 kl. 21:53:32, skrev Pavel Stehule
          pavel.steh...@gmail.com mailto:pavel.steh...@gmail.com:
 
              
              
              2014-05-01 21:39 GMT+02:00 Andreas Joseph Krogh
              andr...@visena.com mailto:andr...@visena.com:
 
                  På torsdag 01. mai 2014 kl. 21:30:39, skrev Pavel
                  Stehule pavel.steh...@gmail.com
                  mailto:pavel.steh...@gmail.com:
 
                      Hello
                      [snip]
                      
                      I had a perfect success on similar use case with
                      descent ordered partial index
 
                     
 http://www.postgresql.org/docs/9.3/interactive/sql-createindex.html
 
                  
                  I'm not getting good performance. Are you able to craft
                  an example using my schema and partial index?
 
              
              maybe some like
              
              CREATE INDEX ON message_property (person_id, message_id)
              WHERE pr.is_read
              
              When I am thinking about your schema, it is designed well,
              but it is not index friendly, so for some fast access you
              should to hold a cache (table) of unread messages
 
          
          Ah, that's what I was hoping to not having to do. In my system,
          messages arrive all the time and having to update a cache for
          all new messages for all users seems messy... Seems I could just
          as well create a message_property for all users when a new
          message arrives, so I can INNER JOIN it and get good
          performance. But that table will quickly grow *very* large...
 
      
      What you need is a JOIN index, that is not possible in Postgres.
      
      I afraid so some ugly solutions is necessary (when you require
      extra fast access). You need a index (small index) and it require
      some existing set - you cannot do index on the difference two sets.
      
      I expect so some flag on the relation message - like it should
      not be not read can helps little bit - and can be used in partial
      index as conditions. Other possibility is some variant of
      partitioning - you can divide a messages and users to distinct sets
      and then you decrease a number of possible combinations.
 
  
  Just curious:
  Is such a JOIN index possible in other DBs, if so - which?
  Can other DBs do index on difference between two sets?
  Will PG ever have this, is it even possible?

 I'm not aware of such database, but maybe it's possible at least for
 some cases. But I'd expect that to significantly depend on the schema.
 And I'm not aware of any such effort in case of PostgreSQL, do don't
 hold your breath.

 IMHO the problem with your schema is that while each 'read' message has
 a matching row in message_property, 'undread' messages may or may not
 have a matching row. Is there a particular reason for that?     Yes. The 
point is that maintaining a message_property pair for all messages for all 
users in the system imposes quite a maintainance-headache. As the schema is now 
any new message is per definition un-read, and when a user reads it then it 
gets an entry with is_read=true in message_property. This table holds other 
properties too. This way I'm avoiding having to book-keep so much when a new 
message arrives and when a new user is created. A message in my system does not 
necessarily have only one recipient, it might have one, many or none, and might 
be visible to many.   If you could get rid of this, i.e. require that each pair 
(message,
 recipient) has a row in message_propery (irrespectedly whether the
 message was read or not), you can do this:

 CREATE INDEX message_unread_idx
     ON message_property(message_id, person_id) WHERE (NOT is_read)

 and then simply do the query like this:

 SELECT
     m.id,
     prop.person_id,
     prop.is_read,
     m.subject
 FROM messages m JOIN message_property p ON (m.id = p.message_id)
 WHERE (NOT is_read) AND person_id = :pid

 and I'd expect this to use the partial index, and being much faster.   I'm 
aware of the performance-gain using such a plain JOIN-query.   Thanks for your 
feedback.   -- Andreas Jospeh Krogh CTO / Partner - Visena AS Mobile: +47 909 
56 963 andr...@visena.com mailto:andr...@visena.com www.visena.com 
https://www.visena.com  https://www.visena.com  

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Tomas Vondra
On 1.5.2014 23:58, Andreas Joseph Krogh wrote:
 På torsdag 01. mai 2014 kl. 23:45:49, skrev Tomas Vondra t...@fuzzy.cz
 mailto:t...@fuzzy.cz:
 
 On 1.5.2014 23:19, Andreas Joseph Krogh wrote:
  Just curious:
  Is such a JOIN index possible in other DBs, if so - which?
  Can other DBs do index on difference between two sets?
  Will PG ever have this, is it even possible?
 
 I'm not aware of such database, but maybe it's possible at least for
 some cases. But I'd expect that to significantly depend on the schema.
 And I'm not aware of any such effort in case of PostgreSQL, do don't
 hold your breath.
 
 IMHO the problem with your schema is that while each 'read' message has
 a matching row in message_property, 'undread' messages may or may not
 have a matching row. Is there a particular reason for that?
 
  
  
 Yes. The point is that maintaining a message_property pair for all
 messages for all users in the system imposes quite a
 maintainance-headache. As the schema is now any new message is per
 definition un-read, and when a user reads it then it gets an entry with
 is_read=true in message_property. This table holds other properties too.
 This way I'm avoiding having to book-keep so much when a new message
 arrives and when a new user is created. A message in my system does not
 necessarily have only one recipient, it might have one, many or none,
 and might be visible to many.

So how do you determine who's the recipient of a message? Or is that the
case that everyone can read everything (which is why you're displaying
them the unread messages, right)?

I understand you're trying to solve this without storing a row for each
possible message-person combination, but won't this eventually happen
anyway (with is_read=true for all rows)?

Tomas


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
På fredag 02. mai 2014 kl. 00:34:34, skrev Tomas Vondra t...@fuzzy.cz 
mailto:t...@fuzzy.cz: On 1.5.2014 23:58, Andreas Joseph Krogh wrote:
  På torsdag 01. mai 2014 kl. 23:45:49, skrev Tomas Vondra t...@fuzzy.cz
  mailto:t...@fuzzy.cz:
 
      On 1.5.2014 23:19, Andreas Joseph Krogh wrote:
       Just curious:
       Is such a JOIN index possible in other DBs, if so - which?
       Can other DBs do index on difference between two sets?
       Will PG ever have this, is it even possible?
 
      I'm not aware of such database, but maybe it's possible at least for
      some cases. But I'd expect that to significantly depend on the schema.
      And I'm not aware of any such effort in case of PostgreSQL, do don't
      hold your breath.
 
      IMHO the problem with your schema is that while each 'read' message has
      a matching row in message_property, 'undread' messages may or may not
      have a matching row. Is there a particular reason for that?
 
  
  
  Yes. The point is that maintaining a message_property pair for all
  messages for all users in the system imposes quite a
  maintainance-headache. As the schema is now any new message is per
  definition un-read, and when a user reads it then it gets an entry with
  is_read=true in message_property. This table holds other properties too.
  This way I'm avoiding having to book-keep so much when a new message
  arrives and when a new user is created. A message in my system does not
  necessarily have only one recipient, it might have one, many or none,
  and might be visible to many.

 So how do you determine who's the recipient of a message? Or is that the
 case that everyone can read everything (which is why you're displaying
 them the unread messages, right)?     A message might have a recipient and 
might be read by others.   I understand you're trying to solve this without 
storing a row for each
 possible message-person combination, but won't this eventually happen
 anyway (with is_read=true for all rows)?     I will end up with that only if 
all users read all messages, which is not nearly the case.   -- Andreas Jospeh 
Krogh CTO / Partner - Visena AS Mobile: +47 909 56 963 andr...@visena.com 
mailto:andr...@visena.com www.visena.com https://www.visena.com  
https://www.visena.com  

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread David G Johnston
Andreas Joseph Krogh-2 wrote
 I will end up with that only if 
 all users read all messages, which is not nearly the case.

These observations probably won't help but...

You have what amounts to a mathematical spare matrix problem on your
hands...

Is there any way to expire messages so that dimension does not grow
unbounded?

Per-User caching does seem to be something that is going to be needed...

Depending on how many users are being tracked would storing the reader_id
in an indexed array improve matters?   SELECT ... FROM message WHERE NOT (1
= ANY(reader_ids)) ; UPDATE message SET reader_ids = reader_ids || 1 WHERE
messageid = ...  I'm not that familiar with how well indexes over arrays
work or which kind is needed (i.e. gin/gist).

HTH

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Optimize-query-for-listing-un-read-messages-tp5802097p5802170.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
På fredag 02. mai 2014 kl. 00:55:25, skrev David G Johnston 
david.g.johns...@gmail.com mailto:david.g.johns...@gmail.com: Andreas Joseph 
Krogh-2 wrote
  I will end up with that only if
  all users read all messages, which is not nearly the case.

 These observations probably won't help but...

 You have what amounts to a mathematical spare matrix problem on your
 hands...

 Is there any way to expire messages so that dimension does not grow
 unbounded?     No, unfortunately...   Per-User caching does seem to be 
something that is going to be needed...

 Depending on how many users are being tracked would storing the reader_id
 in an indexed array improve matters?   SELECT ... FROM message WHERE NOT (1
 = ANY(reader_ids)) ; UPDATE message SET reader_ids = reader_ids || 1 WHERE
 messageid = ...  I'm not that familiar with how well indexes over arrays
 work or which kind is needed (i.e. gin/gist).     is_read is one of many 
properties being tracked for a message...   -- Andreas Jospeh Krogh CTO / 
Partner - Visena AS Mobile: +47 909 56 963 andr...@visena.com 
mailto:andr...@visena.com www.visena.com https://www.visena.com  
https://www.visena.com  

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread David G Johnston


 Per-User caching does seem to be something that is going to be needed...

 Depending on how many users are being tracked would storing the reader_id
 in an indexed array improve matters?   SELECT ... FROM message WHERE NOT
 (1
 = ANY(reader_ids)) ; UPDATE message SET reader_ids = reader_ids || 1 WHERE
 messageid = ...  I'm not that familiar with how well indexes over arrays
 work or which kind is needed (i.e. gin/gist).



 is_read is one of many properties being tracked for a message...


​But you don't have to have all of them on the same table.  Once you've
identified the messages in question performing a standard join onto a
supplemental detail table should be fairly straight-forward.

Do these other properties have values when is_read is false or only when
is_read is true?  Since you already allow for the possibility of a
missing record (giving it the meaning of not read)​ these other
properties cannot currently exist in that situation.

David J.




--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/Optimize-query-for-listing-un-read-messages-tp5802097p5802174.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Andreas Joseph Krogh
På fredag 02. mai 2014 kl. 01:58:04, skrev David G Johnston 
david.g.johns...@gmail.com mailto:david.g.johns...@gmail.com: 
 Per-User caching does seem to be something that is going to be needed...

 Depending on how many users are being tracked would storing the reader_id
 in an indexed array improve matters?   SELECT ... FROM message WHERE NOT (1
 = ANY(reader_ids)) ; UPDATE message SET reader_ids = reader_ids || 1 WHERE
 messageid = ...  I'm not that familiar with how well indexes over arrays
 work or which kind is needed (i.e. gin/gist).     is_read is one of many 
properties being tracked for a message...     ​But you don't have to have all 
of them on the same table.  Once you've identified the messages in question 
performing a standard join onto a supplemental detail table should be fairly 
straight-forward.   Do these other properties have values when is_read is 
false or only when is_read is true?  Since you already allow for the 
possibility of a missing record (giving it the meaning of not read)​ these 
other properties cannot currently exist in that situation.     A message might 
hold a property (ie. is_important) when is_read is FALSE (it might be set back 
to is_read=FALSE after being read the first time).   -- Andreas Jospeh Krogh 
CTO / Partner - Visena AS Mobile: +47 909 56 963 andr...@visena.com 
mailto:andr...@visena.com www.visena.com https://www.visena.com  
https://www.visena.com  

Re: [PERFORM] Optimize query for listing un-read messages

2014-05-01 Thread Craig James
On Thu, May 1, 2014 at 4:26 AM, Andreas Joseph Krogh andr...@visena.comwrote:

 I have a schema where I have lots of messages and some users who might
 have read some of them. When a message is read by a user I create an entry
 i a table message_property holding the property (is_read) for that user.

 The schema is as follows:

[...]


 create table person(
 id serial primary key,
 username varchar not null unique
 );

 create table message(
 id serial primary key,
 subject varchar
 );

 create table message_property(
 message_id integer not null references message(id),
 person_id integer not null references person(id),
 is_read boolean not null default false,
 unique(message_id, person_id)
 );

[...]

  So, for person 1 there are 10 unread messages, out of a total 1mill. 5 of
 those unread does not have an entry in message_property and 5 have an entry
 and is_read set to FALSE.


Here's a possible enhancement: add two columns, an indexed timestamp to the
message table, and a timestamp of the oldest message this user has NOT
read on the person table. If most users read messages in a timely fashion,
this would (in most cases) narrow down the portion of the messages table to
a tiny fraction of the total -- just those messages newer than the oldest
message this user has not read.

When you sign up a new user, you can set his timestamp to the time the
account was created, since presumably messages before that time don't apply.

Whether this will help depends a lot on actual use patterns, i.e. do users
typically read all messages or do they leave a bunch of unread messages
sitting around forever?

Craig