Re: [PERFORM] How would you store read/unread topic status?

2009-06-24 Thread Chris St Denis

Mathieu Nebra wrote:

Alexander Staubo a écrit :
  

On Tue, Jun 23, 2009 at 1:12 PM, Mathieu Nebramate...@siteduzero.com wrote:


This flags table has more or less the following fields:

UserID - TopicID - LastReadAnswerID
  

We are doing pretty much same thing.



My problem is that everytime a user READS a topic, it UPDATES this flags
table to remember he has read it. This leads to multiple updates at the
same time on the same table, and an update can take a few seconds. This
is not acceptable for my users.
  

First of all, and I'm sure you thought of this, an update isn't needed
every time a user reads a topic; only when there are new answers that
need to be marked as read. So an update ... where last_read_answer_id
 ? should avoid the need for an update.



We don't work that way. We just remember he has read these answers and
then we can tell him there are no new messages for you to read.
So we just need to write what he has read when he reads it.

  

(That said, I believe PostgreSQL diffs tuple updates, so in practice
PostgreSQL might not be writing anything if you run an update with
the same value. I will let someone more intimate with the internal
details of updates to comment on this.)

Secondly, an update should not take a few seconds. You might want to
investigate this part before you turn to further optimizations.



Yes, I know there is a problem but I don't know if I am competent enough
to tune PostgreSQL for that. It can take a while to understand the
problem, and I'm not sure I'll have the time for that.

I am, however, opened to suggestions. Maybe I'm doing something wrong
somewhere.

  

In our application we defer the updates to a separate asynchronous
process using a simple queue mechanism, but in our case, we found that
the updates are fast enough (in the order of a few milliseconds) not
to warrant batching them into single transactions.



A few milliseconds would be cool.
In fact, defering to another process is a good idea, but I'm not sure if
it is easy to implement. It would be great to have some sort of UPDATE
... LOW PRIORITY to make the request non blocking.

Thanks.

  
I use pg_send_query() 
http://ca2.php.net/manual/en/function.pg-send-query.php in php to 
achieve this for a views counter. Script execution is not blocked while 
the queries are executing.


It looks like this may just be a direct translation of PQsendQuery() 
from libpq. Your preferred language may have a function like this.




Re: [PERFORM] How would you store read/unread topic status?

2009-06-24 Thread Mathieu Nebra
Craig James a écrit :
 Mathieu Nebra wrote:
 Greg Stark a écrit :
 All the other comments are accurate, though it does seem like
 something the database ought to be able to handle.

 The other thing which hasn't been mentioned is that you have a lot of
 indexes. Updates require maintaining all those indexes. Are all of
 these indexes really necessary? Do you have routine queries which look
 up users based on their flags? Or all all your oltp transactions for
 specific userids in which case you probably just need the index on
 userid.


 We are using these indexes, but I can't be sure if we _really_ need them
 or not.

 I can go into detail. We have:

 UserID - TopicID - LastReadAnswerID - WrittenStatus - IsFavorite

 So basically, we toggle the boolean flag WrittenStatus when the user has
 written in that topic. The same goes for IsFavorite.
 
 Do those last two columns hold much data?  Another thing to consider is
 to split this into two tables:

The last two columns only store TRUE or FALSE, they're booleans. So
you're saying that an index on them might be useless ? We're retrieving
1000-2000 rows max and we need to extract only those who have TRUE on
the last column for example.

 
  UserID - TopicID - LastReadAnswerID
  UserID - TopicID - WrittenStatus - IsFavorite
 
 As others have pointed out, an UPDATE in Postgres is a
 select/delete/insert, and if you're updating just the LastReadAnswerID
 all the time, you're wasting time deleting and re-inserting a lot of
 data that never change (assuming they're not trivially small columns).

They are trivially small columns.

 
 This might also solve the problem of too many indexes -- the table
 that's updated frequently would only have an index on (UserID, TopicID),
 so the update only affects one index.

I'll investigate that way.

 
 Then to minimize the impact on your app, create a view that looks like
 the original table for read-only apps.

Good idea, thanks again.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-24 Thread Mathieu Nebra

   
 In our application we defer the updates to a separate asynchronous
 process using a simple queue mechanism, but in our case, we found that
 the updates are fast enough (in the order of a few milliseconds) not
 to warrant batching them into single transactions.
 

 A few milliseconds would be cool.
 In fact, defering to another process is a good idea, but I'm not sure if
 it is easy to implement. It would be great to have some sort of UPDATE
 ... LOW PRIORITY to make the request non blocking.

 Thanks.

   
 I use pg_send_query()
 http://ca2.php.net/manual/en/function.pg-send-query.php in php to
 achieve this for a views counter. Script execution is not blocked while
 the queries are executing.
 
 It looks like this may just be a direct translation of PQsendQuery()
 from libpq. Your preferred language may have a function like this.
 

I am using PHP. That was one of the thing I was looking for, thank you! :)
We'll combine this with a memcached solution so we just update every
1000 views for example.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Mathieu Nebra
Hi all,

I'm running a quite large website which has its own forums. They are
currently heavily used and I'm getting performance issues. Most of them
are due to repeated UPDATE queries on a flags table.

This flags table has more or less the following fields:

UserID - TopicID - LastReadAnswerID

The flags table keeps track of every topic a member has visited and
remembers the last answer which was posted at this moment. It allows the
user to come back a few days after and immediately jump to the last
answer he has not read.

My problem is that everytime a user READS a topic, it UPDATES this flags
table to remember he has read it. This leads to multiple updates at the
same time on the same table, and an update can take a few seconds. This
is not acceptable for my users.

Question: what is the general rule of thumb here? How would you store
this information?

Thanks a lot in advance.
Mathieu.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Andres Freund

On 06/23/2009 01:12 PM, Mathieu Nebra wrote:

I'm running a quite large website which has its own forums. They are
currently heavily used and I'm getting performance issues. Most of them
are due to repeated UPDATE queries on a flags table.

This flags table has more or less the following fields:

UserID - TopicID - LastReadAnswerID

The flags table keeps track of every topic a member has visited and
remembers the last answer which was posted at this moment. It allows the
user to come back a few days after and immediately jump to the last
answer he has not read.
My problem is that everytime a user READS a topic, it UPDATES this flags
table to remember he has read it. This leads to multiple updates at the
same time on the same table, and an update can take a few seconds. This
is not acceptable for my users.
Have you analyzed why it takes that long? Determining that is the first 
step of improving the current situation...


My first guess would be, that your disks cannot keep up with the number 
of syncronous writes/second. Do you know how many transactions with 
write access you have? Guessing from your description you do at least 
one write for every page hit on your forum.


With the default settings every transaction needs to wait for io at the 
end - to ensure transactional semantics.
Depending on your disk the number of possible writes/second is quite low 
- a normal SATA disk with 7200rpm can satisfy something around 130 
syncronous writes per second. Which is the upper limit on writing 
transactions per second.

What disks do you have?

On which OS are you? If you are on linux you could use iostat to get 
some relevant statistics like:

iostat -x /path/to/device/the/database/resides/on 2 10

That gives you 10 statistics over periods of 2 seconds.


Depending on those results there are numerous solutions to that problem...


Question: what is the general rule of thumb here? How would you store
this information?
The problem here is, that every read access writes to disk - that is not 
going to scale very well.
One possible solution is to use something like memcached to store the 
last read post in memory and periodically write it into the database.



Which pg version are you using?


Andres

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Alexander Staubo
On Tue, Jun 23, 2009 at 1:12 PM, Mathieu Nebramate...@siteduzero.com wrote:
 This flags table has more or less the following fields:

 UserID - TopicID - LastReadAnswerID

We are doing pretty much same thing.

 My problem is that everytime a user READS a topic, it UPDATES this flags
 table to remember he has read it. This leads to multiple updates at the
 same time on the same table, and an update can take a few seconds. This
 is not acceptable for my users.

First of all, and I'm sure you thought of this, an update isn't needed
every time a user reads a topic; only when there are new answers that
need to be marked as read. So an update ... where last_read_answer_id
 ? should avoid the need for an update.

(That said, I believe PostgreSQL diffs tuple updates, so in practice
PostgreSQL might not be writing anything if you run an update with
the same value. I will let someone more intimate with the internal
details of updates to comment on this.)

Secondly, an update should not take a few seconds. You might want to
investigate this part before you turn to further optimizations.

In our application we defer the updates to a separate asynchronous
process using a simple queue mechanism, but in our case, we found that
the updates are fast enough (in the order of a few milliseconds) not
to warrant batching them into single transactions.

A.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Andres Freund

On 06/23/2009 02:37 PM, Alexander Staubo wrote:

(That said, I believe PostgreSQL diffs tuple updates, so in practice
PostgreSQL might not be writing anything if you run an update with
the same value. I will let someone more intimate with the internal
details of updates to comment on this.)

No, it does not do that by default.
You can write a trigger to do that though - and there is one packaged 
with the core version in the upcoming 8.4 version.


Andres

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread justin

Mathieu Nebra wrote:

Hi all,

I'm running a quite large website which has its own forums. They are
currently heavily used and I'm getting performance issues. Most of them
are due to repeated UPDATE queries on a flags table.

This flags table has more or less the following fields:

UserID - TopicID - LastReadAnswerID

The flags table keeps track of every topic a member has visited and
remembers the last answer which was posted at this moment. It allows the
user to come back a few days after and immediately jump to the last
answer he has not read.

My problem is that everytime a user READS a topic, it UPDATES this flags
table to remember he has read it. This leads to multiple updates at the
same time on the same table, and an update can take a few seconds. This
is not acceptable for my users.

Question: what is the general rule of thumb here? How would you store
this information?

Thanks a lot in advance.
Mathieu.

  
Sounds like the server is getting IO bound by checkpoints causing flush 
to disk causing a IO to become bound.


http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm
there is some 8.0-8.2 tuning ideas in this link.

Yes this is acceptable way to store such information. 

What is the PG version.  performance tuning  options are different 
depending on the version???

http://wiki.postgresql.org/wiki/Performance_Optimization
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Nikolas Everett

 In our application we defer the updates to a separate asynchronous
 process using a simple queue mechanism, but in our case, we found that
 the updates are fast enough (in the order of a few milliseconds) not
 to warrant batching them into single transactions.


We do a very similar trick for another sort of data and its worked wonders
for performance.  We had more frequent updates to fewer rows, though.  If
you happen to be using Java, HashMap and TreeMap are perfect for this
because they are reentrant so you don't have to worry about synchronizing
your sweeper with your web page activities.  As an added bonus, when you do
this trick you don't have to query this information from the database unless
you have a cache miss.


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Matthew Wakeling

On Tue, 23 Jun 2009, Nikolas Everett wrote:
If you happen to be using Java, HashMap and TreeMap are perfect for this 
because they are reentrant so you don't have to worry about 
synchronizing your sweeper with your web page activities.


See the note in http://java.sun.com/javase/6/docs/api/java/util/TreeMap.html


Note that this implementation is not synchronized.


If you have multiple threads accessing a TreeMap or HashMap, then they 
must be synchronised to ensure that only one thread at a time is accessing 
it. Otherwise, you may suffer severe data loss and possibly even JVM 
crashes. Perhaps you meant java.util.concurrent.ConcurrentHashMap?


Be very careful.

Matthew

--
Now, you would have thought these coefficients would be integers, given that
we're working out integer results. Using a fraction would seem really
stupid. Well, I'm quite willing to be stupid here - in fact, I'm going to
use complex numbers.-- Computer Science Lecturer

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Mathieu Nebra
 On 06/23/2009 01:12 PM, Mathieu Nebra wrote:
  I'm running a quite large website which has its own forums. They are
  currently heavily used and I'm getting performance issues. Most of
them
  are due to repeated UPDATE queries on a flags table.
 
  This flags table has more or less the following fields:
 
  UserID - TopicID - LastReadAnswerID
 
  The flags table keeps track of every topic a member has visited and
  remembers the last answer which was posted at this moment. It
allows the
  user to come back a few days after and immediately jump to the last
  answer he has not read.
  My problem is that everytime a user READS a topic, it UPDATES this
flags
  table to remember he has read it. This leads to multiple updates
at the
  same time on the same table, and an update can take a few seconds.
This
  is not acceptable for my users.
  Have you analyzed why it takes that long? Determining that is the first
  step of improving the current situation...
 
  My first guess would be, that your disks cannot keep up with the number
  of syncronous writes/second. Do you know how many transactions with
  write access you have? Guessing from your description you do at least
  one write for every page hit on your forum.

I don't know how many writes/s Pgsql can handle on my server, but I
first suspected that it was good practice to avoid unnecessary writes.

I do 1 write/page for every connected user on the forums.
I do the same on another part of my website to increment the number of
page views (this was not part of my initial question but it is very close).

 
  With the default settings every transaction needs to wait for io at the
  end - to ensure transactional semantics.
  Depending on your disk the number of possible writes/second is quite low
  - a normal SATA disk with 7200rpm can satisfy something around 130
  syncronous writes per second. Which is the upper limit on writing
  transactions per second.
  What disks do you have?

We have 2 SAS RAID 0 15000rpm disks.

 
  On which OS are you? If you are on linux you could use iostat to get
  some relevant statistics like:
  iostat -x /path/to/device/the/database/resides/on 2 10
 
  That gives you 10 statistics over periods of 2 seconds.
 
 
  Depending on those results there are numerous solutions to that
problem...

Here it is:

$ iostat -x /dev/sda 2 10
Linux 2.6.18-6-amd64 (scratchy) 23.06.2009

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  18,020,00   12,87   13,130,00   55,98

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda   0,94   328,98 29,62 103,06   736,58  6091,1451,46
0,040,25   0,04   0,51

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  39,650,00   48,382,000,009,98

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda   0,00 0,00 10,00 78,00   516,00  1928,0027,77
   6,44   73,20   2,75  24,20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  40,150,00   48,132,240,009,48

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda   0,00 0,00  6,47 100,50   585,07  2288,5626,87
   13,00  121,56   3,00  32,04

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  45,140,00   45,646,730,002,49

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda   1,00 0,00 34,00 157,50  1232,00  3904,0026,82
   26,64  139,09   3,03  58,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  46,250,00   49,253,500,001,00

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda   0,00 0,00 27,00 173,00   884,00  4224,0025,54
   24,46  122,32   3,00  60,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  44,420,00   47,642,230,005,71

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda   0,00 0,00 15,42 140,30   700,50  3275,6225,53
   17,94  115,21   2,81  43,78

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  41,750,00   48,502,500,007,25

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda   0,50 0,00 21,11 116,08   888,44  2472,3624,50
   12,62   91,99   2,55  34,97

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
  44,030,00   46,272,990,006,72

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda   9,00 0,00 10,00 119,00   484,00  2728,0024,90
   15,15  117,47   2,70  34,80

avg-cpu:  %user   %nice %system %iowait  %steal   

Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Mathieu Nebra
Alexander Staubo a écrit :
 On Tue, Jun 23, 2009 at 1:12 PM, Mathieu Nebramate...@siteduzero.com wrote:
 This flags table has more or less the following fields:

 UserID - TopicID - LastReadAnswerID
 
 We are doing pretty much same thing.
 
 My problem is that everytime a user READS a topic, it UPDATES this flags
 table to remember he has read it. This leads to multiple updates at the
 same time on the same table, and an update can take a few seconds. This
 is not acceptable for my users.
 
 First of all, and I'm sure you thought of this, an update isn't needed
 every time a user reads a topic; only when there are new answers that
 need to be marked as read. So an update ... where last_read_answer_id
  ? should avoid the need for an update.

We don't work that way. We just remember he has read these answers and
then we can tell him there are no new messages for you to read.
So we just need to write what he has read when he reads it.

 
 (That said, I believe PostgreSQL diffs tuple updates, so in practice
 PostgreSQL might not be writing anything if you run an update with
 the same value. I will let someone more intimate with the internal
 details of updates to comment on this.)
 
 Secondly, an update should not take a few seconds. You might want to
 investigate this part before you turn to further optimizations.

Yes, I know there is a problem but I don't know if I am competent enough
to tune PostgreSQL for that. It can take a while to understand the
problem, and I'm not sure I'll have the time for that.

I am, however, opened to suggestions. Maybe I'm doing something wrong
somewhere.

 
 In our application we defer the updates to a separate asynchronous
 process using a simple queue mechanism, but in our case, we found that
 the updates are fast enough (in the order of a few milliseconds) not
 to warrant batching them into single transactions.

A few milliseconds would be cool.
In fact, defering to another process is a good idea, but I'm not sure if
it is easy to implement. It would be great to have some sort of UPDATE
... LOW PRIORITY to make the request non blocking.

Thanks.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Robert Haas
  Which pg version are you using?

 I should have mentionned that before sorry: PostgreSQL 8.2

I think there is an awful lot of speculation on this thread about what
your problem is without anywhere near enough investigation.  A couple
of seconds for an update is a really long time, unless your server is
absolutely slammed, in which case probably everything is taking a long
time.  We need to get some more information on what is happening here.
 Approximately how many requests per second are you servicing?  Also,
can you:

1. Run EXPLAIN ANALYZE on a representative UPDATE statement and post
the exact query and the output.

2. Run VACUUM VERBOSE on your database and send the last 10 lines or
so of the output.

3. Try your UPDATE statement at a low-traffic time of day and see
whether it's faster than it is at a high-traffic time of day, and by
how much.  Or dump your database and reload it on a dev server and see
how fast it runs there.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Guillaume Cottenceau
Mathieu Nebra mateo21 'at' siteduzero.com writes:

 (That said, I believe PostgreSQL diffs tuple updates, so in practice
 PostgreSQL might not be writing anything if you run an update with
 the same value. I will let someone more intimate with the internal
 details of updates to comment on this.)
 
 Secondly, an update should not take a few seconds. You might want to
 investigate this part before you turn to further optimizations.

 Yes, I know there is a problem but I don't know if I am competent enough
 to tune PostgreSQL for that. It can take a while to understand the
 problem, and I'm not sure I'll have the time for that.

Short story: run the query in psql prepending EXPLAIN ANALYZE in
front of it and copy-paste the output in reply to that list.

Long story: there are a lot of interesting material in PG
official documentation about optimization. It is very worth a
read but it's longer than a short story. In my experience,
database performance can be degraded orders of magnitude if not
configured properly.

 I am, however, opened to suggestions. Maybe I'm doing something wrong
 somewhere.

 
 In our application we defer the updates to a separate asynchronous
 process using a simple queue mechanism, but in our case, we found that
 the updates are fast enough (in the order of a few milliseconds) not
 to warrant batching them into single transactions.

 A few milliseconds would be cool.

That also depends on the query. If your update selects rows not
according to an index you're going to be in trouble if the table
hosts a lot of data, but that's fair. So you might just need an
index. That might also be related to row bloat. Your query with
EXPLAIN ANALYZE would tell what postgres does (if it uses an
index or not).

 In fact, defering to another process is a good idea, but I'm not sure if
 it is easy to implement. It would be great to have some sort of UPDATE

No article on the site du zéro explaining how to implement
producer-consumers? :) But that must really be thought before
implementing. It's not worth piling queries in memory because it
will create other problems if queries are produced faster than
consumed in the long run.

-- 
Guillaume Cottenceau

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Andres Freund

On 06/23/2009 04:54 PM, Mathieu Nebra wrote:

On 06/23/2009 01:12 PM, Mathieu Nebra wrote:

I'm running a quite large website which has its own forums.
They are currently heavily used and I'm getting performance
issues. Most of

them

are due to repeated UPDATE queries on a flags table.

This flags table has more or less the following fields:

UserID - TopicID - LastReadAnswerID

The flags table keeps track of every topic a member has
visited and remembers the last answer which was posted at
this moment. It allows the user to come back a few days
after and immediately jump to the last answer he has not
read. My problem is that everytime a user READS a topic, it
UPDATES this flags table to remember he has read it. This
leads to multiple updates at the same time on the same table,
and an update can take a few seconds. This is not acceptable
for my users.

Have you analyzed why it takes that long? Determining that is the
first step of improving the current situation...

My first guess would be, that your disks cannot keep up with the
 number of syncronous writes/second. Do you know how many
transactions with write access you have? Guessing from your
description you do at least one write for every page hit on your
 forum.


I don't know how many writes/s Pgsql can handle on my server, but I
first suspected that it was good practice to avoid unnecessary
writes.

It surely is.


I do 1 write/page for every connected user on the forums. I do the
same on another part of my website to increment the number of page
views (this was not part of my initial question but it is very
close).

That even more cries for some in-memory-caching.


On which OS are you? If you are on linux you could use iostat to
 get some relevant statistics like: iostat -x
/path/to/device/the/database/resides/on 2 10

That gives you 10 statistics over periods of 2 seconds.


Depending on those results there are numerous solutions to that

problem...

Here it is:

$ iostat -x /dev/sda 2 10 Linux 2.6.18-6-amd64 (scratchy) 23.06.2009

avg-cpu:  %user   %nice %system %iowait  %steal   %idle 18,02 0,00
12,87   13,130,00   55,98

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util sda   0,94
328,98 29,62 103,06   736,58  6091,1451,46 0,040,25   0,04
0,51

avg-cpu:  %user   %nice %system %iowait  %steal   %idle 39,65 0,00
48,382,000,009,98

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util sda   0,00 0,00
10,00 78,00   516,00  1928,0027,77 6,44   73,20   2,75 24,20

avg-cpu:  %user   %nice %system %iowait  %steal   %idle 40,15 0,00
48,132,240,009,48

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util sda   0,00 0,00
6,47 100,50   585,07  2288,5626,87 13,00  121,56   3,00 32,04

avg-cpu:  %user   %nice %system %iowait  %steal   %idle 45,14 0,00
45,646,730,002,49

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util sda   1,00 0,00
34,00 157,50  1232,00  3904,0026,82 26,64  139,09   3,03 58,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle 46,25 0,00
49,253,500,001,00

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util sda   0,00 0,00
27,00 173,00   884,00  4224,0025,54 24,46  122,32   3,00 60,00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle 44,42 0,00
47,642,230,005,71

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util sda   0,00 0,00
15,42 140,30   700,50  3275,6225,53 17,94  115,21   2,81 43,78

avg-cpu:  %user   %nice %system %iowait  %steal   %idle 41,75 0,00
48,502,500,007,25

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util sda   0,50 0,00
21,11 116,08   888,44  2472,3624,50 12,62   91,99   2,55 34,97

avg-cpu:  %user   %nice %system %iowait  %steal   %idle 44,03 0,00
46,272,990,006,72

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util sda   9,00 0,00
10,00 119,00   484,00  2728,0024,90 15,15  117,47   2,70 34,80

avg-cpu:  %user   %nice %system %iowait  %steal   %idle 36,91 0,00
51,372,490,009,23

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util sda   0,99 0,00
14,78 136,45   390,15  2825,6221,26 21,86  144,52   2,58 39,01

avg-cpu:  %user   %nice %system %iowait  %steal   %idle 38,75 0,00
48,751,000,00   11,50

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s
avgrq-sz avgqu-sz   await  svctm  %util sda   0,00 0,00
7,54 67,34   377,89  1764,8228,62 5,38   71,89   2,95 22,11

You see that 

Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Robert Haas
 Which pg version are you using?

 I should have mentionned that before sorry: PostgreSQL 8.2

 I definitely would consider upgrading to 8.3 - even without any config
 changes it might bring quite some improvement.

 But mainly it would allow you to use asynchronous commit - which could
 possibly increase your throughput tremendously.

HOT can potentitally help a lot for this workload, too, if the columns
being updated are not indexed.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Grzegorz Jaśkiewicz
not better just to store last time user visited the topic ? or forum in
general, and compare that ?


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Robert Haas
On Tue, Jun 23, 2009 at 11:50 AM, Mathieu Nebramate...@siteduzero.com wrote:
  Approximately how many requests per second are you servicing?  Also,

 How can I extract this information from the database? I know how to use
 pg_stat_user_tables. My table has:

I was thinking you might look at your httpd logs.  Not sure how to get
it otherwise.

 can you:

 1. Run EXPLAIN ANALYZE on a representative UPDATE statement and post
 the exact query and the output.

 Index Scan using prj_frm_flg_pkey on prj_frm_flg  (cost=0.00..8.58
 rows=1 width=18)
   Index Cond: ((flg_mid = 3) AND (flg_sid = 123764))

 This time it only took 54ms, but maybe it's already a lot.

That looks like EXPLAIN, not EXPLAIN ANALYZE.  And can we also have the query?

 2. Run VACUUM VERBOSE on your database and send the last 10 lines or
 so of the output.

 It's not very long, I can give you the whole log:

 INFO:  vacuuming public.prj_frm_flgINFO:  scanned index
 prj_frm_flg_pkey to remove 74091 row versions
 DETAIL:  CPU 0.15s/0.47u sec elapsed 53.10 sec.INFO:  scanned index
 flg_fav to remove 74091 row versions
 DETAIL:  CPU 0.28s/0.31u sec elapsed 91.82 sec.INFO:  scanned index
 flg_notif to remove 74091 row versions
 DETAIL:  CPU 0.36s/0.37u sec elapsed 80.75 sec.INFO:  scanned index
 flg_post to remove 74091 row versions
 DETAIL:  CPU 0.31s/0.37u sec elapsed 115.86 sec.INFO:  scanned index
 flg_no_inter to remove 74091 row versions
 DETAIL:  CPU 0.34s/0.33u sec elapsed 68.96 sec.INFO:  prj_frm_flg:
 removed 74091 row versions in 5979 pages
 DETAIL:  CPU 0.29s/0.34u sec elapsed 100.37 sec.INFO:  index
 prj_frm_flg_pkey now contains 1315895 row versions in 7716 pages
 DETAIL:  63153 index row versions were removed.
 672 index pages have been deleted, 639 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index flg_fav now contains
 1315895 row versions in 18228 pages
 DETAIL:  73628 index row versions were removed.
 21 index pages have been deleted, 16 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index flg_notif now
 contains 1315895 row versions in 18179 pages
 DETAIL:  73468 index row versions were removed.
 22 index pages have been deleted, 13 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index flg_post now
 contains 1315895 row versions in 18194 pages
 DETAIL:  73628 index row versions were removed.
 30 index pages have been deleted, 23 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index flg_no_inter now
 contains 1315895 row versions in 8596 pages
 DETAIL:  73628 index row versions were removed.
 13 index pages have been deleted, 8 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  prj_frm_flg: found 74091
 removable, 1315895 nonremovable row versions in 10485 pages
 DETAIL:  326 dead row versions cannot be removed yet.
 There were 253639 unused item pointers.
 10431 pages contain useful free space.
 0 pages are entirely empty.
 CPU 1.91s/2.28u sec elapsed 542.75 sec.

 Total: 542877 ms.

Is that just for the one table?  I meant a database-wide VACUUM
VERBOSE, so you can see if you've blown out your free-space map.

 3. Try your UPDATE statement at a low-traffic time of day and see
 whether it's faster than it is at a high-traffic time of day, and by
 how much.  Or dump your database and reload it on a dev server and see
 how fast it runs there.

 It took 4ms.

Was that at a low traffic time of day, or on a different server?

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Mike
So your update doesn't take long to run during off-peak times, so
basically your options are:

1. Optimize your postgresql.conf settings or upgrade to the latest
version of PostgreSQL.

2. Redesign your forum code so it can scale better.

3. Upgrade your servers hardware as it may be overloaded.

I would probably attack those in the order I described. 

As far as redesigning your forum code, keep in mind that in PostgreSQL
an update is basically a select, delete, insert in a single statement.
First it needs to find the rows to update, it marks the rows for
deletion (which vacuum later does) and inserts a new row. So updates
can be quite expensive. 

In SOME situations, it can be faster to do inserts only, and modify
your select query to get just the data you need, for example:

Rather then an update like this:

update table set LastReadAnswerID = value where UserID = value
AND TopicID = value

You could do this instead:

insert into table VALUES(user_id,topic_id,last_read_answer_id)

Then just modify your select statement slightly to get the last
inserted row:

select * from table where user_id = value AND topic_id = value
order by LastReadAnswerID DESC LIMIT 1

This makes your select statement slightly more expensive but your
insert statement pretty much as cheap as possible. Since its much
easier to cache select results you could easily wrap some caching
mechanism around your select query to reduce the load there too. 

Then using a task scheduler like cron simply clear out old rows from the
table you insert into every minute, 5 minutes, hour, day, whatever makes
most sense to keep the select queries fast.

A memcached solution would probably be much better, but its also likely
much more involved to do.


On Tue, 23 Jun 2009 17:50:50 +0200
Mathieu Nebra mate...@siteduzero.com wrote:

 Robert Haas a écrit :
  Which pg version are you using?
  I should have mentionned that before sorry: PostgreSQL 8.2
  
  I think there is an awful lot of speculation on this thread about
  what your problem is without anywhere near enough investigation.  A
  couple of seconds for an update is a really long time, unless your
  server is absolutely slammed, in which case probably everything is
  taking a long time.  We need to get some more information on what
  is happening here.
 
 You're right, I'll give you the information you need.
 
   Approximately how many requests per second are you servicing?
  Also,
 
 How can I extract this information from the database? I know how to
 use pg_stat_user_tables. My table has:
 
 seq_tup_read
 133793491714
 
 idx_scan
 12408612540
 
 idx_tup_fetch
 41041660903
 
 n_tup_ins
 14700038
 
 n_tup_upd
 6698236
 
 n_tup_del
 15990670
 
  can you:
  
  1. Run EXPLAIN ANALYZE on a representative UPDATE statement and post
  the exact query and the output.
 
 Index Scan using prj_frm_flg_pkey on prj_frm_flg  (cost=0.00..8.58
 rows=1 width=18)
   Index Cond: ((flg_mid = 3) AND (flg_sid = 123764))
 
 This time it only took 54ms, but maybe it's already a lot.
 
 
  
  2. Run VACUUM VERBOSE on your database and send the last 10 lines or
  so of the output.
 
 It's not very long, I can give you the whole log:
 
 INFO:  vacuuming public.prj_frm_flgINFO:  scanned index
 prj_frm_flg_pkey to remove 74091 row versions
 DETAIL:  CPU 0.15s/0.47u sec elapsed 53.10 sec.INFO:  scanned index
 flg_fav to remove 74091 row versions
 DETAIL:  CPU 0.28s/0.31u sec elapsed 91.82 sec.INFO:  scanned index
 flg_notif to remove 74091 row versions
 DETAIL:  CPU 0.36s/0.37u sec elapsed 80.75 sec.INFO:  scanned index
 flg_post to remove 74091 row versions
 DETAIL:  CPU 0.31s/0.37u sec elapsed 115.86 sec.INFO:  scanned index
 flg_no_inter to remove 74091 row versions
 DETAIL:  CPU 0.34s/0.33u sec elapsed 68.96 sec.INFO:  prj_frm_flg:
 removed 74091 row versions in 5979 pages
 DETAIL:  CPU 0.29s/0.34u sec elapsed 100.37 sec.INFO:  index
 prj_frm_flg_pkey now contains 1315895 row versions in 7716 pages
 DETAIL:  63153 index row versions were removed.
 672 index pages have been deleted, 639 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index flg_fav now
 contains 1315895 row versions in 18228 pages
 DETAIL:  73628 index row versions were removed.
 21 index pages have been deleted, 16 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index flg_notif now
 contains 1315895 row versions in 18179 pages
 DETAIL:  73468 index row versions were removed.
 22 index pages have been deleted, 13 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index flg_post now
 contains 1315895 row versions in 18194 pages
 DETAIL:  73628 index row versions were removed.
 30 index pages have been deleted, 23 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index flg_no_inter now
 contains 1315895 row versions in 8596 pages
 DETAIL:  73628 index row versions were removed.
 13 index pages have been deleted, 8 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO: 

Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Mike
So your update doesn't take long to run during off-peak times, so
basically your options are:

1. Optimize your postgresql.conf settings or upgrade to the latest
version of PostgreSQL.

2. Redesign your forum code so it can scale better.

3. Upgrade your servers hardware as it may be overloaded.

I would probably attack those in the order I described. 

As far as redesigning your forum code, keep in mind that in PostgreSQL
an update is basically a select, delete, insert in a single statement.
First it needs to find the rows to update, it marks the rows for
deletion (which vacuum later does) and inserts a new row. So updates
can be quite expensive. 

In SOME situations, it can be faster to do inserts only, and modify
your select query to get just the data you need, for example:

Rather then an update like this:

update table set LastReadAnswerID = value where UserID = value
AND TopicID = value

You could do this instead:

insert into table VALUES(user_id,topic_id,last_read_answer_id)

Then just modify your select statement slightly to get the last
inserted row:

select * from table where user_id = value AND topic_id = value
order by LastReadAnswerID DESC LIMIT 1

This makes your select statement slightly more expensive but your
insert statement pretty much as cheap as possible. Since its much
easier to cache select results you could easily wrap some caching
mechanism around your select query to reduce the load there too. 

Then using a task scheduler like cron simply clear out old rows from the
table you insert into every minute, 5 minutes, hour, day, whatever makes
most sense to keep the select queries fast.

A memcached solution would probably be much better, but its also likely
much more involved to do.



On Tue, 23 Jun 2009 17:50:50 +0200
Mathieu Nebra mate...@siteduzero.com wrote:

 Robert Haas a écrit :
  Which pg version are you using?
  I should have mentionned that before sorry: PostgreSQL 8.2
  
  I think there is an awful lot of speculation on this thread about
  what your problem is without anywhere near enough investigation.  A
  couple of seconds for an update is a really long time, unless your
  server is absolutely slammed, in which case probably everything is
  taking a long time.  We need to get some more information on what
  is happening here.
 
 You're right, I'll give you the information you need.
 
   Approximately how many requests per second are you servicing?
  Also,
 
 How can I extract this information from the database? I know how to
 use pg_stat_user_tables. My table has:
 
 seq_tup_read
 133793491714
 
 idx_scan
 12408612540
 
 idx_tup_fetch
 41041660903
 
 n_tup_ins
 14700038
 
 n_tup_upd
 6698236
 
 n_tup_del
 15990670
 
  can you:
  
  1. Run EXPLAIN ANALYZE on a representative UPDATE statement and post
  the exact query and the output.
 
 Index Scan using prj_frm_flg_pkey on prj_frm_flg  (cost=0.00..8.58
 rows=1 width=18)
   Index Cond: ((flg_mid = 3) AND (flg_sid = 123764))
 
 This time it only took 54ms, but maybe it's already a lot.
 
 
  
  2. Run VACUUM VERBOSE on your database and send the last 10 lines or
  so of the output.
 
 It's not very long, I can give you the whole log:
 
 INFO:  vacuuming public.prj_frm_flgINFO:  scanned index
 prj_frm_flg_pkey to remove 74091 row versions
 DETAIL:  CPU 0.15s/0.47u sec elapsed 53.10 sec.INFO:  scanned index
 flg_fav to remove 74091 row versions
 DETAIL:  CPU 0.28s/0.31u sec elapsed 91.82 sec.INFO:  scanned index
 flg_notif to remove 74091 row versions
 DETAIL:  CPU 0.36s/0.37u sec elapsed 80.75 sec.INFO:  scanned index
 flg_post to remove 74091 row versions
 DETAIL:  CPU 0.31s/0.37u sec elapsed 115.86 sec.INFO:  scanned index
 flg_no_inter to remove 74091 row versions
 DETAIL:  CPU 0.34s/0.33u sec elapsed 68.96 sec.INFO:  prj_frm_flg:
 removed 74091 row versions in 5979 pages
 DETAIL:  CPU 0.29s/0.34u sec elapsed 100.37 sec.INFO:  index
 prj_frm_flg_pkey now contains 1315895 row versions in 7716 pages
 DETAIL:  63153 index row versions were removed.
 672 index pages have been deleted, 639 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index flg_fav now
 contains 1315895 row versions in 18228 pages
 DETAIL:  73628 index row versions were removed.
 21 index pages have been deleted, 16 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index flg_notif now
 contains 1315895 row versions in 18179 pages
 DETAIL:  73468 index row versions were removed.
 22 index pages have been deleted, 13 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index flg_post now
 contains 1315895 row versions in 18194 pages
 DETAIL:  73628 index row versions were removed.
 30 index pages have been deleted, 23 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 sec.INFO:  index flg_no_inter now
 contains 1315895 row versions in 8596 pages
 DETAIL:  73628 index row versions were removed.
 13 index pages have been deleted, 8 are currently reusable.
 CPU 0.00s/0.00u sec elapsed 0.00 

Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Scott Carey
You're holding this behavior to far too strict of a transactional guarantee.

The client software can cache a set of recent views, and sent updates in
bulk every 1 or 2 seconds.  Worst case, if your client crashes you lose a
second worth of user metadata updates on last accessed and view counts.
This isn't a financial transaction, don't build the app like one.

The same facility can serve as a read cache for other bits that don't need
to be 'perfect' in the transactional sense -- counts on the number of views
/ posts of a topic, etc.  Using the db to store and retrieve such counts
synchronously is frankly, a bad application design.


The tricky part with the above is two fold:  you need to have client
software capable of a thread-safe shared cache, and the clients will have to
have sticky-session if you are load balancing.  Corner cases such as a
server going down and a user switching servers will need to be worked out.


On 6/23/09 4:12 AM, Mathieu Nebra mate...@siteduzero.com wrote:

 Hi all,
 
 I'm running a quite large website which has its own forums. They are
 currently heavily used and I'm getting performance issues. Most of them
 are due to repeated UPDATE queries on a flags table.
 
 This flags table has more or less the following fields:
 
 UserID - TopicID - LastReadAnswerID
 
 The flags table keeps track of every topic a member has visited and
 remembers the last answer which was posted at this moment. It allows the
 user to come back a few days after and immediately jump to the last
 answer he has not read.
 
 My problem is that everytime a user READS a topic, it UPDATES this flags
 table to remember he has read it. This leads to multiple updates at the
 same time on the same table, and an update can take a few seconds. This
 is not acceptable for my users.
 
 Question: what is the general rule of thumb here? How would you store
 this information?
 
 Thanks a lot in advance.
 Mathieu.
 
 --
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-performance
 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Scott Carey
On 6/23/09 7:54 AM, Mathieu Nebra mate...@siteduzero.com wrote:

 On 06/23/2009 01:12 PM, Mathieu Nebra wrote:
 I'm running a quite large website which has its own forums. They are
 currently heavily used and I'm getting performance issues. Most of
 them
 are due to repeated UPDATE queries on a flags table.
 
 This flags table has more or less the following fields:
 
 UserID - TopicID - LastReadAnswerID
 
 The flags table keeps track of every topic a member has visited and
 remembers the last answer which was posted at this moment. It
 allows the
 user to come back a few days after and immediately jump to the last
 answer he has not read.
 My problem is that everytime a user READS a topic, it UPDATES this
 flags
 table to remember he has read it. This leads to multiple updates
 at the
 same time on the same table, and an update can take a few seconds.
 This
 is not acceptable for my users.
 Have you analyzed why it takes that long? Determining that is the first
 step of improving the current situation...
 
 My first guess would be, that your disks cannot keep up with the number
 of syncronous writes/second. Do you know how many transactions with
 write access you have? Guessing from your description you do at least
 one write for every page hit on your forum.
 
 I don't know how many writes/s Pgsql can handle on my server, but I
 first suspected that it was good practice to avoid unnecessary writes.
 
 I do 1 write/page for every connected user on the forums.
 I do the same on another part of my website to increment the number of
 page views (this was not part of my initial question but it is very close).
 
 
 With the default settings every transaction needs to wait for io at the
 end - to ensure transactional semantics.
 Depending on your disk the number of possible writes/second is quite low
 - a normal SATA disk with 7200rpm can satisfy something around 130
 syncronous writes per second. Which is the upper limit on writing
 transactions per second.
 What disks do you have?
 
 We have 2 SAS RAID 0 15000rpm disks.
 
 
 On which OS are you? If you are on linux you could use iostat to get
 some relevant statistics like:
 iostat -x /path/to/device/the/database/resides/on 2 10
 
 That gives you 10 statistics over periods of 2 seconds.
 
 
 Depending on those results there are numerous solutions to that
 problem...
 
 Here it is:
 
 $ iostat -x /dev/sda 2 10
 Linux 2.6.18-6-amd64 (scratchy) 23.06.2009
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   18,020,00   12,87   13,130,00   55,98
 
 Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
 avgqu-sz   await  svctm  %util
 sda   0,94   328,98 29,62 103,06   736,58  6091,1451,46
 0,040,25   0,04   0,51
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   39,650,00   48,382,000,009,98
 
 Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
 avgqu-sz   await  svctm  %util
 sda   0,00 0,00 10,00 78,00   516,00  1928,0027,77
6,44   73,20   2,75  24,20
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   40,150,00   48,132,240,009,48
 
 Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
 avgqu-sz   await  svctm  %util
 sda   0,00 0,00  6,47 100,50   585,07  2288,5626,87
13,00  121,56   3,00  32,04
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   45,140,00   45,646,730,002,49
 
 Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
 avgqu-sz   await  svctm  %util
 sda   1,00 0,00 34,00 157,50  1232,00  3904,0026,82
26,64  139,09   3,03  58,00
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   46,250,00   49,253,500,001,00
 
 Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
 avgqu-sz   await  svctm  %util
 sda   0,00 0,00 27,00 173,00   884,00  4224,0025,54
24,46  122,32   3,00  60,00
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   44,420,00   47,642,230,005,71
 
 Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
 avgqu-sz   await  svctm  %util
 sda   0,00 0,00 15,42 140,30   700,50  3275,6225,53
17,94  115,21   2,81  43,78
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   41,750,00   48,502,500,007,25
 
 Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
 avgqu-sz   await  svctm  %util
 sda   0,50 0,00 21,11 116,08   888,44  2472,3624,50
12,62   91,99   2,55  34,97
 
 avg-cpu:  %user   %nice %system %iowait  %steal   %idle
   44,030,00   46,272,990,006,72
 
 Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
 avgqu-sz   await  svctm  %util
 sda   9,00 0,00 10,00 

Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Greg Stark
All the other comments are accurate, though it does seem like
something the database ought to be able to handle.

The other thing which hasn't been mentioned is that you have a lot of
indexes. Updates require maintaining all those indexes. Are all of
these indexes really necessary? Do you have routine queries which look
up users based on their flags? Or all all your oltp transactions for
specific userids in which case you probably just need the index on
userid.

You'll probably find 8.3 helps this workload more than any tuning you
can do in the database though. Especially if you can reduce the number
of indexes and avoid an index on any flags that are being updated.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Mathieu Nebra
Greg Stark a écrit :
 All the other comments are accurate, though it does seem like
 something the database ought to be able to handle.
 
 The other thing which hasn't been mentioned is that you have a lot of
 indexes. Updates require maintaining all those indexes. Are all of
 these indexes really necessary? Do you have routine queries which look
 up users based on their flags? Or all all your oltp transactions for
 specific userids in which case you probably just need the index on
 userid.


We are using these indexes, but I can't be sure if we _really_ need them
or not.

I can go into detail. We have:

UserID - TopicID - LastReadAnswerID - WrittenStatus - IsFavorite

So basically, we toggle the boolean flag WrittenStatus when the user has
written in that topic. The same goes for IsFavorite.

We have indexes on them, so we can SELECT every topic WHERE the user has
written. Is it the good way of doing this?


Oh, I've made a mistake before, we have RAID 1 disks, not RAID 0.


 
 You'll probably find 8.3 helps this workload more than any tuning you
 can do in the database though. Especially if you can reduce the number
 of indexes and avoid an index on any flags that are being updated.

I'll start this way, thanks. First 8.3, then I'll check my flags.

I have a lot of ways to investigate and I would like to thank every
contributor here. I might come again with more precise information.

Thanks.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Alexander Staubo
On Tue, Jun 23, 2009 at 1:12 PM, Mathieu Nebramate...@siteduzero.com wrote:
 The flags table keeps track of every topic a member has visited and
 remembers the last answer which was posted at this moment. It allows the
 user to come back a few days after and immediately jump to the last
 answer he has not read.

I forgot to mention that we speed up our queries by caching the last
read ID in Memcached. This is the kind of thing that Memcached is
ideal for.

For example, we show the list of the most recent posts, along with a
comment count, eg. 42 comments (6 new). We found that joining posts
against the last-read table is expensive, so instead we read from
Memcached on every post to find the number of unread comments.

We use the thread's last commented at timestamp as part of the key
so that when somebody posts a new comment, every user's cached unread
count is invalidated; it is automatically recalculated the next time
they view the post.

A.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Craig James

Mathieu Nebra wrote:

Greg Stark a écrit :

All the other comments are accurate, though it does seem like
something the database ought to be able to handle.

The other thing which hasn't been mentioned is that you have a lot of
indexes. Updates require maintaining all those indexes. Are all of
these indexes really necessary? Do you have routine queries which look
up users based on their flags? Or all all your oltp transactions for
specific userids in which case you probably just need the index on
userid.



We are using these indexes, but I can't be sure if we _really_ need them
or not.

I can go into detail. We have:

UserID - TopicID - LastReadAnswerID - WrittenStatus - IsFavorite

So basically, we toggle the boolean flag WrittenStatus when the user has
written in that topic. The same goes for IsFavorite.


Do those last two columns hold much data?  Another thing to consider is to 
split this into two tables:

 UserID - TopicID - LastReadAnswerID 


 UserID - TopicID - WrittenStatus - IsFavorite

As others have pointed out, an UPDATE in Postgres is a select/delete/insert, 
and if you're updating just the LastReadAnswerID all the time, you're wasting 
time deleting and re-inserting a lot of data that never change (assuming 
they're not trivially small columns).

This might also solve the problem of too many indexes -- the table that's 
updated frequently would only have an index on (UserID, TopicID), so the update 
only affects one index.

Then to minimize the impact on your app, create a view that looks like the 
original table for read-only apps.

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How would you store read/unread topic status?

2009-06-23 Thread Greg Stark
On Tue, Jun 23, 2009 at 9:04 PM, Mathieu Nebramate...@siteduzero.com wrote:
 We have indexes on them, so we can SELECT every topic WHERE the user has
 written. Is it the good way of doing this?

I'm kind of skeptical that a simple index on userid,topic isn't
sufficient to handle this case. But you would have to test it on
actual data to be sure. It depends whether you have enough topics and
enough userid,topic records for a given userid that scanning all the
topics for a given user is actually too slow.

Even if it's necessary you might consider having a partial index on
user,topic WHERE writtenstatus instead of having a three-column index.

-- 
greg
http://mit.edu/~gsstark/resume.pdf

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance