Re: [PERFORM] new to postgres (and db management) and performance already a problem :-(

2006-01-17 Thread Antoine
On 17/01/06, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote:
 Try a), b), and c) in order on the offending tables as they address the
 problem at increasing cost...

 thanks a lot for the detailed information! the entire concept of vacuum
 isn't yet that clear to me, so your explanations and hints are very much
 appreciated. i'll definitely try these steps this weekend when the next
 full vacuum was scheduled :-)

Thanks guys, that pretty much answered my question(s) too. I have a sneaking
suspicion that vacuuming won't do too much for us however... now that I
think about it - we do very little removing, pretty much only inserts and
selects. I will give it a vacuum full and see what happens.

Cheers
Antoine
--
This is where I should put some witty comment.


[PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Michael Riess

hi,

I'm curious as to why autovacuum is not designed to do full vacuum. I 
know that the necessity of doing full vacuums can be reduced by 
increasing the FSM, but in my opinion that is the wrong decision for 
many applications. My application does not continuously 
insert/update/delete tuples at a constant rate. Basically there are long 
periods of relatively few modifications and short burst of high 
activity. Increasing the FSM so that even during these bursts most space
would be reused would mean reducing the available memory for all
other database tasks.


So my question is: What's the use of an autovacuum daemon if I still 
have to use a cron job to do full vacuums? wouldn't it just be a minor 
job to enhance autovacuum to be able to perform full vacuums, if one 
really wants it to do that - even if some developers think that it's the 
wrong approach?


Mike

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Christopher Kings-Lynne
So my question is: What's the use of an autovacuum daemon if I still 
have to use a cron job to do full vacuums? wouldn't it just be a minor 
job to enhance autovacuum to be able to perform full vacuums, if one 
really wants it to do that - even if some developers think that it's the 
wrong approach?


You should never have to do full vacuums...

Chris

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Michael Riess

Hi,

did you read my post? In the first part I explained why I don't want to 
increase the FSM that much.


Mike

So my question is: What's the use of an autovacuum daemon if I still 
have to use a cron job to do full vacuums? wouldn't it just be a minor 
job to enhance autovacuum to be able to perform full vacuums, if one 
really wants it to do that - even if some developers think that it's 
the wrong approach?


You should never have to do full vacuums...

Chris

---(end of broadcast)---
TIP 6: explain analyze is your friend



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] Use of Stored Procedures and

2006-01-17 Thread Marcos
Hi,

I already read the documentation on how to use SPI_PREPARE and
SPI_EXEC... but honestly I don't understand how I can use these
resources in my statements.

I looked for examples, but I didn't find good ones :(..

Can somebody help me?

Thanks.

Marcos.




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Pandurangan R S
 You should never have to do full vacuums...

I would rather say: you should never have to do full vacuums by any
periodic means. It may be done on an ad-hoc basis, when you have figured
out that your table is never going to grow that big again.

On 1/17/06, Christopher Kings-Lynne [EMAIL PROTECTED] wrote:
  So my question is: What's the use of an autovacuum daemon if I still
  have to use a cron job to do full vacuums? wouldn't it just be a minor
  job to enhance autovacuum to be able to perform full vacuums, if one
  really wants it to do that - even if some developers think that it's the
  wrong approach?

 You should never have to do full vacuums...

 Chris

 ---(end of broadcast)---
 TIP 6: explain analyze is your friend


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Michael Stone

On Tue, Jan 17, 2006 at 11:33:02AM +0100, Michael Riess wrote:
did you read my post? In the first part I explained why I don't want to 
increase the FSM that much.


Since you didn't quantify it, that wasn't much of a data point. (IOW,
you'd generally have to be seriously resource constrained before the FSM
would be a significant source of memory consumption--in which case more
RAM would probably be a much better solution than screwing with
autovacuum.)

Mike Stone

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Alvaro Herrera
Michael Riess wrote:
 hi,
 
 I'm curious as to why autovacuum is not designed to do full vacuum.

Because a VACUUM FULL is too invasive.  Lazy vacuum is so light on the
system w.r.t. locks that it's generally not a problem to start one at
any time.  On the contrary, vacuum full could be a disaster in some
situations.

What's more, in general a lazy vacuum is enough to keep the dead space
within manageability, given a good autovacuum configuration and good FSM
configuration, so there's mostly no need for full vacuum.  (This is the
theory at least.)  For the situations where there is a need, we tell you
to issue it manually.

 So my question is: What's the use of an autovacuum daemon if I still 
 have to use a cron job to do full vacuums? wouldn't it just be a minor 
 job to enhance autovacuum to be able to perform full vacuums, if one 
 really wants it to do that - even if some developers think that it's the 
 wrong approach?

Yes, it is a minor job to enhance it to perform vacuum full.  The
problem is having a good approach to determining _when_ to issue a full
vacuum, and having a way to completely disallow it.  If you want to do
the development work, be my guest (but let us know your design first).
If you don't, I guess you would have to wait until it comes high enough
on someone's to-do list, maybe because you convinced him (or her, but we
don't have Postgres-ladies at the moment AFAIK) monetarily or something.

You can, of course, produce a patch and use it internally.  This is free
software, remember.

-- 
Alvaro Herrera   Developer, http://www.PostgreSQL.org
God is real, unless declared as int

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] new to postgres (and db management) and performance already a problem :-(

2006-01-17 Thread Andrew Sullivan
On Tue, Jan 17, 2006 at 09:14:27AM +0100, Antoine wrote:
 think about it - we do very little removing, pretty much only inserts and
 selects. I will give it a vacuum full and see what happens.

UPDATES?  Remember that, in Postgres, UPDATE is effectively DELETE +
INSERT (from the point of view of storage, not the point of view of
the user).

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
The plural of anecdote is not data.
--Roger Brinner

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] new to postgres (and db management) and performance

2006-01-17 Thread Markus Schaber
Hi, Tom,

Tom Lane wrote:

in our db system (for a website), i notice performance boosts after a vacuum 
full. but then, a VACUUM FULL takes 50min+ during which the db is not really 
accessible to web-users. is there another way to perform maintenance tasks 
AND leaving the db fully operable and accessible?
 
 You're not doing regular vacuums often enough.

It may also help to increase the max_fsm_pages setting, so postmaster
has more memory to remember freed pages between VACUUMs.

HTH,
Markus

-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] new to postgres (and db management) and performance

2006-01-17 Thread Markus Schaber
Hi, Thomas,

[EMAIL PROTECTED] wrote:
 Try a), b), and c) in order on the offending tables as they address
 the problem at increasing cost...
 
 thanks a lot for the detailed information! the entire concept of vacuum
 isn't yet that clear to me, so your explanations and hints are very much
 appreciated. i'll definitely try these steps this weekend when the next
 full vacuum was scheduled :-)

Basically, VACUUM scans the whole table and looks for pages containing
garbage rows (or row versions), deletes the garbage, and adds those
pages to the free space map (if there are free slots). When allocating
new rows / row versions, PostgreSQL first tries to fit them in pages
from the free space map before allocating new pages. This is why a high
max_fsm_pages setting can help when VACUUM frequency is low.

VACUUM FULL additionally moves rows between pages, trying to concentrate
all the free space at the end of the tables (aka defragmentation), so
it can then truncate the files and release the space to the filesystem.

CLUSTER basically rebuilds the tables by copying all rows into a new
table, in index order, and then dropping the old table, which also
reduces fragmentation, though not as thoroughly as VACUUM FULL does.

ANALYZE creates statistics about the distribution of values in a column,
allowing the query optimizer to estimate the selectivity of query criteria.

(This explanation is rather simplified, and ignores indices as well as
the fact that a table can consist of multiple files. Also, I believe
that newer PostgreSQL versions allow VACUUM to truncate files when free
pages happen to appear at the very end of the file.)
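For illustration, here are the commands being compared, on a hypothetical
table (the name "mytable" is invented, not from this thread):

-- Lazy vacuum: reclaims dead space without blocking readers/writers,
-- and refreshes planner statistics in the same pass:
VACUUM ANALYZE mytable;

-- Compacts the table and returns space to the filesystem; holds an
-- exclusive lock on mytable for the duration:
VACUUM FULL mytable;

-- Rebuilds the table in the order of the given index (8.1 syntax);
-- also requires an exclusive lock:
CLUSTER mytable_pkey ON mytable;

-- Statistics only, no space reclamation:
ANALYZE mytable;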


HTH,
Markus

-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Christopher Browne
 I'm curious as to why autovacuum is not designed to do full vacuum. 

Because that's terribly invasive due to the locks it takes out.

Lazy vacuum may chew some I/O, but it does *not* block your
application for the duration.

VACUUM FULL blocks the application.  That is NOT something that anyone
wants to throw into the activity mix randomly.
-- 
let name=cbbrowne and tld=gmail.com in String.concat "@" [name;tld];;
http://linuxdatabases.info/info/slony.html
Signs of a Klingon Programmer #11: This machine is a piece of GAGH! I
need dual Pentium processors if I am to do battle with this code!

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Michael Riess



VACUUM FULL blocks the application.  That is NOT something that anyone
wants to throw into the activity mix randomly.


There must be a way to implement a daemon which frees up space in a
relation without blocking it for too long. It could abort after a certain
number of blocks have been freed and then move on to the next relation.


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Michael Riess

Well,

I think that the documentation is not exactly easy to understand. I 
always wondered why there are no examples for common postgresql 
configurations. All I know is that the default configuration seems to be 
too low for production use. And while running postgres I get no hints as 
to which setting needs to be increased to improve performance. I have no 
chance to see if my FSM settings are too low other than to run vacuum 
full verbose in psql, pipe the result to a text file and grep for some 
words to get a somewhat comprehensive idea of how much unused space 
there is in my system.


Don't get me wrong - I really like PostgreSQL and it works well in my 
application. But somehow I feel that it might run much better ...


about the FSM: You say that increasing the FSM is fairly cheap - how 
should I know that?



did you read my post? In the first part I explained why I don't want
to increase the FSM that much.


No, you didn't.  You explained *that* you thought you didn't want to
increase the FSM.  You didn't explain why.

FSM expansion comes fairly cheap  ...


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Andrew Sullivan
On Tue, Jan 17, 2006 at 11:18:59AM +0100, Michael Riess wrote:
 hi,
 
 I'm curious as to why autovacuum is not designed to do full vacuum. I 

Because nothing that runs automatically should ever take an exclusive
lock on the entire database, which is what VACUUM FULL does.

 activity. Increasing the FSM so that even during these bursts most space
 would be reused would mean reducing the available memory for all
 other database tasks.

I don't believe the hit is enough that you should even notice it. 
You'd have to post some pretty incredible use cases to show that the
tiny loss of memory to FSM is worth (a) an exclusive lock and (b) the
loss of efficiency you get from having some preallocated pages in
tables.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
The fact that technology doesn't work is no bar to success in the marketplace.
--Philip Greenspun

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Michael Riess

Hi,


hi,

I'm curious as to why autovacuum is not designed to do full vacuum. I 


Because nothing that runs automatically should ever take an exclusive
lock on the entire database, which is what VACUUM FULL does.


I thought that vacuum full only locks the table which it currently 
operates on? I'm pretty sure that once a table has been vacuumed, it can 
be accessed without any restrictions while the vacuum process works on 
the next table.




activity. Increasing the FSM so that even during these bursts most space
would be reused would mean reducing the available memory for all
other database tasks.


I don't believe the hit is enough that you should even notice it. 
You'd have to post some pretty incredible use cases to show that the

tiny loss of memory to FSM is worth (a) an exclusive lock and (b) the
loss of efficiency you get from having some preallocated pages in
tables.


I have 5000 tables and a workstation with 1 GB RAM which hosts an Apache
Web Server, Tomcat Servlet Container and PostgreSQL. RAM is not
something that I have plenty of ... and the hardware is fixed and cannot
be changed.




---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Michael Stone

On Tue, Jan 17, 2006 at 03:50:38PM +0100, Michael Riess wrote:
about the FSM: You say that increasing the FSM is fairly cheap - how 
should I know that?


Why would you assume otherwise, to the point of not considering changing
the setting? 


The documentation explains how much memory is used for FSM entries. If
you look at vacuum verbose output it will tell you how much memory
you're currently using for the FSM.

Mike Stone

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Andrew Sullivan
On Tue, Jan 17, 2006 at 03:50:38PM +0100, Michael Riess wrote:
 always wondered why there are no examples for common postgresql 
 configurations. 

You mean like this one? (for 8.0):

http://www.powerpostgresql.com/Downloads/annotated_conf_80.html



 All I know is that the default configuration seems to be 
 too low for production use. 

Define "production use".  It may be too low for you.

 chance to see if my FSM settings are too low other than to run vacuum 
 full verbose in psql, pipe the result to a text file and grep for some 

Not true.  You don't need a FULL on there to figure this out.

 about the FSM: You say that increasing the FSM is fairly cheap - how 
 should I know that?

Do the math.  The docs say this:

--snip---
max_fsm_pages (integer)

Sets the maximum number of disk pages for which free space will
be tracked in the shared free-space map. Six bytes of shared memory
are consumed for each page slot. This setting must be more than 16 *
max_fsm_relations. The default is 20000. This option can only be set
at server start.

max_fsm_relations (integer)

Sets the maximum number of relations (tables and indexes) for
which free space will be tracked in the shared free-space map.
Roughly seventy bytes of shared memory are consumed for each slot.
The default is 1000. This option can only be set at server start. 

---snip---

So by default, you have 6 B * 20,000 = 120,000 bytes for the FSM pages.

By default, you have 70 B * 1,000 = 70,000 bytes for the FSM
relations.

Now, there are two knobs.  One of them tracks the number of
relations.  How many relations do you have?  Count the number of
indexes and tables you have, and give yourself some headroom in case
you add some more, and poof, you have your number for the relations.

Now all you need to do is figure out what your churn rate is on
tables, and count up how many disk pages that's likely to be.  Give
yourself a little headroom, and the number of FSM pages is done, too.

This churn rate is often tough to estimate, though, so you may have
to fiddle with it from time to time. 
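
To make that concrete, a minimal postgresql.conf sketch (the numbers are
invented for illustration, not a recommendation):

# Suppose ~1,500 tables and indexes, plus some headroom:
max_fsm_relations = 2000   # ~70 bytes each => ~140 kB of shared memory
# Suppose vacuum typically finds ~200,000 pages with reusable space:
max_fsm_pages = 300000     # 6 bytes each => ~1.8 MB of shared memory
                           # (must exceed 16 * max_fsm_relations = 32,000)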

A


-- 
Andrew Sullivan  | [EMAIL PROTECTED]
I remember when computers were frustrating because they *did* exactly what 
you told them to.  That actually seems sort of quaint now.
--J.D. Baldwin

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Andrew Sullivan
On Tue, Jan 17, 2006 at 09:09:02AM -0500, Matthew T. O'Connor wrote:
 vacuum.  As long as that percentage is small enough, the effect on 
 performance is negligible.  Have you measured to see if things are truly 

Actually, as long as the percentage is small enough and the pages are
really empty, the performance effect is positive.  If you have a VACUUM
FULLed table, inserts have to extend the table before inserting,
whereas in a table with some space reclaimed, the I/O cost of
allocating another disk page has already been paid.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
When my information changes, I alter my conclusions.  What do you do sir?
--attr. John Maynard Keynes

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread hubert depesz lubaczewski
On 1/17/06, Michael Riess [EMAIL PROTECTED] wrote:
 about the FSM: You say that increasing the FSM is fairly cheap - how
 should I know that?

the comment in the original postgresql.conf file seems pretty obvious:

#max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000       # min 100, ~70 bytes each

basically, setting max_fsm_pages to 1000000 consumes 6 megabytes, and i
definitely doubt you will ever hit that high.

depesz


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Tom Lane
Michael Riess [EMAIL PROTECTED] writes:
 I'm curious as to why autovacuum is not designed to do full vacuum.

Locking considerations.  VACUUM FULL takes an exclusive lock, which
blocks any foreground transactions that want to touch the table ---
so it's really not the sort of thing you want being launched at
unpredictable times.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Andrew Sullivan
On Tue, Jan 17, 2006 at 04:04:41PM +0100, Michael Riess wrote:
 
 I thought that vacuum full only locks the table which it currently 
 operates on? I'm pretty sure that once a table has been vacuumed, it can 
 be accessed without any restrictions while the vacuum process works on 
 the next table.

Yes, I think the way I phrased it was unfortunate.  But if you issue
VACUUM FULL you'll get an exclusive lock on everything, although not
all at the same time.  But of course, if your query load is like
this

BEGIN;
SELECT from t1, t2 where t1.col1 = t2.col2;
[application logic]
UPDATE t3 . . .
COMMIT;

you'll find yourself blocked in the first statement on both t1 and
t2; and then on t3 as well.  You sure don't want that to happen
automagically, in the middle of your business day.  

 I have 5000 tables and a workstation with 1 GB RAM which hosts an Apache 
   Web Server, Tomcat Servlet Container and PostgreSQL. RAM is not 
 something that I have plenty of ... and the hardware is fixed and cannot 
 be changed.

I see.  Well, I humbly submit that your problem is not the design of
the PostgreSQL server, then.  "The hardware is fixed and cannot be
changed" is the first optimisation I'd make.  Heck, I gave away a
box to charity only two weeks ago that would solve your problem
better than automatically issuing VACUUM FULL.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Information security isn't a technological problem.  It's an economics
problem.
--Bruce Schneier

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Tom Lane
Michael Riess [EMAIL PROTECTED] writes:
 But actually I never understood why the database system slows down at 
 all when there is much unused space in the files.

Perhaps some of your common queries are doing sequential scans?  Those
would visit the empty pages as well as the full ones.
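
One rough way to check for that kind of bloat (a sketch; "mytable" is a
made-up name): relpages is the table's on-disk size in 8 kB pages and
reltuples its estimated row count, both refreshed by VACUUM/ANALYZE, so a
pages-to-tuples ratio that keeps growing after heavy updates hints at
accumulating dead space.

SELECT relname, relpages, reltuples
FROM pg_class
WHERE relname = 'mytable';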

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Scott Marlowe
On Tue, 2006-01-17 at 09:08, Andrew Sullivan wrote:
 On Tue, Jan 17, 2006 at 03:50:38PM +0100, Michael Riess wrote:
  always wondered why there are no examples for common postgresql 
  configurations. 
 
 You mean like this one? (for 8.0):
 
 http://www.powerpostgresql.com/Downloads/annotated_conf_80.html

I have to admit, looking at the documentation, that we really don't
explain this all that well in the administration section, and I can see
how easily led astray beginners are.

I think it's time I joined the pgsql-docs mailing list...

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Chris Browne
[EMAIL PROTECTED] (Andrew Sullivan) writes:
 On Tue, Jan 17, 2006 at 11:18:59AM +0100, Michael Riess wrote:
 hi,
 
 I'm curious as to why autovacuum is not designed to do full vacuum. I 

 Because nothing that runs automatically should ever take an exclusive
 lock on the entire database, which is what VACUUM FULL does.

That's a bit more than what autovacuum would probably do...
autovacuum does things table by table, so that what would be locked
should just be one table.

Even so, I'd not be keen on having anything that runs automatically
take an exclusive lock on even as much as a table.

 activity. Increasing the FSM so that even during these bursts most
 space would be reused would mean to reduce the available memory for
 all other database tasks.

 I don't believe the hit is enough that you should even notice
 it. You'd have to post some pretty incredible use cases to show that
 the tiny loss of memory to FSM is worth (a) an exclusive lock and
 (b) the loss of efficiency you get from having some preallocated
 pages in tables.

There is *a* case for setting up full vacuums of *some* objects.  If
you have a table whose tuples all get modified in the course of some
common query, that will lead to a pretty conspicuous bloating of *that
table.*

Even with a big FSM, the pattern of how updates take place will lead
to that table having ~50% of its space being dead/free, which is way
higher than the desirable stable proportion of 10-15%.

For that sort of table, it may be attractive to run VACUUM FULL on a
regular basis.  Of course, it may also be attractive to try to come up
with an update process that won't kill the whole table's contents at
once ;-).
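
A minimal sketch of scheduling that for a single hot table (the database
and table names here are invented, and this assumes a nightly low-traffic
window):

# crontab entry: VACUUM FULL one heavily-updated table at 03:00 daily
0 3 * * *  psql -d mydb -c 'VACUUM FULL ANALYZE hot_table'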
-- 
let name=cbbrowne and tld=cbbrowne.com in String.concat "@" [name;tld];;
http://cbbrowne.com/info/x.html
As  long as  each  individual is  facing  the TV  tube alone,  formal
freedom poses no threat to privilege.  --Noam Chomsky

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[PERFORM] sum of left join greater than its parts

2006-01-17 Thread Robert Treat
8.1.1, everything vacuumed/analyzed. basically i have two queries that
when executed individually run quite quickly, but if I try to left join
the second query onto the first, everything gets quite a bit slower. 

rms=# explain analyze
rms-#   SELECT
rms-# software_download.*
rms-# FROM
rms-# (
rms(# SELECT
rms(# host_id, max(mtime) as mtime
rms(# FROM
rms(# software_download
rms(# WHERE
rms(# bds_status_id not in (6,17,18)
rms(# GROUP BY
rms(# host_id, software_binary_id
rms(#  ) latest_download
rms-#  JOIN software_download using (host_id,mtime)
rms-#  JOIN software_binary b USING (software_binary_id)
rms-# WHERE
rms-# binary_type_id IN (3,5,6);
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=870.00..992.56 rows=1 width=96) (actual time=90.566..125.782 rows=472 loops=1)
   Hash Cond: (("outer".host_id = "inner".host_id) AND ("outer".?column2? = "inner".mtime))
   ->  HashAggregate  (cost=475.88..495.32 rows=1555 width=16) (actual time=51.300..70.761 rows=10870 loops=1)
         ->  Seq Scan on software_download  (cost=0.00..377.78 rows=13080 width=16) (actual time=0.010..23.700 rows=13167 loops=1)
               Filter: ((bds_status_id <> 6) AND (bds_status_id <> 17) AND (bds_status_id <> 18))
   ->  Hash  (cost=379.37..379.37 rows=2949 width=96) (actual time=39.167..39.167 rows=639 loops=1)
         ->  Hash Join  (cost=5.64..379.37 rows=2949 width=96) (actual time=0.185..37.808 rows=639 loops=1)
               Hash Cond: ("outer".software_binary_id = "inner".software_binary_id)
               ->  Seq Scan on software_download  (cost=0.00..277.16 rows=13416 width=96) (actual time=0.008..19.338 rows=13416 loops=1)
               ->  Hash  (cost=5.59..5.59 rows=20 width=4) (actual time=0.149..0.149 rows=22 loops=1)
                     ->  Seq Scan on software_binary b  (cost=0.00..5.59 rows=20 width=4) (actual time=0.011..0.108 rows=22 loops=1)
                           Filter: ((binary_type_id = 3) OR (binary_type_id = 5) OR (binary_type_id = 6))
 Total runtime: 126.704 ms
(13 rows)


rms=# explain analyze 
rms-# SELECT
rms-# entityid, rmsbinaryid, rmsbinaryid as 
software_binary_id, timestamp as downloaded, ia.host_id
rms-# FROM
rms-# (SELECT
rms(# entityid, rmsbinaryid,max(msgid) 
as msgid
rms(# FROM
rms(# msg306u
rms(# WHERE
rms(# downloadstatus=1
rms(# GROUP BY entityid,rmsbinaryid
rms(# ) a1
rms-# JOIN myapp_app ia on 
(entityid=myapp_app_id)
rms-# JOIN
rms-# (SELECT *
rms(# FROM msg306u
rms(# WHERE
rms(# downloadstatus != 0
rms(# ) a2 
USING(entityid,rmsbinaryid,msgid)
rms-# ;
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=1733.79..4620.38 rows=1 width=20) (actual time=81.160..89.826 rows=238 loops=1)
   ->  Nested Loop  (cost=1733.79..4615.92 rows=1 width=20) (actual time=81.142..86.826 rows=238 loops=1)
         Join Filter: ("outer".rmsbinaryid = "inner".rmsbinaryid)
         ->  HashAggregate  (cost=1733.79..1740.92 rows=570 width=12) (actual time=81.105..81.839 rows=323 loops=1)
               ->  Bitmap Heap Scan on msg306u  (cost=111.75..1540.65 rows=25752 width=12) (actual time=4.490..41.233 rows=25542 loops=1)
                     ->  Bitmap Index Scan on rht3  (cost=0.00..111.75 rows=25752 width=0) (actual time=4.248..4.248 rows=25542 loops=1)
         ->  Index Scan using msg306u_entityid_msgid_idx on msg306u  (cost=0.00..5.02 rows=1 width=20) (actual time=0.008..0.010 rows=1 loops=323)
               Index Cond: (("outer".entityid = msg306u.entityid) AND ("outer".?column3? = msg306u.msgid))
               Filter: (downloadstatus <> '0'::text)
   ->  Index Scan using myapp_app_pkey on myapp_app ia  (cost=0.00..4.44 rows=1 width=8) (actual time=0.006..0.007 rows=1 loops=238)
         Index Cond: ("outer".entityid = ia.myapp_app_id)
 Total runtime: 90.270 ms
(12 rows)


and here 

Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Andrew Sullivan
On Tue, Jan 17, 2006 at 09:59:25AM -0600, Scott Marlowe wrote:
 I have to admit, looking at the documentation, that we really don't
 explain this all that well in the administration section, and I can see
 how easily led astray beginners are.

I understand what you mean, but I suppose my reaction would be that
what we really need is a place to keep these things, with a note in
the docs that the best practice settings for these are documented
at some url, and evolve over time as people gain expertise with the
new features.

I suspect, for instance, that nobody knows exactly the right settings
for any generic workload yet under 8.1 (although probably people know
them well enough for particular workloads).

A


-- 
Andrew Sullivan  | [EMAIL PROTECTED]
This work was visionary and imaginative, and goes to show that visionary
and imaginative work need not end up well. 
--Dennis Ritchie

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Markus Schaber
Hi, Michael,

Michael Riess wrote:

 But actually I never understood why the database system slows down at
 all when there is much unused space in the files. Are the unused pages
 cached by the system, or is there another reason for the impact on the
 performance?

No, they are not cached as such, but PostgreSQL caches whole pages, and
you don't have only empty pages, but also lots of partially empty pages,
so the signal-to-noise ratio is worse (meaning PostgreSQL has to fetch
more pages to get the same amount of data).

Sequential scans etc. are also slower.

And some file systems get slower when files get bigger or there are more
files, but this effect should not really be noticeable here.

HTH,
Markus


-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Alvaro Herrera
Chris Browne wrote:
 [EMAIL PROTECTED] (Andrew Sullivan) writes:
  On Tue, Jan 17, 2006 at 11:18:59AM +0100, Michael Riess wrote:
  hi,
  
  I'm curious as to why autovacuum is not designed to do full vacuum. I 
 
  Because nothing that runs automatically should ever take an exclusive
  lock on the entire database, which is what VACUUM FULL does.
 
 That's a bit more than what autovacuum would probably do...
 autovacuum does things table by table, so that what would be locked
 should just be one table.

Even a database-wide vacuum does not take locks on more than one table.
The table locks are acquired and released one by one, as the operation
proceeds.  And as you know, autovacuum (both 8.1's and contrib) does
issue database-wide vacuums, if it finds a database close to an xid
wraparound.

-- 
Alvaro Herrerahttp://www.advogato.org/person/alvherre
Women are like slingshots: the more resistance they have,
 the farther you can get with them  (Jonas Nightingale, Leap of Faith)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Scott Marlowe
On Tue, 2006-01-17 at 11:16, Andrew Sullivan wrote:
 On Tue, Jan 17, 2006 at 09:59:25AM -0600, Scott Marlowe wrote:
  I have to admit, looking at the documentation, that we really don't
  explain this all that well in the administration section, and I can see
  how easily led astray beginners are.
 
 I understand what you mean, but I suppose my reaction would be that
 what we really need is a place to keep these things, with a note in
 the docs that the best practice settings for these are documented
 at some url, and evolve over time as people gain expertise with the
 new features.
 
 I suspect, for instance, that nobody knows exactly the right settings
 for any generic workload yet under 8.1 (although probably people know
 them well enough for particular workloads).

But the problem is bigger than that.  The administrative docs have
obviously evolved over time, and now they jump around, covering
the same subject from different angles and at different
depths.  Even I find it hard to find what I need, and I know PostgreSQL
administration well enough to be pretty darned good at it.

For the beginner, it must seem much more confusing.  The more I look at
the administration section of the docs, the more I want to reorganize
the whole thing, and rewrite large sections of it as well.

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Andrew Sullivan
On Tue, Jan 17, 2006 at 11:43:14AM -0500, Chris Browne wrote:
 [EMAIL PROTECTED] (Andrew Sullivan) writes:
  Because nothing that runs automatically should ever take an exclusive
  lock on the entire database, 

 That's a bit more than what autovacuum would probably do...

Or even VACUUM FULL, as I tried to make clearer in another message:
the way I phrased it suggests that it's a simultaneous lock on the
entire database (when it is most certainly not).  I didn't intend to
mislead; my apologies.

Note, though, that the actual effect for a user might look worse
than a lock on the entire database, if you consider
statement_timeout and certain use patterns.

Suppose you want to issue occasional VACUUM FULLs, but your
application is prepared for this, and depends on statement_timeout to
tell it "sorry, too long, try again".  Now, if the exclusive lock on
any given table takes less than statement_timeout, so that each
statement is able to complete in its time, the application looks like
it's having an outage _even though_ it is actually just blocked on
vacuums.  (Yes, it's poor application design.  There's plenty of that
in the world, and you can't always fix it.)

 There is *a* case for setting up full vacuums of *some* objects.  If
 you have a table whose tuples all get modified in the course of some
 common query, that will lead to a pretty conspicuous bloating of *that
 table.*

Sure.  And depending on your use model, that might be good.  In many
cases, though, a rotor table + view + truncate approach would be
better, and would allow improved uptime.  If you don't care about
uptime, and can take long outages every day, then the discussion is
sort of moot anyway.  And _all_ of this is moot, as near as I can
tell, given the OP's claim that the hardware is adequate and
immutable, even though the former claim is demonstrably false.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
The fact that technology doesn't work is no bar to success in the marketplace.
--Philip Greenspun

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Chris Browne
[EMAIL PROTECTED] (Alvaro Herrera) writes:
 Chris Browne wrote:
 [EMAIL PROTECTED] (Andrew Sullivan) writes:
  On Tue, Jan 17, 2006 at 11:18:59AM +0100, Michael Riess wrote:
  hi,
  
  I'm curious as to why autovacuum is not designed to do full vacuum. I 
 
  Because nothing that runs automatically should ever take an exclusive
  lock on the entire database, which is what VACUUM FULL does.
 
 That's a bit more than what autovacuum would probably do...
 autovacuum does things table by table, so that what would be locked
 should just be one table.

 Even a database-wide vacuum does not take locks on more than one table.
 The table locks are acquired and released one by one, as the operation
 proceeds.  And as you know, autovacuum (both 8.1's and contrib) does
 issue database-wide vacuums, if it finds a database close to an xid
 wraparound.

Has that changed recently?  I have always seen vacuumdb or SQL
VACUUM (without table specifications) running as one long
transaction which doesn't release the locks it is granted until
the end of the transaction.
-- 
cbbrowne,@,acm.org
http://cbbrowne.com/info/spiritual.html
My nostalgia for Icon makes me forget about any of the bad things.  I
don't have much nostalgia for Perl, so its faults I remember.
-- Scott Gilbert comp.lang.python

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Tom Lane
Chris Browne [EMAIL PROTECTED] writes:
 [EMAIL PROTECTED] (Alvaro Herrera) writes:
 Even a database-wide vacuum does not take locks on more than one table.
 The table locks are acquired and released one by one, as the operation
 proceeds.

 Has that changed recently?  I have always seen vacuumdb or SQL
 VACUUM (without table specifications) running as one long
 transaction which doesn't release the locks that it is granted until
 the end of the transaction.

You sure?  It's not supposed to, and watching a database-wide vacuum
with select * from pg_locks doesn't look to me like it ever has locks
on more than one table (plus the table's indexes and toast table).
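
For reference, a quick way to watch that from a second session (a sketch;
it joins pg_locks to pg_class to get readable relation names):

SELECT c.relname, l.mode, l.granted
FROM pg_locks l
JOIN pg_class c ON c.oid = l.relation
WHERE l.relation IS NOT NULL;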

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Suspending SELECTs

2006-01-17 Thread Alessandro Baretta

Tom Lane wrote:

Alessandro Baretta [EMAIL PROTECTED] writes:


I am aware that what I am dreaming of is already available through
cursors, but in a web application, cursors are bad boys, and should be
avoided. What I would like to be able to do is to plan a query and run
the plan to retrieve a limited number of rows as well as the
executor's state. This way, the burden of maintaining the cursor on
hold, between activations of the web resource which uses it, is
transferred from the DBMS to the web application server,



This is a pipe dream, I'm afraid, as the state of a cursor does not
consist exclusively of bits that can be sent somewhere else and then
retrieved.  There are also locks to worry about, as well as the open
transaction itself, and these must stay alive inside the DBMS because
they affect the behavior of other transactions.  As an example, once
the cursor's originating transaction closes, there is nothing to stop
other transactions from modifying or removing rows it would have read.


I understand most of these issues, and expected this kind of reply. Please, 
allow me to insist that we reason on this problem and try to find a solution. My 
reason for doing so is that the future software industry is likely to see more 
and more web applications retrieving data from virtually endless databases, and 
in such contexts, it is sensible to ask the final client--the web client--to 
store the cursor state, because web interaction is intrinsically asynchronous, 
and you cannot count on users logging out when they're done, releasing resources 
allocated to them. Think of Google.


Let me propose a possible solution strategy for the problem of client-side 
cursors.
* Let us admit the limitation that a client-side cursor can only be declared 
in a transaction where no inserts, updates or deletes are allowed, so that such 
a transaction is virtually non-existent to other transactions. This allows the 
backend to close the transaction and release locks as soon as the cursor is 
declared.
* When the cursor state is pushed back to the backend, no new transaction is 
instantiated, but the XID of the original transaction is reused. In the MVCC 
system, this allows us to achieve a perfectly consistent view of the database at 
the instant the original transaction started, unless a VACUUM command has been 
executed in the meantime, in which case I would lose track of tuples which would 
have been live in the context of the original transaction, but have been updated 
or deleted and later vacuumed; however, this does not bother me at all.


Is this not a viable solution?

Alex

--
*
http://www.barettadeit.com/
Baretta DEIT
A division of Baretta SRL

tel. +39 02 370 111 55
fax. +39 02 370 111 54

Our technology:

The Application System/Xcaml (AS/Xcaml)
http://www.asxcaml.org/

The FreerP Project
http://www.freerp.org/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Suspending SELECTs

2006-01-17 Thread Michael Stone

On Tue, Jan 17, 2006 at 08:56:00PM +0100, Alessandro Baretta wrote:
I understand most of these issues, and expected this kind of reply. Please, 
allow me to insist that we reason on this problem and try to find a 
solution. My reason for doing so is that the future software industry is 
likely to see more and more web applications retrieving data from virtually 
endless databases, and in such contexts, it is sensible to ask the final 
client--the web client--to store the cursor state, because web 
interaction is intrinsically asynchronous, and you cannot count on users 
logging out when they're done, releasing resources allocated to them. Think 
of Google.


I don't understand why it is better to rework the db instead of just
having the web middleware keep track of what cursors are associated with
what sessions?

Mike Stone

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Suspending SELECTs

2006-01-17 Thread Alessandro Baretta

Craig A. James wrote:


Alessandro Baretta [EMAIL PROTECTED] writes:

I think you're trying to do something at the wrong layer of your 
architecture.  This task normally goes in your middleware layer, not 
your database layer.


I am developing my applications in Objective Caml, and I have written the 
middleware layer myself. I could easily implement a cursor-pooling strategy, but 
there is no perfect solution to the problem of guaranteeing that cursors be 
closed. Remember that web applications require the user to open a session by 
connecting the appropriate HTTP resource, but users are never required to log
out. Hence, in order to eventually reclaim all cursors, I must use a magical
log-out detection algorithm, which is usually implemented with a simple
timeout. This guarantees the required property of safety (the population of
cursors does not diverge) but does not guarantee the required property of
liveness (a user connecting to the application, who has opened a session but has
not logged out, and thus possesses a session token, should have access to the
execution context identified by his token).


Alex



--
*
http://www.barettadeit.com/
Baretta DEIT
A division of Baretta SRL

tel. +39 02 370 111 55
fax. +39 02 370 111 54

Our technology:

The Application System/Xcaml (AS/Xcaml)
http://www.asxcaml.org/

The FreerP Project
http://www.freerp.org/

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


[PERFORM] wildcard search performance with like

2006-01-17 Thread Yantao Shi

Hi,

I have a postges 8.1.1 table with over 29 million rows in it. The colunm 
(file_name) that I need to search on has entries like the following:


MOD04_L2.A2005311.1400.004.2005312013848.hdf

MYD04_L2.A2005311.0700.004.2005312013437.hdf 

I have an index on this column. But an index search is performed only
when I give the full file_name to search on:


testdbspc=# explain select file_name from catalog where file_name = 
'MOD04_L2.A2005311.1400.004.2005312013848.hdf';

QUERY PLAN
Index Scan using catalog_pk_idx on catalog  (cost=0.00..6.01 rows=1 
width=404)
 Index Cond: (file_name = 
'MOD04_L2.A2005311.1400.004.2005312013848.hdf'::bpchar)

(2 rows)

What I really need to do most of the time is a multi-wildcard search on 
this column, which is now doing a whole table scan without using the 
index at all:


testdbspc=# explain select file_name from catalog where file_name like 
'MOD04_L2.A2005311.%.004.2005312013%.hdf';

QUERY PLAN
Seq Scan on catalog  (cost=0.00..429.00 rows=1 width=404)
 Filter: (file_name ~~ 'MOD04_L2.A2005311.%.004.2005312013%.hdf'::text)
(2 rows)

Obviously, the performance of the table scan on such a large table is 
not acceptable.


I tried full-text indexing and searching. It did NOT work on this column
because all the letters and numbers are linked together with "." and
considered one big single word by to_tsvector.


Any solutions for this column to use an index search with multiple wild 
cards?


Thanks a lot,
Yantao Shi




---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Suspending SELECTs

2006-01-17 Thread Tom Lane
Alessandro Baretta [EMAIL PROTECTED] writes:
 * When the cursor state is pushed back to the backend, no new
 transaction is instantiated, but the XID of the original transaction
 is reused. In the MVCC system, this allows us to achieve a perfectly
 consistent view of the database at the instant the original
 transaction started, unless a VACUUM command has been executed in the
 meantime, in which case I would lose track of tuples which would have
 been live in the context of the original transaction, but have been
 updated or deleted and later vacuumed; however, this does not bother
 me at all.

 Is this not a viable solution?

No.  I'm not interested in solutions that can be translated as you
may or may not get the right answer, and there's no way even to know
whether you did or not.  That might be acceptable for your particular
application but you certainly can't argue that it's of general
usefulness.

Also, I can't accept the concept of pushing the entire execution engine
state out to the client and then back again; that state is large enough
that doing so for every few dozen rows would yield incredibly bad
performance.  (In most scenarios I think it'd be just as efficient for
the client to pull the whole cursor output at the start and page through
it for itself.)  Worse yet: this would represent a security hole large
enough to wheel West Virginia through.  We'd have no reasonable way to
validate the data the client sends back.

Lastly, you underestimate the problems associated with not holding the
locks the cursor is using.  As an example, it's likely that a btree
indexscan wouldn't successfully restart at all, because it couldn't find
where it had been if the index page had been split or deleted meanwhile.
So not running VACUUM is not enough to guarantee the query will still
work.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Suspending SELECTs

2006-01-17 Thread Jim C. Nasby
On Tue, Jan 17, 2006 at 09:06:53PM +0100, Alessandro Baretta wrote:
 Craig A. James wrote:
 
 Alessandro Baretta [EMAIL PROTECTED] writes:
 
 I think you're trying to do something at the wrong layer of your 
 architecture.  This task normally goes in your middleware layer, not 
 your database layer.
 
 I am developing my applications in Objective Caml, and I have written the 
 middleware layer myself. I could easily implement a cursor-pooling 
 strategy, but there is no perfect solution to the problem of guaranteeing 
 that cursors be closed. Remember that web applications require the user to 
 open a session by connecting the appropriate HTTP resource, but users are 
 never required to log out. Hence, in order to eventually reclaim all 
 cursors, I must use a magical log-out detection algorithm, which is usually 
 implemented with a simple timeout. This guarantees the required property of 
 safety (the population of cursors does not diverge) but does not guarantee 
 the required property of liveness (a user connecting to the application, 
 who has opened a session but has not logged out, and thus possesses a 
 session token, should have access to the execution context identified by his 
 token).

With some AJAX magic, it would probably be pretty easy to create an
application that let you know very quickly if a user left the
application (ie: browsed to another site, or closed the browser).
Essentially, you should be able to set it up so that it will ping the
application server fairly frequently (like every 10 seconds), so you
could drastically reduce the timeout interval.
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] wildcard search performance with like

2006-01-17 Thread Tom Lane
Yantao Shi [EMAIL PROTECTED] writes:
 testdbspc=# explain select file_name from catalog where file_name like 
 'MOD04_L2.A2005311.%.004.2005312013%.hdf';
 QUERY PLAN
 Seq Scan on catalog  (cost=0.00..429.00 rows=1 width=404)
   Filter: (file_name ~~ 'MOD04_L2.A2005311.%.004.2005312013%.hdf'::text)
 (2 rows)

I'm betting you are using a non-C locale.  You need either to run the
database in C locale, or to create a special index type that is
compatible with LIKE searches.  See
http://www.postgresql.org/docs/8.1/static/indexes-opclass.html
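
A sketch of that second option (the index name is invented; the column
shows up as bpchar in the EXPLAIN output above, hence the opclass):

CREATE INDEX catalog_file_name_like_idx
    ON catalog (file_name bpchar_pattern_ops);

With that in place, LIKE patterns anchored with a literal prefix, such as
'MOD04_L2.A2005311.%', can use the index regardless of locale.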

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] sum of left join greater than its parts

2006-01-17 Thread Josh Berkus
Hmmm, this looks like a planner bug to me:

 Hash Join  (cost=870.00..992.56 rows=1 width=96) (actual time=90.566..125.782 rows=472 loops=1)
   Hash Cond: (("outer".host_id = "inner".host_id) AND ("outer".?column2? = "inner".mtime))
   ->  HashAggregate  (cost=475.88..495.32 rows=1555 width=16) (actual time=51.300..70.761 rows=10870 loops=1)

 Nested Loop  (cost=1733.79..4620.38 rows=1 width=20) (actual time=81.160..89.826 rows=238 loops=1)
   ->  Nested Loop  (cost=1733.79..4615.92 rows=1 width=20) (actual time=81.142..86.826 rows=238 loops=1)
         Join Filter: ("outer".rmsbinaryid = "inner".rmsbinaryid)
         ->  HashAggregate  (cost=1733.79..1740.92 rows=570 width=12) (actual time=81.105..81.839 rows=323 loops=1)
               ->  Bitmap Heap Scan on msg306u  (cost=111.75..1540.65 rows=25752 width=12) (actual time=4.490..41.233 rows=25542 loops=1)

Notice that for both queries, the estimates are reasonably accurate (within 
+/- 4x) until they get to left joining the subquery, at which point the 
estimate of rows joined becomes exactly 1.   That looks suspicious to 
me ... Tom?  Neil?

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Suspending SELECTs

2006-01-17 Thread mark
On Tue, Jan 17, 2006 at 08:56:00PM +0100, Alessandro Baretta wrote:
 I understand most of these issues, and expected this kind of reply. Please, 
 allow me to insist that we reason on this problem and try to find a 
 solution. My reason for doing so is that the future software industry is 
 likely to see more and more web applications retrieving data from virtually 
 endless databases, and in such contexts, it is sensible to ask the final 
 client--the web client--to store the cursor state, because web 
 interaction is intrinsically asynchronous, and you cannot count on users 
 logging out when they're done, releasing resources allocated to them. Think 
 of Google.

What is wrong with LIMIT and OFFSET? I assume your results are ordered
in some manner.

Especially with web users, who become bored if the page doesn't flicker
in a way that appeals to them, how could one have any expectation that
the cursor would ever be useful at all?

As a 'general' solution, I think optimizing the case where the same
query is executed multiple times, with only the LIMIT and OFFSET
parameters changing, would be a better bang for the buck. I'm thinking
along the lines of materialized views, for queries executed more than
a dozen times in a short length of time... :-)

In the mean time, I successfully use LIMIT and OFFSET without such an
optimization, and things have been fine for me.
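
For concreteness, the pattern being described (table and column names are
invented):

-- Page 1:
SELECT * FROM items ORDER BY created, id LIMIT 50 OFFSET 0;
-- Page 2: identical query, only OFFSET changes:
SELECT * FROM items ORDER BY created, id LIMIT 50 OFFSET 50;

The server still has to compute and discard the OFFSET rows each time,
which is why caching repeated executions would be the bigger win.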

Cheers,
mark

-- 
[EMAIL PROTECTED] / [EMAIL PROTECTED] / [EMAIL PROTECTED] 
__
.  .  _  ._  . .   .__.  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/|_ |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
   and in the darkness bind them...

   http://mark.mielke.cc/


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Suspending SELECTs

2006-01-17 Thread Mark Lewis
 I am developing my applications in Objective Caml, and I have written the 
 middleware layer myself. I could easily implement a cursor-pooling strategy, 
 but there is no perfect solution to the problem of guaranteeing that cursors 
 be closed. Remember that web applications require the user to open a session 
 by connecting the appropriate HTTP resource, but users are never required to 
 log out. Hence, in order to eventually reclaim all cursors, I must use a 
 magical log-out detection algorithm, which is usually implemented with a 
 simple timeout. This guarantees the required property of safety (the 
 population of cursors does not diverge) but does not guarantee the required 
 property of liveness (a user connecting to the application, who has opened a 
 session but has not logged out, and thus possesses a session token, should 
 have access to the execution context identified by his token).

I fail to see the problem here.  Why should liveness be a required
property?  If it is simply that you can't promptly detect when a user is
finished with their web session so you can free resources, then remember
that there is no requirement that you dedicate a connection to their
session in the first place.  Even if you're using your own custom
middleware, it isn't a very complicated or conceptually difficult thing
to implement (see my previous post).  Certainly it's simpler than
allowing clients to pass around runtime state.

As far as implementing this sort of thing in the back-end, it would be
really hard with the PostgreSQL versioning model.  Oracle can more
easily (and kind of does) support cursors like you've described because
they implement MVCC differently than PostgreSQL, and in their
implementation you're guaranteed that you always have access to the most
recent x megabytes of historical rows, so even without an open
transaction to keep the required rows around you can still be relatively
sure they'll be around for long enough.  In PostgreSQL, historical
rows are kept in the tables themselves and periodically vacuumed, so
there is no such guarantee, which means that you would need to either
implement a lot of complex locking for little material gain, or just
hold the cursors in moderately long-running transactions, which leads
back to the solution suggested earlier.

-- Mark Lewis



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


[PERFORM] Multiple Order By Criteria

2006-01-17 Thread J



I'm trying to query a table with 250,000+ rows. My
query requires I provide 5 columns in my "order by" clause:

select column
from table
where
    column >= '2004-3-22 0:0:0'
order by
    ds.receipt desc,
    ds.carrier_id asc,
    ds.batchnum asc,
    encounternum asc,
    ds.encounter_id ASC
limit 100 offset 0

I have an index built for each of these columns in
my order by clause. This query takes an unacceptable amount of time to
execute. Here are the results of the explain:

Limit  (cost=229610.78..229611.03 rows=100 width=717)
  ->  Sort  (cost=229610.78..230132.37 rows=208636 width=717)
        Sort Key: receipt, carrier_id, batchnum, encounternum, encounter_id
        ->  Seq Scan on detail_summary ds  (cost=0.00..22647.13 rows=208636 width=717)
              Filter: (receipt >= '2004-03-22'::date)

When the order by has just one criterion (just ds.receipt DESC), it's fine:

Limit  (cost=0.00..177.71 rows=100 width=717)
  ->  Index Scan Backward using detail_summary_receipt_id_idx on detail_summary ds  (cost=0.00..370756.84 rows=208636 width=717)
        Index Cond: (receipt >= '2004-03-22'::date)

I've increased my work_mem up to 256meg with no speed increase. I think
there's something here I just don't understand.

How do I make this go fast ?







Re: [PERFORM] Multiple Order By Criteria

2006-01-17 Thread Josh Berkus
J,

 I have an index built for each of these columns in my order by clause.
 This query takes an unacceptable amount of time to execute. Here are the
 results of the explain:

You need a single index which has all five columns, in order.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Multiple Order By Criteria

2006-01-17 Thread J
I created the index, in order. Did a vacuum analyze on the table and my 
explain still says:


Limit  (cost=229610.78..229611.03 rows=100 width=717)
  ->  Sort  (cost=229610.78..230132.37 rows=208636 width=717)
        Sort Key: receipt, carrier_id, batchnum, encounternum, encounter_id
        ->  Seq Scan on detail_summary ds  (cost=0.00..22647.13 rows=208636 width=717)
              Filter: (receipt >= '2004-03-22'::date)


So, for fun I did
set enable_seqscan to off

But that didn't help. For some reason, the sort wants to do a seq scan and 
not use my super new index.


Am I doing something wrong ?

- Original Message - 
From: Josh Berkus josh@agliodbs.com

To: pgsql-performance@postgresql.org
Cc: [EMAIL PROTECTED]
Sent: Tuesday, January 17, 2006 5:25 PM
Subject: Re: [PERFORM] Multiple Order By Criteria



J,


I have an index built for each of these columns in my order by clause.
This query takes an unacceptable amount of time to execute. Here are the
results of the explain:


You need a single index which has all five columns, in order.




--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Multiple Order By Criteria

2006-01-17 Thread Stephan Szabo

On Tue, 17 Jan 2006, Josh Berkus wrote:

 J,

  I have an index built for each of these columns in my order by clause.
  This query takes an unacceptable amount of time to execute. Here are the
  results of the explain:

 You need a single index which has all five columns, in order.

I think he'll also need a reverse opclass for the first column in the
index or for the others since he's doing desc, asc, asc, asc, asc.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Autovacuum / full vacuum

2006-01-17 Thread Jim C. Nasby
On Tue, Jan 17, 2006 at 03:50:38PM +0100, Michael Riess wrote:
 Well,
 
 I think that the documentation is not exactly easy to understand. I 
 always wondered why there are no examples for common postgresql 
 configurations. All I know is that the default configuration seems to be 
 too low for production use. And while running postgres I get no hints as 
 to which setting needs to be increased to improve performance. I have no 

There's a number of sites that have lots of info on postgresql.conf
tuning. Google for 'postgresql.conf tuning' or 'annotated
postgresql.conf'.

 chance to see if my FSM settings are too low other than to run vacuum 
 full verbose in psql, pipe the result to a text file and grep for some 
 words to get a somewhat comprehensive idea of how much unused space 
 there is in my system.
 
 Don't get me wrong - I really like PostgreSQL and it works well in my 
 application. But somehow I feel that it might run much better ...
 
 about the FSM: You say that increasing the FSM is fairly cheap - how 
 should I know that?

[EMAIL PROTECTED]:26]/opt/local/share/postgresql8:3%grep fsm \
postgresql.conf.sample 
#max_fsm_pages = 20000  # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000   # min 100, ~70 bytes each
[EMAIL PROTECTED]:26]/opt/local/share/postgresql8:4%
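
Raising them just means editing postgresql.conf and restarting. At 6 bytes
per page slot and ~70 bytes per relation, even generous settings are cheap;
for instance (illustrative values only, not a recommendation):

    max_fsm_pages = 200000      # ~1.2 MB of shared memory
    max_fsm_relations = 2000    # ~140 KB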
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Suspending SELECTs

2006-01-17 Thread Frank Wiles
On Tue, 17 Jan 2006 16:12:59 -0500
[EMAIL PROTECTED] wrote:

 In the mean time, I successfully use LIMIT and OFFSET without such an
 optimization, and things have been fine for me.

  Same here. 

 -
   Frank Wiles [EMAIL PROTECTED]
   http://www.wiles.org
 -


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Suspending SELECTs

2006-01-17 Thread Josh Berkus
Alessandro,

 I understand most of these issues, and expected this kind of reply.
 Please, allow me to insist that we reason on this problem and try to
 find a solution. My reason for doing so is that the future software
 industry is likely to see more and more web applications retrieving data
 from virtually endless databases, and in such contexts, it is sensible
 to ask the final client--the web client--to store the cursor state,
 because web interaction is intrinsically asynchronous, and you cannot
 count on users logging out when they're done, releasing resources
 allocated to them. Think of Google.

I think you're trying to use an unreasonably difficult method to solve a
problem that's already been solved multiple times.  What you want is
called query caching.   There are about 800 different ways to do this on 
the middleware or application layer which are 1000% easier than what 
you're proposing.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Multiple Order By Criteria

2006-01-17 Thread Fredrick O Jackson

try adding the keyword 'date' before the date in your query.
I ran into this quite a while back, but I'm not sure I remember the solution.
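
Something along these lines, I think (illustrative, using the column from
the original post):

    ... WHERE receipt >= date '2004-3-22' ...

i.e. an explicitly typed date literal instead of an untyped string.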


  In Reply to: Tuesday January 17 2006 04:29 pm, [EMAIL PROTECTED]
wrote:
 I created the index, in order. Did a vacuum analyze on the table and my
 explain still says:

 Limit  (cost=229610.78..229611.03 rows=100 width=717)
   ->  Sort  (cost=229610.78..230132.37 rows=208636 width=717)
         Sort Key: receipt, carrier_id, batchnum, encounternum, encounter_id
         ->  Seq Scan on detail_summary ds  (cost=0.00..22647.13 rows=208636 width=717)
               Filter: (receipt >= '2004-03-22'::date)


 So, for fun I did
 set enable_seqscan to off

 But that didn't help. For some reason, the sort wants to do a seq scan and
 not use my super new index.

 Am I doing something wrong ?

 - Original Message -
 From: Josh Berkus josh@agliodbs.com
 To: pgsql-performance@postgresql.org
 Cc: [EMAIL PROTECTED]
 Sent: Tuesday, January 17, 2006 5:25 PM
 Subject: Re: [PERFORM] Multiple Order By Criteria

  J,
 
  I have an index built for each of these columns in my order by clause.
  This query takes an unacceptable amount of time to execute. Here are the
  results of the explain:
 
  You need a single index which has all five columns, in order.
 
 
  --
  --Josh
 
  Josh Berkus
  Aglio Database Solutions
  San Francisco
 
  ---(end of broadcast)---
  TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

 ---(end of broadcast)---
 TIP 4: Have you searched our list archives?

http://archives.postgresql.org

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Suspending SELECTs

2006-01-17 Thread Josh Berkus
People:

To follow up further, what Alessandro is talking about is known as a 
keyset cursor.   Sybase and SQL Server used to support them; I believe
that they were strictly read-only and had weird issues with record
visibility.

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Multiple Order By Criteria

2006-01-17 Thread Stephan Szabo

On Tue, 17 Jan 2006 [EMAIL PROTECTED] wrote:

 I created the index like this:

 CREATE INDEX rcbee_idx
   ON detail_summary
   USING btree
   (receipt, carrier_id, batchnum, encounternum, encounter_id);

 Is this correct ?

That would work if you were asking for all the columns ascending or
descending, but we don't currently use it for mixed orders.

 How do I make a reverse opclass ?

There's some information at the following:
http://archives.postgresql.org/pgsql-novice/2005-10/msg00254.php
http://archives.postgresql.org/pgsql-general/2005-01/msg00121.php
http://archives.postgresql.org/pgsql-general/2004-06/msg00565.php
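
Distilled from those posts, a sketch for date might look like the following.
This is untested here, and the function/opclass/index names are made up; the
key idea is to put the "greater than" operators in the "less than" strategy
slots and reverse the arguments of the comparison support function:

    CREATE FUNCTION date_cmp_rev(date, date) RETURNS integer AS
        'SELECT pg_catalog.date_cmp($2, $1);' LANGUAGE sql IMMUTABLE STRICT;

    CREATE OPERATOR CLASS date_reverse_ops FOR TYPE date USING btree AS
        OPERATOR 1 >  ,
        OPERATOR 2 >= ,
        OPERATOR 3 =  ,
        OPERATOR 4 <= ,
        OPERATOR 5 <  ,
        FUNCTION 1 date_cmp_rev(date, date);

    CREATE INDEX rcbee_mixed_idx ON detail_summary USING btree
        (receipt date_reverse_ops, carrier_id, batchnum, encounternum, encounter_id);

With receipt stored "backwards", a forward scan of this index should yield
receipt DESC, carrier_id ASC, ... -- matching the mixed ORDER BY.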

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Ensuring data integrity with fsync=off

2006-01-17 Thread Jim C. Nasby
On Sat, Jan 14, 2006 at 01:41:43PM -0500, Tom Lane wrote:
 Benjamin Arai [EMAIL PROTECTED] writes:
  Right now I run sync afte the updates have finished to ensure that the
  data is synced to disk but I am concerned about the segment data and
  anything else I am missing that PostgreSQL explicitly handles.  Is there
  something I can do in addition to sync to tell PostgreSQL exlplicitly that
  it is time to ensure everything is stored in its final destionation and etc?
 
 You need to give PG a CHECKPOINT command to flush stuff out of its
 internal buffers.  After that finishes, a manual sync commnd will
 push everything down to disk.
 
 You realize, of course, that a system failure while the updates are
 running might leave your database corrupt?  As long as you are prepared
 to restore from scratch, this might be a good tradeoff ... but don't
 let yourself get caught without an up-to-date backup ...

Another alternative that may (or may not) be simpler would be to run
everything in one transaction and just let that commit at the end. Also,
there is ongoing work towards allowing certain operations to occur
without generating any log writes. Currently there is code submitted
that allows COPY into a table that was created in the same transaction
to go un-logged, though I think it's only in HEAD. In any case, there
should be some features that could be very useful to you in 8.2.
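
To make the single-transaction suggestion concrete, a rough sketch (table,
column and file names invented):

    BEGIN;
    COPY staging FROM '/tmp/batch.csv';
    UPDATE summary s SET amount = t.amount
        FROM staging t WHERE t.id = s.id;
    COMMIT;
    CHECKPOINT;     -- then 'sync' at the shell, per Tom's note above

Either the whole batch becomes visible at COMMIT or none of it does --
though with fsync=off an ill-timed crash can still corrupt the cluster, as
Tom warned earlier in the thread.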
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Multiple Order By Criteria

2006-01-17 Thread J

I've read all of this info, closely. I wish these pages had come up when I
was searching for an answer to my problem. Oh well.

I am getting an idea of what I need to do to make this work well. I was
wondering if there is more information to read on how to implement this
solution in a more simple way. Much of what's written seems to be towards an
audience that should understand certain things automatically.
- Original Message - 
From: Stephan Szabo [EMAIL PROTECTED]

To: [EMAIL PROTECTED]
Cc: Josh Berkus josh@agliodbs.com; pgsql-performance@postgresql.org
Sent: Tuesday, January 17, 2006 6:39 PM
Subject: Re: [PERFORM] Multiple Order By Criteria




On Tue, 17 Jan 2006 [EMAIL PROTECTED] wrote:


I created the index like this:

CREATE INDEX rcbee_idx
  ON detail_summary
  USING btree
  (receipt, carrier_id, batchnum, encounternum, encounter_id);

Is this correct ?


That would work if you were asking for all the columns ascending or
descending, but we don't currently use it for mixed orders.


How do I make a reverse opclass ?


There's some information at the following:
http://archives.postgresql.org/pgsql-novice/2005-10/msg00254.php
http://archives.postgresql.org/pgsql-general/2005-01/msg00121.php
http://archives.postgresql.org/pgsql-general/2004-06/msg00565.php

---(end of broadcast)---
TIP 6: explain analyze is your friend




---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Suspending SELECTs

2006-01-17 Thread Mark Kirkwood

[EMAIL PROTECTED] wrote:

On Tue, Jan 17, 2006 at 08:56:00PM +0100, Alessandro Baretta wrote:


What is wrong with LIMIT and OFFSET? I assume your results are ordered
in some manner.

Especially with web users, who become bored if the page doesn't flicker
in a way that appeals to them, how could one have any expectation that
the cursor would ever be useful at all?

As a 'general' solution, I think optimizing the case where the same
query is executed multiple times, with only the LIMIT and OFFSET
parameters changing, would be a better bang for the buck. I'm thinking
along the lines of materialized views, for queries executed more than
a dozen times in a short length of time... :-)

In the mean time, I successfully use LIMIT and OFFSET without such an
optimization, and things have been fine for me.



Second that.

I do seem to recall a case where I used a different variant of this
method (possibly a database product that didn't have OFFSET, or maybe
because OFFSET was expensive for the case in point), where the ORDER BY
key for the last record on the page was saved and the query amended to
use it to filter for the 'next' screen - e.g:


1st time in:

SELECT ... FROM table WHERE ... ORDER BY id LIMIT 20;

Suppose this displays records for id 10000 - 10020.
When the user hits next, the page saves id=10020 in the session state
and executes:

SELECT ... FROM table WHERE ... AND id > 10020 ORDER BY id LIMIT 20;

Clearly you have to be a little careful about whether to use '>' or '>='
depending on whether 'id' is unique or not (to continue using '>' in the
non-unique case, you can just save and use all the members of the
primary key too).
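
A sketch of that non-unique case, assuming a two-column primary key
(id, seq) -- the second column and the values are invented:

SELECT ... FROM table WHERE ...
  AND (id > 10020 OR (id = 10020 AND seq > 42))
ORDER BY id, seq LIMIT 20;

where 10020 and 42 are the key values of the last row on the previous page.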


Cheers

Mark

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[PERFORM] Getting pg to use index on an inherited table (8.1.1)

2006-01-17 Thread Hari Warrier

Hi,

I have two tables foobar and foobar2 (which inherits from foobar, no 
extra columns).

foobar2 has all the data (574,576 rows), foobar is empty.
Both foobar and foobar2 have an index on the only column 'id'.  Now I
have a list of ids in a tmp_ids table.
A query on foobar2 (child table) uses the index, whereas the same query 
via foobar (parent) doesn't.
Even if I set seq_scan off, it still doesn't use the index on the child 
table while queried via the parent table.


Details are given below. Any help is appreciated.

# analyze foobar;
ANALYZE
# analyze foobar2;
ANALYZE
# explain analyze select * from foobar2 join tmp_ids using (id);
   QUERY PLAN
---
Nested Loop  (cost=0.00..3013.69 rows=85856 width=4) (actual time=0.038..234.864 rows=44097 loops=1)
  ->  Seq Scan on tmp_ids  (cost=0.00..1.52 rows=52 width=4) (actual time=0.008..0.102 rows=52 loops=1)
  ->  Index Scan using foobar2_idx1 on foobar2  (cost=0.00..37.29 rows=1651 width=4) (actual time=0.007..1.785 rows=848 loops=52)
        Index Cond: (foobar2.id = "outer".id)
Total runtime: 302.963 ms
(5 rows)

# explain analyze select * from foobar join tmp_ids using (id);
   QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Hash Join  (cost=1.65..13267.85 rows=149946 width=4) (actual time=7.338..3837.060 rows=44097 loops=1)
  Hash Cond: ("outer".id = "inner".id)
  ->  Append  (cost=0.00..8883.16 rows=576716 width=4) (actual time=0.012..2797.555 rows=574576 loops=1)
        ->  Seq Scan on foobar  (cost=0.00..31.40 rows=2140 width=4) (actual time=0.002..0.002 rows=0 loops=1)
        ->  Seq Scan on foobar2 foobar  (cost=0.00..8851.76 rows=574576 width=4) (actual time=0.004..1027.422 rows=574576 loops=1)
  ->  Hash  (cost=1.52..1.52 rows=52 width=4) (actual time=0.194..0.194 rows=52 loops=1)
        ->  Seq Scan on tmp_ids  (cost=0.00..1.52 rows=52 width=4) (actual time=0.003..0.094 rows=52 loops=1)
Total runtime: 3905.074 ms
(8 rows)

# select version();
 version

PostgreSQL 8.1.1 on x86_64-unknown-linux-gnu, compiled by GCC gcc (GCC) 
3.3.3 (SuSE Linux)

(1 row)

# \d foobar
    Table "public.foobar"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
Indexes:
    "foobar_idx1" btree (id)

# \d foobar2
   Table "public.foobar2"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |
Indexes:
    "foobar2_idx1" btree (id)
Inherits: foobar

# \d tmp_ids
   Table "public.tmp_ids"
 Column |  Type   | Modifiers
--------+---------+-----------
 id     | integer |


# set enable_seqscan=off;
SET
# explain analyze select * from foobar join tmp_ids using (id);
   QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
Hash Join  (cost=200000001.65..300013267.85 rows=149946 width=4) (actual time=7.352..3841.221 rows=44097 loops=1)
  Hash Cond: ("outer".id = "inner".id)
  ->  Append  (cost=100000000.00..200008883.16 rows=576716 width=4) (actual time=0.012..2803.547 rows=574576 loops=1)
        ->  Seq Scan on foobar  (cost=100000000.00..100000031.40 rows=2140 width=4) (actual time=0.003..0.003 rows=0 loops=1)
        ->  Seq Scan on foobar2 foobar  (cost=100000000.00..100018851.76 rows=574576 width=4) (actual time=0.005..1032.148 rows=574576 loops=1)
  ->  Hash  (cost=100000001.52..100000001.52 rows=52 width=4) (actual time=0.194..0.194 rows=52 loops=1)
        ->  Seq Scan on tmp_ids  (cost=100000000.00..100000001.52 rows=52 width=4) (actual time=0.004..0.098 rows=52 loops=1)
Total runtime: 3909.332 ms
(8 rows)

Output of "show all" (remember I just turned off seq_scan above):

 enable_bitmapscan | on  | Enables the planner's use of bitmap-scan plans.
 enable_hashagg    | on  | Enables the planner's use of hashed aggregation plans.
 enable_hashjoin   | on  | Enables the planner's use of hash join plans.
 enable_indexscan  | on  | Enables the planner's use of index-scan plans.
 enable_mergejoin  | on  | Enables the planner's use of merge join plans.
 enable_nestloop   | on  | Enables the planner's use of nested-loop join plans.
 enable_seqscan    | off | Enables the planner's use of sequential-scan plans.

Re: [PERFORM] Suspending SELECTs

2006-01-17 Thread Craig A. James

Alessandro Baretta wrote:
I think you're trying to do something at the wrong layer of your 
architecture.  This task normally goes in your middleware layer, not 
your database layer.


I am developing my applications in Objective Caml, and I have written 
the middleware layer myself. I could easily implement a cursor-pooling 
strategy...


You're trying to solve a very hard problem, and you're rewriting a lot of stuff
that's been worked on for years by teams of people.  If there's any way you can
switch to using something like JBOSS, it might save you a lot of grief and hard work.

I eliminated this problem a different way, using what we call a hitlist.  Basically,
every query becomes a select into, something like this:

 insert into hitlist_<xxx> (select id from ...)

where <xxx> is your user's id.  Once you do this, it's trivial to return each page to
the user almost instantly using offset/limit, or by adding a ROW_NUM column of some
sort.  We manage very large hitlists -- millions of rows.  Going from page 1 to page
100,000 takes a fraction of a second.
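
A schematic version of the hitlist flow (table and index names invented for
illustration):

    CREATE TABLE hitlist_42 (row_num serial PRIMARY KEY, id integer);
    INSERT INTO hitlist_42 (id)
        SELECT id FROM mytable WHERE ... ORDER BY ...;

    -- rows 81-100, i.e. page 5 at 20 rows per page:
    SELECT d.* FROM hitlist_42 h JOIN mytable d USING (id)
    WHERE h.row_num BETWEEN 81 AND 100 ORDER BY h.row_num;

Making row_num the primary key gives the BETWEEN an index for free.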

It also has the advantage that the user can come back in a week or a month and 
the results are still there.

The drawback are:

1. Before the user gets the first page, the entire query must complete.
2. You need a way to clean up old hitlists.
3. If you have tens of thousands of users, you'll have a large number of 
hitlists, and you have to use tablespaces to ensure that Linux filesystem 
directories don't get too large.
4. It takes space to store everyone's data.  (But disk space is so cheap this 
isn't much of an issue.)

You can eliminate #3 by a single shared hitlist with a column of UserID's.  But 
experience shows that a big shared hitlist doesn't work very well:  Inserts get 
slower because the UserID column must be indexed, and you can truncate 
individual hitlists but you have to delete from a shared hitlist.

Craig

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Suspending SELECTs

2006-01-17 Thread Tom Lane
Mark Kirkwood [EMAIL PROTECTED] writes:
 SELECT ... FROM table WHERE ... ORDER BY id LIMIT 20;

 Suppose this displays records for id 10000 - 10020.
 When the user hits next, the page saves id=10020 in the session state
 and executes:

 SELECT ... FROM table WHERE ... AND id > 10020 ORDER BY id LIMIT 20;

 Clearly you have to be a little careful about whether to use '>' or '>='
 depending on whether 'id' is unique or not (to continue using '>' in the
 non-unique case, you can just save and use all the members of the
 primary key too).

This is actually fairly painful to get right for a multi-column key
at the moment.  It'll be much easier once I finish up the
SQL-spec-row-comparison project.  See this thread for background:
http://archives.postgresql.org/pgsql-performance/2004-07/msg00188.php
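
(For the archives: once that project is done, the multi-column "next page"
condition above should collapse to a spec-compliant row comparison, along
the lines of

    SELECT ... FROM table
    WHERE (id, seq) > (10020, 42)
    ORDER BY id, seq LIMIT 20;

-- column names and values illustrative. Until then, the expanded OR/AND
form is what's needed for correct semantics.)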

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Getting pg to use index on an inherited table (8.1.1)

2006-01-17 Thread Tom Lane
Hari Warrier [EMAIL PROTECTED] writes:
 A query on foobar2 (child table) uses the index, whereas the same query 
 via foobar (parent) doesn't.

A query just on foobar should be able to use the index AFAIR.  The
problem here is that you have a join, and we are not very good about
situations involving joins against inheritance sets (nor joins against
UNION ALL subqueries, which is really about the same thing).

I'm hoping to get a chance to look into improving this during the 8.2
development cycle.
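
In the meantime, one workaround (a sketch, using the tables from the
original post) is to join each member of the hierarchy explicitly so that
each arm of the union can use its own index:

    SELECT * FROM ONLY foobar JOIN tmp_ids USING (id)
    UNION ALL
    SELECT * FROM foobar2 JOIN tmp_ids USING (id);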

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Multiple Order By Criteria

2006-01-17 Thread Ahmad Fajar


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of [EMAIL PROTECTED]
Sent: Rabu, 18 Januari 2006 07:23
To: Stephan Szabo
Cc: Josh Berkus; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Multiple Order By Criteria

I've read all of this info, closely. I wish these pages had come up when I
was searching for an answer to my problem. Oh well.
Well, I think you need to understand how a btree index works. Btree is good
enough, though not perfect, and it performs best when the index doesn't
cover too many columns.
In your case, concentrate on the 2 or 3 fields that you use most
frequently. Put the column with the most duplicate values at the front and
the others behind it.
E.g.:
receipt, carrier_id and batchnum are used most frequently, but the columns
with the most duplicate values are carrier_id, receipt and batchnum,
so make the btree index on (carrier_id, receipt, batchnum).
The btree won't suffer, and you also gain if the table has a relationship
with another table on the same fields in the same order; you then don't
have to make another index for that relation.
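
In other words, something like this (the index name is illustrative):

    CREATE INDEX ds_carrier_receipt_batch_idx
        ON detail_summary USING btree (carrier_id, receipt, batchnum);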

Best regards,
ahmad fajar.


 I am getting an idea of what I need to do to make this work well. I was
 wondering if there is more information to read on how to implement this
 solution in a more simple way. Much of what's written seems to be towards an
 audience that should understand certain things automatically.
- Original Message - 
From: Stephan Szabo [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: Josh Berkus josh@agliodbs.com; pgsql-performance@postgresql.org
Sent: Tuesday, January 17, 2006 6:39 PM
Subject: Re: [PERFORM] Multiple Order By Criteria



 On Tue, 17 Jan 2006 [EMAIL PROTECTED] wrote:

 I created the index like this:

 CREATE INDEX rcbee_idx
   ON detail_summary
   USING btree
   (receipt, carrier_id, batchnum, encounternum, encounter_id);

 Is this correct ?

 That would work if you were asking for all the columns ascending or
 descending, but we don't currently use it for mixed orders.

 How do I make a reverse opclass ?

 There's some information at the following:
 http://archives.postgresql.org/pgsql-novice/2005-10/msg00254.php
 http://archives.postgresql.org/pgsql-general/2005-01/msg00121.php
 http://archives.postgresql.org/pgsql-general/2004-06/msg00565.php



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Suspending SELECTs

2006-01-17 Thread Mark Kirkwood

Tom Lane wrote:

Mark Kirkwood [EMAIL PROTECTED] writes:


SELECT ... FROM table WHERE ... ORDER BY id LIMIT 20;

Suppose this displays records for id 10000 - 10020.
When the user hits next, the page saves id=10020 in the session state
and executes:

SELECT ... FROM table WHERE ... AND id > 10020 ORDER BY id LIMIT 20;

Clearly you have to be a little careful about whether to use '>' or '>='
depending on whether 'id' is unique or not (to continue using '>' in the
non-unique case, you can just save and use all the members of the
primary key too).

This is actually fairly painful to get right for a multi-column key
at the moment.  It'll be much easier once I finish up the
SQL-spec-row-comparison project.


Right, I think it was actually an Oracle 7.3 based web app (err... 
showing age here...) that I used this technique on.


Cheers

Mark

---(end of broadcast)---
TIP 6: explain analyze is your friend