[PERFORM] Suspending SELECTs

2006-01-16 Thread Alessandro Baretta
I am aware that what I am dreaming of is already available through cursors, but
in a web application, cursors are bad boys, and should be avoided. What I would
like to be able to do is to plan a query and run the plan to retrieve a limited
number of rows as well as the executor's state. This way, the burden of
maintaining the cursor on hold, between activations of the web resource which
uses it, is transferred from the DBMS to the web application server, and, most
importantly, the responsibility for garbage-collecting stale cursors is
implicitly delegated to the garbage-collector of active user sessions. Without
this mechanism, we are left with two equally unpleasant solutions: first, any
time a user instantiates a new session, a new cursor would have to be declared
for all relevant queries, and an ad-hoc garbage collection daemon would have to
be written to periodically scan the database for stale cursors to be closed;
otherwise, instead of using cursors, the web application could resort to
OFFSET-LIMIT queries: no garbage collection issues, but pathetic performance and
server load.


Do we have any way out?

Alex

--
*
http://www.barettadeit.com/
Baretta DEIT
A division of Baretta SRL

tel. +39 02 370 111 55
fax. +39 02 370 111 54

Our technology:

The Application System/Xcaml (AS/Xcaml)
http://www.asxcaml.org/

The FreerP Project
http://www.freerp.org/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


[PERFORM] Materialized Views

2006-01-16 Thread Michael Riess

Hi,

I've been reading an interesting article which compared different 
database systems, focusing on materialized views. I was wondering how 
the postgresql developers feel about this feature ... is it planned to 
implement materialized views any time soon? They would greatly improve 
both performance and readability (and thus maintainability) of my code.


In particular I'm interested in a view which materializes whenever 
queried, and is invalidated as soon as underlying data is changed.


Mike



Re: [PERFORM] Materialized Views

2006-01-16 Thread Frank Wiles
On Mon, 16 Jan 2006 15:36:53 +0100
Michael Riess [EMAIL PROTECTED] wrote:

 Hi,
 
 I've been reading an interesting article which compared different 
 database systems, focusing on materialized views. I was wondering how 
 the postgresql developers feel about this feature ... is it planned
 to implement materialized views any time soon? They would greatly
 improve both performance and readability (and thus maintainability)
 of my code.
 
 In particular I'm interested in a view which materializes whenever 
 queried, and is invalidated as soon as underlying data is changed.

  You can already build materialized views in PostgreSQL, but you
  end up doing the heavy lifting yourself with triggers. You put
  insert/update/delete triggers on the underlying tables of your
  view that do the right thing in your materialized view table.

  I wrote a blog entry about this recently,
  http://revsys.com/blog/archive/9, where I used a very simple
  materialized view to achieve the performance I needed. It has links
  to the relevant documentation you'll need however to build triggers
  for a more complex situation. 
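  For readers landing here from the archives, here is a minimal sketch of the
  trigger approach described above. Table, column, and function names are
  illustrative only, not taken from the blog post:

```sql
-- Source table and its hand-rolled "materialized view" table
-- (names are hypothetical).
CREATE TABLE orders (
    id          serial  PRIMARY KEY,
    customer_id integer NOT NULL,
    amount      numeric NOT NULL
);

CREATE TABLE order_totals (
    customer_id integer PRIMARY KEY,
    total       numeric NOT NULL
);

-- Simplest (non-incremental) refresh: rebuild the summary on every change.
CREATE OR REPLACE FUNCTION refresh_order_totals() RETURNS trigger AS $$
BEGIN
    DELETE FROM order_totals;
    INSERT INTO order_totals
        SELECT customer_id, sum(amount)
        FROM orders
        GROUP BY customer_id;
    RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_refresh_totals
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH STATEMENT EXECUTE PROCEDURE refresh_order_totals();
```

  For large tables you would want a per-row incremental trigger instead of the
  full rebuild above; this version only shows the moving parts.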

  Hope this helps! 

 -
   Frank Wiles [EMAIL PROTECTED]
   http://www.wiles.org
 -




Re: [PERFORM] Materialized Views

2006-01-16 Thread me

hi mike

In particular I'm interested in a view which materializes whenever 
queried, and is invalidated as soon as underlying data is changed.


from the german pgsql list earlier last week:

http://jonathangardner.net/PostgreSQL/materialized_views/matviews.html

this seems to be pretty much what you want (except you'll have to update 
everything yourself). it would be really nice if pgsql supported this natively


cheers,
thomas 






Re: [PERFORM] Materialized Views

2006-01-16 Thread Michael Riess

Thanks!

Of course I know that I can build materialized views with triggers, but 
so far I've avoided using triggers altogether ... I would really 
appreciate something like "create view foo (select * from b) materialize 
on query".


But I'll look into your blog entry, thanks again!

Mike



[PERFORM] Use of * affect the performance

2006-01-16 Thread Marcos
Hi,

I have always thought that using * in SELECT hurt performance, making
the query slower.

But then I read in a Postgres book that it increases the speed of the
query.

So which one is faster?

Thanks




Re: [PERFORM] Suspending SELECTs

2006-01-16 Thread Tom Lane
Alessandro Baretta [EMAIL PROTECTED] writes:
 I am aware that what I am dreaming of is already available through
 cursors, but in a web application, cursors are bad boys, and should be
 avoided. What I would like to be able to do is to plan a query and run
 the plan to retrieve a limited number of rows as well as the
 executor's state. This way, the burden of maintaining the cursor on
 hold, between activations of the web resource which uses it, is
 transferred from the DBMS to the web application server,

This is a pipe dream, I'm afraid, as the state of a cursor does not
consist exclusively of bits that can be sent somewhere else and then
retrieved.  There are also locks to worry about, as well as the open
transaction itself, and these must stay alive inside the DBMS because
they affect the behavior of other transactions.  As an example, once
the cursor's originating transaction closes, there is nothing to stop
other transactions from modifying or removing rows it would have read.

regards, tom lane



Re: [PERFORM] Suspending SELECTs

2006-01-16 Thread Alvaro Herrera
Tom Lane wrote:
 Alessandro Baretta [EMAIL PROTECTED] writes:
  I am aware that what I am dreaming of is already available through
  cursors, but in a web application, cursors are bad boys, and should be
  avoided. What I would like to be able to do is to plan a query and run
  the plan to retrieve a limited number of rows as well as the
  executor's state. This way, the burden of maintaining the cursor on
  hold, between activations of the web resource which uses it, is
  transferred from the DBMS to the web application server,
 
 This is a pipe dream, I'm afraid, as the state of a cursor does not
 consist exclusively of bits that can be sent somewhere else and then
 retrieved.

I wonder if we could have a way to suspend a transaction and restart
it later in another backend.  I think we could do something like this
using the 2PC machinery.

Not that I'm up for coding it; just an idea that crossed my mind.

-- 
Alvaro Herrera   Developer, http://www.PostgreSQL.org
Oh, oh, the Galacian girls will do it for pearls,
And the Arrakis ones for water! But if you seek ladies
Who burn like flames, try a daughter of Caladan! (Gurney Halleck)



Re: [PERFORM] Suspending SELECTs

2006-01-16 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 I wonder if we could have a way to suspend a transaction and restart
 it later in another backend.  I think we could do something like this
 using the 2PC machinery.
 Not that I'm up for coding it; just an idea that crossed my mind.

It's not impossible, perhaps, but it would require an order-of-magnitude
expansion of the 2PC machinery --- the amount of state associated with
an open execution plan is daunting.  I think there are discussions about
this in the archives.

regards, tom lane



Re: [PERFORM] Suspending SELECTs

2006-01-16 Thread Mark Lewis
On Mon, 2006-01-16 at 11:13 +0100, Alessandro Baretta wrote:
 I am aware that what I am dreaming of is already available through cursors,
 but in a web application, cursors are bad boys, and should be avoided. What I
 would like to be able to do is to plan a query and run the plan to retrieve a
 limited number of rows as well as the executor's state. This way, the burden
 of maintaining the cursor on hold, between activations of the web resource
 which uses it, is transferred from the DBMS to the web application server,
 and, most importantly, the responsibility for garbage-collecting stale
 cursors is implicitly delegated to the garbage-collector of active user
 sessions. Without this mechanism, we are left with two equally unpleasant
 solutions: first, any time a user instantiates a new session, a new cursor
 would have to be declared for all relevant queries, and an ad-hoc garbage
 collection daemon would have to be written to periodically scan the database
 for stale cursors to be closed; otherwise, instead of using cursors, the web
 application could resort to OFFSET-LIMIT queries: no garbage collection
 issues, but pathetic performance and server load.
 
 Do we have any way out?
 
 Alex

I know that Tom has pretty much ruled out any persistent cursor
implementation in the database, but here's an idea for a workaround in
the app:

Have a pool of connections used for these queries.  When a user runs a
query the first time, create a cursor and remember that this user
session is associated with that particular connection.  When the user
tries to view the next page of results, request that particular
connection from the pool and continue to use the cursor.  Between
requests, this connection could of course be used to service other
users.

This avoids the awfulness of tying up a connection for the entire course
of a user session, but still allows you to use cursors for
performance.  

When a user session is invalidated or times out, you remove the mapping
for this connection and close the cursor.  Whenever there are no more
mappings for a particular connection, you can use the opportunity to
close the current transaction (to prevent eternal transactions).

If the site is at all busy, you will need to implement a pooling policy
such as 'do not open new cursors on the connection with the oldest
transaction', which will ensure that all transactions can be closed in a
finite amount of time; the upper bound on the duration of a transaction
is (longest_session_duration * connections in pool).
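At the SQL level, the lifecycle each pooled connection would go through looks
roughly like this (cursor and table names are made up for the sketch):

```sql
BEGIN;

-- First page request: open the cursor; the app remembers which pooled
-- connection it lives on (session-id -> connection mapping).
DECLARE page_cur CURSOR FOR
    SELECT * FROM products ORDER BY name;
FETCH FORWARD 20 FROM page_cur;

-- Later request from the same user session, routed back to this connection:
FETCH FORWARD 20 FROM page_cur;

-- Session invalidated or timed out:
CLOSE page_cur;
COMMIT;  -- once no other sessions map to this connection
```

Note that DECLARE ... CURSOR WITH HOLD is another option: it lets you commit
immediately at the cost of materializing the result set, which sidesteps the
long-running-transaction issue, though not the stale-cursor cleanup problem.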

Limitations:

1. You shouldn't do anything that acquires write locks on the database
using these connections, because the transactions will be long-running.
To mitigate this, use a separate connection pool.

2. Doesn't work well if some queries take a long time to run, because
other users may need to wait for the connection, and another connection
won't do.

3. If this is a busy web site, you might end up with potentially many
thousands of open cursors.  I don't know whether this introduces an
unacceptable performance penalty or other bottleneck in the server.

-- Mark Lewis



Re: [PERFORM] Suspending SELECTs

2006-01-16 Thread Craig A. James


Alessandro Baretta [EMAIL PROTECTED] writes:

I am aware that what I am dreaming of is already available through
cursors, but in a web application, cursors are bad boys, and should be
avoided. What I would like to be able to do is to plan a query and run
the plan to retrieve a limited number of rows as well as the
executor's state. This way, the burden of maintaining the cursor on
hold, between activations of the web resource which uses it, is
transferred from the DBMS to the web application server,


I think you're trying to do something at the wrong layer of your architecture.  
This task normally goes in your middleware layer, not your database layer.

There are several technologies that allow you to keep persistent database sessions open 
(for example, mod_perl, mod_cgi among others).  If you combine these with what's called 
"session affinity" (the ability of a load-balancing server to route a 
particular user back to the same persistent session object every time), then you can 
create a middleware layer that does exactly what you need.

Basically, you create a session object that holds all of the state (such as 
your cursor, and anything else you need to maintain between requests), and send 
back a cookie to the client.  Each time the client reconnects, your server 
finds the user's session object using the cookie, and you're ready to go.

The main trick is that you have to manage your session objects, primarily to 
flush the full state to the database, if too much time elapses between 
requests, and then be able to re-create them on demand.  Enterprise Java Beans 
has a large fraction of its design devoted to this sort of object management.

There are solutions for this in just about every middleware technology, from Apache/perl 
to EJB to CORBA.  Search for "session affinity" and you should find a lot of 
information.

Craig



Re: [PERFORM] Use of * affect the performance

2006-01-16 Thread Qingqing Zhou

Marcos [EMAIL PROTECTED] wrote

 I have always thought that using * in SELECT hurt performance, making
 the query slower.

 But then I read in a Postgres book that it increases the speed of the
 query.

 So which one is faster?


If you mean * vs. explicitly naming all columns of a relation, there is 
almost no difference except the negligible cost of parsing. If you mean 
you only want some of the columns of a relation but still use *: yes, you 
save one projection operation per result row, but you pay for it in extra 
network traffic. In the worst case, if your * drags in some TOASTed 
attributes, you simply hurt performance. Since the benefit is marginal at 
best and can backfire, I suggest sticking with retrieving only the columns 
you are interested in.
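To make the trade-off concrete, a small sketch (table and column names are
invented for the example):

```sql
-- Suppose the table has a wide column whose large values are TOASTed:
CREATE TABLE customers (
    id     serial PRIMARY KEY,
    name   text NOT NULL,
    notes  text  -- large values are stored out of line (TOAST)
);

-- Fetches every column, including "notes" the application may not need:
SELECT * FROM customers WHERE id = 1;

-- Fetches only what the application uses: less network traffic, and the
-- TOASTed column is never detoasted:
SELECT id, name FROM customers WHERE id = 1;
```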

Regards,
Qingqing 





[PERFORM] new to postgres (and db management) and performance already a problem :-(

2006-01-16 Thread Antoine

Hi,
We have a horribly designed postgres 8.1.0 database (not my fault!). I 
am pretty new to database design and management and have really no idea 
how to diagnose performance problems. The db has only 25-30 tables, and 
half of them are only there because our codebase needs them (long story, 
again not my fault!). Basically we have 10 tables that are being 
accessed, and only a couple of queries that join more than 3 tables. 
Most of the action takes place on two tables. One of the devs has done 
some truly atrocious coding and is using the db as his data access 
mechanism (instead of an in-memory array, and he only needs an 
array/collection).
It is running on a P4 3000ish (desktop model) running early linux 2.6 
(mdk 10.1) (512meg of ram) so that shouldn't be an issue, as we are 
talking only about 2 inserts a day. It probably gets queried about 
2 times a day too (all vb6 via the pg odbc).
So... seeing as I didn't really do any investigation as to setting 
default sizes for storage and the like - I am wondering whether our 
performance problems (a programme running 1.5x slower than two weeks 
ago) might not be coming from the db (or rather, my maintaining of it). 
I have turned on stats, so as to allow autovacuuming, but have no idea 
whether that could be related. Is it better to schedule a cron job to do 
it x times a day? I just left all the default values in postgres.conf... 
could I do some tweaking?
Does anyone know of any practical resources that might guide me in 
sorting out these sorts of problems? Some stuff with practical examples 
would be good so I could compare with what we have.

Thanks
Antoine
ps. I had a look with top and it didn't look like it was going much over 
15% cpu, with memory usage negligible. There are usually about 10 open 
connections. I couldn't find an easy way to check for disk accesses.
pps. The db is just one possible reason for our bottleneck so if you 
tell me it is very unlikely I will be most reassured!




Re: [PERFORM] new to postgres (and db management) and performance already a problem :-(

2006-01-16 Thread Andrew Sullivan
On Mon, Jan 16, 2006 at 11:07:52PM +0100, Antoine wrote:

 performance problems (a programme running 1.5x slower than two weeks 
 ago) might not be coming from the db (or rather, my maintaining of it). 
 I have turned on stats, so as to allow autovacuuming, but have no idea 
 whether that could be related. Is it better to schedule a cron job to do 
 it x times a day? I just left all the default values in postgres.conf... 
 could I do some tweaking?

The first thing you need to do is find out where your problem is. 
Are queries running slowly?  You need to do some EXPLAIN ANALYSE
queries to understand that.
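For example (table and column names invented here), run the suspect query
under EXPLAIN ANALYZE and compare the planner's estimates with the actual
numbers:

```sql
EXPLAIN ANALYZE
SELECT o.id, o.amount
FROM orders o
WHERE o.customer_id = 42;

-- Things to look for in the output:
--  * a Seq Scan on a large table where you expected an index scan
--  * estimated rows wildly different from the "actual ... rows" figure
--    (a sign that ANALYZE hasn't run recently)
--  * actual time dominated by a single node of the plan
```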

A


-- 
Andrew Sullivan  | [EMAIL PROTECTED]
The whole tendency of modern prose is away from concreteness.
--George Orwell



Re: [PERFORM] new to postgres (and db management) and performance already a problem :-(

2006-01-16 Thread Tom Lane
Antoine [EMAIL PROTECTED] writes:
 So... seeing as I didn't really do any investigation as to setting 
 default sizes for storage and the like - I am wondering whether our 
 performance problems (a programme running 1.5x slower than two weeks 
 ago) might not be coming from the db (or rather, my maintaining of it). 

That does sound like a lack-of-vacuuming problem.  If the performance
goes back where it was after VACUUM FULL, then you can be pretty sure
of it.  Note that autovacuum is not designed to fix this for you: it
only ever issues regular vacuum not vacuum full.

 I couldn't find an easy way to check for disk accesses.

Watch the output of vmstat 1 or iostat 1 for info about that.

regards, tom lane



Re: [PERFORM] new to postgres (and db management) and performance already a problem :-(

2006-01-16 Thread me

That does sound like a lack-of-vacuuming problem.  If the performance
goes back where it was after VACUUM FULL, then you can be pretty sure
of it.  Note that autovacuum is not designed to fix this for you: it
only ever issues regular vacuum not vacuum full.


in our db system (for a website), i notice performance boosts after a vacuum 
full. but then, a VACUUM FULL takes 50min+ during which the db is not really 
accessible to web-users. is there another way to perform maintenance tasks 
AND leave the db fully operable and accessible?


thanks,
thomas 






Re: [PERFORM] new to postgres (and db management) and performance already a problem :-(

2006-01-16 Thread Tom Lane
[EMAIL PROTECTED] writes:
 in our db system (for a website), i notice performance boosts after a vacuum 
 full. but then, a VACUUM FULL takes 50min+ during which the db is not really 
 accessible to web-users. is there another way to perform maintenance tasks 
 AND leaving the db fully operable and accessible?

You're not doing regular vacuums often enough.

regards, tom lane



Re: [PERFORM] new to postgres (and db management) and performance already a problem :-(

2006-01-16 Thread me
 in our db system (for a website), i notice performance boosts after a vacuum
 full. but then, a VACUUM FULL takes 50min+ during which the db is not really
 accessible to web-users. is there another way to perform maintenance tasks
 AND leave the db fully operable and accessible?


You're not doing regular vacuums often enough.


well, shouldn't autovacuum take care of regular vacuums? in addition to 
autovacuum, tables with data changes are vacuumed and reindexed once a day - 
still performance seems to degrade slowly until a vacuum full is 
initiated... could an additional daily vacuum over the entire db (even on 
tables that only get data added, never changed or removed) help?


- thomas 






Re: [PERFORM] new to postgres (and db management) and performance already a problem :-(

2006-01-16 Thread Christopher Browne
 in our db system (for a website), i notice performance boosts after a
 vacuum full. but then, a VACUUM FULL takes 50min+ during which the db is
 not really accessible to web-users. is there another way to perform
 maintenance tasks AND leave the db fully operable and accessible?

 You're not doing regular vacuums often enough.

By the way, you can get that VACUUM FULL to be less injurious if you
collect a list of tables:
pubs=# select table_schema, table_name from information_schema.tables
where table_type = 'BASE TABLE';

And then VACUUM FULL table by table.  It'll take the same 50 minutes;
it'll be more sporadically unusable which may turn out better.  But
that's just one step better; you want more steps :-).

 well, shouldn't autovacuum take care of regular vacuums? in addition to
 autovacuum, tables with data changes are vacuumed and reindexed once a
 day - still performance seems to degrade slowly until a vacuum full is
 initiated... could an additional daily vacuum over the entire db (even
 on tables that only get data added, never changed or removed) help?

Tables which never see updates/deletes don't need to get vacuumed very
often.  They should only need to get a periodic ANALYZE so that the
query optimizer gets the right stats.

There are probably many tables where pg_autovacuum is doing a fine
job.  What you need to do is to figure out which tables *aren't*
getting maintained well enough, and see about doing something special
to them.

What you may want to do is to go table by table and, for each one, do
two things:

1) VACUUM VERBOSE, which will report some information about how much
dead space there is on the table.

2) Contrib function pgstattuple(), which reports more detailed info
about space usage (alas, for just the table).

You'll find, between these, that there are some tables that have a LOT
of dead space.  At that point, there may be three answers:

a) PG 8.1 pg_autovacuum allows you to modify how often specific tables
are vacuumed; upping the numbers for the offending tables may clear
things up

b) Schedule cron jobs to periodically (hourly?  several times per
hour?) VACUUM the offending tables

c) You may decide to fall back to VACUUM FULL; if you do so just for a
small set of tables, the time of pain won't be the 50 minutes you're
living with now...

Try a), b), and c) in order on the offending tables as they address
the problem at increasing cost...
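A concrete way to run steps 1) and 2) above (pgstattuple comes from contrib
and has to be installed in the database first; the table name is illustrative):

```sql
-- 1) Dead-space report from vacuum itself:
VACUUM VERBOSE orders;

-- 2) Detailed space accounting for the same table, via contrib/pgstattuple:
SELECT * FROM pgstattuple('orders');
-- The output includes table_len, tuple_percent, dead_tuple_count,
-- dead_tuple_percent and free_percent; a high dead_tuple_percent marks
-- a table worth special-casing in a), b) or c).
```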
-- 
(reverse (concatenate 'string moc.liamg @ enworbbc))
http://linuxdatabases.info/info/x.html
Listen,  strange women, lyin'  in ponds,  distributin' swords,  is no
basis  for a  system of  government. Supreme  executive  power derives
itself from a mandate from  the masses, not from some farcical aquatic
ceremony.  -- Monty Python and the Holy Grail



Re: [PERFORM] new to postgres (and db management) and performance already a problem :-(

2006-01-16 Thread me

Try a), b), and c) in order on the offending tables as they address
the problem at increasing cost...


thanks a lot for the detailed information! the entire concept of vacuum isn't 
yet that clear to me, so your explanations and hints are very much 
appreciated. i'll definitely try these steps this weekend when the next full 
vacuum was scheduled :-)


best regards,
thomas 



