Re: [PERFORM] Locking vs. Exceptions

2006-11-02 Thread Benjamin Minshall

Robins wrote:

Hi,

The documentation says that function blocks with exceptions are far 
costlier than without one.




I recommend against using exceptions.  There is a memory leak in the 
exception handler that will cause headaches if it is called many times 
in the transaction.


In plpgsql, I would use:

SELECT ... FOR UPDATE;
IF FOUND THEN
    UPDATE ...;
ELSE
    INSERT ...;
END IF;


If you have multiple transactions doing this process at the same time, 
you'll need explicit locking of the table to avoid a race condition.
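
For illustration, a minimal sketch of that pattern wrapped in a plpgsql
function (the table counters(id, val), the lock mode, and all names here
are made up for the example):

CREATE OR REPLACE FUNCTION merge_counter(p_id text, p_val integer)
RETURNS void AS $$
BEGIN
    -- take a table-level lock so concurrent writers serialize (avoids the race)
    LOCK TABLE counters IN SHARE ROW EXCLUSIVE MODE;

    PERFORM 1 FROM counters WHERE id = p_id FOR UPDATE;
    IF FOUND THEN
        UPDATE counters SET val = p_val WHERE id = p_id;
    ELSE
        INSERT INTO counters (id, val) VALUES (p_id, p_val);
    END IF;
END;
$$ LANGUAGE plpgsql;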


--
Benjamin Minshall <[EMAIL PROTECTED]>
Senior Developer -- Intellicon, Inc.
http://www.intellicon.biz



Re: [PERFORM] VACUUMs take twice as long across all nodes

2006-11-02 Thread Vivek Khera


On Oct 27, 2006, at 2:07 PM, Tom Lane wrote:


8.2, but in existing releases I can't see much you can do about it
except REINDEX when things get slow.


This will be so nice for me.  I have one huge table with a massive  
amount of churn and bulk deletes.  I have to reindex it once every  
other month.  It takes about 60 to 75 minutes per index (times two  
indexes) else I'd do it monthly.


It shaves nearly 1/3 of the relpages off of the index size.
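
For anyone curious, the shrinkage shows up as relpages in pg_class
(the index name below is just a placeholder; relpages is refreshed by
VACUUM and ANALYZE):

  SELECT relname, relpages FROM pg_class WHERE relname = 'my_big_index';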





Re: [PERFORM] Query plan for "heavy" SELECT with "lite" sub-SELECTs

2006-11-02 Thread Alvaro Herrera
Dave Dutcher wrote:
> > -Original Message-
> > From: [EMAIL PROTECTED] 
> > Nikolay Samokhvalov
> > 
> > What should I do to make Postgres work properly in such cases (I have
> > a lot of similar queries; surely, they are executed w/o seqscans, but
> > overall picture is the same - I see that starting from sub-selects
> > dramatically decreases performance)?
> 
> How about this:
> 
> explain analyze 
> select (select typname from pg_type where pg_type.oid=mainq.prorettype limit
> 1)
> from (select * from pg_proc offset 1500 limit 1) mainq;

What's the use of such a query?  One would think that in the real world,
you'd at least have an ORDER BY somewhere in the subqueries.
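
For instance, something along these lines (the ORDER BY column is only
for illustration):

explain analyze
select (select typname from pg_type
        where pg_type.oid = mainq.prorettype limit 1)
from (select * from pg_proc order by oid offset 1500 limit 1) mainq;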

Performance analysis of strange queries is useful, but the input queries
have to be meaningful as well.  Otherwise you end up optimizing bizarre
and useless cases.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Setting "nice" values

2006-11-02 Thread Tobias Brox
[Madison Kelly - Thu at 10:25:07AM -0500]
> Will the priority of the script pass down to the pgsql queries it calls? 
> I figured (likely incorrectly) that because the queries were executed by 
> the psql server the queries ran with the server's priority. 

I think you are right, and in any case, I don't think the niceness
value will help much if the bottleneck is iowait.

In our application, I've made a special function for doing
low-priority transactions which I believe is quite smart - though maybe
not always.  Before introducing this logic, we observed a tipping point:
once too many queries came in, the database couldn't swallow them fast
enough, jammed up trying to work on too many queries at once, and
returned results far too slowly.

In the config file, I now have those two flags set:

 stats_start_collector = on
 stats_command_string = on

This will unfortunately cause some CPU load, but the benefit is great
- one can actually check what the server is working on at any time:

  select * from pg_stat_activity

With those set, it is possible to check the special view pg_stat_activity -
it contains all the queries the database is working on right now.
My idea is to peek into this view - if there are no active queries,
the database is idle, and it's safe to start our low-priority
transaction.  If the view is full of stuff, one should certainly not
run any low-priority transactions, but rather sleep a bit and try again
later.

 select count(*) from pg_stat_activity where not current_query like
 '<IDLE>%' and query_start+?
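
A fuller sketch of such a check (the idle marker, the self-exclusion and
the one-second threshold here are just an example):

 select count(*)
 from pg_stat_activity
 where current_query not like '<IDLE>%'
   and procpid <> pg_backend_pid()
   and query_start < now() - interval '1 second';

If this returns 0, there are no other long-running active queries and the
low-priority transaction can start; otherwise, sleep a bit and retry.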

Re: [PERFORM] Database-wide vacuum can take a long time, during which tables are not being analyzed

2006-11-02 Thread Alvaro Herrera
Steven Flatt wrote:
> Sorry, I think there's a misunderstanding here.  Our system is not doing
> near that number of transactions per second.  I meant that the duration of a
> single DB-wide vacuum takes on the order of a couple of weeks.  The time
> between DB-wide vacuums is a little over a year, I believe.

I wonder if this is using some vacuum delay setting?

If that's the case, I think you could manually run a database-wide
vacuum with a zero vacuum delay setting, so that said vacuum takes less
time to finish (say, once every 8 months).
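
Something along these lines from a dedicated session would do it
(assuming the delay is otherwise set in postgresql.conf):

 SET vacuum_cost_delay = 0;   -- turn off cost-based vacuum delay for this session
 VACUUM ANALYZE;              -- database-wide vacuum, analyzing as it goes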

(8.2 pretty much solves this issue BTW, by not requiring database-wide
vacuums).

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [PERFORM] Setting "nice" values

2006-11-02 Thread Scott Marlowe
On Thu, 2006-11-02 at 09:25, Madison Kelly wrote:
> Scott Marlowe wrote:
> > On Thu, 2006-11-02 at 09:14, Madison Kelly wrote:
> >> Hi all,
> >>
> >>I've got a script (perl, in case it matters) that I need to run once
> >> a month to prepare statements. This script queries and updates the
> >> database a *lot*. I am not concerned with the performance of the SQL
> >> calls so much as I am about the impact it has on the server's load.
> >>
> >>Is there a way to limit query speed (ie: set a low 'nice' value on
> >> a query)? This might be an odd question, or I could be asking the
> >> question the wrong way, but hopefully you get the idea. :)
> > 
> > While you can safely set the priority lower on the calling perl script,
> > setting db backend priorities lower can result in problems caused by
> > "priority inversion"  Look up that phrase on the pgsql admin, perform,
> > general, or hackers lists for an explanation, or go here:
> > 
> > http://en.wikipedia.org/wiki/Priority_inversion
> > 
> > I have a simple script that grabs raw data from an oracle db and shoves
> > it into a postgresql database for reporting purposes.  Every 100 rows I
> > put into postgresql, I usleep 10 or so and the load caused by that
> > script on both systems is minimal.  You might try something like that.
> 
> Will the priority of the script pass down to the pgsql queries it calls? 
> I figured (likely incorrectly) that because the queries were executed by 
> the psql server the queries ran with the server's priority. If this 
> isn't the case, then perfect. :)

nope, the priorities don't pass down.  you connect via a client lib to
the server, which spawns a backend process that does the work for you. 
The backend process inherits its priority from the postmaster that
spawns it, and they all run at the same priority.

> Thanks for the tip, too, it's something I will try.

Sometimes it's the simple solutions that work best.  :)  Welcome to the
world of pgsql, btw...



Re: [PERFORM] Setting "nice" values

2006-11-02 Thread Madison Kelly

Scott Marlowe wrote:

On Thu, 2006-11-02 at 09:14, Madison Kelly wrote:

Hi all,

   I've got a script (perl, in case it matters) that I need to run once
a month to prepare statements. This script queries and updates the
database a *lot*. I am not concerned with the performance of the SQL
calls so much as I am about the impact it has on the server's load.

   Is there a way to limit query speed (ie: set a low 'nice' value on
a query)? This might be an odd question, or I could be asking the
question the wrong way, but hopefully you get the idea. :)


While you can safely set the priority lower on the calling perl script,
setting db backend priorities lower can result in problems caused by
"priority inversion"  Look up that phrase on the pgsql admin, perform,
general, or hackers lists for an explanation, or go here:

http://en.wikipedia.org/wiki/Priority_inversion

I have a simple script that grabs raw data from an oracle db and shoves
it into a postgresql database for reporting purposes.  Every 100 rows I
put into postgresql, I usleep 10 or so and the load caused by that
script on both systems is minimal.  You might try something like that.


Will the priority of the script pass down to the pgsql queries it calls? 
I figured (likely incorrectly) that because the queries were executed by 
the psql server the queries ran with the server's priority. If this 
isn't the case, then perfect. :)


Thanks for the tip, too, it's something I will try.

Madi



Re: [PERFORM] Setting "nice" values

2006-11-02 Thread Scott Marlowe
On Thu, 2006-11-02 at 09:14, Madison Kelly wrote:
> Hi all,
> 
>I've got a script (perl, in case it matters) that I need to run once
> a month to prepare statements. This script queries and updates the
> database a *lot*. I am not concerned with the performance of the SQL
> calls so much as I am about the impact it has on the server's load.
> 
>Is there a way to limit query speed (ie: set a low 'nice' value on
> a query)? This might be an odd question, or I could be asking the
> question the wrong way, but hopefully you get the idea. :)

While you can safely set the priority lower on the calling perl script,
setting db backend priorities lower can result in problems caused by
"priority inversion"  Look up that phrase on the pgsql admin, perform,
general, or hackers lists for an explanation, or go here:

http://en.wikipedia.org/wiki/Priority_inversion

I have a simple script that grabs raw data from an oracle db and shoves
it into a postgresql database for reporting purposes.  Every 100 rows I
put into postgresql, I usleep 10 or so and the load caused by that
script on both systems is minimal.  You might try something like that.



Re: [PERFORM] Database-wide vacuum can take a long time, during which tables are not being analyzed

2006-11-02 Thread Steven Flatt
Sorry, I think there's a misunderstanding here.  Our system is not doing
near that number of transactions per second.  I meant that the duration of a
single DB-wide vacuum takes on the order of a couple of weeks.  The time
between DB-wide vacuums is a little over a year, I believe.

 
 

Every couple of weeks implies a transaction rate of ~~500tps, so I'd be
interested to hear more about your system.

--
Simon Riggs
EnterpriseDB   http://www.enterprisedb.com


[PERFORM] Setting "nice" values

2006-11-02 Thread Madison Kelly

Hi all,

  I've got a script (perl, in case it matters) that I need to run once
a month to prepare statements. This script queries and updates the
database a *lot*. I am not concerned with the performance of the SQL
calls so much as I am about the impact it has on the server's load.

  Is there a way to limit query speed (ie: set a low 'nice' value on
a query)? This might be an odd question, or I could be asking the
question the wrong way, but hopefully you get the idea. :)

Thanks!

Madi




Re: [PERFORM] Query plan for "heavy" SELECT with "lite" sub-SELECTs

2006-11-02 Thread Dave Dutcher
> -Original Message-
> From: [EMAIL PROTECTED] 
> Nikolay Samokhvalov
> 
> What should I do to make Postgres work properly in such cases (I have
> a lot of similar queries; surely, they are executed w/o seqscans, but
> overall picture is the same - I see that starting from sub-selects
> dramatically decreases performance)?

How about this:

explain analyze 
select (select typname from pg_type where pg_type.oid=mainq.prorettype limit
1)
from (select * from pg_proc offset 1500 limit 1) mainq;

QUERY PLAN
----------------------------------------------------------------------------
 Subquery Scan mainq  (cost=50.99..56.85 rows=1 width=4) (actual
time=13.646..13.659 rows=1 loops=1)
   ->  Limit  (cost=50.99..51.02 rows=1 width=310) (actual
time=13.575..13.579 rows=1 loops=1)
 ->  Seq Scan on pg_proc  (cost=0.00..62.34 rows=1834 width=310)
(actual time=0.014..7.297 rows=1501 loops=1)
   SubPlan
 ->  Limit  (cost=0.00..5.82 rows=1 width=64) (actual time=0.038..0.043
rows=1 loops=1)
   ->  Index Scan using pg_type_oid_index on pg_type
(cost=0.00..5.82 rows=1 width=64) (actual time=0.028..0.028 rows=1 loops=1)
 Index Cond: (oid = $0)
 Total runtime: 13.785 ms

I would expect you to get closer to 2 ms on that query.  My machine takes 13
ms to do just the seq scan of pg_proc.

Dave






[PERFORM] Locking vs. Exceptions

2006-11-02 Thread Robins
Hi,

The documentation says that function blocks with exceptions are far
costlier than without one.

So if I need to implement an INSTEAD OF trigger (after checking for
unique constraint violations), which way should I go?

1. Get a table lock
2. Use 'Select ... For Update' (which could be used to lock only the
   desired recordsets)
3. Use Exceptions

Any advice / experiences or even pointers would be helpful.

Thanks
Robins Tharakan


Re: [PERFORM] Query plan for "heavy" SELECT with "lite" sub-SELECTs

2006-11-02 Thread Richard Huxton

Nikolay Samokhvalov wrote:

2. explain analyze select
 *,
 (select typname from pg_type where pg_type.oid=pg_proc.prorettype limit 1)
from pg_proc offset 1500 limit 1;
"Limit  (cost=8983.31..8989.30 rows=1 width=365) (actual
time=17.648..17.649 rows=1 loops=1)"
"  ->  Seq Scan on pg_proc  (cost=0.00..13486.95 rows=2252 width=365)
(actual time=0.100..16.851 rows=1501 loops=1)"
"SubPlan"
"  ->  Limit  (cost=0.00..5.91 rows=1 width=64) (actual
time=0.006..0.007 rows=1 loops=1501)"
"->  Index Scan using pg_type_oid_index on pg_type
(cost=0.00..5.91 rows=1 width=64) (actual time=0.004..0.004 rows=1
loops=1501)"
"  Index Cond: (oid = $0)"
"Total runtime: 17.784 ms"

We see that in the 2nd example Postgres starts with "Index Scan using
pg_type_oid_index" (1501 iterations!).


No, what you see here is that the inner loop is the index-scan over 
pg_type_oid. It's running a sequential scan on pg_proc and then runs 
1501 index scans against pg_type.



My understanding of SQL tells me
that the simplest (and, in this case - and probably in *most* cases -
fastest) way to perform such queries is to start from the main SELECT and
then, when we already have rows from the "main" table, perform the "lite"
sub-selects. So, I expected something near 2.156 ms + 0.079 ms, but obtained
17.784 ms... For large tables this is killing behaviour.


You've forgotten about the cost of matching up the two sets of rows. 
Now, if the first part of the query outputs only one row then you might 
be right, but I'm not sure that the SQL standard allows the subquery to 
be delayed to that stage without explicitly organising the query that 
way. From memory, the OFFSET/LIMIT takes place at the very end of the 
query processing.



What should I do to make Postgres work properly in such cases (I have
a lot of similar queries; surely, they are executed w/o seqscans, but
overall picture is the same - I see that starting from sub-selects
dramatically decreases performance)?


Do you have a real example? That might be more practical.

--
  Richard Huxton
  Archonet Ltd



[PERFORM] Query plan for "heavy" SELECT with "lite" sub-SELECTs

2006-11-02 Thread Nikolay Samokhvalov

Hello,

I do not understand why Postgres very often starts execution from the
sub-select instead of doing the main select and then proceeding to "lite"
sub-selects. For example:

(example is quite weird, but it demonstrates the problem)

1. explain analyze select * from pg_proc offset 1500 limit 1;
"Limit  (cost=116.91..116.99 rows=1 width=365) (actual
time=2.111..2.112 rows=1 loops=1)"
"  ->  Seq Scan on pg_proc  (cost=0.00..175.52 rows=2252 width=365)
(actual time=0.034..1.490 rows=1501 loops=1)"
"Total runtime: 2.156 ms"

3. explain analyze select oid,* from pg_type where oid=2277 limit 1;
"Limit  (cost=0.00..5.91 rows=1 width=816) (actual time=0.021..0.022
rows=1 loops=1)"
"  ->  Index Scan using pg_type_oid_index on pg_type  (cost=0.00..5.91
rows=1 width=816) (actual time=0.018..0.018 rows=1 loops=1)"
"Index Cond: (oid = 2277::oid)"
"Total runtime: 0.079 ms"

2. explain analyze select
 *,
 (select typname from pg_type where pg_type.oid=pg_proc.prorettype limit 1)
from pg_proc offset 1500 limit 1;
"Limit  (cost=8983.31..8989.30 rows=1 width=365) (actual
time=17.648..17.649 rows=1 loops=1)"
"  ->  Seq Scan on pg_proc  (cost=0.00..13486.95 rows=2252 width=365)
(actual time=0.100..16.851 rows=1501 loops=1)"
"SubPlan"
"  ->  Limit  (cost=0.00..5.91 rows=1 width=64) (actual
time=0.006..0.007 rows=1 loops=1501)"
"->  Index Scan using pg_type_oid_index on pg_type
(cost=0.00..5.91 rows=1 width=64) (actual time=0.004..0.004 rows=1
loops=1501)"
"  Index Cond: (oid = $0)"
"Total runtime: 17.784 ms"

We see that in the 2nd example Postgres starts with "Index Scan using
pg_type_oid_index" (1501 iterations!). My understanding of SQL says me
that the simplest (and, in this case - and probably in *most* cases -
fastest) way to perform such queries is to start from main SELECT and
then, when we already have rows from "main" table, perform "lite"
sub-selects. So, I expected  smth near 2.156 ms + 0.079 ms, but obtain
17.784 ms... For large table this is killing behaviour.

What should I do to make Postgres work properly in such cases (I have
a lot of similar queries; surely, they are executed w/o seqscans, but
overall picture is the same - I see that starting from sub-selects
dramatically decreases performance)?

--
Best regards,
Nikolay



Re: [PERFORM] Help w/speeding up range queries?

2006-11-02 Thread Simon Riggs
On Tue, 2006-10-31 at 18:18 -0500, John Major wrote:

> #I am a biologist, and work with large datasets (tables with millions of 
> rows are common).
> #These datasets often can be simplified as features with a name, and a 
> start and end position (ie:  a range along a number line.  GeneX is on 
> some chromosome from position 10->40)

Do you know about www.biopostgres.org ?

I believe they provide some additional indexing mechanisms for just this
type of data.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [PERFORM] Help w/speeding up range queries?

2006-11-02 Thread Marcin Mank
> Ie:  select FeatureID from SIMPLE_TABLE where FeatureChromosomeName like
> 'chrX' and StartPosition > 1000500 and EndPosition < 200;

How about ( this assumes that StartPosition <= EndPosition ):

select FeatureID
from SIMPLE_TABLE
where FeatureChromosomeName like 'chrX'
and StartPosition > 1000500
and StartPosition < 200
and EndPosition > 1000500
and EndPosition < 200;


This at least should help the planner with estimating number of rows.

Also think twice when you assume that a query with ILIKE will use an index.
Read about varchar_pattern_ops.
Make an index on (FeatureChromosomeName,StartPosition) , and all should be
fine.
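
For example, something like this (assuming FeatureChromosomeName is a
varchar column and the database locale is not C, which is when
varchar_pattern_ops matters for LIKE):

CREATE INDEX simple_table_chrom_start_idx
    ON SIMPLE_TABLE (FeatureChromosomeName varchar_pattern_ops, StartPosition);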

Greetings
Marcin




Re: [PERFORM] Database-wide vacuum can take a long time, during which tables are not being analyzed

2006-11-02 Thread Simon Riggs
On Wed, 2006-11-01 at 14:15 -0500, Steven Flatt wrote:
> Here is a potential problem with the auto-vacuum daemon, and I'm
> wondering if anyone has considered this.  To avoid transaction ID
> wraparound, the auto-vacuum daemon will periodically determine that it
> needs to do a DB-wide vacuum, which takes a long time.  On our system,
> it is on the order of a couple of weeks.  (The system is very busy and
> there is a lot of I/O going on pretty much 24/7).  During this period
> of time, there is nothing to automatically analyze any of the tables,
> leading to further performance problems.  What are your thoughts on
> having the DB-wide vacuum running on a separate thread so that the
> daemon can concurrently wake up and take care of analyzing tables?

Yes, do it.

Every couple of weeks implies a transaction rate of ~~500tps, so I'd be
interested to hear more about your system. 

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com


