Re: [PERFORM] Use my (date) index, darn it!

2004-01-05 Thread Tom Lane
John Siracusa [EMAIL PROTECTED] writes:
 Obviously the planner is making some bad choices here.

A fair conclusion ...

 I know that it is trying to avoid random seeks or other scary things
 implied by a correlation statistic that is not close to 1 or -1, but
 it is massively overestimating the hit caused by those seeks and
 seemingly not taking into account the size of the table!

You haven't given any evidence to support these conclusions, though.
Could we see some table schemas, EXPLAIN ANALYZE output, and relevant
pg_stats entries for the various cases?

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] optimizing Postgres queries

2004-01-05 Thread Shridhar Daithankar
On Monday 05 January 2004 16:58, David Teran wrote:
 We have some tests to check the performance and FrontBase is about 10
 times faster than Postgres. We already played around with explain
 analyse select. It seems that for large tables Postgres does not use an
 index. We often see the scan message in the query plan. Where can we
 find more hints about tuning the performance? The database is about 350
 MB large, without BLOB's. We tried to define every important index for
 the selects but it seems that something still goes wrong: FrontBase
 needs about 23 seconds for about 4300 selects and Postgres needs 4
 minutes, 34 seconds.

Check 
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
http://www.varlena.com/varlena/GeneralBits/Tidbits/annotated_conf_e.html

Are you sure you are using the correct data types on indexes?

e.g. if field1 is an int2 field, then the following query would not use an index:

select * from table where field1=2;

However, the following will:

select * from table where field1=2::int2;

This is called typecasting, and PostgreSQL is rather strict about it when it 
comes to deciding whether to use an index.
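As a fuller sketch (hypothetical table and index names; the behaviour is as described above for the PostgreSQL releases of that era, where a cross-type comparison defeated index use):

```sql
-- Hypothetical int2 column with an index on it
CREATE TABLE t (field1 int2);
CREATE INDEX t_field1_idx ON t (field1);

-- The bare literal 2 is treated as a wider integer type, so the
-- planner will not consider the int2 index:
EXPLAIN SELECT * FROM t WHERE field1 = 2;

-- With an explicit cast the types match and the index can be used:
EXPLAIN SELECT * FROM t WHERE field1 = 2::int2;
```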

I am sure the above two tips could take care of some of the problems. 

This kind of question needs more specific information. Can you post the 
EXPLAIN ANALYZE output for the queries and the database schema?

 HTH

 Shridhar




Re: [PERFORM] optimizing Postgres queries

2004-01-05 Thread David Teran
Hi Shridhar,

Are you sure you are using correct data types on indexes?

Did not know about this...

e.g. if field1 is an int2 field, then following query would not use an 
index.

our fk have the type bigint, when i try one simple select like this:

explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0 WHERE 
t0.ID_FOREIGN_TABLE = 21110;

i see that no index is being used whereas when i use

explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0 WHERE 
t0.ID_FOREIGN_TABLE = 21110::bigint;

an index is used. Very fine, the performance is about 10 to 100 times 
faster for the single select.

I am using WebObjects with JDBC. I will now create a DB with integer 
instead of bigint and see how this performs.

regards David



Re: [PERFORM] optimizing Postgres queries

2004-01-05 Thread Shridhar Daithankar
On Monday 05 January 2004 17:35, David Teran wrote:
 explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0 WHERE
 t0.ID_FOREIGN_TABLE = 21110;

 i see that no index is being used whereas when i use

 explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0 WHERE
 t0.ID_FOREIGN_TABLE = 21110::bigint;

 an index is used. Very fine, the performance is about 10 to 100 times
 faster for the single select.

 I am using WebObjects with JDBC. I will now create a DB with integer
 instead of bigint and see how this performs.

The performance will likely be the same. It's just that integer happens to 
be the default integer type and hence does not need an explicit typecast. ( I 
don't remember exactly which integer is the default but it is one of int2, int4 
and int8...:-))

The performance difference is likely due to use of the index, which is in turn 
due to typecasting. If you need bigint, you should use it. Just remember to 
typecast whenever required.

 Shridhar




Re: [PERFORM] optimizing Postgres queries

2004-01-05 Thread David Teran
Hi,

 The performance will likely to be the same. Its just that integer happens to
 be default integer type and hence it does not need an explicit typecast. ( I
 don't remember exactly which integer is default but it is either of int2,int4
 and int8...:-))

The docs say int4 is much faster than int8, but I will check this.

 The performance diffference is likely due to use of index, which is in turn
 due to typecasting. If you need bigint, you should use them. Just remember to
 typecast whenever required.

This is my bigger problem: I am using EOF (an OR-mapping tool) which frees 
me more or less from writing a lot of SQL. If I need to typecast to use 
an index then I have to see how to do this with this framework.

Regards David



Re: [PERFORM] optimizing Postgres queries

2004-01-05 Thread Shridhar Daithankar
On Monday 05 January 2004 17:48, David Teran wrote:
 Hi,

  The performance will likely to be the same. Its just that integer
  happens to
  be default integer type and hence it does not need an explicit
  typecast. ( I
  don't remember exactly which integer is default but it is either of
  int2,int4
  and int8...:-))

 The docs say int4 is much faster than int8, but i will check this.

Well yes. That is correct as well. 

What I (really) meant to say is that an index scan to pick a few int4 tuples 
wouldn't be much faster than an index scan to pick the same number of tuples 
with an int8 definition. 

The initial boost you got from converting to an index scan is probably the 
best you can get out of it.

Of course, if you are scanning a few million of them sequentially, then it is 
a different story.

 This is my bigger problem: i am using EOF (OR mapping tool) which frees
 me more or less form writing a lot of SQL. If i need to typecast to use
 an index then i have to see how to do this with this framework.

Well, you can direct your queries to a function rather than the table; the 
function would cast the argument appropriately and select. PostgreSQL supports 
function overloading as well, in case you need different types of arguments 
with the same name.

Or you can write an INSTEAD rule on the server side which will perform the 
cast before touching the table.

I am not sure of the exact details it would take to make it work, but it 
should work, at least in theory. That way you can preserve the effort 
invested in the mapping tool. 

Of course, converting everything to integer might be a simpler option after 
all..:-)
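A sketch of the function approach (hypothetical names throughout; the point is that the cast lives inside the function body, so the mapping tool never has to emit it):

```sql
-- Hypothetical wrapper function: accepts a plain integer and casts it
-- to bigint internally, so the index on the bigint column can be used.
CREATE FUNCTION key_value_by_foreign_table(int4)
RETURNS SETOF key_value_meta_data AS '
    SELECT * FROM key_value_meta_data
     WHERE id_foreign_table = $1::bigint;
' LANGUAGE sql;

-- The application then issues:
SELECT * FROM key_value_by_foreign_table(21110);
```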


 Shridhar




Re: [PERFORM] optimizing Postgres queries

2004-01-05 Thread Christopher Kings-Lynne

explain analyze SELECT --columns-- FROM KEY_VALUE_META_DATA t0 WHERE 
t0.ID_FOREIGN_TABLE = 21110::bigint;

an index is used. Very fine, the performance is about 10 to 100 times 
faster for the single select.
An alternative technique is to do this:

... t0.ID_FOREIGN_TABLE = '21110';

Chris



Re: [PERFORM] optimizing Postgres queries

2004-01-05 Thread Tom Lane
David Teran [EMAIL PROTECTED] writes:
 This is my bigger problem: i am using EOF (OR mapping tool) which frees 
 me more or less form writing a lot of SQL. If i need to typecast to use 
 an index then i have to see how to do this with this framework.

It's worth pointing out that this problem is fixed (at long last) in
CVS tip.  You probably shouldn't expend large amounts of effort on
working around a problem that will go away in 7.5.

If you don't anticipate going to production for six months or so, you
could adopt CVS tip as your development platform, with the expectation
that 7.5 will be released by the time you need a production system.
I wouldn't recommend running CVS tip as a production database but it
should be plenty stable enough for devel purposes.

Another plan would be to use int4 columns for the time being with the
intention of widening them to int8 when you move to 7.5.  This would
depend on how soon you anticipate needing values > 32 bits, of course.

regards, tom lane



Re: [PERFORM] Use my (date) index, darn it!

2004-01-05 Thread John Siracusa
On 1/5/04 1:55 AM, Tom Lane wrote:
 John Siracusa [EMAIL PROTECTED] writes:
 Obviously the planner is making some bad choices here.
 
 A fair conclusion ...
 
 I know that it is trying to avoid random seeks or other scary things
 implied by a correlation statistic that is not close to 1 or -1, but
 it is massively overestimating the hit caused by those seeks and
 seemingly not taking into account the size of the table!
 
 You haven't given any evidence to support these conclusions, though.

Well, here's what I was basing that theory on: before clustering, the
correlation for the date column was around 0.3.  After clustering, it was 1,
and the index was always used.  Does clustering change any statistics
other than correlation?  I ran ANALYZE immediately before and after the
cluster operation.

 Could we see some table schemas, EXPLAIN ANALYZE output, and relevant
 pg_stats entries for the various cases?

Well, the table is clustered now, so I can't reproduce the situation.  Is
there any way to uncluster a table?  Should I just cluster it on a
different column?

-John




Re: [PERFORM] Use my (date) index, darn it!

2004-01-05 Thread Tom Lane
John Siracusa [EMAIL PROTECTED] writes:
 Is there any way to uncluster a table?  Should I just cluster it on a
 different column?

That should work, if you choose one that's uncorrelated with the
previous clustering attribute.

regards, tom lane



Re: [PERFORM] Use my (date) index, darn it!

2004-01-05 Thread Christopher Browne
After a long battle with technology, [EMAIL PROTECTED] (John Siracusa), an earthling, 
wrote:
 On 1/5/04 1:55 AM, Tom Lane wrote:
 John Siracusa [EMAIL PROTECTED] writes:
 Obviously the planner is making some bad choices here.
 
 A fair conclusion ...
 
 I know that it is trying to avoid random seeks or other scary things
 implied by a correlation statistic that is not close to 1 or -1, but
 it is massively overestimating the hit caused by those seeks and
 seemingly not taking into account the size of the table!
 
 You haven't given any evidence to support these conclusions, though.

 Well here's what I was basing that theory on: before clustering, the
 correlation for the date column was around 0.3.  After clustering, it was 1,
 and the index was always used.  Does clustering change any other statistics
 other than correlation?  I ran analyze immediately before and after the
 cluster operation.

 Could we see some table schemas, EXPLAIN ANALYZE output, and relevant
 pg_stats entries for the various cases?

 Well, the table is clustered now, so I can't reproduce the situation.  Is
 there any way to uncluster a table?  Should I just cluster it on a
 different column?

That would presumably work...

It sounds to me as though the statistics that are being collected
aren't good enough.  That tends to be a sign that the quantity of
statistics (e.g. bins in the histogram) is insufficient.

This would be resolved by changing the number of bins (default of 10)
via ALTER TABLE FOO ALTER COLUMN BAR SET STATISTICS 100 (or some
other value higher than 10).

Clustering would rearrange the contents of the table, and perhaps make
the histogram 'more representative.'  Increasing the SET STATISTICS
value will quite likely be even more helpful, and is a lot less
expensive than clustering the table...
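Concretely, that would look something like this (hypothetical names; note that ANALYZE has to be re-run afterwards, since SET STATISTICS only takes effect at the next analyze):

```sql
-- Raise the histogram from the default 10 bins to 100 for this column
ALTER TABLE foo ALTER COLUMN bar SET STATISTICS 100;
-- Recollect statistics so the planner actually sees the finer histogram
ANALYZE foo;
```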



Re: [PERFORM] Use my (date) index, darn it!

2004-01-05 Thread John Siracusa
On 1/5/04 11:45 AM, Christopher Browne wrote:
 It sounds to me as though the statistics that are being collected
 aren't good enough.  That tends to be a sign that the quantity of
 statistics (e.g. - bins in the histogram) are insufficient.
 
 This would be resolved by changing the number of bins (default of 10)
 via ALTER TABLE FOO ALTER COLUMN BAR SET STATISTICS 100 (or some
 other value higher than 10).

I did that, but I wasn't sure what value to use and what column to increase.
I believe I increased the date column itself to 50 or something, but then I
wasn't sure what to do next.  I re-analyzed the table with the date column
set to 50 but it didn't seem to help, so I resorted to clustering.

 Clustering would rearrange the contents of the table, and perhaps make
 the histogram 'more representative.'  Increasing the SET STATISTICS
 value will quite likely be even more helpful, and is a lot less
 expensive than clustering the table...

What column(s) should I increase?  Do I have to do anything after increasing
the statistics, or do I just wait for the stats collector to do its thing?

-John




Re: [PERFORM] deferred foreign keys

2004-01-05 Thread Bruno Wolff III
On Mon, Jan 05, 2004 at 11:33:40 -0500,
  Vivek Khera [EMAIL PROTECTED] wrote:
 
 Thanks.  Then it sorta makes it moot for me to try deferred checks,
 since the Primary and Foreign keys never change once set.  I wonder
 what is making the transactions appear to run lockstep, then...

I think this is probably the issue with foreign key checks needing an
exclusive lock, since there is no shared lock that will prevent deletes.
This problem has been discussed a number of times on the lists and you
should be able to find out more information from the archives if you
want to confirm that this is the root cause of your problems.



Re: [PERFORM] optimizing Postgres queries

2004-01-05 Thread David Teran
Hi Tom,

It's worth pointing out that this problem is fixed (at long last) in
CVS tip.  You probably shouldn't expend large amounts of effort on
working around a problem that will go away in 7.5.

We have now changed the definition to integer; this will work for some 
time. We are currently evaluating, and have several production databases 
we might switch over in time.

What we found out now is that a query with a single 'where' works fine, 
the query planner uses the index, but when we have 'two' where clauses it 
does not use the index anymore:

EXPLAIN ANALYZE SELECT columns...  FROM KEY_VALUE_META_DATA t0 WHERE 
(t0.ID_VALUE = 14542); performs fine, less than one millisecond.

EXPLAIN ANALYZE SELECT columns...  FROM KEY_VALUE_META_DATA t0 WHERE 
(t0.ID_VALUE = 14542 OR t0.ID_VALUE = 14550); performs badly: about 
235 milliseconds.

I tried to change the second one to use IN but this did not help at 
all. Am I doing something wrong? I have an index defined like this:

CREATE INDEX key_value_meta_data__id_value__fk_index ON 
KEY_VALUE_META_DATA USING btree (ID_VALUE);

Regards David



Re: [PERFORM] optimizing Postgres queries

2004-01-05 Thread Tom Lane
David Teran [EMAIL PROTECTED] writes:
 What we found out now is that a query with a single 'where' works fine, 
 the query planer uses the index but when we have 'two' where clauses it 
 does not use the index anymore:

 EXPLAIN ANALYZE SELECT columns...  FROM KEY_VALUE_META_DATA t0 WHERE 
 (t0.ID_VALUE = 14542); performs fine, less than one millisecond.

 EXPLAIN ANALYZE SELECT columns...  FROM KEY_VALUE_META_DATA t0 WHERE 
 (t0.ID_VALUE = 14542 OR t0.ID_VALUE = 14550); performs bad: about 
 235 milliseconds.

Please, when you ask this sort of question, show the EXPLAIN ANALYZE
output.  It is not a virtue to provide minimal information and see if
anyone can guess what's happening.

regards, tom lane



Re: [PERFORM] deferred foreign keys

2004-01-05 Thread Stephan Szabo
On Mon, 5 Jan 2004, Bruno Wolff III wrote:

 On Mon, Jan 05, 2004 at 11:33:40 -0500,
   Vivek Khera [EMAIL PROTECTED] wrote:
 
  Thanks.  Then it sorta makes it moot for me to try deferred checks,
   since the Primary and Foreign keys never change once set.  I wonder
  what is making the transactions appear to run lockstep, then...

 I think this is probably the issue with foreign key checks needing an
 exclusive lock, since there is no shared lock that will prevent deletes.

But, if he's updating the fk table but not the keyed column, it should no
longer be doing the check and grabbing the locks.  If he's seeing it grab
the row locks still, a full test case would be handy because it'd probably
mean we missed something.



Re: [PERFORM] deferred foreign keys

2004-01-05 Thread Vivek Khera
On Jan 5, 2004, at 1:38 PM, Bruno Wolff III wrote:

I think this is probably the issue with foreign key checks needing an
exclusive lock, since there is no shared lock that will prevent 
deletes.

That was my original thought upon reading all the discussion of late 
regarding the FK checking locks.  I figured if I deferred the checks to 
commit, I could save some contention time.  However, if FK checks are 
skipped when the field in question is not updated, what locks would there 
be?  Are they taken even if the checks are not performed, on some sort 
of "be prepared" principle?

Vivek Khera, Ph.D.
+1-301-869-4449 x806


Re: [PERFORM] optimizing Postgres queries

2004-01-05 Thread David Teran
Hi Tom,


 David Teran [EMAIL PROTECTED] writes:
  What we found out now is that a query with a single 'where' works fine,
  the query planer uses the index but when we have 'two' where clauses it
  does not use the index anymore:

  EXPLAIN ANALYZE SELECT columns...  FROM KEY_VALUE_META_DATA t0 WHERE
  (t0.ID_VALUE = 14542); performs fine, less than one millisecond.

  EXPLAIN ANALYZE SELECT columns...  FROM KEY_VALUE_META_DATA t0 WHERE
  (t0.ID_VALUE = 14542 OR t0.ID_VALUE = 14550); performs bad: about
  235 milliseconds.

 Please, when you ask this sort of question, show the EXPLAIN ANALYZE
 output.  It is not a virtue to provide minimal information and see if
 anyone can guess what's happening.
Sorry for that, I thought this was such a trivial question that the  
answer would be easy.

explain result from first query:

Index Scan using key_value_meta_data__id_value__fk_index on KEY_VALUE_META_DATA t0  (cost=0.00..1585.52 rows=467 width=1068) (actual time=0.424..0.493 rows=13 loops=1)
  Index Cond: (ID_VALUE = 21094)
Total runtime: 0.608 ms



explain result from second query:

Seq Scan on KEY_VALUE_META_DATA t0  (cost=0.00..2671.16 rows=931 width=1068) (actual time=122.669..172.179 rows=25 loops=1)
  Filter: ((ID_VALUE = 21094) OR (ID_VALUE = 21103))
Total runtime: 172.354 ms



I found out that it's possible to disable seq scans with set  
enable_seqscan to off; then the second query result looks like this:

Index Scan using key_value_meta_data__id_value__fk_index, key_value_meta_data__id_value__fk_index on KEY_VALUE_META_DATA t0  (cost=0.00..3173.35 rows=931 width=1068) (actual time=0.116..0.578 rows=25 loops=1)
  Index Cond: ((ID_VALUE = 21094) OR (ID_VALUE = 21103))
Total runtime: 0.716 ms

But I read in the docs that it's not OK to turn this off by default. I  
really wonder whether this is my fault or not; from my point of view this is  
such a simple select that the query plan should not result in a table  
scan.
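For what it's worth, the setting need not be global: it can be confined to a single transaction, along these lines (illustrative query; column names as in the plans above):

```sql
BEGIN;
-- Discourage sequential scans for this transaction only
SET LOCAL enable_seqscan TO off;
SELECT * FROM key_value_meta_data t0
 WHERE t0.id_value = 21094 OR t0.id_value = 21103;
COMMIT;
```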

Regards David



[PERFORM] Select max(foo) and select count(*) optimization

2004-01-05 Thread John Siracusa
Speaking of special cases (well, I was on the admin list) there are two
kinds that would really benefit from some attention.

1. The query "select max(foo) from bar" where the column foo has an index.
Aren't indexes ordered?  If not, an ordered index would be useful in this
situation so that this query, rather than doing a sequential scan of the
whole table, would just ask the index for the max value and return nearly
instantly.

2. The query "select count(*) from bar".  Surely the total number of rows in
a table is kept somewhere convenient.  If not, it would be nice if it could
be :)  Again, rather than doing a sequential scan of the entire table, this
type of query could return instantly.

I believe MySQL does both of these optimizations (which are probably a lot
easier in that product, given its data storage system).  These were the
first areas where I noticed a big performance difference between MySQL and
Postgres.

Especially with very large tables, hearing the disks grind as Postgres scans
every single row in order to determine the number of rows in a table or the
max value of a column (even a primary key created from a sequence) is pretty
painful.  If the implementation is not too horrendous, this is an area where
an orders-of-magnitude performance increase can be had.

-John




Re: [PERFORM] optimizing Postgres queries

2004-01-05 Thread Tom Lane
David Teran [EMAIL PROTECTED] writes:
 explain result from second query:

 Seq Scan on KEY_VALUE_META_DATA t0  (cost=0.00..2671.16 rows=931 width=1068) (actual time=122.669..172.179 rows=25 loops=1)
   Filter: ((ID_VALUE = 21094) OR (ID_VALUE = 21103))

The problem is evidently that the row estimate is so far off (931
estimate vs 25 actual).  Have you done ANALYZE or VACUUM ANALYZE
on this table recently?  If you have, I'd be interested to see the
pg_stats row for ID_VALUE.  It might be that you need to increase
the statistics target for this table.

regards, tom lane



Re: [PERFORM] deferred foreign keys

2004-01-05 Thread Stephan Szabo
On Mon, 5 Jan 2004, Vivek Khera wrote:


 On Jan 5, 2004, at 1:57 PM, Stephan Szabo wrote:

  But, if he's updating the fk table but not the keyed column, it should
  no
  longer be doing the check and grabbing the locks.  If he's seeing it
  grab
  the row locks still a full test case would be handy because it'd
  probably
  mean we missed something.
 

 I'm not *sure* it is taking any locks.  The transactions appear to be
 running lock step (operating on different parts of the same pair of
 tables) and I was going to see if deferring the locks made the
 difference.  It is my feeling now that it will not.  However, if there
 is a way to detect if locks are being taken, I'll do that.  I'd like to
 avoid dropping and recreating the foreign keys if I can since it takes
 up some bit of time on the table with 20+ million rows.

The only way I can think of to see the locks is to do just one of the
operations and then manually attempting to select for update the
associated pk row.




Re: [PERFORM] Select max(foo) and select count(*) optimization

2004-01-05 Thread John Siracusa
On 1/5/04 2:52 PM, Rod Taylor wrote:
 max(foo) optimizations requires an extension to the aggregates system.
 It will likely happen within a few releases.

Looking forward to it.

 A work around can be accomplished today through the use of LIMIT and ORDER BY.

Wowzers, I never imagined that that'd be so much faster.  Thanks! :)
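For the archives, the workaround presumably looks like this (the ORDER BY ... DESC LIMIT 1 form lets the planner satisfy the query with a single probe at the end of the index instead of aggregating over every row):

```sql
-- Instead of:  SELECT max(foo) FROM bar;
SELECT foo FROM bar ORDER BY foo DESC LIMIT 1;
```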

-John




Re: [PERFORM] deferred foreign keys

2004-01-05 Thread Rod Taylor
On Mon, 2004-01-05 at 14:48, Stephan Szabo wrote:
 On Mon, 5 Jan 2004, Vivek Khera wrote:
 
 
  On Jan 5, 2004, at 1:57 PM, Stephan Szabo wrote:
 
   But, if he's updating the fk table but not the keyed column, it should
   no
   longer be doing the check and grabbing the locks.  If he's seeing it
   grab
   the row locks still a full test case would be handy because it'd
   probably
   mean we missed something.
  
 
  I'm not *sure* it is taking any locks.  The transactions appear to be
  running lock step (operating on different parts of the same pair of
  tables) and I was going to see if deferring the locks made the
  difference.  It is my feeling now that it will not.  However, if there
  is a way to detect if locks are being taken, I'll do that.  I'd like to
  avoid dropping and recreating the foreign keys if I can since it takes
  up some bit of time on the table with 20+ million rows.
 
 The only way I can think of to see the locks is to do just one of the
 operations and then manually attempting to select for update the
 associated pk row.

When a locker runs into a row lock held by another transaction, the
locker will show a pending lock on the transaction id in pg_locks.
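So a quick check is to look for ungranted lock requests, e.g.:

```sql
-- Lock requests currently waiting on some other transaction
SELECT * FROM pg_locks WHERE NOT granted;
```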





Re: [PERFORM] Use my (date) index, darn it!

2004-01-05 Thread Christopher Browne
In the last exciting episode, [EMAIL PROTECTED] (John Siracusa) wrote:
 What column(s) should I increase?  Do I have to do anything after increasing
 the statistics, or do I just wait for the stats collector to do its thing?

You have to ANALYZE the table again, to force in new statistics.

And if the index in question is on _just_ the date column, then it is
probably only that date column where the SET STATISTICS needs to be
increased.



Re: [PERFORM] Select max(foo) and select count(*) optimization

2004-01-05 Thread Christopher Browne
Oops! [EMAIL PROTECTED] (John Siracusa) was seen spray-painting on a wall:
 Speaking of special cases (well, I was on the admin list) there are two
 kinds that would really benefit from some attention.

 1. The query select max(foo) from bar where the column foo has an
 index.  Aren't indexes ordered?  If not, an ordered index would be
 useful in this situation so that this query, rather than doing a
 sequential scan of the whole table, would just ask the index for
 the max value and return nearly instantly.

 2. The query select count(*) from bar Surely the total number of
 rows in a table is kept somewhere convenient.  If not, it would be
 nice if it could be :) Again, rather than doing a sequential scan of
 the entire table, this type of query could return instantly.

 I believe MySQL does both of these optimizations (which are probably
 a lot easier in that product, given its data storage system).  These
 were the first areas where I noticed a big performance difference
 between MySQL and Postgres.

 Especially with very large tables, hearing the disks grind as
 Postgres scans every single row in order to determine the number of
 rows in a table or the max value of a column (even a primary key
 created from a sequence) is pretty painful.  If the implementation
 is not too horrendous, this is an area where an orders-of-magnitude
 performance increase can be had.

These are both VERY frequently asked questions.

In the case of question #1, the optimization you suggest could be
accomplished via some Small Matter Of Programming.  None of the people
that have wanted the optimization have, however, offered to actually
DO the programming.

In the case of #2, the answer is surely NOT.  In MVCC databases,
that information CANNOT be stored anywhere convenient because queries
requested by transactions started at different points in time must get
different answers.

I think we need to add these questions and their answers to the FAQ so
that the answer can be "See FAQ Item #17" rather than people having to
gratuitously explain it over and over and over again.



Re: [PERFORM] deferred foreign keys

2004-01-05 Thread Stephan Szabo
On Mon, 5 Jan 2004, Rod Taylor wrote:

 On Mon, 2004-01-05 at 14:48, Stephan Szabo wrote:
  On Mon, 5 Jan 2004, Vivek Khera wrote:
 
  
   On Jan 5, 2004, at 1:57 PM, Stephan Szabo wrote:
  
But, if he's updating the fk table but not the keyed column, it should
no
longer be doing the check and grabbing the locks.  If he's seeing it
grab
the row locks still a full test case would be handy because it'd
probably
mean we missed something.
   
  
   I'm not *sure* it is taking any locks.  The transactions appear to be
   running lock step (operating on different parts of the same pair of
   tables) and I was going to see if deferring the locks made the
   difference.  It is my feeling now that it will not.  However, if there
   is a way to detect if locks are being taken, I'll do that.  I'd like to
   avoid dropping and recreating the foreign keys if I can since it takes
   up some bit of time on the table with 20+ million rows.
 
  The only way I can think of to see the locks is to do just one of the
  operations and then manually attempting to select for update the
  associated pk row.

 When a locker runs into a row lock held by another transaction, the
 locker will show a pending lock on the transaction id in pg_locks.

Yeah, but AFAIR that won't let you know if it's blocking on the particular
row lock you're expecting.



Re: [PERFORM] Select max(foo) and select count(*) optimization

2004-01-05 Thread Paul Tuckfield
Not that I'm offering to do the programming, mind you, :) but . . 


In the case of select count(*), one optimization is to do  a scan of the
primary key, not the table itself, if the table has a primary key. In a
certain commercial, lesser database, this is called an index fast full
scan.  It would be important to scan the index in physical order
(sequential physical IO) and not in key order (random physical IO)

I'm guessing the payoff as well as real-world-utility of a max(xxx)
optimization are much higher than a count(*) optimization tho


On Mon, 2004-01-05 at 12:26, Christopher Browne wrote:
 Oops! [EMAIL PROTECTED] (John Siracusa) was seen spray-painting on a wall:
  Speaking of special cases (well, I was on the admin list) there are two
  kinds that would really benefit from some attention.
 
  1. The query "select max(foo) from bar" where the column foo has an
  index.  Aren't indexes ordered?  If not, an ordered index would be
  useful in this situation so that this query, rather than doing a
  sequential scan of the whole table, would just ask the index for
  the max value and return nearly instantly.
 
  2. The query "select count(*) from bar".  Surely the total number of
  rows in a table is kept somewhere convenient.  If not, it would be
  nice if it could be :) Again, rather than doing a sequential scan of
  the entire table, this type of query could return instantly.
 
  I believe MySQL does both of these optimizations (which are probably
  a lot easier in that product, given its data storage system).  These
  were the first areas where I noticed a big performance difference
  between MySQL and Postgres.
 
  Especially with very large tables, hearing the disks grind as
  Postgres scans every single row in order to determine the number of
  rows in a table or the max value of a column (even a primary key
  created from a sequence) is pretty painful.  If the implementation
  is not too horrendous, this is an area where an orders-of-magnitude
  performance increase can be had.
 
 These are both VERY frequently asked questions.
 
 In the case of question #1, the optimization you suggest could be
 accomplished via some Small Matter Of Programming.  None of the people
 that have wanted the optimization have, however, offered to actually
 DO the programming.
 
 In the case of #2, the answer is surely NOT.  In MVCC databases,
 that information CANNOT be stored anywhere convenient because queries
 requested by transactions started at different points in time must get
 different answers.
 
 I think we need to add these questions and their answers to the FAQ so
 that the answer can be "See FAQ Item #17" rather than people having to
 gratuitously explain it over and over and over again.
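
For what it's worth, the workaround usually suggested for #1 is to rewrite
the aggregate into a form the planner can already satisfy with an ordered
index scan (table and column names here are just the ones from the example
above):

```sql
-- Instead of: select max(foo) from bar
SELECT foo FROM bar
WHERE foo IS NOT NULL   -- max() ignores NULLs; the LIMIT form must too
ORDER BY foo DESC LIMIT 1;
```

With an index on bar(foo) this returns nearly instantly, which is exactly
the behavior the original poster was hoping max() would have.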


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Select max(foo) and select count(*) optimization

2004-01-05 Thread Doug McNaught
Paul Tuckfield [EMAIL PROTECTED] writes:

 In the case of select count(*), one optimization is to do  a scan of the
 primary key, not the table itself, if the table has a primary key. In a
 certain commercial, lesser database, this is called an index fast full
 scan.  It would be important to scan the index in physical order
 (sequential physical IO) and not in key order (random physical IO)

That won't work because you still have to hit the actual tuple to
determine visibility.

-Doug


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Select max(foo) and select count(*) optimization

2004-01-05 Thread Christopher Browne
Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Paul Tuckfield) wrote:
 Not that I'm offering to do the programming, mind you :) but . . 

 In the case of select count(*), one optimization is to do  a scan of the
 primary key, not the table itself, if the table has a primary key. In a
 certain commercial, lesser database, this is called an index fast full
 scan.  It would be important to scan the index in physical order
 (sequential physical IO) and not in key order (random physical IO)

The problem is that this optimization does not actually work.  The
index does not contain transaction visibility information, so you have
to go to the pages of tuples in order to determine if any given tuple
is visible.

 I'm guessing the payoff as well as real-world-utility of a max(xxx)
 optimization are much higher than a count(*) optimization tho

That's probably so.

In many cases, approximations such as page counts may be good
enough; and pray consider, an approximation is probably all
you were getting from the database systems that had an optimization
to store the count in a counter.
-- 
let name=cbbrowne and tld=ntlug.org in name ^ @ ^ tld;;
http://www3.sympatico.ca/cbbrowne/linuxxian.html
No, you  misunderstand. Microsoft asked  some hackers how  they could
make their system secure - the hackers replied "Turn it off." So they
did. -- Anthony Ord

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Select max(foo) and select count(*) optimization

2004-01-05 Thread Christopher Browne
[EMAIL PROTECTED] (Rod Taylor) wrote:
 Especially with very large tables, hearing the disks grind as Postgres scans
 every single row in order to determine the number of rows in a table or the
 max value of a column (even a primary key created from a sequence) is pretty
 painful.  If the implementation is not too horrendous, this is an area where
 an orders-of-magnitude performance increase can  be had.

 Actually, it's very painful. For MySQL, they've accepted the concurrency
 hit in order to accomplish it -- PostgreSQL would require a more subtle
 approach.

 Anyway, with Rules you can force this:

 ON INSERT UPDATE counter SET tablecount = tablecount + 1;

 ON DELETE UPDATE counter SET tablecount = tablecount - 1;

 You need to create a table counter with a single row that will keep
 track of the number of rows in the table. Just remember, you've now
 serialized all writes to the table, but in your situation it may be
 worth while.

There's a still more subtle approach that relieves the serialization
constraint, at some cost...

- You add rules that _insert_ a row each time there is an
  insert/delete
   ON INSERT insert into counts(table, value) values ('our_table', 1);
   ON DELETE insert into counts(table, value) values ('our_table', -1);

- The "select count(*) from our_table" is replaced by "select
  sum(value) from counts where table = 'our_table'"

- Periodically, a compression process goes through and either:

a) Deletes the rows for 'our_table' and replaces them with one
   row with a conventionally-scanned 'count(*)' value, or

b) Computes select table, sum(value) as value from counts group
   by table, deletes all the existing rows in counts, and replaces
   them by the preceding selection, or

c) Perhaps does something incremental that's like b), but which
   only processes parts of the count table at once.  Process
   500 rows, then COMMIT, or something of the sort...

Note that this counts table can potentially grow _extremely_ large.
The win comes when it gets compressed, so that instead of scanning
through 500K items, it index-scans through 27 rows: the one carrying
the 497000 that was the state of the table at the last compression,
and then 26 singletons.

A win comes in if an INSERT that adds in 50 rows can lead to
inserting ('our_table', 50) into COUNTS, or a delete that eliminates
5000 rows puts in ('our_table', -5000).

It's vital to run the compression reasonably often (much like VACUUM
:-)) in order that the COUNTS summary table stays relatively small.
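
To make the shorthand concrete, here is one hedged sketch of the scheme.
It swaps the rule shorthand for a per-row trigger (rules are query
rewrites that fire per statement, which makes the per-row bookkeeping
subtler), all names (counts, count_delta, our_table) are illustrative,
and on pre-8.0 servers the function body needs single quotes rather
than $$:

```sql
-- Illustrative only: delta table plus a per-row AFTER trigger.
CREATE TABLE counts (tablename text, value bigint);

CREATE FUNCTION count_delta() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        INSERT INTO counts VALUES ('our_table', 1);
    ELSE
        INSERT INTO counts VALUES ('our_table', -1);
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER our_table_count AFTER INSERT OR DELETE ON our_table
    FOR EACH ROW EXECUTE PROCEDURE count_delta();

-- Replacement for "select count(*) from our_table":
SELECT sum(value) FROM counts WHERE tablename = 'our_table';

-- Periodic compression, variant b): collapse the deltas to one row each.
BEGIN;
LOCK TABLE counts IN EXCLUSIVE MODE;
CREATE TEMP TABLE c AS
    SELECT tablename, sum(value) AS value FROM counts GROUP BY tablename;
DELETE FROM counts;
INSERT INTO counts SELECT * FROM c;
DROP TABLE c;
COMMIT;
```

The explicit LOCK during compression is the cost alluded to above: it
briefly serializes writers, but only while the summary is rebuilt rather
than on every insert and delete.
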
-- 
let name=cbbrowne and tld=cbbrowne.com in String.concat @ [name;tld];;
http://www3.sympatico.ca/cbbrowne/wp.html
Debugging is twice  as hard as writing   the code in the first  place.
Therefore, if you write the code as cleverly as  possible, you are, by
definition, not smart enough to debug it.  -- Brian W. Kernighan

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] optimizing Postgres queries

2004-01-05 Thread Bruce Momjian
David Teran wrote:
 Index Scan using key_value_meta_data__id_value__fk_index,
 key_value_meta_data__id_value__fk_index on KEY_VALUE_META_DATA t0
 (cost=0.00..19.94 rows=14 width=75) (actual time=0.112..0.296 rows=25 loops=1)
   Index Cond: ((ID_VALUE = 21094) OR (ID_VALUE = 21103))
 Total runtime: 0.429 ms
 
 Much better. So I think I will first read more about this optimization
 stuff and regular maintenance things. This is something I like very
 much about FrontBase: no need for such things, simply start and run. But
 other things were not so fine ;-).
 
 Is there any hint where to start to understand more about this
 optimization problem?

Read the FAQ.  There is an item about slow queries and indexes.
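
The basic cycle, using the table from the plan quoted above, is to keep
the planner statistics fresh and then look at what the planner actually
chooses:

```sql
-- Refresh planner statistics, then inspect the chosen plan.
VACUUM ANALYZE key_value_meta_data;
EXPLAIN ANALYZE
SELECT * FROM key_value_meta_data
WHERE id_value = 21094 OR id_value = 21103;
```

If the plan shows a sequential scan where you expected an index scan,
the usual suspects are stale statistics and mismatched datatypes, as
discussed earlier in this thread.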

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Select max(foo) and select count(*) optimization

2004-01-05 Thread Shridhar Daithankar
On Tuesday 06 January 2004 07:16, Christopher Browne wrote:
 Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Paul 
Tuckfield) wrote:
  Not that I'm offering to do the programming, mind you :) but . .
 
  In the case of select count(*), one optimization is to do  a scan of the
  primary key, not the table itself, if the table has a primary key. In a
  certain commercial, lesser database, this is called an index fast full
  scan.  It would be important to scan the index in physical order
  (sequential physical IO) and not in key order (random physical IO)

 The problem is that this optimization does not actually work.  The
 index does not contain transaction visibility information, so you have
 to go to the pages of tuples in order to determine if any given tuple
 is visible.

The idea of adding transaction visibility information to indexes was 
rejected. At the time I proposed it, my idea was to vacuum tuples at the 
page level while postgresql pushes buffers out of the shared cache. If 
indexes had visibility information, they could be cleaned in a different 
order than the heap tuples.

This wouldn't have eliminated vacuum entirely, but at least frequently hit 
data would be clean.

But it was rejected because of the associated overhead. 

Just thought it worth a mention..

 Shridhar



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Select max(foo) and select count(*) optimization

2004-01-05 Thread Shridhar Daithankar
On Tuesday 06 January 2004 01:22, Rod Taylor wrote:
 Anyway, with Rules you can force this:

 ON INSERT UPDATE counter SET tablecount = tablecount + 1;

 ON DELETE UPDATE counter SET tablecount = tablecount - 1;

That would generate a lot of dead tuples in the counter table. How about

select relpages,reltuples from pg_class where relname='tablename';

Assuming the stats are recent enough, it would be much faster and reasonably 
accurate..

 Shridhar


---(end of broadcast)---
TIP 8: explain analyze is your friend