Re: [PERFORM] Questions on query planner, join types, and work_mem

2011-02-01 Thread Bruce Momjian
Bruce Momjian wrote:
 Robert Haas wrote:
  On Wed, Aug 11, 2010 at 9:42 PM, Bruce Momjian br...@momjian.us wrote:
   This confused me.  If we are assuming the data is in
   effective_cache_size, why are we adding sequential/random page cost to
   the query cost routines?
  
  See the comments for index_pages_fetched().  We basically assume that
  all data starts uncached at the beginning of each query - in fact,
  each plan node.  effective_cache_size only measures the chances that
  if we hit the same block again later in the execution of something
  like a nested-loop-with-inner-indexscan, it'll still be in cache.
  
  It's an extremely weak knob, and unless you have tables or indices
  that are larger than RAM, the only mistake you can make is setting it
  too low.
 
 The attached patch documents that there is no assumption that data
 remains in the disk cache between queries.  I thought this information
 might be helpful.

Applied.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [PERFORM] Questions on query planner, join types, and work_mem

2011-01-31 Thread Bruce Momjian
Robert Haas wrote:
 On Wed, Aug 11, 2010 at 9:42 PM, Bruce Momjian br...@momjian.us wrote:
  This confused me.  If we are assuming the data is in
  effective_cache_size, why are we adding sequential/random page cost to
  the query cost routines?
 
 See the comments for index_pages_fetched().  We basically assume that
 all data starts uncached at the beginning of each query - in fact,
 each plan node.  effective_cache_size only measures the chances that
 if we hit the same block again later in the execution of something
 like a nested-loop-with-inner-indexscan, it'll still be in cache.
 
 It's an extremely weak knob, and unless you have tables or indices
 that are larger than RAM, the only mistake you can make is setting it
 too low.

The attached patch documents that there is no assumption that data
remains in the disk cache between queries.  I thought this information
might be helpful.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 77cacdd..520170b 100644
*** a/doc/src/sgml/config.sgml
--- b/doc/src/sgml/config.sgml
*** SET ENABLE_SEQSCAN TO OFF;
*** 2424,2430 ****
  space.  This parameter has no effect on the size of shared
  memory allocated by <productname>PostgreSQL</productname>, nor
  does it reserve kernel disk cache; it is used only for estimation
! purposes.  The default is 128 megabytes (<literal>128MB</>).
  </para>
  </listitem>
  </varlistentry>
--- 2424,2432 ----
  space.  This parameter has no effect on the size of shared
  memory allocated by <productname>PostgreSQL</productname>, nor
  does it reserve kernel disk cache; it is used only for estimation
! purposes.  The system also does not assume data remains in
! the disk cache between queries.  The default is 128 megabytes
! (<literal>128MB</>).
  </para>
  </listitem>
  </varlistentry>



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-11 Thread Bruce Momjian
Tom Lane wrote:
 Hannu Krosing ha...@2ndquadrant.com writes:
  Of course there are more variables than just *_page_cost, so if you nail
  down any other one, you may end with less than 1 for both page costs.
 
  I have always used seq_page_cost = 1 in my thinking and adjusted others
  relative to it.
 
 Right, seq_page_cost = 1 is sort of the traditional reference point,
 but you don't have to do it that way.  The main point here is that for
 an all-in-RAM database, the standard page access costs are too high
 relative to the CPU effort costs:
 
 regression=# select name, setting from pg_settings where name like '%cost';
  name | setting 
 --+-
  cpu_index_tuple_cost | 0.005
  cpu_operator_cost| 0.0025
  cpu_tuple_cost   | 0.01
  random_page_cost | 4
  seq_page_cost| 1
 (5 rows)
 
 To model an all-in-RAM database, you can either dial down both
 random_page_cost and seq_page_cost to 0.1 or so, or set random_page_cost
 to 1 and increase all the CPU costs.  The former is less effort ;-)
 
 It should be noted also that there's not all that much evidence backing
 up the default values of the cpu_xxx_cost variables.  In the past those
 didn't matter much because I/O costs always swamped CPU costs anyway.
 But I can foresee us having to twiddle those defaults and maybe refine
 the CPU cost model more, as all-in-RAM cases get more common.

This confused me.  If we are assuming the data is in
effective_cache_size, why are we adding sequential/random page cost to
the query cost routines?

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-11 Thread Robert Haas
On Wed, Aug 11, 2010 at 9:42 PM, Bruce Momjian br...@momjian.us wrote:
 This confused me.  If we are assuming the data is in
 effective_cache_size, why are we adding sequential/random page cost to
 the query cost routines?

See the comments for index_pages_fetched().  We basically assume that
all data starts uncached at the beginning of each query - in fact,
each plan node.  effective_cache_size only measures the chances that
if we hit the same block again later in the execution of something
like a nested-loop-with-inner-indexscan, it'll still be in cache.

It's an extremely weak knob, and unless you have tables or indices
that are larger than RAM, the only mistake you can make is setting it
too low.
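
For example, a session-level change is just the following (the value is
only an illustration, not a recommendation):

SET effective_cache_size = '4GB';  -- cost estimation only; allocates nothing
SHOW effective_cache_size;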

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Hannu Krosing
On Mon, 2010-08-02 at 14:23 -0700, Peter Hussey wrote:
 I already had effective_cache_size set to 500MB.
 
 I experimented with lowering  random_page_cost to 3 then 2.  

In the case of a fully cached database it is closer to 1.

 2) Why is the setting of work_mem something left to the admin and/or
 developer?  Couldn't the optimizer say how much it thinks it needs to
 build a hash table based on size of the keys and estimated number of
 rows?

Yes, it can say how much it thinks it needs to build a hash table; the
part it can't figure out is how much it can afford, based on things like
the number of concurrent queries and how much work_mem they are using.
Any work_mem used is subtracted from the total memory pool, which also
affects how much of the data files the system can cache.

 It is difficult for a software development platform like ours to take
 advantage of suggestions to set work_mem, or to change the cost
 function, or turn on/off join strategies for individual queries.  The
 SQL we issue is formed by user interaction with the product and rarely
 static.  How would we know when to turn something on or off?  That's
 why I'm looking for a configuration solution that I can set on a
 database-wide basis and have it work well for all queries.

Keep trying. The closer your configuration gets to real conditions, the
better choices the optimiser can make ;)



-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Robert Haas
On Tue, Aug 3, 2010 at 3:03 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
 In case of fully cached database it is closer to 1.

In the case of a fully cached database I believe the correct answer
begins with a decimal point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Kevin Grittner
Peter Hussey pe...@labkey.com wrote:
 
 I already had effective_cache_size set to 500MB.
 
That seems awfully small.  You do realize that this setting does not
cause PostgreSQL to allocate any memory; it merely advises how much
disk space is likely to be cached.  It should normally be set to the
sum of your shared_buffers setting and whatever your OS reports as
cached.  Setting it too small will discourage the optimizer from
picking plans which use indexes.
 
 I experimented with lowering  random_page_cost to 3 then 2.
 
As others have said, in a fully cached system that's still too high.
If the active portion of your database is fully cached, you should
set random_page_cost and seq_page_cost to the same value, and that
value should probably be in the range of 0.1 to 0.005.  It can get
trickier if the active portion is largely but not fully cached; we
have one server where we found, through experimentation, that we got
better plans overall with seq_page_cost = 0.3 and random_page_cost =
0.5 than any other settings we tried.
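
For concreteness, that kind of experiment is just a pair of session-level
settings (the values here are the ones mentioned above; nothing persists
past the session):

SET seq_page_cost = 0.3;
SET random_page_cost = 0.5;
-- rerun the query of interest under EXPLAIN ANALYZE, then
RESET seq_page_cost;
RESET random_page_cost;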
 
-Kevin



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Hannu Krosing
On Wed, 2010-08-04 at 09:14 -0400, Robert Haas wrote:
 On Tue, Aug 3, 2010 at 3:03 AM, Hannu Krosing ha...@2ndquadrant.com wrote:
  In case of fully cached database it is closer to 1.
 
 In the case of a fully cached database I believe the correct answer
 begins with a decimal point.

The number 1 here was suggested in relation to seq_page_cost, which is
1. 

For fully cached db there is no additional seek time for random access,
so seq_page_cost == random_page_cost.

Of course there are more variables than just *_page_cost, so if you nail
down any other one, you may end up with less than 1 for both page costs.

I have always used seq_page_cost = 1 in my thinking and adjusted others
relative to it.

-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Tom Lane
Hannu Krosing ha...@2ndquadrant.com writes:
 Of course there are more variables than just *_page_cost, so if you nail
 down any other one, you may end with less than 1 for both page costs.

 I have always used seq_page_cost = 1 in my thinking and adjusted others
 relative to it.

Right, seq_page_cost = 1 is sort of the traditional reference point,
but you don't have to do it that way.  The main point here is that for
an all-in-RAM database, the standard page access costs are too high
relative to the CPU effort costs:

regression=# select name, setting from pg_settings where name like '%cost';
         name         | setting 
----------------------+---------
 cpu_index_tuple_cost | 0.005
 cpu_operator_cost    | 0.0025
 cpu_tuple_cost       | 0.01
 random_page_cost     | 4
 seq_page_cost        | 1
(5 rows)

To model an all-in-RAM database, you can either dial down both
random_page_cost and seq_page_cost to 0.1 or so, or set random_page_cost
to 1 and increase all the CPU costs.  The former is less effort ;-)

It should be noted also that there's not all that much evidence backing
up the default values of the cpu_xxx_cost variables.  In the past those
didn't matter much because I/O costs always swamped CPU costs anyway.
But I can foresee us having to twiddle those defaults and maybe refine
the CPU cost model more, as all-in-RAM cases get more common.
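
For instance, the second option sketched per session (the multipliers are
only illustrative: 10x the defaults shown above):

SET random_page_cost = 1;
SET cpu_tuple_cost = 0.1;
SET cpu_index_tuple_cost = 0.05;
SET cpu_operator_cost = 0.025;
-- EXPLAIN the workload of interest, then RESET ALL to go back to defaults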

regards, tom lane



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Hannu Krosing
On Wed, 2010-08-04 at 14:00 -0400, Tom Lane wrote:
 Hannu Krosing ha...@2ndquadrant.com writes:
  Of course there are more variables than just *_page_cost, so if you nail
  down any other one, you may end with less than 1 for both page costs.
 
  I have always used seq_page_cost = 1 in my thinking and adjusted others
  relative to it.
 
 Right, seq_page_cost = 1 is sort of the traditional reference point,
 but you don't have to do it that way.  The main point here is that for
 an all-in-RAM database, the standard page access costs are too high
 relative to the CPU effort costs:
 
 regression=# select name, setting from pg_settings where name like '%cost';
  name | setting 
 --+-
  cpu_index_tuple_cost | 0.005
  cpu_operator_cost| 0.0025
  cpu_tuple_cost   | 0.01
  random_page_cost | 4
  seq_page_cost| 1
 (5 rows)
 
 To model an all-in-RAM database, you can either dial down both
 random_page_cost and seq_page_cost to 0.1 or so, or set random_page_cost
 to 1 and increase all the CPU costs.  The former is less effort ;-)
 
 It should be noted also that there's not all that much evidence backing
 up the default values of the cpu_xxx_cost variables.  In the past those
 didn't matter much because I/O costs always swamped CPU costs anyway.
 But I can foresee us having to twiddle those defaults and maybe refine
 the CPU cost model more, as all-in-RAM cases get more common.

Especially the context switch + copy between shared buffers and system
disk cache will become noticeable at these speeds.

An easy way to test it is loading a table with a few indexes, once with
a shared_buffers value that is enough for only the main table, and once
with one that fits both the table and the indexes.


   regards, tom lane


-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Hannu Krosing
On Wed, 2010-08-04 at 21:41 +0300, Hannu Krosing wrote:
 On Wed, 2010-08-04 at 14:00 -0400, Tom Lane wrote:

  regression=# select name, setting from pg_settings where name like '%cost';
   name | setting 
  --+-
   cpu_index_tuple_cost | 0.005
   cpu_operator_cost| 0.0025
   cpu_tuple_cost   | 0.01
   random_page_cost | 4
   seq_page_cost| 1
  (5 rows)
  
  To model an all-in-RAM database, you can either dial down both
  random_page_cost and seq_page_cost to 0.1 or so, or set random_page_cost
  to 1 and increase all the CPU costs.  The former is less effort ;-)
  
  It should be noted also that there's not all that much evidence backing
  up the default values of the cpu_xxx_cost variables.  In the past those
  didn't matter much because I/O costs always swamped CPU costs anyway.
  But I can foresee us having to twiddle those defaults and maybe refine
  the CPU cost model more, as all-in-RAM cases get more common.
 
 Especially the context switch + copy between shared buffers and system
 disk cache will become noticeable at these speeds.
 
 An easy way to test it is loading a table with a few indexes, once with
 a shared_buffers value, which is senough for only the main table and
 once with one that fits both table and indexes,

OK, just to back this up, I ran the following test with 28MB and 128MB
shared buffers.

create table sbuf_test(f1 float, f2 float, f3 float);
create index sbuf_test1 on sbuf_test(f1);
create index sbuf_test2 on sbuf_test(f2);
create index sbuf_test3 on sbuf_test(f3);

and then ran the following 3 times for each shared_buffers setting:

truncate sbuf_test;
insert into sbuf_test 
select random(), random(), random() from generate_series(1,60);

The main table size was 31MB and the indexes were 18MB each, for a total
size of 85MB.
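
For reference, sizes like these can be read straight from the catalogs
with something like:

select relname, pg_size_pretty(pg_relation_size(oid))
  from pg_class where relname like 'sbuf_test%';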

In the case of 128MB shared buffers, the insert ran in 14 sec (+/- 1 sec).

In the case of 28MB shared buffers, the insert ran between 346 and 431 sec,
that is, 20-30 _times_ slower.

There was ample space for keeping the indexes in the Linux cache (it has
1GB cached currently), though the system may have decided to start writing
them to disk, so I suspect that most of the time was spent copying random
index pages back and forth between shared buffers and the disk cache.

I did not verify this, so there may be some other factors involved, but
this seems like the most obvious suspect.

-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Greg Smith

Hannu Krosing wrote:

There was ample space for keeping the indexes in linux cache (it has 1GB
cached currently) though the system may have decided to start writing it
to disk, so I suspect that most of the time was spent copying random
index pages back and forth between shared buffers and disk cache.
  


Low shared_buffers settings will result in the same pages more often
being written multiple times per checkpoint, particularly index pages,
which is less efficient than keeping them in the database cache and
updating them there.  This is a slightly different issue than just the
overhead of copying them back and forth; by keeping them in cache, you
actually reduce writes to the OS cache.  What I do to quantify that
is... well, the attached shows it better than I can describe; it only
works on 9.0 or later as it depends on a feature I added for this
purpose there.  It measures exactly how much buffer cache churn
happened during a test, in this case creating a pgbench database.
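
In short, the idea is just to difference two pg_stat_bgwriter snapshots
taken around the workload; a rough sketch of it (not the attached script
itself):

select now(), checkpoints_timed, checkpoints_req, buffers_checkpoint,
       buffers_clean, maxwritten_clean, buffers_backend, buffers_alloc
  from pg_stat_bgwriter;
-- run the workload here, e.g. initialize a pgbench database
-- then run the same query again and subtract the first row from the second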


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Greg Smith

This time with attachment...

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us



fsync-stress.sh
Description: Bourne shell script



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Tom Lane
Hannu Krosing ha...@2ndquadrant.com writes:
 There was ample space for keeping the indexes in linux cache (it has 1GB
 cached currently) though the system may have decided to start writing it
 to disk, so I suspect that most of the time was spent copying random
 index pages back and forth between shared buffers and disk cache.

If you're on a platform that has oprofile, you could probably verify
that rather than just guess it ...

regards, tom lane



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Kevin Grittner
Greg Smith g...@2ndquadrant.com wrote:
 
 What I do to quantify that is...well, the attached shows it better
 than I can describe; only works on 9.0 or later as it depends on a
 feature I added for this purpose there. It measures exactly how
 much buffer cache churn happened during a test, in this case
 creating a pgbench database.
 
I'm not entirely sure I understand what I'm supposed to get from
that.  On a 3GB workstation, a compile from a recent HEAD checkout,
with a default postgresql.conf file, I get this:
 
-[ RECORD 1 ]------+-------------------------------
now                | 2010-08-04 14:25:46.683766-05
checkpoints_timed  | 0
checkpoints_req    | 0
buffers_checkpoint | 0
buffers_clean      | 0
maxwritten_clean   | 0
buffers_backend    | 0
buffers_alloc      | 73

Initializing pgbench
-[ RECORD 1 ]------+-------------------------------
now                | 2010-08-04 14:27:49.062551-05
checkpoints_timed  | 0
checkpoints_req    | 0
buffers_checkpoint | 0
buffers_clean      | 0
maxwritten_clean   | 0
buffers_backend    | 633866
buffers_alloc      | 832
 
I boost shared_buffers from 32MB to 320MB, restart, and get this:
 
-[ RECORD 1 ]------+-------------------------------
now                | 2010-08-04 14:30:42.816719-05
checkpoints_timed  | 0
checkpoints_req    | 0
buffers_checkpoint | 0
buffers_clean      | 0
maxwritten_clean   | 0
buffers_backend    | 0
buffers_alloc      | 0

Initializing pgbench
-[ RECORD 1 ]------+-------------------------------
now                | 2010-08-04 14:32:40.750098-05
checkpoints_timed  | 0
checkpoints_req    | 0
buffers_checkpoint | 0
buffers_clean      | 0
maxwritten_clean   | 0
buffers_backend    | 630794
buffers_alloc      | 2523
 
So run time dropped from 123 seconds to 118 seconds, buffers_backend
dropped by less than 0.5%, and buffers_alloc went up.  Assuming this
is real, and not just in the noise -- what conclusions would you
draw from this?  Dedicating an additional 10% of my free memory got
me a 4% speed improvement?  Was I supposed to try with other scales?
Which ones?
 
-Kevin



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Hannu Krosing
On Wed, 2010-08-04 at 22:03 +0300, Hannu Krosing wrote:
 On Wed, 2010-08-04 at 21:41 +0300, Hannu Krosing wrote:
  On Wed, 2010-08-04 at 14:00 -0400, Tom Lane wrote:
 
   regression=# select name, setting from pg_settings where name like 
   '%cost';
name | setting 
   --+-
cpu_index_tuple_cost | 0.005
cpu_operator_cost| 0.0025
cpu_tuple_cost   | 0.01
random_page_cost | 4
seq_page_cost| 1
   (5 rows)
   
   To model an all-in-RAM database, you can either dial down both
   random_page_cost and seq_page_cost to 0.1 or so, or set random_page_cost
   to 1 and increase all the CPU costs.  The former is less effort ;-)
   
   It should be noted also that there's not all that much evidence backing
   up the default values of the cpu_xxx_cost variables.  In the past those
   didn't matter much because I/O costs always swamped CPU costs anyway.
   But I can foresee us having to twiddle those defaults and maybe refine
   the CPU cost model more, as all-in-RAM cases get more common.
  
  Especially the context switch + copy between shared buffers and system
  disk cache will become noticeable at these speeds.
  
  An easy way to test it is loading a table with a few indexes, once with
  a shared_buffers value, which is senough for only the main table and
  once with one that fits both table and indexes,

I re-ran the test and checked idx_blks_read for the 28MB case:

hannu=# select * from pg_statio_user_indexes where relname =
'sbuf_test';
 schemaname |  relname  | indexrelname | idx_blks_read | idx_blks_hit 
------------+-----------+--------------+---------------+--------------
 hannu      | sbuf_test | sbuf_test1   |         71376 |      1620908
 hannu      | sbuf_test | sbuf_test2   |         71300 |      1620365
 hannu      | sbuf_test | sbuf_test3   |         71436 |      1619619


this means that there were a total of 214112 index blocks read back from
disk cache (obviously at least some of these had to be copied the other
way as well).

This seems to indicate about 1 ms for moving a page over the user/system
boundary. (Intel Core2 Duo T7500 @ 2.20GHz, Ubuntu 9.10, 4GB RAM)

For 128MB shared buffers, the total idx_blks_read for the 3 indexes was
about 6300.
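
If anyone wants to repeat this, the counters are easiest to interpret
when zeroed between runs, along these lines:

select pg_stat_reset();  -- clears the statistics counters for this database
-- run the truncate + insert test here
select indexrelname, idx_blks_read, idx_blks_hit
  from pg_statio_user_indexes where relname = 'sbuf_test';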


 ok, just to back this up I ran the following test with 28MB and 128MB
 shared buffers.
 
 create table sbuf_test(f1 float, f2 float, f3 float);
 create index sbuf_test1 on sbuf_test(f1);
 create index sbuf_test2 on sbuf_test(f2);
 create index sbuf_test3 on sbuf_test(f3);
 
 and then did 3 times the following for each shared_buffers setting
 
 truncate sbuf_test;
 insert into sbuf_test 
 select random(), random(), random() from generate_series(1,60);
 
 the main table size was 31MB, indexes were 18MB each for total size of
 85MB
 
 in case of 128MB shared buffers, the insert run in 14sec (+/- 1 sec)
 
 in case of 28MB shared buffers, the insert run between 346 and 431 sec,
 that is 20-30 _times_ slower.
 
 There was ample space for keeping the indexes in linux cache (it has 1GB
 cached currently) though the system may have decided to start writing it
 to disk, so I suspect that most of the time was spent copying random
 index pages back and forth between shared buffers and disk cache.
 
 I did not verify this, so there may be some other factors involved, but
 this seems like the most obvious suspect.
 
 -- 
 Hannu Krosing   http://www.2ndQuadrant.com
 PostgreSQL Scalability and Availability 
Services, Consulting and Training
 
 
 



-- 
Hannu Krosing   http://www.2ndQuadrant.com
PostgreSQL Scalability and Availability 
   Services, Consulting and Training





Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Greg Smith

Kevin Grittner wrote:

Assuming this is real, and not just in the noise -- what conclusions would you
draw from this?


I was trying to demonstrate the general ability of pg_stat_bgwriter
snapshots at points in time to directly measure the buffer activity
Hannu was theorizing about, not necessarily to show a useful benchmark
of any sort with that.  Watching pgbench create a database isn't all that 
interesting unless you either a) increase the database scale such that 
at least one timed checkpoint kicks in, or b) turn on archive_mode so 
the whole WAL COPY optimization is defeated.  More on this topic later, 
just happened to have that little example script ready to demonstrate 
the measurement concept.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Greg Smith

Hannu Krosing wrote:

Do you mean written to disk, or written out from shared_buffers to
disk cache ?
  


The latter turns into the former eventually, so both, really.  The kernel
will do some amount of write combining for you if you're lucky.  But not
in all cases; it may decide to write something out to physical disk
before the second write shows up.


--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com   www.2ndQuadrant.us




Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Kevin Grittner
Hannu Krosing ha...@2ndquadrant.com wrote:
 
 This seems to indicate about 1 ms for moving pages over
 user/system boundary. (Intel Core2 Duo T7500 @ 2.20GHz, Ubuntu
 9.10, 4GB RAM)
 
Using Greg's test script on a box with two cores like this:
 
Intel(R) Pentium(R) D CPU 3.40GHz
Linux kgrittn-desktop 2.6.31-22-generic #60-Ubuntu SMP Thu May 27
00:22:23 UTC 2010 i686 GNU/Linux
 
Dividing the run time by accumulated buffers_backend, it comes to
less than 0.2 ms per dirty buffer flushed.  If I get a few spare
ticks I'll try again while checking what vmstat and oprofile say
about how much of that went to things besides the transfer from
shared buffers to the OS.  I mean, it's possible I was waiting on
actual disk I/O at some point.
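
(For example, taking the numbers from the earlier 320MB run, that is
roughly 118 seconds / 630,794 buffers_backend, i.e. about 0.19 ms per
buffer.)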
 
-Kevin



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Kevin Grittner
Greg Smith g...@2ndquadrant.com wrote:
 
 Was trying to demonstrate the general ability of pg_stat_bgwriter 
 snapshots at points in time to directly measure the buffer
 activity Hannu was theorizing about, not necessarily show a useful
 benchmark of any sort with that.
 
Ah, OK.  Sorry I didn't pick up on that; I was struggling to tease
out some particular effect you expected to see in the numbers from
that particular run.  :-/
 
-Kevin



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-04 Thread Hannu Krosing
On Wed, 2010-08-04 at 15:16 -0400, Greg Smith wrote:
 Hannu Krosing wrote:
  There was ample space for keeping the indexes in linux cache (it has 1GB
  cached currently) though the system may have decided to start writing it
  to disk, so I suspect that most of the time was spent copying random
  index pages back and forth between shared buffers and disk cache.

 
 Low shared_buffers settings will result in the same pages more often 
 being written multiple times per checkpoint,

Do you mean written to disk, or written out from shared_buffers to
disk cache ?

  particularly index pages, 
 which is less efficient than keeping in the database cache and updating 
 them there.  This is a slightly different issue than just the overhead 
 of copying them back and forth; by keeping them in cache, you actually 
 reduce writes to the OS cache. 

That's what I meant. Both writes to and reads from the OS cache take a
significant amount of time once you are not doing real disk I/O.

 What I do to quantify that is...well, 
 the attached shows it better than I can describe; only works on 9.0 or 
 later as it depends on a feature I added for this purpose there.  It 
 measures exactly how much buffer cache churn happened during a test, in 
 this case creating a pgbench database.
 
 -- 
 Greg Smith  2ndQuadrant US  Baltimore, MD
 PostgreSQL Training, Services and Support
 g...@2ndquadrant.com   www.2ndQuadrant.us
 
 





Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-02 Thread Peter Hussey
I already had effective_cache_size set to 500MB.

I experimented with lowering random_page_cost to 3, then 2.  It made no
difference in the choice of plan that I could see.  In the explain analyze
output the estimated costs of the nested loop were in fact lowered, but so
were the costs of the hash join plan, and the hash join remained the lowest
predicted cost in all tests I tried.

What seems wrong to me is that the hash join strategy shows almost no
difference in estimated costs as work_mem goes from 1MB to 500MB. The cost
function decreases by 1%, but the actual time for the query to execute
decreases by 86% as work_mem goes from 1MB to 500MB.

My questions are still
1)  Does the planner have any component of cost calculations based on the
size of work_mem, and if so why do those calculations  seem to have so
little effect here?

2) Why is the setting of work_mem something left to the admin and/or
developer?  Couldn't the optimizer say how much it thinks it needs to build
a hash table based on size of the keys and estimated number of rows?

It is difficult for a software development platform like ours to take
advantage of suggestions to set work_mem, or to change the cost function, or
turn on/off join strategies for individual queries.  The SQL we issue is
formed by user interaction with the product and rarely static.  How would we
know when to turn something on or off?  That's why I'm looking for a
configuratoin solution that I can set on a database-wide basis and have it
work well for all queries.

thanks
Peter


On Fri, Jul 30, 2010 at 7:03 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Peter Hussey pe...@labkey.com writes:
  Using the default of 1MB work_mem, the planner chooses a hash join plan :
  Hash Left Join  (cost=252641.82..11847353.87 rows=971572 width=111)
 (actual
  time=124196.670..280461.604 rows=968080 loops=1)
  ...
  For the same default 1MB work_mem, a nested loop plan is better
  Nested Loop Left Join  (cost=8.27..15275401.19 rows=971572 width=111)
  (actual time=145.015..189957.023 rows=968080 loops=1)
  ...

 Hm.  A nestloop with nearly a million rows on the outside is pretty
 scary.  The fact that you aren't unhappy with that version of the plan,
 rather than the hash, indicates that the object table must be
 fully cached in memory, otherwise the repeated indexscans would be a
 lot slower than this:

-  Index Scan using uq_object on object obj  (cost=0.00..3.51 rows=1
  width=95) (actual time=0.168..0.170 rows=1 loops=968080)
  Index Cond: ((sd.lsid)::text = (obj.objecturi)::text)

 My take on it is that the estimate of the hash plan's cost isn't bad;
 what's bad is that the planner is mistakenly estimating the nestloop as
 being worse.  What you need to do is adjust the planner's cost
 parameters so that it has a better idea of the true cost of repeated
 index probes in your environment.  Crank up effective_cache_size if
 you didn't already, and experiment with lowering random_page_cost.
 See the list archives for more discussion of these parameters.

regards, tom lane




-- 
Peter Hussey
LabKey Software
206-667-7193 (office)
206-291-5625 (cell)


Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-02 Thread Robert Haas
On Mon, Aug 2, 2010 at 5:23 PM, Peter Hussey pe...@labkey.com wrote:
 I already had effective_cache_size set to 500MB.

 I experimented with lowering  random_page_cost to 3 then 2.  It made no
 difference in the choice of plan that I could see.  In the explain analyze
 output the estimated costs of nested loop were in fact lowererd, but so were
 the costs of the hash join plan, and the hash join remained the lowest
 predicted costs in all tests i tried.

What do you get if you set random_page_cost to a small value such as 0.01?

 What seems wrong to me is that the hash join strategy shows almost no
 difference in estimated costs as work_mem goes from 1MB to 500MB. The cost
 function decreases by 1%, but the actual time for the query to execute
 decreases by 86% as work_mem goes from 1MB to 500MB.

Wow.  It would be interesting to find out how many batches are being
used.  Unfortunately, releases prior to 9.0 don't display that
information.
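
For anyone on 9.0, a self-contained way to watch the batch count react to
work_mem (the table and sizes here are made up purely for illustration):

create temp table hashdemo as
  select g as id, md5(g::text) as k from generate_series(1,200000) g;
set work_mem = '1MB';
explain analyze select count(*) from hashdemo a join hashdemo b using (k);
-- the Hash node line reports something like:
--   Buckets: ...  Batches: ...  Memory Usage: ...
set work_mem = '64MB';
explain analyze select count(*) from hashdemo a join hashdemo b using (k);
reset work_mem;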

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-08-02 Thread Tom Lane
Peter Hussey pe...@labkey.com writes:
 My questions are still
 1)  Does the planner have any component of cost calculations based on the
 size of work_mem,

Sure.

 and if so why do those calculations  seem to have so
 little effect here?

Since you haven't provided sufficient information to let someone else
reproduce what you're seeing, it's pretty hard to say.  It might have
something to do with the particularly wide join key values you're using,
but that's mere speculation based on the one tidbit you provided.  There
might be some other effect altogether that's making it do the wrong thing.

 2) Why is the setting of work_mem something left to the admin and/or
 developer?

Because we're not smart enough to find a way to avoid that.

regards, tom lane



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-07-30 Thread Tom Lane
Peter Hussey pe...@labkey.com writes:
 Using the default of 1MB work_mem, the planner chooses a hash join plan :
 Hash Left Join  (cost=252641.82..11847353.87 rows=971572 width=111) (actual
 time=124196.670..280461.604 rows=968080 loops=1)
 ...
 For the same default 1MB work_mem, a nested loop plan is better
 Nested Loop Left Join  (cost=8.27..15275401.19 rows=971572 width=111)
 (actual time=145.015..189957.023 rows=968080 loops=1)
 ...

Hm.  A nestloop with nearly a million rows on the outside is pretty
scary.  The fact that you aren't unhappy with that version of the plan,
rather than the hash, indicates that the object table must be 
fully cached in memory, otherwise the repeated indexscans would be a
lot slower than this:

   ->  Index Scan using uq_object on object obj  (cost=0.00..3.51 rows=1
       width=95) (actual time=0.168..0.170 rows=1 loops=968080)
         Index Cond: ((sd.lsid)::text = (obj.objecturi)::text)

My take on it is that the estimate of the hash plan's cost isn't bad;
what's bad is that the planner is mistakenly estimating the nestloop as
being worse.  What you need to do is adjust the planner's cost
parameters so that it has a better idea of the true cost of repeated
index probes in your environment.  Crank up effective_cache_size if
you didn't already, and experiment with lowering random_page_cost.
See the list archives for more discussion of these parameters.

regards, tom lane



[PERFORM] Questions on query planner, join types, and work_mem

2010-07-27 Thread Peter Hussey
I have spent the last couple of weeks digging into a Postgres performance
problem that ultimately boiled down to this:  the planner was choosing to
use hash joins on a set of join keys that were much larger than the
configured work_mem.  We found we could make the  performance much better by
either
1) increasing work_mem to 500MB or more, or
2) forcing the planner to choose index-backed nested loops by turning off
hash and merge joins as well as bitmap and sequential scans.

Now we are trying to decide which of these paths to choose, and asking why
the planner doesn't handle this for us.
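
For reference, option 2 above amounts to session settings along these
lines (they only discourage those plan types and are easily undone):

set enable_hashjoin = off;
set enable_mergejoin = off;
set enable_bitmapscan = off;
set enable_seqscan = off;
-- run or EXPLAIN the snapshot query, then
reset enable_hashjoin; reset enable_mergejoin;
reset enable_bitmapscan; reset enable_seqscan;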

Background:  LabKey builds an open source platform for biomedical research
data.  The platform consists of a tomcat web application and a relational
database.  We support two databases, Postgres and SQL Server.  We started
with SQL Server because we were very familiar with it.  Two of our technical
team came from the SQL Server development team.  We chose Postgres because
we assessed that it was the open source database most likely to be able to
handle our application  requirements for capacity and complex, nested,
generated SQL handling.  Postgres is now the default database for our
platform and most of our key customers use it.  In general we've been very
satisfied with Postgres' performance and compatibility, but our customers
are starting to hit situations where we really need to be able to understand
why a particular operation is slow.  We are currently recommending version
8.4 and using that ourselves.

The core of the problem query was

SELECT * INTO snapshot_table FROM
  (SELECT ... FROM  tableA A LEFT  OUTER JOIN tableB B ON (A.lsid = B.lsid)
and A.datasetid = ? )  query1

The join column, lsid, is a poor choice for a join column as it is a long
varchar value (avg length 101 characters) that only becomes unique way out
on the right-hand side.  But we are stuck with this choice.  I can post the
SQL query and table definitions if it will help, but changes to either of
those would be risky and difficult, whereas setting the work_mem value or
forcing nested loop joins is less risky.

The Performance curve looks something like this

Join Type    work_mem(MB)   time to populate snapshot (min)
___________________________________________________________
Hash               50           85
Hash              200           38
Hash              400           21
Hash              500           12
Hash             1000           12
___________________________________________________________
NestedLoop         50           15
NestedLoop        200           11
NestedLoop        400           11
NestedLoop        500           10
NestedLoop       1000           10


Table A contains about 3.5 million rows, and table B contains about 4.4
million rows.  By looking at the EXPLAIN ANALYZE reports I concluded that
the planner seemed to be accurately determining the approximate number of
rows returned on each side of the join node.  I also noticed that at the
work_mem = 50 test, the hash join query execution was using over a GB of
space in the pgsql_tmp, space that grew and shrank slowly over the course of
the test.

Now for the questions:
1)  If we tell the customer to set his work_mem value to 500MB or 1GB in
postgresql.conf, what problems might they see?  The documentation and the
guidelines we received from Rupinder Singh in support suggest a much lower
value, e.g. a max work_mem of 10MB.  Other documentation such as the Guide
to Posting Slow Query Questions suggests at least testing up to 1GB.  What
is a reasonable maximum to configure for all connections?

2) How is work_mem used by a query execution?  For example, does each hash
table in an execution get allocated a full work_mem's worth of memory ?   Is
this memory released when the query is finished, or does it stay attached to
the connection or some other object?

3) is there a reason why the planner doesn't seem to recognize the condition
when the hash table won't fit in the current work_mem, and choose a
low-memory plan instead?

Excuse the long-winded post; I was trying to give the facts and nothing but
the facts.

Thanks,
Peter Hussey
LabKey Software


Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-07-27 Thread Andres Freund
Hi,

On Tue, Jul 27, 2010 at 04:08:16PM -0700, Peter Hussey wrote:
 Now for the questions:
 1)  If we tell the customer to set his work_mem value to 500MB or 1GB in
 postgres.config, what problems might they see?  the documentation and the
 guidelines we received from Rupinder Singh in support suggest a much lower
 value, e.g. a max work_mem of 10MB.  Other documentation such as the Guide
 to Posting Slow Query Questions suggest at least testing up to 1GB.  What
 is a reasonable maximum to configure for all connnections?
Well, that depends on the amount of expected concurrency and available
memory. Obviously you can set it much higher in an OLAPish, low
concurrency setting than in an OLTP environment.

That setting is quite complex to estimate, in my opinion. For one
thing, the actual usage depends on the complexity of the queries; for
another, to be halfway safe you have to use avail_mem/(max_connections
* max_nodes_of_most_complex_query), which is often a very pessimistic
and unusably low estimate.
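
As a purely hypothetical example of how pessimistic that gets: with 8 GB
left over after shared_buffers and the OS, max_connections = 100, and up
to 4 memory-hungry nodes per query, the formula gives
8192 MB / (100 * 4), i.e. about 20 MB per sort or hash, nowhere near the
500 MB that helps the hash join in this thread.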

 2) How is work_mem used by a query execution?  For example, does each hash
 table in an execution get allocated a full work_mem's worth of memory ?   Is
 this memory released when the query is finished, or does it stay attached to
 the connection or some other object?
Each node of the query can use one work_mem's worth of data (sometimes a
bit more). The memory is released after the query finishes (or
possibly earlier, depending on the structure of the query).
The specific allocation pattern and implementation details (of malloc)
influence how and when that memory is actually returned to the OS.

 3) is there a reason why the planner doesn't seem to recognize the condition
 when the hash table won't fit in the current work_mem, and choose a
 low-memory plan instead?
Hard to say without more information. Bad estimates, maybe? Best to show
your query plan (EXPLAIN ANALYZE), the table definitions and some
details about the hardware (e.g. whether it has 1GB of memory or
256GB).

Andres



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-07-27 Thread Tom Lane
Peter Hussey pe...@labkey.com writes:
 I have spent the last couple of weeks digging into a Postgres performance
 problem that ultimately boiled down to this:  the planner was choosing to
 use hash joins on a set of join keys that were much larger than the
 configured work_mem.

What Postgres version is this, exactly?  (8.4 is not the answer I want.)

 the join column, lsid, is a poor choice for a join column as it is a long
 varchar value (avg length 101 characters) that us only gets unique way out
 on the right hand side.

Hm, but it is unique eventually?  It's not necessarily bad for hashing
as long as that's so.

 1)  If we tell the customer to set his work_mem value to 500MB or 1GB in
 postgres.config, what problems might they see?

That would almost certainly be disastrous.  If you have to follow the
hack-work_mem path, I'd suggest increasing it locally in the session
executing the problem query, and only for the duration of that query.
Use SET, or even SET LOCAL.
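
A minimal sketch of that pattern (the value is only an example):

BEGIN;
SET LOCAL work_mem = '512MB';  -- reverts automatically at COMMIT/ROLLBACK
-- run the problem query here
COMMIT;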

 2) How is work_mem used by a query execution?

Well, the issue you're hitting is that the executor is dividing the
query into batches to keep the size of the in-memory hash table below
work_mem.  The planner should expect that and estimate the cost of
the hash technique appropriately, but seemingly it's failing to do so.
Since you didn't provide EXPLAIN ANALYZE output, though, it's hard
to be sure.

 3) is there a reason why the planner doesn't seem to recognize the condition
 when the hash table won't fit in the current work_mem, and choose a
 low-memory plan instead?

That's the question, all right.  I wonder if it's got something to do
with the wide-varchar nature of the join key ... but again that's just
speculation with no facts.  Please show us EXPLAIN ANALYZE results
for the hash plan with both small and large work_mem, as well as for
the nestloop plan.

regards, tom lane



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-07-27 Thread Alvaro Herrera
Excerpts from Tom Lane's message of mar jul 27 20:05:02 -0400 2010:
 Peter Hussey pe...@labkey.com writes:

  2) How is work_mem used by a query execution?
 
 Well, the issue you're hitting is that the executor is dividing the
 query into batches to keep the size of the in-memory hash table below
 work_mem.  The planner should expect that and estimate the cost of
 the hash technique appropriately, but seemingly it's failing to do so.
 Since you didn't provide EXPLAIN ANALYZE output, though, it's hard
 to be sure.

Hmm, I wasn't aware that hash joins worked this way wrt work_mem.  Is
this visible in the explain output?  If it's something subtle (like an
increased total cost), may I suggest that it'd be a good idea to make it
explicit somehow in the machine-readable outputs?



Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-07-27 Thread Jayadevan M
Hello,
 the join column, lsid, is a poor choice for a join column as it is a
 long varchar value (avg length 101 characters) that us only gets 
 unique way out on the right hand side.
Would a join on a substring of the 'way out on the right hand side' (did you
mean 'rightmost characters' or 'only when we take almost all the 101
characters'?) together with a function-based index help?
Regards,
Jayadevan







Re: [PERFORM] Questions on query planner, join types, and work_mem

2010-07-27 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Tom Lane's message of mar jul 27 20:05:02 -0400 2010:
 Well, the issue you're hitting is that the executor is dividing the
 query into batches to keep the size of the in-memory hash table below
 work_mem.  The planner should expect that and estimate the cost of
 the hash technique appropriately, but seemingly it's failing to do so.

 Hmm, I wasn't aware that hash joins worked this way wrt work_mem.  Is
 this visible in the explain output?

As of 9.0, any significant difference between Hash Batches and
Original Hash Batches would be a cue that the planner blew the
estimate.  For Peter's problem, we're just going to have to look
to see if the estimated cost changes in a sane way between the
small-work_mem and large-work_mem cases.

regards, tom lane
