Re: [PERFORM] Slow Postgresql server

2007-04-12 Thread Ron

1= RAID 1 improves data =integrity=, not IO performance.
Your HD IO performance is essentially that of 1 160GB HD of whatever 
performance one of those HDs has.
(What kind of HDs are they anyway?  For instance, 7200rpm 160GB HDs 
are not particularly high performance.)

BEST case is streaming IO involving no seeks = ~50 MBps.
You can't get even that as the back end of a website.

2= 1GB of RAM is -small- for a DB server.

You need to buy RAM and HD.

Boost the RAM to 4GB, change pg config parameters appropriately and 
see how much it helps.
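
As a very rough starting point for a 4GB box on 8.2 (illustrative
numbers only, not tuned values -- adjust from here):

shared_buffers = 512MB          # 8.2 accepts memory units
work_mem = 8MB                  # per sort/hash, so keep it modest
maintenance_work_mem = 256MB    # speeds VACUUM and index builds
effective_cache_size = 2GB      # rough guess at what the OS will cache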

Non-ECC RAM is currently running ~$60-$75 per GB for 1 or 2 GB sticks.
ECC RAM prices will be ~1.5x - 2x that, $120 - $150 per GB for 1 or 
2 GB sticks.
(do !not! buy 4GB sticks unless you have a large budget.  Their price 
per GB is still too high.)


If adding RAM helps as much as I suspect it will, find out how big 
the hot section of your DB is and see if you can buy enough RAM to 
make it RAM resident.
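
A quick way to put a ceiling on that from the database side (these
functions exist as of 8.1; the hot set may of course be much smaller
than the whole DB):

SELECT pg_size_pretty(pg_database_size(current_database()));

-- where the bytes live: the ten largest tables and indexes
SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
ORDER BY pg_relation_size(oid) DESC
LIMIT 10;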

If you can do this, it will result in the lowest long-term DB maintenance burden.

If you can't do that for whatever reason, the next step is to improve 
your HD subsystem.
Cheap RAID cards with enough battery-backed (BB) cache to allow writes 
to be coalesced into larger streams (reducing seeks) will help, but 
fundamentally you will need more HDs.


RAID 5 is a reasonable option for most website DB workloads.
To hit the 300MBps speeds attainable by the cheap RAID cards, you are 
going to need at least 7 HDs (6 HDs * 50MBps ASTR = 300MBps ASTR, plus 
the equivalent of 1 HD gets used for the R in RAID).  A minimum of 8 
HDs is needed for this performance if you want to use RAID 6.

Most tower (not mini-tower) cases can hold this many HDs internally.
Price per MBps of HD is all over the map.  The simplest (but not 
necessarily best) option is to buy more of the 160GB HDs you already have.
Optimizing the money spent when buying HDs for a RAID set is a bit 
more complicated than doing so for RAM.  Lots of context-dependent 
things affect the final decision.


I see you are mailing from Brandeis.  I'm local.  Drop me some 
private email at the address I'm posting from if you want and I'll 
send you further contact info so we can talk in more detail.


Cheers,
Ron Peacetree


At 06:02 PM 4/11/2007, Jason Lustig wrote:

Hello all,

My website has been having issues with our new Linux/PostgreSQL
server being somewhat slow. I have done tests using Apache Benchmark
and for pages that do not connect to Postgres, the speeds are much
faster (334 requests/second v. 1-2 requests/second), so it seems that
Postgres is what's causing the problem and not Apache. I did some
reserach, and it seems that the bottleneck is in fact the hard
drives! Here's an excerpt from vmstat:

procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
 1  1    140  24780 166636 575144    0    0     0  3900 1462  3299  1  4 49 48  0
 0  1    140  24780 166636 575144    0    0     0  3828 1455  3391  0  4 48 48  0
 1  1    140  24780 166636 575144    0    0     0  2440  960  2033  0  3 48 48  0
 0  1    140  24780 166636 575144    0    0     0  2552 1001  2131  0  2 50 49  0
 0  1    140  24780 166636 575144    0    0     0  3188 1233  2755  0  3 49 48  0
 0  1    140  24780 166636 575144    0    0     0  2048  868  1812  0  2 49 49  0
 0  1    140  24780 166636 575144    0    0     0  2720 1094  2386  0  3 49 49  0

As you can see, almost 50% of the CPU is waiting on I/O. This doesn't
seem like it should be happening, however, since we are using a RAID
1 setup (160+160). We have 1GB ram, and have upped shared_buffers to
13000 and work_mem to 8096. What would cause the computer to only use
such a small percentage of the CPU, with more than half of it waiting
on I/O requests?

Thanks a lot
Jason




Re: [PERFORM] Large objetcs performance

2007-04-12 Thread Merlin Moncure

On 4/4/07, Alexandre Vasconcelos [EMAIL PROTECTED] wrote:

We have an application that is supposed to sign documents and store them
somewhere. The file sizes may vary from Kb to Mb. Developers are
arguing about the reasons to store files directly on the operating system
file system or in the database, as large objects. My boss is
considering file system storage, because he is concerned about
integrity and backup/restore corruptions. I'd like to know some reasons
to convince them to store these files in PostgreSQL, including
integrity, and of course, performance. I would like to know the file
system storage disadvantages as well.


This topic actually gets debated about once a month on the lists :-).
Check the archives, but here is a quick summary:

Storing objects on the file system:
* usually indexed on the database for searching
* faster than database (usually)
* more typical usage pattern
* requires extra engineering if you want to store huge numbers of objects
* requires extra engineering to keep your database in sync.  on
postgresql irc someone suggested a clever solution with inotify
* backup can be a pain (even rsync has its limits) -- for really big
systems, look at clustering solutions (drbd for example)
* lots of people will tell you this 'feels' right or wrong -- ignore them :-)
* well traveled path. it can be made to work.

Storing objects in the database:
* slower, but getting faster -- it's mostly cpu bound currently
* get a very recent cpu. core2 xeons appear to be particularly good at this.
* use bytea, not large objects (see the sketch below)
* will punish you if your client interface does not communicate with
the database in binary
* less engineering in the sense you are not maintaining two separate systems
* forget backing up with pg_dump...go right to pitr (maybe slony?)
* 1gb limit per bytea value. be aware of high memory requirements
* you get to work with all your data through a single interface and
administrate one system -- that's the big payoff.
* less well traveled path.  put your R&D cap on and be optimistic but
skeptical.  do some tests.
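
A minimal sketch of the bytea route (table and names invented for
illustration; the point is that the file is just a column, sent and
fetched as a binary parameter):

CREATE TABLE signed_document (
    id        serial PRIMARY KEY,
    filename  text NOT NULL,
    signed_at timestamptz NOT NULL DEFAULT now(),
    content   bytea NOT NULL    -- the file itself; bytea tops out near 1gb
);

-- bind $1 in binary mode from the client to avoid escaping overhead
INSERT INTO signed_document (filename, content) VALUES ('contract.pdf', $1);

SELECT content FROM signed_document WHERE id = 42;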

merlin



Re: [PERFORM] Slow Postgresql server

2007-04-12 Thread Guido Neitzer

On 12.04.2007, at 07:26, Ron wrote:


You need to buy RAM and HD.


Before he does that, wouldn't it be more useful to find out WHY he  
has so much IO?


Have I missed it, or has nobody suggested finding the slow queries?  
(When you have that much IO from them, they might be slow, at least  
with a high shared memory setting.)


So, my first idea is to turn on query logging for queries longer  
than xy milliseconds, explain analyse these queries, and see  
whether there are a lot of seq scans involved, which would explain the  
high IO.
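
Concretely, that could look like this (250ms is an arbitrary cutoff;
first postgresql.conf, then psql, with made-up names in the example
query):

log_min_duration_statement = 250    # log anything slower than 250ms

-- then, for each query that shows up in the log:
EXPLAIN ANALYZE SELECT * FROM some_table WHERE id = 42;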


Just an idea, perhaps I missed that step in that discussion  
somewhere ...


But yes, it might also be, that the server is swapping, that's  
another thing to find out.


cug



Re: [PERFORM] Slow Postgresql server

2007-04-12 Thread Ron

At 10:08 AM 4/12/2007, Guido Neitzer wrote:

On 12.04.2007, at 07:26, Ron wrote:


You need to buy RAM and HD.


Before he does that, wouldn't it be more useful to find out WHY he
has so much IO?


1= Unless I missed something, the OP described pg being used as a 
backend DB for a webserver.


I know the typical IO demands of that scenario better than I sometimes want to.
:-(


2= 1GB of RAM + effectively 1 160GB HD = p*ss poor DB IO support.
~ 1/2 that RAM is going to be used for OS stuff, leaving only ~512MB 
of RAM to be used supporting pg.
That RAID 1 set is effectively 1 HD head that all IO requests are 
going to contend for.
Even if the HD in question is a 15Krpm screamer, that level of HW 
contention has very adverse implications.



Completely agree that at some point the queries need to be examined 
(ditto the table schema, etc), but this system is starting off in a 
Bad Place for its stated purpose IME.
Some minimum stuff is obvious even w/o spending time looking at 
anything beyond the HW config.


Cheers,
Ron Peacetree



Have I missed it, or has nobody suggested finding the slow queries?
(When you have that much IO from them, they might be slow, at least
with a high shared memory setting.)

So, my first idea is to turn on query logging for queries longer
than xy milliseconds, explain analyse these queries, and see
whether there are a lot of seq scans involved, which would explain the
high IO.

Just an idea, perhaps I missed that step in that discussion
somewhere ...

But yes, it might also be, that the server is swapping, that's
another thing to find out.

cug



Re: [PERFORM] Slow Postgresql server

2007-04-12 Thread Guido Neitzer

On 12.04.2007, at 08:59, Ron wrote:

1= Unless I missed something, the OP described pg being used as a  
backend DB for a webserver.


Yep.

I know the typical IO demands of that scenario better than I  
sometimes want to.

:-(


Yep. Same here. ;-)


2= 1GB of RAM + effectively 1 160GB HD = p*ss poor DB IO support.


Absolutely right. Depending a little bit on the DB and WebSite layout  
and on the actual requirements, but yes - it's not really a kick-ass  
machine ...


Completely agree that at some point the queries need to be examined  
(ditto the table schema, etc), but this system is starting off in a  
Bad Place for its stated purpose IME.
Some minimum stuff is obvious even w/o spending time looking at  
anything beyond the HW config.


Depends. As I said - if the whole DB fits into the remaining space,  
and a lot of website backend DBs do, it might just work out. But this  
seems not to be the case - either the site is chewing on seq scans  
all the time which will cause I/O or it is bound by the lack of  
memory and swaps the whole time ... He has to find out.


cug



Re: [PERFORM] Slow Postgresql server

2007-04-12 Thread Jeff Frost

On Thu, 12 Apr 2007, Jason Lustig wrote:


0 -- BM starts here
 r  b  swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
10  0   180 700436  16420 917400    0    0     0   176  278  2923 59 41  0  0  0
11  0   180 696736  16420 917400    0    0     0     0  254  2904 57 43  0  0  0
12  0   180 691272  16420 917400    0    0     0     0  255  3043 60 39  1  0  0
 9  0   180 690396  16420 917400    0    0     0     0  254  3078 63 36  2  0  0


Obviously, I've turned off logging now but I'd like to get it running again 
(without bogging down the server) so that I can profile the system and find 
out which queries I need to optimize. My logging settings (with unnecessary 
comments taken out) were:


So what did you get in the logs when you had logging turned on?  If you have 
the statement logging, perhaps it's worth running through pgfouine to generate 
a report.




log_destination = 'syslog'# Valid values are combinations of
redirect_stderr = off   # Enable capturing of stderr into log
log_min_duration_statement =  0  # -1 is disabled, 0 logs all 
statements

silent_mode = on# DO NOT USE without syslog or
log_duration = off
log_line_prefix = 'user=%u,db=%d'   # Special values:
log_statement = 'none'  # none, ddl, mod, all



Perhaps you just want to log slow queries > 100ms?  But since you don't seem 
to know what queries you're running on each web page, I'd suggest you just 
turn on the following and run your benchmark against it, then turn it back 
off:


log_duration = on
log_statement = 'all'

Then go grab pgfouine and run the report against the logs to see what queries 
are chewing up all your time.


So you know, we're using Postgres 8.2.3. The database currently is pretty 
small (we're just running a testing database right now with a few megabytes 
of data). No doubt some of our queries are slow, but I was concerned because 
no matter how slow the queries were (at most the worst were taking a couple 
of msecs anyway), I was getting ridiculously slow responses from the server. 
Outside of logging, our only other non-default postgresql.conf items are:


shared_buffers = 13000  # min 128kB or max_connections*16kB
work_mem = 8096 # min 64kB

In terms of the server itself, I think that it uses software raid. How can I 
tell? Our hosting company set it up with the server so I guess I could ask 
them, but is there a program I can run which will tell me the information? I 
also ran bonnie++ and got this output:


Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pgtest        2000M 29277  67 33819  15 15446   4 35144  62 48887   5 152.7   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 17886  77 +++++ +++ +++++ +++ 23258  99 +++++ +++ +++++ +++


So I'm getting 33MB and 48MB write/read respectively. Is this slow? Is there 
anything I should be doing to optimize our RAID configuration?




It's not fast, but at least it's about the same speed as an average IDE drive 
from this era.  More disks would help, but since you indicate the DB fits in 
RAM with plenty of room to spare, how about you update your 
effective_cache_size to something reasonable.  You can use the output of the 
'free' command and take the cache number and divide by 8 to get a reasonable 
value on linux.  Then turn on logging and run your benchmark.  After that, run 
a pgfouine report against the log and post us the explain analyze from your 
slow queries.
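
For instance, with the ~917MB 'cached' figure visible in the vmstat
output above ('free' reports the same number in kB, and the setting
defaults to 8kB pages):

# 917400 kB cached / 8 kB per page = ~114000 pages
effective_cache_size = 114675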


And if Ron is indeed local, it might be worthwhile to contact him.  Someone 
onsite would likely get this taken care of much faster than we can on the 
mailing list.


--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954



Re: [PERFORM] Question about memory allocations

2007-04-12 Thread Steve



Steve [EMAIL PROTECTED] writes:

- What is temp_buffers used for exactly?


Temporary tables.  Pages of temp tables belonging to your own backend
don't ever get loaded into the main shared-buffers arena, they are read
into backend-local memory.  temp_buffers is the max amount (per backend)
of local memory to use for this purpose.
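
(For concreteness, a toy illustration of the explicit case -- names
invented; note that temp_buffers can only be changed before the
session's first temp-table access:

SET temp_buffers = 8192;    -- 8192 x 8kB pages = 64MB, per backend

CREATE TEMPORARY TABLE recent_orders AS
SELECT * FROM orders WHERE order_date > now() - interval '1 day';
)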


	Are these only tables explicitly stated as 'temporary' (which as I 
recall is a create table option), or are temporary tables used for other 
things too, like, say, nested queries or other lil in-the-background things?



- Any idea if this is a smart configuration for this machine?


Um ... you didn't mention which PG version?



	The latest and greatest stable as downloaded a couple days ago. 
8.2.3. :)



Thanks for the info!


Steve



Re: [PERFORM] Slow Postgresql server

2007-04-12 Thread Scott Marlowe
On Thu, 2007-04-12 at 10:19, Guido Neitzer wrote:
 On 12.04.2007, at 08:59, Ron wrote:

 
 Depends. As I said - if the whole DB fits into the remaining space,  
 and a lot of website backend DBs do, it might just work out. But this  
 seems not to be the case - either the site is chewing on seq scans  
 all the time which will cause I/O or it is bound by the lack of  
 memory and swaps the whole time ... He has to find out.

It could also be something as simple as a very bloated data store.

I'd ask the user what 'vacuum verbose' says at the end.



Re: [PERFORM] Slow Postgresql server

2007-04-12 Thread Jeff Frost

On Thu, 12 Apr 2007, Scott Marlowe wrote:


On Thu, 2007-04-12 at 10:19, Guido Neitzer wrote:

On 12.04.2007, at 08:59, Ron wrote:




Depends. As I said - if the whole DB fits into the remaining space,
and a lot of website backend DBs do, it might just work out. But this
seems not to be the case - either the site is chewing on seq scans
all the time which will cause I/O or it is bound by the lack of
memory and swaps the whole time ... He has to find out.


It could also be something as simple as a very bloated data store.

I'd ask the user what 'vacuum verbose' says at the end.


You know, I should answer emails at night...we didn't ask when the data 
was last vacuumed or analyzed, and I believe he indicated that the only 
non-default values were memory related, so no autovacuum running.


Jason,

Before you go any further, run 'vacuum analyze;' on your DB if you're not 
doing this with regularity and strongly consider enabling autovacuum.
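
On 8.2 that's more than one switch -- autovacuum needs the row-level
stats collector (a conf sketch, not a complete config):

# postgresql.conf
stats_start_collector = on  # prerequisite for autovacuum
stats_row_level = on        # prerequisite for autovacuum
autovacuum = on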


--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954



Re: [PERFORM] Equivalents in PostgreSQL of MySQL's ENGINE=MEMORY MAX_ROWS=1000

2007-04-12 Thread Robert Treat
On Wednesday 04 April 2007 07:51, Arnau wrote:
 Hi Ansgar ,

  On 2007-04-04 Arnau wrote:
  Josh Berkus wrote:
  Is there anything similar in PostgreSQL? The idea behind this is how
  I can do in PostgreSQL to have tables where I can query on them very
  often something like every few seconds and get results very fast
  without overloading the postmaster.
 
  If you're only querying the tables every few seconds, then you don't
  really need to worry about performance.
 
  Well, the idea behind this is to have events tables, and a monitoring
  system polls that table every few seconds.  I'd like to have a kind of
  FIFO stack. From the events producer point of view he'll be pushing
  rows into that table, when it's filled the oldest one will be removed
  to leave room to the newest one. From the consumer point of view
  he'll read all the contents of that table.
 
  So I'll not only be querying the tables, I'll also need to modify those
  tables.
 
  Ummm... this may be a dumb question, but why are you trying to implement
  something like a FIFO with an RDBMS in the first place? Wouldn't it be
  much easier to implement something like that as a separate program or
  script?

 Well, the idea is to have a table with a maximum number of rows. As the
 number of queries over this table will be very high, I'd like to keep it
 as small as possible and without indexes and so on that could make the
 update slower.

 Maybe it's the moment to change my question, is there any trick to get a
 table that can be modified/queried very fast and with the minimum of
 overhead? This table will have several queries every second and I'd like
 to do this as fast as possible


If you're wedded to the FIFO idea, I'd suggest reading this:
http://people.planetpostgresql.org/greg/index.php?/archives/89-Implementing-a-queue-in-SQL-Postgres-version.html
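
For flavor, one simple variant (not necessarily the article's approach;
names invented): pre-create N slots and overwrite the oldest, so the row
count never grows. Updates still leave dead tuples behind, so regular
vacuuming is not optional:

CREATE TABLE event_fifo (
    slot    integer PRIMARY KEY,    -- 0..999, fixed at setup time
    payload text,
    written timestamptz
);
INSERT INTO event_fifo (slot) SELECT generate_series(0, 999);

-- producer: overwrite the least recently written slot
UPDATE event_fifo
   SET payload = 'new event', written = now()
 WHERE slot = (SELECT slot FROM event_fifo
               ORDER BY COALESCE(written, '-infinity') LIMIT 1);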

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL



Re: [PERFORM] Slow Postgresql server

2007-04-12 Thread Carlos Moreno

Jeff Frost wrote:


You know, I should answer emails at night...


Indeed you shouldN'T   ;-)

Carlos
--




[PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

Hey there;

On a Postgres 8.2.3 server, I've got a query that is running very slow in 
some cases.  With some work, I've determined the 'slow part' of the query. 
:)  This is a query on a table with like 10 million rows or something like 
that.  encounter_id is an integer and receipt is of type 'date'.


This query runs really slow [minutes] (query and explain below):

select extract(epoch from ds.receipt) from detail_summary ds where
ds.receipt >= '1998-12-30 0:0:0' and
ds.encounter_id in
(8813186,8813187,8813188,8813189,8813190,8813191,8813192,
8813193,8813194,8813195,8813196,8813197,8813198,8813199,
8813200,8813201,8813202,8813203,8813204,8813205,8813206,
8813207,8813208,8813209,8813210,8813211,8813212,8813213,
8813214,8813215,8813216,8813217,8813218,8813219,8813220,
8813221,8813222,8813223,8813224,8813225,8813226,8813227,
8813228,8813229,8813230,8813231,8813232,8813233,8813234,
8813235,8813236,8813237,8813238,8813239,8813240,8813241,
8813242,8813243,8813244,8813245,8813246,8813247,8813248,
8813249,8813250,8813251,8813252,8813253,8813254,8813255,
8813256,8813257,8813258,8813259,8813260,8813261,8813262,
8813263,8813264,8813265,8813266,8813267,8813268,8813269,
8813270,8813271,8813272,8813273,8813274,8813275,8813276,
8813277,8813278,8813279,8813280,8813281,8813282,8813283,
8813284,8815534)

Results in the 'explain' :

 Seq Scan on detail_summary ds  (cost=0.00..1902749.83 rows=9962 width=4)
   Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))

(2 rows)


Turning enable_seqscan to off results in a slightly more interesting 
explain, but an equally slow query.



HOWEVER!  The simple removal of the receipt date paramater results in a 
fast query, as such:


select extract(epoch from ds.receipt) from detail_summary ds where
ds.encounter_id in
(8813186,8813187,8813188,8813189,8813190,8813191,8813192,
8813193,8813194,8813195,8813196,8813197,8813198,8813199,
8813200,8813201,8813202,8813203,8813204,8813205,8813206,
8813207,8813208,8813209,8813210,8813211,8813212,8813213,
8813214,8813215,8813216,8813217,8813218,8813219,8813220,
8813221,8813222,8813223,8813224,8813225,8813226,8813227,
8813228,8813229,8813230,8813231,8813232,8813233,8813234,
8813235,8813236,8813237,8813238,8813239,8813240,8813241,
8813242,8813243,8813244,8813245,8813246,8813247,8813248,
8813249,8813250,8813251,8813252,8813253,8813254,8813255,
8813256,8813257,8813258,8813259,8813260,8813261,8813262,
8813263,8813264,8813265,8813266,8813267,8813268,8813269,
8813270,8813271,8813272,8813273,8813274,8813275,8813276,
8813277,8813278,8813279,8813280,8813281,8813282,8813283,
8813284 ,8815534)

This query returns instantly and explains as:

 Bitmap Heap Scan on detail_summary ds  (cost=161.00..14819.81 rows=9963 
width=4)
   Recheck Cond: (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))
   ->  Bitmap Index Scan on detail_summary_encounter_id_idx 
(cost=0.00..160.75 rows=9963 width=0)
 Index Cond: (encounter_id = ANY 

Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Tom Lane
Steve [EMAIL PROTECTED] writes:
 On a Postgres 8.2.3 server, I've got a query that is running very slow in 
 some cases.

Could we see the exact definition of that table and its indexes?
It looks like the planner is missing the bitmap scan for some reason,
but I've not seen a case like that before.

Also, I assume the restriction on receipt date is very nonselective?
It doesn't seem to have changed the estimated rowcount much.

regards, tom lane



Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

Could we see the exact definition of that table and its indexes?
It looks like the planner is missing the bitmap scan for some reason,
but I've not seen a case like that before.

Also, I assume the restriction on receipt date is very nonselective?
It doesn't seem to have changed the estimated rowcount much.



	This is true -- This particular receipt date is actually quite 
meaningless.  It's equivalent to saying 'all receipt dates'.  I don't 
think there's even any data that goes back before 2005.


Here's the table and its indexes.  Before looking, a note; there's 
several 'revop' indexes, this is for sorting.  The customer insisted on, 
frankly, meaninglessly complicated sorts.  I don't think any of that 
matters for our purposes here though :)


        Column         |          Type          |                     Modifiers
-----------------------+------------------------+---------------------------------------------------
 detailsummary_id      | integer                | not null default nextval(('detailsummary_id_seq'::text)::regclass)

 detailgroup_id| integer|
 receipt   | date   |
 batchnum  | integer|
 encounternum  | integer|
 procedureseq  | integer|
 procedurecode | character varying(5)   |
 wrong_procedurecode   | character varying(5)   |
 batch_id  | integer|
 encounter_id  | integer|
 procedure_id  | integer|
 carrier_id| integer|
 product_line  | integer|
 provider_id   | integer|
 member_num| character varying(20)  |
 wrong_member_num  | character varying(20)  |
 member_name   | character varying(40)  |
 patient_control   | character varying(20)  |
 rendering_prov_id | character varying(15)  |
 rendering_prov_name   | character varying(30)  |
 referring_prov_id | character varying(15)  |
 referring_prov_name   | character varying(30)  |
 servicedate   | date   |
 wrong_servicedate | date   |
 diagnosis_codes   | character varying(5)[] |
 wrong_diagnosis_codes | character varying(5)[] |
 ffs_charge| double precision   |
 export_date   | date   |
 hedis_date| date   |
 raps_date | date   |
 diagnosis_pointers| character(1)[] |
 modifiers | character(2)[] |
 units | double precision   |
 pos   | character(2)   |
 isduplicate   | boolean|
 duplicate_id  | integer|
 encounter_corrected   | boolean|
 procedure_corrected   | boolean|
 numerrors | integer|
 encerrors_codes   | integer[]  |
 procerror_code| integer|
 error_servicedate | text   |
 e_duplicate_id| integer|
 ecode_counts  | integer[]  |
 p_record_status   | integer|
 e_record_status   | integer|
 e_delete_date | date   |
 p_delete_date | date   |
 b_record_status   | integer|
 b_confirmation| character varying(20)  |
 b_carrier_cobol_id| character varying(16)  |
 b_provider_cobol_id   | character varying(20)  |
 b_provider_tax_id | character varying(16)  |
 b_carrier_name| character varying(50)  |
 b_provider_name   | character varying(50)  |
 b_submitter_file_id   | character varying(40)  |
 e_hist_carrier_id | integer|
 p_hist_carrier_id | integer|
 e_duplicate_id_orig   | character varying(25)  |
 p_duplicate_id_orig   | character varying(25)  |
 num_procerrors| integer|
 num_encerrors | integer|
 export_id | integer|
 raps_id   | integer|
 hedis_id  | integer|
Indexes:
detail_summary_b_record_status_idx btree (b_record_status)
detail_summary_batch_id_idx btree (batch_id)
detail_summary_batchnum_idx btree (batchnum)
detail_summary_carrier_id_idx btree (carrier_id)
detail_summary_duplicate_id_idx btree (duplicate_id)
detail_summary_e_record_status_idx btree (e_record_status)
detail_summary_encounter_id_idx btree (encounter_id)
detail_summary_encounternum_idx btree (encounternum)
detail_summary_export_date_idx btree (export_date)
detail_summary_hedis_date_idx btree (hedis_date)
detail_summary_member_name_idx btree (member_name)
detail_summary_member_num_idx btree (member_num)
detail_summary_p_record_status_idx btree (p_record_status)
  

Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Scott Marlowe
On Thu, 2007-04-12 at 16:03, Steve wrote:
 Hey there;
 
 On a Postgres 8.2.3 server, I've got a query that is running very slow in 
 some cases.  With some work, I've determined the 'slow part' of the query. 
 :)  This is a query on a table with like 10 million rows or something like 
 that.  encounter_id is an integer and receipt is of type 'date'.

SNIP

   Seq Scan on detail_summary ds  (cost=0.00..1902749.83 rows=9962 width=4)
  Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY 
 ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
 (2 rows)

How accurate is the row estimate made by the planner?  (explain analyze
to be sure)



Re: [PERFORM] Slow Postgresql server

2007-04-12 Thread Jason Lustig

Hi all,

Wow! That's a lot to respond to. Let me go through some of the  
ideas... First, I just turned on autovacuum, I forgot to do that. I'm  
not seeing a major impact though. Also, I know that it's not optimal  
for a dedicated server. It's not just for postgres, it's also got our  
apache server on it. We're just getting started and didn't want to  
make the major investment right now in getting the most expensive  
server we can get. Within the next year, as our traffic grows, we  
will most likely upgrade, but for now when we're in the beginning  
phases of our project, we're going to work with this server.


In terms of RAID not helping speed-wise (only making an impact in  
data integrity) - I was under the impression that even a mirrored  
disk set improves speed, because read requests can be sent to either  
of the disk controllers. Is this incorrect?


I turned on logging again, only logging queries > 5ms, and it caused  
the same problems. I think it might be an issue within the OS's  
logging facilities, since it's going through stderr.


Some of the queries are definitely making an impact on the speed. We  
are constantly trying to improve performance, and part of that is  
reassessing our indexes and denormalizing data where it would help.  
We're also doing work with memcached to cache the results of some of  
the more expensive operations.


Thanks for all your help guys - it's really fantastic to see the  
community here! I've got a lot of database experience (mostly with ms  
sql and mysql) but this is my first time doing serious work with  
postgres and it's really a great system with great people too.


Jason

On Apr 12, 2007, at 11:35 AM, Jeff Frost wrote:


On Thu, 12 Apr 2007, Jason Lustig wrote:


0 -- BM starts here
 r  b  swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa st
10  0   180 700436  16420 917400    0    0     0   176  278  2923 59 41  0  0  0
11  0   180 696736  16420 917400    0    0     0     0  254  2904 57 43  0  0  0
12  0   180 691272  16420 917400    0    0     0     0  255  3043 60 39  1  0  0
 9  0   180 690396  16420 917400    0    0     0     0  254  3078 63 36  2  0  0


Obviously, I've turned off logging now but I'd like to get it  
running again (without bogging down the server) so that I can  
profile the system and find out which queries I need to optimize.  
My logging settings (with unnecessary comments taken out) were:


So what did you get in the logs when you had logging turned on?  If  
you have the statement logging, perhaps it's worth running through  
pgfouine to generate a report.




log_destination = 'syslog'# Valid values are combinations of
redirect_stderr = off   # Enable capturing of stderr into log
log_min_duration_statement =  0  # -1 is disabled, 0 logs all statements
silent_mode = on# DO NOT USE without syslog or

log_duration = off
log_line_prefix = 'user=%u,db=%d'   # Special values:

log_statement = 'none'  # none, ddl, mod, all



Perhaps you just want to log slow queries > 100ms?  But since you  
don't seem to know what queries you're running on each web page,  
I'd suggest you just turn on the following and run your benchmark  
against it, then turn it back off:


log_duration = on
log_statement = 'all'

Then go grab pgfouine and run the report against the logs to see  
what queries are chewing up all your time.


So you know, we're using Postgres 8.2.3. The database currently is  
pretty small (we're just running a testing database right now with  
a few megabytes of data). No doubt some of our queries are slow,  
but I was concerned because no matter how slow the queries were  
(at most the worst were taking a couple of msecs anyway), I was  
getting ridiculously slow responses from the server. Outside of  
logging, our only other non-default postgresql.conf items are:


shared_buffers = 13000  # min 128kB or max_connections*16kB

work_mem = 8096 # min 64kB

In terms of the server itself, I think that it uses software raid.  
How can I tell? Our hosting company set it up with the server so I  
guess I could ask them, but is there a program I can run which  
will tell me the information? I also ran bonnie++ and got this  
output:


Version  1.03       ------Sequential Output------ --Sequential Input- --Random-
                    -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
pgtest        2000M 29277  67 33819  15 15446   4 35144  62 48887   5 152.7   0
                    ------Sequential Create------ --------Random Create--------
                    -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 17886  77 +++++ +++ +++++ +++ 23258  99 +++++ +++ +++++ +++


So I'm getting 33MB and 48MB write/read respectively. Is this  

Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve


Oy vey ... I hope this is a read-mostly table, because having that many
indexes has got to be killing your insert/update performance.


	Hahaha yeah these are read-only tables.  Nightly inserts/updates. 
Takes a few hours, depending on how many records (between 4 and 10 
usually).  But during the day, while querying, read only.



I see that some of the revop indexes might be considered relevant to
this query, so how exactly have you got those opclasses defined?
There's built-in support for reverse sort as of CVS HEAD, but in
existing releases you must have cobbled something together, and I wonder
if that could be a contributing factor ...


Here's the revops (the c functions are at the bottom):

CREATE FUNCTION ddd_date_revcmp(date, date) RETURNS integer
AS '/usr/local/pgsql/contrib/cmplib.so', 'ddd_date_revcmp'
LANGUAGE c STRICT;

CREATE FUNCTION ddd_int_revcmp(integer, integer) RETURNS integer
AS '/usr/local/pgsql/contrib/cmplib.so', 'ddd_int_revcmp'
LANGUAGE c STRICT;

CREATE FUNCTION ddd_text_revcmp(text, text) RETURNS integer
AS '/usr/local/pgsql/contrib/cmplib.so', 'ddd_text_revcmp'
LANGUAGE c STRICT;

CREATE OPERATOR CLASS date_revop
FOR TYPE date USING btree AS
OPERATOR 1 >(date,date) ,
OPERATOR 2 >=(date,date) ,
OPERATOR 3 =(date,date) ,
OPERATOR 4 <=(date,date) ,
OPERATOR 5 <(date,date) ,
FUNCTION 1 ddd_date_revcmp(date,date);

CREATE OPERATOR CLASS int4_revop
FOR TYPE integer USING btree AS
OPERATOR 1 >(integer,integer) ,
OPERATOR 2 >=(integer,integer) ,
OPERATOR 3 =(integer,integer) ,
OPERATOR 4 <=(integer,integer) ,
OPERATOR 5 <(integer,integer) ,
FUNCTION 1 ddd_int_revcmp(integer,integer);

CREATE OPERATOR CLASS text_revop
FOR TYPE text USING btree AS
OPERATOR 1 >(text,text) ,
OPERATOR 2 >=(text,text) ,
OPERATOR 3 =(text,text) ,
OPERATOR 4 <=(text,text) ,
OPERATOR 5 <(text,text) ,
FUNCTION 1 ddd_text_revcmp(text,text);

Datum   ddd_date_revcmp(PG_FUNCTION_ARGS){
DateADT arg1=PG_GETARG_DATEADT(0);
DateADT arg2=PG_GETARG_DATEADT(1);

PG_RETURN_INT32(arg2 - arg1);
}


Datum   ddd_int_revcmp(PG_FUNCTION_ARGS){
int32   arg1=PG_GETARG_INT32(0);
int32   arg2=PG_GETARG_INT32(1);

PG_RETURN_INT32(arg2 - arg1);
}

Datum   ddd_text_revcmp(PG_FUNCTION_ARGS){
char*   arg1=(char*)VARDATA(PG_GETARG_TEXT_P(0));
char*   arg2=(char*)VARDATA(PG_GETARG_TEXT_P(1));

if((*arg1) != (*arg2)){
PG_RETURN_INT32(*arg2 - *arg1);
}else{
PG_RETURN_INT32(strcmp(arg2,arg1));
}
}





Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

  Seq Scan on detail_summary ds  (cost=0.00..1902749.83 rows=9962 width=4)
Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
(2 rows)


How accurate is the row estimate made by the planner?  (explain analyze
to be sure)



Results:

 Seq Scan on detail_summary ds  (cost=0.00..1902749.83 rows=9962 width=4) 
(actual time=62871.386..257258.249 rows=112 loops=1)
   Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))

 Total runtime: 257258.652 ms





Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Scott Marlowe
On Thu, 2007-04-12 at 17:04, Steve wrote:
Seq Scan on detail_summary ds  (cost=0.00..1902749.83 rows=9962 width=4)
  Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY
  ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
  (2 rows)
 
  How accurate is the row estimate made by the planner?  (explain analyze
  to be sure)
 
 
 Results:
 
   Seq Scan on detail_summary ds  (cost=0.00..1902749.83 rows=9962 width=4) 
 (actual time=62871.386..257258.249 rows=112 loops=1)
 Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY 
 ('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
   Total runtime: 257258.652 ms

So there's a misjudgment of the number of rows returned by a factor of
about 88.  That's pretty big.  Since you had the same number without the
receipt date (I think...) then it's the encounter_id that's not being
counted right.

Try upping the stats target on that column and running analyze again and
see if you get closer to 112 in your analyze or not.
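
Concretely, one way to do that (1000 is the per-column ceiling in 8.2,
which is exactly what Steve tries next):

ALTER TABLE detail_summary ALTER COLUMN encounter_id SET STATISTICS 1000;
ANALYZE detail_summary;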



Re: [PERFORM] Slow Postgresql server

2007-04-12 Thread Guido Neitzer

On 12.04.2007, at 15:58, Jason Lustig wrote:

Wow! That's a lot to respond to. Let me go through some of the  
ideas... First, I just turned on autovacuum, I forgot to do that.  
I'm not seeing a major impact though. Also, I know that it's not  
optimal for a dedicated server.


Hmm, why not? Have you recently vacuumed your db manually so it gets  
cleaned up? Even a vacuum full might be useful if the db is really  
bloated.


It's not just for postgres, it's also got our apache server on it.  
We're just getting started and didn't want to make the major  
investment right now in getting the most expensive server we can get


Hmmm, but more RAM would definitely make sense, especially in that  
scenario. It really sounds like your machine is swapping itself to death.


What does the system say about memory usage?

Some of the queries are definitely making an impact on the speed.  
We are constantly trying to improve performance, and part of that  
is reassessing our indexes and denormalizing data where it would  
help. We're also doing work with memcached to cache the results of  
some of the more expensive operations.


Hmmm, that kills you even more, as it uses RAM. I really don't think  
at the moment that it has something to do with PG itself, but with  
not enough memory for what you want to achieve.


What perhaps helps might be connection pooling, so that not so many  
processes are created for the requests. It depends on your middleware  
what you can do about that. pg_pool might be an option.


cug





Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Tom Lane
Steve [EMAIL PROTECTED] writes:
 Datum   ddd_text_revcmp(PG_FUNCTION_ARGS){
  char*   arg1=(char*)VARDATA(PG_GETARG_TEXT_P(0));
  char*   arg2=(char*)VARDATA(PG_GETARG_TEXT_P(1));

  if((*arg1) != (*arg2)){
  PG_RETURN_INT32(*arg2 - *arg1);
  }else{
  PG_RETURN_INT32(strcmp(arg2,arg1));
  }
 }

[ itch... ]  That code is just completely wrong, because the contents
of a TEXT datum aren't guaranteed null-terminated.  It'd be better to
invoke bttextcmp and negate its result.

That's not relevant to your immediate problem, but if you've noticed
any strange behavior with your text_revop indexes, that's the reason...
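
A sketch of that fix (not from the thread; DirectFunctionCall2 and
bttextcmp are the standard backend entry points for this):

Datum   ddd_text_revcmp(PG_FUNCTION_ARGS){
        /* compare via the built-in text comparator and flip the
           sign; never assumes null-termination */
        int32   cmp = DatumGetInt32(DirectFunctionCall2(bttextcmp,
                                        PG_GETARG_DATUM(0),
                                        PG_GETARG_DATUM(1)));

        PG_RETURN_INT32(-cmp);
}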

regards, tom lane



Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

[ itch... ]  That code is just completely wrong, because the contents
of a TEXT datum aren't guaranteed null-terminated.  It'd be better to
invoke bttextcmp and negate its result.

That's not relevant to your immediate problem, but if you've noticed
any strange behavior with your text_revop indexes, that's the reason...


	The indexes have all worked, though I'll make the change anyway. 
Documentation on how to code these things is pretty sketchy and I believe 
I followed an example on the site if I remember right. :/


Thanks for the info though :)


Steve



Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve


So there's a misjudgment of the number of rows returned by a factor of
about 88.  That's pretty big.  Since you had the same number without the
receipt date (I think...) then it's the encounter_id that's not being
counted right.

Try upping the stats target on that column and running analyze again and
see if you get closer to 112 in your analyze or not.



If I max the statistics targets at 1000, I get:

 Seq Scan on detail_summary ds  (cost=0.00..1903030.26 rows=1099 width=4)
   Filter: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))



	Setting it to 500 makes the estimated row count twice as high.  It 
seems to have no effect on anything though, either way. :)



Steve



Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Tom Lane
Scott Marlowe [EMAIL PROTECTED] writes:
 So there's a misjudgment of the number of rows returned by a factor of
 about 88.  That's pretty big.  Since you had the same number without the
 receipt date (I think...) then it's the encounter_id that's not being
 counted right.

I don't think that's Steve's problem, though.  It's certainly
misestimating, but nonetheless the cost estimate for the seqscan is
1902749.83 versus 14819.81 for the bitmap scan; it should've picked
the bitmap scan anyway.

I tried to duplicate the problem here, without any success; I get

Bitmap Heap Scan on detail_summary ds  (cost=422.01..801.27 rows=100 width=4)
   Recheck Cond: (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))
   Filter: (receipt >= '1998-12-30'::date)
   ->  Bitmap Index Scan on detail_summary_encounter_id_idx  (cost=0.00..421.98 
rows=100 width=0)
 Index Cond: (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))

so either this has been fixed by a post-8.2.3 bug fix (which I doubt,
it doesn't seem familiar at all) or there's some additional contributing
factor.  Steve, are you using any nondefault planner parameters?

regards, tom lane



Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

Here's my planner parameters:

seq_page_cost = 1.0 # measured on an arbitrary scale
random_page_cost = 1.5  # same scale as above
cpu_tuple_cost = 0.001  # same scale as above
cpu_index_tuple_cost = 0.0005   # same scale as above
cpu_operator_cost = 0.00025 # same scale as above
effective_cache_size = 8192MB

default_statistics_target = 100 # range 1-1000


On a machine with 16 gigs of RAM.  I tried to make it skew towards 
indexes.  However, even if I force it to use the indexes 
(enable_seqscan=off) it doesn't make it any faster really :/


Steve

On Thu, 12 Apr 2007, Tom Lane wrote:


Scott Marlowe [EMAIL PROTECTED] writes:

So there's a misjudgment of the number of rows returned by a factor of
about 88.  That's pretty big.  Since you had the same number without the
receipt date (I think...) then it's the encounter_id that's not being
counted right.


I don't think that's Steve's problem, though.  It's certainly
misestimating, but nonetheless the cost estimate for the seqscan is
1902749.83 versus 14819.81 for the bitmap scan; it should've picked
the bitmap scan anyway.

I tried to duplicate the problem here, without any success; I get

Bitmap Heap Scan on detail_summary ds  (cost=422.01..801.27 rows=100 width=4)
  Recheck Cond: (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))
  Filter: (receipt >= '1998-12-30'::date)
  ->  Bitmap Index Scan on detail_summary_encounter_id_idx  (cost=0.00..421.98 
rows=100 width=0)
Index Cond: (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))

so either this has been fixed by a post-8.2.3 bug fix (which I doubt,
it doesn't seem familiar at all) or there's some additional contributing
factor.  Steve, are you using any nondefault planner parameters?

regards, tom lane



Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Tom Lane
Steve [EMAIL PROTECTED] writes:
 Here's my planner parameters:

I copied all these, and my 8.2.x still likes the bitmap scan a lot
better than the seqscan.  Furthermore, I double-checked the CVS history
and there definitely haven't been any changes in that area in REL8_2
branch since 8.2.3.  So I'm a bit baffled.  Maybe the misbehavior is
platform-specific ... what are you on exactly?  Is there anything
nonstandard about your Postgres installation?

regards, tom lane



Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Tom Lane
Steve [EMAIL PROTECTED] writes:
 ... even if I force it to use the indexes 
 (enable_seqscan=off) it doesn't make it any faster really :/

Does that change the plan, or do you still get a seqscan?

BTW, how big is this table really (how many rows)?

regards, tom lane



Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

It's a redhat enterprise machine running AMD x64 processors.

Linux ers3.dddcorp.com 2.6.9-42.0.10.ELsmp #1 SMP Fri Feb 16 17:13:42 EST 
2007 x86_64 x86_64 x86_64 GNU/Linux


It was compiled by me, straight up, nothing weird at all, no odd compiler 
options or whatever :)


So yeah :/ I'm quite baffled as well,

Talk to you later,

Steve


On Thu, 12 Apr 2007, Tom Lane wrote:


Steve [EMAIL PROTECTED] writes:

Here's my planner parameters:


I copied all these, and my 8.2.x still likes the bitmap scan a lot
better than the seqscan.  Furthermore, I double-checked the CVS history
and there definitely haven't been any changes in that area in REL8_2
branch since 8.2.3.  So I'm a bit baffled.  Maybe the misbehavior is
platform-specific ... what are you on exactly?  Is there anything
nonstandard about your Postgres installation?

regards, tom lane





Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

Table size: 16,037,728 rows

With enable_seqscan=off I get:

 Bitmap Heap Scan on detail_summary ds  (cost=4211395.20..4213045.32 
rows=1099 width=4)
   Recheck Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
   ->  Bitmap Index Scan on detail_summary_receipt_encounter_idx 
(cost=0.00..4211395.17 rows=1099 width=0)
 Index Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = 
ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))



The explain analyze is pending, running it now (however it doesn't really 
appear to be any faster using this plan).



Steve

On Thu, 12 Apr 2007, Tom Lane wrote:


Steve [EMAIL PROTECTED] writes:

... even if I force it to use the indexes
(enable_seqscan=off) it doesn't make it any faster really :/


Does that change the plan, or do you still get a seqscan?

BTW, how big is this table really (how many rows)?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org



---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

Here's the explain analyze with seqscan = off:

 Bitmap Heap Scan on detail_summary ds  (cost=4211395.20..4213045.32 
rows=1099 width=4) (actual time=121288.825..121305.908 rows=112 loops=1)
   Recheck Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))
   ->  Bitmap Index Scan on detail_summary_receipt_encounter_idx 
(cost=0.00..4211395.17 rows=1099 width=0) (actual 
time=121256.681..121256.681 rows=112 loops=1)
 Index Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = 
ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[])))

 Total runtime: 121306.233 ms


Your other question is answered in the other mail along with the 
non-analyze'd query plan :D


Steve

On Thu, 12 Apr 2007, Tom Lane wrote:


Steve [EMAIL PROTECTED] writes:

... even if I force it to use the indexes
(enable_seqscan=off) it doesn't make it any faster really :/


Does that change the plan, or do you still get a seqscan?

BTW, how big is this table really (how many rows)?

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Tom Lane
Steve [EMAIL PROTECTED] writes:
 With enable_seqscan=off I get:

 ->  Bitmap Index Scan on detail_summary_receipt_encounter_idx 
 (cost=0.00..4211395.17 rows=1099 width=0)
   Index Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id = 
 ANY ...

 The explain analyze is pending, running it now (however it doesn't really 
 appear to be any faster using this plan).

Yeah, that index is nearly useless for this query --- since the receipt
condition isn't really eliminating anything, it'll have to look at every
index entry :-( ... in fact, do so again for each of the IN arms :-( :-(
So it's definitely right not to want to use that plan.  Question is, why
is it seemingly failing to consider the right index?

I'm busy setting up my test case on an x86_64 machine right now, but
I rather fear it'll still work just fine for me.  Have you got any
nondefault parameter settings besides the ones you already mentioned?
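
(One quick way to answer that, assuming the pg_settings system view available
in 8.2, is to list every parameter whose source isn't the built-in default:)

SELECT name, setting, source
FROM pg_settings
WHERE source <> 'default';  -- only parameters changed from their defaults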

Another thing that might be interesting, if you haven't got a problem
with exclusive-locking the table for a little bit, is

BEGIN;
DROP INDEX each index except detail_summary_encounter_id_idx
EXPLAIN the problem query
ROLLBACK;

just to see if it does the right thing when it's not distracted by
all the wrong indexes (yeah, I'm grasping at straws here).  If you
set up the above as a SQL script it should only take a second to run.
Please try this with both settings of enable_seqscan --- you don't need
to do explain analyze though, we just want to know which plan it picks
and what the cost estimate is.
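
(Spelled out concretely, that script might look like the sketch below; the two
example index names are taken from the index list later in the thread, and the
query is the same assumed shape as above:)

BEGIN;
-- one DROP INDEX per index on detail_summary,
-- keeping only detail_summary_encounter_id_idx:
DROP INDEX detail_summary_receipt_encounter_idx;
DROP INDEX detail_summary_batch_id_idx;
-- ... and so on for the remaining indexes ...
EXPLAIN
SELECT ds.encounter_id
FROM detail_summary ds
WHERE ds.receipt >= '1998-12-30'::date
  AND ds.encounter_id = ANY ('{8813186,8813187}'::integer[]);
ROLLBACK;  -- DDL is transactional, so this puts every dropped index back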

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

If the other indexes are removed, with enable_seqscan=on:

 Bitmap Heap Scan on detail_summary ds  (cost=154.10..1804.22 rows=1099 
width=4)
   Recheck Cond: (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))

   Filter: (receipt >= '1998-12-30'::date)
   ->  Bitmap Index Scan on detail_summary_encounter_id_idx 
(cost=0.00..154.07 rows=1099 width=0)
 Index Cond: (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))



With it off:

 Bitmap Heap Scan on detail_summary ds  (cost=154.10..1804.22 rows=1099 
width=4)
   Recheck Cond: (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))

   Filter: (receipt >= '1998-12-30'::date)
   ->  Bitmap Index Scan on detail_summary_encounter_id_idx 
(cost=0.00..154.07 rows=1099 width=0)
 Index Cond: (encounter_id = ANY 
('{8813186,8813187,8813188,8813189,8813190,8813191,8813192,8813193,8813194,8813195,8813196,8813197,8813198,8813199,8813200,8813201,8813202,8813203,8813204,8813205,8813206,8813207,8813208,8813209,8813210,8813211,8813212,8813213,8813214,8813215,8813216,8813217,8813218,8813219,8813220,8813221,8813222,8813223,8813224,8813225,8813226,8813227,8813228,8813229,8813230,8813231,8813232,8813233,8813234,8813235,8813236,8813237,8813238,8813239,8813240,8813241,8813242,8813243,8813244,8813245,8813246,8813247,8813248,8813249,8813250,8813251,8813252,8813253,8813254,8813255,8813256,8813257,8813258,8813259,8813260,8813261,8813262,8813263,8813264,8813265,8813266,8813267,8813268,8813269,8813270,8813271,8813272,8813273,8813274,8813275,8813276,8813277,8813278,8813279,8813280,8813281,8813282,8813283,8813284,8815534}'::integer[]))



Either way, it runs perfectly fast.  So it looks like the indexes are 
confusing this query like you suspected.  Any advice?  This isn't the only 
query we run on this table; much as I'd absolutely love to kill off some 
indexes to improve our nightly load times, I can't foul up the other 
queries :)



Thank you very much for all your help on this issue, too!

Steve

On Thu, 12 Apr 2007, Tom Lane wrote:


Steve [EMAIL PROTECTED] writes:

With enable_seqscan=off I get:



->  Bitmap Index Scan on detail_summary_receipt_encounter_idx
(cost=0.00..4211395.17 rows=1099 width=0)
  Index Cond: ((receipt >= '1998-12-30'::date) AND (encounter_id =
ANY ...



The explain analyze is pending, running it now (however it doesn't really
appear to be any faster using this plan).


Yeah, that index is nearly useless for this query --- since the receipt
condition isn't really eliminating anything, it'll have to look at every
index entry :-( ... in fact, do so again for each of the IN arms :-( :-(
So it's definitely right not to want to use that plan.  Question is, why
is it seemingly failing to consider the right index?

Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Tom Lane
Steve [EMAIL PROTECTED] writes:
 Either way, it runs perfectly fast.  So it looks like the indexes are 
 confusing this query like you suspected.  Any advice?

Wow --- sometimes grasping at straws pays off.  I was testing here with
just a subset of the indexes to save build time, but I bet that one of
the irrelevant ones is affecting this somehow.  Time to re-test.

If you have some time to kill, it might be interesting to vary that
begin/rollback test script to leave one or two other indexes in place,
and see if you can identify exactly which other index(es) get it
confused.

I'm about to go out to dinner with the wife, but will have a closer
look when I get back, or tomorrow morning.  We'll figure this out.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

Okay -- I started leaving indexes on one by one.

The explain broke (reverted to the slow plan) when the 
detail_summary_receipt_encounter_idx index, on (receipt, encounter_id), was 
left in place.


Just dropping that index had no effect, but there are a LOT of indexes that 
refer to receipt.  So on a hunch I tried dropping all indexes that refer 
to receipt date, and that worked -- so it's the indexes that contain 
receipt date that are the problem.


For more fun, I tried leaving just the plain receipt-date index in place 
(index detail_summary_receipt_id_idx) and THAT produced the correct plan; 
it's all these multi-column indexes that are fouling things up, it would 
seem!
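
(One way to enumerate exactly which indexes involve the receipt column,
assuming the column is literally named receipt, is to ask the system
catalogs:)

SELECT i.indexrelid::regclass AS index_name
FROM pg_index i
JOIN pg_attribute a
  ON a.attrelid = i.indrelid
 AND a.attnum = ANY (i.indkey)   -- indkey holds the indexed column numbers
WHERE i.indrelid = 'detail_summary'::regclass
  AND a.attname = 'receipt';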



 So does this mean I should experiment with dropping those indexes? 
I'm not sure if that will result in 'bad things' as there are other 
complicated actions like sorts that may go real slow if I drop those 
indexes.  BUT I think it'll be easy to convince the customer to drop their 
absurdly complicated sorts if I can come back with serious results like 
what we've worked out here.



And thanks again -- have a good dinner! :)

Steve


On Thu, 12 Apr 2007, Tom Lane wrote:


Steve [EMAIL PROTECTED] writes:

Either way, it runs perfectly fast.  So it looks like the indexes are
confusing this query like you suspected.  Any advice?


Wow --- sometimes grasping at straws pays off.  I was testing here with
just a subset of the indexes to save build time, but I bet that one of
the irrelevant ones is affecting this somehow.  Time to re-test.

If you have some time to kill, it might be interesting to vary that
begin/rollback test script to leave one or two other indexes in place,
and see if you can identify exactly which other index(es) get it
confused.

I'm about to go out to dinner with the wife, but will have a closer
look when I get back, or tomorrow morning.  We'll figure this out.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Tom Lane
Steve [EMAIL PROTECTED] writes:
 Okay -- I started leaving indexes on one by one.
 ...
  So does this mean I should experiment with dropping those indexes? 

No, I think this means there's a planner bug to fix.  I haven't quite
scoped out what it is yet, though.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Question about memory allocations

2007-04-12 Thread Greg Smith

On Tue, 10 Apr 2007, Steve wrote:

- I've set up a configuration (I'll show important values below), and I'm 
wondering if there's any way I can actually see the distribution of memory in 
the DB and how the memory is being used.


I didn't notice anyone address this for you yet.  There is a tool in 
contrib/pg_buffercache whose purpose in life is to show you what the 
shared buffer cache has inside it.  The documentation in that directory 
leads through installing it.  The additional variable you'll likely never 
know is what additional information is inside the operating system's 
buffer cache.
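
(Once pg_buffercache is installed, a typical query -- along the lines of the
example in the contrib documentation -- counts how many shared buffers each
relation in the current database occupies:)

SELECT c.relname, count(*) AS buffers
FROM pg_buffercache b
JOIN pg_class c ON b.relfilenode = c.relfilenode  -- match buffers to relations
GROUP BY c.relname
ORDER BY buffers DESC
LIMIT 10;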


# Leaving this low makes the DB complain, but I'm not sure what's
# reasonable.

checkpoint_segments = 128


That's a reasonable setting for a large server.  The main downside to 
setting it that high is longer recovery periods after a crash, but I doubt 
that's a problem for you if you're so brazen as to turn off fsync.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Tom Lane
Steve [EMAIL PROTECTED] writes:
 Just dropping that index had no effect, but there are a LOT of indexes that 
 refer to receipt.  So on a hunch I tried dropping all indexes that refer 
 to receipt date, and that worked -- so it's the indexes that contain 
 receipt date that are the problem.

I'm still not having any luck reproducing the failure here.  Grasping at
straws again, I wonder if it's got something to do with the order in
which the planner examines the indexes --- which is OID order.  Could
you send me the results of 

select indexrelid::regclass from pg_index where indrelid = 
'detail_summary'::regclass order by indexrelid;

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Strangely Variable Query Performance

2007-04-12 Thread Steve

Here you go:

 detail_summary_b_record_status_idx
 detail_summary_batch_id_idx
 detail_summary_batchnum_idx
 detail_summary_carrier_id_idx
 detail_summary_duplicate_id_idx
 detail_summary_e_record_status_idx
 detail_summary_encounter_id_idx
 detail_summary_encounternum_idx
 detail_summary_export_date_idx
 detail_summary_hedis_date_idx
 detail_summary_member_name_idx
 detail_summary_member_num_idx
 detail_summary_p_record_status_idx
 detail_summary_patient_control_idx
 detail_summary_procedurecode_idx
 detail_summary_product_line_idx
 detail_summary_provider_id_idx
 detail_summary_raps_date_idx
 detail_summary_receipt_id_idx
 detail_summary_referrering_prov_id_idx
 detail_summary_rendering_prov_id_idx
 detail_summary_rendering_prov_name_idx
 detail_summary_servicedate_idx
 ds_sort_1
 ds_sort_10
 ed_cbee_norev
 ed_cbee_norev_p
 ed_cbee_rev
 ed_cbee_rev_p
 mcbe
 mcbe_p
 mcbe_rev
 mcbe_rev_p
 mcbee_norev
 mcbee_norev_p
 mcbee_rev
 mcbee_rev_p
 pcbee_norev
 pcbee_norev_p
 pcbee_rev
 pcbee_rev_p
 rcbee_norev
 rcbee_norev_p
 rp_cbee_norev
 rp_cbee_norev_p
 rp_cbee_rev
 rp_cbee_rev_p
 sd_cbee_norev
 sd_cbee_norev_p
 sd_cbee_rev
 sd_cbee_rev_p
 testrev
 testrev_p
 detail_summary_receipt_encounter_idx


On Thu, 12 Apr 2007, Tom Lane wrote:


Steve [EMAIL PROTECTED] writes:

Just dropping that index had no effect, but there are a LOT of indexes that
refer to receipt.  So on a hunch I tried dropping all indexes that refer
to receipt date, and that worked -- so it's the indexes that contain
receipt date that are the problem.


I'm still not having any luck reproducing the failure here.  Grasping at
straws again, I wonder if it's got something to do with the order in
which the planner examines the indexes --- which is OID order.  Could
you send me the results of

select indexrelid::regclass from pg_index where indrelid = 
'detail_summary'::regclass order by indexrelid;

regards, tom lane



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Question about memory allocations

2007-04-12 Thread Steve
I didn't notice anyone address this for you yet.  There is a tool in 
contrib/pg_buffercache whose purpose in life is to show you what the shared 
buffer cache has inside it.  The documentation in that directory leads 
through installing it.  The additional variable you'll likely never know is 
what additional information is inside the operating system's buffer cache.


Okay -- thanks!  I'll take a look at this.

# Leaving this low makes the DB complain, but I'm not sure what's
# reasonable.

checkpoint_segments = 128


That's a reasonable setting for a large server.  The main downside to setting 
it that high is longer recovery periods after a crash, but I doubt that's a 
problem for you if you're so brazen as to turn off fsync.


	Hahaha yeah.  It's 100% assumed that if something goes bad we're 
restoring from the previous day's backup.  However, because the DB is read 
only for -most- of the day and only read/write at night, it's an acceptable 
risk for us anyway.  But good to know that's a reasonable value.



Steve

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly