Re: [PERFORM] Scaling further up

2004-06-09 Thread Anjan Dave
Can you describe the vendors/components of a "cheap SAN" setup?

Thanks,
Anjan

-Original Message-
From: Rod Taylor [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, March 02, 2004 5:57 PM
To: Scott Marlowe
Cc: Anjan Dave; Chris Ruprecht; [EMAIL PROTECTED]; William Yu;
Postgresql Performance
Subject: Re: [PERFORM] Scaling further up


> For speed, the x86 32- and 64-bit architectures seem to be noticeably
> faster than Sparc.  However, running Linux or BSD on Sparc makes them
> pretty fast too, but you lose the fault-tolerance support for things
> like hot-swappable CPUs or memory.

Agreed. You can get a quad Opteron with 16GB of memory for around 20K.

Grab three, add a cheap SAN, and set up a little master/slave replication
with failover (how is Slony coming along?), and you're all set.





Re: [PERFORM] Database Server Tuning

2004-06-09 Thread Josh Berkus
Vivek,

> I did a bunch of testing with different RAID levels on a 14 disk
> array.  I finally settled on this:  RAID5 across 14 disks for the
> data, the OS (including syslog directory) and WAL on a RAID1 pair on
> the other channel of the same controller (I didn't want to spring for
> dual RAID controllers).  The biggest bumps in performance came from
> increasing the checkpoint_buffers since my DB is heavily written to,
> and increasing sort_mem.

With a large RAID, have you found that having WAL on a separate array actually
boosts performance?  The empirical tests we've seen so far don't seem to
support this.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco




Re: [PERFORM] Scaling further up

2004-06-09 Thread Chris Ruprecht
Hi all,

If you have a DB of 'only' 13 GB and you do not expect it to grow much, it
might be advisable to have enough memory (RAM) to hold the entire DB in
shared memory (everything is cached). If you have a server with, say, 24 GB
of memory and can allocate 20 GB for cache, you don't care about the speed
of the disks any more - all you worry about is the speed of your memory and
your network connection.
I believe this is not possible using 32-bit technology; you would have to go
to some 64-bit platform, but if it's speed you want ...
You can also try solid-state hard disk drives. These are actually just
memory - there are no moving parts - but they look and behave like very,
very fast disk drives. I have seen them at capacities of 73 GB - but they
didn't mention the price (I'd probably have a heart attack when I look at
the price tag).
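
A rough way to gauge whether the whole DB really would fit in cache (a sketch
only; the LIMIT is illustrative, and relpages is just an estimate refreshed
by VACUUM/ANALYZE) is to sum relation sizes straight out of pg_class:

SELECT relname,
       relpages,
       relpages * 8 / 1024 AS approx_mb   -- relpages are 8 KB blocks
FROM pg_class
WHERE relkind IN ('r', 'i', 't')          -- tables, indexes, TOAST
ORDER BY relpages DESC
LIMIT 20;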

Best regards,
Chris


On Tuesday 02 March 2004 14:41, Anjan Dave wrote:
> "By lots I mean dozen(s) in a raid 10 array with a good controller."
>
> I believe for RAID-10 I will need an even number of drives. Currently,
> the size of the database is about 13GB, and is not expected to grow
> exponentially with thousands of concurrent users, so total space is not
> of paramount importance compared to performance.
>
> Does this sound like a reasonable setup?
> 10x36GB FC drives on RAID-10
> 4x36GB FC drives for the logs on RAID-10 (not sure if this is the
> correct ratio)?
> 1 hotspare
> Total=15 Drives per enclosure.
>
> Tentatively, I am looking at an entry-level EMC CX300 product with 2GB
> RAID cache, etc.
>
> Question - Are 73GB drives supposed to give better performance because
> of a higher number of platters?
>
> Thanks,
> Anjan
>
>
> -Original Message-
> From: Fred Moyer [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, March 02, 2004 5:57 AM
> To: William Yu; Anjan Dave
> Cc: [EMAIL PROTECTED]
> Subject: Re: [PERFORM] Scaling further up
>
> On Tue, 2004-03-02 at 17:42, William Yu wrote:
> > Anjan Dave wrote:
> > > We have a Quad-Intel XEON 2.0GHz (1MB cache), 12GB memory, running
> > > RH9, PG 7.4.0. There's an internal U320, 10K RPM RAID-10 setup on
> > > 4 drives.
> > >
> > > We are expecting a pretty high load, a few thousand 'concurrent'
> > > users executing select, insert, or update statements.
> >
> > The quick and dirty method would be to upgrade to the recently
> > announced 3GHz Xeon MPs with 4MB of L3. My semi-educated guess is
> > that you'd get another +60% there due to the huge L3 hiding the
> > Xeon's shared bus penalty.
>
> If you are going to have thousands of 'concurrent' users, you should
> seriously consider the 2.6 kernel if you are running Linux, or, as an
> alternative, going with FreeBSD.  You will need to load test your system
> and become an expert on tuning Postgres to get the absolute maximum
> performance from each and every query you have.
>
> And you will need lots of hard drives.  By lots I mean dozen(s) in a
> RAID 10 array with a good controller.  Thousands of concurrent users
> means hundreds or thousands of transactions per second.  I've personally
> seen it scale that far, but in my opinion you will need a lot more hard
> drives and RAM than CPU.
>
>




Re: [PERFORM] Index oddity

2004-06-09 Thread Tom Lane
ken <[EMAIL PROTECTED]> writes:
> ... and here is the plan with statistics set to 1000 ...

>  Seq Scan on nrgfeature f  (cost=0.00..31675.57 rows=18608 width=218)
> (actual time=63.544..1002.701 rows=225 loops=1)
>Filter: ((upperrightx > 321264.236977215::double precision) AND
> (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> 123286.261898636::double precision) AND (lowerlefty <
> 124985.927450476::double precision) AND (diagonalsize > 49.999::double
> precision))

> ... so yeah, it's obviously finding way, way fewer rows than it thinks it
> will.

Yup.  I think your problem here is that the conditions on the different
columns are highly correlated, but the planner doesn't know anything
about that, because it has no cross-column statistics.

You could check that the individual conditions are accurately estimated
by looking at the estimated and actual row counts in

explain analyze
SELECT * FROM nrgfeature f WHERE upperRightX > 321264.23697721504;

explain analyze
SELECT * FROM nrgfeature f WHERE lowerLeftX < 324046.79981208267;

etc --- but I'll bet lunch that they are right on the money with the
higher statistics targets, and probably not too far off even at the
default.  The trouble is that the planner is simply multiplying these
probabilities together to get its estimate for the combined query,
and when the columns are not independent that leads to a very bad
estimate.

In particular it sounds like you have a *whole lot* of rows that have
diagonalSize just under 50, and so changing the diagonalSize condition
to include those makes for a big change in the predicted number of rows,
even though for the specific values of upperRightX and friends in your
test query there isn't any change in the actual number of matching rows.

I don't have any advice to magically solve this problem.  I would
suggest experimenting with alternative data representations -- for
example, maybe it would help to store "leftX" and "width" in place
of "leftX" and "rightX", etc.  What you want to do is try to decorrelate
the column values.  leftX and rightX are likely to have a strong
correlation, but maybe leftX and width don't.
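
For illustration, a minimal sketch of that idea against the table from this
thread (the new column and the follow-up query rewrite are assumptions on my
part, not something that was tested here):

-- Hypothetical sketch: keep leftX plus a derived width instead of leftX
-- and rightX, so the stored columns are less correlated with each other.
ALTER TABLE nrgfeature ADD COLUMN width double precision;
UPDATE nrgfeature SET width = upperrightx - lowerleftx;
ANALYZE nrgfeature;
-- Queries would then filter on (lowerleftx + width) instead of upperrightx.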

regards, tom lane



Re: [PERFORM] Index oddity

2004-06-09 Thread Joshua D. Drake


> ANALYZE might be producing poor stats due to:
> i) many dead tuples, or
> ii) a high proportion of dead tuples in the first few pages of the table.
> Does a VACUUM FULL followed by ANALYZE change the estimates (or have
> you tried this already)?
>
> (p.s. - I probably don't qualify for the 'more knowledge' bit either...)

You can also increase your statistics target, which will make ANALYZE
take longer but can help a great deal with larger data sets.
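
A minimal sketch of that, assuming the table from this thread; the value is
only illustrative (the default target in 7.4 is 10), and it applies only to
columns without a per-column SET STATISTICS override:

SET default_statistics_target = 100;  -- session-level experiment
ANALYZE nrgfeature;                   -- re-gather stats with the new target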

Sincerely,
Joshua D. Drake


--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
PostgreSQL support, programming, shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
PostgreSQL Replicator -- production quality replication for PostgreSQL


Re: [PERFORM] Index oddity

2004-06-09 Thread Mark Kirkwood

Rod Taylor wrote:
> The big problem is a very poor estimate (off by a couple orders of
> magnitude). I was hoping someone with more knowledge in fixing those
> would jump in.

ANALYZE might be producing poor stats due to:
i) many dead tuples, or
ii) a high proportion of dead tuples in the first few pages of the table.
Does a VACUUM FULL followed by ANALYZE change the estimates (or have you
tried this already)?
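
For concreteness, a sketch of that check against the table from this thread
(note that VACUUM FULL takes an exclusive lock, so this assumes the table can
be unavailable briefly):

VACUUM FULL nrgfeature;   -- reclaim dead tuples and compact the table
ANALYZE nrgfeature;       -- refresh the planner's statistics
-- or, in a single command: VACUUM FULL ANALYZE nrgfeature;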

(p.s. - I probably don't qualify for the 'more knowledge' bit either...)
regards
Mark


Re: [PERFORM] Scaling further up

2004-06-09 Thread Anjan Dave
For the disks part - I am looking at a SAN implementation, and I will be
planning a separate RAID group for the WALs.

The controller is a PERC with 128MB cache, and I think it is write-back.

Other than the disks, I am curious what other people are using in terms
of the horsepower needed. The quad server has been keeping up, but we
are expecting quite high loads in the near future, and I am not sure
whether just having the disks on high-end storage will do it.

Thanks,
Anjan


-Original Message-
From: Magnus Hagander [mailto:[EMAIL PROTECTED] 
Sent: Monday, March 01, 2004 3:54 PM
To: Anjan Dave; [EMAIL PROTECTED]
Subject: RE: [PERFORM] Scaling further up


> All:
> 
> We have a Quad-Intel XEON 2.0GHz (1MB cache), 12GB memory, running
> RH9, PG 7.4.0. There's an internal U320, 10K RPM RAID-10 setup on
> 4 drives.
>
> We are expecting a pretty high load, a few thousand 'concurrent'
> users executing select, insert, or update statements.
>
> What is the next step up in terms of handling very heavy loads?
> Clustering?

I'd look at adding more disks first. Depending on what type of query
load you get, that box sounds like it will be very much I/O bound. More
spindles = more parallel operations = faster under load. Consider
adding 15K RPM disks as well; they're not all that much more expensive,
and should give you better performance than 10K RPM.

Also, make sure you put your WAL disks on a separate RAID set if possible
(not just a separate partition on the existing RAID set).

Finally, if you don't already have one, look for a battery-backed RAID
controller that can do write-back caching, and enable that. (Don't even
think about enabling it unless it's battery-backed!) And add as much RAM
as you can to that controller.


//Magnus



Re: [PERFORM] Index oddity

2004-06-09 Thread Rod Taylor
On Wed, 2004-06-09 at 21:45, Christopher Kings-Lynne wrote:
> > If I take away the diagonalSize condition in my query I find that there
> > are 225 rows that satisfy the other conditions.  155 of these have a

> Maybe you should drop your random_page_cost to something less than 4, 
> eg. 3 or even 2...

The big problem is a very poor estimate (off by a couple orders of
magnitude). I was hoping someone with more knowledge in fixing those
would jump in.





Re: [PERFORM] Index oddity

2004-06-09 Thread Christopher Kings-Lynne
> If I take away the diagonalSize condition in my query I find that there
> are 225 rows that satisfy the other conditions.  155 of these have a
> diagonalSize value of exactly 50.000, while the remaining 70 rows all
> have values larger than 50.  Thus there is a big discrete jump in the
> number of rows at a diagonalSize of 50.  However, the statistics are off
> by two orders of magnitude in guessing how many rows there are going to
> be in this case and thus is not using my index.  How can I fix that?

Maybe you should drop your random_page_cost to something less than 4,
eg. 3 or even 2...
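
A sketch of trying that in the current session only, using the query from
earlier in this thread (the value 2 is just the suggestion above, not a
tuned setting):

SET random_page_cost = 2;   -- the server-wide default stays at 4
EXPLAIN ANALYZE
SELECT *
FROM nrgfeature f
WHERE upperRightX > 321264.23697721504
  AND lowerLeftX < 324046.79981208267
  AND upperRightY > 123286.26189863647
  AND lowerLeftY < 124985.92745047594
  AND diagonalSize > 49.999;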

Chris


Re: [PERFORM] Slow query

2004-06-09 Thread scott.marlowe
On Tue, 24 Feb 2004, Ivan Voras wrote:

>   ->  Nested Loop  (cost=1.04..788.76 rows=3 width=500) (actual 
> time=4078.85..20185.89 rows=38999 loops=1)
> ->  Nested Loop  (cost=1.04..771.27 rows=3 width=485) 
> (actual time=4078.71..14673.27 rows=38999 loops=1)
> ->  Nested Loop  (cost=0.00..752.16 rows=195 
> width=288) (actual time=4078.20..6702.17 rows=38999 loops=1)
> ->  Nested Loop 
> (cost=0.00..748.72 rows=195 width=184) (actual time=0.21..3197.16 
> rows=38999 loops=1)

Note those nested loops up there.  They think that you are going to be
operating on 3, 3, 195, and 195 rows respectively, when in reality they
are operating on 38999 rows in every case.

set enable_nestloop = off

and see if that helps.  If so, try altering the responsible columns'
statistics targets to something higher (100 is a good start) and
reanalyzing to see if you get a better plan.  As long as those estimates
are that far off, you're going to get a poorly performing query whenever
the planner is allowed to use nested loops.
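
A sketch of both steps; the table and column names below are placeholders,
since the actual schema isn't shown in this thread:

SET enable_nestloop = off;   -- session-level test to confirm the diagnosis
-- If the plan improves, raise the stats target on the misestimated join
-- column(s) (hypothetical names), re-analyze, and re-enable nested loops.
ALTER TABLE orders ALTER COLUMN customer_id SET STATISTICS 100;
ANALYZE orders;
SET enable_nestloop = on;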




Re: [PERFORM] Index oddity

2004-06-09 Thread ken
I had already tried setting the statistics to 1000 for all five of these
double precision fields with effectively no improvement.  Should have
mentioned that.

Also, the BETWEEN doesn't help for any value of diagonalSize, since it is
effectively doing diagonalSize > X AND diagonalSize < Y.  If I run a query
with the same values for the four x,y conditions and diagonalSize < Y, then
for Y=49.999 the query is fast, but for anything 50.000 and greater the
query is slow.  Not surprisingly, that is the exact opposite of the
greater-than queries.

I also think I originally reported that the two queries gave the same
number of rows.  That is not true.  It was true when I had other joins
in, but when I stripped the query down to this core problem I should
have noticed that the number of results now differs between the two,
which I didn't at first.

If I take away the diagonalSize condition in my query I find that there
are 225 rows that satisfy the other conditions.  155 of these have a
diagonalSize value of exactly 50.000, while the remaining 70 rows all
have values larger than 50.  Thus there is a big discrete jump in the
number of rows at a diagonalSize of 50.  However, the statistics are off
by two orders of magnitude in guessing how many rows there are going to
be in this case and thus is not using my index.  How can I fix that?

Ken



On Wed, 2004-06-09 at 14:29, Rod Taylor wrote:
> > ... and here is the plan with statistics set to 1000 ...
> > 
> >  Seq Scan on nrgfeature f  (cost=0.00..31675.57 rows=18608 width=218)
> > (actual time=63.544..1002.701 rows=225 loops=1)
> >Filter: ((upperrightx > 321264.236977215::double precision) AND
> > (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> > 123286.261898636::double precision) AND (lowerlefty <
> > 124985.927450476::double precision) AND (diagonalsize > 49.999::double
> > precision))
> 
> It's better like this, but still way off the mark. Even your good query
> which uses the index was out by more than an order of magnitude.
> 
> Try raising the statistics levels for upperrightx, lowerleftx,
> upperrighty and lowerlefty.
> 
> Failing that, you might be able to push it back down again by giving
> diagonalsize an upper limit. Perhaps 500 is a value that would never
> occur.
> 
>   AND (diagonalsize BETWEEN 49.999::double precision AND 500)
> 
> 
> 




Re: [PERFORM] Index oddity

2004-06-09 Thread Rod Taylor
> ... and here is the plan with statistics set to 1000 ...
> 
>  Seq Scan on nrgfeature f  (cost=0.00..31675.57 rows=18608 width=218)
> (actual time=63.544..1002.701 rows=225 loops=1)
>Filter: ((upperrightx > 321264.236977215::double precision) AND
> (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> 123286.261898636::double precision) AND (lowerlefty <
> 124985.927450476::double precision) AND (diagonalsize > 49.999::double
> precision))

It's better like this, but still way off the mark. Even your good query
which uses the index was out by more than an order of magnitude.

Try raising the statistics levels for upperrightx, lowerleftx,
upperrighty and lowerlefty.

Failing that, you might be able to push it back down again by giving
diagonalsize an upper limit. Perhaps 500 is a value that would never
occur.

AND (diagonalsize BETWEEN 49.999::double precision AND 500)





Re: [PERFORM] Index oddity

2004-06-09 Thread ken
On Wed, 2004-06-09 at 13:56, Rod Taylor wrote:
> On Wed, 2004-06-09 at 16:50, ken wrote:
> > Thanks Rod,
> > 
> > This setting has no effect however.  If I set statistics to 1000, or
> 
> Okay.. but you never did send EXPLAIN ANALYZE output. I want to know
> what it is really finding.

Ah, sorry, missed the ANALYZE portion of your request (thought you only
wanted the result of explain if it changed due to the setting).

Here is the query plan with statistics on diagonalsize set to the
default (-1) ...

 Seq Scan on nrgfeature f  (cost=0.00..32176.98 rows=19134 width=218)
(actual time=61.640..1009.414 rows=225 loops=1)
   Filter: ((upperrightx > 321264.236977215::double precision) AND
(lowerleftx < 324046.799812083::double precision) AND (upperrighty >
123286.261898636::double precision) AND (lowerlefty <
124985.927450476::double precision) AND (diagonalsize > 49.999::double
precision))

... and here is the plan with statistics set to 1000 ...

 Seq Scan on nrgfeature f  (cost=0.00..31675.57 rows=18608 width=218)
(actual time=63.544..1002.701 rows=225 loops=1)
   Filter: ((upperrightx > 321264.236977215::double precision) AND
(lowerleftx < 324046.799812083::double precision) AND (upperrighty >
123286.261898636::double precision) AND (lowerlefty <
124985.927450476::double precision) AND (diagonalsize > 49.999::double
precision))

... so yeah, it's obviously finding way, way fewer rows than it thinks it
will.

thanks,

ken


> 
> > On Wed, 2004-06-09 at 13:12, Rod Taylor wrote:
> > > It seems to believe that the number of rows returned for the >49.999
> > > case will be 4 times the number for the >50 case. If that was true, then
> > > the sequential scan would be correct.
> > > 
> > > ALTER TABLE  ALTER COLUMN diagonalsize SET STATISTICS 1000;
> > > ANALYZE ;
> > > 
> > > Send back EXPLAIN ANALYZE output for the >49.999 case.
> > > 





Re: [PERFORM] Index oddity

2004-06-09 Thread Rod Taylor
On Wed, 2004-06-09 at 16:50, ken wrote:
> Thanks Rod,
> 
> This setting has no effect however.  If I set statistics to 1000, or

Okay.. but you never did send EXPLAIN ANALYZE output. I want to know
what it is really finding.

> On Wed, 2004-06-09 at 13:12, Rod Taylor wrote:
> > It seems to believe that the number of rows returned for the >49.999
> > case will be 4 times the number for the >50 case. If that was true, then
> > the sequential scan would be correct.
> > 
> > ALTER TABLE  ALTER COLUMN diagonalsize SET STATISTICS 1000;
> > ANALYZE ;
> > 
> > Send back EXPLAIN ANALYZE output for the >49.999 case.
> > 
> > > The query plan for diagonalSize > 50.000 is ...
> > > 
> > > Index Scan using nrgfeature_xys_index on nrgfeature f 
> > > (cost=0.00..17395.79 rows=4618 width=220)
> > >Index Cond: ((upperrightx > 321264.236977215::double precision) AND
> > > (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> > > 123286.261898636::double precision) AND (lowerlefty <
> > > 124985.927450476::double precision) AND (diagonalsize > 50::double
> > > precision))
> > > 
> > > ... while for diagonalSize > 49.999 is ...
> > > 
> > >  Seq Scan on nrgfeature f  (cost=0.00..31954.70 rows=18732 width=220)
> > >Filter: ((upperrightx > 321264.236977215::double precision) AND
> > > (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> > > 123286.261898636::double precision) AND (lowerlefty <
> > > 124985.927450476::double precision) AND (diagonalsize > 49.999::double
> > > precision))



Re: [PERFORM] Index oddity

2004-06-09 Thread ken
Thanks Rod,

This setting has no effect, however.  If I set statistics to 1000, or
even 0, and then reanalyze the table, I see no change in the behaviour
of the query plans; i.e. there is still the odd transition in the plans
at diagonalSize = 50.

Ken



On Wed, 2004-06-09 at 13:12, Rod Taylor wrote:
> It seems to believe that the number of rows returned for the >49.999
> case will be 4 times the number for the >50 case. If that was true, then
> the sequential scan would be correct.
> 
> ALTER TABLE  ALTER COLUMN diagonalsize SET STATISTICS 1000;
> ANALYZE ;
> 
> Send back EXPLAIN ANALYZE output for the >49.999 case.
> 
> > The query plan for diagonalSize > 50.000 is ...
> > 
> > Index Scan using nrgfeature_xys_index on nrgfeature f 
> > (cost=0.00..17395.79 rows=4618 width=220)
> >Index Cond: ((upperrightx > 321264.236977215::double precision) AND
> > (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> > 123286.261898636::double precision) AND (lowerlefty <
> > 124985.927450476::double precision) AND (diagonalsize > 50::double
> > precision))
> > 
> > ... while for diagonalSize > 49.999 is ...
> > 
> >  Seq Scan on nrgfeature f  (cost=0.00..31954.70 rows=18732 width=220)
> >Filter: ((upperrightx > 321264.236977215::double precision) AND
> > (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> > 123286.261898636::double precision) AND (lowerlefty <
> > 124985.927450476::double precision) AND (diagonalsize > 49.999::double
> > precision))




Re: [PERFORM] Index oddity

2004-06-09 Thread Rod Taylor
It seems to believe that the number of rows returned for the >49.999
case will be 4 times the number for the >50 case. If that was true, then
the sequential scan would be correct.

ALTER TABLE  ALTER COLUMN diagonalsize SET STATISTICS 1000;
ANALYZE ;

Send back EXPLAIN ANALYZE output for the >49.999 case.

> The query plan for diagonalSize > 50.000 is ...
> 
> Index Scan using nrgfeature_xys_index on nrgfeature f 
> (cost=0.00..17395.79 rows=4618 width=220)
>Index Cond: ((upperrightx > 321264.236977215::double precision) AND
> (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> 123286.261898636::double precision) AND (lowerlefty <
> 124985.927450476::double precision) AND (diagonalsize > 50::double
> precision))
> 
> ... while for diagonalSize > 49.999 is ...
> 
>  Seq Scan on nrgfeature f  (cost=0.00..31954.70 rows=18732 width=220)
>Filter: ((upperrightx > 321264.236977215::double precision) AND
> (lowerleftx < 324046.799812083::double precision) AND (upperrighty >
> 123286.261898636::double precision) AND (lowerlefty <
> 124985.927450476::double precision) AND (diagonalsize > 49.999::double
> precision))




[PERFORM] Index oddity

2004-06-09 Thread ken
I'm having a performance issue that I just can't resolve, and it's very,
very curious.  Thought someone here might be able to shed some light on
the subject.

I'm using Postgres 7.4.2 on Red Hat 9.  I have a table with 763,809 rows
in it defined as follows ...

ksedb=# \d nrgfeature
Table "public.nrgfeature"
     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 fid1           | numeric(64,0)               | not null
 fid2           | numeric(64,0)               | not null
 created        | timestamp without time zone | not null
 createdby      | character varying(30)       | not null
 modified       | timestamp without time zone |
 modifiedby     | character varying(30)       |
 geommodified   | timestamp without time zone |
 geommodifiedby | character varying(30)       |
 deleted        | timestamp without time zone |
 deletedby      | character varying(30)       |
 featuretypeid  | smallint                    | not null
 description    | text                        |
 datasourceid   | smallint                    | not null
 lowerleftx     | double precision            | not null
 lowerlefty     | double precision            | not null
 upperrightx    | double precision            | not null
 upperrighty    | double precision            | not null
 diagonalsize   | double precision            |
 login          | character varying(25)       |
Indexes:
    "nrgfeature_pkey" primary key, btree (fid1, fid2)
    "nrgfeature_ft_index" btree (featuretypeid)
    "nrgfeature_xys_index" btree (upperrightx, lowerleftx, upperrighty,
        lowerlefty, diagonalsize)
Inherits: commonfidattrs,
          commonrevisionattrs


... If I write a query as follows ...

SELECT *
FROM nrgfeature f
WHERE 
upperRightX > 321264.23697721504
AND lowerLeftX < 324046.79981208267
AND upperRightY > 123286.26189863647
AND lowerLeftY < 124985.92745047594
AND diagonalSize > 50.000
;

... (or any value for diagonalsize over 50) then my query runs in 50-100
milliseconds.  However, if the diagonalSize value is changed to 49.999
or any value below 50, then the query takes over a second for a factor
of 10 degradation in speed, even though the exact same number of rows is
returned.

The query plan for diagonalSize > 50.000 is ...

Index Scan using nrgfeature_xys_index on nrgfeature f 
(cost=0.00..17395.79 rows=4618 width=220)
   Index Cond: ((upperrightx > 321264.236977215::double precision) AND
(lowerleftx < 324046.799812083::double precision) AND (upperrighty >
123286.261898636::double precision) AND (lowerlefty <
124985.927450476::double precision) AND (diagonalsize > 50::double
precision))

... while for diagonalSize > 49.999 is ...

 Seq Scan on nrgfeature f  (cost=0.00..31954.70 rows=18732 width=220)
   Filter: ((upperrightx > 321264.236977215::double precision) AND
(lowerleftx < 324046.799812083::double precision) AND (upperrighty >
123286.261898636::double precision) AND (lowerlefty <
124985.927450476::double precision) AND (diagonalsize > 49.999::double
precision))

... and yes, if I set enable_seqscan=false then the index is forced to
be used.  However, besides being an undesirable solution for this simple
case, that doesn't solve the problem for the general case.  As soon as I
add in joins with a couple of tables to perform the actual query I want
to perform, the seq scan setting doesn't force the index to be used
anymore.  Instead, the primary key index is used at this same
diagonalSize cutoff, the 5-part double precision clause is used as a
filter on the index scan, and the result is again a very slow query.

I can provide those queries and results but that would only complicate
this already lengthy email and the above seems to be the crux of the
problem anyway.

Any help or thoughts would be greatly appreciated of course.

Thanks,

Ken Southerland


-- 
--samsixedd--
--

Ken Southerland
Senior Consultant
Sam Six EDD
http://www.samsixedd.com

503-236-4288 (office)
503-358-6542 (cell)





Re: [PERFORM] seq scan woes

2004-06-09 Thread Rod Taylor
On Mon, 2004-06-07 at 16:12, Dan Langille wrote:
> On 7 Jun 2004 at 16:00, Rod Taylor wrote:
> 
> > On Mon, 2004-06-07 at 15:45, Dan Langille wrote:
> > > A production system has had a query recently degrade in performance.  
> > > What once took < 1s now takes over 1s.  I have tracked down the 
> > > problem to a working example.
> > 
> > What changes have you made to postgresql.conf?
> 
> Nothing recently (ie. past few months). Nothing at all really.  
> Perhaps I need to start tuning that.
> 
> > Could you send explain analyse again with SEQ_SCAN enabled but with
> > nested loops disabled?
> 
> See http://rafb.net/paste/results/zpJEvb28.html

This doesn't appear to be the same query as we were shown earlier.

> > Off the cuff? I might hazard a guess that effective_cache is too low or
> > random_page_cost is a touch too high. Probably the former.
> 
> I grep'd postgresql.conf:
> 
> #effective_cache_size = 1000    # typically 8KB each
> #random_page_cost = 4   # units are one sequential page fetch cost

This would be the issue. You haven't told PostgreSQL anything about your
hardware. The defaults are somewhat modest.

http://www.postgresql.org/docs/7.4/static/runtime-config.html

Skim through the run-time configuration parameters that can be set in
postgresql.conf.

Pay particular attention to:
  * shared_buffers (you may be best with 2000 or 4000)
  * effective_cache_size (set to 50% of RAM size if it is a dedicated
    db machine)
  * random_page_cost (good disks will bring this down to a 2 from a 4)
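
As a sketch, the last two can be tried per-session before touching
postgresql.conf (the values are illustrative for a machine with roughly
1 GB of RAM; shared_buffers itself can only be changed in postgresql.conf
followed by a restart):

SET effective_cache_size = 65536;  -- 8 KB pages in 7.4, so about 512 MB
SET random_page_cost = 2;
-- then re-run the slow query under EXPLAIN ANALYZE and compare the plans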


-- 
Rod Taylor 

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/signature.asc




Re: [PERFORM] seq scan woes

2004-06-09 Thread Rod Taylor
On Mon, 2004-06-07 at 15:45, Dan Langille wrote:
> A production system has had a query recently degrade in performance.  
> What once took < 1s now takes over 1s.  I have tracked down the 
> problem to a working example.

What changes have you made to postgresql.conf?

Could you send explain analyse again with SEQ_SCAN enabled but with
nested loops disabled?

Off the cuff? I might hazard a guess that effective_cache is too low or
random_page_cost is a touch too high. Probably the former.
-- 
Rod Taylor 

Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
PGP Key: http://www.rbt.ca/signature.asc




Re: [PERFORM] Use of Functional Indexs and Planner estimates

2004-06-09 Thread Greg Stark

"Scott Marlowe" <[EMAIL PROTECTED]> writes:

> >->  Seq Scan on rules
> >  (cost=0.00..22296.32 rows=11294 width=12) 
> >  (actual time=540.149..2047.308 rows=1 loops=1)


> Simple, the planner is choosing a sequential scan when it should be
> choosing an index scan.  This is usually because random_page_cost is set
> too high, at the default of 4.  Try settings between 1.2 and 2.x or so
> to see how that helps.  Be sure and test with various queries of your
> own to be sure you've got about the right setting.

Unless you make random_page_cost about .0004 (4/11294), it isn't going to
cost this query right (that's a joke, don't do it :-). It's thinking there
are 11,000 records matching the where clause when in fact there is only 1.

If you know an upper bound on how many records the query should be finding,
you might try a kludge involving putting a LIMIT inside the GROUP BY, i.e.
something like

select rulename,redirect 
  from (select rulename,redirect 
  from ...
 where ... 
 limit 100) as kludge
 group by rulename,redirect

This would at least tell the planner not to expect more than 100 rows and to
take the plan likely to produce the first 100 rows fastest.

But this has the disadvantage of uglifying your code and introducing an
arbitrary limit. When 7.5 comes out, you'll want to rip this out.

-- 
greg

