Re: [HACKERS] [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-09 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 This rule works for all the locales I have installed ... but I don't
 have any Far Eastern locales installed.  Also, my test cases are only
 covering ASCII characters, and I believe many locales have some non-ASCII
 letters that sort after 'Z'.  I'm not sure how hard we need to try to
 cover those corner cases, though.  It is ultimately only an estimate...

If I understand correctly, what we're talking about is generating
estimates for LIKE 'foo%' using the algorithm that makes sense for the C
locale, which means generating the range of values that start with 'foo'.
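The range trick can be sketched like this (a hypothetical helper, not the
planner's actual code — it assumes plain byte-wise, C-locale ordering, which
is exactly the assumption the thread is questioning for other locales):

```python
def prefix_to_range(prefix):
    """Turn a LIKE prefix into a half-open range [lo, hi) containing
    exactly the strings that start with that prefix.  Only valid for
    byte-wise (C locale) ordering."""
    s = list(prefix)
    while s:
        if ord(s[-1]) < 0x10FFFF:       # can this character be incremented?
            s[-1] = chr(ord(s[-1]) + 1)
            return prefix, "".join(s)
        s.pop()                         # carry: drop a maximal character
    return prefix, None                 # no finite upper bound exists

# LIKE 'foo%'  becomes  col >= 'foo' AND col < 'fop'
lo, hi = prefix_to_range("foo")
```

A pattern ending in '/' (as in the subject of this thread) maps the same way:
'1062/%' yields the range ['1062/', '10620').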

It seems to me the problematic situation is when the most-frequent-values
come into play. Being off slightly in the histogram isn't going to generate
very inaccurate estimates, but including or excluding a most-frequent-value
could throw off the estimate severely.

Could we not use the bogus range to calculate the histogram estimate but apply
the LIKE pattern directly to the most-frequent-values instead of applying the
bogus range? Or would that be too much code re-organization for now?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL 
training!

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] dell versus hp

2007-11-09 Thread Florian Weimer
* Scott Marlowe:

 If the right two disks fail in a RAID-10 you lose everything.
 Admittedly, that's a pretty remote possibility,

It's not, unless you carefully lay out the RAID-1 subunits so that
their drives aren't physically adjacent. 8-/ I don't think many
controllers support that.
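Florian's caveat matters because the "pretty remote possibility" estimate
assumes failures are independent — which adjacent drives from the same batch
often aren't. Under that optimistic assumption, the arithmetic is (helper
name hypothetical, not from the thread):

```python
from fractions import Fraction

def p_array_lost_on_second_failure(n_disks):
    """In an n-disk RAID-10 (n/2 mirror pairs), once one disk has died
    the array is lost only if the next failure hits that disk's mirror
    partner -- 1 of the n-1 survivors.  Assumes independent failures,
    which is exactly what Florian is questioning."""
    assert n_disks >= 4 and n_disks % 2 == 0
    return Fraction(1, n_disks - 1)

p_array_lost_on_second_failure(8)   # 1/7 for the 8-disk arrays in this thread
```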

-- 
Florian Weimer[EMAIL PROTECTED]
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD

2007-11-09 Thread Jonah H. Harris
On Nov 9, 2007 7:06 AM, Ivan Voras [EMAIL PROTECTED] wrote:
 I just read this document and thought I should share it with this list:

 http://people.freebsd.org/~kris/scaling/7.0%20Preview.pdf

Nice presentation.  Thanks for posting it on here.

 Among other things (FreeBSD advocacy, mostly :) ), it contains a direct
 comparison between MySQL and PostgreSQL on various platforms, with
 PostgreSQL winning!

:)

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
499 Thornall Street, 2nd Floor  | [EMAIL PROTECTED]
Edison, NJ 08837| http://www.enterprisedb.com/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] PostgreSQL vs MySQL, and FreeBSD

2007-11-09 Thread Ivan Voras
Hi,

I just read this document and thought I should share it with this list:

http://people.freebsd.org/~kris/scaling/7.0%20Preview.pdf

Among other things (FreeBSD advocacy, mostly :) ), it contains a direct
comparison between MySQL and PostgreSQL on various platforms, with
PostgreSQL winning!

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD

2007-11-09 Thread Sebastian Hennebrueder


 Among other things (FreeBSD advocacy, mostly :) ), it contains a direct
 comparison between MySQL and PostgreSQL on various platforms, with
 PostgreSQL winning!
   
Hello,

If the queries are complex, this is understandable. I did a performance
review of a Hibernate project (Java object-relational mapping) using
MySQL. An ORM easily produces complex queries with joins and subqueries,
and MySQL uses nested loops for subqueries, which leads to performance
issues as the database grows.

Their documentation states that improvements for this kind of query are
planned for version 5.2.

Best Regards

Sebastian

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD

2007-11-09 Thread Erik Jones

On Nov 9, 2007, at 6:06 AM, Ivan Voras wrote:


Hi,

I just read this document and thought I should share it with this list:

http://people.freebsd.org/~kris/scaling/7.0%20Preview.pdf

Among other things (FreeBSD advocacy, mostly :) ), it contains a direct
comparison between MySQL and PostgreSQL on various platforms, with
PostgreSQL winning!


Which is typical for those who aren't in on the FUD :)

Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD

2007-11-09 Thread Scott Marlowe
On Nov 9, 2007 9:41 AM, Sebastian Hennebrueder [EMAIL PROTECTED] wrote:
 If the queries are complex, this is understandable. I did a performance
 review of a Hibernate project (Java object-relational mapping) using
 MySQL. An ORM easily produces complex queries with joins and subqueries,
 and MySQL uses nested loops for subqueries, which leads to performance
 issues as the database grows.

 Their documentation states that improvements for this kind of query are
 planned for version 5.2.

So, MySQL 5.2 will be catching up to version 7.1 or 7.2 of PostgreSQL
in that regard?

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] dell versus hp

2007-11-09 Thread Jurgen Haan
Apart from the disks, you might also investigate using Opterons instead
of Xeons. There appears to be a significant performance gap between
Opteron and Xeon: Xeons seem to spend more time passing around ownership
of memory cache lines when spinlocks are contended. It's not yet clear
whether this issue has been worked around, so you should at least
investigate it a bit.

We're using a HP DL385 ourselves which performs quite well.

-R-

Tore Halset wrote:
 Hello.

 1) Dell 2900 (5U)
 8 * 146 GB SAS 15Krpm 3,5
 8GB ram
 Perc 5/i. battery backup. 256MB ram.
 2 * 4 Xeon 2,66GHz
 
 2) Dell 2950 (2U)
 8 * 146 GB SAS 10Krpm 2,5 (not really selectable, but I think the
 webshop is wrong..)
 8GB ram
 Perc 5/i. battery backup. 256MB ram.
 2 * 4 Xeon 2,66GHz
 
 3) HP ProLiant DL380 G5 (2U)
 8 * 146 GB SAS 10Krpm 2,5
 8GB ram
 P400 raid controller. battery backup. 512MB ram.
 2 * 2 Xeon 3GHz
 


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD

2007-11-09 Thread Greg Smith

On Fri, 9 Nov 2007, Sebastian Hennebrueder wrote:


If the queries are complex, this is understandable.


The queries used for this comparison are trivial.  There's only one table 
involved and there are no joins.  It's testing very low-level aspects of 
performance.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [HACKERS] [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-09 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Could we not use the bogus range to calculate the histogram estimate
 but apply the LIKE pattern directly to the most-frequent-values
 instead of applying the bogus range? Or would that be too much code
 re-organization for now?

We have already done that for quite some time.  It won't help
Guillaume's case anyhow: he's got no MCVs, presumably because the field
is unique.

regards, tom lane
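Tom's point — that the pattern is already applied directly to the MCV list,
with the range estimate covering only the non-MCV remainder — can be sketched
schematically (hypothetical helper, not PostgreSQL's actual estimator; LIKE's
% and _ are mapped onto fnmatch's * and ?, ignoring escaping edge cases):

```python
from fnmatch import fnmatchcase

def mcv_like_fraction(pattern, mcvs):
    """Apply a LIKE pattern directly to a most-common-values list
    [(value, frequency), ...], summing the frequencies of matches.
    Returns (matched fraction, total fraction covered by MCVs); a
    histogram-based range estimate would handle the 1 - covered rest."""
    glob = pattern.replace("%", "*").replace("_", "?")
    matched = sum(f for v, f in mcvs if fnmatchcase(v, glob))
    covered = sum(f for _, f in mcvs)
    return matched, covered

mcv_like_fraction("1062/%", [("1062/3", 0.02), ("99/1", 0.05)])
```

On a unique column like Guillaume's, the MCV list is empty, so — as Tom says —
everything falls through to the histogram path.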

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-09 Thread Guillaume Smet
On Nov 9, 2007 5:33 PM, Tom Lane [EMAIL PROTECTED] wrote:
 he's got no MCVs, presumably because the field
 is unique.

It is. The ancestors field contains the current folder itself so the
id of the folder (which is the primary key) is in it.

--
Guillaume

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] dell versus hp

2007-11-09 Thread Claus Guttesen
 Apart from the disks, you might also investigate using Opterons instead
 of Xeons. There appears to be a significant performance gap between
 Opteron and Xeon: Xeons seem to spend more time passing around ownership
 of memory cache lines when spinlocks are contended. It's not yet clear
 whether this issue has been worked around, so you should at least
 investigate it a bit.

 We're using a HP DL385 ourselves which performs quite well.

Not atm. Until new benchmarks are published comparing AMD's new
quad-core with Intel's ditto, Intel has the edge.

http://tweakers.net/reviews/657/6

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] dell versus hp

2007-11-09 Thread Scott Marlowe
On Nov 9, 2007 10:40 AM, Claus Guttesen [EMAIL PROTECTED] wrote:
  Apart from the disks, you might also investigate using Opterons instead
  of Xeons. There appears to be a significant performance gap between
  Opteron and Xeon: Xeons seem to spend more time passing around ownership
  of memory cache lines when spinlocks are contended. It's not yet clear
  whether this issue has been worked around, so you should at least
  investigate it a bit.
 
  We're using a HP DL385 ourselves which performs quite well.

 Not atm. Until new benchmarks are published comparing AMD's new
 quad-core with Intel's ditto, Intel has the edge.

 http://tweakers.net/reviews/657/6

For 8 cores, it appears AMD has the lead, read this (stolen from
another thread):

http://people.freebsd.org/~kris/scaling/7.0%20Preview.pdf

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


[PERFORM] work_mem and shared_buffers

2007-11-09 Thread Campbell, Lance
Does the amount of memory allocated to work_mem get subtracted from
shared_buffers?

 

Example:

 

If work_mem is 1M and there are 10 connections and shared_buffers is
100M, then would the total be 90M left for shared_buffers?

 

Or does the amount of memory allocated for work_mem have nothing to do
with shared_buffers?

 

Thanks,

 

Lance Campbell

Project Manager/Software Architect

Web Services at Public Affairs

University of Illinois

217.333.0382

http://webservices.uiuc.edu

 



Re: [PERFORM] work_mem and shared_buffers

2007-11-09 Thread Heikki Linnakangas

Campbell, Lance wrote:

Does the amount of memory allocated to work_mem get subtracted from
shared_buffers?

Example:

If work_mem is 1M and there are 10 connections and shared_buffers is
100M then would the total be 90 M left for shared_buffers?

Or does the amount of memory allocated for work_mem have nothing to do
with shared_buffers?


No, they're completely separate.

Note that a connection can use more than work_mem of memory. For
example, if you run a query with multiple sort or hash nodes, each such
node can allocate up to work_mem of memory.
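Heikki's two points combine into a simple back-of-envelope bound answering
Lance's example: the two settings add rather than subtract (rough sketch
only; helper name and the per-query node count are hypothetical):

```python
def worst_case_memory_mb(shared_buffers_mb, connections, work_mem_mb,
                         mem_nodes_per_query=1):
    """shared_buffers is one fixed shared pool; work_mem is allocated
    on top of it, potentially once per sort/hash node per backend."""
    return shared_buffers_mb + connections * mem_nodes_per_query * work_mem_mb

worst_case_memory_mb(100, 10, 1)      # Lance's example: 110, not 90
worst_case_memory_mb(100, 10, 1, 3)   # 3 sort/hash nodes per query: 130
```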


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] PostgreSQL vs MySQL, and FreeBSD

2007-11-09 Thread Bill Moran
On Fri, 9 Nov 2007 11:11:18 -0500 (EST)
Greg Smith [EMAIL PROTECTED] wrote:

 On Fri, 9 Nov 2007, Sebastian Hennebrueder wrote:
 
  If the queries are complex, this is understandable.
 
 The queries used for this comparison are trivial.  There's only one table 
 involved and there are no joins.  It's testing very low-level aspects of 
 performance.

Actually, what it's really showing is parallelism, and I've always
expected PostgreSQL to come out on top in that arena.

-- 
Bill Moran
Potential Technologies
http://www.potentialtech.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] dell versus hp

2007-11-09 Thread Greg Smith

On Fri, 9 Nov 2007, Scott Marlowe wrote:


Not atm. Until new benchmarks are published comparing AMD's new
quad-core with Intel's ditto, Intel has the edge.
http://tweakers.net/reviews/657/6


For 8 cores, it appears AMD has the lead, read this (stolen from
another thread):
http://people.freebsd.org/~kris/scaling/7.0%20Preview.pdf


This issue isn't simple, and it may be the case that both conclusions are 
correct in their own domains while testing slightly different things.  The 
sysbench test used by the FreeBSD benchmark is much simpler than what 
the tweakers.net benchmark simulates.


Current generation AMD and Intel processors are pretty close in 
performance, but guessing which will work better involves a complicated 
mix of both CPU and memory issues.  AMD's NUMA architecture does some 
things better, and Intel's memory access takes a second hit in designs 
that use FB-DIMMs.  But Intel has enough of an advantage on actual CPU 
performance and CPU caching that current designs are usually faster 
regardless.


For an interesting look at the low-level details here, the current 
mainstream parts are compared at http://techreport.com/articles.x/11443/13 
and a similar comparison for the just released quad-core Opterons is at 
http://techreport.com/articles.x/13176/12


Nowadays Intel vs. AMD is tight enough that I don't even worry about that 
part in the context of a database application (there was still a moderate 
gap when the Tweakers results were produced a year ago).  On a real 
server, I'd suggest being more worried about how good the disk controller 
is, what the expansion options are there, and relative $/core.  In the 
x86/x64 realm, I don't feel CPU architecture is a huge issue right now 
when you're running a database.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] work_mem and shared_buffers

2007-11-09 Thread Campbell, Lance
Wow.  That is a nice logging feature in 8.3!

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 
-Original Message-
From: Bill Moran [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 09, 2007 2:08 PM
To: Campbell, Lance
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] work_mem and shared_buffers

On Fri, 9 Nov 2007 12:08:57 -0600
Campbell, Lance [EMAIL PROTECTED] wrote:

 How do you know when you should up the value of work_mem?  Just play
 with the number.  Is there a query I could do that would tell me if
 PostgreSql is performing SQL that could use more memory for sorting?

In 8.2 and older, it can be difficult to know, and I don't have a
specific recommendation.

8.3 includes a parameter to log the usage of temporary files by
Postgres.
When a sort can't fit in the available memory, it uses a temp file, thus
you could use this new feature to track when sorts don't fit in
work_mem.

-- 
Bill Moran
Potential Technologies
http://www.potentialtech.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] dell versus hp

2007-11-09 Thread Vivek Khera


On Nov 8, 2007, at 3:56 PM, Alan Hodgson wrote:

 You can't touch RAID 10 for performance or reliability. The only reason
 to use RAID 5 or RAID 6 is to get more capacity out of the same drives.

Maybe you can't, but I can.  I guess I have better toys than you :-)


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] work_mem and shared_buffers

2007-11-09 Thread Bill Moran
On Fri, 9 Nov 2007 12:08:57 -0600
Campbell, Lance [EMAIL PROTECTED] wrote:

 How do you know when you should up the value of work_mem?  Just play
 with the number.  Is there a query I could do that would tell me if
 PostgreSql is performing SQL that could use more memory for sorting?

In 8.2 and older, it can be difficult to know, and I don't have a
specific recommendation.

8.3 includes a parameter to log the usage of temporary files by Postgres.
When a sort can't fit in the available memory, it uses a temp file, thus
you could use this new feature to track when sorts don't fit in
work_mem.
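For reference, the 8.3 parameter being described here is log_temp_files; a
minimal postgresql.conf fragment (8.3 and later only) might look like:

```
# Log the name and size of each temporary file when it is deleted.
# Value is a size threshold in kB: 0 logs every temp file, -1 disables.
log_temp_files = 0
```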

-- 
Bill Moran
Potential Technologies
http://www.potentialtech.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] work_mem and shared_buffers

2007-11-09 Thread Scott Marlowe
On Nov 9, 2007 12:08 PM, Campbell, Lance [EMAIL PROTECTED] wrote:
 How do you know when you should up the value of work_mem?  Just play
 with the number.  Is there a query I could do that would tell me if
 PostgreSql is performing SQL that could use more memory for sorting?

Trial and error.  Note that you can set work_mem for a given session.
While it may seem that making work_mem bigger will always help, that's
not necessarily the case.

Using this query:

select count(*) from (select * from myreporttable where lasttime >
now() - interval '1 week' order by random() ) as l

I did the following: (I ran the query by itself once to fill the
buffers / cache of the machine with the data)

work_mem Time:
1000kB 29215.563 ms
4000kB 20612.489 ms
8000kB 18408.087 ms
16000kB 16893.964 ms
32000kB 17681.221 ms
64000kB 22439.988 ms
125MB 23398.891 ms
250MB 25461.797 ms

Note that my best time was at around 16 Meg work_mem.  This data set
is MUCH bigger than 16 Meg, it's around 300-400 Meg.  But work_mem
optimized out at 16 Meg.  Btw, I tried it going as high as 768 Meg,
and it was still slower than 16M.

This machine has 2 Gigs ram and is optimized for IO not CPU performance.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] work_mem and shared_buffers

2007-11-09 Thread Campbell, Lance
It is amazing, how after working with databases very actively for over 8
years, I am still learning things.

Thanks,

Lance Campbell
Project Manager/Software Architect
Web Services at Public Affairs
University of Illinois
217.333.0382
http://webservices.uiuc.edu
 
-Original Message-
From: Scott Marlowe [mailto:[EMAIL PROTECTED] 
Sent: Friday, November 09, 2007 1:13 PM
To: Campbell, Lance
Cc: Heikki Linnakangas; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] work_mem and shared_buffers

On Nov 9, 2007 12:08 PM, Campbell, Lance [EMAIL PROTECTED] wrote:
 How do you know when you should up the value of work_mem?  Just play
 with the number.  Is there a query I could do that would tell me if
 PostgreSql is performing SQL that could use more memory for sorting?

Trial and error.  Note that you can set work_mem for a given session.
While it may seem that making work_mem bigger will always help, that's
not necessarily the case.

Using this query:

select count(*) from (select * from myreporttable where lasttime >
now() - interval '1 week' order by random() ) as l

I did the following: (I ran the query by itself once to fill the
buffers / cache of the machine with the data)

work_mem Time:
1000kB 29215.563 ms
4000kB 20612.489 ms
8000kB 18408.087 ms
16000kB 16893.964 ms
32000kB 17681.221 ms
64000kB 22439.988 ms
125MB 23398.891 ms
250MB 25461.797 ms

Note that my best time was at around 16 Meg work_mem.  This data set
is MUCH bigger than 16 Meg, it's around 300-400 Meg.  But work_mem
optimized out at 16 Meg.  Btw, I tried it going as high as 768 Meg,
and it was still slower than 16M.

This machine has 2 Gigs ram and is optimized for IO not CPU performance.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] work_mem and shared_buffers

2007-11-09 Thread Scott Marlowe
On Nov 9, 2007 1:19 PM, Campbell, Lance [EMAIL PROTECTED] wrote:
 It is amazing, how after working with databases very actively for over 8
 years, I am still learning things.

The fun thing about postgresql is that just when you've got it figured
out, somebody will come along and improve it in such a way as to make
your previously gathered knowledge obsolete.  In a good way.

I imagine in a few years, hardly anyone using postgresql will remember
the ancient art of having eight apostrophes in a row inside your
plpgsql functions...

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] work_mem and shared_buffers

2007-11-09 Thread Erik Jones

On Nov 9, 2007, at 1:24 PM, Scott Marlowe wrote:


On Nov 9, 2007 1:19 PM, Campbell, Lance [EMAIL PROTECTED] wrote:

 It is amazing, how after working with databases very actively for over
 8 years, I am still learning things.

The fun thing about postgresql is that just when you've got it figured
out, somebody will come along and improve it in such a way as to make
your previously gathered knowledge obsolete.  In a good way.

I imagine in a few years, hardly anyone using postgresql will remember
the ancient art of having eight apostrophes in a row inside your
plpgsql functions...


Speaking of that devil, I started working with Postgres mere months
after that particular evil went away, but we still have a good bit of
plpgsql with it in production.  I've been meaning to convert it and
clean it up for a while now.  Would you, or anybody, happen to know of
any scripts out there that would make a quick, no-brains-required job
of it?


Erik Jones

Software Developer | Emma®
[EMAIL PROTECTED]
800.595.4401 or 615.292.5888
615.292.0777 (fax)

Emma helps organizations everywhere communicate & market in style.
Visit us online at http://www.myemma.com



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] Estimation problem with a LIKE clause containing a /

2007-11-09 Thread Guillaume Smet
Tom,

Just to confirm you that your last commit fixed the problem:

lbo=# explain analyze select * from cms_items where ancestors LIKE '1062/%';
                                   QUERY PLAN
--------------------------------------------------------------------------------
 Seq Scan on cms_items  (cost=0.00..688.26 rows=*9097* width=103) (actual time=0.011..22.605 rows=11326 loops=1)
   Filter: ((ancestors)::text ~~ '1062/%'::text)
 Total runtime: 30.022 ms
(3 rows)

Thanks for your time.

--
Guillaume

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] work_mem and shared_buffers

2007-11-09 Thread Scott Marlowe
On Nov 9, 2007 2:38 PM, Erik Jones [EMAIL PROTECTED] wrote:

 
  I imagine in a few years, hardly anyone using postgresql will remember
  the ancient art of having either apostrophes in a row inside your
  plpgsql functions...

 Speaking of that devil, I started working with Postgres mere months
 after that particular evil went away but we still have a good bit of
 plpgsql with it in production.  I've been meaning to convert it and
 clean it up for a while now.  Would you, or anybody, happen to know
 of any scripts out there that I could grab to make a quick job, no
 brains required of it?

Man, I can't think of any.  I'd assume you'd need to look for the
longest occurrence of ' marks and replace it with one delimiter, say
$1$ or something, then the next smaller set with $2$, and so on.  I
imagine one could write a script to do it.  Luckily, we only had one or
two levels of ' marks in any of our stored procs, so it was only a few
minutes each time I edited one to switch it over.
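The innermost step of such a script could be sketched like this (hypothetical
helper, handling a single quoting level only — not a full plpgsql parser,
so Scott's "one or two levels" caveat still applies):

```python
def dollar_quote(escaped_body, tag="body"):
    """Convert one level of old-style quoting: take a function body as
    it appeared inside a single-quoted literal (apostrophes doubled),
    halve the apostrophes, and wrap the result in dollar quotes."""
    body = escaped_body.replace("''", "'")
    delim = "$" + tag + "$"
    if delim in body:
        raise ValueError("dollar-quote tag collides with body; pick another")
    return delim + body + delim

dollar_quote("select ''a''")   # "$body$select 'a'$body$"
```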

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Can I Determine if AutoVacuum Does Anything?

2007-11-09 Thread David Crane
We've had our PostgreSQL 8.1.4 installation configured to autovacuum
since January, but I suspect it might not be doing anything.  Perhaps I
can determine what happens through the log files?  Is there a summary of
which logging settings in postgresql.conf should be set to get at least
table-level messages about vacuum yes/no decisions?  The only message I
see now is very terse, indicating that autovacuum does run:

 

LOG:  autovacuum: processing database dc_prod

 

I suspect there's a problem because there appears to be 78% overhead in
the database size, whereas I would expect 10-15% based on what I've
read.  This is not good for some Seq Scan operations on large tables
(the root problem I'm starting to tackle).  Notes:

 

  [+] Last week I restored a production backup into my development
      sandbox with a psql -f, then ran a vacuumdb -a -z on it.  After
      that, I noticed that the size of the production database is 78%
      larger than development, using select pg_database_size('dc_prod')
      in pgAdmin3.  Prod is 5.9GB, but my Dev is 3.3GB.

  [+] The worst table has about 2.7x overhead, according to
      select relpages/reltuples from pg_class queries.
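A quick check of the arithmetic in the first note (helper name hypothetical;
the sizes are the ones reported above):

```python
def overhead_pct(prod_size_gb, dev_size_gb):
    """How much larger production is than a freshly restored and
    vacuumed copy of it, as a percentage."""
    return (prod_size_gb / dev_size_gb - 1) * 100

overhead_pct(5.9, 3.3)   # about 78.8 -- the "78%" figure above
```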

 

Here are the relevant postgresql.conf settings in production.  I can't
speak to their suitability, but I think they should reclaim some unused
space for reuse.

 

#stats_start_collector = on
#stats_block_level = off
stats_row_level = on
#stats_reset_on_server_start = off

autovacuum = on
autovacuum_naptime = 360
autovacuum_vacuum_threshold = 1000
autovacuum_analyze_threshold = 500
autovacuum_vacuum_scale_factor = 0.04
autovacuum_analyze_scale_factor = 0.02
autovacuum_vacuum_cost_delay = 10
autovacuum_vacuum_cost_limit = -1

 

I was suspicious that stats_row_level might not work because
stats_block_level is off.  But I see pg_stat_user_tables.n_tup_ins,
pg_stat_user_tables.n_tup_upd and pg_stat_user_tables.n_tup_del are all
increasing (slowly but surely).

 

Thanks,

David Crane

http://www.donorschoose.org http://www.donorschoose.org 

Teachers Ask. You Choose. Students Learn.



Re: [PERFORM] Can I Determine if AutoVacuum Does Anything?

2007-11-09 Thread Alvaro Herrera
David Crane wrote:
 We've had our PostgreSQL 8.1.4 installation configured to autovacuum
 since January, but I suspect it might not be doing anything.  Perhaps I
 can determine what happens through the log files?  Is there a summary of
 which logging settings in postgresql.conf should be set to get at least
 table-level messages about vacuum yes/no decisions?  The only message I
 see now is very terse, indicating that autovacuum does run:

Yeah, you have to set log_min_messages to debug2 to get useful output
for autovacuum.  This is fixed in 8.3, but for earlier version there is
nothing short of patching the server.

 autovacuum = on
 
 autovacuum_naptime = 360

This is a bit on the high side, but it may not be very important.  Keep
in mind that in 8.2 and earlier it means the time between autovac
checks, so if there are many databases, it could be a long time between
one autovac run in a particular database and the next.  (In 8.3 it has
been redefined to mean the interval between runs on each database.)

 autovacuum_vacuum_threshold = 1000
 autovacuum_analyze_threshold = 500

These are the default values but for small tables they seem high as
well.  IIRC your problem is actually with big tables, for which it
doesn't make much of a difference.


-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Join performance

2007-11-09 Thread Russell Smith

Pepe Barbe wrote:

 Hello,

 I am having an issue on PostgreSQL 8.0.12. In the past we had
 performance issues with the query planner for queries on some tables
 where we knew we had indexes and it was doing a sequential scan, and
 for this reason we issue SET enable_seqscan = FALSE for some queries.

 Recently we have stumbled upon one of these kinds of queries that is
 giving terrible performance because seqscan is disabled. I've reduced
 the problem to a command like this one:

 SELECT * from gsm_sector_metrics NATURAL JOIN gsm_amr_metrics INNER
 JOIN temp_busy_hr USING(start_time,bsc_id,sect_id);

 Where temp_busy_hr is a temporary table.

Have you tried analyzing the temp_busy_hr table?
Possibly adding an index to the temp table can help if you are doing
lots of queries.

 If the previous is issued with seqscan TRUE, it runs within reasonable
 time, else it runs forever. The query plan for the previous query with
 enable_seqscan = TRUE:

It would be worth knowing how far out the estimates are.  Also, have
you tried altering the statistics target for the relevant columns to
increase the accuracy?


                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..384555.98 rows=1 width=3092)
   ->  Nested Loop  (cost=0.00..384555.98 rows=1 width=3092)
         Join Filter: (("inner".bsc_id = "outer".bsc_id) AND ("inner".site_id = "outer".site_id) AND ("inner".sect_id = "outer".sect_id))
         ->  Nested Loop  (cost=0.00..368645.64 rows=28 width=1192)
               Join Filter: (("outer".sect_id = "inner".sect_id) AND ("outer".bsc_id = "inner".bsc_id))
               ->  Seq Scan on temp_busy_hr  (cost=0.00..24.00 rows=1400 width=24)
               ->  Index Scan using gsm_amr_start_time_idx on gsm_amr_metrics  (cost=0.00..226.66 rows=2094 width=1168)
                     Index Cond: ("outer".start_time = gsm_amr_metrics.start_time)
         ->  Index Scan using gsm_sector_start_time_idx on gsm_sector_metrics t1  (cost=0.00..528.77 rows=1973 width=1936)
               Index Cond: (t1.start_time = "outer".start_time)
(10 rows)

and the plan for enable_seqscan = FALSE:

                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=100010097.16..100720844.01 rows=1 width=3092)
   ->  Nested Loop  (cost=100010097.16..100720844.01 rows=1 width=3092)
         Join Filter: (("inner".bsc_id = "outer".bsc_id) AND ("inner".site_id = "outer".site_id) AND ("inner".sect_id = "outer".sect_id))
         ->  Merge Join  (cost=100010097.16..100704933.67 rows=28 width=1192)
               Merge Cond: ("outer".start_time = "inner".start_time)
               Join Filter: (("inner".sect_id = "outer".sect_id) AND ("inner".bsc_id = "outer".bsc_id))
               ->  Index Scan using gsm_amr_start_time_idx on gsm_amr_metrics  (cost=0.00..631211.45 rows=6005551 width=1168)
               ->  Sort  (cost=100010097.16..100010100.66 rows=1400 width=24)
                     Sort Key: temp_busy_hr.start_time
                     ->  Seq Scan on temp_busy_hr  (cost=100000000.00..100000024.00 rows=1400 width=24)
         ->  Index Scan using gsm_sector_start_time_idx on gsm_sector_metrics t1  (cost=0.00..528.77 rows=1973 width=1936)
               Index Cond: (t1.start_time = "outer".start_time)
(12 rows)

Any ideas what could I try to fix this problem?

Thanks,
Pepe





---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match