Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum

2007-01-10 Thread Florian Weimer
* Jeremy Haile:

 Good advice on the partitioning idea.  I may have to restructure some of
 my queries, since some of them query across the whole range - but it may
 be a much more performant solution.  How is the performance when
 querying across a set of partitioned tables vs. querying on a single
 table with all rows?

Locality of access decreases, of course, and depending on your data
size, you hit something like 2 or 4 additional disk seeks per
partition for index-based accesses.  Sequential scans are not
impacted.

 Does my current approach of disabling autovacuum and manually vacuuming
 once-an-hour sound like a good idea, or would I likely have better
 results by auto-vacuuming and turning row-level stats back on?

Sorry, I haven't got much experience with autovacuum, since most of
our databases are INSERT-only (or get VACUUMed automatically after
major updates).

-- 
Florian Weimer[EMAIL PROTECTED]
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Horribly slow query/ sequential scan

2007-01-10 Thread Tom Lane
I wrote:
 ... What seems to be happening is that Informix is willing to
 flatten the sub-SELECT into an IN join even though the sub-SELECT is
 correlated to the outer query (that is, it contains outer references).

I did some googling this morning and found confirmation that recent
versions of Informix have pretty extensive support for optimizing
correlated subqueries:
http://www.iiug.org/waiug/archive/iugnew83/FeaturesIDS73.htm
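
For concreteness, a correlated sub-SELECT is one whose inner query
refers to the outer one, as in this hypothetical:

   select * from orders o
   where o.customer_id in (select c.customer_id
                           from customers c
                           where c.region = o.region);

The inner WHERE clause references o.region from the outer query, which
is what ordinarily prevents flattening it into a plain join.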

This is something we've not really spent much time on for Postgres,
but it might be interesting to look at someday.  Given that the problem
with your query was really a mistake anyway, I'm not sure that your
example is compelling evidence for making it a high priority.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Slow inner join, but left join is fast

2007-01-10 Thread Jeremy Haile
I have a query made by joining two subqueries where the outer query
performing the join takes significantly longer to run than the two
subqueries.  

The first subquery runs in 600ms.  The second subquery runs in 700ms.
But the outer query takes 240 seconds to run!  Each of the two
subqueries returns only 8728 rows.

Changing the inner join to a left join makes the outer query run in
about 1000ms (which is great), but I don't understand why the inner join
is so slow!

I'm using PostgreSQL 8.2.1.  Any ideas?

QUERY PLAN (Inner Join) - takes 240 seconds
---
Nested Loop  (cost=17.46..17.56 rows=1 width=120)
  Join Filter: ((a.merchant_dim_id = b.merchant_dim_id) AND (a.dcms_dim_id = b.dcms_dim_id))
  ->  HashAggregate  (cost=8.71..8.74 rows=1 width=16)
        ->  Index Scan using transaction_facts_transaction_date_idx on transaction_facts  (cost=0.00..8.69 rows=1 width=16)
              Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
  ->  HashAggregate  (cost=8.75..8.78 rows=1 width=16)
        ->  HashAggregate  (cost=8.71..8.72 rows=1 width=55)
              ->  Index Scan using transaction_facts_transaction_date_idx on transaction_facts  (cost=0.00..8.69 rows=1 width=55)
                    Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))


QUERY PLAN (Left Join) - takes one second
---
Merge Left Join  (cost=304037.63..304064.11 rows=2509 width=120)
  Merge Cond: ((a.dcms_dim_id = b.dcms_dim_id) AND (a.merchant_dim_id = b.merchant_dim_id))
  ->  Sort  (cost=152019.45..152025.72 rows=2509 width=64)
        Sort Key: a.dcms_dim_id, a.merchant_dim_id
        ->  HashAggregate  (cost=151771.15..151852.69 rows=2509 width=16)
              ->  Bitmap Heap Scan on transaction_facts  (cost=5015.12..150419.90 rows=77214 width=16)
                    Recheck Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
                    ->  Bitmap Index Scan on transaction_facts_transaction_date_idx  (cost=0.00..4995.81 rows=77214 width=0)
                          Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
  ->  Sort  (cost=152018.18..152020.54 rows=943 width=64)
        Sort Key: b.dcms_dim_id, b.merchant_dim_id
        ->  Subquery Scan b  (cost=151931.51..151971.59 rows=943 width=64)
              ->  HashAggregate  (cost=151931.51..151962.16 rows=943 width=16)
                    ->  HashAggregate  (cost=151578.11..151672.35 rows=9424 width=55)
                          ->  Bitmap Heap Scan on transaction_facts  (cost=5015.12..150419.90 rows=77214 width=55)
                                Recheck Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
                                ->  Bitmap Index Scan on transaction_facts_transaction_date_idx  (cost=0.00..4995.81 rows=77214 width=0)
                                      Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))


QUERY
---
select a.merchant_dim_id, a.dcms_dim_id, 
   a.num_success, a.num_failed, a.total_transactions,
   a.success_rate,
   b.distinct_num_success, b.distinct_num_failed,
   b.distinct_total_transactions, b.distinct_success_rate
from (

-- SUBQUERY 1
select merchant_dim_id, 
   dcms_dim_id,
   sum(success) as num_success, 
   sum(failed) as num_failed, 
   count(*) as total_transactions,
   (sum(success) * 1.0 / count(*)) as success_rate 
from transaction_facts 
where transaction_date >= '2007-1-9' 
and transaction_date < '2007-1-9 9:30' 
group by merchant_dim_id, dcms_dim_id

) as a inner join (

-- SUBQUERY 2
select merchant_dim_id, 
   

Re: [PERFORM] group by will not use an index?

2007-01-10 Thread tsuraan

For the reasons indicated (that is, MVCC), PG cannot do a DISTINCT or the
equivalent GROUP BY from index values alone.



Ok, that makes sense.  Thanks for the help everybody!

If this table is large, perhaps you could denormalize and maintain a

summary table with date (using truncation) and count, updated with
triggers on the original table. This table will presumably have a
small number of rows at the cost of doubling the times for updates,
inserts, and deletes.



Well, the inserted time, at least, is never updated, and deletions are very
rare (never, so far), so I'll have a look at doing things that way.  Thanks!
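
A minimal sketch of the suggested trigger-maintained summary table
(hypothetical names; concurrent inserts for the same day would need
locking or retry logic on top of this):

   CREATE TABLE daily_counts (day date PRIMARY KEY, n bigint NOT NULL);

   CREATE OR REPLACE FUNCTION bump_daily_count() RETURNS trigger AS $$
   BEGIN
       -- truncate the insertion timestamp to the day and count it
       UPDATE daily_counts SET n = n + 1 WHERE day = NEW.inserted::date;
       IF NOT FOUND THEN
           INSERT INTO daily_counts VALUES (NEW.inserted::date, 1);
       END IF;
       RETURN NULL;  -- return value is ignored for AFTER triggers
   END;
   $$ LANGUAGE plpgsql;

   CREATE TRIGGER items_daily_count AFTER INSERT ON items
       FOR EACH ROW EXECUTE PROCEDURE bump_daily_count();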


Re: [PERFORM] Slow inner join, but left join is fast

2007-01-10 Thread Tom Lane
Jeremy Haile [EMAIL PROTECTED] writes:
 I have a query made by joining two subqueries where the outer query
 performing the join takes significantly longer to run than the two
 subqueries.  

Please show EXPLAIN ANALYZE results, not just EXPLAIN.
Also, have you analyzed your tables recently?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Slow inner join, but left join is fast

2007-01-10 Thread Jeremy Haile
The table should have been analyzed, but to make sure I ran analyze on
the table before executing the explain analyze queries.  Well - problem
solved.  This time the inner join query runs quickly.  

I still don't understand why the inner join would be so different from
the left join prior to the analyze.  It looks like the number of rows
expected in the original query plan for the inner join was 1 (not
correct, since it was really 8728).  The left join query had the exact
same subqueries but expected 77214 rows to be returned from them, which
was still not correct but resulted in a better query plan.

After the recent analyze, here's the new inner join query plan.  I won't
bother pasting the left join plan, since it is almost identical now
(including row counts).  FYI - the result of the queries is (and always
was) identical for the inner join and the left join.


QUERY PLAN (inner join)
Merge Join  (cost=279457.86..279479.83 rows=43 width=120) (actual time=626.771..670.275 rows=8728 loops=1)
  Merge Cond: ((a.dcms_dim_id = b.dcms_dim_id) AND (a.merchant_dim_id = b.merchant_dim_id))
  ->  Sort  (cost=139717.30..139722.38 rows=2029 width=64) (actual time=265.669..269.878 rows=8728 loops=1)
        Sort Key: a.dcms_dim_id, a.merchant_dim_id
        ->  HashAggregate  (cost=139519.61..139585.56 rows=2029 width=16) (actual time=211.368..247.429 rows=8728 loops=1)
              ->  Bitmap Heap Scan on transaction_facts  (cost=4427.62..138316.05 rows=68775 width=16) (actual time=21.858..100.998 rows=65789 loops=1)
                    Recheck Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
                    ->  Bitmap Index Scan on transaction_facts_transaction_date_idx  (cost=0.00..4410.42 rows=68775 width=0) (actual time=21.430..21.430 rows=65789 loops=1)
                          Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
  ->  Sort  (cost=139740.56..139742.67 rows=843 width=64) (actual time=361.083..365.418 rows=8728 loops=1)
        Sort Key: b.dcms_dim_id, b.merchant_dim_id
        ->  Subquery Scan b  (cost=139663.76..139699.59 rows=843 width=64) (actual time=308.567..346.135 rows=8728 loops=1)
              ->  HashAggregate  (cost=139663.76..139691.16 rows=843 width=16) (actual time=308.563..337.677 rows=8728 loops=1)
                    ->  HashAggregate  (cost=139347.68..139431.97 rows=8429 width=55) (actual time=198.093..246.591 rows=48942 loops=1)
                          ->  Bitmap Heap Scan on transaction_facts  (cost=4427.62..138316.05 rows=68775 width=55) (actual time=24.080..83.988 rows=65789 loops=1)
                                Recheck Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
                                ->  Bitmap Index Scan on transaction_facts_transaction_date_idx  (cost=0.00..4410.42 rows=68775 width=0) (actual time=23.596..23.596 rows=65789 loops=1)
                                      Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
Total runtime: 675.638 ms



On Wed, 10 Jan 2007 12:15:44 -0500, Tom Lane [EMAIL PROTECTED] said:
 Jeremy Haile [EMAIL PROTECTED] writes:
  I have a query made by joining two subqueries where the outer query
  performing the join takes significantly longer to run than the two
  subqueries.  
 
 Please show EXPLAIN ANALYZE results, not just EXPLAIN.
 Also, have you analyzed your tables recently?
 
   regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Slow inner join, but left join is fast

2007-01-10 Thread Tom Lane
Jeremy Haile [EMAIL PROTECTED] writes:
 I still don't understand why the inner join would be so different from
 the left join prior to the analyze.

Are you sure you hadn't analyzed in between?  Or maybe autovac did it
for you?  The reason for the plan change is the change from estimating
1 row matching the transaction_date range constraint, to estimating lots
of them, and the join type away up at the top would surely not have
affected that.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Slow inner join, but left join is fast

2007-01-10 Thread Jeremy Haile
I'm pretty sure it didn't analyze in between  - autovac is turned off
and I ran the test multiple times before posting.  

But since I can't reproduce it anymore, I can't be 100% sure.  And it
certainly doesn't make sense that the estimate for the index scan would
change based on an unrelated join condition.

If I ever get it to happen again, I'll be more careful and repost if it
is a real issue.  Thanks for pointing me in the right direction!


On Wed, 10 Jan 2007 13:38:15 -0500, Tom Lane [EMAIL PROTECTED] said:
 Jeremy Haile [EMAIL PROTECTED] writes:
  I still don't understand why the inner join would be so different from
  the left join prior to the analyze.
 
 Are you sure you hadn't analyzed in between?  Or maybe autovac did it
 for you?  The reason for the plan change is the change from estimating
 1 row matching the transaction_date range constraint, to estimating lots
 of them, and the join type away up at the top would surely not have
 affected that.
 
   regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Slow inner join, but left join is fast

2007-01-10 Thread Jeremy Haile
Another random idea - does PostgreSQL do any caching of query plans,
even at the session level?  

I ran these queries from the same Query window, so my idea is that maybe
the inner join plan was cached prior to an automatic analyze being run.  

But I'm doubting PostgreSQL would do something like that.  And of
course, if PostgreSQL doesn't cache query plans - this idea is bogus =)


On Wed, 10 Jan 2007 13:38:24 -0500, Jeremy Haile [EMAIL PROTECTED]
said:
 I'm pretty sure it didn't analyze in between  - autovac is turned off
 and I ran the test multiple times before posting.  
 
 But since I can't reproduce it anymore, I can't be 100% sure.  And it
 certainly doesn't make sense that the estimate for the index scan would
 change based on an unrelated join condition.
 
 If I ever get it to happen again, I'll be more careful and repost if it
 is a real issue.  Thanks for pointing me in the right direction!
 
 
 On Wed, 10 Jan 2007 13:38:15 -0500, Tom Lane [EMAIL PROTECTED] said:
  Jeremy Haile [EMAIL PROTECTED] writes:
   I still don't understand why the inner join would be so different from
   the left join prior to the analyze.
  
  Are you sure you hadn't analyzed in between?  Or maybe autovac did it
  for you?  The reason for the plan change is the change from estimating
  1 row matching the transaction_date range constraint, to estimating lots
  of them, and the join type away up at the top would surely not have
  affected that.
  
  regards, tom lane
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Slow inner join, but left join is fast

2007-01-10 Thread Tom Lane
Jeremy Haile [EMAIL PROTECTED] writes:
 Another random idea - does PostgreSQL do any caching of query plans?

Only if the client specifies it, either by PREPARE or the equivalent
protocol-level message.  I dunno what client software you were using,
but I think few if any would PREPARE behind your back.  Might be worth
checking into though, if you've eliminated autovacuum.
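
For reference, explicit session-level plan caching looks like this
(any client can issue it; table name taken from the earlier thread):

   PREPARE recent(timestamp) AS
       SELECT count(*) FROM transaction_facts
       WHERE transaction_date >= $1;
   EXECUTE recent('2007-01-09');  -- reuses the plan built by PREPARE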

Actually there's another possibility --- did you create any indexes on
the table in between?  CREATE INDEX doesn't do a full stats update, but
it does count the rows and update pg_class.reltuples.  But it's hard to
believe that'd have caused as big a rowcount shift as we see here ...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Slow inner join, but left join is fast

2007-01-10 Thread Jeremy Haile
I did create and drop an index at some point while looking at this
issue.  But I definitely reran both of the queries (and explains) after
the index was dropped, so I don't understand why there would be a
difference between the inner and left query plans.  (which were run
back-to-back more than once)  Anyways - I'll let you know if something
similar happens again.

Thanks,
Jeremy Haile 


On Wed, 10 Jan 2007 14:22:35 -0500, Tom Lane [EMAIL PROTECTED] said:
 Jeremy Haile [EMAIL PROTECTED] writes:
  Another random idea - does PostgreSQL do any caching of query plans?
 
 Only if the client specifies it, either by PREPARE or the equivalent
 protocol-level message.  I dunno what client software you were using,
 but I think few if any would PREPARE behind your back.  Might be worth
 checking into though, if you've eliminated autovacuum.
 
 Actually there's another possibility --- did you create any indexes on
 the table in between?  CREATE INDEX doesn't do a full stats update, but
 it does count the rows and update pg_class.reltuples.  But it's hard to
 believe that'd have caused as big a rowcount shift as we see here ...
 
   regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] performance implications of binary placement

2007-01-10 Thread Jim C. Nasby
Are you 100% certain that both builds are using all the same libraries?
And to be an apples-apples comparison, you really need to ensure that
the datadir is on the same filesystem in both cases (that's the first
thing I'd check).

Also, that pg_index... error sounds like the second build has been
corrupted.

On Tue, Dec 26, 2006 at 03:37:47PM -0500, Bob Dusek wrote:
 Hello all,
 
 I've been running performance tests on various incantations of Postgres
 on/off for a month or so.  And, I've just come across some unexpected
 results.
 
 When I start my Postgres build as such:
 
 # (Scenario 1)
 
 ./configure --prefix=/usr --libdir=/usr/lib --bindir=/usr/bin
 --includedir=/usr/include/pgsql --datadir=/usr/share/postgresql
 --mandir=/usr/share/man --with-docdir=/usr/share/doc/packages
 --disable-rpath --enable-thread-safety --enable-integer-datetimes
 --without-python --without-perl --without-tcl --without-tk
 
 It performs significantly worse than when I start my build like this:
 
 # (Scenario 2)
 
 ./configure --disable-rpath --enable-thread-safety
 --enable-integer-datetimes --without-python --without-perl --without-tcl
 --without-tk
 
 Note:  the only differences are that Scenario 1 includes these
 options:
 
 --prefix=/usr --libdir=/usr/lib --bindir=/usr/bin
 --includedir=/usr/include/pgsql --datadir=/usr/share/postgresql
 --mandir=/usr/share/man --with-docdir=/usr/share/doc/packages
 
 And, to be clear, Scenario 1 performs worse than Scenario 2.  Simple
 insert statements are taking significantly longer. 
 
 I did not expect to see a performance hit with these options, especially
 since /usr/ on the test machine is mounted as its own partition, and
 in both cases, all of the binaries, include files, etc. are in that
 partition. 
 
 Has anyone seen this before?  Are hard drive mechanics the only thing in
 play here?
 
 The only difference I'm seeing in logging between the two versions is
 that Scenario 2 has several instances of this message littered
 throughout the logfile:
 
 ERROR: could not open relation pg_index_indexrelid_index: No such file
 or directory
 
 But, that doesn't seem to be affecting functionality or performance
 (especially considering the fact that the logfile that contains that
 message is part of the test that is performing better).
 
 We're using Postgres 7.4.8, building from the SLES9 Postgres 7.4.8
 source rpm. 
 
 Thanks for any help you can provide.  I can provide more detail if
 needed.
 
 Thanks again,
 
 Bob 
 
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
 

-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] More 8.2 client issues (Was: [Slow dump?)

2007-01-10 Thread Jim C. Nasby
On Wed, Jan 03, 2007 at 11:56:20AM -0500, Tom Lane wrote:
 Erik Jones [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  I could see this taking an unreasonable amount of time if you had a huge
  number of pg_class rows or a very long search_path --- is your database
  at all out of the ordinary in those ways?
  
  Well, running select count(*) from pg_class; returns 524699 rows
 
 Ouch.
 
  our search path is the default.  I'd also like to reiterate that \d 
  pg_class returns instantly when run from the 8.1.4 psql client connected 
  to the 8.2 db.
 
 I think I know where the problem is: would you compare timing of
 
   select * from pg_class where relname ~ '^(pg_class)$';
  
   select * from pg_class where relname ~ '^pg_class$';
 
 Recent versions of psql put parentheses into the regex pattern for
 safety in case it's got |, but I just realized that that probably
 confuses the optimizer's check for an indexable regex :-(
 
 However, this only explains slowdown in psql's \d commands, which
 wasn't your original complaint ...

On the other hand, with 500k relations pg_dump is presumably going to be
doing a lot of querying of the catalog tables, so if it uses similar
queries...
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performance of PostgreSQL on Windows vs Linux

2007-01-10 Thread Jim C. Nasby
On Wed, Jan 03, 2007 at 12:24:24PM -0500, Jeremy Haile wrote:
 I am sure that this has been discussed before, but I can't seem to find
 any recent posts. (I am running PostgreSQL 8.2)
 
 I have always run PostgreSQL on Linux in the past, but the company I am
 currently working for uses Windows on all of their servers.  I don't
 have the luxury right now of running my own benchmarks on the two OSes,
 but wanted to know if anyone else has done a performance comparison.  Is
 there any significant differences?

One thing to consider... I've seen a case or two where pgbench running
on windows with HyperThreading enabled was actually faster than with it
turned off. (General experience has been that HT hurts PostgreSQL). I
suspect that the windows kernel may have features that allow it to
better utilize HT than linux.

Of course if you don't have HT... it doesn't matter. :)
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL to host e-mail?

2007-01-10 Thread Jim C. Nasby
On Fri, Jan 05, 2007 at 01:15:44PM -0500, Reid Thompson wrote:
 On Fri, 2007-01-05 at 04:10 +0100, Grega Bremec wrote:
  The main reason I'm writing this mail though, is to suggest you take a
  look
  at Oryx, http://www.oryx.com/; They used to have this product called
  Mailstore, which was designed to be a mail store using PostgreSQL as a
  backend, and has since evolved to a bit more than just that, it seems.
  Perhaps it could be of help to you while building your system, and I'm
  sure
  the people at Oryx will be glad to hear from you while, and after
  you've
  built your system.
  
  Kind regards,
  --
  ~Grega Bremec 
 re above...
 http://www.archiveopteryx.org/1.10.html 

You should also look at http://dbmail.org/ , which runs on several
databases (PostgreSQL included).
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Partitioning

2007-01-10 Thread Jim C. Nasby
BTW, someone coming up with a set of functions to handle partitioning
for the general 'partition by time' case would make a GREAT project on
pgFoundry.

On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote:
 Take a look at the set of partitioning functions I wrote shortly after
 the 8.1 release:
 
 http://www.studenter.hb.se/~arch/files/part_functions.sql
 
 You could probably work something out using those functions (as-is, or
 as inspiration) together with pgAgent
 (http://www.pgadmin.org/docs/1.4/pgagent.html)
 
 /Mikael
 
  -Original Message-
  From: [EMAIL PROTECTED]
 [mailto:pgsql-performance-
  [EMAIL PROTECTED] On Behalf Of Arnau
  Sent: den 5 januari 2007 12:02
  To: pgsql-performance@postgresql.org
  Subject: [PERFORM] Partitioning
  
  Hi all,
  
 I'm not sure if this question fits in the topic of this list.
  
 I'm interested in partitioning and it's the first time I'd use it.
  There is an issue I don't know how you handle. Let's say I'm
  interested in storing monthly statistical data like the example at
  http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html. What I
  don't like about this approach is that the monthly tables, rules...
  must be created manually, or at least I haven't found any other option.
  
  My question is: how do you manage this? Do you have a cron task that
  automatically creates these monthly elements (tables, rules, ...), or
  is there another approach that doesn't require external things like
  cron - only PostgreSQL?
  --
  Arnau
  
  ---(end of
 broadcast)---
  TIP 5: don't forget to increase your free space map settings
 
 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
 

-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] table partitioning performance

2007-01-10 Thread Jim C. Nasby
On Mon, Jan 08, 2007 at 03:02:24PM -0500, Steven Flatt wrote:
 We use partitioned tables extensively and we have observed linear
 performance degradation on inserts as the number of rules on the master
 table grows (i.e. number of rules = number of partitions).  We had to come
 up with a solution that didn't have a rule per partition on the master
 table.  Just wondering if you are observing the same thing.

Except for the simplest partitioning cases, you'll be much better off
using a trigger on the parent table to direct inserts/updates/deletes to
the children. As a bonus, using a trigger makes it a lot more realistic
to deal with an update moving data between partitions.
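
A minimal sketch of that approach, with hypothetical names (an events
parent partitioned by month on a created column; NEW.* expansion in
VALUES works on recent releases, otherwise list the columns explicitly):

   CREATE OR REPLACE FUNCTION events_route() RETURNS trigger AS $$
   BEGIN
       IF NEW.created >= DATE '2007-01-01'
          AND NEW.created < DATE '2007-02-01' THEN
           INSERT INTO events_y2007m01 VALUES (NEW.*);
       ELSE
           RAISE EXCEPTION 'no partition for %', NEW.created;
       END IF;
       RETURN NULL;  -- row was routed to a child, don't store it in events
   END;
   $$ LANGUAGE plpgsql;

   CREATE TRIGGER events_insert BEFORE INSERT ON events
       FOR EACH ROW EXECUTE PROCEDURE events_route();
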
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-10 Thread Jim C. Nasby
On Tue, Jan 09, 2007 at 09:10:51AM -0800, Jeff Frost wrote:
 On Tue, 9 Jan 2007, Jim C. Nasby wrote:
 
 On Thu, Dec 28, 2006 at 02:15:31PM -0800, Jeff Frost wrote:
 When benchmarking various options for a new PG server at one of my 
 clients,
 I tried ext2 and ext3 (data=writeback) for the WAL and it appeared to be
 fastest to have ext2 for the WAL.  The winning time was 157m46.713s for
 ext2, 159m47.098s for combined ext3 data/xlog and 158m25.822s for ext3
 data=writeback.  This was on an 8x150GB Raptor RAID10 on an Areca 1130 w/
 1GB BBU cache.  This config benched out faster than a 6disk RAID10 + 2 
 disk
 RAID1 for those of you who have been wondering if the BBU write back cache
 mitigates the need for separate WAL (at least on this workload).  Those 
 are
 the fastest times for each config, but ext2 WAL was always faster than the
 other two options.  I didn't test any other filesystems in this go around.
 
 Uh, if I'm reading this correctly, you're saying that WAL on a separate
 ext2 vs. one big ext3 with data=writeback saved ~39 seconds out of
 ~158.5 minutes, or 0.4%? Is that even above the noise for your
 measurements? I suspect the phase of the moon might play a bigger role
 ;P
 
 That's what I thought too...cept I ran it 20 times and ext2 won by that 
 margin every time, so it was quite repeatable. :-/

Even so, you've got to really be hunting for performance to go through
the hassle of different filesystems just to gain 0.4%... :)
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Partitioning

2007-01-10 Thread Erik Jones

On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote:

Take a look at the set of partitioning functions I wrote shortly after
the 8.1 release:

http://www.studenter.hb.se/~arch/files/part_functions.sql

You could probably work something out using those functions (as-is, or
as inspiration) together with pgAgent
(http://www.pgadmin.org/docs/1.4/pgagent.html)

/Mikael


Those are some great functions.

--
erik jones [EMAIL PROTECTED]
software development
emma(r)


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Partitioning

2007-01-10 Thread Jeremy Haile
I really wish that PostgreSQL supported a nice partitioning syntax
like MySQL has.  

Here is an example:
CREATE TABLE tr (id INT, name VARCHAR(50), purchased DATE)
  PARTITION BY RANGE( YEAR(purchased) ) (
  PARTITION p0 VALUES LESS THAN (1990),
  PARTITION p1 VALUES LESS THAN (1995),
  PARTITION p2 VALUES LESS THAN (2000),
  PARTITION p3 VALUES LESS THAN (2005)
);

And to drop a partition:
ALTER TABLE tr DROP PARTITION p2;


This seems so much more intuitive and simpler than what is required to
set it up in PostgreSQL.  Does PostgreSQL's approach to table
partitioning have any advantage over MySQL?  Is a nicer syntax planned
for Postgres?
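
For comparison, the inheritance-based setup in PostgreSQL today looks
roughly like this sketch (constraint_exclusion must be on for the
planner to skip partitions):

   CREATE TABLE tr (id int, name varchar(50), purchased date);

   CREATE TABLE tr_p0 (CHECK (purchased < DATE '1990-01-01'))
       INHERITS (tr);
   CREATE TABLE tr_p1 (CHECK (purchased >= DATE '1990-01-01'
                          AND purchased < DATE '1995-01-01'))
       INHERITS (tr);
   -- ...one child per range, plus a rule or trigger on tr to route inserts

   -- dropping a partition is just dropping the child table:
   DROP TABLE tr_p1;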


On Wed, 10 Jan 2007 14:20:06 -0600, Jim C. Nasby [EMAIL PROTECTED] said:
 BTW, someone coming up with a set of functions to handle partitioning
 for the general 'partition by time' case would make a GREAT project on
 pgFoundry.
 
 On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote:
  Take a look at the set of partitioning functions I wrote shortly after
  the 8.1 release:
  
  http://www.studenter.hb.se/~arch/files/part_functions.sql
  
  You could probably work something out using those functions (as-is, or
  as inspiration) together with pgAgent
  (http://www.pgadmin.org/docs/1.4/pgagent.html)
  
  /Mikael
  
   -Original Message-
   From: [EMAIL PROTECTED]
  [mailto:pgsql-performance-
   [EMAIL PROTECTED] On Behalf Of Arnau
   Sent: den 5 januari 2007 12:02
   To: pgsql-performance@postgresql.org
   Subject: [PERFORM] Partitioning
   
   Hi all,
   
  I'm not sure if this question fits in the topic of this list.
   
  I'm interested in partitioning and it's the first time I'd use it.
   There is an issue I don't know how you handle. Let's say I'm
   interested in storing monthly statistical data like the example at
   http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html. What I
   don't like about this approach is that the monthly tables, rules...
   must be created manually, or at least I haven't found any other option.
   
   My question is: how do you manage this? Do you have a cron task that
   automatically creates these monthly elements (tables, rules, ...), or
   is there another approach that doesn't require external things like
   cron - only PostgreSQL?
   --
   Arnau
   
   ---(end of
  broadcast)---
   TIP 5: don't forget to increase your free space map settings
  
  
  ---(end of broadcast)---
  TIP 9: In versions below 8.0, the planner will ignore your desire to
 choose an index scan if your joining column's datatypes do not
 match
  
 
 -- 
 Jim Nasby[EMAIL PROTECTED]
 EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-10 Thread Jeff Frost

On Wed, 10 Jan 2007, Jim C. Nasby wrote:


RAID1 for those of you who have been wondering if the BBU write back cache
mitigates the need for separate WAL (at least on this workload).  Those
are
the fastest times for each config, but ext2 WAL was always faster than the
other two options.  I didn't test any other filesystems in this go around.


Uh, if I'm reading this correctly, you're saying that WAL on a separate
ext2 vs. one big ext3 with data=writeback saved ~39 seconds out of
~158.5 minutes, or 0.4%? Is that even above the noise for your
measurements? I suspect the phase of the moon might play a bigger role
;P


That's what I thought too...cept I ran it 20 times and ext2 won by that
margin every time, so it was quite repeatable. :-/


Even so, you've got to really be hunting for performance to go through
the hassle of different filesystems just to gain 0.4%... :)


Indeed, but actually, I did the math again and it appears that it saves close 
to 2 minutes versus one big ext3.  I guess the moral of the story is that 
having a separate pg_xlog even on the same physical volume tends to be 
slightly faster for write-oriented workloads.  Ext2 is slightly faster than 
ext3, but of course you could likely go with yet another filesystem and be 
even slightly faster as well. :-)


I guess the real moral of the story is that you can probably use one big ext3 
with the default config and it won't matter much more than 1-2% if you have a 
BBU.


--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Performance of PostgreSQL on Windows vs Linux

2007-01-10 Thread Jeremy Haile
Hey Jim - 

Thanks for the feedback.  The server has dual Xeons with HyperThreading
enabled - so perhaps I should try disabling it.  How much performance
boost have you seen by disabling it?  Of course, the bottleneck in my
case is more on the I/O or RAM side, not the CPU side.

Jeremy Haile


On Wed, 10 Jan 2007 14:15:26 -0600, Jim C. Nasby [EMAIL PROTECTED] said:
 On Wed, Jan 03, 2007 at 12:24:24PM -0500, Jeremy Haile wrote:
  I am sure that this has been discussed before, but I can't seem to find
  any recent posts. (I am running PostgreSQL 8.2)
  
  I have always run PostgreSQL on Linux in the past, but the company I am
  currently working for uses Windows on all of their servers.  I don't
  have the luxury right now of running my own benchmarks on the two OSes,
  but wanted to know if anyone else has done a performance comparison.  Is
  there any significant differences?
 
 One thing to consider... I've seen a case or two where pgbench running
 on windows with HyperThreading enabled was actually faster than with it
 turned off. (General experience has been that HT hurts PostgreSQL). I
 suspect that the windows kernel may have features that allow it to
 better utilize HT than linux.
 
 Of course if you don't have HT... it doesn't matter. :)
 -- 
 Jim Nasby[EMAIL PROTECTED]
 EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum

2007-01-10 Thread Jim C. Nasby
On Tue, Jan 09, 2007 at 12:26:41PM -0500, Jeremy Haile wrote:
 I am developing an application that has very predictable database
 operations:
   -inserts several thousand rows into 3 tables every 5 minutes. (tables
   contain around 10 million rows each)
   -truncates and rebuilds aggregate tables of this data every 5 minutes.
   (several thousand rows each)
   -regular reads of aggregate table and sometimes large tables by user
   interaction
   -every night, hundreds of thousands of rows are deleted from these 3
   tables (old data)
   -20-30 other tables get inserted/updated slowly throughout the day
 
 In order to optimize performance of the inserts, I disabled
 autovacuum/row-level stats and instead run vacuum analyze on the whole
 DB every hour.  However this operation takes around 20 minutes of each
 hour.  This means that the database is involved in vacuum/analyzing
 tables 33% of the time.
 
 I'd like any performance advice, but my main concern is the amount of
 time vacuum/analyze runs and its possible impact on the overall DB
 performance.  Thanks!
 
If much of the data in the database isn't changing that often, then why
continually re-vacuum the whole thing?

I'd suggest trying autovacuum and see how it does (though you might want
to tune it to be more or less aggressive, and you'll probably want to
enable the cost delay).

The only cases where manual vacuum makes sense to me is if you've got a
defined slow period and vacuuming during that slow period is still
frequent enough to keep up with demand, or if you've got tables that
have a very high churn rate and need to be kept small. In the later
case, I'll usually setup a cronjob to vacuum those tables once a minute
with no cost delay. I'm sure there might be some other cases where not
using autovac might make sense, but generally I'd much rather let
autovac worry about this so I don't have to.
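
For the high-churn case, the cronjob can be as simple as (database and
table names are placeholders):

   # vacuum the queue table once a minute
   * * * * * vacuumdb --quiet --dbname=mydb --table=queue_table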

 I am running 8.2 (will be 8.2.1 soon).  The box is Windows with 2GB RAM
 connected to a SAN over fiber.  The data and pg_xlog are on separate
 partitions.  
 
 Modified configuration:
 effective_cache_size = 1000MB
 random_page_cost = 3
 default_statistics_target = 50
 maintenance_work_mem = 256MB
 shared_buffers = 400MB
 temp_buffers = 10MB
 work_mem = 10MB
 max_fsm_pages = 150

One other useful manual vacuum to consider is running vacuumdb -av
periodically (say, once a month) and looking at the last few lines of
output. That will give you a good idea on how large you should set
max_fsm_pages. Running the output of vacuumdb -av through pgFouine will
give you other useful data.

 checkpoint_segments = 30
 stats_row_level = off
 stats_start_collector = off

Unless you're really trying to get the last ounce of performance out,
it's probably not worth turning those stats settings off.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performance of PostgreSQL on Windows vs Linux

2007-01-10 Thread Scott Marlowe
On Wed, 2007-01-10 at 14:15, Jim C. Nasby wrote:
 On Wed, Jan 03, 2007 at 12:24:24PM -0500, Jeremy Haile wrote:
  I am sure that this has been discussed before, but I can't seem to find
  any recent posts. (I am running PostgreSQL 8.2)
  
  I have always run PostgreSQL on Linux in the past, but the company I am
  currently working for uses Windows on all of their servers.  I don't
  have the luxury right now of running my own benchmarks on the two OSes,
  but wanted to know if anyone else has done a performance comparison.  Is
  there any significant differences?
 
 One thing to consider... I've seen a case or two where pgbench running
 on windows with HyperThreading enabled was actually faster than with it
 turned off. (General experience has been that HT hurts PostgreSQL). I
 suspect that the windows kernel may have features that allow it to
 better utilize HT than linux.

I've also seen a few comments in perform (and elsewhere) in the past
that newer linux kernels seem to handle HT better than older ones, and
also might give better numbers for certain situations.

Note that you should really test with a wide variety of loads (i.e. a
lot of parallel clients, just a few, etc.) to see what the curve looks
like.  If HT gets you a 10% gain with 4 or fewer clients, but runs 20%
slower with 8 clients, then hyperthreading might not be such a good choice.
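
A simple way to get that curve is to sweep client counts with pgbench
(a sketch; "bench" is a scratch database):

   pgbench -i -s 10 bench            # initialize at scale factor 10
   for c in 1 2 4 8 16 32; do
       pgbench -c $c -t 1000 bench   # c concurrent clients
   done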

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] table partitioning performance

2007-01-10 Thread Steven Flatt

On 1/9/07, Simon Riggs [EMAIL PROTECTED] wrote:


If you are doing date range partitioning it should be fairly simple to
load data into the latest table directly. That was the way I originally
intended for it to be used. The rules approach isn't something I'd
recommend as a bulk loading option, and it's a lot more complex anyway.


The problem we have with blindly loading all data into the latest table is
that some data (< 5%, possibly even much less) is actually delivered late
and belongs in earlier partitions.  So we still needed the ability to send
data to an arbitrary partition.

Steve


Re: [PERFORM] Partitioning

2007-01-10 Thread Jim C. Nasby
On Wed, Jan 10, 2007 at 03:28:00PM -0500, Jeremy Haile wrote:
 This seems so much more intuitive and simpler than what is required to
 set it up in PostgreSQL.  Does PostgreSQL's approach to table
 partitioning have any advantage over MySQL?  Is a nicer syntax planned
 for Postgres?

The focus was to get the base functionality working, and working
correctly. Another consideration is that there's multiple ways to
accomplish the partitioning; exposing the basic functionality without
enforcing a given interface provides more flexibility (ie: it appears
that you can't do list partitioning with MySQL, while you can with
PostgreSQL).
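
List partitioning in PostgreSQL falls out of the same CHECK-constraint
mechanism as range partitioning, e.g. this hypothetical child table:

   CREATE TABLE users_eu (CHECK (region IN ('de', 'fr', 'es')))
       INHERITS (users);
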
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum

2007-01-10 Thread Jim C. Nasby
Please cc the list so others can help and learn.

On Wed, Jan 10, 2007 at 03:43:00PM -0500, Jeremy Haile wrote:
  I'd suggest trying autovacuum and see how it does (though you might want
  to tune it to be more or less aggressive, and you'll probably want to
  enable the cost delay).
 
 What are some decent default values for the cost delay vacuum settings? 
 I haven't used these before.
 
I find that simply setting vacuum_cost_delay to 20 is generally a good
starting point. I'll usually do that and then run a vacuum while
watching disk activity; I try and tune it so that the disk is ~90%
utilized with vacuum running. That allows a safety margin without
stretching vacuums out forever.
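
In postgresql.conf terms that starting point is just (a sketch; the
commented line shows the default):

   vacuum_cost_delay = 20      # ms to sleep each time the cost limit is hit
   #vacuum_cost_limit = 200    # cost credits spent between sleeps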

 Also - do the default autovacuum settings make sense for tables on the
 scale of 10 million rows?  For example, using the defaults it would
 require about a  million rows (250 + 0.1 * 10 million) to be
 inserted/updated/deleted before analyzing - which  seems high.  (about 2
 million for vacuum)  Or am I overestimating how often I would need to
 vacuum/analyze these tables?
 
Depends on your application... the way I look at it is that a setting of
0.1 means 10% dead space in the table. While 5% or 1% would be better,
you hit a point of diminishing returns since you have to read the entire
table and its indexes to vacuum it.

BTW, that's the default values for analyze... the defaults for vacuum
are 2x that.

 Do most people use the default autovacuum settings successfully, or are
 they usually modified?

I generally use the 8.2 defaults (which are much better than the 8.1
defaults) unless I'm really trying to tune things. What's more important
is to make sure critical tables (such as queue tables) are getting
vacuumed frequently so that they stay small. (Of course you also need to
ensure there's no long running transactions).
-- 
Jim C. Nasby, Database Architect   [EMAIL PROTECTED]
512.569.9461 (cell) http://jim.nasby.net

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Partitioning

2007-01-10 Thread Jeremy Haile
You can do list partitioning in MySQL:
http://dev.mysql.com/doc/refman/5.1/en/partitioning-list.html

My comment was not meant as a criticism of PostgreSQL's current state -
I'm glad that it has partitioning.  I'm simply wondering if there are
any plans of adopting a more user-friendly syntax in the future similar
to MySQL partitioning support.  Having first-class citizen support of
partitions would also allow some nice administrative GUIs and views to
be built for managing them.  

Jeremy Haile


On Wed, 10 Jan 2007 15:09:31 -0600, Jim C. Nasby [EMAIL PROTECTED] said:
 On Wed, Jan 10, 2007 at 03:28:00PM -0500, Jeremy Haile wrote:
  This seems so much more intuitive and simpler than what is required to
  set it up in PostgreSQL.  Does PostgreSQL's approach to table
  partitioning have any advantage over MySQL?  Is a nicer syntax planned
  for Postgres?
 
 The focus was to get the base functionality working, and working
 correctly. Another consideration is that there's multiple ways to
 accomplish the partitioning; exposing the basic functionality without
 enforcing a given interface provides more flexibility (ie: it appears
 that you can't do list partitioning with MySQL, while you can with
 PostgreSQL).
 -- 
 Jim Nasby[EMAIL PROTECTED]
 EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] table partitioning performance

2007-01-10 Thread Steven Flatt

On 1/10/07, Jim C. Nasby [EMAIL PROTECTED] wrote:


Except for the simplest partitioning cases, you'll be much better off
using a trigger on the parent table to direct inserts/updates/deletes to
the children. As a bonus, using a trigger makes it a lot more realistic
to deal with an update moving data between partitions.



In our application, data is never moved between partitions.

The problem I found with triggers is the non-robustness of the PLpgSQL
record data type.  For example, in an on insert trigger, I can't determine
the fields of the NEW record unless I hard code the column names into the
trigger.  This makes it hard to write a generic trigger, which I can use for
all our partitioned tables.  It would have been somewhat of a pain to write
a separate trigger for each of our partitioned tables.

For that and other reasons, we moved some of the insert logic up to the
application level in our product.

Steve


Re: [PERFORM] Partitioning

2007-01-10 Thread Scott Marlowe
On Wed, 2007-01-10 at 15:15, Jeremy Haile wrote:
 You can do list partitioning in MySQL:
 http://dev.mysql.com/doc/refman/5.1/en/partitioning-list.html
 
 My comment was not meant as a criticism of PostgreSQL's current state -
 I'm glad that it has partitioning.  I'm simply wondering if there are
 any plans of adopting a more user-friendly syntax in the future similar
 to MySQL partitioning support.  Having first-class citizen support of
 partitions would also allow some nice administrative GUIs and views to
 be built for managing them.  

I don't think anyone took it as a negative criticism.  Jim and I were
both more pointing out that the development process of the two projects
is somewhat different.

In MySQL a small group that doesn't necessarily interact with a large
user community sets out to implement a feature in a given time line with
a given set of requirements and they tend to ignore what they see as
esoteric requirements.

In PostgreSQL a large development community that communicates fairly
well with its large user community puts somewhat of the onus of proving
the need and doing the initial proof of concept on those who say they
need a feature, often working in a method where the chief hackers lend a
hand to someone who wants the feature so they can get a proof of concept
up and running.  An example would be the auditing / time travel in the
contrib/spi project.  After several iterations, and given the chance to
learn from the mistakes of the previous incarnations, something often
rises out of that to produce the feature needed.

Generally speaking the postgresql method takes longer, making life
harder today, but produces cleaner more easily maintained solutions,
making life easier in the future.  Meanwhile the mysql method works
faster, making life easier today, but makes compromises that might make
life harder in the future.

Something that embodies that difference is the table handler philosophy
of both databases.  PostgreSQL has the abstraction to have more than one
table handler, but in practice has exactly one table handler.  MySQL has
the ability to have many table handlers, and in fact uses many of them.

With PostgreSQL this means that things like the query parsing /
execution and the table handler are tightly coupled.  This results in
things like transactional DDL.  Sometimes this results in suggestions
being dismissed out of hand because they would have unintended
consequences.

In MySQL, because of the multiple table handlers, many compromises on
the query parsing have to be made.  The most common one is that you
can define constraints / foreign keys in a column item, and they will
simply be ignored with no error or notice.  The fk constraints have to
go at the end of the column list to be parsed and executed.

So, partitioning, being something that will touch a lot of parts of the
database, isn't gonna just show up one afternoon in pgsql.  It will
likely take a few people making proof of concept versions before a
consensus is reached and someone who has the ability codes it up.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled

2007-01-10 Thread Jeremy Haile
 BTW, that's the default values for analyze... the defaults for vacuum
 are 2x that.

Yeah - I was actually more concerned that tables would need to be
analyzed more often than I was about vacuuming too often, so I used
analyze as the example.  Since my app is inserting constantly throughout
the day and querying for recent data - I want to make sure the query
planner realizes that there are lots of rows with new timestamps on
them.  In other words, if I run a query select * from mytable where
timestamp > '9:00am', I want to make sure it hasn't been a day since
the table was analyzed, in which case the planner would think there are
zero rows greater than 9:00am today.

 What's more important
 is to make sure critical tables (such as queue tables) are getting
 vacuumed frequently so that they stay small. 

Is the best way to do that usually to lower the scale factors?  Is it
ever a good approach to lower the scale factor to zero and just set the
thresholds to a pure number of rows? (when setting it for a specific
table)

Thanks,
Jeremy Haile

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings



Re: [PERFORM] table partitioning performance

2007-01-10 Thread Jim C. Nasby
On Wed, Jan 10, 2007 at 04:39:06PM -0500, Steven Flatt wrote:
 On 1/10/07, Jim C. Nasby [EMAIL PROTECTED] wrote:
 
 Except for the simplest partitioning cases, you'll be much better off
 using a trigger on the parent table to direct inserts/updates/deletes to
 the children. As a bonus, using a trigger makes it a lot more realistic
 to deal with an update moving data between partitions.
 
 
 In our application, data is never moved between partitions.
 
 The problem I found with triggers is the non-robustness of the PLpgSQL
 record data type.  For example, in an on insert trigger, I can't determine
 the fields of the NEW record unless I hard code the column names into the
 trigger.  This makes it hard to write a generic trigger, which I can use for
 all our partitioned tables.  It would have been somewhat of a pain to write
 a separate trigger for each of our partitioned tables.
 
 For that and other reasons, we moved some of the insert logic up to the
 application level in our product.

Yeah, I think the key there would be to produce a function that wrote
the function for you.
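
One low-tech version: generate the per-table DDL as text and feed it
back through psql (a sketch; the _route() naming is an assumption):

   SELECT 'CREATE TRIGGER ' || relname || '_route BEFORE INSERT ON '
          || relname || ' FOR EACH ROW EXECUTE PROCEDURE '
          || relname || '_route();'
   FROM pg_class
   WHERE relname IN ('parent_one', 'parent_two');
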
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum

2007-01-10 Thread Jim C. Nasby
On Wed, Jan 10, 2007 at 04:48:42PM -0500, Jeremy Haile wrote:
  BTW, that's the default values for analyze... the defaults for vacuum
  are 2x that.
 
 Yeah - I was actually more concerned that tables would need to be
 analyzed more often than I was about vacuuming too often, so I used
 analyze as the example.  Since my app is inserting constantly throughout
 the day and querying for recent data - I want to make sure the query
 planner realizes that there are lots of rows with new timestamps on
 them.  In other words, if I run a query select * from mytable where
 timestamp > '9:00am', I want to make sure it hasn't been a day since
 the table was analyzed, in which case the planner would think there are
 zero rows greater than 9:00am today.
 
Well, analyze is pretty cheap. At most it'll read only 30,000 pages,
which shouldn't take terribly long on a decent system. So you can be a
lot more aggressive with it.

  What's more important
  is to make sure critical tables (such as queue tables) are getting
  vacuumed frequently so that they stay small. 
 
 Is the best way to do that usually to lower the scale factors?  Is it
 ever a good approach to lower the scale factor to zero and just set the
 thresholds to a pure number of rows? (when setting it for a specific
 table)

The problem is what happens if autovac goes off and starts vacuuming
some large table? While that's going on your queue table is sitting
there bloating. If you have a separate cronjob to handle the queue
table, it'll stay small, especially in 8.2.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum

2007-01-10 Thread Alvaro Herrera
Jim C. Nasby wrote:

  Is the best way to do that usually to lower the scale factors?  Is it
  ever a good approach to lower the scale factor to zero and just set the
  thresholds to a pure number of rows? (when setting it for a specific
  table)
 
 The problem is what happens if autovac goes off and starts vacuuming
 some large table? While that's going on your queue table is sitting
 there bloating. If you have a separate cronjob to handle the queue
 table, it'll stay small, especially in 8.2.

You mean at least in 8.2.  In previous releases, you could vacuum
that queue table until you were blue in the face, but it would achieve
nothing, because it would consider that the dead tuples were visible to a
running transaction: the one running the vacuum on the large table.  This
is an annoyance that was fixed in 8.2.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum

2007-01-10 Thread Jim C. Nasby
On Thu, Jan 11, 2007 at 12:10:34AM -0300, Alvaro Herrera wrote:
 Jim C. Nasby wrote:
 
   Is the best way to do that usually to lower the scale factors?  Is it
   ever a good approach to lower the scale factor to zero and just set the
   thresholds to a pure number of rows? (when setting it for a specific
   table)
  
  The problem is what happens if autovac goes off and starts vacuuming
  some large table? While that's going on your queue table is sitting
  there bloating. If you have a separate cronjob to handle the queue
  table, it'll stay small, especially in 8.2.
 
 You mean at least in 8.2.  In previous releases, you could vacuum
 that queue table until you were blue in the face, but it would achieve
 nothing, because it would consider that the dead tuples were visible to a
 running transaction: the one running the vacuum on the large table.  This
 is an annoyance that was fixed in 8.2.

True, but in many environments there are other transactions that run
long enough that additional vacuums while a long vacuum was running
would still help.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly