Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum

2007-01-10 Thread Florian Weimer
* Jeremy Haile:

 Good advice on the partitioning idea.  I may have to restructure some of
 my queries, since some of them query across the whole range - but it may
 be a much more performant solution.  How is the performance when
 querying across a set of partitioned tables vs. querying on a single
 table with all rows?

Locality of access decreases, of course, and depending on your data
size, you hit something like 2 or 4 additional disk seeks per
partition for index-based accesses.  Sequential scans are not
impacted.

 Does my current approach of disabling autovacuum and manually vacuuming
 once-an-hour sound like a good idea, or would I likely have better
 results by auto-vacuuming and turning row-level stats back on?

Sorry, I haven't got much experience with autovacuum, since most of
our databases are INSERT-only (or get VACUUMed automatically after
major updates).

-- 
Florian Weimer[EMAIL PROTECTED]
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Horribly slow query/ sequential scan

2007-01-10 Thread Tom Lane
I wrote:
 ... What seems to be happening is that Informix is willing to
 flatten the sub-SELECT into an IN join even though the sub-SELECT is
 correlated to the outer query (that is, it contains outer references).

I did some googling this morning and found confirmation that recent
versions of Informix have pretty extensive support for optimizing
correlated subqueries:
http://www.iiug.org/waiug/archive/iugnew83/FeaturesIDS73.htm
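
For concreteness, a correlated sub-SELECT is one whose inner query
refers to the outer one, as in this hypothetical:

   select * from orders o
   where o.customer_id in (select c.customer_id
                           from customers c
                           where c.region = o.region);

The inner WHERE clause references o.region from the outer query, which
is what ordinarily prevents flattening it into a plain join.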

This is something we've not really spent much time on for Postgres,
but it might be interesting to look at someday.  Given that the problem
with your query was really a mistake anyway, I'm not sure that your
example is compelling evidence for making it a high priority.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Slow inner join, but left join is fast

2007-01-10 Thread Jeremy Haile
I have a query made by joining two subqueries where the outer query
performing the join takes significantly longer to run than the two
subqueries.  

The first subquery runs in 600ms.  The second subquery runs in 700ms.
But the outer query takes 240 seconds to run!  Each of the two
subqueries returns only 8728 rows.

Changing the inner join to a left join makes the outer query run in
about 1000ms (which is great), but I don't understand why the inner join
is so slow!

I'm using PostgreSQL 8.2.1.  Any ideas?

QUERY PLAN (Inner Join) - takes 240 seconds
---
Nested Loop  (cost=17.46..17.56 rows=1 width=120)
  Join Filter: ((a.merchant_dim_id = b.merchant_dim_id) AND (a.dcms_dim_id = b.dcms_dim_id))
  ->  HashAggregate  (cost=8.71..8.74 rows=1 width=16)
        ->  Index Scan using transaction_facts_transaction_date_idx on transaction_facts  (cost=0.00..8.69 rows=1 width=16)
              Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
  ->  HashAggregate  (cost=8.75..8.78 rows=1 width=16)
        ->  HashAggregate  (cost=8.71..8.72 rows=1 width=55)
              ->  Index Scan using transaction_facts_transaction_date_idx on transaction_facts  (cost=0.00..8.69 rows=1 width=55)
                    Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))


QUERY PLAN (Left Join) - takes one second
---
Merge Left Join  (cost=304037.63..304064.11 rows=2509 width=120)
  Merge Cond: ((a.dcms_dim_id = b.dcms_dim_id) AND (a.merchant_dim_id = b.merchant_dim_id))
  ->  Sort  (cost=152019.45..152025.72 rows=2509 width=64)
        Sort Key: a.dcms_dim_id, a.merchant_dim_id
        ->  HashAggregate  (cost=151771.15..151852.69 rows=2509 width=16)
              ->  Bitmap Heap Scan on transaction_facts  (cost=5015.12..150419.90 rows=77214 width=16)
                    Recheck Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
                    ->  Bitmap Index Scan on transaction_facts_transaction_date_idx  (cost=0.00..4995.81 rows=77214 width=0)
                          Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
  ->  Sort  (cost=152018.18..152020.54 rows=943 width=64)
        Sort Key: b.dcms_dim_id, b.merchant_dim_id
        ->  Subquery Scan b  (cost=151931.51..151971.59 rows=943 width=64)
              ->  HashAggregate  (cost=151931.51..151962.16 rows=943 width=16)
                    ->  HashAggregate  (cost=151578.11..151672.35 rows=9424 width=55)
                          ->  Bitmap Heap Scan on transaction_facts  (cost=5015.12..150419.90 rows=77214 width=55)
                                Recheck Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
                                ->  Bitmap Index Scan on transaction_facts_transaction_date_idx  (cost=0.00..4995.81 rows=77214 width=0)
                                      Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))


QUERY
---
select a.merchant_dim_id, a.dcms_dim_id, 
   a.num_success, a.num_failed, a.total_transactions,
   a.success_rate,
   b.distinct_num_success, b.distinct_num_failed,
   b.distinct_total_transactions, b.distinct_success_rate
from (

-- SUBQUERY 1
select merchant_dim_id, 
   dcms_dim_id,
   sum(success) as num_success, 
   sum(failed) as num_failed, 
   count(*) as total_transactions,
   (sum(success) * 1.0 / count(*)) as success_rate 
from transaction_facts 
where transaction_date >= '2007-1-9' 
and transaction_date < '2007-1-9 9:30' 
group by merchant_dim_id, dcms_dim_id

) as a inner join (

-- SUBQUERY 2
select merchant_dim_id, 
   

Re: [PERFORM] group by will not use an index?

2007-01-10 Thread tsuraan

For the reasons indicated (that is, MVCC), PG cannot do a DISTINCT or the
equivalent GROUP BY from index values alone.



Ok, that makes sense.  Thanks for the help everybody!

If this table is large, perhaps you could denormalize and maintain a

summary table with date (using truncation) and count, updated with
triggers on the original table. This table will presumably have a
small number of rows at the cost of doubling the times for updates,
inserts, and deletes.



Well, the inserted time, at least, is never updated, and deletions are very
rare (never, so far), so I'll have a look at doing things that way.  Thanks!
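
A minimal sketch of the suggested trigger-maintained summary table
(hypothetical names; concurrent inserts for the same day would need
locking or retry logic on top of this):

   CREATE TABLE daily_counts (day date PRIMARY KEY, n bigint NOT NULL);

   CREATE OR REPLACE FUNCTION bump_daily_count() RETURNS trigger AS $$
   BEGIN
       -- truncate the insertion timestamp to the day and count it
       UPDATE daily_counts SET n = n + 1 WHERE day = NEW.inserted::date;
       IF NOT FOUND THEN
           INSERT INTO daily_counts VALUES (NEW.inserted::date, 1);
       END IF;
       RETURN NULL;  -- return value is ignored for AFTER triggers
   END;
   $$ LANGUAGE plpgsql;

   CREATE TRIGGER items_daily_count AFTER INSERT ON items
       FOR EACH ROW EXECUTE PROCEDURE bump_daily_count();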


Re: [PERFORM] Slow inner join, but left join is fast

2007-01-10 Thread Tom Lane
Jeremy Haile [EMAIL PROTECTED] writes:
 I have a query made by joining two subqueries where the outer query
 performing the join takes significantly longer to run than the two
 subqueries.  

Please show EXPLAIN ANALYZE results, not just EXPLAIN.
Also, have you analyzed your tables recently?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Slow inner join, but left join is fast

2007-01-10 Thread Jeremy Haile
The table should have been analyzed, but to make sure I ran analyze on
the table before executing the explain analyze queries.  Well - problem
solved.  This time the inner join query runs quickly.  

I still don't understand why the inner join would be so different from
the left join prior to the analyze.  It looks like the number of rows
expected in the original query plan for the inner join was 1 (not
correct, since it was really 8728).  The left join query had the exact
same subqueries but expected 77214 rows to be returned from them, which
was still not correct but resulted in a better query plan.

After the recent analyze, here's the new inner join query plan.  I won't
bother pasting the left join plan, since it is almost identical now
(including row counts).  FYI - the result of the queries is (and always
was) identical for the inner join and the left join.


QUERY PLAN (inner join)
Merge Join  (cost=279457.86..279479.83 rows=43 width=120) (actual time=626.771..670.275 rows=8728 loops=1)
  Merge Cond: ((a.dcms_dim_id = b.dcms_dim_id) AND (a.merchant_dim_id = b.merchant_dim_id))
  ->  Sort  (cost=139717.30..139722.38 rows=2029 width=64) (actual time=265.669..269.878 rows=8728 loops=1)
        Sort Key: a.dcms_dim_id, a.merchant_dim_id
        ->  HashAggregate  (cost=139519.61..139585.56 rows=2029 width=16) (actual time=211.368..247.429 rows=8728 loops=1)
              ->  Bitmap Heap Scan on transaction_facts  (cost=4427.62..138316.05 rows=68775 width=16) (actual time=21.858..100.998 rows=65789 loops=1)
                    Recheck Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
                    ->  Bitmap Index Scan on transaction_facts_transaction_date_idx  (cost=0.00..4410.42 rows=68775 width=0) (actual time=21.430..21.430 rows=65789 loops=1)
                          Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
  ->  Sort  (cost=139740.56..139742.67 rows=843 width=64) (actual time=361.083..365.418 rows=8728 loops=1)
        Sort Key: b.dcms_dim_id, b.merchant_dim_id
        ->  Subquery Scan b  (cost=139663.76..139699.59 rows=843 width=64) (actual time=308.567..346.135 rows=8728 loops=1)
              ->  HashAggregate  (cost=139663.76..139691.16 rows=843 width=16) (actual time=308.563..337.677 rows=8728 loops=1)
                    ->  HashAggregate  (cost=139347.68..139431.97 rows=8429 width=55) (actual time=198.093..246.591 rows=48942 loops=1)
                          ->  Bitmap Heap Scan on transaction_facts  (cost=4427.62..138316.05 rows=68775 width=55) (actual time=24.080..83.988 rows=65789 loops=1)
                                Recheck Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
                                ->  Bitmap Index Scan on transaction_facts_transaction_date_idx  (cost=0.00..4410.42 rows=68775 width=0) (actual time=23.596..23.596 rows=65789 loops=1)
                                      Index Cond: ((transaction_date >= '2007-01-09 00:00:00'::timestamp without time zone) AND (transaction_date < '2007-01-09 09:30:00'::timestamp without time zone))
Total runtime: 675.638 ms



On Wed, 10 Jan 2007 12:15:44 -0500, Tom Lane [EMAIL PROTECTED] said:
 Jeremy Haile [EMAIL PROTECTED] writes:
  I have a query made by joining two subqueries where the outer query
  performing the join takes significantly longer to run than the two
  subqueries.  
 
 Please show EXPLAIN ANALYZE results, not just EXPLAIN.
 Also, have you analyzed your tables recently?
 
   regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Slow inner join, but left join is fast

2007-01-10 Thread Tom Lane
Jeremy Haile [EMAIL PROTECTED] writes:
 I still don't understand why the inner join would be so different from
 the left join prior to the analyze.

Are you sure you hadn't analyzed in between?  Or maybe autovac did it
for you?  The reason for the plan change is the change from estimating
1 row matching the transaction_date range constraint, to estimating lots
of them, and the join type away up at the top would surely not have
affected that.

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] Slow inner join, but left join is fast

2007-01-10 Thread Jeremy Haile
I'm pretty sure it didn't analyze in between  - autovac is turned off
and I ran the test multiple times before posting.  

But since I can't reproduce it anymore, I can't be 100% sure.  And it
certainly doesn't make sense that the estimate for the index scan would
change based on an unrelated join condition.

If I ever get it to happen again, I'll be more careful and repost if it
is a real issue.  Thanks for pointing me in the right direction!


On Wed, 10 Jan 2007 13:38:15 -0500, Tom Lane [EMAIL PROTECTED] said:
 Jeremy Haile [EMAIL PROTECTED] writes:
  I still don't understand why the inner join would be so different from
  the left join prior to the analyze.
 
 Are you sure you hadn't analyzed in between?  Or maybe autovac did it
 for you?  The reason for the plan change is the change from estimating
 1 row matching the transaction_date range constraint, to estimating lots
 of them, and the join type away up at the top would surely not have
 affected that.
 
   regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Slow inner join, but left join is fast

2007-01-10 Thread Jeremy Haile
Another random idea - does PostgreSQL do any caching of query plans,
even at the session level?  

I ran these queries from the same Query window, so my idea is that maybe
the inner join plan was cached prior to an automatic analyze being run.  

But I'm doubting PostgreSQL would do something like that.  And of
course, if PostgreSQL doesn't cache query plans - this idea is bogus =)


On Wed, 10 Jan 2007 13:38:24 -0500, Jeremy Haile [EMAIL PROTECTED]
said:
 I'm pretty sure it didn't analyze in between  - autovac is turned off
 and I ran the test multiple times before posting.  
 
 But since I can't reproduce it anymore, I can't be 100% sure.  And it
 certainly doesn't make sense that the estimate for the index scan would
 change based on an unrelated join condition.
 
 If I ever get it to happen again, I'll be more careful and repost if it
 is a real issue.  Thanks for pointing me in the right direction!
 
 
 On Wed, 10 Jan 2007 13:38:15 -0500, Tom Lane [EMAIL PROTECTED] said:
  Jeremy Haile [EMAIL PROTECTED] writes:
   I still don't understand why the inner join would be so different from
   the left join prior to the analyze.
  
  Are you sure you hadn't analyzed in between?  Or maybe autovac did it
  for you?  The reason for the plan change is the change from estimating
  1 row matching the transaction_date range constraint, to estimating lots
  of them, and the join type away up at the top would surely not have
  affected that.
  
  regards, tom lane
 
 ---(end of broadcast)---
 TIP 3: Have you checked our extensive FAQ?
 
http://www.postgresql.org/docs/faq

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Slow inner join, but left join is fast

2007-01-10 Thread Tom Lane
Jeremy Haile [EMAIL PROTECTED] writes:
 Another random idea - does PostgreSQL do any caching of query plans?

Only if the client specifies it, either by PREPARE or the equivalent
protocol-level message.  I dunno what client software you were using,
but I think few if any would PREPARE behind your back.  Might be worth
checking into though, if you've eliminated autovacuum.
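
For reference, explicit session-level plan caching looks like this
(any client can issue it; table name taken from the earlier thread):

   PREPARE recent(timestamp) AS
       SELECT count(*) FROM transaction_facts
       WHERE transaction_date >= $1;
   EXECUTE recent('2007-01-09');  -- reuses the plan built by PREPARE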

Actually there's another possibility --- did you create any indexes on
the table in between?  CREATE INDEX doesn't do a full stats update, but
it does count the rows and update pg_class.reltuples.  But it's hard to
believe that'd have caused as big a rowcount shift as we see here ...

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Slow inner join, but left join is fast

2007-01-10 Thread Jeremy Haile
I did create and drop an index at some point while looking at this
issue.  But I definitely reran both of the queries (and explains) after
the index was dropped, so I don't understand why there would be a
difference between the inner and left query plans.  (which were run
back-to-back more than once)  Anyways - I'll let you know if something
similar happens again.

Thanks,
Jeremy Haile 


On Wed, 10 Jan 2007 14:22:35 -0500, Tom Lane [EMAIL PROTECTED] said:
 Jeremy Haile [EMAIL PROTECTED] writes:
  Another random idea - does PostgreSQL do any caching of query plans?
 
 Only if the client specifies it, either by PREPARE or the equivalent
 protocol-level message.  I dunno what client software you were using,
 but I think few if any would PREPARE behind your back.  Might be worth
 checking into though, if you've eliminated autovacuum.
 
 Actually there's another possibility --- did you create any indexes on
 the table in between?  CREATE INDEX doesn't do a full stats update, but
 it does count the rows and update pg_class.reltuples.  But it's hard to
 believe that'd have caused as big a rowcount shift as we see here ...
 
   regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] performance implications of binary placement

2007-01-10 Thread Jim C. Nasby
Are you 100% certain that both builds are using all the same libraries?
And to be an apples-apples comparison, you really need to ensure that
the datadir is on the same filesystem in both cases (that's the first
thing I'd check).

Also, that pg_index... error sounds like the second build has been
corrupted.

On Tue, Dec 26, 2006 at 03:37:47PM -0500, Bob Dusek wrote:
 Hello all,
 
 I've been running performance tests on various incantations of Postgres
 on/off for a month or so.  And, I've just come across some unexpected
 results.
 
 When I start my Postgres build as such:
 
 # (Scenario 1)
 
 ./configure --prefix=/usr --libdir=/usr/lib --bindir=/usr/bin
 --includedir=/usr/include/pgsql --datadir=/usr/share/postgresql
 --mandir=/usr/share/man --with-docdir=/usr/share/doc/packages
 --disable-rpath --enable-thread-safety --enable-integer-datetimes
 --without-python --without-perl --without-tcl --without-tk
 
 It performs significantly worse than when I start my build like this:
 
 # (Scenario 2)
 
 ./configure --disable-rpath --enable-thread-safety
 --enable-integer-datetimes --without-python --without-perl --without-tcl
 --without-tk
 
 Note:  the only differences are that Scenario 1 includes these
 options:
 
 --prefix=/usr --libdir=/usr/lib --bindir=/usr/bin
 --includedir=/usr/include/pgsql --datadir=/usr/share/postgresql
 --mandir=/usr/share/man --with-docdir=/usr/share/doc/packages
 
 And, to be clear, Scenario 1 performs worse than Scenario 2.  Simple
 insert statements are taking significantly longer. 
 
 I did not expect to see a performance hit with these options, especially
 since /usr/ on the test machine is mounted as its own partition, and
 in both cases, all of the binaries, include files, etc. are in that
 partition. 
 
 Has anyone seen this before?  Are hard drive mechanics the only thing in
 play here?
 
 The only difference I'm seeing in logging between the two versions is
 that Scenario 2 has several instances of this message littered
 throughout the logfile:
 
 ERROR: could not open relation pg_index_indexrelid_index: No such file
 or directory
 
 But, that doesn't seem to be affecting functionality or performance
 (especially considering the fact that the logfile that contains that
 message is part of the test that is performing better).
 
 We're using Postgres 7.4.8, building from the SLES9 Postgres 7.4.8
 source rpm. 
 
 Thanks for any help you can provide.  I can provide more detail if
 needed.
 
 Thanks again,
 
 Bob 
 
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly
 

-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] More 8.2 client issues (Was: [Slow dump?)

2007-01-10 Thread Jim C. Nasby
On Wed, Jan 03, 2007 at 11:56:20AM -0500, Tom Lane wrote:
 Erik Jones [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  I could see this taking an unreasonable amount of time if you had a huge
  number of pg_class rows or a very long search_path --- is your database
  at all out of the ordinary in those ways?
  
  Well, running select count(*) from pg_class; returns 524699 rows
 
 Ouch.
 
  our search path is the default.  I'd also like to reiterate that \d 
  pg_class returns instantly when run from the 8.1.4 psql client connected 
  to the 8.2 db.
 
 I think I know where the problem is: would you compare timing of
 
   select * from pg_class where relname ~ '^(pg_class)$';
  
   select * from pg_class where relname ~ '^pg_class$';
 
 Recent versions of psql put parentheses into the regex pattern for
 safety in case it's got |, but I just realized that that probably
 confuses the optimizer's check for an indexable regex :-(
 
 However, this only explains slowdown in psql's \d commands, which
 wasn't your original complaint ...

On the other hand, with 500k relations pg_dump is presumably going to be
doing a lot of querying of the catalog tables, so if it uses similar
queries...
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performance of PostgreSQL on Windows vs Linux

2007-01-10 Thread Jim C. Nasby
On Wed, Jan 03, 2007 at 12:24:24PM -0500, Jeremy Haile wrote:
 I am sure that this has been discussed before, but I can't seem to find
 any recent posts. (I am running PostgreSQL 8.2)
 
 I have always run PostgreSQL on Linux in the past, but the company I am
 currently working for uses Windows on all of their servers.  I don't
 have the luxury right now of running my own benchmarks on the two OSes,
 but wanted to know if anyone else has done a performance comparison.  Is
 there any significant differences?

One thing to consider... I've seen a case or two where pgbench running
on windows with HyperThreading enabled was actually faster than with it
turned off. (General experience has been that HT hurts PostgreSQL). I
suspect that the windows kernel may have features that allow it to
better utilize HT than linux.

Of course if you don't have HT... it doesn't matter. :)
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL to host e-mail?

2007-01-10 Thread Jim C. Nasby
On Fri, Jan 05, 2007 at 01:15:44PM -0500, Reid Thompson wrote:
 On Fri, 2007-01-05 at 04:10 +0100, Grega Bremec wrote:
  The main reason I'm writing this mail though, is to suggest you take a
  look
  at Oryx, http://www.oryx.com/; They used to have this product called
  Mailstore, which was designed to be a mail store using PostgreSQL as a
  backend, and has since evolved to a bit more than just that, it seems.
  Perhaps it could be of help to you while building your system, and I'm
  sure
  the people at Oryx will be glad to hear from you while, and after
  you've
  built your system.
  
  Kind regards,
  --
  ~Grega Bremec 
 re above...
 http://www.archiveopteryx.org/1.10.html 

You should also look at http://dbmail.org/ , which runs on several
databases (PostgreSQL included).
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Partitioning

2007-01-10 Thread Jim C. Nasby
BTW, someone coming up with a set of functions to handle partitioning
for the general 'partition by time' case would make a GREAT project on
pgFoundry.

On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote:
 Take a look at the set of partitioning functions I wrote shortly after
 the 8.1 release:
 
 http://www.studenter.hb.se/~arch/files/part_functions.sql
 
 You could probably work something out using those functions (as-is, or
 as inspiration) together with pgAgent
 (http://www.pgadmin.org/docs/1.4/pgagent.html)
 
 /Mikael
 
  -Original Message-
  From: [EMAIL PROTECTED]
 [mailto:pgsql-performance-
  [EMAIL PROTECTED] On Behalf Of Arnau
  Sent: den 5 januari 2007 12:02
  To: pgsql-performance@postgresql.org
  Subject: [PERFORM] Partitioning
  
  Hi all,
  
 I'm not sure if this question fits in the topic of this list.
  
 I'm interested in partitioning and it's the first time I'd use it.
  There is an issue I don't know how you handle. Let's say I'm
  interested in storing monthly statistical data like the example at
  http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html. What I
  don't like about this approach is that the monthly tables, rules...
  must be created manually, or at least I haven't found any other option.
  
  My question is: how do you manage this? Do you have a cron task that
  automatically creates these monthly elements (tables, rules, ...), or
  is there another approach that doesn't require external things like
  cron - only PostgreSQL?
  --
  Arnau
  
  ---(end of
 broadcast)---
  TIP 5: don't forget to increase your free space map settings
 
 
 ---(end of broadcast)---
 TIP 9: In versions below 8.0, the planner will ignore your desire to
choose an index scan if your joining column's datatypes do not
match
 

-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] table partitioning performance

2007-01-10 Thread Jim C. Nasby
On Mon, Jan 08, 2007 at 03:02:24PM -0500, Steven Flatt wrote:
 We use partitioned tables extensively and we have observed linear
 performance degradation on inserts as the number of rules on the master
 table grows (i.e. number of rules = number of partitions).  We had to come
 up with a solution that didn't have a rule per partition on the master
 table.  Just wondering if you are observing the same thing.

Except for the simplest partitioning cases, you'll be much better off
using a trigger on the parent table to direct inserts/updates/deletes to
the children. As a bonus, using a trigger makes it a lot more realistic
to deal with an update moving data between partitions.
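
A minimal sketch of that approach, with hypothetical names (an events
parent partitioned by month on a created column; NEW.* expansion in
VALUES works on recent releases, otherwise list the columns explicitly):

   CREATE OR REPLACE FUNCTION events_route() RETURNS trigger AS $$
   BEGIN
       IF NEW.created >= DATE '2007-01-01'
          AND NEW.created < DATE '2007-02-01' THEN
           INSERT INTO events_y2007m01 VALUES (NEW.*);
       ELSE
           RAISE EXCEPTION 'no partition for %', NEW.created;
       END IF;
       RETURN NULL;  -- row was routed to a child, don't store it in events
   END;
   $$ LANGUAGE plpgsql;

   CREATE TRIGGER events_insert BEFORE INSERT ON events
       FOR EACH ROW EXECUTE PROCEDURE events_route();
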
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-10 Thread Jim C. Nasby
On Tue, Jan 09, 2007 at 09:10:51AM -0800, Jeff Frost wrote:
 On Tue, 9 Jan 2007, Jim C. Nasby wrote:
 
 On Thu, Dec 28, 2006 at 02:15:31PM -0800, Jeff Frost wrote:
 When benchmarking various options for a new PG server at one of my 
 clients,
 I tried ext2 and ext3 (data=writeback) for the WAL and it appeared to be
 fastest to have ext2 for the WAL.  The winning time was 157m46.713s for
 ext2, 159m47.098s for combined ext3 data/xlog and 158m25.822s for ext3
 data=writeback.  This was on an 8x150GB Raptor RAID10 on an Areca 1130 w/
 1GB BBU cache.  This config benched out faster than a 6disk RAID10 + 2 
 disk
 RAID1 for those of you who have been wondering if the BBU write back cache
 mitigates the need for separate WAL (at least on this workload).  Those 
 are
 the fastest times for each config, but ext2 WAL was always faster than the
 other two options.  I didn't test any other filesystems in this go around.
 
 Uh, if I'm reading this correctly, you're saying that WAL on a separate
 ext2 vs. one big ext3 with data=writeback saved ~39 seconds out of
 ~158.5 minutes, or 0.4%? Is that even above the noise for your
 measurements? I suspect the phase of the moon might play a bigger role
 ;P
 
 That's what I thought too...cept I ran it 20 times and ext2 won by that 
 margin every time, so it was quite repeatable. :-/

Even so, you've got to really be hunting for performance to go through
the hassle of different filesystems just to gain 0.4%... :)
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Partitioning

2007-01-10 Thread Erik Jones

On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote:

Take a look at the set of partitioning functions I wrote shortly after
the 8.1 release:

http://www.studenter.hb.se/~arch/files/part_functions.sql

You could probably work something out using those functions (as-is, or
as inspiration) together with pgAgent
(http://www.pgadmin.org/docs/1.4/pgagent.html)

/Mikael


Those are some great functions.

--
erik jones [EMAIL PROTECTED]
software development
emma(r)


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Partitioning

2007-01-10 Thread Jeremy Haile
I really wish that PostgreSQL supported a nice partitioning syntax
like MySQL has.  

Here is an example:
CREATE TABLE tr (id INT, name VARCHAR(50), purchased DATE)
  PARTITION BY RANGE( YEAR(purchased) ) (
  PARTITION p0 VALUES LESS THAN (1990),
  PARTITION p1 VALUES LESS THAN (1995),
  PARTITION p2 VALUES LESS THAN (2000),
  PARTITION p3 VALUES LESS THAN (2005)
);

And to drop a partition:
ALTER TABLE tr DROP PARTITION p2;


This seems so much more intuitive and simpler than what is required to
set it up in PostgreSQL.  Does PostgreSQL's approach to table
partitioning have any advantage over MySQL?  Is a nicer syntax planned
for Postgres?
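
For comparison, the inheritance-based setup in PostgreSQL today looks
roughly like this sketch (constraint_exclusion must be on for the
planner to skip partitions):

   CREATE TABLE tr (id int, name varchar(50), purchased date);

   CREATE TABLE tr_p0 (CHECK (purchased < DATE '1990-01-01'))
       INHERITS (tr);
   CREATE TABLE tr_p1 (CHECK (purchased >= DATE '1990-01-01'
                          AND purchased < DATE '1995-01-01'))
       INHERITS (tr);
   -- ...one child per range, plus a rule or trigger on tr to route inserts

   -- dropping a partition is just dropping the child table:
   DROP TABLE tr_p1;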


On Wed, 10 Jan 2007 14:20:06 -0600, Jim C. Nasby [EMAIL PROTECTED] said:
 BTW, someone coming up with a set of functions to handle partitioning
 for the general 'partition by time' case would make a GREAT project on
 pgFoundry.
 
 On Fri, Jan 05, 2007 at 12:47:08PM +0100, Mikael Carneholm wrote:
  Take a look at the set of partitioning functions I wrote shortly after
  the 8.1 release:
  
  http://www.studenter.hb.se/~arch/files/part_functions.sql
  
  You could probably work something out using those functions (as-is, or
  as inspiration) together with pgAgent
  (http://www.pgadmin.org/docs/1.4/pgagent.html)
  
  /Mikael
  
   -Original Message-
   From: [EMAIL PROTECTED]
  [mailto:pgsql-performance-
   [EMAIL PROTECTED] On Behalf Of Arnau
   Sent: den 5 januari 2007 12:02
   To: pgsql-performance@postgresql.org
   Subject: [PERFORM] Partitioning
   
   Hi all,
   
  I'm not sure if this question fits in the topic of this list.
   
  I'm interested in partitioning and it's the first time I'd use it.
   There is an issue I don't know how you handle. Let's say I'm
   interested in storing monthly statistical data like the example at
   http://www.postgresql.org/docs/8.2/static/ddl-partitioning.html. What I
   don't like about this approach is that the monthly tables, rules...
   must be created manually, or at least I haven't found any other option.
   
   My question is: how do you manage this? Do you have a cron task that
   automatically creates these monthly elements (tables, rules, ...), or
   is there another approach that doesn't require external things like
   cron - only PostgreSQL?
   --
   Arnau
   
   ---(end of
  broadcast)---
   TIP 5: don't forget to increase your free space map settings
  
  
  ---(end of broadcast)---
  TIP 9: In versions below 8.0, the planner will ignore your desire to
 choose an index scan if your joining column's datatypes do not
 match
  
 
 -- 
 Jim Nasby[EMAIL PROTECTED]
 EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] High update activity, PostgreSQL vs BigDBMS

2007-01-10 Thread Jeff Frost

On Wed, 10 Jan 2007, Jim C. Nasby wrote:


RAID1 for those of you who have been wondering if the BBU write back cache
mitigates the need for separate WAL (at least on this workload).  Those
are
the fastest times for each config, but ext2 WAL was always faster than the
other two options.  I didn't test any other filesystems in this go around.


Uh, if I'm reading this correctly, you're saying that WAL on a separate
ext2 vs. one big ext3 with data=writeback saved ~39 seconds out of
~158.5 minutes, or 0.4%? Is that even above the noise for your
measurements? I suspect the phase of the moon might play a bigger role
;P


That's what I thought too...cept I ran it 20 times and ext2 won by that
margin every time, so it was quite repeatable. :-/


Even so, you've got to really be hunting for performance to go through
the hassle of different filesystems just to gain 0.4%... :)


Indeed, but actually, I did the math again and it appears that it saves close 
to 2 minutes versus one big ext3.  I guess the moral of the story is that 
having a separate pg_xlog even on the same physical volume tends to be 
slightly faster for write-oriented workloads.  Ext2 is slightly faster than 
ext3, but of course you could likely go with yet another filesystem and be 
even slightly faster as well. :-)


I guess the real moral of the story is that you can probably use one big ext3 
with the default config and it won't matter much more than 1-2% if you have a 
BBU.


--
Jeff Frost, Owner   [EMAIL PROTECTED]
Frost Consulting, LLC   http://www.frostconsultingllc.com/
Phone: 650-780-7908 FAX: 650-649-1954

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Performance of PostgreSQL on Windows vs Linux

2007-01-10 Thread Jeremy Haile
Hey Jim - 

Thanks for the feedback.  The server has dual Xeons with HyperThreading
enabled - so perhaps I should try disabling it.  How much performance
boost have you seen by disabling it?  Of course, the bottleneck in my
case is more on the I/O or RAM side, not the CPU side.

Jeremy Haile


On Wed, 10 Jan 2007 14:15:26 -0600, Jim C. Nasby [EMAIL PROTECTED] said:
 On Wed, Jan 03, 2007 at 12:24:24PM -0500, Jeremy Haile wrote:
  I am sure that this has been discussed before, but I can't seem to find
  any recent posts. (I am running PostgreSQL 8.2)
  
  I have always run PostgreSQL on Linux in the past, but the company I am
  currently working for uses Windows on all of their servers.  I don't
  have the luxury right now of running my own benchmarks on the two OSes,
  but wanted to know if anyone else has done a performance comparison.  Is
  there any significant differences?
 
 One thing to consider... I've seen a case or two where pgbench running
 on windows with HyperThreading enabled was actually faster than with it
 turned off. (General experience has been that HT hurts PostgreSQL). I
 suspect that the windows kernel may have features that allow it to
 better utilize HT than linux.
 
 Of course if you don't have HT... it doesn't matter. :)
 -- 
 Jim Nasby[EMAIL PROTECTED]
 EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum

2007-01-10 Thread Jim C. Nasby
On Tue, Jan 09, 2007 at 12:26:41PM -0500, Jeremy Haile wrote:
 I am developing an application that has very predictable database
 operations:
   -inserts several thousand rows into 3 tables every 5 minutes. (tables
   contain around 10 million rows each)
   -truncates and rebuilds aggregate tables of this data every 5 minutes.
   (several thousand rows each)
   -regular reads of aggregate table and sometimes large tables by user
   interaction
   -every night, hundreds of thousands of rows are deleted from these 3
   tables (old data)
   -20-30 other tables get inserted/updated slowly throughout the day
 
 In order to optimize performance of the inserts, I disabled
 autovacuum/row-level stats and instead run vacuum analyze on the whole
 DB every hour.  However this operation takes around 20 minutes of each
 hour.  This means that the database is involved in vacuum/analyzing
 tables 33% of the time.
 
 I'd like any performance advice, but my main concern is the amount of
 time vacuum/analyze runs and its possible impact on the overall DB
 performance.  Thanks!
 
If much of the data in the database isn't changing that often, then why
continually re-vacuum the whole thing?

I'd suggest trying autovacuum and see how it does (though you might want
to tune it to be more or less aggressive, and you'll probably want to
enable the cost delay).

The only cases where manual vacuum makes sense to me is if you've got a
defined slow period and vacuuming during that slow period is still
frequent enough to keep up with demand, or if you've got tables that
have a very high churn rate and need to be kept small. In the later
case, I'll usually setup a cronjob to vacuum those tables once a minute
with no cost delay. I'm sure there might be some other cases where not
using autovac might make sense, but generally I'd much rather let
autovac worry about this so I don't have to.
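
For the high-churn case, the cronjob can be as simple as (database and
table names are placeholders):

   # vacuum the queue table once a minute
   * * * * * vacuumdb --quiet --dbname=mydb --table=queue_table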

 I am running 8.2 (will be 8.2.1 soon).  The box is Windows with 2GB RAM
 connected to a SAN over fiber.  The data and pg_xlog are on separate
 partitions.  
 
 Modified configuration:
 effective_cache_size = 1000MB
 random_page_cost = 3
 default_statistics_target = 50
 maintenance_work_mem = 256MB
 shared_buffers = 400MB
 temp_buffers = 10MB
 work_mem = 10MB
 max_fsm_pages = 150

One other useful manual vacuum to consider is running vacuumdb -av
periodically (say, once a month) and looking at the last few lines of
output. That will give you a good idea on how large you should set
max_fsm_pages. Running the output of vacuumdb -av through pgFouine will
give you other useful data.

 checkpoint_segments = 30
 stats_row_level = off
 stats_start_collector = off

Unless you're really trying to get the last ounce of performance out,
it's probably not worth turning those stats settings off.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performance of PostgreSQL on Windows vs Linux

2007-01-10 Thread Scott Marlowe
On Wed, 2007-01-10 at 14:15, Jim C. Nasby wrote:
 On Wed, Jan 03, 2007 at 12:24:24PM -0500, Jeremy Haile wrote:
  I am sure that this has been discussed before, but I can't seem to find
  any recent posts. (I am running PostgreSQL 8.2)
  
  I have always run PostgreSQL on Linux in the past, but the company I am
  currently working for uses Windows on all of their servers.  I don't
  have the luxury right now of running my own benchmarks on the two OSes,
  but wanted to know if anyone else has done a performance comparison.  Is
  there any significant differences?
 
 One thing to consider... I've seen a case or two where pgbench running
 on windows with HyperThreading enabled was actually faster than with it
 turned off. (General experience has been that HT hurts PostgreSQL). I
 suspect that the windows kernel may have features that allow it to
 better utilize HT than linux.

I've also seen a few comments in perform (and elsewhere) in the past
that newer linux kernels seem to handle HT better than older ones, and
also might give better numbers for certain situations.

Note that you should really test with a wide variety of loads (i.e. a
lot of parallel clients, just a few, etc.) to see what the curve looks
like.  If HT gets you a 10% gain with 4 or fewer clients, but runs 20%
slower with 8 clients, then hyperthreading might not be such a good choice.
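
A simple way to get that curve is to sweep client counts with pgbench
(a sketch; "bench" is a scratch database):

   pgbench -i -s 10 bench            # initialize at scale factor 10
   for c in 1 2 4 8 16 32; do
       pgbench -c $c -t 1000 bench   # c concurrent clients
   done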

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] table partitioning performance

2007-01-10 Thread Steven Flatt

On 1/9/07, Simon Riggs [EMAIL PROTECTED] wrote:


If you are doing date range partitioning it should be fairly simple to
load data into the latest table directly. That was the way I originally
intended for it to be used. The rules approach isn't something I'd
recommend as a bulk loading option, and it's a lot more complex anyway.


The problem we have with blindly loading all data into the latest table is
that some data (< 5%, possibly even much less) is actually delivered late
and belongs in earlier partitions.  So we still needed the ability to send
data to an arbitrary partition.

Steve


Re: [PERFORM] Partitioning

2007-01-10 Thread Jim C. Nasby
On Wed, Jan 10, 2007 at 03:28:00PM -0500, Jeremy Haile wrote:
 This seems so much more intuitive and simpler than what is required to
 set it up in PostgreSQL.  Does PostgreSQL's approach to table
 partitioning have any advantage over MySQL?  Is a nicer syntax planned
 for Postgres?

The focus was to get the base functionality working, and working
correctly. Another consideration is that there's multiple ways to
accomplish the partitioning; exposing the basic functionality without
enforcing a given interface provides more flexibility (ie: it appears
that you can't do list partitioning with MySQL, while you can with
PostgreSQL).
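
List partitioning in PostgreSQL falls out of the same CHECK-constraint
mechanism as range partitioning, e.g. this hypothetical child table:

   CREATE TABLE users_eu (CHECK (region IN ('de', 'fr', 'es')))
       INHERITS (users);
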
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum

2007-01-10 Thread Jim C. Nasby
Please cc the list so others can help and learn.

On Wed, Jan 10, 2007 at 03:43:00PM -0500, Jeremy Haile wrote:
  I'd suggest trying autovacuum and see how it does (though you might want
  to tune it to be more or less aggressive, and you'll probably want to
  enable the cost delay).
 
 What are some decent default values for the cost delay vacuum settings? 
 I haven't used these before.
 
I find that simply setting vacuum_cost_delay to 20 is generally a good
starting point. I'll usually do that and then run a vacuum while
watching disk activity; I try and tune it so that the disk is ~90%
utilized with vacuum running. That allows a safety margin without
stretching vacuums out forever.
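
In postgresql.conf terms that starting point is just (a sketch; the
commented line shows the default):

   vacuum_cost_delay = 20      # ms to sleep each time the cost limit is hit
   #vacuum_cost_limit = 200    # cost credits spent between sleeps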

 Also - do the default autovacuum settings make sense for tables on the
 scale of 10 million rows?  For example, using the defaults it would
 require about a  million rows (250 + 0.1 * 10 million) to be
 inserted/updated/deleted before analyzing - which  seems high.  (about 2
 million for vacuum)  Or am I overestimating how often I would need to
 vacuum/analyze these tables?
 
Depends on your application... the way I look at it is that a setting of
0.1 means 10% dead space in the table. While 5% or 1% would be better,
you hit a point of diminishing returns since you have to read the entire
table and its indexes to vacuum it.

BTW, that's the default values for analyze... the defaults for vacuum
are 2x that.

 Do most people use the default autovacuum settings successfully, or are
 they usually modified?

I generally use the 8.2 defaults (which are much better than the 8.1
defaults) unless I'm really trying to tune things. What's more important
is to make sure critical tables (such as queue tables) are getting
vacuumed frequently so that they stay small. (Of course you also need to
ensure there's no long running transactions).
-- 
Jim C. Nasby, Database Architect   [EMAIL PROTECTED]
512.569.9461 (cell) http://jim.nasby.net

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Partitioning

2007-01-10 Thread Jeremy Haile
You can do list partitioning in MySQL:
http://dev.mysql.com/doc/refman/5.1/en/partitioning-list.html

My comment was not meant as a criticism of PostgreSQL's current state -
I'm glad that it has partitioning.  I'm simply wondering if there are
any plans of adopting a more user-friendly syntax in the future similar
to MySQL partitioning support.  Having first-class citizen support of
partitions would also allow some nice administrative GUIs and views to
be built for managing them.  

Jeremy Haile


On Wed, 10 Jan 2007 15:09:31 -0600, Jim C. Nasby [EMAIL PROTECTED] said:
 On Wed, Jan 10, 2007 at 03:28:00PM -0500, Jeremy Haile wrote:
  This seems so much more intuitive and simpler than what is required to
  set it up in PostgreSQL.  Does PostgreSQL's approach to table
  partitioning have any advantage over MySQL?  Is a nicer syntax planned
  for Postgres?
 
 The focus was to get the base functionality working, and working
 correctly. Another consideration is that there's multiple ways to
 accomplish the partitioning; exposing the basic functionality without
 enforcing a given interface provides more flexibility (ie: it appears
 that you can't do list partitioning with MySQL, while you can with
 PostgreSQL).
 -- 
 Jim Nasby[EMAIL PROTECTED]
 EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] table partitioning performance

2007-01-10 Thread Steven Flatt

On 1/10/07, Jim C. Nasby [EMAIL PROTECTED] wrote:


Except for the simplest partitioning cases, you'll be much better off
using a trigger on the parent table to direct inserts/updates/deletes to
the children. As a bonus, using a trigger makes it a lot more realistic
to deal with an update moving data between partitions.



In our application, data is never moved between partitions.

The problem I found with triggers is the non-robustness of the PLpgSQL
record data type.  For example, in an on insert trigger, I can't determine
the fields of the NEW record unless I hard code the column names into the
trigger.  This makes it hard to write a generic trigger, which I can use for
all our partitioned tables.  It would have been somewhat of a pain to write
a separate trigger for each of our partitioned tables.

For that and other reasons, we moved some of the insert logic up to the
application level in our product.

Steve


Re: [PERFORM] Partitioning

2007-01-10 Thread Scott Marlowe
On Wed, 2007-01-10 at 15:15, Jeremy Haile wrote:
 You can do list partitioning in MySQL:
 http://dev.mysql.com/doc/refman/5.1/en/partitioning-list.html
 
 My comment was not meant as a criticism of PostgreSQL's current state -
 I'm glad that it has partitioning.  I'm simply wondering if there are
 any plans of adopting a more user-friendly syntax in the future similar
 to MySQL partitioning support.  Having first-class citizen support of
 partitions would also allow some nice administrative GUIs and views to
 be built for managing them.  

I don't think anyone took it as a negative criticism.  Jim and I were
both more pointing out that the development process of the two projects
is somewhat different.

In MySQL a small group that doesn't necessarily interact with a large
user community sets out to implement a feature in a given time line with
a given set of requirements and they tend to ignore what they see as
esoteric requirements.

In PostgreSQL a large development community that communicates fairly
well with its large user community puts somewhat of the onus of proving
the need and doing the initial proof of concept on those who say they
need a feature, often working in a method where the chief hackers lend a
hand to someone who wants the feature so they can get a proof of concept
up and running.  An example would be the auditing / time travel in the
contrib/spi project.  After several iterations, and given the chance to
learn from the mistakes of the previous incarnations, something often
rises out of that to produce the feature needed.

Generally speaking the postgresql method takes longer, making life
harder today, but produces cleaner more easily maintained solutions,
making life easier in the future.  Meanwhile the mysql method works
faster, making life easier today, but makes compromises that might make
life harder in the future.

Something that embodies that difference is the table handler philosophy
of both databases.  PostgreSQL has the abstraction to have more than one
table handler, but in practice has exactly one table handler.  MySQL has
the ability to have many table handlers, and in fact uses many of them.

With PostgreSQL this means that things like the query parsing /
execution and the table handler are tightly coupled.  This results in
things like transactional DDL.  Sometimes this results in suggestions
being dismissed out of hand because they would have unintended
consequences.

In MySQL, because of the multiple table handlers, many compromises on
the query parsing have to be made.  The most common one is that you
can define constraints / foreign keys in a column item, and they will
simply be ignored with no error or notice.  The fk constraints have to
go at the end of the column list to be parsed and executed.

So, partitioning, being something that will touch a lot of parts of the
database, isn't gonna just show up one afternoon in pgsql.  It will
likely take a few people making proof of concept versions before a
consensus is reached and someone who has the ability codes it up.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled

2007-01-10 Thread Jeremy Haile
 BTW, that's the default values for analyze... the defaults for vacuum
 are 2x that.

Yeah - I was actually more concerned that tables would need to be
analyzed more often than I was about vacuuming too often, so I used
analyze as the example.  Since my app is inserting constantly throughout
the day and querying for recent data - I want to make sure the query
planner realizes that there are lots of rows with new timestamps on
them.  In other words, if I run a query select * from mytable where
timestamp > '9:00am', I want to make sure it hasn't been a day since
the table was analyzed, in which case the planner would think there are
zero rows greater than 9:00am today.

 What's more important
 is to make sure critical tables (such as queue tables) are getting
 vacuumed frequently so that they stay small. 

Is the best way to do that usually to lower the scale factors?  Is it
ever a good approach to lower the scale factor to zero and just set the
thresholds to a pure number of rows? (when setting it for a specific
table)

Thanks,
Jeremy Haile

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings



Re: [PERFORM] table partitioning performance

2007-01-10 Thread Jim C. Nasby
On Wed, Jan 10, 2007 at 04:39:06PM -0500, Steven Flatt wrote:
 On 1/10/07, Jim C. Nasby [EMAIL PROTECTED] wrote:
 
 Except for the simplest partitioning cases, you'll be much better off
 using a trigger on the parent table to direct inserts/updates/deletes to
 the children. As a bonus, using a trigger makes it a lot more realistic
 to deal with an update moving data between partitions.
 
 
 In our application, data is never moved between partitions.
 
 The problem I found with triggers is the non-robustness of the PLpgSQL
 record data type.  For example, in an on insert trigger, I can't determine
 the fields of the NEW record unless I hard code the column names into the
 trigger.  This makes it hard to write a generic trigger, which I can use for
 all our partitioned tables.  It would have been somewhat of a pain to write
 a separate trigger for each of our partitioned tables.
 
 For that and other reasons, we moved some of the insert logic up to the
 application level in our product.

Yeah, I think the key there would be to produce a function that wrote
the function for you.
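
One low-tech version: generate the per-table DDL as text and feed it
back through psql (a sketch; the _route() naming is an assumption):

   SELECT 'CREATE TRIGGER ' || relname || '_route BEFORE INSERT ON '
          || relname || ' FOR EACH ROW EXECUTE PROCEDURE '
          || relname || '_route();'
   FROM pg_class
   WHERE relname IN ('parent_one', 'parent_two');
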
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum

2007-01-10 Thread Jim C. Nasby
On Wed, Jan 10, 2007 at 04:48:42PM -0500, Jeremy Haile wrote:
  BTW, that's the default values for analyze... the defaults for vacuum
  are 2x that.
 
 Yeah - I was actually more concerned that tables would need to be
 analyzed more often than I was about vacuuming too often, so I used
 analyze as the example.  Since my app is inserting constantly throughout
 the day and querying for recent data - I want to make sure the query
 planner realizes that there are lots of rows with new timestamps on
 them.  In other words, if I run a query select * from mytable where
 timestamp > '9:00am', I want to make sure it hasn't been a day since
 the table was analyzed, in which case the planner would think there are
 zero rows greater than 9:00am today.
 
Well, analyze is pretty cheap. At most it'll read only 30,000 pages,
which shouldn't take terribly long on a decent system. So you can be a
lot more aggressive with it.

  What's more important
  is to make sure critical tables (such as queue tables) are getting
  vacuumed frequently so that they stay small. 
 
 Is the best way to do that usually to lower the scale factors?  Is it
 ever a good approach to lower the scale factor to zero and just set the
 thresholds to a pure number of rows? (when setting it for a specific
 table)

The problem is what happens if autovac goes off and starts vacuuming
some large table? While that's going on your queue table is sitting
there bloating. If you have a separate cronjob to handle the queue
table, it'll stay small, especially in 8.2.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum

2007-01-10 Thread Alvaro Herrera
Jim C. Nasby wrote:

  Is the best way to do that usually to lower the scale factors?  Is it
  ever a good approach to lower the scale factor to zero and just set the
  thresholds to a pure number of rows? (when setting it for a specific
  table)
 
 The problem is what happens if autovac goes off and starts vacuuming
 some large table? While that's going on your queue table is sitting
 there bloating. If you have a separate cronjob to handle the queue
 table, it'll stay small, especially in 8.2.

You mean at least in 8.2.  In previous releases, you could vacuum
that queue table until you were blue in the face, but it would achieve
nothing, because it would consider that the dead tuples were visible to a
running transaction: the one running the vacuum on the large table.  This
is an annoyance that was fixed in 8.2.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] High inserts, bulk deletes - autovacuum vs scheduled vacuum

2007-01-10 Thread Jim C. Nasby
On Thu, Jan 11, 2007 at 12:10:34AM -0300, Alvaro Herrera wrote:
 Jim C. Nasby wrote:
 
   Is the best way to do that usually to lower the scale factors?  Is it
   ever a good approach to lower the scale factor to zero and just set the
   thresholds to a pure number of rows? (when setting it for a specific
   table)
  
  The problem is what happens if autovac goes off and starts vacuuming
  some large table? While that's going on your queue table is sitting
  there bloating. If you have a separate cronjob to handle the queue
  table, it'll stay small, especially in 8.2.
 
 You mean at least in 8.2.  In previous releases, you could vacuum
 that queue table until you were blue in the face, but it would achieve
 nothing, because it would consider that the dead tuples were visible to a
 running transaction: the one running the vacuum on the large table.  This
 is an annoyance that was fixed in 8.2.

True, but in many environments there are other transactions that run
long enough that additional vacuums while a long vacuum was running
would still help.
-- 
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly