Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Tue, Nov 16, 2010 at 3:39 PM, Greg Smith g...@2ndquadrant.com wrote:

> I want to next go through and replicate some of the actual database level
> tests before giving a full opinion on whether this data proves it's worth
> changing the wal_sync_method detection. So far I'm torn between whether
> that's the right approach, or if we should just increase the default value
> for wal_buffers to something more reasonable.

How about both? open_datasync seems problematic for a number of reasons - you get an immediate write-through whether you need it or not, including, as you point out, the case where you want to write several blocks at once and then force them all out together. And 64kB for a ring buffer just seems awfully small.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
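For anyone wanting to experiment with the two knobs under discussion, both can be overridden in postgresql.conf (values here are purely illustrative - the right defaults are exactly what is being debated):

```
# postgresql.conf -- illustrative values, not a recommendation
wal_sync_method = fdatasync   # rather than the platform-detected open_datasync
wal_buffers = 16MB            # the built-in default at this time was only 64kB
```

(In the released 9.1, wal_buffers ended up auto-tuned when left at -1, sized as a fraction of shared_buffers.)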
Re: [PERFORM] Defaulting wal_sync_method to fdatasync on Linux for 9.1?
On Tue, Nov 16, 2010 at 6:25 PM, Josh Berkus j...@agliodbs.com wrote:

> On 11/16/10 12:39 PM, Greg Smith wrote:
>> I want to next go through and replicate some of the actual database level
>> tests before giving a full opinion on whether this data proves it's worth
>> changing the wal_sync_method detection. So far I'm torn between whether
>> that's the right approach, or if we should just increase the default value
>> for wal_buffers to something more reasonable.
>
> We'd love to, but wal_buffers uses sysV shmem.

*places tongue firmly in cheek* Gee, too bad there's not some other shared-memory implementation we could use...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] Running PostgreSQL as fast as possible no matter the consequences
On Fri, Nov 5, 2010 at 8:12 AM, Jon Nelson jnelson+pg...@jamponi.net wrote:

> On Fri, Nov 5, 2010 at 7:08 AM, Guillaume Cottenceau g...@mnc.ch wrote:
>> Marti Raudsepp marti 'at' juffo.org writes:
>>> On Fri, Nov 5, 2010 at 13:32, A B gentosa...@gmail.com wrote:
>>>> I was just thinking about the case where I will have almost 100%
>>>> selects, but still need something better than a plain key-value
>>>> storage so I can do some sql queries. The server will just boot,
>>>> load data, run, hopefully not crash but if it would, just start
>>>> over with load and run.
>>>
>>> If you want fast read queries then changing
>>> fsync/full_page_writes/synchronous_commit won't help you.
>>
>> That illustrates how knowing the reasoning of this particular
>> request makes new suggestions worthwhile, while previous ones are
>> now seen as useless.
>
> I disagree that they are useless - the stated mechanism was start,
> load data, and run. Changing the params above won't likely change
> much in the 'run' stage but would they help in the 'load' stage?

Yes, they certainly will. And they might well help in the run stage, too, if there are temporary tables in use, or checkpoints flushing hint bit updates, or such things.

It's also important to crank up checkpoint_segments and checkpoint_timeout very high, especially for the bulk data load but even afterwards if there is any write activity at all. And it's important to set shared_buffers correctly, too, which helps on workloads of all kinds.

But as said upthread, turning off fsync, full_page_writes, and synchronous_commit are the things you can do that specifically trade reliability away to get speed.

In 9.1, I'm hopeful that we'll have unlogged tables, which will be even better than turning these parameters off, and for which I just posted a patch to -hackers. Instead of generating WAL and writing WAL to the OS and then NOT trying to make sure it hits the disk, we just won't generate it in the first place. But if PostgreSQL or the machine it's running on crashes, you won't need to completely blow away the cluster and start over; instead, the particular tables that you chose to create as unlogged will be truncated, and the rest of your data, including the system catalogs, will still be intact.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
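A sketch of the configuration being described, for a load-then-read workload whose data is rebuildable (values are illustrative only; checkpoint_segments is the parameter name of this era, later replaced by max_wal_size):

```
# postgresql.conf -- deliberately trades durability for speed;
# only safe when the data can be reloaded from scratch after a crash
fsync = off
full_page_writes = off
synchronous_commit = off
checkpoint_segments = 64      # default was 3; fewer, larger checkpoints
checkpoint_timeout = 30min
shared_buffers = 2GB          # size to the machine, e.g. roughly 25% of RAM
```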
Re: [PERFORM] Running PostgreSQL as fast as possible no matter the consequences
On Mon, Nov 15, 2010 at 2:27 PM, Andy Colson a...@squeakycode.net wrote:

> On 11/15/2010 9:06 AM, Robert Haas wrote:
>> In 9.1, I'm hopeful that we'll have unlogged tables, which will be even
>> better than turning these parameters off, and for which I just posted a
>> patch to -hackers. Instead of generating WAL and writing WAL to the OS
>> and then NOT trying to make sure it hits the disk, we just won't
>> generate it in the first place. But if PostgreSQL or the machine it's
>> running on crashes, you won't need to completely blow away the cluster
>> and start over; instead, the particular tables that you chose to create
>> as unlogged will be truncated, and the rest of your data, including the
>> system catalogs, will still be intact.
>
> if I am reading this right, that means: we can run our db safely (with
> fsync and full_page_writes enabled) except for tables of our choosing?
> If so, I am very +1 for this!

Yep. But we need some vic^H^Holunteers to review and test the patches.

https://commitfest.postgresql.org/action/patch_view?id=424

Code review, benchmarking, or just general tinkering and reporting what you find out on the -hackers thread would be appreciated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
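For the curious, the feature under review looks like this (the syntax from the patch, which is what ultimately shipped in 9.1):

```sql
-- Contents survive a clean shutdown, but the table is truncated after
-- a crash instead of taking the rest of the cluster down with it.
CREATE UNLOGGED TABLE scratch_results (
    id      int PRIMARY KEY,
    payload text
);
```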
Re: [PERFORM] Difference between explain analyze and real execution time
2010/11/15 Artur Zając aza...@ang.com.pl:

> Why there is so big difference between explain analyze (0.710 ms) and
> real execution time (3309 ms)? Any suggestions?

Could it be that it takes a long time to plan for some reason? How fast is a plain EXPLAIN? What happens if you start up psql, turn on \timing, and then run EXPLAIN ANALYZE from within an interactive session? That's usually a better way to test, as it avoids counting the session-startup overhead.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
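The suggested test, spelled out for an interactive psql session (substitute the actual query for the placeholder):

```sql
\timing on
EXPLAIN SELECT ...;          -- planning only: is planning itself the slow part?
EXPLAIN ANALYZE SELECT ...;  -- \timing now reports client round-trip time,
                             -- which can be compared with the server-reported
                             -- total runtime, without session-startup overhead
```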
Re: [PERFORM] Why dose the planner select one bad scan plan.
On Thu, Nov 11, 2010 at 3:43 AM, t...@fuzzy.cz wrote:

>> Okay, I want to know how the planner computes the cost of constructing
>> a bitmap. And when the planner computes the cost of 'Bitmap Index
>> Scan', does it consider the influence of the memory cache? When I do
>> not clear the memory cache, I find the 'Bitmap Index Scan' is really
>> faster than 'Seq Scan'.
>
> There are two things here - loading the data from a disk into a cache
> (filesystem cache at the OS level / shared buffers at the PG level), and
> then the execution itself. PostgreSQL estimates the first part using an
> effective_cache_size hint, and uses that to estimate the probability
> that the data are already in the filesystem cache.

No, it does not do that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] anti-join chosen even when slower than old plan
On Sat, Nov 13, 2010 at 4:32 AM, Marc Mamin m.ma...@intershop.de wrote:

> Hello,
>
> Just a short thought:
>
> Is it imaginable to compare the prognoses of the plans with the actual
> results and somehow log the worst cases?
>
> a) to help the DBA locate bad statistics and queries
> b) as additional information source for the planner
>
> This could possibly affect parameters of your formula on the fly.

Yeah, I've thought about this, but it's not exactly clear what would be most useful.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] temporary tables, indexes, and query plans
On Sat, Nov 13, 2010 at 10:41 AM, Tom Lane t...@sss.pgh.pa.us wrote:

> Jon Nelson jnelson+pg...@jamponi.net writes:
>> OK. This is a highly distilled example that shows the behavior.
>>
>> BEGIN;
>> CREATE TEMPORARY TABLE foo AS
>>   SELECT x AS a, chr(x % 75 + 32) AS b, ''::text AS c
>>   FROM generate_series(1,500) AS x;
>> UPDATE foo SET c = 'foo' WHERE b = 'A';
>> CREATE INDEX foo_b_idx ON foo (b);
>> [ and the rest of the transaction can't use that index ]
>
> OK, this is an artifact of the HOT update optimization. Before creating
> the index, you did updates on the table that would have been executed
> differently if the index had existed. When the index does get created,
> its entries for those updates are incomplete, so the index can't be used
> in transactions that could in principle see the unmodified rows.

Is the "in principle" here because there might be an open snapshot other than the one under which CREATE INDEX is running, like a cursor?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] temporary tables, indexes, and query plans
On Sat, Nov 13, 2010 at 7:54 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> Robert Haas robertmh...@gmail.com writes:
>> On Sat, Nov 13, 2010 at 10:41 AM, Tom Lane t...@sss.pgh.pa.us wrote:
>>> OK, this is an artifact of the HOT update optimization. Before
>>> creating the index, you did updates on the table that would have been
>>> executed differently if the index had existed. When the index does
>>> get created, its entries for those updates are incomplete, so the
>>> index can't be used in transactions that could in principle see the
>>> unmodified rows.
>>
>> Is the "in principle" here because there might be an open snapshot
>> other than the one under which CREATE INDEX is running, like a cursor?
>
> Well, the test is based on xmin alone, not cmin, so it can't really
> tell the difference. It's unclear that it'd be worth trying.

Yeah, I'm not familiar with the logic in that area of the code, so I can't comment all that intelligently. However, I feel like there's a class of things that could potentially be optimized if we know that the only snapshot they could affect is the one we're currently using. For example, when bulk loading a newly created table with COPY or CTAS, we could set the xmin-committed hint bit if it weren't for the possibility that some snapshot with a command-ID equal to or lower than our own might take a look and get confused. That seems to require a BEFORE trigger or another open snapshot. And, if we HOT-update a tuple created by our own transaction that can't be of interest to anyone else ever again, it would be nice to either mark it for pruning or maybe even overwrite it in place; similarly, if we delete such a tuple, it would be nice to schedule its removal. There are problems with all of these ideas, and I'm not totally sure how to make any of it work, but to me this sounds suspiciously like another instance of a somewhat more general problem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] anti-join chosen even when slower than old plan
On Fri, Nov 12, 2010 at 4:15 AM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote:

>> I'm wondering if we could do something with a formula like
>> 3 * amount_of_data_to_read / (3 * amount_of_data_to_read +
>> effective_cache_size) = percentage NOT cached. That is, if we're
>> reading an amount of data equal to effective_cache_size, we assume 25%
>> caching, and plot a smooth curve through that point. In the examples
>> above, we would assume that a 150MB read is 87% cached, a 1GB read is
>> 50% cached, and a 3GB read is 25% cached.
>
> But isn't it already the behavior of effective_cache_size usage?

No.

The idea of trying to know what is actually in cache strikes me as an almost certain non-starter. It can change very quickly, even as a result of the query you're actually running. And getting the information we'd need in order to do it that way would be very expensive, when it can be done at all.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] anti-join chosen even when slower than old plan
On Fri, Nov 12, 2010 at 11:43 AM, Tom Lane t...@sss.pgh.pa.us wrote:

> I think his point is that we already have a proven formula
> (Mackert-Lohmann) and shouldn't be inventing a new one out of thin air.
> The problem is to figure out what numbers to apply the M-L formula to.
>
> I'm not sure that's really measuring the same thing, although I'm not
> opposed to using it if it produces reasonable answers. I've been
> thinking that we ought to try to use it in the context of the query as
> a whole rather than for individual table scans; the current usage
> already has some of that flavor but we haven't taken it to the logical
> conclusion.

That's got a pretty severe chicken-and-egg problem though, doesn't it? You're going to need to know how much data you're touching to estimate the costs so you can pick the best plan, but you can't know how much data will ultimately be touched until you've got the whole plan.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] questions regarding shared_buffers behavior
On Sun, Nov 7, 2010 at 10:03 PM, Cédric Villemain cedric.villemain.deb...@gmail.com wrote:

> 2010/11/8 Mark Rostron mrost...@ql2.com:
>>>> What is the procedure that postgres uses to decide whether or not a
>>>> table/index block will be left in the shared_buffers cache at the
>>>> end of the operation?
>>>
>>> The only special cases are for sequential scans and VACUUM, which
>>> continuously re-use a small section of the buffer cache in some cases
>>> instead.
>>
>> Thanks - the part about sequential scans and the re-use of a small
>> section of shared_buffers is the bit I was interested in. I don't
>> suppose you would be able to tell me how large that re-useable area
>> might be?
>
> There are 256KB per seqscan and 256KB per vacuum. I suggest you go
> read src/backend/storage/buffer/README

Note that there is a different, higher limit for the bulk write strategy when using COPY IN or CTAS.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] anti-join chosen even when slower than old plan
escape hatches elsewhere.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] anti-join chosen even when slower than old plan
On Thu, Nov 11, 2010 at 1:23 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> Robert Haas robertmh...@gmail.com writes:
>> Let's back up a moment and talk about what the overall goal is, here.
>> Ideally, we would like PostgreSQL to have excellent performance at all
>> times, under all circumstances, with minimal tuning. Therefore, we do
>> NOT want to add variables that will, by design, need constant manual
>> adjustment. That is why I suggested that Tom's idea of an
>> assume_cached GUC is probably not what we really want to do. On the
>> other hand, what I understand Mladen to be suggesting is something
>> completely different. He's basically saying that, of course, he wants
>> it to work out of the box most of the time, but since there are
>> guaranteed to be cases where it doesn't, how about providing some
>> knobs that aren't intended to be routinely twiddled but which are
>> available in case of emergency? Bravo, I say!
>
> Um ... those are exactly the same thing. You're just making different
> assumptions about how often you will need to twiddle the setting.
> Neither assumption is based on any visible evidence, unfortunately.
>
> I was thinking of assume_cached as something that could be
> set-and-forget most of the time, and you're entirely right to criticize
> it on the grounds that maybe it wouldn't. But to support a proposal
> that doesn't even exist yet on the grounds that it *would* be
> set-and-forget seems a tad inconsistent. We can't make that judgment
> without a whole lot more details than have been provided yet for any
> idea in this thread.

Well, maybe I misunderstood what you were proposing. I had the impression that you were proposing something that would *by design* require adjustment for each query, so evidently I missed the point.

It seems to me that random_page_cost and seq_page_cost are pretty close to set-and-forget already. We don't have many reports of people needing to tune these values on a per-query basis; most people seem to just guesstimate a cluster-wide value and call it good. Refining the algorithm should only make things better.

> I do think that something based around a settable-per-table caching
> percentage might be a reasonable way to proceed. But the devil is in
> the details, and we don't have those yet.

I think one of the larger devils in the details is deciding how to estimate the assumed caching percentage when the user hasn't specified one. Frankly, I suspect that if we simply added a reloption called assumed_caching_percentage and made it default to zero, we would make a bunch of DBAs happy; they'd knock down seq_page_cost and random_page_cost enough to account for the general level of caching and then bump assumed_caching_percentage up for hot tables/indexes (or ones that they want to have become hot).

I think we can do better than that, but the right formula isn't exactly obvious. I feel safe saying that if effective_cache_size=1GB and table_size=4MB, then we ought to take the table as fully cached. But it's far from clear what caching percentage we should assume when table_size=400MB, and it seems like the sort of thing that will lead to endless bikeshedding. There's probably no perfect answer, but I feel we can likely come up with something that is better than a constant (which would probably still be better than what we have now).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
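A purely hypothetical sketch of what such a knob might look like, using the reloption syntax that exists for other storage parameters (assumed_caching_percentage is just the name floated in this thread; no released PostgreSQL accepts it):

```sql
-- HYPOTHETICAL: this reloption does not exist; shown only to make
-- the proposal concrete.
ALTER TABLE hot_lookup_table SET (assumed_caching_percentage = 90);
```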
Re: [PERFORM] anti-join chosen even when slower than old plan
On Thu, Nov 11, 2010 at 1:58 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> I wrote:
>> I do think that something based around a settable-per-table caching
>> percentage might be a reasonable way to proceed.
>
> BTW ... on reflection it seems that this would *not* solve the use-case
> Kevin described at the start of this thread. What he's got AIUI is some
> large tables whose recent entries are well-cached, and a lot of queries
> that tend to hit that well-cached portion, plus a few queries that hit
> the whole table and so see largely-not-cached behavior. We can't
> represent that very well with a caching knob at the table level. Either
> a high or a low setting will be wrong for one set of queries or the
> other.

Yeah. For Kevin's case, it seems like we want the caching percentage to vary not so much based on which table we're hitting at the moment but on how much of it we're actually reading. However, the two problems are related enough that I think it might be feasible to come up with one solution that answers both needs, or perhaps two somewhat-intertwined solutions.

> The most practical solution for his case still seems to be to twiddle
> some GUC or other locally in the maintenance scripts that do the
> full-table-scan queries. Unfortunately we don't have an equivalent of
> per-session SET (much less SET LOCAL) for per-relation attributes.
> Not sure if we want to go there.

I doubt it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] anti-join chosen even when slower than old plan
On Thu, Nov 11, 2010 at 2:35 PM, Tom Lane t...@sss.pgh.pa.us wrote:

> Robert Haas robertmh...@gmail.com writes:
>> Yeah. For Kevin's case, it seems like we want the caching percentage
>> to vary not so much based on which table we're hitting at the moment
>> but on how much of it we're actually reading.
>
> Well, we could certainly take the expected number of pages to read and
> compare that to effective_cache_size. The thing that's missing in that
> equation is how much other stuff is competing for cache space. I've
> tried to avoid having the planner need to know the total size of the
> database cluster, but it's kind of hard to avoid that if you want to
> model this honestly.

I'm not sure I agree with that. I mean, you could easily have a database that is much larger than effective_cache_size, but only that much of it is hot. Or, the hot portion could move around over time. And for reasons of both technical complexity and plan stability, I don't think we want to try to model that. It seems perfectly reasonable to say that reading 25% of effective_cache_size will be more expensive *per-page* than reading 5% of effective_cache_size, independently of what the total cluster size is.

> Would it be at all workable to have an estimate that so many megs of a
> table are in cache (independently of any other table), and then we
> could scale the cost based on the expected number of pages to read
> versus that number? The trick here is that DBAs really aren't going to
> want to set such a per-table number (at least, most of the time) so we
> need a formula to get to a default estimate for that number based on
> some simple system-wide parameters. I'm not sure if that's any easier.

That's an interesting idea. For the sake of argument, suppose we assume that a relation which is less than 5% of effective_cache_size will be fully cached; and for anything larger we'll assume that much of it is cached. Consider a 4GB machine with effective_cache_size set to 3GB. Then we'll assume that any relation smaller than 153MB is 100% cached, a 1GB table is 15% cached, and a 3GB table is 5% cached. That doesn't seem quite right, though: the caching percentage drops off very quickly after you exceed the threshold.

*thinks*

I'm wondering if we could do something with a formula like 3 * amount_of_data_to_read / (3 * amount_of_data_to_read + effective_cache_size) = percentage NOT cached. That is, if we're reading an amount of data equal to effective_cache_size, we assume 25% caching, and plot a smooth curve through that point. In the examples above, we would assume that a 150MB read is 87% cached, a 1GB read is 50% cached, and a 3GB read is 25% cached.

> BTW, it seems that all these variants have an implicit assumption that
> if you're reading a small part of the table it's probably part of the
> working set; which is an assumption that could be 100% wrong. I don't
> see a way around it without trying to characterize the data access at
> an unworkably fine level, though.

Me neither, but I think it will frequently be true, and I'm not sure it will hurt very much when it isn't. I mean, if you execute the same query repeatedly, that data will become hot soon enough. If you execute a lot of different queries that each touch a small portion of a big, cold table, we might underestimate the costs of the index probes, but so what? There's probably no better strategy for accessing that table anyway. Perhaps you can construct an example where this underestimate affects the join order in an undesirable fashion, but I'm having a hard time getting worked up about that as a potential problem case. Our current system - where we essentially assume that the caching percentage is uniform across the board - can have the same problem in less artificial cases.
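The proposed curve can be checked with a quick calculation (effective_cache_size assumed to be 3GB = 3072MB, as in the example above):

```sql
-- percentage NOT cached = 3*r / (3*r + ecs), with r and ecs in MB
SELECT r AS read_mb,
       round(100 * 3 * r / (3 * r + 3072.0))       AS pct_not_cached,
       100 - round(100 * 3 * r / (3 * r + 3072.0)) AS pct_cached
FROM (VALUES (150), (1024), (3072)) AS t(r);
-- read_mb | pct_not_cached | pct_cached
--     150 |             13 |         87
--    1024 |             50 |         50
--    3072 |             75 |         25
```

This reproduces the 87% / 50% / 25% figures quoted in the message.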
-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] anti-join chosen even when slower than old plan
On Wed, Nov 10, 2010 at 10:15 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:

> But wait -- it turns out that this pain was self-inflicted. Based on
> heavy testing of the interactive queries which users run against this
> database we tuned the database for fully-cached settings, with both
> random_page_cost and seq_page_cost at 0.1. In a practical sense, the
> users are almost always running these queries against very recent data
> which is, in fact, heavily cached -- so it's no surprise that the
> queries they run perform best with plans based on such costing. The
> problem is that these weekly maintenance runs need to pass the entire
> database, so caching effects are far less pronounced. If I set
> seq_page_cost = 1 and random_page_cost = 2 I get exactly the same
> (fast) plan as above.
>
> I guess the lesson here is not to use the same costing for
> database-wide off-hours maintenance queries as for ad hoc queries
> against a smaller set of recent data by users who expect quick
> response time. I'm fine with tweaking the costs in our maintenance
> scripts, but it does tend to make me daydream about how the optimizer
> might possibly auto-tweak such things...

Wow. That's fascinating, and if you don't mind, I might mention this potential problem in a future talk at some point.

I've given some thought in the past to trying to maintain some model of which parts of the database are likely to be cached, and trying to adjust costing estimates based on that data. But it's a really hard problem, because what is and is not in cache can change relatively quickly, and you don't want to have too much plan instability. Also, for many workloads, you'd need to have pretty fine-grained statistics to figure out anything useful, which would be expensive and difficult to maintain.

But thinking over what you've written here, I'm reminded of something Peter said years ago, also about the optimizer. He was discussing the ratio of the estimated cost to the actual cost and made an off-hand remark that efforts had been made over the years to make that ratio more consistent (i.e. improve the quality of the cost estimates) but that they'd been abandoned because they didn't necessarily produce better plans. Applying that line of thinking to this problem, maybe we should give up on trying to make the estimates truly model reality, and focus more on assigning them values which work well in practice. For example, in your case, it would be sufficient to estimate the amount of data that a given query is going to grovel through and then apply some heuristic to choose values for random_page_cost and seq_page_cost based on the ratio of that value to, I don't know, effective_cache_size.

Unfortunately, to know how much data we're going to grovel through, we need to know the plan; and to decide on the right plan, we need to know how much data we're going to grovel through.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
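The workaround Kevin describes - different costs for the maintenance scripts - can be scoped to a single transaction so the interactive settings are never disturbed, using the cost values he reports work well for the uncached case:

```sql
-- In the weekly maintenance script: costs appropriate for mostly-uncached
-- data, reverted automatically at COMMIT/ROLLBACK.
BEGIN;
SET LOCAL seq_page_cost = 1;
SET LOCAL random_page_cost = 2;
-- ... database-wide maintenance queries here ...
COMMIT;
```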
Re: [PERFORM] anti-join chosen even when slower than old plan
On Wed, Nov 10, 2010 at 6:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Kevin Grittner kevin.gritt...@wicourts.gov writes: Robert Haas robertmh...@gmail.com wrote: Unfortunately, to know how much data we're going to grovel through, we need to know the plan; and to decide on the right plan, we need to know how much data we're going to grovel through. And that's where they've been ending. The only half-sane answer I've thought of is to apply a different cost to full-table or full-index scans based on the ratio with effective cache size. Kevin, yes, good point. Bravo! Let's do that. Details TBD, but suppose effective_cache_size = 1GB. What we know for sure is that a 4 GB table is not going to be fully cached but a 4 MB table may well be. In fact, I think we should assume that the 4 MB table IS cached, because the point is that if it's used at all, it soon will be. It's almost certainly a bad idea to build a plan around the idea of minimizing reads from that 4 MB table in favor of doing a substantial amount of additional work somewhere else. I suppose this could break down if you had hundreds and hundreds of 4 MB tables all of which were accessed regularly, but that's an unusual situation, and anyway it's not clear that assuming them all uncached is going to be any better than assuming them all cached. This might have some connection to some rather half-baked ideas I've been having in connection with the generalized-inner-indexscan problem. I don't have anything in the way of a coherent presentation to make yet, but the thing I'm being forced to realize is that sane modeling of a complex subplan that's on the inside of a nestloop join requires treating *every* scan type as having different costs the first time versus during rescan. If the total amount of data touched in the query is less than effective_cache_size, it's not unreasonable to suppose that I/O costs during rescan might be zero, even for a seqscan or a non-parameterized indexscan. 
In fact, only parameterized indexscans would be able to touch pages they'd not touched the first time, and so they ought to have higher not lower rescan costs in this environment. But once the total data volume exceeds effective_cache_size, you have to do something different since you shouldn't any longer assume the data is all cached from the first scan. (This isn't quite as hard as the case you're talking about, since I think the relevant data volume is the sum of the sizes of the tables used in the query; which is easy to estimate at the start of planning, unlike the portion of the tables that actually gets touched.) Well, we don't want the costing model to have sharp edges. effective_cache_size can't be taken as much more than an educated guess, and what actually happens will depend a lot on what else is going on on the system. If only one query is running on a system at a time and it is repeatedly seq-scanning a large table, the cost of reading pages in will be very small until the table grows large enough that you can't fit the whole thing in memory at once, and then will abruptly go through the roof. But realistically you're not going to know exactly where that edge is going to be, because you can't predict exactly how much concurrent activity there will be, for example, or how much work_mem allocations will push out of the OS buffer cache. So I'm thinking we should start the costs at something like 0.05/0.05 for tables that are much smaller than effective_cache_size and ramp up to 4/1 for tables that are larger than effective_cache_size. Maybe just by linearly ramping up, although that has a certain feeling of being without mathemetical soundness. An idea that isn't even half-baked yet is that once we had a cost model like that, we might be able to produce plans that are well-tuned for a heavily cached environment by applying the rescan cost model even to the first scan for a particular query. 
So that might lead to some sort of assume_cached GUC parameter, and perhaps Kevin could tune his reporting queries by turning that off instead of messing with individual cost constants. I think the real goal here should be to try to avoid needing a GUC. A lot of people could benefit if the system could make some attempt to recognize on its own which queries are likely to be cached. We already have parameters you can hand-tune for each query as necessary. Being able to set some parameters system-wide and then get sensible behavior automatically would be much nicer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
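The cost curve sketched in the message above — 0.05/0.05 page costs for tables well under effective_cache_size, rising to the stock 4/1 above it — can be illustrated as a simple linear interpolation. This is only a sketch of the proposal, not planner code; the function name and the linear ramp are assumptions.

```python
def page_costs(table_size, effective_cache_size):
    """Hypothetical ramp between cached and uncached page costs.

    Returns (random_page_cost, seq_page_cost).  Tables much smaller than
    effective_cache_size get the cheap 'cached' costs (0.05/0.05); tables
    at or above it get the default uncached costs (4.0/1.0); in between we
    interpolate linearly, avoiding a sharp edge in the cost model.
    """
    cached = (0.05, 0.05)
    uncached = (4.0, 1.0)
    frac = min(max(table_size / effective_cache_size, 0.0), 1.0)
    return tuple(round(c + frac * (u - c), 4) for c, u in zip(cached, uncached))

ecs = 1024  # effective_cache_size, in MB
print(page_costs(4, ecs))      # (0.0654, 0.0537): essentially the cached costs
print(page_costs(4096, ecs))   # (4.0, 1.0): the standard uncached costs
```

The clamp at 1.0 is what keeps the model from going "through the roof" abruptly: the transition is spread over the whole range below effective_cache_size instead of happening at one table size.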
Re: [PERFORM] Regression: 8.3 2 seconds -> 8.4 100+ seconds
On Wed, Oct 27, 2010 at 8:41 AM, Francisco Reyes li...@stringsutils.com wrote:

-> Nested Loop (cost=293.80..719.87 rows=2434522 width=4) (actual time=228.867..241.909 rows=2 loops=1)
   -> HashAggregate (cost=293.80..294.13 rows=33 width=29) (actual time=169.551..169.553 rows=2 loops=1)
      -> Nested Loop (cost=11.33..293.71 rows=33 width=29) (actual time=145.940..169.543 rows=2 loops=1)
         -> HashAggregate (cost=11.33..11.66 rows=33 width=4) (actual time=64.730..64.732 rows=2 loops=1)
            -> Index Scan using members_commonid on members (cost=0.00..11.25 rows=33 width=4) (actual time=64.688..64.703 rows=2 loops=1)
                 Index Cond: (commonid = 3594)
         -> Index Scan using cards_membid on cards (cost=0.00..8.53 rows=1 width=33) (actual time=52.400..52.401 rows=1 loops=2)
              Index Cond: (public.cards.membid = public.members.membid)
   -> Index Scan using cards_useraccount on cards (cost=0.00..12.88 rows=2 width=33) (actual time=36.172..36.173 rows=1 loops=2)
        Index Cond: (public.cards.useraccount = public.cards.useraccount)

This part looks really strange to me. Here we have a nested loop whose outer side is estimated to produce 33 rows and whose inner side is estimated to produce 2 rows. Given that, one would think that the estimate for the loop as a whole shouldn't be more than 33 * 2 = 66 rows (or maybe a bit more if 33 is really 33.4999 rounded down, and 2 is really 2.4 rounded down). But the actual estimate is 5 orders of magnitude larger. How is that possible? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] BBU Cache vs. spindles
On Fri, Oct 29, 2010 at 11:56 AM, Aidan Van Dyk ai...@highrise.ca wrote: 1) The pages you write to must be in the page cache, or your memcpy is going to fault them in. With a plain write, you don't need the over-written page in the cache. I seem to remember a time many years ago when I got bitten by this problem. The fact that our I/O is in 8K pages means this could be a pretty severe hit, I think. 2) Now, instead of the torn-page problem being FS block/sector-size based, you can now actually have a possibly arbitrary amount of the block memory written when the kernel writes out the page. You *really* need full-page-writes. Yeah. 3) The mmap overhead required for the kernel to setup the mappings is less than the repeated syscalls of a simple write(). You'd expect to save something from that; but on the other hand, at least on 32-bit systems, there's a very limited number of 1GB files that can be simultaneously mapped into one address space, and it's a lot smaller than the number of file descriptors that you can have open. Rumor has it that cutting down the number of fds that can stay open simultaneously is pretty bad for performance, so cutting it down to a number you can count on one hand (maybe one finger) would probably be bad. Maybe on 64-bit it would be OK but it seems like an awful lot of complexity for at most a minor savings (and a pretty bad anti-savings if point #1 kicks in). Anyway this is all totally off-topic... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
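The 32-bit limit mentioned above is easy to quantify with back-of-the-envelope arithmetic. The 3 GB user address-space figure below is an assumption (the typical 3G/1G Linux split); a real process loses still more to code, heap, stacks, libraries, and shared buffers.

```python
GB = 1 << 30

# Typical 32-bit Linux split: 4 GB of virtual address space, the top 1 GB
# reserved for the kernel, leaving roughly 3 GB for userspace.
user_va = 3 * GB
segment = 1 * GB  # PostgreSQL stores large relations in 1 GB segment files

# Upper bound on how many 1 GB segments could be mmap'd at once -- before
# subtracting everything else that lives in the address space.
max_mappings = user_va // segment
print(max_mappings)  # 3
```

Three concurrent mappings (at best) versus thousands of open file descriptors is the "count on one hand, maybe one finger" comparison in the message.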
Re: [PERFORM] temporary tables, indexes, and query plans
On Wed, Oct 27, 2010 at 3:44 PM, Justin Pitts justinpi...@gmail.com wrote: Jason Pitts: RE: changing default_statistics_target (or via ALTER TABLE SET STATS) not taking effect until ANALYZE is performed. I did already know that, but it's probably good to put into this thread. However, you'll note that this is a temporary table created at the beginning of a transaction. ( giving up on replying to the group; the list will not accept my posts ) Evidently it's accepting some of them... I've been following the thread so long I had forgotten that. I rather strongly doubt that analyze can reach that table's content inside that transaction, if you are creating, populating, and querying it all within that single transaction. Actually I don't think that's a problem, at least for a manual ANALYZE. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow Query- Simple taking
On Tue, Oct 19, 2010 at 2:21 PM, Ozer, Pam po...@automotive.com wrote: I have the following query running on 8.4, which takes 3516 ms. It is very straight forward. It brings back 116412 records. The explain only takes 1348ms

select VehicleUsed.VehicleUsedId as VehicleUsedId ,
       VehicleUsed.VehicleUsedDisplayPriority as VehicleUsedDisplayPriority ,
       VehicleUsed.VehicleYear as VehicleYear ,
       VehicleUsed.VehicleUsedDisplayPriority as VehicleUsedDisplayPriority ,
       VehicleUsed.HasVehicleUsedThumbnail as HasVehicleUsedThumbnail ,
       VehicleUsed.HasVehicleUsedPrice as HasVehicleUsedPrice ,
       VehicleUsed.VehicleUsedPrice as VehicleUsedPrice ,
       VehicleUsed.HasVehicleUsedMileage as HasVehicleUsedMileage ,
       VehicleUsed.VehicleUsedMileage as VehicleUsedMileage ,
       VehicleUsed.IsCPO as IsCPO ,
       VehicleUsed.IsMTCA as IsMTCA
from VehicleUsed
where ( VehicleUsed.VehicleMakeId = 28 )
order by VehicleUsed.VehicleUsedDisplayPriority ,
         VehicleUsed.VehicleYear desc ,
         VehicleUsed.HasVehicleUsedThumbnail desc ,
         VehicleUsed.HasVehicleUsedPrice desc ,
         VehicleUsed.VehicleUsedPrice ,
         VehicleUsed.HasVehicleUsedMileage desc ,
         VehicleUsed.VehicleUsedMileage ,
         VehicleUsed.IsCPO desc ,
         VehicleUsed.IsMTCA desc

The explain is also very straight forward

Sort (cost=104491.48..105656.24 rows=116476 width=41) (actual time=1288.413..1325.457 rows=116412 loops=1)
  Sort Key: vehicleuseddisplaypriority, vehicleyear, hasvehicleusedthumbnail, hasvehicleusedprice, vehicleusedprice, hasvehicleusedmileage, vehicleusedmileage, iscpo, ismtca
  Sort Method: quicksort Memory: 19443kB
  -> Bitmap Heap Scan on vehicleused (cost=7458.06..65286.42 rows=116476 width=41) (actual time=34.982..402.164 rows=116412 loops=1)
       Recheck Cond: (vehiclemakeid = 28)
       -> Bitmap Index Scan on vehicleused_i08 (cost=0.00..7341.59 rows=116476 width=0) (actual time=22.854..22.854 rows=116412 loops=1)
            Index Cond: (vehiclemakeid = 28)
Total runtime: 1348.487 ms

Can someone tell me why after it runs the index scan it then runs a bitmap heap scan? 
It should not take this long to run should it? If I limit the results it comes back in 300ms. It doesn't. The EXPLAIN output shows it running the bitmap index scan first and then bitmap heap scan. The bitmap index scan is taking 22 ms, and the bitmap index and bitmap heap scans combined are taking 402 ms. The sort is then taking another 800+ ms for a total of 1325 ms. Any additional time is spent returning rows to the client. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
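The reading of the plan in the reply above can be double-checked with back-of-the-envelope arithmetic on the "actual time" numbers from the EXPLAIN ANALYZE output — Python standing in as a calculator here, nothing more.

```python
# Upper-bound "actual time" values from the EXPLAIN ANALYZE output, in ms.
bitmap_index_scan = 22.854   # Bitmap Index Scan finishes
bitmap_heap_scan = 402.164   # Bitmap Heap Scan finishes (includes the index scan)
sort_done = 1325.457         # Sort finishes emitting rows
total_runtime = 1348.487     # Total runtime

heap_fetches = bitmap_heap_scan - bitmap_index_scan  # fetching heap pages
sorting = sort_done - bitmap_heap_scan               # the "800+ ms" sort
returning = total_runtime - sort_done                # handing rows back

print(round(heap_fetches), round(sorting), round(returning))  # 379 923 23
```

So of the 1348 ms total, the index scan itself is a small fraction; heap fetches and especially the nine-column sort dominate, with the remainder spent returning 116k rows to the client.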
Re: [PERFORM] Slow Query- Simple taking
On Thu, Oct 28, 2010 at 10:39 AM, Robert Haas robertmh...@gmail.com wrote: Can someone tell me why after it runs the index scan it hen runs a bitmap heap scan? It should not take this long to run should it? If I limit the results it comes back in 300ms. It doesn't. The EXPLAIN output shows it running the bitmap index scan first and then bitmap heap scan. The bitmap index scan is taking 22 ms, and the bitmap index and bitmap heap scans combined are taking 402 ms. The sort is then taking another 800+ ms for a total of 1325 ms. Any additional time is spent returning rows to the client. Doh! I misread your email. You had it right, and I'm all wet. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow Query- Simple taking
On Tue, Oct 19, 2010 at 6:05 PM, Ozer, Pam po...@automotive.com wrote: On mysql the same query only takes milliseconds not seconds. That's a big difference. I can believe that MySQL is faster, because they probably don't need to do the bitmap heap scan. There is a much-anticipated feature called index-only scans that we don't have yet in PG, which would help cases like this a great deal. But I don't see how MySQL could send back 116,000 rows to the client in milliseconds, or sort them that quickly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow Query- Simple taking
On Thu, Oct 28, 2010 at 11:23 AM, Mladen Gogala mladen.gog...@vmsinfo.com wrote: On 10/28/2010 10:53 AM, Richard Broersma wrote: On Thu, Oct 28, 2010 at 7:51 AM, Mladen Gogala mladen.gog...@vmsinfo.com wrote: Yyesss! Any time frame on that? Can you make it into 9.0.2? Maybe 9.1.0 or 9.2.0 :) 9.0's features are already frozen. Well, with all this global warming around us, index scans may still thaw in time to make it into 9.0.2 I fear this is not going to happen for 9.1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query much faster with enable_seqscan=0
On Tue, Oct 12, 2010 at 10:28 PM, Samuel Gendler sgend...@ideasculptor.com wrote: On Tue, Sep 21, 2010 at 4:30 PM, Tom Lane t...@sss.pgh.pa.us wrote: Ogden li...@darkstatic.com writes: SELECT tr.id, tr.sid FROM test_registration tr INNER JOIN test_registration_result r on (tr.id = r.test_registration_id) WHERE tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid GROUP BY tr.id, tr.sid Seeing that tr.id is a primary key, I think you might be a lot better off if you avoided the inner join and group by. I think what you really want here is something like SELECT tr.id, tr.sid FROM test_registration tr WHERE tr.test_administration_id='32a22b12-aa21-11df-a606-96551e8f4e4c'::uuid AND EXISTS(SELECT 1 FROM test_registration_result r WHERE tr.id = r.test_registration_id) regards, tom lane Could you explain the logic behind why this structure is better than the other? Is it always the case that one should just always use the 'exists(select 1 from x...)' structure when trying to strip rows that don't join, or is it just the case when you know that the rows which do join are a fairly limited subset? Does the same advantage exist if filtering rows in the joined table on some criteria, or is it better at that point to use an inner join and add a where clause to filter the joined rows. select table1.columns from table1, table2 where table1.column = 'some_value' and table1.fk = table2.pk AND table2.column = 'some_other_value' versus select table1.columns from table1 where table1.column = 'some_value' and exists(select 1 from table2 where table1.fk = table2.pk and table2.column ='some_other_value') I don't think there's much difference between those two cases. I think Tom's point was that GROUP BY can be expensive - which it certainly can. It's absolutely necessary and unavoidable for certain queries, of course, but don't include it unless you need it. 
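The difference between the two query shapes can be sketched with a toy simulation (Python standing in for SQL, purely illustrative): a plain inner join emits one row per matching pair and therefore needs the GROUP BY to de-duplicate, while a semijoin (EXISTS) emits each outer row at most once.

```python
registrations = [{"id": 1, "sid": "a"}, {"id": 2, "sid": "b"}]
results = [{"test_registration_id": 1}, {"test_registration_id": 1},
           {"test_registration_id": 3}]

# Inner join: one output row per matching pair, so id=1 comes out twice
# and must be de-duplicated afterwards (the GROUP BY in the original query).
joined = [tr for tr in registrations
          for r in results if r["test_registration_id"] == tr["id"]]

# Semijoin (EXISTS): each outer row is emitted at most once; no dedup pass.
semi = [tr for tr in registrations
        if any(r["test_registration_id"] == tr["id"] for r in results)]

print(len(joined), len(semi))  # 2 1
```

Both produce the same distinct registrations in the end; the EXISTS form just never manufactures the duplicates that the GROUP BY then has to remove.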
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] how to get the total number of records in report
On Tue, Oct 19, 2010 at 7:56 PM, Merlin Moncure mmonc...@gmail.com wrote: On Mon, Oct 18, 2010 at 1:16 AM, AI Rumman rumman...@gmail.com wrote: At present for reporting I use following types of query: select crm.*, crm_cnt.cnt from crm, (select count(*) as cnt from crm) crm_cnt; Here count query is used to find the total number of records. Same FROM clause is copied in both the part of the query. Is there any other good alternative way to get this similar value? Probably the best way to do this type of thing is handle it on the client. However, if you want to do it this way and your from clause is more complex than 'from table', you can possibly improve on this with a CTE: with q as (select * from something expensive) select q.*, q_cnt.cnt from q, (select count(*) as cnt from q) q_cnt; The advantage here is that the CTE is materialized without having to do the whole query again. This can be win or loss depending on the query. What about select crm.*, sum(1) over () as crm_count from crm limit 10; -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
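The sum(1) over () suggestion works because window functions are evaluated over the full result set before LIMIT applies, so every emitted row carries the total count. A toy model of that evaluation order (Python standing in for SQL, illustrative only):

```python
rows = list(range(47))  # pretend "crm" holds 47 rows

# Window functions run over the whole result set *before* LIMIT, so every
# row gets the full count attached even when only 10 rows are kept.
with_count = [(r, len(rows)) for r in rows]  # sum(1) over () == count(*) over ()
limited = with_count[:10]                    # LIMIT 10

print(limited[0])  # (0, 47): the first row, plus the total row count
```

This avoids writing the FROM clause twice, at the cost of the window-function pass over the full result set.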
Re: [PERFORM] BBU Cache vs. spindles
On Wed, Oct 27, 2010 at 12:41 AM, Rob Wultsch wult...@gmail.com wrote: On Tue, Oct 26, 2010 at 7:25 AM, Robert Haas robertmh...@gmail.com wrote: On Tue, Oct 26, 2010 at 10:13 AM, Rob Wultsch wult...@gmail.com wrote: The double write buffer is one of the few areas where InnoDB does more IO (in the form of fsynch's) than PG. InnoDB also has fuzzy checkpoints (which help to keep dirty pages in memory longer), buffering of writing out changes to secondary indexes, and recently tunable page level compression. Baron Schwartz was talking to me about this at Surge. I don't really understand how the fuzzy checkpoint stuff works, and I haven't been able to find a good description of it anywhere. How does it keep dirty pages in memory longer? Details on the other things you mention would be interesting to hear, too. For checkpoint behavior: http://books.google.com/books?id=S_yHERPRZScC&pg=PA606&lpg=PA606&dq=fuzzy+checkpoint&source=bl&ots=JJrzRUKBGh&sig=UOMPsRy5E-YDgjAFkaSVn3dps_M&hl=en&ei=_k8yTOfeHYzZnAepyumLBA&sa=X&oi=book_result&ct=result&resnum=8&ved=0CEYQ6AEwBw#v=onepage&q=fuzzy%20checkpoint&f=false I would think that best case behavior sharp checkpoints with a large checkpoint_completion_target would have behavior similar to a fuzzy checkpoint. Well, under that definition of a fuzzy checkpoint, our checkpoints are fuzzy even with checkpoint_completion_target=0. What Baron seemed to be describing was a scheme whereby you could do what I might call partial checkpoints. IOW, you want to move the redo pointer without writing out ALL the dirty buffers in memory, so you write out the pages with the oldest LSNs and then move the redo pointer to the oldest LSN you have left. Except that doesn't quite work, because the page might have been dirtied at LSN X and then later updated again at LSN Y, and you still have to flush it to disk before moving the redo pointer to any value > X. So you work around that by maintaining a first dirtied LSN for each page as well as the current LSN. 
I'm not 100% sure that this is how it works or that it would work in PG, but even assuming that it is and does, I'm not sure what the benefit is over the checkpoint-spreading logic we have now. There might be some benefit in sorting the writes that we do, so that we can spread out the fsyncs. So, write all the blocks to a given file, fsync, and then repeat for each underlying data file that has at least one dirty block. But that's completely orthogonal to (and would actually be hindered by) the approach described in the preceding paragraph. Insert (for InnoDB 1.1+ evidently it also does delete and purge) buffering: http://dev.mysql.com/doc/refman/5.5/en/innodb-insert-buffering.html We do something a bit like this for GIST indices. It would be interesting to see if it also has a benefit for btree indices. For a recent ~800GB db I had to restore, the insert buffer saved 92% of io needed for secondary indexes. Compression: http://dev.mysql.com/doc/innodb-plugin/1.0/en/innodb-compression-internals.html For many workloads 50% compression results in negligible impact to performance. For certain workloads compression can help performance. Please note that InnoDB also has a non-tunable TOAST-like feature. Interesting. I am surprised this works well. It seems that this only works for pages that can be compressed by >=50%, which seems like it could result in a lot of CPU wasted on failed attempts to compress. Given that InnoDB is not shipping its logs across the wire, I don't think many users would really care if it used the double writer or full page writes approach to the redo log (other than the fact that the log files would be bigger). PG on the other hand *is* pushing its logs over the wire... So how is InnoDB doing replication? Is there a second log just for that? The other log is the binary log and it is one of the biggest problems with MySQL. 
Running MySQL in such a way that the binary log stays in sync with the InnoDB redo has a very significant impact on performance. http://www.mysqlperformanceblog.com/2010/10/23/mysql-limitations-part-2-the-binary-log/ http://mysqlha.blogspot.com/2010/09/mysql-versus-mongodb-update-performance.html (check out the pretty graph) Hmm. That seems kinda painful. Having to ship full page images over the wire doesn't seems so bad by comparison, though I'm not very happy about having to do that either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
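The partial-checkpoint scheme described in the message above — track a first-dirtied LSN per page, flush the pages whose first-dirtied LSN is oldest, then advance the redo pointer only to the oldest first-dirtied LSN that remains — can be sketched as a toy model. This is an illustration of the idea only, not InnoDB's or PostgreSQL's actual implementation; all names are assumed.

```python
def partial_checkpoint(dirty_pages, flush_budget):
    """Flush the pages with the oldest *first* dirtied LSNs, then return
    the new redo pointer and the still-dirty pages.

    dirty_pages: {page_id: (first_dirtied_lsn, current_lsn)}.  The redo
    pointer may only advance to the oldest first-dirtied LSN remaining,
    because a page dirtied at X and re-dirtied at Y > X must be flushed
    before the redo pointer passes X.
    """
    by_age = sorted(dirty_pages.items(), key=lambda kv: kv[1][0])
    remaining = dict(by_age[flush_budget:])  # pages we chose not to flush
    redo_ptr = min((first for first, _ in remaining.values()), default=None)
    return redo_ptr, remaining

pages = {"p1": (100, 250), "p2": (120, 130), "p3": (200, 300)}
redo, left = partial_checkpoint(pages, flush_budget=2)
print(redo, sorted(left))  # 200 ['p3']: p1 and p2 flushed, pointer at 200
```

Note that p1's *current* LSN (250) is newer than p3's first-dirtied LSN (200); sorting by current LSN alone would wrongly let the redo pointer pass 100 while p1 was still dirty, which is exactly the problem the first-dirtied LSN solves.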
Re: [PERFORM] Re: Postgres insert performance and storage requirement compared to Oracle
On Wed, Oct 27, 2010 at 6:13 AM, Ivan Voras ivo...@freebsd.org wrote: On 10/26/10 17:41, Merlin Moncure wrote: On Tue, Oct 26, 2010 at 11:08 AM, Leonardo Francalanci m_li...@yahoo.it wrote: temp tables are not wal logged or synced. Periodically they can be flushed to a permanent table. What do you mean with Periodically they can be flushed to a permanent table? Just doing insert into tabb select * from temptable yup, that's exactly what I mean -- this will give you more uniform In effect, when so much data is in temporary storage, a better option would be to simply configure synchronous_commit = off (better in the sense that the application would not need to be changed). The effects are almost the same - in both cases transactions might be lost but the database will survive. Gee, I wonder if it would be possible for PG to automatically do an asynchronous commit of any transaction which touches only temp tables. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Auto ANALYZE criteria
On Wed, Oct 13, 2010 at 5:20 PM, Joe Miller joe.d.mil...@gmail.com wrote: Thanks for fixing the docs, but if that's the case, I shouldn't be seeing the behavior that I'm seeing. Should I flesh out this test case a little better and file a bug? A reproducible test case is always a good thing to have... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] BBU Cache vs. spindles
On Fri, Oct 22, 2010 at 3:05 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Rob Wultsch wult...@gmail.com wrote: I would think full_page_writes=off + double write buffer should be far superior, particularly given that the WAL is shipped over the network to slaves. For a reasonably brief description of InnoDB double write buffers, I found this: http://www.mysqlperformanceblog.com/2006/08/04/innodb-double-write/ One big question before even considering this would be how to determine whether a potentially torn page is inconsistent. Without a page CRC or some such mechanism, I don't see how this technique is possible. There are two sides to this problem: figuring out when to write a page to the double write buffer, and figuring out when to read it back from the double write buffer. The first seems easy: we just do it whenever we would XLOG a full page image. As to the second, when we write the page out to the double write buffer, we could also write to the double write buffer the LSN of the WAL record which depends on that full page image. Then, at the start of recovery, we scan the double write buffer and remember all those LSNs. When we reach one of them, we replay the full page image. The good thing about this is that it would reduce WAL volume; the bad thing about it is that it would probably mean doing two fsyncs where we only now do one. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
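The two-sided scheme proposed above — store the full page image plus the LSN of the dependent WAL record in the double-write buffer, then at recovery scan the buffer and replay each image when its LSN is reached — can be sketched as a toy model. All names and structures here are assumptions for illustration, not any engine's real on-disk format.

```python
class DoubleWriteBuffer:
    """Toy double-write buffer: a list of (lsn, page_id, page_image)."""
    def __init__(self):
        self.entries = []

    def write(self, lsn, page_id, image):
        # Called whenever we would otherwise XLOG a full page image:
        # store the image together with the WAL LSN that depends on it.
        self.entries.append((lsn, page_id, image))

def recover(dwb, wal):
    """Replay WAL, restoring full page images at the recorded LSNs."""
    images = {lsn: (pid, img) for lsn, pid, img in dwb.entries}
    pages = {}
    for lsn, page_id, delta in wal:
        if lsn in images:               # reached a remembered LSN:
            pid, img = images[lsn]      # replay the full page image first,
            pages[pid] = img            # overwriting any torn on-disk page
        pages[page_id] = pages.get(page_id, "") + delta
    return pages

dwb = DoubleWriteBuffer()
dwb.write(10, "A", "base|")                 # full image taken at LSN 10
wal = [(10, "A", "+x"), (11, "A", "+y")]    # subsequent WAL deltas
print(recover(dwb, wal))  # {'A': 'base|+x+y'}
```

The page image lands in the double-write buffer instead of the WAL stream, which is the WAL-volume saving; the price is the extra fsync of the buffer noted at the end of the message.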
Re: [PERFORM] BBU Cache vs. spindles
On Tue, Oct 26, 2010 at 10:13 AM, Rob Wultsch wult...@gmail.com wrote: The double write buffer is one of the few areas where InnoDB does more IO (in the form of fsynch's) than PG. InnoDB also has fuzzy checkpoints (which help to keep dirty pages in memory longer), buffering of writing out changes to secondary indexes, and recently tunable page level compression. Baron Schwartz was talking to me about this at Surge. I don't really understand how the fuzzy checkpoint stuff works, and I haven't been able to find a good description of it anywhere. How does it keep dirty pages in memory longer? Details on the other things you mention would be interesting to hear, too. Given that InnoDB is not shipping its logs across the wire, I don't think many users would really care if it used the double writer or full page writes approach to the redo log (other than the fact that the log files would be bigger). PG on the other hand *is* pushing its logs over the wire... So how is InnoDB doing replication? Is there a second log just for that? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Select count(*), the sequel
On Sat, Oct 16, 2010 at 2:44 PM, Kenneth Marshall k...@rice.edu wrote: Interesting data points. The amount of rows that you managed to insert into PostgreSQL before Oracle gave up the ghost is 95% of the rows in the Oracle version of the database. To count 5% fewer rows, it took PostgreSQL 24 seconds longer. Or adjusting for the missing rows, 52 seconds longer for the entire table or 18% longer than the full table scan in Oracle. This seems to be well within the table layout size differences, possibly due to the fillfactor used -- not really bad at all. I don't think this is due to fillfactor - the default fillfactor is 100, and anyway we ARE larger on disk than Oracle. We really need to do something about that, and the changes to NUMERIC in 9.1 are a step in that direction, but I think a lot more work is needed. I think it would be really helpful if we could try to quantify where the extra space is going. Some places to look: - Bloated representations of individual datatypes. (I know that even the new NUMERIC format is larger than Oracle's NUMBER.) - Excessive per-tuple overhead. Ours is 24 bytes, plus the item pointer. - Alignment requirements. We have a fair number of datatypes that require 4 or 8 byte alignment. How much is that hurting us? - Compression. Maybe Oracle's algorithm does better than PGLZ. If we can quantify where we're losing vs. Oracle - or any other competitor - that might give us some idea where to start looking. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
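The per-tuple overhead in the list above can be put into numbers with a rough sketch of heap math: a 24-byte tuple header, a 4-byte item pointer, and padding to an 8-byte alignment boundary. This ignores page headers, fill, and TOAST, so it is a lower bound on real on-disk cost.

```python
def heap_bytes_per_row(payload_bytes, align=8, header=24, item_pointer=4):
    """Rough on-disk cost of one heap tuple: the header plus payload,
    rounded up to the alignment boundary, plus the page's item pointer."""
    padded = -(-(header + payload_bytes) // align) * align  # ceil to align
    return padded + item_pointer

# A row with 20 bytes of user data costs more than twice that on disk:
print(heap_bytes_per_row(20))  # 52: 48 aligned bytes + 4-byte item pointer
```

For narrow rows the fixed overhead dominates, which is why per-tuple overhead and alignment are the first places to look when comparing on-disk sizes against other systems.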
Re: [PERFORM] HashJoin order, hash the large or small table? Postgres likes to hash the big one, why?
On Mon, Oct 18, 2010 at 9:40 PM, Scott Carey sc...@richrelevance.com wrote: 8.4.5 I consistently see HashJoin plans that hash the large table, and scan the small table. This is especially puzzling in some cases where I have 30M rows in the big table and ~ 100 in the small... shouldn't it hash the small table and scan the big one? Here is one case I saw just recently

Hash Cond: ((a.e_id)::text = (ta.name)::text)
  -> Index Scan using c_a_s_e_id on a (cost=0.00..8.21 rows=14 width=27)
       Index Cond: (id = 12)
  -> Hash (cost=89126.79..89126.79 rows=4825695 width=74)
       -> Seq Scan on p_a_1287446030 tmp (cost=0.00..89126.79 rows=4825695 width=74)
            Filter: (id = 12)

Can we have the complete EXPLAIN output here, please? And the query? For example, this would be perfectly sensible if the previous line started with Hash Semi Join or Hash Anti Join.

rhaas=# explain select * from little where exists (select * from big where big.a = little.a);
                            QUERY PLAN
------------------------------------------------------------------
 Hash Semi Join  (cost=3084.00..3478.30 rows=10 width=4)
   Hash Cond: (little.a = big.a)
   ->  Seq Scan on little  (cost=0.00..1.10 rows=10 width=4)
   ->  Hash  (cost=1443.00..1443.00 rows=100000 width=4)
         ->  Seq Scan on big  (cost=0.00..1443.00 rows=100000 width=4)
(5 rows)

I'm also a bit suspicious of the fact that the hash condition has a cast to text on both sides, which implies, to me anyway, that the underlying data types are not text. 
That might mean that the query planner doesn't have very good statistics, which might mean that the join selectivity estimates are wackadoo, which can apparently cause this problem:

rhaas=# explain select * from little, big where little.a = big.a;
                            QUERY PLAN
------------------------------------------------------------------
 Hash Join  (cost=3084.00..3577.00 rows=2400 width=8)
   Hash Cond: (little.a = big.a)
   ->  Seq Scan on little  (cost=0.00..34.00 rows=2400 width=4)
   ->  Hash  (cost=1443.00..1443.00 rows=100000 width=4)
         ->  Seq Scan on big  (cost=0.00..1443.00 rows=100000 width=4)
(5 rows)

rhaas=# analyze;
ANALYZE
rhaas=# explain select * from little, big where little.a = big.a;
                            QUERY PLAN
------------------------------------------------------------------
 Hash Join  (cost=1.23..1819.32 rows=10 width=8)
   Hash Cond: (big.a = little.a)
   ->  Seq Scan on big  (cost=0.00..1443.00 rows=100000 width=4)
   ->  Hash  (cost=1.10..1.10 rows=10 width=4)
         ->  Seq Scan on little  (cost=0.00..1.10 rows=10 width=4)
(5 rows)

This doesn't appear to make a lot of sense, but... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] odd postgresql performance (excessive lseek)
On Tue, Oct 19, 2010 at 10:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: Jon Nelson jnelson+pg...@jamponi.net writes: This is another situation where using pread would have saved a lot of time and sped things up a bit, but failing that, keeping track of the file position ourselves and only lseek'ing when necessary would also help. No, it wouldn't; you don't have the slightest idea what's going on there. Those lseeks are for the purpose of detecting the current EOF location, ie, finding out whether some other backend has extended the file recently. We could get rid of them, but only at the cost of putting in some other communication mechanism instead. I don't get it. Why would be doing that in a tight loop within a single backend? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
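The lseek(SEEK_END) pattern Tom describes — asking the kernel for the current end of file so that extensions made by another process (in PostgreSQL, another backend) are visible — is easy to demonstrate with a scratch file. A minimal sketch:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 100)
size_before = os.lseek(fd, 0, os.SEEK_END)   # kernel reports EOF at 100

# A second descriptor stands in for another backend extending the file.
fd2 = os.open(path, os.O_WRONLY | os.O_APPEND)
os.write(fd2, b"y" * 50)
os.close(fd2)

# A fresh lseek on the original descriptor sees the new EOF; a file
# position cached in userspace would still say 100.  This visibility is
# what the repeated lseeks buy, absent some other signaling mechanism.
size_after = os.lseek(fd, 0, os.SEEK_END)
print(size_before, size_after)  # 100 150

os.close(fd)
os.unlink(path)
```

This is also why simply caching the position in the backend, as the original poster suggested, is not a drop-in replacement: the cache goes stale the moment any other backend extends the relation.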
Re: [PERFORM] Slow count(*) again...
On Wed, Oct 13, 2010 at 1:59 PM, Jesper Krogh jes...@krogh.cc wrote: On 2010-10-13 15:28, Robert Haas wrote: On Wed, Oct 13, 2010 at 6:16 AM, Neil Whelchel neil.whelc...@gmail.com wrote: I might go as far as to rattle the cage of the developers to see if it makes any sense to add some column oriented storage capability to Postgres. That would be the hot ticket to be able to specify an attribute on a column so that the back end could shadow or store a column in a column oriented table so aggregate functions could work on them with good efficiency, or is that an INDEX? I'd love to work on that, but without funding it's tough to find the time. It's a big project. Is it hugely different from just getting the visibility map suitable for doing index-only scans and extracting values from the index directly as Heikki has explained? I think that there's a lot more to a real column-oriented database than index-only scans, although, of course, index-only scans are very important. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow count(*) again...
On Thu, Oct 14, 2010 at 12:22 AM, mark dvlh...@gmail.com wrote: Could this be an interesting test use of https://www.fossexperts.com/ ? 'Community' driven proposal - multiple people / orgs agree to pay various portions? Maybe with multiple funders a reasonable target fund amount could be reached. Just throwing around ideas here. This is a bit off-topic, but as of now, they're only accepting proposals for projects to be performed by CommandPrompt itself. So that doesn't help me much (note the sig). But in theory it's a good idea. Of course, when and if they open it up, then what? If more than one developer or company is interested in a project, who determines who gets to do the work and get paid for it? If that determination is made by CommandPrompt itself, or if it's just a free-for-all to see who can get their name on the patch that ends up being committed, it's going to be hard to get other people/companies to take it very seriously. Another problem is that even when they do open it up, they apparently intend to charge 7.5 - 15% of the contract value as a finder's fee. That's a lot of money. For a $100 project it's totally reasonable, but for a $10,000 project it's far more expensive than the value of the service they're providing can justify. (Let's not even talk about a $100,000 project.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How does PG know if data is in memory?
On Tue, Oct 12, 2010 at 10:35 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: (1) Heavily used data could be kept fully cached in RAM and not driven out by transient activity. We've attempted to address this problem by adding logic to prevent the buffer cache from being trashed by vacuums, bulk loads, and sequential scans. It would be interesting to know if anyone has examples of that logic falling over or proving inadequate. (2) You could flag a cache used for (1) above as using relaxed LRU accounting -- it saved a lot of time tracking repeated references, leaving more CPU for other purposes. We never do strict LRU accounting. (3) Each named cache had its own separate set of locks, reducing contention. We have lock partitions, but as discussed recently on -hackers, they seem to start falling over around 26 cores. We probably need to improve that, but I'd rather do that by making the locking more efficient and by increasing the number of partitions rather than by allowing users to partition the buffer pool by hand. (4) Large tables for which the heap was often scanned in its entirety or for a range on the clustered index could be put in a relatively small cache with large I/O buffers. This avoided blowing out the default cache space for situations which almost always required disk I/O anyway. I think, but am not quite sure, that my answer to point #1 is also relevant here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow count(*) again...
On Tue, Oct 12, 2010 at 1:07 PM, Tom Lane t...@sss.pgh.pa.us wrote: Anyway, if anyone is hot to make COUNT(*) faster, that's where to look. I don't think any of the previous discussion in this thread is on-point at all, except for the parts where people suggested avoiding it. I kind of hope that index-only scans help with this, too. If you have a wide table and a narrow (but not partial) index, and if the visibility map bits are mostly set, it ought to be cheaper to read the index than the table - certainly in the case where any disk I/O is involved, and maybe even if it isn't. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow count(*) again...
On Wed, Oct 13, 2010 at 4:38 AM, Neil Whelchel neil.whelc...@gmail.com wrote: There seems to be a lot of discussion about VACUUM FULL, and its problems. The overall buzz seems to be that VACUUM FULL is a bad idea (I could be wrong here). It has been some time since I have read the changelogs, but I seem to remember that there have been some major changes to VACUUM FULL recently. Maybe this needs to be re-visited in the documentation. In 9.0, VACUUM FULL does something similar to what CLUSTER does. This is a much better idea than what it did in 8.4 and prior.

crash:~# time psql -U test test -c "VACUUM FULL log;"
VACUUM

real    4m49.055s
user    0m0.000s
sys     0m0.000s

crash:~# time psql -U test test -c "SELECT count(*) FROM log;"
  count
----------
 10050886
(1 row)

real    0m9.665s
user    0m0.000s
sys     0m0.004s

A huge improvement from the minute and a half before the VACUUM FULL. This is a very surprising result that I would like to understand better. Let's assume that your UPDATE statement bloated the table by 2x (you could use pg_relation_size to find out exactly; the details probably depend on fillfactor, which you might want to lower if you're going to do lots of updates). That ought to mean that count(*) has to grovel through twice as much data, so instead of taking 9 seconds it ought to take 18 seconds. Where the heck is the other 1:12 going? This might sort of make sense if the original table was laid out sequentially on disk and the updated table was not, but how and why would that happen? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
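[The pg_relation_size check suggested above can be sketched as follows; the table name log comes from the OP's transcript, and exact sizes will of course vary:]

```sql
-- Measure the heap to see how much the UPDATE bloated the table.
-- pg_total_relation_size additionally counts indexes and TOAST data.
SELECT pg_size_pretty(pg_relation_size('log'))       AS heap_size,
       pg_size_pretty(pg_total_relation_size('log')) AS total_size;

-- If the table will be updated heavily, a lower fillfactor leaves free
-- space on each page for new row versions, which limits future bloat:
ALTER TABLE log SET (fillfactor = 70);
```

Note that fillfactor only affects newly written pages, so it is most useful when set before loading data or followed by a VACUUM FULL or CLUSTER to rewrite the table.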
Re: [PERFORM] Slow count(*) again...
On Wed, Oct 13, 2010 at 6:16 AM, Neil Whelchel neil.whelc...@gmail.com wrote: I might go as far as to rattle the cage of the developers to see if it makes any sense to add some column oriented storage capability to Postgres. That would be the hot ticket to be able to specify an attribute on a column so that the back end could shadow or store a column in a column oriented table so aggregate functions could work on them with good efficiency, or is that an INDEX? I'd love to work on that, but without funding it's tough to find the time. It's a big project. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How does PG know if data is in memory?
On Mon, Oct 11, 2010 at 11:11 PM, gnuo...@rcn.com wrote: An approach that works can be found in DB2, and likely elsewhere. The key is that tablespaces/tables/indexes/buffers are all attached through the bufferpool (the DB2 term). A tablespace/bufferpool match is defined. Then tables and indexes are assigned to the tablespace (and implicitly, the bufferpool). As a result, one can effectively pin data in memory. This is very useful, but not low hanging fruit to implement. The introduction of rudimentary tablespaces is a first step. I assumed that the point was to get to a DB2-like structure at some point. Yes? We already have tablespaces, and our data already is accessed through the buffer pool. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Runtime dependency from size of a bytea field
On Fri, Oct 8, 2010 at 12:53 AM, Sander, Ingo (NSN - DE/Munich) ingo.san...@nsn.com wrote: The difference to my test is that we use the ODBC interface in our C program. Could it be that the difference in the runtimes is caused by the ODBC? I've heard tell that ODBC is substantially slower than a native libpq connection, but I don't know that for a fact, not being an ODBC user. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] gist indexes for distance calculations
On Fri, Oct 1, 2010 at 1:56 AM, Jesper Krogh jes...@krogh.cc wrote: On 2010-09-30 20:33, Marcelo Zabani wrote: If you can also pinpoint me to where I can find this sort of information (index utilization and planning, performance tuning), I'd be very grateful. Thank you already, Isn't this what the knngist patches are for? https://commitfest.postgresql.org/action/patch_view?id=350 http://www.sai.msu.su/~megera/wiki/knngist Those are for when you want to order by distance; the OP is trying to *filter* by distance, which is different. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How does PG know if data is in memory?
On Mon, Oct 4, 2010 at 6:47 PM, Jeremy Harris j...@wizmail.org wrote: On 10/04/2010 04:22 AM, Greg Smith wrote: I had a brain-storming session on this subject with a few of the hackers in the community in this area a while back that I haven't had a chance to do something with yet (it exists only as a pile of scribbled notes so far). There are a couple of ways to collect data on what's in the database and OS cache, and a couple of ways to then expose that data to the optimizer. But that needs to be done very carefully, almost certainly as only a manual process at first, because something that's producing cache feedback all of the time will cause plans to change all the time, too. Where I suspect this is going is that we may end up tracking various statistics over time, then periodically providing a way to export a mass of typical "% cached" data back to the optimizer for use in plan cost estimation purposes. But the idea of monitoring continuously and always planning based on the most recent data available has some stability issues, both from a "too many unpredictable plan changes" and a "bad short-term feedback loop" perspective, as mentioned by Tom and Kevin already. Why not monitor the distribution of response times, rather than cached vs. not? That a) avoids the issue of discovering what was a cache hit b) deals neatly with multilevel caching c) feeds directly into cost estimation. I was hot on doing better cache modeling a year or two ago, but the elephant in the room is that it's unclear that it solves any real-world problem. The OP is clearly having a problem, but there's not enough information in his post to say what is actually causing it, and it's probably not caching effects. We get occasional complaints of the form "the first time I run this query it's slow, and then after that it's fast" but, as Craig Ringer pointed out upthread, not too many. And even with respect to the complaints we do get, it's far from clear that the cure is any better than the disease. 
Taking caching effects into account could easily result in the first execution being slightly less slow and all of the subsequent executions being moderately slow. That would not be an improvement for most people. The reports that seem really painful to me are the ones where people with really big machines complain of needing HOURS for the cache to warm up, and having the system bogged down to a standstill until then. But changing the cost model isn't going to help them either. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [HACKERS] MIT benchmarks pgsql multicore (up to 48)performance
On Wed, Oct 6, 2010 at 10:07 PM, Stephen Frost sfr...@snowman.net wrote: * Robert Haas (robertmh...@gmail.com) wrote: It's good to be you. They're HP BL465 G7's w/ 2x 12-core AMD processors and 48G of RAM. Unfortunately, they currently only have local storage, but it seems unlikely that would be an issue for this. I don't suppose you could try to replicate the lseek() contention? I can give it a shot, but the impression I had from the paper is that the lseek() contention wouldn't be seen without the changes to the lock manager...? Or did I misunderstand? rereads appropriate section of paper Looks like the lock manager problems hit at 28 cores, and the lseek problems at 36 cores. So your system might not even be big enough to manifest either problem. It's unclear to me whether a 48-core system would be able to see the lseek issues without improvements to the lock manager, but perhaps it would be possible by, say, increasing the number of lock partitions by 8x. It would be nice to segregate these issues though, because using pread/pwrite is probably a lot less work than rewriting our lock manager. Do you have tools to measure the lseek overhead? If so, we could prepare a patch to use pread()/pwrite() and just see whether that reduced the overhead, without worrying so much about whether it was actually a major bottleneck. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [HACKERS] MIT benchmarks pgsql multicore (up to 48)performance
On Thu, Oct 7, 2010 at 1:21 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Robert Haas robertmh...@gmail.com wrote: perhaps it would be possible by, say, increasing the number of lock partitions by 8x. It would be nice to segregate these issues though, because using pread/pwrite is probably a lot less work than rewriting our lock manager. You mean easier than changing this 4 to a 7?:

#define LOG2_NUM_LOCK_PARTITIONS  4

Or am I missing something? Right. They did something more complicated (and, I think, better) than that, but that change by itself might be enough to ameliorate the lock contention enough to see the lseek() issue. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Odd behaviour with redundant CREATE statement
On Mon, Sep 27, 2010 at 3:27 PM, Gurjeet Singh singh.gurj...@gmail.com wrote: On Mon, Sep 27, 2010 at 8:50 PM, Dave Crooke dcro...@gmail.com wrote: Our Java application manages its own schema. Some of this is from Hibernate, but some is hand-crafted JDBC. By way of an upgrade path, we have a few places where we have added additional indexes to optimize performance, and so at startup time the application issues CREATE INDEX ... statements for these, expecting to catch the harmless exception "ERROR: relation date_index already exists", as a simpler alternative to using the meta-data to check for it first. In general, this seems to work fine, but we have one installation where we observed one of these CREATE statements hanging up in the database, as if waiting for a lock, thus stalling the app startup. You can tell if it is really waiting by looking at 'select * from pg_locks', and checking the 'granted' column. CREATE INDEX (without CONCURRENTLY) tries to acquire a share-lock on the table, which will conflict with any concurrent INSERT, UPDATE, DELETE, or VACUUM. It probably tries to acquire the lock before noticing that the index is a duplicate. CREATE INDEX CONCURRENTLY might be an option, or you could write and call a PL/pgsql function (or, in 9.0, use a DO block) to test for the existence of the index before trying to create it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
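[A minimal sketch of the 9.0 DO-block approach described above; only the index name date_index comes from the OP's error message, while the table and column names are hypothetical stand-ins:]

```sql
DO $$
BEGIN
    IF NOT EXISTS (
        SELECT 1
          FROM pg_class c
          JOIN pg_namespace n ON n.oid = c.relnamespace
         WHERE c.relkind = 'i'              -- indexes only
           AND c.relname = 'date_index'
           AND n.nspname = 'public'
    ) THEN
        -- Reached only when the index is absent, so there is no duplicate-name
        -- error and, more importantly, no share-lock wait on the table
        -- when the index already exists.
        EXECUTE 'CREATE INDEX date_index ON some_table (some_date_col)';
    END IF;
END
$$;
```

On 8.x, which lacks DO, the same body can be wrapped in a PL/pgsql function and called from JDBC at startup.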
Re: [PERFORM] [HACKERS] MIT benchmarks pgsql multicore (up to 48)performance
On Wed, Oct 6, 2010 at 6:31 PM, Ivan Voras ivo...@freebsd.org wrote: On 10/04/10 20:49, Josh Berkus wrote: The other major bottleneck they ran into was a kernel one: reading from the heap file requires a couple lseek operations, and Linux acquires a mutex on the inode to do that. The proper place to fix this is certainly in the kernel but it may be possible to work around in Postgres. Or we could complain to Kernel.org. They've been fairly responsive in the past. Too bad this didn't get posted earlier; I just got back from LinuxCon. So you know someone who can speak technically to this issue? I can put them in touch with the Linux geeks in charge of that part of the kernel code. Hmmm... lseek? As in lseek() then read() or write() idiom? It AFAIK cannot be fixed since you're modifying the global stream position variable and something has got to lock that. Well, there are lock free algorithms using CAS, no? OTOH, pread() / pwrite() don't have to do that. Hey, I didn't know about those. That sounds like it might be worth investigating, though I confess I lack a 48-core machine on which to measure the alleged benefit. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] [HACKERS] MIT benchmarks pgsql multicore (up to 48)performance
On Wed, Oct 6, 2010 at 9:30 PM, Stephen Frost sfr...@snowman.net wrote: * Robert Haas (robertmh...@gmail.com) wrote: Hey, I didn't know about those. That sounds like it might be worth investigating, though I confess I lack a 48-core machine on which to measure the alleged benefit. I've got a couple 24-core systems, if it'd be sufficiently useful to test with.. It's good to be you. I don't suppose you could try to replicate the lseek() contention? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query much faster with enable_seqscan=0
On Wed, Sep 22, 2010 at 9:36 AM, Ogden li...@darkstatic.com wrote: On Sep 21, 2010, at 2:34 PM, Ogden wrote: On Sep 21, 2010, at 2:16 PM, Greg Smith wrote: Joshua D. Drake wrote: PostgreSQL's defaults are based on extremely small and some would say (non production) size databases. As a matter of course I always recommend bringing seq_page_cost and random_page_cost more in line. Also, they presume that not all of your data is going to be in memory, and the query optimizer needs to be careful about what it does and doesn't pull from disk. If that's not the case, like here where there's 8GB of RAM and a 7GB database, dramatic reductions to both seq_page_cost and random_page_cost can make sense. Don't be afraid to think lowering below 1.0 is going too far--something more like 0.01 for sequential and 0.02 for random may actually reflect reality here. I have done just that, per your recommendations, and now what took 14 seconds only takes less than a second, so it was certainly these figures I messed around with. I have set:

seq_page_cost = 0.01
random_page_cost = 0.02
cpu_tuple_cost = 0.01

Everything seems to run faster now. I think this should be fine - I'll keep an eye on things over the next few days. I truly appreciate everyone's help. Ogden I spoke too soon - well I came in this morning and reran the query that was sped up yesterday by a lot after tweaking those numbers. This morning the first time I ran it, it took 16 seconds whereas every subsequent run was a matter of 2 seconds. I assume there is OS caching going on for those results. Is this normal, or could it also be the speed of my disks which is causing a lag when I first run it (it's RAID 5 across 6 disks)? Is there any explanation for this, or what should those settings really be? Perhaps 0.01 is too low? Yeah, I think those numbers are a bit low. Your database probably isn't fully cached. 
Keep in mind there's going to be some fluctuation as to what is and is not in cache, and you can't expect whatever plan the planner picks to be exactly perfect for both cases. I might try something more like 0.2 / 0.1. If you really need the query to be fast, though, you might need to do more than jigger the page costs. Did you try Tom's suggested rewrite? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
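[One safe way to experiment with the values Robert suggests is to change them only in the current session and compare plans; this is a sketch in which the SELECT is a stand-in for the slow query, and the seq/random split of "0.2 / 0.1" is a judgment call for the hardware at hand:]

```sql
-- Trial values for this session only; server-wide defaults are untouched.
SET seq_page_cost = 0.1;
SET random_page_cost = 0.2;

EXPLAIN ANALYZE SELECT 1;  -- substitute the problem query here

-- Put the session back to the configured defaults:
RESET seq_page_cost;
RESET random_page_cost;
```

Only once the per-session experiment shows a consistently better plan is it worth making the change permanent in postgresql.conf.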
Re: [PERFORM] how to enforce index sub-select over filter+seqscan
On Thu, Sep 23, 2010 at 10:26 AM, Dmitry Teslenko dtesle...@gmail.com wrote: I know I can set enable_seqscan = off. Is there other ways to enforce index usage? Not really, but I suspect random_page_cost and seq_page_cost might help the planner make better decisions. Is your data by any chance mostly cached in memory? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] turn off caching for performance test
On Fri, Aug 27, 2010 at 1:57 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Thu, Aug 26, 2010 at 4:32 AM, Willy-Bas Loos willy...@gmail.com wrote: Hi, I have a colleague that is convinced that the website is faster if enable_seqscan is turned OFF. I'm convinced of the opposite (better to leave it ON), but i would like to show it, prove it to him. Stop, you're both doing it wrong. The issue isn't whether or not turning off seq scans will make a few things faster here and there, it's why is the query planner choosing sequential scans when it should be choosing index scans. So, what are your non-default settings in postgresql.conf? Have you increased effective_cache_size yet? Lowered random_page_cost? Raised default stats target and re-analyzed? Have you been looking at the problem queries with explain analyze? What does it have to say about the planners choices? [a bit behind on my email] This was exactly my thought on first reading this post. If the indexes are faster and PG thinks they are slower, it's a good bet that there are some parameters that need tuning. Specifically, effective_cache_size may be too low, and random_page_cost and seq_page_cost are almost certainly too high. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Is disableing nested_loops a bad idea ?
On Thu, Sep 16, 2010 at 10:13 AM, Franck Routier franck.rout...@axege.com wrote: Thanks Kevin and Samuel for your input. The point is we already made a lot of tweaking to try to tune postgresql to behave correctly. I work with Damien, and here is a post he did in July to explain the kind of problems we have http://comments.gmane.org/gmane.comp.db.postgresql.performance/25745 The end of the thread was Robert Hass concluding that Disabling nestloops altogether, even for one particular query, is often going to be a sledgehammer where you need a scalpel. But then again, a sledgehammer is better than no hammer. So I wanted to better understand to what extent using a sledgehammer will impact me :-) One particular case where you may get a nasty surprise is:

Nested Loop
  -> Whatever
  -> Index Scan

This isn't necessarily terrible if the would-be index scan is on a small table, because a hash join may not be too bad. It may not be too good, either, but if the would-be index scan is on a large table the whole thing might turn into a merge join. That can get pretty ugly. Of course in some cases the planner may be able to rejigger the whole plan in some way that mitigates the damage, but not necessarily. One of the things I've noticed about our planner is that it becomes less predictable in stressful situations. As you increase the number of tables involved in join planning, for example, the query planner still delivers a lot of very good plans, but not quite as predictably. Things don't slow down uniformly across the board; instead, most of the plans remain pretty good but every once in a while (and with increasing frequency as you keep cranking up the table count) you get a bad one. Shutting off any of the enable_* constants will, I think, produce a similar effect. 
Many queries can be adequately handled using some other technique and you won't really notice it, but you may find that you have a few (or someone will eventually write one) which *really* need whatever technique you turned off for decent performance. At that point you don't have a lot of options... Incidentally, it's Haas, rather than Hass. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Using Between
On Tue, Sep 21, 2010 at 4:04 PM, Ozer, Pam po...@automotive.com wrote: There are 850,000 records in vehicleused. And the database is too big to be kept in memory. Ah. So in other words, you are retrieving about half the rows in that table. For those kinds of queries, using the index tends to actually be slower, because (1) you read the index in addition to reading the table, which has CPU and I/O cost, and (2) instead of reading the table sequentially, you end up jumping around and reading it out of order, which tends to result in more disk seeks and defeats the OS prefetch logic. The query planner is usually pretty smart about making good decisions about this kind of thing. As a debugging aid (but never in production), you can try disabling enable_seqscan and see what plan you get that way. If it's slower, well then the query planner did the right thing. If it's faster, then probably you need to adjust seq_page_cost and random_page_cost a bit. But my guess is that it will be somewhere between a lot slower and only very slightly faster. A whole different line of inquiry is to ask the more general question "how can I make this query faster?", but I'm not sure whether you're unhappy with how the query is running or just curious about why the index isn't being used. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
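[The debugging aid described above - never in production, as noted - can be sketched like this; the query is a stand-in built from the filter seen elsewhere in the thread:]

```sql
BEGIN;
-- Force the planner away from sequential scans for this transaction only:
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE
SELECT count(*)
  FROM vehicleused
 WHERE vehicleusedpricerangefloor BETWEEN 0 AND 15000;

ROLLBACK;  -- SET LOCAL settings revert when the transaction ends
```

If the forced index plan comes out slower than the original, the planner's choice of a sequential scan was the right one.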
Re: [PERFORM] Using Between
On Wed, Sep 22, 2010 at 11:18 AM, Ozer, Pam po...@automotive.com wrote: The question is how can we make it faster. If there's just one region ID for any given postal code, you might try adding a column to vehicleused and storing the region IDs there. You could possibly populate that column using a trigger; probably it doesn't change unless the postalcode changes. Then you could index that column and query against it directly, rather than joining to PostalCodeRegionCountyCity. Short of that, I don't see any obvious way to avoid reading most of the vehicleused table. There may or may not be an index that can speed that up slightly and of course you can always throw hardware at the problem, but fundamentally reading half a million or more rows isn't going to be instantaneous. Incidentally, it would probably simplify things to store postal codes in the same case throughout the system. If you can avoid the need to write lower(x) = lower(y) and just write x = y you may get better plans. I'm not sure that's the case in this particular example but it's something to think about. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
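[The trigger-based denormalization suggested above might look roughly like this. It is a sketch: the postalcode column on vehicleused is assumed to exist, and it stores the region id on the stated assumption that each postal code maps to exactly one region:]

```sql
ALTER TABLE vehicleused ADD COLUMN regionid integer;

CREATE OR REPLACE FUNCTION vehicleused_set_region() RETURNS trigger AS $$
BEGIN
    -- Look the region up once at write time instead of joining at query time.
    -- lower() on both sides guards against mixed-case postal codes, per the
    -- caveat in the message above.
    SELECT p.regionid INTO NEW.regionid
      FROM postalcoderegioncountycity p
     WHERE lower(p.postalcode) = lower(NEW.postalcode);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER vehicleused_region
    BEFORE INSERT OR UPDATE ON vehicleused
    FOR EACH ROW EXECUTE PROCEDURE vehicleused_set_region();

CREATE INDEX vehicleused_regionid_idx ON vehicleused (regionid);
```

With the column populated and indexed, the query can filter with WHERE regionid = 26 directly and skip the join entirely.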
Re: [PERFORM] Useless sort by
On Mon, Sep 13, 2010 at 1:09 PM, Gaetano Mendola mend...@gmail.com wrote: I see your point, but some functions, like unique and count, are not affected by the order of the values fed to them, and I don't think unique has to give out the unique values in the same order they were fed, either. Gee, I'd sure expect it to. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Useless sort by
On Wed, Sep 22, 2010 at 11:05 PM, gnuo...@rcn.com wrote: Spoken like a dyed in the wool COBOL coder. The RM has no need for order; it's set based. I've dabbled in PG for some time, and my sense is increasingly that PG developers are truly code oriented, not database (set) oriented. I'm struggling to think of an adequate response to this. I think I'm going to go with: huh? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] GPU Accelerated Sorting
On Mon, Aug 30, 2010 at 9:46 AM, Eliot Gable ega...@gmail.com wrote: Not sure if anyone else saw this, but it struck me as an interesting idea if it could be added to PostgreSQL. GPU accelerated database operations could be very... interesting. Of course, this could be difficult to do in a way that usefully increases performance of PostgreSQL, but I'll leave that up to you guys to figure out. http://code.google.com/p/back40computing/wiki/RadixSorting It would be hard to use this because, in addition to the fact that this is specific to a very particular type of hardware, it only works if you're trying to do a very particular type of sort. For example, it wouldn't handle multi-byte characters properly. And it wouldn't handle integers properly either - you'd end up sorting negatives after positives. You could possibly still find applications for it but they'd be quite narrow, I think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Using Between
, vehicleused.vehicleusedmileagerangefloor, vehicleused.hasvehicleusedmileage, vehicleused.VehicleUsedIntId.distinct_count, vehicleused.VehicleUsedPrice.average, vehicleused.VehicleUsedId.distinct_count, vehicleused.iscpo, vehicleused.ismtca, vehicleused.cpoprogramoemid, vehicleused.cpoprogram3rdpartyid
        Filter: ((vehicleusedpricerangefloor >= 0) AND (vehicleusedpricerangefloor <= 15000))
  -> Hash (cost=283.32..283.32 rows=522 width=6) (actual time=1.084..1.084 rows=532 loops=1)
        Output: postalcoderegioncountycity.postalcode
        -> Bitmap Heap Scan on postalcoderegioncountycity (cost=12.30..283.32 rows=522 width=6) (actual time=0.092..0.361 rows=532 loops=1)
              Output: postalcoderegioncountycity.postalcode
              Recheck Cond: (regionid = 26)
              -> Bitmap Index Scan on postalcoderegioncountycity_i05 (cost=0.00..12.17 rows=522 width=0) (actual time=0.082..0.082 rows=532 loops=1)
                    Index Cond: (regionid = 26)
Total runtime: 1945.244 ms

How many rows are in the vehicleused table in total? Is your database small enough to fit in memory? Do you have any non-default settings in postgresql.conf? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] slow DDL creation
On Tue, Aug 31, 2010 at 11:35 AM, Kevin Kempter cs_...@consistentstate.com wrote: On Monday 30 August 2010 17:04, bricklen wrote: On Mon, Aug 30, 2010 at 3:28 PM, Kevin Kempter cs_...@consistentstate.com wrote: Hi all ; we have an automated partition creation process that includes the creation of an FK constraint. we have a few other servers with similar scenarios and this is the only server that stinks perf-wise when we create the new partitions. Anyone have any thoughts on how to debug this? we're running postgres 8.4.4 on CentOS 5.5 Thanks in advance Is the referenced column indexed? no, but its for a new partition so there's no data as of yet in the partition What exactly does stinks mean? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: Fwd: [PERFORM] Vacuum Full + Cluster + Vacuum full = non removable dead rows
On Sat, Aug 21, 2010 at 9:49 AM, Alexandre de Arruda Paes adald...@gmail.com wrote: Only for discussion: the CLUSTER command, to my limited knowledge, is an intrusive command that cannot recover the dead tuples either. Only TRUNCATE can do this job, but obviously it is not applicable all the time. Either VACUUM or CLUSTER will recover *dead* tuples. What you can't recover are tuples that are still visible to some running transaction. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
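The distinction is easy to check in practice: any long-open transaction pins the oldest XID the system considers visible, and neither VACUUM nor CLUSTER can remove tuples deleted after that horizon. A hedged sketch of where to look, using the pre-9.2 pg_stat_activity column names (procpid, current_query); output is illustrative:

```sql
-- Long-open transactions that can hold back dead-tuple removal:
SELECT procpid, xact_start, current_query
  FROM pg_stat_activity
 WHERE xact_start IS NOT NULL
 ORDER BY xact_start
 LIMIT 5;

-- Forgotten two-phase transactions have the same effect:
SELECT * FROM pg_prepared_xacts;
```

If either query shows something days old, killing or committing it is what actually lets the next VACUUM reclaim the "non removable" rows.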
Re: [PERFORM] Two fast searches turn slow when used with OR clause
On Thu, Aug 5, 2010 at 2:34 PM, Craig James craig_ja...@emolecules.com wrote:

explain analyze
select p.price, p.amount, p.units, s.catalogue_id, vn.version_id
from plus p
join sample s on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
join version vn on (s.version_id = vn.version_id)
join parent pn on (s.parent_id = pn.parent_id)
where vn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
   or pn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
order by price;

Well, you can't evaluate the WHERE clause here until you've joined {s vn pn}.

If I only query the VERSION table, it's very fast:

explain analyze
select p.price, p.amount, p.units, s.catalogue_id, vn.version_id
from plus p
join sample s on (p.compound_id = s.compound_id and p.supplier_id = s.supplier_id)
join version vn on (s.version_id = vn.version_id)
where vn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
order by price;

But here you can push the WHERE clause all the way down to the vn table, and evaluate it right at the get-go, which is pretty much exactly what is happening. In the first case, you have to join all 297,306 vn rows against s, because they could be interesting if the other half of the WHERE clause turns out to hold. In the second case, you can throw away 297,305 of those 297,306 rows before doing anything else, because there's no possibility that they can ever be interesting. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
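A standard workaround for this class of problem, not proposed in the thread but worth noting for the archives, is to split the OR across a UNION so that each branch carries a restriction the planner can push down; a sketch against the tables above (untested against the poster's schema):

```sql
SELECT p.price, p.amount, p.units, s.catalogue_id, vn.version_id
FROM plus p
JOIN sample s ON (p.compound_id = s.compound_id AND p.supplier_id = s.supplier_id)
JOIN version vn ON (s.version_id = vn.version_id)
WHERE vn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
UNION
SELECT p.price, p.amount, p.units, s.catalogue_id, vn.version_id
FROM plus p
JOIN sample s ON (p.compound_id = s.compound_id AND p.supplier_id = s.supplier_id)
JOIN version vn ON (s.version_id = vn.version_id)
JOIN parent pn ON (s.parent_id = pn.parent_id)
WHERE pn.isosmiles = 'Fc1n1B1OC(C)(C)C(C)(C)O1'
ORDER BY price;
```

Using UNION rather than UNION ALL removes the rows that match both halves, which matches the semantics of the original OR.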
Re: [PERFORM] Questions on query planner, join types, and work_mem
On Wed, Aug 11, 2010 at 9:42 PM, Bruce Momjian br...@momjian.us wrote: This confused me. If we are assuming the data is in effective_cache_size, why are we adding sequential/random page cost to the query cost routines? See the comments for index_pages_fetched(). We basically assume that all data starts uncached at the beginning of each query - in fact, each plan node. effective_cache_size only measures the chances that if we hit the same block again later in the execution of something like a nested-loop-with-inner-indexscan, it'll still be in cache. It's an extremely weak knob, and unless you have tables or indices that are larger than RAM, the only mistake you can make is setting it too low. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
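The comments Robert points to implement the Mackert-Lohman approximation. A small Python transcription of the formula in index_pages_fetched() (src/backend/optimizer/path/costsize.c of that era; treat this as an illustrative sketch, not the planner itself):

```python
def index_pages_fetched(tuples_fetched, table_pages, cache_pages):
    """Mackert-Lohman estimate of distinct pages fetched, as in
    index_pages_fetched() in costsize.c.  T = pages in the table,
    b = pages assumed cacheable (derived from effective_cache_size)."""
    T = max(float(table_pages), 1.0)
    b = max(float(cache_pages), 1.0)
    N = float(tuples_fetched)
    if T <= b:
        # Table fits in cache: estimate distinct pages, capped at T.
        return min(2.0 * T * N / (2.0 * T + N), T)
    # Cache smaller than table: past 'lim' fetches, additional tuples
    # cost more because previously read pages get evicted.
    lim = 2.0 * T * b / (2.0 * T - b)
    if N <= lim:
        return 2.0 * T * N / (2.0 * T + N)
    return b + (N - lim) * (T - b) / T

# Fetching 100 tuples from a 100-page table that fits in cache:
print(round(index_pages_fetched(100, 100, 1000), 2))   # 66.67
```

Note how weakly b enters the formula when the table fits in cache, which is exactly Robert's point about effective_cache_size being a weak knob.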
Re: [PERFORM] Questions on query planner, join types, and work_mem
On Tue, Aug 3, 2010 at 3:03 AM, Hannu Krosing ha...@2ndquadrant.com wrote: In case of fully cached database it is closer to 1. In the case of a fully cached database I believe the correct answer begins with a decimal point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] Strange explain on partitioned tables
On Mon, Jul 26, 2010 at 5:26 PM, Gerald Fontenay gvfonte...@lbl.gov wrote: Thank you for your response. So if I query only my target child table, this should be just like any other single table wrt planning right? I have thousands of these tables. (I suppose that I'm only using inheritance for the sake of organization in this situation...) Yeah, I wouldn't expect planning time to be affected by whether a table has parents; only whether it has children. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Questions on query planner, join types, and work_mem
On Mon, Aug 2, 2010 at 5:23 PM, Peter Hussey pe...@labkey.com wrote: I already had effective_cache_size set to 500MB. I experimented with lowering random_page_cost to 3, then 2. It made no difference in the choice of plan that I could see. In the explain analyze output the estimated costs of the nested loop were in fact lowered, but so were the costs of the hash join plan, and the hash join remained the lowest predicted cost in all tests I tried. What do you get if you set random_page_cost to a small value such as 0.01? What seems wrong to me is that the hash join strategy shows almost no difference in estimated costs as work_mem goes from 1MB to 500MB. The cost function decreases by 1%, but the actual time for the query to execute decreases by 86% as work_mem goes from 1MB to 500MB. Wow. It would be interesting to find out how many batches are being used. Unfortunately, releases prior to 9.0 don't display that information. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
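For the archives: on 9.0 and later the batch count appears directly on the Hash node in EXPLAIN ANALYZE output, along these lines (numbers illustrative; a batch count above 1 means the hash spilled to disk because work_mem was too small):

```sql
EXPLAIN ANALYZE SELECT ...;   -- the hash-join query in question
--   ->  Hash  (cost=... rows=... width=...) (actual time=... rows=... loops=1)
--         Buckets: 1024  Batches: 16  Memory Usage: 4096kB
```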
Re: [PERFORM] Explains of queries to partitioned tables
On Mon, Jul 26, 2010 at 4:47 AM, Vlad Arkhipov arhi...@dc.baikal.ru wrote: There is a partitioned table with 2 partitions:

drop table if exists p cascade;
create table p (
  id bigint not null primary key,
  ts timestamp);
create table p_actual ( check (ts is null) ) inherits (p);
create table p_historical ( check (ts is not null) ) inherits (p);
-- I skipped the triggers and rules creation
insert into p (id, ts) values (1, '2000-01-01');
insert into p (id, ts) values (2, null);
insert into p (id, ts) values (3, '2001-01-01');
insert into p (id, ts) values (4, '2005-01-01');
analyze p;
analyze p_actual;
analyze p_historical;

Here is the explain output for the query 'select * from p where ts is null':

Result  (cost=0.00..188.10 rows=10 width=16) (actual time=0.028..0.038 rows=1 loops=1)
  ->  Append  (cost=0.00..188.10 rows=10 width=16) (actual time=0.023..0.029 rows=1 loops=1)
        ->  Seq Scan on p  (cost=0.00..187.00 rows=9 width=16) (actual time=0.002..0.002 rows=0 loops=1)
              Filter: (ts IS NULL)
        ->  Seq Scan on p_actual p  (cost=0.00..1.10 rows=1 width=16) (actual time=0.014..0.016 rows=1 loops=1)
              Filter: (ts IS NULL)
Total runtime: 0.080 ms

You can see that the optimizer expects 10 rows in the table p, and as a result of this assumption the whole query is estimated at 10 rows. Will this cause a performance impact further on? pg_stats does not contain any statistics on the table 'p'. Is this the cause of such behaviour?
The estimation is worse for some other queries, for example 'select * from p where ts is not null':

Result  (cost=0.00..188.30 rows=1764 width=16) (actual time=0.021..0.049 rows=3 loops=1)
  ->  Append  (cost=0.00..188.30 rows=1764 width=16) (actual time=0.016..0.032 rows=3 loops=1)
        ->  Seq Scan on p  (cost=0.00..187.00 rows=1761 width=16) (actual time=0.003..0.003 rows=0 loops=1)
              Filter: (ts IS NOT NULL)
        ->  Seq Scan on p_historical p  (cost=0.00..1.30 rows=3 width=16) (actual time=0.008..0.015 rows=3 loops=1)
              Filter: (ts IS NOT NULL)
Total runtime: 0.095 ms

It would be easier to comment on this if you mentioned things like which version of PG you're using and what you have constraint_exclusion set to, but as a general comment, ANALYZE doesn't store statistics for any tables that are empty, because it assumes that at some point you're going to put data in them. So in this case p_historical is probably using fake stats. But it's not clear that it really matters: you haven't got any relevant indices, so a sequential scan is the only possible plan; and even if you did have some, there are only 4 rows, so a sequential scan is probably the only plan that makes sense anyway. And your query ran in a tenth of a millisecond, which is pretty zippy. So I'm not really sure what the problem is. If this isn't the real data, post an example with the real data and ask for help about that. http://wiki.postgresql.org/wiki/SlowQueryQuestions -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Wed, Jul 28, 2010 at 3:44 PM, Josh Berkus j...@agliodbs.com wrote: On 7/27/10 6:56 PM, Tom Lane wrote: Yeah, if it weren't for that I'd say sure let's try it. But I'm afraid we'd be introducing significant headaches in return for a gain that's quite speculative. Well, the *gain* isn't speculative. For example, I am once again dealing with the issue that PG backend processes on Solaris never give up their RAM, resulting in pathological swapping situations if you have many idle connections. This requires me to install pgpool, which is overkill (since it has load balancing, replication, and more) just to make sure that connections get recycled so that I don't have 300 idle connections eating up 8GB of RAM. Relative to switching databases, I'd tend to say that, like pgbouncer and pgpool, we don't need to support that. Each user/database combo can have their own pool. While not ideal, this would be good enough for 90% of users. However, if we don't support that, we can't do any sort of pooling-ish thing without the ability to pass file descriptors between processes; and Tom seems fairly convinced there's no portable way to do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Wed, Jul 28, 2010 at 4:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: However, if we don't support that, we can't do any sort of pooling-ish thing without the ability to pass file descriptors between processes; and Tom seems fairly convinced there's no portable way to do that. Well, what it would come down to is: are we prepared to not support pooling on platforms without such a capability? It's certainly possible to do it on many modern platforms, but I don't believe we can make it happen everywhere. Generally we've tried to avoid having major features that don't work everywhere ... I suppose it depends on the magnitude of the benefit. And how many platforms aren't covered. And how much code is required. In short, until someone writes a patch, who knows? I think the core question we should be thinking about is what would be the cleanest method of resetting a backend - either for the same database or for a different one, whichever seems easier. And by cleanest, I mean least likely to introduce bugs. If we can get to the point where we have something to play around with, even if it's kind of kludgey or doesn't quite work, it'll give us some idea of whether further effort is worthwhile and how it should be directed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Sat, Jul 24, 2010 at 2:23 AM, Craig Ringer cr...@postnewspapers.com.au wrote: On 24/07/10 01:28, Robert Haas wrote: Well, if we could change the backends so that they could fully reinitialize themselves (disconnect from a database to which they are bound, etc.), I don't see why we couldn't use the Apache approach. This would offer the bonus on the side that it'd be more practical to implement database changes for a connection, akin to MySQL's USE. Inefficient, sure, but possible. Yep. I don't care about that current limitation very much. I think anyone changing databases all the time probably has the wrong design and should be using schema. I'm sure there are times it'd be good to be able to switch databases on one connection, though. I pretty much agree with this. I think this is merely slightly nice on its own, but I think it might be a building-block to other things that we might want to do down the road. Markus Wanner's Postgres-R replication uses worker processes; autovacuum does as well; and then there's parallel query. I can't help thinking that not needing to fork a new backend every time you want to connect to a new database has got to be useful. My question with all this remains: is it worth the effort when external poolers already solve the problem. Whether it's worth the effort is something anyone who is thinking about working on this will have to decide for themselves. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Tue, Jul 27, 2010 at 4:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jul 22, 2010 at 5:29 PM, Andres Freund and...@anarazel.de wrote: The problem is harder for us because a backend can't switch identities once it's been assigned to a database. I haven't heard an adequate explanation of why that couldn't be changed, though. Possibly it might decrease the performance significantly enough by reducing the cache locality (syscache, prepared plans)? Those things are backend-local. The worst case scenario is you've got to flush them all when you reinitialize, in which case you still save the overhead of creating a new process. Flushing them all is not zero-cost; it's not too hard to believe that it could actually be slower than forking a clean new backend. I'm not so sure I believe that. Is a sinval overrun slower than forking a clean new backend? Is DISCARD ALL slower that forking a clean new backend? How much white space is there between either of those and what would be needed here? I guess it could be slower, but I wouldn't want to assume that without evidence. What's much worse, it's not zero-bug. We've got little bitty caches all over the backend, including (no doubt) some caching behavior in third-party code that wouldn't get the word about whatever API you invented to deal with this. I think this is probably the biggest issue with the whole idea, and I agree there would be some pain involved. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Tue, Jul 27, 2010 at 9:56 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jul 27, 2010 at 4:40 PM, Tom Lane t...@sss.pgh.pa.us wrote: Flushing them all is not zero-cost; it's not too hard to believe that it could actually be slower than forking a clean new backend. I'm not so sure I believe that. I'm not asserting it's true, just suggesting it's entirely possible. Other than the fork() cost itself and whatever authentication activity there might be, practically all the startup cost of a new backend can be seen as cache-populating activities. You'd have to redo all of that, *plus* pay the costs of getting rid of the previous cache entries. Maybe the latter costs less than a fork(), or maybe not. fork() is pretty cheap on modern Unixen. What's much worse, it's not zero-bug. I think this is probably the biggest issue with the whole idea, and I agree there would be some pain involved. Yeah, if it weren't for that I'd say sure let's try it. But I'm afraid we'd be introducing significant headaches in return for a gain that's quite speculative. I agree that the gain is minimal of itself; after all, how often do you need to switch databases, and what's the big deal if the postmaster has to fork a new backend? Where I see it as a potentially big win is when it comes to things like parallel query. I can't help thinking that's going to be a lot less efficient if you're forever forking new backends. Perhaps the point here is that you'd actually sort of like to NOT flush all those caches unless it turns out that you're switching databases - many installations probably operate with essentially one big database, so chances are good that even if you distributed connections / parallel queries to backends round-robin, you'd potentially save quite a bit of overhead. Of course, for the guy who has TWO big databases, you might hope to be a little smarter, but that's another problem altogether. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Fri, Jul 23, 2010 at 11:58 AM, Hannu Krosing ha...@krosing.net wrote: On Thu, 2010-07-22 at 20:57 -0400, Robert Haas wrote: On Thu, Jul 22, 2010 at 3:15 PM, Hannu Krosing ha...@krosing.net wrote: On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote: On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer cr...@postnewspapers.com.au wrote: So rather than asking should core have a connection pool perhaps what's needed is to ask what can an in-core pool do that an external pool cannot do? Avoid sending every connection through an extra hop. not really. in-core != magically-in-right-backend-process Well, how about if we arrange it so it IS in the right backend process? I don't believe magic is required. Do you have any design in mind for how you can make it so? Well, if we could change the backends so that they could fully reinitialize themselves (disconnect from a database to which they are bound, etc.), I don't see why we couldn't use the Apache approach. There's a danger of memory leaks, but that's why Apache has MaxRequestsPerChild, and it works pretty darn well. Of course, passing file descriptors would be even nicer (you could pass the connection off to a child that was already bound to the correct database, perhaps), but as has been pointed out more than once, that's not portable. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
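The portability worry here is about SCM_RIGHTS-style descriptor passing over Unix-domain sockets. Where it is available, the mechanics look like this toy sketch (Python on a Unix platform, with a pipe standing in for a client connection; the "other process" is simulated by the second end of a socketpair):

```python
import array
import os
import socket

# A pair of connected AF_UNIX sockets standing in for the
# postmaster <-> backend control channel.
parent, child = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

r, w = os.pipe()                    # pretend r is a client connection fd
os.write(w, b"hello")

# Send the descriptor as SCM_RIGHTS ancillary data (at least one
# byte of normal payload must accompany it).
fds = array.array("i", [r])
parent.sendmsg([b"x"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds.tobytes())])

# The receiver gets a *new* descriptor referring to the same open file.
msg, ancdata, flags, addr = child.recvmsg(1, socket.CMSG_SPACE(fds.itemsize))
level, ctype, cdata = ancdata[0]
received = array.array("i")
received.frombytes(cdata[:received.itemsize])

data_read = os.read(received[0], 5)
print(data_read.decode())           # hello
```

The same dance in C uses sendmsg()/recvmsg() with a cmsghdr of type SCM_RIGHTS; the portability problem Tom alludes to is that not every platform PostgreSQL supports guarantees this facility.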
Re: [PERFORM] Slow query with planner row strange estimation
On Mon, Jul 12, 2010 at 4:33 PM, phb07 ph...@apra.asso.fr wrote: Dimitri wrote: It's probably one of the cases when having HINTS in PostgreSQL may be very helpful.. SELECT /*+ enable_nestloop=off */ ... FROM ... will just fix this query without impacting other queries and without adding any additional instructions into the application code.. So why is there such resistance to implementing hints within SQL queries in PG?.. +1. Another typical case where it would be helpful is setting the cursor_tuple_fraction GUC variable for a specific statement, without being obliged to issue 2 SET statements, one before the SELECT and the other after. We've previously discussed adding a command something like: LET (variable = value, variable = value, ...) command ...which would set those variables just for that one command. But honestly I'm not sure how much it'll help with query planner problems. Disabling nestloops altogether, even for one particular query, is often going to be a sledgehammer where you need a scalpel. But then again, a sledgehammer is better than no hammer. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
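For what it's worth, SET LOCAL already gets partway there today: it scopes the change to one transaction rather than one session, so no cleanup statement is needed afterwards. A sketch:

```sql
BEGIN;
SET LOCAL enable_nestloop = off;   -- or: SET LOCAL cursor_tuple_fraction = 0.1;
SELECT ...;                        -- the one query that needs the sledgehammer
COMMIT;                            -- the setting reverts automatically
```

It still requires wrapping the statement in a transaction, which is exactly the inconvenience the proposed LET syntax would remove.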
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer cr...@postnewspapers.com.au wrote: On 12/07/10 17:45, Matthew Wakeling wrote: I'm surprised. Doesn't apache httpd do this? Does it have to do a whole load of non-portable stuff? It seems to work on a whole load of platforms. A lot of what Apache HTTPd does is handled via the Apache Portable Runtime (APR). It contains a lot of per-platform handlers for various functionality. Apache just has all of the worker processes call accept() on the socket, and whichever one the OS hands it off to gets the job. The problem is harder for us because a backend can't switch identities once it's been assigned to a database. I haven't heard an adequate explanation of why that couldn't be changed, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer cr...@postnewspapers.com.au wrote: So rather than asking should core have a connection pool perhaps what's needed is to ask what can an in-core pool do that an external pool cannot do? Avoid sending every connection through an extra hop. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Thu, Jul 22, 2010 at 3:15 PM, Hannu Krosing ha...@krosing.net wrote: On Thu, 2010-07-22 at 14:36 -0400, Robert Haas wrote: On Mon, Jul 12, 2010 at 6:58 AM, Craig Ringer cr...@postnewspapers.com.au wrote: So rather than asking should core have a connection pool perhaps what's needed is to ask what can an in-core pool do that an external pool cannot do? Avoid sending every connection through an extra hop. not really. in-core != magically-in-right-backend-process Well, how about if we arrange it so it IS in the right backend process? I don't believe magic is required. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Pooling in Core WAS: Need help in performance tuning.
On Thu, Jul 22, 2010 at 5:29 PM, Andres Freund and...@anarazel.de wrote: The problem is harder for us because a backend can't switch identities once it's been assigned to a database. I haven't heard an adequate explanation of why that couldn't be changed, though. Possibly it might decrease the performance significantly enough by reducing the cache locality (syscache, prepared plans)? Those things are backend-local. The worst case scenario is you've got to flush them all when you reinitialize, in which case you still save the overhead of creating a new process. The best case scenario is that you can keep some of them around, in which case, great. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Slow query with planner row strange estimation
On Fri, Jul 9, 2010 at 6:13 AM, damien hostin damien.hos...@axege.com wrote: Have you tried running ANALYZE on the production server? You might also want to try ALTER TABLE ... SET STATISTICS to a large value on some of the join columns involved in the query. Hello, Before comparing the test case on the two machines, I ran ANALYZE on the whole database and looked at the pg_stats table to see if anything changed for the columns, but on the production server the stats never became as good as on the desktop computer. I set the statistics target at 1 on the columns used by the join, ran ANALYZE, which took a 300-row sample, then looked at the stats. The stats are not as good as on the desktop. The row number is nearly the same, but only 1 or 2 values are found. The data are not balanced the same way on the two computers: - Desktop is 12000 rows with 6000 implicated in the query (50%), - Production (actually a dev/test server) is 6 million rows with 6000 implicated in the query (0.1%). The columns used in the query are nullable, and in the 5994000 other rows that are not implicated in the query these columns are null. I don't know if the statistics target is a % or a number of values to obtain. It's a number of values to obtain. But even set at the max (1), it didn't manage to collect good stats (for this particular query). I think there's a cutoff where it won't collect values unless they occur significantly more often than the average frequency. I wonder if that might be biting you here: without the actual values in the MCV table, the join selectivity estimates probably aren't too good. As I don't know what more to do, my conclusion is that the data need to be better balanced to allow ANALYZE to gather better stats. But if there is a way to improve the stats/query with this ugly unbalanced data, I'm open to it! I hope that in real production, data will never be loaded this way. If this happened we would maybe set enable_nestloop to off, but I don't think it's a good solution; other queries have a chance of getting slower.
Yeah, that usually works out poorly. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
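The two knobs discussed in this exchange look like this in practice (table and column names hypothetical):

```sql
-- Raise the per-column statistics target (overrides
-- default_statistics_target for this column), then re-sample:
ALTER TABLE my_table ALTER COLUMN join_col SET STATISTICS 1000;
ANALYZE my_table;

-- Inspect what ANALYZE actually collected, including the
-- most-common-values list that drives join selectivity estimates:
SELECT n_distinct, most_common_vals, most_common_freqs
  FROM pg_stats
 WHERE tablename = 'my_table' AND attname = 'join_col';
```

If most_common_vals stays empty even at a high target, the cutoff Robert describes is likely the reason: the interesting values simply don't occur often enough relative to the column's average frequency.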
Re: [PERFORM] Need help in performance tuning.
On Fri, Jul 9, 2010 at 12:42 AM, Tom Lane t...@sss.pgh.pa.us wrote: Samuel Gendler sgend...@ideasculptor.com writes: On Thu, Jul 8, 2010 at 8:11 PM, Craig Ringer cr...@postnewspapers.com.au wrote: If you're not using a connection pool, start using one. I see this issue and subsequent advice cross this list awfully frequently. Is there an architectural reason why postgres itself cannot pool incoming connections in order to eliminate the requirement for an external pool? Perhaps not, but there's no obvious benefit either. Since there's More Than One Way To Do It, it seems more practical to keep that as a separate problem that can be solved by a choice of add-on packages. I'm not buying it. A separate connection pooler increases overhead and management complexity, and, I believe, limits our ability to implement optimizations like parallel query execution. I'm glad there are good ones available, but the fact that they're absolutely necessary for good performance in some environments is not a feature. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] Slow query with planner row strange estimation
On Wed, Jul 7, 2010 at 10:39 AM, damien hostin damien.hos...@axege.com wrote: Hello again, At last, I checked the same query with the same data on my desktop computer. Just after loading the data, the queries were slow; I launched a VACUUM ANALYZE, which collected good stats on the main table, and the query became quick (~200ms). Now 1 classic SATA-disk computer is faster than our little monster server!! Have you tried running ANALYZE on the production server? You might also want to try ALTER TABLE ... SET STATISTICS to a large value on some of the join columns involved in the query. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [Slony1-general] [PERFORM] WAL partition overloaded--by autovacuum?
On Wed, Jul 7, 2010 at 3:32 PM, Richard Yen rich...@iparadigms.com wrote: On Jul 6, 2010, at 8:25 PM, Scott Marlowe wrote: Tell us what you can about your hardware setup. Sorry, I made the bad assumption that the hardware setup would be irrelevant--dunno why I thought that. My hardware setup is 2 FusionIO 160GB drives in a RAID-1 configuration, running on an HP DL360 G5. I think I figured out the problem: -- I figured that pg_xlog and data/base could both be on the FusionIO drive, since there would be no latency when there are no spindles. -- However, I didn't take into account the fact that pg_xlog might grow in size when autovacuum does its work when vacuuming to prevent XID wraparound. I *just* discovered this when one of my other replication nodes decided to die on me and fill up its disk. -- Unfortunately, my db is 114GB (including indexes) or 60GB (without indexes), leaving ~37GB for pg_xlog (since they are sharing a partition). So I'm guessing what happened was that when autovacuum ran to prevent XID wraparound, it took each table and changed the XIDs, and that got recorded in WAL, causing WAL to bloat. Is this the correct understanding? That seems logical (and un-fun), but I don't understand how you managed to fill up 37GB of disk with WAL files. Every time you fill up checkpoint_segments * 16MB of WAL files, you ought to get a checkpoint. When it's complete, WAL segments completely written before the start of the checkpoint should be recyclable. Unless I'm confused, which apparently I am. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
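Robert's arithmetic can be made concrete. The documentation of this era gives the steady-state ceiling on pg_xlog as (2 + checkpoint_completion_target) * checkpoint_segments + 1 segments of 16MB each, which is nowhere near 37GB for any plausible setting; a quick sketch:

```python
def expected_wal_ceiling_mb(checkpoint_segments,
                            checkpoint_completion_target=0.5,
                            wal_segment_mb=16):
    """Rough steady-state ceiling on pg_xlog size for PostgreSQL 8.3-9.x:
    (2 + checkpoint_completion_target) * checkpoint_segments + 1 segments."""
    files = (2 + checkpoint_completion_target) * checkpoint_segments + 1
    return files * wal_segment_mb

print(expected_wal_ceiling_mb(3))    # default checkpoint_segments: 136.0 (MB)
print(expected_wal_ceiling_mb(256))  # an aggressive setting: 10256.0 (MB)
```

So if pg_xlog genuinely reached 37GB, something was preventing segment recycling (for instance checkpoints unable to complete, or in later releases a stalled archive_command), rather than normal checkpoint behavior.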
Re: [PERFORM] big data - slow select (speech search)
On Wed, Jul 7, 2010 at 9:31 AM, Michal Fapso michal.fa...@gmail.com wrote: thank you for your help. I tried to cluster the table on hyps_wordid_index and the query execution time dropped from 4.43 to 0.19 seconds which is not that far from Lucene's performance of 0.10 second. Dang. Nice! -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] Highly Efficient Custom Sorting
On Sat, Jul 3, 2010 at 4:17 PM, Eliot Gable egable+pgsql-performa...@gmail.com wrote: Read RFC 2782 on random weighted load balancing of SRV records inside DNS. It may be asking a bit much to expect people here to read an RFC to figure out how to help you solve this problem, but... I've looked through the documentation on how to re-write this in C, but I cannot seem to find good documentation on working with the input array (which is an array of a complex type). I also don't see good documentation for working with the complex type. I found stuff that talks about constructing a complex type in C and returning it. However, I'm not sure how to take an input complex type and deconstruct it into something I can work with in C. Also, the memory context management stuff is not entirely clear. ...there's no question that writing things in C is a lot more work, and takes some getting used to. Still, it's fast, so maybe worth it, especially since you already know C++, and will therefore mostly just need to learn the PostgreSQL coding conventions. The best thing to do is probably to look at some of the existing examples within the backend code. Most of the datatype code is in src/backend/utils/adt. You might want to look at arrayfuncs.c (perhaps array_ref() or array_map()); and also rowtypes.c (perhaps record_cmp()). Specifically, how do I go about preserving the pointers to the data that I allocate in multi-call memory context so that they still point to the data on the next call to the function for the next result row? Am I supposed to set up some global variables to do that, or am I supposed to take a different approach? If I need to use global variables, then how do I deal with concurrency? Global variables would be a bad idea, not so much because of concurrency as because they won't get cleaned up properly. 
Again, the best thing to do is to look at existing examples, like array_unnest() in src/backend/utils/adt/arrayfuncs.c; the short answer is that you probably want to compute all your results on the first call and stash them in the FuncCallContext (funcctx->user_fctx); and then on subsequent calls just return one row per call. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
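For reference, the multi-call pattern described above looks roughly like this. This is only a sketch: the `MyState` struct and `build_results()` helper are invented placeholders, while the `PG_FUNCTION_INFO_V1`/`SRF_*` macros are the real funcapi.h interface; see array_unnest() for a complete, authoritative example.

```c
#include "postgres.h"
#include "fmgr.h"
#include "funcapi.h"

/* Hypothetical state stashed across calls. */
typedef struct
{
    int    nresults;
    Datum *rows;        /* one Datum per precomputed result row */
} MyState;

/* Hypothetical helper: does all the deconstruction and sorting up front. */
extern MyState *build_results(FunctionCallInfo fcinfo);

PG_FUNCTION_INFO_V1(my_sorted_srf);

Datum
my_sorted_srf(PG_FUNCTION_ARGS)
{
    FuncCallContext *funcctx;

    if (SRF_IS_FIRSTCALL())
    {
        MemoryContext oldcxt;
        MyState      *state;

        funcctx = SRF_FIRSTCALL_INIT();

        /*
         * Anything that must survive until the next call has to be
         * allocated in multi_call_memory_ctx -- this is how the pointers
         * stay valid between calls, with no global variables needed.
         */
        oldcxt = MemoryContextSwitchTo(funcctx->multi_call_memory_ctx);

        state = build_results(fcinfo);      /* compute everything once */
        funcctx->user_fctx = state;
        funcctx->max_calls = state->nresults;

        MemoryContextSwitchTo(oldcxt);
    }

    funcctx = SRF_PERCALL_SETUP();

    if (funcctx->call_cntr < funcctx->max_calls)
    {
        /* One precomputed row per call. */
        MyState *state = (MyState *) funcctx->user_fctx;

        SRF_RETURN_NEXT(funcctx, state->rows[funcctx->call_cntr]);
    }

    SRF_RETURN_DONE(funcctx);
}
```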
Re: [PERFORM] Two equivalent WITH RECURSIVE queries, one of them slow.
On Mon, Jul 5, 2010 at 2:07 AM, Octavio Alvarez alvar...@alvarezp.ods.org wrote: Hello. I have a tree-like table with a three-field PK (name, date, id) and one parent field. It has 5k to 6k records as of now, but it will hold about 1 million records. I am trying the following WITH RECURSIVE query:

WITH RECURSIVE t AS (
    SELECT par.id AS tid, par.name, par.date, par.id, par.text,
           par.h_title, par.h_name, par.parent
      FROM _books.par
  UNION
    SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
           p.h_title, p.h_name, p.parent
      FROM t, _books.par p
     WHERE p.name = t.name AND p.date = t.date AND t.id = p.parent
)
SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
  FROM t
 WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';

... which takes 2547.503 ms. However, if I try the same query but adding the same WHERE clause to the non-recursive term, I get much better results:

WITH RECURSIVE t AS (
    SELECT par.id AS tid, par.name, par.date, par.id, par.text,
           par.h_title, par.h_name, par.parent
      FROM _books.par
     WHERE name = 'cfx' AND date = '2009-08-19' AND par.id = '28340'
  UNION
    SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
           p.h_title, p.h_name, p.parent
      FROM t, _books.par p
     WHERE p.name = t.name AND p.date = t.date AND t.id = p.parent
)
SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
  FROM t
 WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';

... which takes 0.221 ms. I am being forced to use the slow query because I want to define it as a view, leaving the WHERE clause to the application. I think this is just a limitation of the optimizer. Recursive queries are a relatively new feature and the optimizer doesn't know a whole lot about how to deal with them. That may get improved at some point, but the optimization you're hoping for here is pretty tricky. In order to push the WHERE clauses down into the non-recursive term, the optimizer would need to prove that this doesn't change the final results.
I think that's possible here because it so happens that your recursive term only generates results that have the same name, date, and tid as some existing result, but with a slightly different recursive query that wouldn't be true, so you'd need to make the code pretty smart to work this one out. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
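One workaround while the optimizer lacks this smarts: instead of a view, wrap the recursive query in a SQL function so the caller's filter lands in the non-recursive term. A hedged sketch, using the table and columns from the thread (the function name is invented, and `$1`/`$2`/`$3` positional parameters are used for portability to 8.4-era servers):

```sql
CREATE FUNCTION par_tree(text, date, integer)
RETURNS SETOF _books.par AS $$
  WITH RECURSIVE t AS (
      -- The filter is applied in the anchor, as in the fast variant above.
      SELECT par.*
        FROM _books.par
       WHERE par.name = $1 AND par.date = $2 AND par.id = $3
    UNION
      SELECT p.*
        FROM t, _books.par p
       WHERE p.name = t.name AND p.date = t.date AND p.parent = t.id
  )
  SELECT * FROM t;
$$ LANGUAGE sql STABLE;

-- The application then calls, e.g.:
-- SELECT * FROM par_tree('cfx', '2009-08-19', 28340);
```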
Re: [PERFORM] Question about partitioned query behavior
On Tue, Jul 6, 2010 at 12:30 PM, Ranga Gopalan ranga_gopa...@hotmail.com wrote: It seems that this is an issue faced by others as well - Please see this link: http://stackoverflow.com/questions/2236776/efficient-querying-of-multi-partition-postgres-table Is this a known bug? Is this something that someone is working on, or is there a known workaround? I think that we know this problem exists, but I'm not aware that anyone is working on fixing it. There is a Merge Append patch floating around out there that I think might help with this, but AFAICS it was last updated on July 5, 2009, and still needed some more work at that time. Since this is an all-volunteer effort, complicated problems like this don't always get fixed as fast as we'd like; most of us have to spend most of our time on whatever it is that our employer pays us to do. Of course, if you're in a position to sponsor a developer, there are a number of companies that will be happy to work with you. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] big data - slow select (speech search)
On Thu, Jul 1, 2010 at 6:34 PM, Michal Fapso michal.fa...@gmail.com wrote: It took about 4.5 seconds. If I rerun it, it takes less than 2 milliseconds, but it is because of the cache. I need to optimize the first run. laptop ASUS, CPU dual core T2300 1.66GHz, 1.5G RAM

EXPLAIN ANALYZE SELECT h1.docid FROM hyps AS h1 WHERE h1.wordid=65658;

Bitmap Heap Scan on hyps h1 (cost=10.97..677.09 rows=171 width=4) (actual time=62.106..4416.864 rows=343 loops=1)
  Recheck Cond: (wordid = 65658)
  ->  Bitmap Index Scan on hyps_wordid_index (cost=0.00..10.92 rows=171 width=0) (actual time=42.969..42.969 rows=343 loops=1)
        Index Cond: (wordid = 65658)
Total runtime: 4432.015 ms

If I run the same query in the Lucene search engine, it takes 0.105 seconds on the same data, which is quite a huge difference. So PostgreSQL is reading 343 rows from disk in 4432 ms, or about 12 ms/row. I'm not an expert on seek times, but that might not really be that unreasonable, considering that those rows may be scattered all over the index and thus it may be basically random I/O. Have you tried clustering hyps on hyps_wordid_index? If you had a more sophisticated disk subsystem you could try increasing effective_io_concurrency, but that's not going to help with only one spindle. If you run the same query in Lucene and it takes only 0.105 s, then Lucene is obviously doing a lot less I/O. I doubt that any amount of tuning of your existing schema is going to produce that kind of result on PostgreSQL. Using the full-text search stuff, or a gin index of some kind, might get you closer, but it's hard to beat a special-purpose engine that implements exactly the right algorithm for your use case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
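The clustering suggested above is a single command (and, as the first message in this digest reports, it cut the query from ~4.4 s to ~0.19 s in this case). Note that CLUSTER rewrites the table and takes an exclusive lock, and the ordering degrades as new rows arrive, so it may need to be re-run periodically:

```sql
-- Physically reorder the heap so rows with the same wordid are contiguous,
-- turning the scattered random reads into mostly sequential I/O:
CLUSTER hyps USING hyps_wordid_index;
ANALYZE hyps;
```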
Re: [PERFORM] Performance issues with postgresql-8.4.0: Query gets stuck sometimes
On Fri, Jul 2, 2010 at 1:40 AM, Sachin Kumar sachin...@globallogic.com wrote: At times we have observed that postgres stops responding for several minutes, even couldn’t fetch the number of entries in a particular table. One such instance happens when we execute the following steps: Sounds sort of like a checkpoint spike. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] Low performance SUM and GROUP BY large database
On Tue, Jun 29, 2010 at 7:59 AM, Sergio Charpinel Jr. sergiocharpi...@gmail.com wrote: One more question about two specific queries' behavior: If I add AND (ip_dst = x.x.x.x), it uses another plan and takes much more time. In both of them, I'm using a WHERE clause. Why this behavior? With either query, the planner is choosing to scan backward through the acct_2010_26_pk index to get the rows in descending order by the bytes column. It keeps scanning until it finds 50 rows that match the WHERE clause. With just the criteria on stamp_inserted, matches are pretty common, so it doesn't have to scan very far before finding 50 suitable rows. But when you add the ip_dst = 'x.x.x.x' criterion, suddenly a much smaller percentage of the rows match and so it has to read much further into the index before it finds 50 that do. A second index on just the ip_dst column might help a lot - then it could consider index-scanning for the matching rows and sorting them afterwards. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
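Concretely, the second index suggested above would look something like this (the table name `acct_2010_26` is assumed from the `acct_2010_26_pk` index mentioned in the thread, and the index name is invented):

```sql
-- Lets the planner fetch only the rows matching ip_dst and sort the
-- survivors, instead of walking the whole primary-key index backward
-- hoping to stumble on 50 matches:
CREATE INDEX acct_2010_26_ip_dst_idx ON acct_2010_26 (ip_dst);
```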
Re: [PERFORM] No hash join across partitioned tables?
On Fri, Jul 2, 2010 at 4:58 PM, Tom Lane t...@sss.pgh.pa.us wrote: Bruce Momjian br...@momjian.us writes: I am surprised there is no documentation update requirement for this. Somebody put something about it in the docs a few days ago, IIRC. That was me. http://archives.postgresql.org/pgsql-committers/2010-06/msg00144.php -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] PostgreSQL as a local in-memory cache
On Mon, Jun 28, 2010 at 5:57 PM, Bruce Momjian br...@momjian.us wrote: The patch also documents that synchronous_commit = false has potential committed transaction loss from a database crash (as well as an OS crash). Is this actually true? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] PostgreSQL as a local in-memory cache
On Tue, Jun 29, 2010 at 9:32 AM, Bruce Momjian br...@momjian.us wrote: Robert Haas wrote: On Mon, Jun 28, 2010 at 5:57 PM, Bruce Momjian br...@momjian.us wrote: The patch also documents that synchronous_commit = false has potential committed transaction loss from a database crash (as well as an OS crash). Is this actually true? I asked on IRC and was told it is true, and looking at the C code it looks true. What synchronous_commit = false does is delay writing the WAL buffers to disk and fsyncing them, not just the fsync, which is where the commit loss due to a database process crash comes from. Ah, I see. Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
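A practical consequence of the discussion above: synchronous_commit can be disabled per transaction, so only the work that can tolerate losing a very recent commit after a crash takes the risk, while everything else stays fully durable:

```sql
BEGIN;
-- SET LOCAL scopes the setting to this transaction only; a crash may
-- lose commits made within roughly the last few wal_writer_delay cycles.
SET LOCAL synchronous_commit = off;
INSERT INTO page_hits (url, hit_at) VALUES ('/index', now());  -- hypothetical table
COMMIT;
```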
Re: [PERFORM] WAL+Os on a single disk
On Thu, Jun 24, 2010 at 10:55 AM, Anj Adu fotogra...@gmail.com wrote: What would you recommend to do a quick test for this? (i.e WAL on internal disk vs WAL on the 12 disk raid array)? Maybe just pgbench? http://archives.postgresql.org/pgsql-performance/2010-06/msg00223.php -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] PostgreSQL as a local in-memory cache
On Thu, Jun 24, 2010 at 4:40 AM, Rob Wultsch wult...@gmail.com wrote: On Fri, Jun 18, 2010 at 1:55 PM, Josh Berkus j...@agliodbs.com wrote: It must be a setting, not a version. For instance suppose you have a session table for your website and a users table. - Having ACID on the users table is of course a must ; - for the sessions table you can drop the D You're trying to solve a different use-case than the one I am. Your use-case will be solved by global temporary tables. I suggest that you give Robert Haas some feedback on that. My use case is people using PostgreSQL as a cache, or relying entirely on replication for durability. Is he? Wouldn't a global temporary table have content that is not visible between db connections? A db session may not be the same as a user session. I'm planning to implement global temporary tables, which can have different contents for each user session. And I'm also planning to implement unlogged tables, which have the same contents for all sessions but are not WAL-logged (and are truncated on startup). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] requested shared memory size overflows size_t
On Thu, Jun 24, 2010 at 7:19 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Craig James's message of jue jun 24 19:03:00 -0400 2010:

select relname, pg_relation_size(relname) from pg_class
 where pg_get_userbyid(relowner) = 'emol_warehouse_1'
   and relname not like 'pg_%'
 order by pg_relation_size(relname) desc;
ERROR:  relation "rownum_temp" does not exist

emol_warehouse_1=> select relname from pg_class where relname = 'rownum_temp';
   relname
-------------
 rownum_temp
(1 row)

What's the full row? I'd just add a WHERE relkind = 'r' to the above query anyway. Yeah - also, it would probably be good to call pg_relation_size on pg_class.oid rather than pg_class.relname, to avoid any chance of confusion over which objects are in which schema. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
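Spelling out both suggestions above, the query would become something like this (a sketch; passing the oid means the size lookup can never resolve the name in the wrong schema, and relkind = 'r' restricts it to plain tables):

```sql
SELECT c.relname, pg_relation_size(c.oid)
  FROM pg_class c
 WHERE pg_get_userbyid(c.relowner) = 'emol_warehouse_1'
   AND c.relkind = 'r'
 ORDER BY pg_relation_size(c.oid) DESC;
```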
Re: [PERFORM] Aggressive autovacuuming ?
On Sun, Jun 20, 2010 at 4:13 PM, Scott Marlowe scott.marl...@gmail.com wrote: The largest consequence I can see at the moment is that when I get a full vacuum (for preventing transaction-id wraparound) it would be I assume you mean the automatic database wide vacuum. I don't think 8.4 and above need that anymore. I think 8.3 does that too, but I'm not 100% sure. 8.4 (and 9.0) do still need to do vacuums to freeze tuples before transaction ID wraparound occurs. This is not to be confused with VACUUM FULL, which is something else altogether. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] Aggressive autovacuuming ?
On Wed, Jun 23, 2010 at 2:20 PM, Scott Marlowe scott.marl...@gmail.com wrote: On Wed, Jun 23, 2010 at 1:58 PM, Robert Haas robertmh...@gmail.com wrote: On Sun, Jun 20, 2010 at 4:13 PM, Scott Marlowe scott.marl...@gmail.com wrote: The largest consequence I can see at the moment is that when I get a full vacuum (for preventing transaction-id wraparound) it would be I assume you mean the automatic database wide vacuum. I don't think 8.4 and above need that anymore. I think 8.3 does that too, but I'm not 100% sure. 8.4 (and 9.0) do still need to do vacuums to freeze tuples before transaction ID wraparound occurs. This is not to be confused with VACUUM FULL, which is something else altogether. My point was that modern pgsql doesn't need db wide vacuum to prevent wraparound anymore, but can vacuum individual relations to prevent wraparound. Oh, I see. I didn't realize we used to do that. Looks like that change was committed 11/5/2006. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] PostgreSQL as a local in-memory cache
On Wed, Jun 23, 2010 at 3:37 PM, Bruce Momjian br...@momjian.us wrote: Tom Lane wrote: Dimitri Fontaine dfonta...@hi-media.com writes: Josh Berkus j...@agliodbs.com writes: a) Eliminate WAL logging entirely If we eliminate WAL logging, that means a reinstall is required for even a postmaster crash, which is a new non-durable behavior. Also, we just added wal_level = minimal, which might end up being a poor name choice if we want wal_level = off in PG 9.1. Perhaps we should have used wal_level = crash_safe in 9.0. I have added the following TODO: Consider a non-crash-safe wal_level that eliminates WAL activity * http://archives.postgresql.org/pgsql-performance/2010-06/msg00300.php I don't think we need a system-wide setting for that. I believe that the unlogged tables I'm working on will handle that case. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
Re: [PERFORM] PostgreSQL as a local in-memory cache
On Thu, Jun 17, 2010 at 1:29 PM, Josh Berkus j...@agliodbs.com wrote: a) Eliminate WAL logging entirely In addition to global temporary tables, I am also planning to implement unlogged tables, which are, precisely, tables for which no WAL is written. On restart, any such tables will be truncated. That should give you the ability to do this (by making all your tables unlogged). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company
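For readers of the archive: the unlogged tables described above eventually shipped in PostgreSQL 9.1, with this syntax (the table itself is a made-up example):

```sql
-- No WAL is written for this table, so writes are fast; the price is
-- that its contents are truncated after a crash and it does not
-- participate in streaming replication.
CREATE UNLOGGED TABLE session_cache (
    session_id text PRIMARY KEY,
    payload    text
);
```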