Greg Stark writes:
> Sorry I was unclear. By "usual case" I meant reading, as opposed to updates.
> The size of the on-disk representation turns out to be a major determinant in
> a lot of database applications, since the dominant resource is i/o bandwidth.
> Try doing a fresh import of a large table with pctfree 0 pctuse 100 and
> compare how long a select takes on it compared to the original table.
BTW, I greatly appreciate your support on this stuff. This list is a
I think we agree. The question is what is the workload. On tables
without updates, postgres will be fast enough. However, postgres is
slow on tables with updates afaict. I think of OLTP as a system with
updates. One can do DSS on an OLTP database with Oracle, at least it
seems to work for one of our projects.
> FIrstly, that type of query will be faster in 7.4 due to implementing a new
> method for doing groups called hash aggregates.
We'll be trying it as soon as it is out.
> Secondly you could try raising sort_mem. Postgres can't know how much memory
> it really has before it swaps, so there's a parameter to tell it. And swapping
> would be much worse than doing disk sorts.
It is at 8000. This is probably as high as I can go with multiple
postmasters. The sort area is shared in Oracle (I think :-) in the
> You can raise sort_mem to tell it how much memory it's allowed to
> use before it goes to disk sorts. You can even use ALTER SESSION to
> raise it in a few DSS sessions but leave it low the many OLTP
> sessions. If it's high in OLTP sessions then you could quickly hit
> swap when they all happen to decide to use the maximum amount at the
> same time. But then you don't want to be doing big sorts in OLTP
> sessions anyways.
This is a web app. I can't control what the user wants to do.
Sometimes they update data, and other times they simply look at it.
I didn't find ALTER SESSION for postgres (isn't that Oracle?), so I
set sort_mem in the conf file to 512000, restarted postrgres. Reran
the simpler query (no name) 3 times, and it was still 27 secs.
> Unfortunately there's no way to tell how much memory it thinks it's
> going to use. I used to use a script to monitor the pgsql_tmp
> directory in the database to watch for usage.
I don't have to. The queries that run slow are hitting disk.
Anything that takes a minute has to be writing to disk.
> Well, first of all it doesn't really because you said to group by t2.name not
> f1. You might expect it to at least optimize something like this:
I put f2 in the group by, and it doesn't matter. That's the point.
It's the on-disk sort before the aggregate that's killing the query.
> but even then I don't think it actually is capable of using foreign keys as a
> hint like that. I don't think Oracle does either actually, but I'm not sure.
I'll be finding out this week.
> To convince it to do the right thing you would have to do either:
> SELECT a, t2.name
> FROM (SELECT avg(f1),f2 FROM t1 GROUP BY f2) AS t1
> JOIN t2 USING (f2)
> Or use a subquery:
> SELECT a, (SELECT name FROM t2 WHERE t2.f2 = t1.f2)
> FROM t1
> GROUP BY f2
This doesn't solve the problem. It's the GROUP BY that is doing the
wrong thing. It's grouping, then aggregating.
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend