"Simon Riggs" <[EMAIL PROTECTED]> writes:
> On Mon, 2007-06-04 at 14:41 -0400, Tom Lane wrote:
>> "Simon Riggs" <[EMAIL PROTECTED]> writes:
>>> The original ideal implementation was to use round-robin/cyclic
>>> selection, which allows much better usage in the above case.
>>
>> Really? What if multiple backends are all hitting the same tablespaces
>> in the same order? A random selection seems much less likely to risk
>> having any self-synchronizing behavior.
> I'd like a single backend to never reuse a temp tablespace that is
> actively being used so that large queries won't randomly conflict with
> themselves.

That's pretty certain to draw complaints, IMHO.

> We can do this two ways
> - cycle thru temp tablespaces, as originally suggested (not by me...)
> - pick a random tablespace **other than ones already in active use**

Idea 2 fails as soon as you have more temp files than tablespaces, and
also requires tracking which tablespaces are currently in use, a bit of
complexity we do not have in there.

Perhaps a reasonable compromise could work like this: at the first point
in a transaction where a temp file is created, choose a random list
element, and thereafter advance cyclically for the duration of that
transaction. This ensures within-transaction spread-out while still
having some randomness across backends.

The reason I'm thinking per-transaction is that we could tie this to
setting up a cached list of tablespace OIDs, which would avoid the
overhead of repeat parsing and tablespace validity checking. We had
rejected using a long-lived cache because of the problem of tablespaces
getting dropped, but I think one that lasts only across a transaction
would be OK.

And the reason I'm thinking a cache is important is that if you really
want to get any win from this idea, you need to spread the temp files
across tablespaces *per file*, which is not the way it works now. As
committed, the code selects one temp tablespace per sort or hashjoin.
The submitted patch already did it that way for sorts, and I forced the
same for hashjoins, because I wanted to be sure to minimize the number
of executions of aforesaid parsing/checking. So really that patch is
entirely wrong, and selection of the tablespace for a temp file needs to
be pushed much further down.
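The compromise described above (a random starting element on the first temp file of a transaction, then cyclic advance, over a tablespace list cached only for that transaction) might be sketched in C roughly as follows. This is a hypothetical illustration, not PostgreSQL code: the struct and function names (TempTablespaceState, temp_ts_begin_xact, temp_ts_next, temp_ts_end_xact) are invented for the example, tablespace OIDs are stood in for by plain unsigned ints, and the C library's rand() stands in for whatever random source a backend would actually use.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical per-transaction state for picking temp tablespaces.
 * Plain unsigned ints stand in for tablespace OIDs.
 */
typedef struct TempTablespaceState
{
    const unsigned *oids;   /* cached, already-validated list; valid for this xact only */
    int         noids;      /* number of entries in oids */
    int         next;       /* index to hand out next; -1 = no temp file yet */
} TempTablespaceState;

/* At transaction start: install the list parsed and checked once. */
void
temp_ts_begin_xact(TempTablespaceState *st, const unsigned *oids, int noids)
{
    st->oids = oids;
    st->noids = noids;
    st->next = -1;
}

/* Called once per temp file, not once per sort or hashjoin. */
unsigned
temp_ts_next(TempTablespaceState *st)
{
    unsigned    chosen;

    assert(st->noids > 0);
    if (st->next < 0)
        st->next = rand() % st->noids;      /* random start: first temp file of the xact */
    chosen = st->oids[st->next];
    st->next = (st->next + 1) % st->noids;  /* then advance cyclically */
    return chosen;
}

/* At transaction end: drop the cache so a dropped tablespace can't linger. */
void
temp_ts_end_xact(TempTablespaceState *st)
{
    st->oids = NULL;
    st->noids = 0;
    st->next = -1;
}
```

Because the list is parsed and validated once at transaction start, each per-file selection reduces to an index increment, which is what makes pushing the choice down to the individual temp file affordable. The random start gives different backends different positions in the cycle, while consecutive files within one transaction land on different tablespaces until the list wraps around.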
Assuming, that is, that you think this point is important enough to
drive the whole design; which I find rather questionable in view of the
fact that the submitted patch contained no mention whatever of any such
consideration. Or is this just another way in which its documentation
was not up to snuff?

			regards, tom lane