On Thu, Nov 19, 2015 at 6:56 PM, Peter Geoghegan <p...@heroku.com> wrote:
> Yes, I really do mean it when I say that the DBA is not supposed to
> see this message, no matter how much or how little memory or data is
> involved. There is no nuance intended here; it isn't sensible to allow
> a multi-pass sort, just as it isn't sensible to allow checkpoints
> every 5 seconds. Both of those things can be thought of as thrashing.

Hm. So a bit of back-of-envelope calculation. If we want to buffer at
least 1MB for each run -- I think we currently do more, actually --
and say that 1GB of work_mem ought to be enough to run reasonably
(that's per sort, after all, and there might be multiple sorts, to say
nothing of other users on the system), then we can merge about 1,000
runs in the final merge. Each run will be about 2GB currently, or 1GB
if we quicksort the runs. So the largest table we can sort in a single
pass is 1-2 TB.
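
Spelling that out (a rough sketch of the same arithmetic; the
1MB-per-run buffer and the run sizes are the assumptions above, not
the exact tuplesort numbers):

    work_mem_mb = 1024          # assumed memory for a single sort
    per_run_buffer_mb = 1       # assumed minimum buffer per run in the final merge

    max_runs = work_mem_mb // per_run_buffer_mb       # ~1,000 runs mergeable at once

    run_gb_replacement = 2.0    # replacement selection: runs ~2x work_mem
    run_gb_quicksort = 1.0      # quicksorted runs: ~1x work_mem

    print(max_runs * run_gb_quicksort / 1024, "TB")    # ~1 TB in a single pass
    print(max_runs * run_gb_replacement / 1024, "TB")  # ~2 TB in a single pass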

If we go above those limits we have the choice of buffering less per
run or doing a whole second pass through the data. I suspect we would
get more horsepower out of buffering less, though I'm not sure where
the break-even point is. Certainly if every I/O became a random I/O,
that would cost much more than the factor of 2 an extra sequential
pass adds. We could probably do the math based on random_page_cost and
seq_page_cost to calculate the minimum amount of buffering before it's
worth doing an extra pass.
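
A very rough sketch of how that math could go (the cost model here is
my own simplification, not anything tuplesort actually does: charge one
random_page_cost per buffer refill plus seq_page_cost for the remaining
pages, versus an extra pass costing one additional sequential write and
read of everything):

    seq_page_cost = 1.0         # planner defaults, used only as rough ratios
    random_page_cost = 4.0

    # Per-page cost of a seek-heavy single merge pass with a B-page buffer
    # per run: one random read per refill, sequential reads for the rest:
    #   single_pass(B) = random_page_cost/B + (1 - 1/B) * seq_page_cost
    # Per-page cost if we do an extra pass instead: one more sequential
    # write and read of all the data, then a final merge with big buffers:
    #   extra_pass = 2 * seq_page_cost + seq_page_cost
    # Setting the two equal and solving for B gives the break-even buffer:
    break_even_pages = (random_page_cost - seq_page_cost) / (2 * seq_page_cost)
    print(break_even_pages, "pages =", break_even_pages * 8, "kB per run")
    # 1.5 pages, ~12 kB, assuming 8kB pages

By that crude model the break-even buffer is tiny, which mostly
reflects how optimistic the default random_page_cost is about seeks;
plugging in a real disk's seek-versus-transfer ratio would push it a
lot higher.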

So I think you're kind of right and kind of wrong. The vast majority
of use cases are either under 1TB or run in environments designed
specifically for data warehouse queries, where a user can obtain much
more memory for their queries. However, I think it's within Postgres's
intended use cases to be able to handle a few terabytes of data on a
moderately sized machine in a shared environment too.

Our current defaults are particularly bad for this, though. If you
initdb a new Postgres database today, create a table of even a few
gigabytes, and try to build an index on it, it takes forever. The last
time I did a test I canceled it after it had run for hours, raised
maintenance_work_mem, and built the index in a few minutes. The problem
is that if we just raise those limits then people will use more
resources when they don't need to. If it were safer to have those
limits much higher then we could make the defaults reflect what
people want when they do bigger jobs rather than just what they want
for normal queries or indexes.

> I think that if the DBA ever sees the multipass_warning message, he or she 
> does not have an OLTP workload.

Hm, that's pretty convincing. I guess this isn't the usual sort of
warning due to the time it would take to trigger.

-- 
greg

