Re: [HACKERS] sequential scan result order vs performance
On Sun, Oct 30, 2016 at 11:37 PM, Jim Nasby wrote: > BTW, I've sometimes wished for a mode where queries would silently have > result ordering intentionally futzed, to eliminate any possibility of > dependence on tuple ordering (as well as having sequences start at some > random value). I guess with the hooks that are in place today it wouldn't > be hard to stick a ORDER BY random() in if there wasn't already a Sort node > at the top level... +1 In Oracle, we sorta had that feature by adding a parallel hint to a query even if it didn't need it. It came in handy.
Re: [HACKERS] sequential scan result order vs performance
Jim Nasby writes: > BTW, I've sometimes wished for a mode where queries would silently have > result ordering intentionally futzed, to eliminate any possibility of > dependence on tuple ordering (as well as having sequences start at some > random value). FWIW, SQLite has this, in the form of 'PRAGMA reverse_unordered_selects'. http://sqlite.org/pragma.html#pragma_reverse_unordered_selects -- "The surreality of the universe tends towards a maximum" -- Skud's Law "Never formulate a law or axiom that you're not prepared to live with the consequences of." -- Skud's Meta-Law -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sequential scan result order vs performance
Jim Nasby wrote: > On 10/30/16 9:12 AM, Tom Lane wrote: >> I think there will be a lot of howls. People expect that creating >> a table by inserting a bunch of rows, and then reading back those >> rows, will not change the order. We already futzed with that guarantee >> a bit with syncscans, but that only affects quite large tables --- and >> even there, we were forced to provide a way to turn it off. > > Leaving a 30% performance improvement on the floor because some people > don't grok how sets work seems insane to me. +1 Yours, Laurenz Albe -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sequential scan result order vs performance
On 10/30/16 9:12 AM, Tom Lane wrote: I think there will be a lot of howls. People expect that creating a table by inserting a bunch of rows, and then reading back those rows, will not change the order. We already futzed with that guarantee a bit with syncscans, but that only affects quite large tables --- and even there, we were forced to provide a way to turn it off. Leaving a 30% performance improvement on the floor because some people don't grok how sets work seems insane to me. We could have a GUC to disable this. I suspect ORDER BY ctid would be another option. BTW, I've sometimes wished for a mode where queries would silently have result ordering intentionally futzed, to eliminate any possibility of dependence on tuple ordering (as well as having sequences start at some random value). I guess with the hooks that are in place today it wouldn't be hard to stick a ORDER BY random() in if there wasn't already a Sort node at the top level... -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sequential scan result order vs performance
Andres Freund writes: > It's quite easy to change iteration so we start with the latest item, > and iterate till the first, rather than the other way round. In > benchmarks with somewhat wide columns and aggregation, this yields > speedups of over 30%, before hitting other bottlenecks. > I do wonder however if it's acceptable to change the result order of > sequential scans. I think there will be a lot of howls. People expect that creating a table by inserting a bunch of rows, and then reading back those rows, will not change the order. We already futzed with that guarantee a bit with syncscans, but that only affects quite large tables --- and even there, we were forced to provide a way to turn it off. If you were talking about 3X then maybe it would be worth it, but for 30% (on a subset of queries) I am not excited. I wonder whether we could instead adjust the rules for insertion so that tuples tend to be physically in order by itemid. I'm imagining leaving two "holes" in a page and sometimes (hopefully not often) having to shift data during insert to preserve the ordering. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sequential scan result order vs performance
Hi, On 2016-10-30 00:36:55 -0700, Andres Freund wrote: > It's quite easy to change iteration so we start with the latest item, > and iterate till the first, rather than the other way round. In > benchmarks with somewhat wide columns and aggregation, this yields > speedups of over 30%, before hitting other bottlenecks. One more point: Over IM Robert commented that it's not guaranteed that itemid order correlates precisely with reverse "tuple data" order. After PageRepairFragmentation() that'll not be the case anymore. That's true, but I suspect in many cases with larger analytics queries the correlation will still be significant, and we also could make it guaranteed with the price of making PageRepairFragmentation() a bit more expensive. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers