Re: [HACKERS] sequential scan result order vs performance

2016-10-31 Thread Corey Huinker
On Sun, Oct 30, 2016 at 11:37 PM, Jim Nasby 
wrote:

> BTW, I've sometimes wished for a mode where queries would silently have
> result ordering intentionally futzed, to eliminate any possibility of
> dependence on tuple ordering (as well as having sequences start at some
> random value). I guess with the hooks that are in place today it wouldn't
> be hard to stick a ORDER BY random() in if there wasn't already a Sort node
> at the top level...


+1
In Oracle, we sorta had that feature by adding a parallel hint to a query
even if it didn't need it. It came in handy.


Re: [HACKERS] sequential scan result order vs performance

2016-10-31 Thread Dagfinn Ilmari Mannsåker
Jim Nasby  writes:

> BTW, I've sometimes wished for a mode where queries would silently have
> result ordering intentionally futzed, to eliminate any possibility of
> dependence on tuple ordering (as well as having sequences start at some
> random value).

FWIW, SQLite has this, in the form of 'PRAGMA reverse_unordered_selects'.

http://sqlite.org/pragma.html#pragma_reverse_unordered_selects

-- 
"The surreality of the universe tends towards a maximum" -- Skud's Law
"Never formulate a law or axiom that you're not prepared to live with
 the consequences of."  -- Skud's Meta-Law


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sequential scan result order vs performance

2016-10-31 Thread Albe Laurenz
Jim Nasby wrote:
> On 10/30/16 9:12 AM, Tom Lane wrote:
>> I think there will be a lot of howls.  People expect that creating
>> a table by inserting a bunch of rows, and then reading back those
>> rows, will not change the order.  We already futzed with that guarantee
>> a bit with syncscans, but that only affects quite large tables --- and
>> even there, we were forced to provide a way to turn it off.
> 
> Leaving a 30% performance improvement on the floor because some people
> don't grok how sets work seems insane to me.

+1

Yours,
Laurenz Albe

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sequential scan result order vs performance

2016-10-30 Thread Jim Nasby

On 10/30/16 9:12 AM, Tom Lane wrote:

I think there will be a lot of howls.  People expect that creating
a table by inserting a bunch of rows, and then reading back those
rows, will not change the order.  We already futzed with that guarantee
a bit with syncscans, but that only affects quite large tables --- and
even there, we were forced to provide a way to turn it off.


Leaving a 30% performance improvement on the floor because some people 
don't grok how sets work seems insane to me.


We could have a GUC to disable this. I suspect ORDER BY ctid would be 
another option.


BTW, I've sometimes wished for a mode where queries would silently have 
result ordering intentionally futzed, to eliminate any possibility of 
dependence on tuple ordering (as well as having sequences start at some 
random value). I guess with the hooks that are in place today it 
wouldn't be hard to stick a ORDER BY random() in if there wasn't already 
a Sort node at the top level...

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sequential scan result order vs performance

2016-10-30 Thread Tom Lane
Andres Freund  writes:
> It's quite easy to change iteration so we start with the latest item,
> and iterate till the first, rather than the other way round. In
> benchmarks with somewhat wide columns and aggregation, this yields
> speedups of over 30%, before hitting other bottlenecks.

> I do wonder however if it's acceptable to change the result order of
> sequential scans.

I think there will be a lot of howls.  People expect that creating
a table by inserting a bunch of rows, and then reading back those
rows, will not change the order.  We already futzed with that guarantee
a bit with syncscans, but that only affects quite large tables --- and
even there, we were forced to provide a way to turn it off.

If you were talking about 3X then maybe it would be worth it, but for 30%
(on a subset of queries) I am not excited.

I wonder whether we could instead adjust the rules for insertion so
that tuples tend to be physically in order by itemid.  I'm imagining
leaving two "holes" in a page and sometimes (hopefully not often)
having to shift data during insert to preserve the ordering.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sequential scan result order vs performance

2016-10-30 Thread Andres Freund
Hi,

On 2016-10-30 00:36:55 -0700, Andres Freund wrote:
> It's quite easy to change iteration so we start with the latest item,
> and iterate till the first, rather than the other way round. In
> benchmarks with somewhat wide columns and aggregation, this yields
> speedups of over 30%, before hitting other bottlenecks.

One more point: Over IM Robert commented that it's not guaranteed that
itemid order correlates precisely with reverse "tuple data" order. After
PageRepairFragmentation() that'll not be the case anymore. That's true,
but I suspect in many cases with larger analytics queries the
correlation will still be significant, and we also could make it
guaranteed with the price of making PageRepairFragmentation() a bit more
expensive.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers