Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Peter Geoghegan
On Mon, Jun 19, 2017 at 4:51 PM, Peter Geoghegan  wrote:
> This would make only the first lookup for each distinct value on the
> outer side actually do an index scan on the inner side. I can imagine
> the optimization saving certain queries from consuming a lot of memory
> bandwidth, as well as saving them from pinning and locking the same
> buffers repeatedly.

Apparently this is sometimes called block nested loop join, and MySQL
has had it for a while now:

https://en.wikipedia.org/wiki/Block_nested_loop

It doesn't necessarily require that the outer side input be sorted,
because you might end up using a hash table, etc.

-- 
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Peter Geoghegan
On Mon, Jun 19, 2017 at 4:35 PM, Andres Freund  wrote:
>> I think that this is the way index scan prefetch is normally
>> implemented. Index scans will on average have a much more random
>> access pattern than what is typical for bitmap heap scans, making this
>> optimization more compelling, so hopefully someone will get around to
>> this.
>
> I think for index based merge and nestloop joins, it'd be hugely
> beneficial to do prefetching on the index, but more importantly on the
> heap level.  Not entirely trivial to do however.

Speaking of nestloop join, and on a similar note, we could do some
caching on the inner side of a nestloop join.

We already track if the outer side access path of a nestloop join
preserves sort order within the optimizer. It might not be that hard
to teach the optimizer to generate a plan where, when we know that
this has happened, and we know that the outer side is not unique, the
final plan hints to the executor to opportunistically cache every
lookup on the inner side.

This would make only the first lookup for each distinct value on the
outer side actually do an index scan on the inner side. I can imagine
the optimization saving certain queries from consuming a lot of memory
bandwidth, as well as saving them from pinning and locking the same
buffers repeatedly.

-- 
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Andres Freund
On 2017-06-19 15:21:20 -0700, Peter Geoghegan wrote:
> On Mon, Jun 19, 2017 at 8:36 AM, Jeff Janes  wrote:
> > Unfortunately, it is only implemented in very narrow circumstances.  You
> > have to be doing bitmap index scans of many widely scattered rows to make it
> > useful.  I don't think that this is all that common of a situation.  The
> > problem is that at every point in the scan, it has to be possible to know
> > what data block it is going to want N iterations in the future, so you can
> > inform the kernel to pre-fetch it.  That is only easy to know for bitmap
> > scans.
> 
> I think that you could prefetch in index scans by using the
> pointers/downlinks in the immediate parent page of the leaf page that
> the index scan currently pins. The sibling pointer in the leaf itself
> is no good for this, because there is only one block to prefetch
> available at a time.
> 
> I think that this is the way index scan prefetch is normally
> implemented. Index scans will on average have a much more random
> access pattern than what is typical for bitmap heap scans, making this
> optimization more compelling, so hopefully someone will get around to
> this.

I think for index based merge and nestloop joins, it'd be hugely
beneficial to do prefetching on the index, but more importantly on the
heap level.  Not entirely trivial to do however.

- Andres


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Peter Geoghegan
On Mon, Jun 19, 2017 at 3:25 PM, Alvaro Herrera
 wrote:
> Surely you could prefetch all the heap pages pointed to by index items
> in the current leaf index page ...

I'm sure that you could do that too. I'm not sure how valuable each
prefetching optimization is.

I can imagine prefetching heap pages mattering a lot less for a
primary key index, where there is a strong preexisting correlation
between physical and logical order, while also mattering a lot more
than what I describe in other cases. I suppose that you need both.

-- 
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Alvaro Herrera
Peter Geoghegan wrote:
> On Mon, Jun 19, 2017 at 8:36 AM, Jeff Janes  wrote:
> > Unfortunately, it is only implemented in very narrow circumstances.  You
> > have to be doing bitmap index scans of many widely scattered rows to make it
> > useful.  I don't think that this is all that common of a situation.  The
> > problem is that at every point in the scan, it has to be possible to know
> > what data block it is going to want N iterations in the future, so you can
> > inform the kernel to pre-fetch it.  That is only easy to know for bitmap
> > scans.
> 
> I think that you could prefetch in index scans by using the
> pointers/downlinks in the immediate parent page of the leaf page that
> the index scan currently pins. The sibling pointer in the leaf itself
> is no good for this, because there is only one block to prefetch
> available at a time.

Surely you could prefetch all the heap pages pointed to by index items
in the current leaf index page ...

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Peter Geoghegan
On Mon, Jun 19, 2017 at 8:36 AM, Jeff Janes  wrote:
> Unfortunately, it is only implemented in very narrow circumstances.  You
> have to be doing bitmap index scans of many widely scattered rows to make it
> useful.  I don't think that this is all that common of a situation.  The
> problem is that at every point in the scan, it has to be possible to know
> what data block it is going to want N iterations in the future, so you can
> inform the kernel to pre-fetch it.  That is only easy to know for bitmap
> scans.

I think that you could prefetch in index scans by using the
pointers/downlinks in the immediate parent page of the leaf page that
the index scan currently pins. The sibling pointer in the leaf itself
is no good for this, because there is only one block to prefetch
available at a time.

I think that this is the way index scan prefetch is normally
implemented. Index scans will on average have a much more random
access pattern than what is typical for bitmap heap scans, making this
optimization more compelling, so hopefully someone will get around to
this.

-- 
Peter Geoghegan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Bruce Momjian
On Mon, Jun 19, 2017 at 10:49:59AM -0500, Merlin Moncure wrote:
> On Mon, Jun 19, 2017 at 10:36 AM, Jeff Janes  wrote:
> > If you have a RAID, set it to the number of spindles in your RAID and forget
> > it. It is usually one of the less interesting knobs to play with.  (Unless
> > your usage pattern of the database is unusual and exact fits the above
> > pattern.)
> 
> Isn't that advice obsolete in a SSD world though?  I was able to show
> values up to 256 for a single device provided measurable gains for a
> single S3500.  It's true though that the class of queries that this
> would help is pretty narrow.

Our developer docs are much clearer:


https://www.postgresql.org/docs/10/static/runtime-config-resource.html#runtime-config-resource-disk

For magnetic drives, a good starting point for this setting is the
number of separate drives comprising a RAID 0 stripe or RAID 1 mirror
being used for the database. (For RAID 5 the parity drive should not be
counted.) However, if the database is often busy with multiple queries
issued in concurrent sessions, lower values may be sufficient to keep
the disk array busy. A value higher than needed to keep the disks busy
will only result in extra CPU overhead. SSDs and other memory-based
storage can often process many concurrent requests, so the best value
might be in the hundreds.

I didn't backpatch this change since the original docs were not
incorrect.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Jeff Janes
On Mon, Jun 19, 2017 at 8:49 AM, Merlin Moncure  wrote:

> On Mon, Jun 19, 2017 at 10:36 AM, Jeff Janes  wrote:
> > If you have a RAID, set it to the number of spindles in your RAID and
> forget
> > it. It is usually one of the less interesting knobs to play with.
> (Unless
> > your usage pattern of the database is unusual and exact fits the above
> > pattern.)
>
> Isn't that advice obsolete in a SSD world though?  I was able to show
> values up to 256 for a single device provided measurable gains for a
> single S3500.  It's true though that the class of queries that this
> would help is pretty narrow.


I don't think it is obsolete, you just have to be creative with how you
interpret 'spindle' :)

With a single laptop hard-drive, I could get improvements of about 2 fold
by setting it to very high numbers, like 50 or 80. By giving the hard drive
the option of dozens of different possible sectors to read next, it could
minimize head-seek.  But that is with just one query running at a time.
With multiple queries all running simultaneously all trying to take
advantage of this, performance gains quickly fell apart.  I would expect
the SSD situation to be similar to that, where the improvements are
measurable but also fragile, but I haven't tested it.

Cheers,

Jeff


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Merlin Moncure
On Mon, Jun 19, 2017 at 10:36 AM, Jeff Janes  wrote:
> If you have a RAID, set it to the number of spindles in your RAID and forget
> it. It is usually one of the less interesting knobs to play with.  (Unless
> your usage pattern of the database is unusual and exact fits the above
> pattern.)

Isn't that advice obsolete in a SSD world though?  I was able to show
values up to 256 for a single device provided measurable gains for a
single S3500.  It's true though that the class of queries that this
would help is pretty narrow.

merlin


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Jeff Janes
On Sun, Jun 18, 2017 at 7:09 PM, David G. Johnston <
david.g.johns...@gmail.com> wrote:

> On Sun, Jun 18, 2017 at 6:02 PM, Patrick B 
> wrote:
>
>> Hi guys.
>>
>> I just wanna understand the effective_io_concurrency value better.
>>
>> My current Master database server has 16 vCPUS and I use
>> ​​
>>  effective_io_concurrency = 0.
>>
>
> ​It seems as though the number of virtual CPUs little to no bearing on
> whether, or to what value, you should set this parameter.  Obviously with
> only one CPU parallelism wouldn't be possible (I'm assuming a single query
> does not make multiple parallel requests for data)
>

Ah, but it does.  That is exactly what this parameter is for.

Unfortunately, it is only implemented in very narrow circumstances.  You
have to be doing bitmap index scans of many widely scattered rows to make
it useful.  I don't think that this is all that common of a situation.  The
problem is that at every point in the scan, it has to be possible to know
what data block it is going to want N iterations in the future, so you can
inform the kernel to pre-fetch it.  That is only easy to know for bitmap
scans.

If you have a RAID, set it to the number of spindles in your RAID and
forget it. It is usually one of the less interesting knobs to play with.
 (Unless your usage pattern of the database is unusual and exact fits the
above pattern.)


Cheers,

Jeff


Re: [GENERAL] effective_io_concurrency increasing

2017-06-19 Thread Andreas Kretschmer



Am 19.06.2017 um 03:02 schrieb Patrick B:

Hi guys.

I just wanna understand the effective_io_concurrency value better.

My current Master database server has 16 vCPUS and I 
use effective_io_concurrency = 0.


What can be the benefits of increasing that number? Also, do you guys 
have any recommendations?



as far as i know, at the moment only bitmap-index-scans would benefit 
from higher values. You can try 16 or 32 as starting point.

(if you have a proper io-controller with cache)


(it has nothing to do with parallel execution of queries)


Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] effective_io_concurrency increasing

2017-06-18 Thread David G. Johnston
On Sun, Jun 18, 2017 at 6:02 PM, Patrick B  wrote:

> Hi guys.
>
> I just wanna understand the effective_io_concurrency value better.
>
> My current Master database server has 16 vCPUS and I use
> ​​
>  effective_io_concurrency = 0.
>

​It seems as though the number of virtual CPUs little to no bearing on
whether, or to what value, you should set this parameter.  Obviously with
only one CPU parallelism wouldn't be possible (I'm assuming a single query
does not make multiple parallel requests for data) but the value seems to
strictly describe a characteristic the I/O subsystem.  Whether you can
fully leverage a properly set large value is another matter.

​As general advice, even you are using a soon to be obsolete (or any
non-current really) version of PostgreSQL when you are learning about a new
concept checking the most recent docs can be helpful.  Generally only bugs
in the docs get back-patched but a number of doc contributions are not bug
related but helpful none-the-less.

https://www.postgresql.org/docs/devel/static/runtime-config-resource.html​

​In short, if you want any good advice you will need to figure out the
specifics of your I/O subsystem (non-volatile memory and any associated
hardware), and share that with the list.​ Lacking rules-of-thumb learning
how to test your system and measure changes would help get you to the end
goal.  Sadly not a skill I've really picked up as of yet.

​David J.​


Re: [GENERAL] effective_io_concurrency increasing

2017-06-18 Thread Melvin Davidson
*As per the docs:*



*1. This is dependent on whether or not you are using a RAID disk,2. "Some
experimentation may be needed to find the best value"*

*IOW, there is no general recommendation.*

On Sun, Jun 18, 2017 at 9:24 PM, Patrick B  wrote:

>
>
> 2017-06-19 13:19 GMT+12:00 Melvin Davidson :
>
>>
>>
>> On Sun, Jun 18, 2017 at 9:02 PM, Patrick B 
>> wrote:
>>
>>> Hi guys.
>>>
>>> I just wanna understand the effective_io_concurrency value better.
>>>
>>> My current Master database server has 16 vCPUS and I
>>> use effective_io_concurrency = 0.
>>>
>>> What can be the benefits of increasing that number? Also, do you guys
>>> have any recommendations?
>>>
>>> I'm using PG 9.2 and the official doc does not say much about which
>>> value you should use.
>>>
>>> If I put it to 1, does it mean I can have a query spread into 1
>>> processor?
>>>
>>> Thanks
>>> P
>>>
>>
>>
>>
>> *Perhaps you should read the
>> doc.https://www.postgresql.org/docs/9.2/static/runtime-config-resource.html
>> *
>> 18.4.6. Asynchronous Behavior *effective_io_concurrency (integer)*
>>
>
>
> I've done that! But I'm looking for some personal experiences and
> suggestions!!
>



-- 
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


Re: [GENERAL] effective_io_concurrency increasing

2017-06-18 Thread Patrick B
2017-06-19 13:19 GMT+12:00 Melvin Davidson :

>
>
> On Sun, Jun 18, 2017 at 9:02 PM, Patrick B 
> wrote:
>
>> Hi guys.
>>
>> I just wanna understand the effective_io_concurrency value better.
>>
>> My current Master database server has 16 vCPUS and I
>> use effective_io_concurrency = 0.
>>
>> What can be the benefits of increasing that number? Also, do you guys
>> have any recommendations?
>>
>> I'm using PG 9.2 and the official doc does not say much about which value
>> you should use.
>>
>> If I put it to 1, does it mean I can have a query spread into 1 processor?
>>
>> Thanks
>> P
>>
>
>
>
> *Perhaps you should read the
> doc.https://www.postgresql.org/docs/9.2/static/runtime-config-resource.html
> *
> 18.4.6. Asynchronous Behavior *effective_io_concurrency (integer)*
>


I've done that! But I'm looking for some personal experiences and
suggestions!!


Re: [GENERAL] effective_io_concurrency increasing

2017-06-18 Thread Melvin Davidson
On Sun, Jun 18, 2017 at 9:02 PM, Patrick B  wrote:

> Hi guys.
>
> I just wanna understand the effective_io_concurrency value better.
>
> My current Master database server has 16 vCPUS and I
> use effective_io_concurrency = 0.
>
> What can be the benefits of increasing that number? Also, do you guys have
> any recommendations?
>
> I'm using PG 9.2 and the official doc does not say much about which value
> you should use.
>
> If I put it to 1, does it mean I can have a query spread into 1 processor?
>
> Thanks
> P
>



*Perhaps you should read the
doc.https://www.postgresql.org/docs/9.2/static/runtime-config-resource.html
*
18.4.6. Asynchronous Behavior *effective_io_concurrency (integer)*
*Melvin Davidson*
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.


[GENERAL] effective_io_concurrency increasing

2017-06-18 Thread Patrick B
Hi guys.

I just wanna understand the effective_io_concurrency value better.

My current Master database server has 16 vCPUS and I
use effective_io_concurrency = 0.

What can be the benefits of increasing that number? Also, do you guys have
any recommendations?

I'm using PG 9.2 and the official doc does not say much about which value
you should use.

If I put it to 1, does it mean I can have a query spread into 1 processor?

Thanks
P