Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-12-08 Thread Robert Haas
On Wed, Dec 2, 2015 at 8:42 PM, Bruce Momjian  wrote:
> No one mentioned the random_page_cost docs, so I will quote them here:
>
> 
> http://www.postgresql.org/docs/9.5/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS
>
> Random access to mechanical disk storage is normally much more expensive
> than four times sequential access. However, a lower default is used
> (4.0) because the majority of random accesses to disk, such as indexed
> reads, are assumed to be in cache. The default value can be thought of
> as modeling random access as 40 times slower than sequential, while
> expecting 90% of random reads to be cached.
>
> If you believe a 90% cache rate is an incorrect assumption for your
> workload, you can increase random_page_cost to better reflect the true
> cost of random storage reads. Correspondingly, if your data is likely
> to be completely in cache, such as when the database is smaller than the
> total server memory, decreasing random_page_cost can be appropriate.
> Storage that has a low random read cost relative to sequential, e.g.
> solid-state drives, might also be better modeled with a lower value for
> random_page_cost.
>
> What we don't have is a way to know how much is in the cache, not only
> at planning time but also at execution time.  (Those times are often far
> apart for prepared queries.)  I think that is the crux of what has to be
> addressed here.

I think that paragraph is more of an apology for the system that we've
got than a description of what a good one would look like.  If I have
a 1MB table and a 1TB table, they are not equally likely to be cached.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-12-02 Thread Bruce Momjian
On Mon, Nov 30, 2015 at 04:29:43PM +0900, KAWAMICHI Ryoji wrote:
> 
> 
> Robert Haas  wrote:
> > 
> > - If we're sequential scanning a small table, let's say less than 1/4
> > of shared_buffers, which is the point where synchronized scans kick
> > in, then assume the data is coming from shared_buffers.
> > - If we're scanning a medium-sized table, let's say less than
> > effective_cache_size, then assume the data is coming from the OS
> > cache.  Maybe this is the same cost as the previous case, or maybe
> > it's slightly more.
> > - Otherwise, assume that the first effective_cache_size pages are
> > coming from cache and the rest has to be read from disk.  This is
> > perhaps unrealistic, but we don't want the cost curve to be
> > discontinuous.
> 
> I think this improvement is quite reasonable, and I expect it will be
> merged into the current optimizer code.
> 
> 
> > A problem with this sort of thing, of course, is that it's really hard
> > to test a proposed change broadly enough to be certain how it will
> > play out in the real world.
> 
> That’s the problem we’re really interested in and trying to tackle.
> 
> For example, after extensive experiments I’m quite sure my modification of
> the cost model is effective for our environment, but I can’t tell whether
> it is also beneficial, or unfortunately harmful, in more general
> environments.
> 
> And I think that, in the Postgres community, there must be some (perhaps
> buried) knowledge about how to judge the effectiveness of cost model
> modifications, because someone must have considered something like that
> for each such commit.  I’m interested in that knowledge, and I would like
> to contribute to finding a better way to improve the optimizer through
> cost model refinement.

No one mentioned the random_page_cost docs, so I will quote them here:


http://www.postgresql.org/docs/9.5/static/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS

Random access to mechanical disk storage is normally much more expensive
than four times sequential access. However, a lower default is used
(4.0) because the majority of random accesses to disk, such as indexed
reads, are assumed to be in cache. The default value can be thought of
as modeling random access as 40 times slower than sequential, while
expecting 90% of random reads to be cached.

If you believe a 90% cache rate is an incorrect assumption for your
workload, you can increase random_page_cost to better reflect the true
cost of random storage reads. Correspondingly, if your data is likely to
be completely in cache, such as when the database is smaller than the
total server memory, decreasing random_page_cost can be appropriate.
Storage that has a low random read cost relative to sequential, e.g.
solid-state drives, might also be better modeled with a lower value for
random_page_cost.
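
(To spell out the arithmetic behind that description: if a cached random
read is treated as essentially free and an uncached one costs about 40
sequential-page units, then with a 90% cache hit rate the expected cost
per random read is roughly 0.1 * 40 = 4, which is where the default of
4.0 comes from.)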

What we don't have is a way to know how much is in the cache, not only at
planning time but also at execution time.  (Those times are often far
apart for prepared queries.)  I think that is the crux of what has to be
addressed here.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-29 Thread KAWAMICHI Ryoji


Robert Haas  wrote:
> 
> - If we're sequential scanning a small table, let's say less than 1/4
> of shared_buffers, which is the point where synchronized scans kick
> in, then assume the data is coming from shared_buffers.
> - If we're scanning a medium-sized table, let's say less than
> effective_cache_size, then assume the data is coming from the OS
> cache.  Maybe this is the same cost as the previous case, or maybe
> it's slightly more.
> - Otherwise, assume that the first effective_cache_size pages are
> coming from cache and the rest has to be read from disk.  This is
> perhaps unrealistic, but we don't want the cost curve to be
> discontinuous.

I think this improvement is quite reasonable, and I expect it will be
merged into the current optimizer code.


> A problem with this sort of thing, of course, is that it's really hard
> to test a proposed change broadly enough to be certain how it will
> play out in the real world.

That’s the problem we’re really interested in and trying to tackle.

For example, after extensive experiments I’m quite sure my modification of
the cost model is effective for our environment, but I can’t tell whether
it is also beneficial, or unfortunately harmful, in more general
environments.

And I think that, in the Postgres community, there must be some (perhaps
buried) knowledge about how to judge the effectiveness of cost model
modifications, because someone must have considered something like that
for each such commit.  I’m interested in that knowledge, and I would like
to contribute to finding a better way to improve the optimizer through
cost model refinement.

Thanks.
Ryoji.




Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-17 Thread Robert Haas
On Mon, Nov 16, 2015 at 6:50 PM, Jeff Janes  wrote:
> On Mon, Nov 9, 2015 at 6:37 AM, Tom Lane  wrote:
>> kawami...@tkl.iis.u-tokyo.ac.jp writes:
>>>   - cost parameter calibration: random_page_cost = 92.89
>>
>> TBH, you lost me there already.  I know of no hardware on which that would
>> be a sane depiction of reality, so I think you've probably overfitted the
>> model to some particular case it was already inaccurate on.
>
> I can easily get a random-to-sequential ratio of 50, and my RAID is
> nothing special.  I don't see why a high-end RAID couldn't justify 100
> or more, as long as you know the cache-hit rate is very low.  (The
> default value of 4 seems to bake in the notion that 90% of even random
> IO is going to be satisfied from the cache, which might be a good
> estimate if you have frequently used smallish lookup tables that always
> get joined to the RAM-busters, but some people aren't going to have that
> type of database query as their main load.)

I agree.  What I've been thinking about is:

- If we're sequential scanning a small table, let's say less than 1/4
of shared_buffers, which is the point where synchronized scans kick
in, then assume the data is coming from shared_buffers.
- If we're scanning a medium-sized table, let's say less than
effective_cache_size, then assume the data is coming from the OS
cache.  Maybe this is the same cost as the previous case, or maybe
it's slightly more.
- Otherwise, assume that the first effective_cache_size pages are
coming from cache and the rest has to be read from disk.  This is
perhaps unrealistic, but we don't want the cost curve to be
discontinuous.

The effect of this would be that sequential scanning a small table
would look cheaper per page than sequential scanning a large table,
which is a good idea, because it's likely to be true.  Similar
adaptations could be made for index scans, index-only scans, bitmap
index scans, and bitmap heap scans.  The default value of
random_page_cost would go up, but there would be a new
cached_page_cost GUC that would apply in some cases where
seq_page_cost and/or random_page_cost apply today.
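
To sketch that out (purely illustrative, not a patch; the function and
parameter names here are invented for the example, and cached_page_cost
stands in for the hypothetical new GUC mentioned above):

    /*
     * Pick an estimated per-page cost for a sequential scan based on how
     * much of the relation we expect to find already cached.  Purely
     * illustrative; all names are made up for this sketch.
     */
    static double
    estimated_seq_page_cost(double rel_pages,
                            double shared_buffers_pages,
                            double effective_cache_size_pages,
                            double cached_page_cost,
                            double seq_page_cost)
    {
        if (rel_pages <= shared_buffers_pages / 4.0)
            return cached_page_cost;    /* small: assume shared_buffers */

        if (rel_pages <= effective_cache_size_pages)
            return cached_page_cost;    /* medium: assume OS cache; perhaps
                                         * charged slightly more than above */

        /*
         * Large: assume the first effective_cache_size pages are cached and
         * the rest must be read from disk, so the cost curve stays
         * continuous.
         */
        {
            double cached_fraction = effective_cache_size_pages / rel_pages;

            return cached_fraction * cached_page_cost +
                   (1.0 - cached_fraction) * seq_page_cost;
        }
    }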

Previous attempts to improve the modeling of cache effects have
faltered because we don't know what will be cached at execution time,
and even if we did it can change very quickly and we don't want plan
instability.  But it seems to me that guessing based on the size of
the relation is pretty reasonable - essentially we'd be trying to
refine a model which says that every page is equally likely to be
cached, and that shouldn't be too high a bar to clear.

A problem with this sort of thing, of course, is that it's really hard
to test a proposed change broadly enough to be certain how it will
play out in the real world.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-16 Thread Jeff Janes
On Mon, Nov 9, 2015 at 6:37 AM, Tom Lane  wrote:
> kawami...@tkl.iis.u-tokyo.ac.jp writes:
>>   - cost parameter calibration: random_page_cost = 92.89
>
> TBH, you lost me there already.  I know of no hardware on which that would
> be a sane depiction of reality, so I think you've probably overfitted the
> model to some particular case it was already inaccurate on.

I can easily get a random-to-sequential ratio of 50, and my RAID is
nothing special.  I don't see why a high-end RAID couldn't justify 100
or more, as long as you know the cache-hit rate is very low.  (The
default value of 4 seems to bake in the notion that 90% of even random
IO is going to be satisfied from the cache, which might be a good
estimate if you have frequently used smallish lookup tables that always
get joined to the RAM-busters, but some people aren't going to have that
type of database query as their main load.)

With the current code, one scan out of several repeated inner scans
(loop_count > 1) can get a higher estimated cost than the same scan run
free-standing on its own, and I don't see how that can ever make sense.

Right now a repeated scan can only benefit from assumed cache hits (driven
by effective_cache_size) via the Mackert and Lohman formula.
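
If I remember the comment above index_pages_fetched() in costsize.c
correctly, that approximation looks roughly like the following, where T is
the number of pages in the table, N the number of tuples, s the number of
scans, and b the number of buffer pages assumed available (effectively
effective_cache_size).  I'm quoting from memory, so treat the details as
approximate:

    PF = min(2TNs/(2T+Ns), T)              when T <= b
         2TNs/(2T+Ns)                      when T > b and Ns <= 2Tb/(2T-b)
         b + (Ns - 2Tb/(2T-b))*(T-b)/T     when T > b and Ns > 2Tb/(2T-b)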

I think that, at least, it should get to claim the greater of either
the Mackert and Lohman benefit between inner scans, or the benefit of
converting some random IO to sequential within each separate inner
scan.

And really, I don't see why it should not get both benefits.  If the
pages are still in cache between inner scans, that's great.  But if, the
one time they do need to be read in from disk, they are read in mostly
sequentially, why is that benefit not also fully justified?  I don't see
where the worry about "double-counting the cache effects" comes from.
The effective_cache_size and IO read-ahead can both apply, and both give
benefits simultaneously and cumulatively.

Cheers,

Jeff




Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-12 Thread Jeff Janes
On Mon, Nov 9, 2015 at 2:42 AM, Simon Riggs  wrote:
> On 9 November 2015 at 10:08,  wrote:
>>
>>
>> We guessed the cause of this error would be in the cost model of Postgres,
>> investigated the optimizer source code, and found the cause of the problem.
>> It is in the index cost estimation: when scanning the inner table, if the
>> loop count is greater than 1, its I/O cost is counted as random access. In
>> the case of Query2, within one loop (i.e. one inner table scan), much of
>> the data is read sequentially via the clustered index, so it seems wrong
>> that the optimizer treats this I/O workload as random access.
>
>
> We don't have a clustered index in Postgres. We do store correlation stats
> for the index, which are used in some places to reduce cost.
>
> Could you look some more at this and see if a model change that uses the
> correlation can improve this?

That is already happening.  min_IO_cost is set on the assumption of
perfect correlation, max_IO_cost is set on the assumption of no
correlation, and then later in the code it interpolates between these
two based on the observed correlation:

run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

But the min_IO_cost is very pessimistic in this particular case.  So I
think that their proposed change is already exactly what you are
asking them to try.
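
To make that concrete, here is a condensed paraphrase of the relevant part
of cost_index() in costsize.c as of 9.4.  It is simplified from memory
(the index-only-scan adjustments are omitted), so treat it as a sketch
rather than the exact code:

    if (loop_count > 1)
    {
        /* Uncorrelated case: Mackert-Lohman over all inner scans,
         * then pro-rated back to one scan. */
        pages_fetched = index_pages_fetched(tuples_fetched * loop_count,
                                            baserel->pages,
                                            (double) index->pages, root);
        max_IO_cost = (pages_fetched * spc_random_page_cost) / loop_count;

        /* Perfectly correlated case: fewer pages per scan, but every
         * fetch is still charged at random_page_cost. */
        pages_fetched = ceil(indexSelectivity * (double) baserel->pages);
        pages_fetched = index_pages_fetched(pages_fetched * loop_count,
                                            baserel->pages,
                                            (double) index->pages, root);
        min_IO_cost = (pages_fetched * spc_random_page_cost) / loop_count;
    }
    else
    {
        max_IO_cost = pages_fetched * spc_random_page_cost;
        /* One random fetch to get started, sequential reads after that. */
        min_IO_cost = spc_random_page_cost +
                      (pages_fetched - 1) * spc_seq_page_cost;
    }

    run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

Note that in the loop_count > 1 branch both estimates are charged at
spc_random_page_cost; the random-plus-sequential treatment only exists in
the single-scan branch, which is exactly the pessimism at issue here.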

Cheers,

Jeff




Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-12 Thread KAWAMICHI Ryoji

 wrote:
>>
>> We guessed the cause of this error would be in the cost model of Postgres,
>> investigated the optimizer source code, and found the cause of the problem.
>> It is in the index cost estimation: when scanning the inner table, if the
>> loop count is greater than 1, its I/O cost is counted as random access. In
>> the case of Query2, within one loop (i.e. one inner table scan), much of
>> the data is read sequentially via the clustered index, so it seems wrong
>> that the optimizer treats this I/O workload as random access.
>>
> 
> We don't have a clustered index in Postgres. We do store correlation stats
> for the index, which are used in some places to reduce cost.

Yes, Postgres does not have a clustered index, as you pointed out. By the
term “clustered index” I meant an index whose correlation is 1.0. In this
case the index is the primary key (records are physically ordered by it),
and the index was created just after the whole data set was loaded. We’ve
been assuming an OLAP workload for our experiments, so I think correlation
= 1.0 is the basic case for us.

 
> Could you look some more at this and see if a model change that uses the
> correlation can improve this?

I’m not sure I understand the question, so let me clarify. Did you mean that
if I read the optimizer code more closely, I would find that the correlation
is already used to improve the cost estimation?

Of course I have read that code, and I know the correlation is used to
determine a value between the min cost and the max cost. (The min cost is
the best-case cost, i.e. correlation is 1.0, and the max cost is the
worst-case cost, i.e. correlation is 0.)

But in both cases, the I/O cost is counted as random access when scanning
the inner table. I think the I/O cost should be counted as sequential
access when the correlation is 1.0, so I tried to modify the code as in my
previous mail (see the sketch below for the general idea). That
modification is just one example of a solution, though. I’m not very
familiar with the optimizer code yet, so I’m wondering whether this is the
right way or not.
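
For example, something along these lines for the loop_count > 1 branch,
charging one random fetch per scan and sequential reads for the rest
(illustrative only, not exactly the patch from my previous mail):

    /* Perfectly correlated case for repeated inner scans: charge mostly
     * sequential I/O instead of all-random.  Illustrative sketch only. */
    pages_fetched = ceil(indexSelectivity * (double) baserel->pages);
    pages_fetched = index_pages_fetched(pages_fetched * loop_count,
                                        baserel->pages,
                                        (double) index->pages, root);
    min_IO_cost = (spc_random_page_cost +
                   (pages_fetched - 1) * spc_seq_page_cost) / loop_count;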

Thank you for your comment.
Ryoji




Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-12 Thread KAWAMICHI Ryoji


 wrote:
> 
> More knowledgeable people are sure to reply in more detail!
> 
> However, they would probably appreciate it if you can run with 9.4.5
> (the latest released version).  Running it with the beta of 9.5 would be
> a bonus!
> 
> Note that I don't know enough to say for sure that later versions would
> make any difference in this case, but at least using later versions
> would kill lots of Red Herrings!  :-)
> 

The difference between minor versions does not matter in this case. I
confirmed that the cost calculation logic involved in this problem was not
changed between Postgres 9.4.1 and 9.4.5.

I heard there are some performance improvements in 9.5 that might change
something regarding this problem, so I will try these queries on 9.5!

Thanks
Ryoji




Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-12 Thread KAWAMICHI Ryoji

 wrote:
>>
>>   - cost parameter calibration: random_page_cost = 92.89
>>
> 
> This demands some explanation, and raises the question of the value of the
> seq_page_cost parameter -- I don't see anything about it in your mail.

seq_page_cost was set to 1.0 (the default), and I explained the reasoning
behind the random_page_cost value in my reply to Tom; please see that
message.

Thanks
Ryoji




Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-09 Thread Tom Lane
kawami...@tkl.iis.u-tokyo.ac.jp writes:
>   - cost parameter calibration: random_page_cost = 92.89

TBH, you lost me there already.  I know of no hardware on which that would
be a sane depiction of reality, so I think you've probably overfitted the
model to some particular case it was already inaccurate on.  Any results
you're getting using this setting will likely fall into the category of
"garbage in, garbage out".

What led you to choose that number?

regards, tom lane




Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-09 Thread Simon Riggs
On 9 November 2015 at 10:08,  wrote:

>
> We guessed the cause of this error would be in the cost model of Postgres,
> investigated the optimizer source code, and found the cause of the problem.
> It is in the index cost estimation: when scanning the inner table, if the
> loop count is greater than 1, its I/O cost is counted as random access. In
> the case of Query2, within one loop (i.e. one inner table scan), much of
> the data is read sequentially via the clustered index, so it seems wrong
> that the optimizer treats this I/O workload as random access.
>

We don't have a clustered index in Postgres. We do store correlation stats
for the index, which are used in some places to reduce cost.

Could you look some more at this and see if a model change that uses the
correlation can improve this?

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-09 Thread Gavin Flower

On 09/11/15 23:08, kawami...@tkl.iis.u-tokyo.ac.jp wrote:

Hi guys,

I’ve been using Postgres for research at a university,

Great!

[...]

・Postgres 9.4.1

[..]

More knowledgeable people are sure to reply in more detail!

However, they would probably appreciate it if you can run with 9.4.5 
(the latest released version).  Running it with the beta of 9.5 would be 
a bonus!


Note that I don't know enough to say for sure that later versions would
make any difference in this case, but at least using later versions
would kill lots of Red Herrings!  :-)



Cheers,
Gavin




Re: [HACKERS] Erroneous cost estimation for nested loop join

2015-11-09 Thread Shulgin, Oleksandr
On Mon, Nov 9, 2015 at 11:08 AM,  wrote:

>
>   - cost parameter calibration: random_page_cost = 92.89
>

This demands some explanation, and raises the question of the value of the
seq_page_cost parameter -- I don't see anything about it in your mail.

--
Alex