Re: [HACKERS] BRIN cost estimate

2017-04-06 Thread Alvaro Herrera
Tom Lane wrote:

> TBH, I think that code is in the noise.  It doesn't involve any disk
> access, or catalog access, or user-defined function calls.  I wouldn't
> bother trying to account for it.

I removed it in the pushed version.

> What you should be accounting for is the ensuing heap page accesses,
> but I assume that's done somewhere else.

It's supposed to be accounted for, yeah.

One thing we do not account for is the number of extra heap accesses we
do for unsummarized ranges (mostly when the heap has grown but the index
doesn't yet cover the new pages).

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-04-06 Thread Tom Lane
David Rowley  writes:
> + *indexTotalCost += 0.1 * cpu_operator_cost * estimatedRanges *
> + pagesPerRange;

> This is trying to cost up the following code in bringetbitmap()

> if (addrange)
> {
> BlockNumber pageno;

> for (pageno = heapBlk;
> pageno <= heapBlk + opaque->bo_pagesPerRange - 1;
> pageno++)
> {
> MemoryContextSwitchTo(oldcxt);
> tbm_add_page(tbm, pageno);
> totalpages++;
> MemoryContextSwitchTo(perRangeCxt);
> }

> I'm charging 0.1 * cpu_operator_cost for each loop iteration we expect
> to perform here.

TBH, I think that code is in the noise.  It doesn't involve any disk
access, or catalog access, or user-defined function calls.  I wouldn't
bother trying to account for it.

What you should be accounting for is the ensuing heap page accesses,
but I assume that's done somewhere else.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-04-06 Thread David Rowley
On 6 April 2017 at 20:01, Emre Hasegeli  wrote:
>> + /*
>> + * Charge a small amount per range tuple which we expect to match to.
>> This
>> + * is meant to reflect the costs of manipulating the bitmap. The BRIN
>> scan
>> + * will set a bit for each page in the range when we find a matching
>> + * range, so we must multiply the charge by the number of pages in the
>> + * range.
>> + */
>> + *indexTotalCost += 0.1 * cpu_operator_cost * estimatedRanges *
>> + pagesPerRange;
>
>
> Doesn't BRIN execute the operator only once per range?  I think we can just
> multiply cpu_operator_cost by estimatedRanges.

The above line is for the bitmap maintenance accounting, not the
scanning of the BRIN index. That's costed by:

+ *indexTotalCost += indexRanges * (cpu_index_tuple_cost + qual_op_cost);

So not the estimated matching ranges, but the total ranges, since
we'll scan all of them.
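To put numbers on that, assuming the defaults (cpu_index_tuple_cost =
0.005, cpu_operator_cost = 0.0025) and a single indexable qual, an index
with 79 ranges is charged 79 * (0.005 + 0.0025) = 0.5925 for the scan
itself.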

+ *indexTotalCost += 0.1 * cpu_operator_cost * estimatedRanges *
+ pagesPerRange;

This is trying to cost up the following code in bringetbitmap()

if (addrange)
{
BlockNumber pageno;

for (pageno = heapBlk;
pageno <= heapBlk + opaque->bo_pagesPerRange - 1;
pageno++)
{
MemoryContextSwitchTo(oldcxt);
tbm_add_page(tbm, pageno);
totalpages++;
MemoryContextSwitchTo(perRangeCxt);
}

I'm charging 0.1 * cpu_operator_cost for each loop iteration we expect
to perform here. There is no tbm_add_range(), so instead, it performs a
tbm_add_page() for each page in the range, which is why I multiply
that cost by pagesPerRange.
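For a rough sense of scale, assuming the default cpu_operator_cost of
0.0025: with 10 matching ranges at the default 128 pages_per_range, this
charge comes to 0.1 * 0.0025 * 10 * 128 = 0.32, which stays small next to
the page access costs charged elsewhere.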

>
>> I also noticed that you were doing:
>>
>> + if (get_index_stats_hook &&
>> + (*get_index_stats_hook) (root, index->indexoid, 1, &vardata))
>>
>> and
>>
>> + vardata.statsTuple = SearchSysCache3(STATRELATTINH,
>> + ObjectIdGetDatum(index->indexoid),
>> + Int16GetDatum(1),
>>
>> I wondered why you chose to hardcode that as 1, and I thought it was
>> surely a bug. But then looking deeper it seems to be copied from
>> btcostestimate(), which also seems to be a long-standing bug
>> introduced in a536ed53bca40cb0d199824e358a86fcfd5db7f2. I'll go write
>> a patch for that one now.
>
>
> Yes, I copy-pasted from btcostestimate().

Turns out it's not a bug in btcostestimate(). It only ever intends to get
the stats for the first column in the index, which is not what's needed in
the BRIN case.



-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-04-06 Thread Emre Hasegeli
> Good point. That's wrong, but I'm confused about why you kept the:
>
> + *indexTotalCost += selec * numTuples * cpu_index_tuple_cost;
>
> at all if that's the case. All the BRIN scan is going to do is build a
> bitmap of the matching ranges found.

My mind was not clear when I was working on it a year ago.

> I've ended up with:
>
> + /*
> + * Charge a small amount per range tuple which we expect to match to. This
> + * is meant to reflect the costs of manipulating the bitmap. The BRIN scan
> + * will set a bit for each page in the range when we find a matching
> + * range, so we must multiply the charge by the number of pages in the
> + * range.
> + */
> + *indexTotalCost += 0.1 * cpu_operator_cost * estimatedRanges *
> + pagesPerRange;


Doesn't BRIN execute the operator only once per range?  I think we can just
multiply cpu_operator_cost by estimatedRanges.

> I also noticed that you were doing:
>
> + if (get_index_stats_hook &&
> + (*get_index_stats_hook) (root, index->indexoid, 1, &vardata))
>
> and
>
> + vardata.statsTuple = SearchSysCache3(STATRELATTINH,
> + ObjectIdGetDatum(index->indexoid),
> + Int16GetDatum(1),
>
> I wondered why you chose to hardcode that as 1, and I thought it was
> surely a bug. But then looking deeper it seems to be copied from
> btcostestimate(), which also seems to be a long-standing bug
> introduced in a536ed53bca40cb0d199824e358a86fcfd5db7f2. I'll go write
> a patch for that one now.


Yes, I copy-pasted from btcostestimate().

> I do still have concerns about how the correlation value is used, and
> the fact that we use the highest value, but then the whole use of this
> value is far from perfect, and is just a rough indication of how
> things are.


I agree.  I tried to document how incomplete it is in the comments I wrote
to help future developers improve the situation.


Re: [HACKERS] BRIN cost estimate

2017-04-05 Thread David Rowley
On 5 April 2017 at 17:34, Emre Hasegeli  wrote:
>
>> Interested to hear comments on this.
>
>
> I don't have a chance to test it right now, but I am sure it would be an
> improvement over what we have now.  There is no single correct equation
> with so many unknowns as we have.
>
>>   *indexTotalCost += (numTuples * *indexSelectivity) *
>> (cpu_index_tuple_cost + qual_op_cost);
>
> Have you preserved this line intentionally?  This would make BRIN index scan
> cost even higher, but the primary property of BRIN is to be cheap to scan.
> Shouldn't the bitmap heap scan node account for the cost of checking the
> tuples?

Good point. That's wrong, but I'm confused about why you kept the:

+ *indexTotalCost += selec * numTuples * cpu_index_tuple_cost;

at all if that's the case. All the BRIN scan is going to do is build a
bitmap of the matching ranges found.

I've ended up with:

+ /*
+ * Charge a small amount per range tuple which we expect to match to. This
+ * is meant to reflect the costs of manipulating the bitmap. The BRIN scan
+ * will set a bit for each page in the range when we find a matching
+ * range, so we must multiply the charge by the number of pages in the
+ * range.
+ */
+ *indexTotalCost += 0.1 * cpu_operator_cost * estimatedRanges *
+ pagesPerRange;

This is inspired by cost_bitmap_tree_node(), and seems like a fair
cost for setting a single bit.

I also noticed that you were doing:

+ if (get_index_stats_hook &&
+ (*get_index_stats_hook) (root, index->indexoid, 1, &vardata))

and

+ vardata.statsTuple = SearchSysCache3(STATRELATTINH,
+ ObjectIdGetDatum(index->indexoid),
+ Int16GetDatum(1),

I wondered why you chose to hardcode that as 1, and I thought it was
surely a bug. But then looking deeper it seems to be copied from
btcostestimate(), which also seems to be a long-standing bug
introduced in a536ed53bca40cb0d199824e358a86fcfd5db7f2. I'll go write
a patch for that one now.

I've been looking at the estimates output by the following:

create table ab (a int, b int);
insert into ab select x/1000,x%1000 from generate_series(1,100)x;
create index ab_a_idx on ab using brin (a) with (pages_per_range = 128);
analyze ab;
explain analyze select * from ab where a = 1;
drop index ab_a_idx;
create index ab_a_idx on ab using brin (a) with (pages_per_range = 64);
analyze ab;
explain analyze select * from ab where a = 1;
drop index ab_a_idx;
create index ab_a_idx on ab using brin (a) with (pages_per_range = 32);
analyze ab;
explain analyze select * from ab where a = 1;
drop index ab_a_idx;
create index ab_a_idx on ab using brin (a) with (pages_per_range = 16);
analyze ab;
explain analyze select * from ab where a = 1;
drop index ab_a_idx;
create index ab_a_idx on ab using brin (a) with (pages_per_range = 8);
analyze ab;
explain analyze select * from ab where a = 1;
drop index ab_a_idx;
create index ab_a_idx on ab using brin (a) with (pages_per_range = 4);
analyze ab;
explain analyze select * from ab where a = 1;
drop index ab_a_idx;
create index ab_a_idx on ab using brin (a) with (pages_per_range = 2);
analyze ab;
explain analyze select * from ab where a = 1;
drop index ab_a_idx;
create index ab_a_idx on ab using brin (a) with (pages_per_range = 1);
analyze ab;
explain analyze select * from ab where a = 1;
drop index ab_a_idx;

and I see that the bitmap index scan gets more expensive the more
ranges there are in the index, which of course makes sense. The small
bitmap maintenance cost I added should stay about the same, since I
multiplied it by pagesPerRange; it may only drop slightly as it
expects to set fewer bits in the bitmap due to the ranges being
more specific to the tuples in the heap. The row estimates seem pretty
sane there too, based on how many were filtered out on the recheck.

I do still have concerns about how the correlation value is used, and
the fact that we use the highest value, but then the whole use of this
value is far from perfect, and is just a rough indication of how
things are.

Apart from that, I'm pretty happy with this now.



-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


brin-correlation-drowley_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-04-04 Thread Emre Hasegeli
> Interested to hear comments on this.


I don't have a chance to test it right now, but I am sure it would be an
improvement over what we have now.  There is no single correct equation
with so many unknowns as we have.

>   *indexTotalCost += (numTuples * *indexSelectivity) *
(cpu_index_tuple_cost + qual_op_cost);

Have you preserved this line intentionally?  This would make BRIN index
scan cost even higher, but the primary property of BRIN is to be cheap to
scan. Shouldn't the bitmap heap scan node account for the cost of checking
the tuples?


Re: [HACKERS] BRIN cost estimate

2017-04-04 Thread David Rowley
On 3 April 2017 at 03:05, Emre Hasegeli  wrote:
> Unfortunately, I am on vacation for two weeks without my computer.  I can
> post another version after the 18th.  I know we are under time pressure for
> release.  I wouldn't mind if you or Alvaro changed it any way you like.

I've made some changes. Actually, I completely changed how the
estimates work. I find this method more self-explanatory.

Basically, we work out the total index ranges, then work out how many
of those we'd touch in a perfectly correlated scenario. We then work
out how many ranges we'll probably visit based on the correlation
estimates from the stats, and assume the selectivity is probableRanges
/ totalIndexRanges.
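The calculation can be sketched in standalone C roughly as below (an
illustration of the description above, not the patch code; the function and
variable names are made up, and exactly how the correlation scales the
perfectly correlated estimate is an assumption of the sketch):

#include <math.h>
#include <stdio.h>

/* Standalone sketch of the estimate described above. */
static double
brin_selectivity_sketch(double rel_pages, double pages_per_range,
                        double qual_selectivity, double correlation)
{
    /* total number of ranges covered by the index */
    double  index_ranges = fmax(ceil(rel_pages / pages_per_range), 1.0);

    /* ranges we would touch if the heap were perfectly correlated */
    double  minimal_ranges = ceil(index_ranges * qual_selectivity);

    /* scale up by 1 / |correlation|, but never beyond all ranges */
    double  estimated_ranges = (fabs(correlation) < 1.0e-10) ?
        index_ranges :
        fmin(minimal_ranges / fabs(correlation), index_ranges);

    return estimated_ranges / index_ranges;   /* never exceeds 1.0 */
}

int
main(void)
{
    /* 10000-page heap, 128 pages per range, 1% qual selectivity */
    printf("corr=1.00 -> %.3f\n", brin_selectivity_sketch(10000, 128, 0.01, 1.00));
    printf("corr=0.10 -> %.3f\n", brin_selectivity_sketch(10000, 128, 0.01, 0.10));
    printf("corr=0.01 -> %.3f\n", brin_selectivity_sketch(10000, 128, 0.01, 0.01));
    return 0;
}

With those numbers the estimate goes from about 1.3% of the ranges at
perfect correlation up to the whole table once the correlation drops to
around 1%.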

I've attached a spreadsheet that compares Emre's method to mine. Mine
seems to favour the BRIN index less when the table is small. I think
this is pretty natural since if there is only 1 range, and we narrow
the result to one of them, then we might as well have performed a
seqscan.

My method seems to favour BRIN a bit longer when the correlation is
between about 1% and 100%, but penalises BRIN much more when
correlation is less than around 1%. This may be better: my way is
certainly smarter than the unpatched version, but still holds on a bit
longer, which may be more favourable if a BRIN index actually exists.
It might be more annoying for a user to have added a BRIN index and
never have the planner choose it.

My method also never suffers from estimating > 100% of the table.

I was a bit worried that Emre's method would penalise BRIN too much
when the correlation is not so high.

Interested to hear comments on this.

Please feel free to play with the spreadsheet by changing rows 1-3 in column B.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


brin-correlation-drowley_v1.patch
Description: Binary data


BRIN_estimates2.ods
Description: application/vnd.oasis.opendocument.spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-04-02 Thread Emre Hasegeli
- cond := format('%I %s %L', r.colname, r.oper, r.value);
+ cond := format('%s %s %L', r.colname, r.oper, r.value);

Why did you change this? It seems unrelated.


Must be a mistake.

+ indexRel = index_open(index->indexoid, AccessShareLock);
+ pagesPerRange = Min(BrinGetPagesPerRange(indexRel), baserel->pages);
+ Assert(baserel->pages > 0);
+ Assert(pagesPerRange > 0);
+ rangeProportion = (double) pagesPerRange / baserel->pages;
+ numRanges = 1.0 + (double) baserel->pages / pagesPerRange;
+ index_close(indexRel, AccessShareLock);

Why do you add 1.0 to numRanges? This gives one too many ranges.


I tried to prevent a division by zero that can happen later, but your
solution below sounds cleaner to me.

I also don't really like the way you're setting pagesPerRange. I'd suggest
keeping that as the raw value from the index metadata, and doing:

If you want the actual number of ranges then I think this is best expressed
as:

numRanges = Max(ceil(baserel->pages / pagesPerRange), 1.0);

The rangeProportion variable could be calculated based on that too:
rangeProportion = 1.0 / numRanges;

That way you don't have to rely on relpages being > 0. It seems like a bad
assumption to make. I see there's some hack in estimate_rel_size() making
that true, but I think it's best not to rely on that hack.

- qual_op_cost = cpu_operator_cost *
- (list_length(indexQuals) + list_length(indexOrderBys));
-
  *indexStartupCost += qual_arg_cost;
  *indexTotalCost += qual_arg_cost;
- *indexTotalCost += (numTuples * *indexSelectivity) *
(cpu_index_tuple_cost + qual_op_cost);
  *indexPages = index->pages;

- /* XXX what about pages_per_range? */
+ /*
+ * Add index qual op costs.  Unlike other indexes, we are not processing
+ * tuples but ranges.
+ */
+ qual_op_cost = cpu_operator_cost * list_length(indexQuals);
+ *indexTotalCost += numRanges * qual_op_cost;

What's the reason for removing the  + list_length(indexOrderBys) here? I
don't think it's the business of this patch to touch that. BRIN may one day
support that by partition sorting non-overlapping ranges, so that could
well be why it was there in the first place.


I thought it was a mistake.  I agree we'd better not change it, even though
I have no idea how the cost of BRIN sorting could be calculated.  I guess
the indexOrderBys list would always be empty for now, anyway.

- *indexTotalCost += (numTuples * *indexSelectivity) *
(cpu_index_tuple_cost + qual_op_cost);
+ *indexTotalCost += selec * numTuples * cpu_index_tuple_cost;

Why is this not being charged qual_op_cost anymore?


I must have thought that BRIN doesn't execute the qual, unlike btree.  I am
not sure what the best equation to put there is.  Previously, we were
returning good selectivity but high cost.  I believe things should be the
other way around.  Maybe what we really need is to use "numRanges" here
instead of "selec * numTuples".

+ * Our starting point is that BRIN selectivity has to be less than the
+ * selectivity of the btree.  We are using a product of logical and

Can you explain why this is the case?


My wording sounds wrong here.  I should have said "worse than" instead
of "less than".

+ * class "minmax", and makes a little sense for the other one "inclusion".

"and" I think should be "but"


I agree.

I think it would also be good to write some regression tests for this. All
the changes I see that you did make to brin.sql seem unrelated to this
patch.


I added an expression index to test getting correlation of expressions.
The BRIN tests would call the estimation functions, but we don't have any
tests to check the result of the functions.  We can maybe prepare a test to
check that BRIN is preferred over btree when it would perform better, and
maybe also vice versa.

Unfortunately, I am on vacation for two weeks without my computer.  I can
post another version after the 18th.  I know we are under time pressure for
release.  I wouldn't mind if you or Alvaro changed it any way you like.


Re: [HACKERS] BRIN cost estimate

2017-04-02 Thread David Rowley
On 27 March 2017 at 00:44, Emre Hasegeli  wrote:

> > If we want to have a variable which stores the number of ranges, then
> > I think numRanges is better than numBlocks. I can't imagine many
> > people would disagree there.
>
> I renamed it along with the other two.
>
> > At the very least please write a comment to explain this in the code.
> > Right now it looks broken. If I noticed this then one day in the
> > future someone else will. If you write a comment then person of the
> > future will likely read it, and then not raise any questions about the
> > otherwise questionable code.
>
> I added a sentence about it.
>
> > I do strongly agree that the estimates need improved here. I've
> > personally had issues with bad brin estimates before, and I'd like to
> > see it improved. I think the patch is not quite complete without it
> > also considering stats on expression indexes. If you have time to go
> > do that I'd suggest you go ahead with that.
>
> I copy-pasted expression index support from btcostestimate() as well,
> and extended the regression test.
>
> I think this function can use more polishing before committing, but I
> don't know where to begin.  There are single functions for every
> access method in here.  I don't like them being in there to begin
> with.  selfuncs.c is quite long with all kinds of dependencies and
> dependents.  I think it would be better to have the access method
> selectivity estimation functions under src/access.  Maybe we should
> start doing so by moving this to src/access/brin/brin_selfuncs.c.
> What do you think?
>

Looking over this again.

- cond := format('%I %s %L', r.colname, r.oper, r.value);
+ cond := format('%s %s %L', r.colname, r.oper, r.value);

Why did you change this? It seems unrelated.

+ indexRel = index_open(index->indexoid, AccessShareLock);
+ pagesPerRange = Min(BrinGetPagesPerRange(indexRel), baserel->pages);
+ Assert(baserel->pages > 0);
+ Assert(pagesPerRange > 0);
+ rangeProportion = (double) pagesPerRange / baserel->pages;
+ numRanges = 1.0 + (double) baserel->pages / pagesPerRange;
+ index_close(indexRel, AccessShareLock);

Why do you add 1.0 to numRanges? This gives one too many ranges.

I also don't really like the way you're setting pagesPerRange. I'd suggest
keeping that as the raw value from the index metadata, and doing:

If you want the actual number of ranges then I think this is best expressed
as:

numRanges = Max(ceil(baserel->pages / pagesPerRange), 1.0);

The rangeProportion variable could be calculated based on that too:
rangeProportion = 1.0 / numRanges;

That way you don't have to rely on relpages being > 0. It seems like a bad
assumption to make. I see there's some hack in estimate_rel_size() making
that true, but I think it's best not to rely on that hack.

- qual_op_cost = cpu_operator_cost *
- (list_length(indexQuals) + list_length(indexOrderBys));
-
  *indexStartupCost += qual_arg_cost;
  *indexTotalCost += qual_arg_cost;
- *indexTotalCost += (numTuples * *indexSelectivity) *
(cpu_index_tuple_cost + qual_op_cost);
  *indexPages = index->pages;

- /* XXX what about pages_per_range? */
+ /*
+ * Add index qual op costs.  Unlike other indexes, we are not processing
+ * tuples but ranges.
+ */
+ qual_op_cost = cpu_operator_cost * list_length(indexQuals);
+ *indexTotalCost += numRanges * qual_op_cost;

What's the reason for removing the  + list_length(indexOrderBys) here? I
don't think it's the business of this patch to touch that. BRIN may one day
support that by partition sorting non-overlapping ranges, so that could
well be why it was there in the first place.

- *indexTotalCost += (numTuples * *indexSelectivity) *
(cpu_index_tuple_cost + qual_op_cost);
+ *indexTotalCost += selec * numTuples * cpu_index_tuple_cost;

Why is this not being charged qual_op_cost anymore?

+ * Our starting point is that BRIN selectivity has to be less than the
+ * selectivity of the btree.  We are using a product of logical and

Can you explain why this is the case?

+ * class "minmax", and makes a little sense for the other one "inclusion".

"and" I think should be "but"

I think it would also be good to write some regression tests for this. All
the changes I see that you did make to brin.sql seem unrelated to this
patch.

I've also attached a spreadsheet that can be used to play around with the
estimates your patch is giving.


-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


BRIN_estimates2.ods
Description: application/vnd.oasis.opendocument.spreadsheet

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-03-31 Thread David Steele

On 3/26/17 7:44 AM, Emre Hasegeli wrote:

If we want to have a variable which stores the number of ranges, then
I think numRanges is better than numBlocks. I can't imagine many
people would disagree there.


I renamed it along with the other two.


At the very least please write a comment to explain this in the code.
Right now it looks broken. If I noticed this then one day in the
future someone else will. If you write a comment then person of the
future will likely read it, and then not raise any questions about the
otherwise questionable code.


I added a sentence about it.


I do strongly agree that the estimates need improved here. I've
personally had issues with bad brin estimates before, and I'd like to
see it improved. I think the patch is not quite complete without it
also considering stats on expression indexes. If you have time to go
do that I'd suggest you go ahead with that.


I copy-pasted expression index support from btcostestimate() as well,
and extended the regression test.

I think this function can use more polishing before committing, but I
don't know where to begin.  There are single functions for every
access method in here.  I don't like them being in there to begin
with.  selfuncs.c is quite long with all kinds of dependencies and
dependents.  I think it would be better to have the access method
selectivity estimation functions under src/access.  Maybe we should
start doing so by moving this to src/access/brin/brin_selfuncs.c.
What do you think?


I have marked this patch "Needs Review".

--
-David
da...@pgmasters.net


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-03-26 Thread Emre Hasegeli
> If we want to have a variable which stores the number of ranges, then
> I think numRanges is better than numBlocks. I can't imagine many
> people would disagree there.

I renamed it along with the other two.

> At the very least please write a comment to explain this in the code.
> Right now it looks broken. If I noticed this then one day in the
> future someone else will. If you write a comment then person of the
> future will likely read it, and then not raise any questions about the
> otherwise questionable code.

I added a sentence about it.

> I do strongly agree that the estimates need improved here. I've
> personally had issues with bad brin estimates before, and I'd like to
> see it improved. I think the patch is not quite complete without it
> also considering stats on expression indexes. If you have time to go
> do that I'd suggest you go ahead with that.

I copy-pasted expression index support from btcostestimate() as well,
and extended the regression test.

I think this function can use more polishing before committing, but I
don't know where to begin.  There are single functions for every
access method in here.  I don't like them being in there to begin
with.  selfuncs.c is quite long with all kinds of dependencies and
dependents.  I think it would be better to have the access method
selectivity estimation functions under src/access.  Maybe we should
start doing so by moving this to src/access/brin/brin_selfuncs.c.
What do you think?


brin-correlation-v5.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-03-21 Thread Emre Hasegeli
> Not sure what you mean here. I'm not speaking of the brin index am, I
> mean the get_index_stats_hook call which you've added.

I see.  Actually this part was from Alvaro.  I hadn't noticed the
get_index_stats_hook call before, but it is still the same coding as
btcostestimate().  btcostestimate() also calls get_index_stats_hook,
and then Asserts nnumbers == 1.

> hmm, before what exactly? before your patch it didn't exist. You
> introduced it into brincostestimate().

I was confused by looking at my changes in my repository, made on top of
Alvaro's.  I will rename it in the next version.

> At the very least please write a comment to explain this in the code.
> Right now it looks broken. If I noticed this then one day in the
> future someone else will. If you write a comment then person of the
> future will likely read it, and then not raise any questions about the
> otherwise questionable code.

Will do.

> I do strongly agree that the estimates need improved here. I've
> personally had issues with bad brin estimates before, and I'd like to
> see it improved. I think the patch is not quite complete without it
> also considering stats on expression indexes. If you have time to go
> do that I'd suggest you go ahead with that.

I will look into it this week.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-03-21 Thread David Rowley
On 17 March 2017 at 23:50, Emre Hasegeli  wrote:
>> 1.
>>
>> + Assert(nnumbers == 1);
>>
>> I think it's a bad idea to Assert() this. The stat tuple can come from
>> a plugin which could do anything. Seems like if we need to be certain
>> of that then it should be an elog(ERROR), maybe mention that we
>> expected a 1 element array, but got  elements.
>
> But it is not coming from a plugin which can do anything.  It is the
> plugin we know.  Assert() is the exact same coding as btcostestimate().

Not sure what you mean here. I'm not speaking of the brin index am, I
mean the get_index_stats_hook call which you've added. An extension
can be loaded which sets this hook and provides its own tuple, which
may cause the Assert to fail. Asserts are normally used in assert-enabled
builds to verify things that, if untrue, would cause some following code to
crash or not do what it's meant to. Normally they're used to help track
down bugs in the software; they're not meant to track bugs in some software
we have no control over.

>> 2.
>>
>> + Assert(varCorrelation >= 0 && varCorrelation <= 1);
>>
>> same goes for that. I don't think we want to Assert() that either.
>> elog(ERROR) seems better, or perhaps we should fall back on the old
>> estimation method in this case?
>
> Again the correlation value is expected to be in this range.  I don't
> think we even need to bother with Assert().  Garbage in garbage out.

That's fine. Let's just not crash on garbage.

>> The Asserted range also seems to contradict the range mentioned in
>> pg_statistic.h:
>
> We are using Abs() of the value.

I missed that.

>> 3.
>>
>> + numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);
>>
>> should numBlocks be named numRanges? After all we call the option
>> "pages_per_range".
>
> It was named this way before.

hmm, before what exactly? before your patch it didn't exist. You
introduced it into brincostestimate().

Here's the line of code:

+ double numBlocks;

If we want to have a variable which stores the number of ranges, then
I think numRanges is better than numBlocks. I can't imagine many
people would disagree there.

>> 6.
>>
>> Might want to consider not applying this estimate when the statistics
>> don't contain any STATISTIC_KIND_CORRELATION array.
>
> I am not sure what would be better than using the value as 0 in this case.

At the very least please write a comment to explain this in the code.
Right now it looks broken. If I noticed this then one day in the
future someone else will. If you write a comment then person of the
future will likely read it, and then not raise any questions about the
otherwise questionable code.

> I can look into supporting expression indexes, if you think something
> very much incomplete like this has a chance to get committed.

I do strongly agree that the estimates need improved here. I've
personally had issues with bad brin estimates before, and I'd like to
see it improved. I think the patch is not quite complete without it
also considering stats on expression indexes. If you have time to go
do that I'd suggest you go ahead with that.

I've not looked at the new patch yet, but thanks for making some of
those changes.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-03-17 Thread Emre Hasegeli
> 1.
>
> + Assert(nnumbers == 1);
>
> I think it's a bad idea to Assert() this. The stat tuple can come from
> a plugin which could do anything. Seems like if we need to be certain
> of that then it should be an elog(ERROR), maybe mention that we
> expected a 1 element array, but got  elements.

But it is not coming from a plugin which can do anything.  It is the
plugin we know.  Assert() is the exact same coding as btcostestimate().

> 2.
>
> + Assert(varCorrelation >= 0 && varCorrelation <= 1);
>
> same goes for that. I don't think we want to Assert() that either.
> elog(ERROR) seems better, or perhaps we should fall back on the old
> estimation method in this case?

Again the correlation value is expected to be in this range.  I don't
think we even need to bother with Assert().  Garbage in garbage out.

> The Asserted range also seems to contradict the range mentioned in
> pg_statistic.h:

We are using Abs() of the value.

> 3.
>
> + numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);
>
> should numBlocks be named numRanges? After all we call the option
> "pages_per_range".

It was named this way before.

> 4.
>
> + * correlated table is copied 4 times, the correlation would be 0.25,
> + * although the index would be almost as good as the version on the
>
> What's meant by "table is copied 4 times" ?
>
> As best as I can work out, it means.
>
> INSERT INTO t SELECT * FROM t;
> INSERT INTO t SELECT * FROM t;
>
> but I'm uncertain, and it's not very clear to me what it means.

This was exactly what I meant.  I included the INSERT INTO statement
to make it more clear.

> 5.
>
> + blockSelectivity = (blockProportion + 1 - *indexCorrelation) / 2;
>
> I missed the comment that explains why we divide by two here.

To get the arithmetic mean.  The paragraph explaining this had a word
missing.  I fixed it.
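For example, with a blockProportion of 0.01 and a correlation of 0.9, that
gives (0.01 + 1 - 0.9) / 2 = 0.055, the mean of the block proportion and
the contrary of the correlation.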

> 6.
>
> Might want to consider not applying this estimate when the statistics
> don't contain any STATISTIC_KIND_CORRELATION array.

I am not sure what would be better than using the value as 0 in this case.

> 7.
>
> + blockProportion = (double) BrinGetPagesPerRange(indexRel) /
> + baserel->pages;
>
> Perhaps the blockProportion should also be clamped to the number of
> pages in the relation. Even a 1 page relation is estimated to have 128
> pages with the default pages_per_range, by the method used in the
> patch.

I had failed to consider the case when there are fewer pages than
pages_per_range.  The new version accounts for this.

> Good news is, I'm seeing much better plans coming out in cases when
> the index is unlikely to help. So +1 for fixing this issue.

It is also useful when the same columns are indexed with different access
methods.  If we allow HOT updates with BRIN indexes, people can freely
spread BRIN indexes around.  I have also noticed while testing again
that the patch positively affects the selection of parallel bitmap index
scans.

The most doubtful part of the patch is the "(1 + logical_selectivity)
* (1 + physical_selectivity) - 1" equation I made up.
logical_selectivity is the value btree would return.  I am sure BRIN
should always return something greater.  physical_selectivity is the
value calculated to fix the logical_selectivity using correlation and
pages_per_range.  I had no idea how to combine them together, and
picked the current one after random tests with different datasets.  We
can try to make an empirical formula using the actual values, but it
would be too tightly tied to the tests and datasets we would use.
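As an illustration with made-up numbers: for logical_selectivity = 0.01 and
physical_selectivity = 0.1, it gives (1 + 0.01) * (1 + 0.1) - 1 = 0.111,
which is not smaller than either input value, as intended.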

One of the things the patch makes worse is the case when the
selectivity is correctly estimated as 0.  For example, searching for
id = 101 where the ids range between 0 and 100.  That is bad, but I
don't see a way to improve it without pushing the job down to the
opclasses and going through a lot more complication.  The adjustment
would be useful when searching for id = 101 where the ids range
between 0 and 100 or 102 and 200.

I can look into supporting expression indexes, if you think something
very much incomplete like this has a chance to get committed.


brin-correlation-v4.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-03-13 Thread Emre Hasegeli
> Will Emre be around to make the required changes to the patch? I see
> it's been a while since it was originally posted.

I am around.  I can post an update in a few days.  Thank you for
picking this up.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-03-12 Thread David Rowley
On 2 March 2017 at 04:40, Alvaro Herrera  wrote:
> Alvaro Herrera wrote:
>
>> I have added this patch to the commitfest, which I've been intending to
>> get in for a long time.  I'll be submitting an updated patch, if needed.
>
> Here is Emre's patch rebased to current sources.

Looking over this now, and have noticed a couple of things:

1.

+ Assert(nnumbers == 1);

I think it's a bad idea to Assert() this. The stat tuple can come from
a plugin which could do anything. Seems like if we need to be certain
of that then it should be an elog(ERROR), maybe mention that we
expected a 1 element array, but got  elements.

2.

+ Assert(varCorrelation >= 0 && varCorrelation <= 1);

same goes for that. I don't think we want to Assert() that either.
elog(ERROR) seems better, or perhaps we should fall back on the old
estimation method in this case?

The Asserted range also seems to contradict the range mentioned in
pg_statistic.h:

/*
 * A "correlation" slot describes the correlation between the physical order
 * of table tuples and the ordering of data values of this column, as seen
 * by the "<" operator identified by staop.  (As with the histogram, more
 * than one entry could theoretically appear.) stavalues is not used and
 * should be NULL.  stanumbers contains a single entry, the correlation
 * coefficient between the sequence of data values and the sequence of
 * their actual tuple positions.  The coefficient ranges from +1 to -1.
 */
#define STATISTIC_KIND_CORRELATION 3

3.

+ numBlocks = 1 + baserel->pages / BrinGetPagesPerRange(indexRel);

should numBlocks be named numRanges? After all we call the option
"pages_per_range".

4.

+ * correlated table is copied 4 times, the correlation would be 0.25,
+ * although the index would be almost as good as the version on the

What's meant by "table is copied 4 times" ?

As best as I can work out, it means.

INSERT INTO t SELECT * FROM t;
INSERT INTO t SELECT * FROM t;

but I'm uncertain, and it's not very clear to me what it means.

5.

+ blockSelectivity = (blockProportion + 1 - *indexCorrelation) / 2;

I missed the comment that explains why we divide by two here.

6.

Might want to consider not applying this estimate when the statistics
don't contain any STATISTIC_KIND_CORRELATION array.

In this case the estimation is still applied the same way, only the
*indexCorrelation is 0.

Consider:

CREATE TABLE r2 ("values" INT NOT NULL);
INSERT INTO r2 VALUES(1);
ANALYZE r2;
\x on
SELECT * FROM pg_statistic WHERE starelid = (SELECT oid FROM pg_class
WHERE relname = 'r2');

Notice the lack of any stakind being set to 3.

7.

+ blockProportion = (double) BrinGetPagesPerRange(indexRel) /
+ baserel->pages;

Perhaps the blockProportion should also be clamped to the number of
pages in the relation. Even a 1 page relation is estimated to have 128
pages with the default pages_per_range, by the method used in the
patch.

blockProportion = Max(blockProportion, baserel->relpages);

During my test the selectivity was set to 64.5, then clamped to 1 by
CLAMP_PROBABILITY(). This does not seem very nice.

Good news is, I'm seeing much better plans coming out in cases when
the index is unlikely to help. So +1 for fixing this issue.

Will Emre be around to make the required changes to the patch? I see
it's been a while since it was originally posted.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2017-03-01 Thread Alvaro Herrera
Alvaro Herrera wrote:

> I have added this patch to the commitfest, which I've been intending to
> get in for a long time.  I'll be submitting an updated patch, if needed.

Here is Emre's patch rebased to current sources.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 8b05e8f..f462436 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -100,8 +100,10 @@
 #include 
 #include 
 
+#include "access/brin.h"
 #include "access/gin.h"
 #include "access/htup_details.h"
+#include "access/reloptions.h"
 #include "access/sysattr.h"
 #include "catalog/index.h"
 #include "catalog/pg_am.h"
@@ -7541,14 +7543,21 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
 {
     IndexOptInfo *index = path->indexinfo;
     List       *indexQuals = path->indexquals;
-    List       *indexOrderBys = path->indexorderbys;
     double      numPages = index->pages;
     double      numTuples = index->tuples;
+    RelOptInfo *baserel = index->rel;
     List       *qinfos;
     Cost        spc_seq_page_cost;
     Cost        spc_random_page_cost;
     double      qual_op_cost;
     double      qual_arg_cost;
+    double      qualSelectivity;
+    double      blockProportion;
+    double      numBlocks;
+    double      blockSelectivity;
+    double      selec;
+    Relation    indexRel;
+    VariableStatData vardata;
 
     /* Do preliminary analysis of indexquals */
     qinfos = deconstruct_indexquals(path);
@@ -7561,7 +7570,8 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
     /*
      * BRIN indexes are always read in full; use that as startup cost.
      *
-     * XXX maybe only include revmap pages here?
+     * XXX We should consider the revmap at seqpage cost, and regular pages at
+     * random page cost.
      */
     *indexStartupCost = spc_seq_page_cost * numPages * loop_count;
 
@@ -7572,24 +7582,190 @@ brincostestimate(PlannerInfo *root, IndexPath *path, double loop_count,
      */
     *indexTotalCost = spc_random_page_cost * numPages * loop_count;
 
-    *indexSelectivity =
-        clauselist_selectivity(root, indexQuals,
-                               path->indexinfo->rel->relid,
-                               JOIN_INNER, NULL);
-    *indexCorrelation = 1;
+    /*
+     * Compute index correlation.
+     *
+     * Because we can use all index quals equally when scanning, we can use
+     * the largest correlation (in absolute value) among columns used by the
+     * query. Start at zero, the worst possible case.
+     */
+    *indexCorrelation = 0;
+
+    {
+        RangeTblEntry *rte = planner_rt_fetch(index->rel->relid, root);
+        Oid         relid;
+        ListCell   *cell;
+
+        Assert(rte->rtekind == RTE_RELATION);
+        relid = rte->relid;
+        Assert(relid != InvalidOid);
+
+        foreach(cell, qinfos)
+        {
+            IndexQualInfo *qinfo = (IndexQualInfo *) lfirst(cell);
+            AttrNumber  colnum = index->indexkeys[qinfo->indexcol];
+
+            if (colnum != 0)
+            {
+                /* Simple variable -- look to stats for the underlying table */
+                if (get_relation_stats_hook &&
+                    (*get_relation_stats_hook) (root, rte, colnum, &vardata))
+                {
+                    /*
+                     * The hook took control of acquiring a stats tuple.  If
+                     * it did supply a tuple, it'd better have supplied a
+                     * freefunc.
+                     */
+                    if (HeapTupleIsValid(vardata.statsTuple) &&
+                        !vardata.freefunc)
+                        elog(ERROR, "no function provided to release variable stats with");
+                }
+                else
+                {
+                    vardata.statsTuple = SearchSysCache3(STATRELATTINH,
+                                                         ObjectIdGetDatum(relid),
+                                                         Int16GetDatum(colnum),
+                                                         /* XXX no inh */

Re: [HACKERS] BRIN cost estimate

2017-03-01 Thread Alvaro Herrera
Emre Hasegeli wrote:
> > The patch is attached.
> 
> Now, it is actually attached.

I have added this patch to the commitfest, which I've been intending to
get in for a long time.  I'll be submitting an updated patch, if needed.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2015-12-24 Thread Emre Hasegeli
> The patch is attached.

Now, it is actually attached.


brin-correlation-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] BRIN cost estimate

2015-12-24 Thread Emre Hasegeli
> Somebody wrote to me a few days ago that the BRIN cost estimation is
> rather poor.  One immediately obvious issue which I think is easily
> fixed is the correlation estimate, which is currently hardcoded to 1.
>
> Since a BRIN scan always entails a scan of the relation in physical
> order, it's simple to produce a better estimate for that, per the
> attached patch.  (Note: trying to run this on expression indexes will
> cause a crash.  I need to complete that part ...)
>
> There are other improvements we've been discussing, but I'm unclear that
> I will have time to study them to get a useful patch quickly enough.  If
> somebody else wants to mess with this, please get in touch.
>
> Thoughts?

As we have discussed, the correlation is not used for bitmap index
scan cost estimation.  I think the best we can do here is to somehow
increase the selectivity.  I suggest something like this:

/*
* Index selectivity is important for the planner to calculate the cost
* of the bitmap heap scan.  Unfortunately, we don't have a robust way to
* estimate selectivity of BRIN.  It can dependent on many things.  This
* is a long rationale about the incomplete calculation we have at the
* moment.
*
* Our starting point is that BRIN selectivity has to be less than the
* selectivity of the btree.  We are using a product of logical and
* physical selectivities to achieve this.  The expression
*
* (1 + logical_selectivity) * (1 + physical_selectivity) - 1
*
* is used to make sure the result is not less than either of the values.
*
* The logical selectivity is calculated using the indexable expressions
* of the WHERE clause.  The physical selectivity is calculated using
* the block proportion and the maximum correlation.  The block
* proportion is a comparable value with selectivity.  It is the
* selectivity of the smallest unit of the index.  The final selectivity
* can never be less than that.
*
* Using the contrary of the correlation by subtracting it from 1 is
* not really a value comparable with the selectivity.  It is just a
* value between 0 and 1.  On the other hand, it is the only value
* related to the BRIN quality we have available right now.  We are
* using the arithmetic mean of it with the block proportion to normalise it.
* This part of the physical selectivity is likely to be more effective
* than the block proportion in many circumstances as there would be
* many blocks on big tables.
*
* Using the contrary of the correlation of a column as selectivity of
* the index is wrong in many ways.  First of all, it cannot be applied
* to all BRIN operator classes.  It makes sense for the main built-in
* operator class "minmax", and makes a little sense for the other one
* "inclusion".  It wouldn't probably make any sense for a bloom filter
* implementation, if there would be any.  Maybe, we should push down
* this function to the operator class, but there is not enough reason
* to do it right now.
*
* Second, correlation is not dependent on any indexed expression.  It
* probably doesn't make any sense for the complicated operators.  It
* would probably affect basic comparison operators differently than
* the equality operator.  The effect would even differ by the count of
* those expressions.  For example, x IN (10, 20, 30) would be affected
* by correlation more than x = 15, even when their selectivities are
* the same.
*
* Last but not least, the correlation is a single value for the whole
* range.  The indexed table can partly be very well correlated, but
* the correlation value can still be very low.  For example, if a
* perfectly correlated table is copied 4 times, the correlation would
* be 0.25, although the index would be almost as good as the version on
* the initial table.  Or the expression can match the better correlated
* part of the table.  It is not hard to imagine more scenarios where
* the correlation is a bad value to use as the selectivity.  We should
* probably improve this by collecting more statistics, one day.
*
* Another problem here is that the caller assumes the selectivity
* in terms of tuples.  It might have been better if we had a way to
* return it as some number of pages.  On the other hand, even though we
* know about the index, it is not much easier for us to estimate the
* number of matching pages than it is for the caller.  We are likely to
* make too many mistakes by relying on the correlation anyway.  We are
* at least not making it worse here.
*/
blockSelectivity = (blockProportion + 1 - *indexCorrelation) / 2;
selec = (1 + qualSelectivity) * (1 + blockSelectivity) - 1;
CLAMP_PROBABILITY(selec);

The patch is attached.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers