Re: [HACKERS] gincostestimate and hypothetical indexes

2015-12-01 Thread Tom Lane
Julien Rouhaud  writes:
> On 01/12/2015 00:37, Tom Lane wrote:
>> Maybe we could do something along the lines of pretending that 90% of the
>> index size given by the plugin is entry pages?  Don't know what a good
>> ratio would be exactly, but we could probably come up with one with a bit
>> of testing.

> I used zero values because gincostestimate already handle empty
> statistics, and pretend that 100% of the pages are entry pages:

Yeah, but that code is pretty bogus.  It was never intended to do more
than minimally hold the fort until someone had vacuumed.  If we're trying
to support hypothetical-index plugins with this code, it should try to
do something a bit more realistic.

I did a bit of investigation using some sample data I had laying around
(JSONB and TSVECTOR data).  It looks like assuming that entry pages are
90% of the index is not too awful; I saw actual values ranging from
80% to 94%.  The real weak spot of the current code, however, is

numEntries = numTuples; /* bogus, but no other info available */

which is just as bogus as it says, because numTuples is going to be the
heap tuple count not anything specific to GIN.  Often you'd expect the
number of entries to be some multiple of the number of tuples, because
the whole point of GIN is to be able to index components of the indexed
column's values.  But on the other hand if there are not a lot of
distinct component values, you could get many fewer entries than tuples.

Based on what I saw in this small sample, I'm inclined to propose setting
numEntries to 100 times numEntryPages, that is, assume 100 entries per
entry page.  That could easily be off by a factor of 2, but it seems more
robust than just blindly using the heap tuple count.

(Of course, all of this is predicated on the assumption that the
hypothetical-index plugin gives us some realistic value for the index's
size in pages, but if it doesn't it's the plugin's fault.)

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] gincostestimate and hypothetical indexes

2015-11-30 Thread Julien Rouhaud
Hello,

I figured out that it's not possible to use a hypothetical gin index, as
the gincostestimate function try to retrieve some statistical data from
the index meta page.

Attached patch fixes this. I believe this should be back-patched as was
a2095f7fb5a57ea1794f25d029756d9a140fd429.

Regards.
-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 37fad86..24ffa3a 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -101,6 +101,7 @@
 #include 
 
 #include "access/gin.h"
+#include "access/gin_private.h"
 #include "access/htup_details.h"
 #include "access/sysattr.h"
 #include "catalog/index.h"
@@ -7260,11 +7261,25 @@ gincostestimate(PG_FUNCTION_ARGS)
 	qinfos = deconstruct_indexquals(path);
 
 	/*
-	 * Obtain statistic information from the meta page
+	 * Obtain statistic information from the meta page if the index is not
+	 * hypothetical. Otherwise set all the counters to 0, as it would be for an
+	 * index that never got VACUUMed.
 	 */
-	indexRel = index_open(index->indexoid, AccessShareLock);
-	ginGetStats(indexRel, );
-	index_close(indexRel, AccessShareLock);
+	if (!index->hypothetical)
+	{
+		indexRel = index_open(index->indexoid, AccessShareLock);
+		ginGetStats(indexRel, );
+		index_close(indexRel, AccessShareLock);
+	}
+	else
+	{
+		ginStats.nPendingPages = 0;
+		ginStats.nTotalPages = 0;
+		ginStats.nEntryPages = 0;
+		ginStats.nDataPages = 0;
+		ginStats.nEntries = 0;
+		ginStats.ginVersion = GIN_CURRENT_VERSION;
+	}
 
 	numEntryPages = ginStats.nEntryPages;
 	numDataPages = ginStats.nDataPages;

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gincostestimate and hypothetical indexes

2015-11-30 Thread Tom Lane
Julien Rouhaud  writes:
> I figured out that it's not possible to use a hypothetical gin index, as
> the gincostestimate function try to retrieve some statistical data from
> the index meta page.

Good point.

> Attached patch fixes this. I believe this should be back-patched as was
> a2095f7fb5a57ea1794f25d029756d9a140fd429.

I don't much care for this patch though.  The core problem is that just
returning all zeroes seems quite useless: it will probably result in silly
cost estimates.  The comment in the patch claiming that this would be the
situation in a never-vacuumed index is wrong, because ginbuild() updates
those stats too.  But I'm not sure exactly what to do instead :-(.

Ideally we'd put it on the head of the hypothetical-index plugin to invent
some numbers, but I dunno if we want to create such an API or not ... and
we certainly couldn't back-patch such a change.

Maybe we could do something along the lines of pretending that 90% of the
index size given by the plugin is entry pages?  Don't know what a good
ratio would be exactly, but we could probably come up with one with a bit
of testing.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gincostestimate and hypothetical indexes

2015-11-30 Thread Julien Rouhaud
On 01/12/2015 00:37, Tom Lane wrote:
> Julien Rouhaud  writes:
>> I figured out that it's not possible to use a hypothetical gin index, as
>> the gincostestimate function try to retrieve some statistical data from
>> the index meta page.
> 
> Good point.
> 
>> Attached patch fixes this. I believe this should be back-patched as was
>> a2095f7fb5a57ea1794f25d029756d9a140fd429.
> 
> I don't much care for this patch though.  The core problem is that just
> returning all zeroes seems quite useless: it will probably result in silly
> cost estimates.  The comment in the patch claiming that this would be the
> situation in a never-vacuumed index is wrong, because ginbuild() updates
> those stats too.  But I'm not sure exactly what to do instead :-(.
> 

Oops, it looks that this is only true for pre 9.1 indexes (per comment
shown below).

> Ideally we'd put it on the head of the hypothetical-index plugin to invent
> some numbers, but I dunno if we want to create such an API or not ... and
> we certainly couldn't back-patch such a change.
> 
> Maybe we could do something along the lines of pretending that 90% of the
> index size given by the plugin is entry pages?  Don't know what a good
> ratio would be exactly, but we could probably come up with one with a bit
> of testing.
> 

I used zero values because gincostestimate already handle empty
statistics, and pretend that 100% of the pages are entry pages:


/*
* nPendingPages can be trusted, but the other fields are as of the last
* VACUUM. Scale them by the ratio numPages / nTotalPages to account for
* growth since then. If the fields are zero (implying no VACUUM at all,
* and an index created pre-9.1), assume all pages are entry pages.
*/
if (ginStats.nTotalPages == 0 || ginStats.nEntryPages == 0)
{
numEntryPages = numPages;
numDataPages = 0;
numEntries = numTuples; /* bogus, but no other info available */
}


But I don't have any clue of what would be a better ratio either.

>   regards, tom lane
> 


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers