Re: [HACKERS] ANALYZE versus expression indexes with nondefault opckeytype
On Sat, Jul 31, 2010 at 11:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: Now, of the above the only cases where we'd be likely to be able to do anything very useful with stats on the expression value are the name case, which isn't that exciting in practice, and the tsvector cases. For tsvector it was only with 8.4 that we had non-toy stats code, so while the limitation is ancient it's only recently that it started to be meaningful. I don't think this can be claimed to be a corner case. If you set up an FTS index according to the first alternative offered in http://developer.postgresql.org/pgdocs/postgres/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX you will find that the system fails to collect stats for it and so you get stupid default estimates for your FTS queries. If this were a documented limitation I'd expect to see a big red warning there to *not* do it that way. The only way that you actually get usable tsvector stats at the moment is to explicitly store the tsvector as an ordinary column, as in the second approach offered in the above documentation section. Yeah, maybe you're right. But I'd still prefer to see us break the ABI and do this just in 9.0 rather than changing 8.4. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE versus expression indexes with nondefault opckeytype
Tom Lane wrote: Robert Haas writes: I guess I'd appreciate it if someone could explain in more detail in what cases we fail to collect stats. [detailed description] I don't think this can be claimed to be a corner case. If you set up an FTS index according to the first alternative offered in http://developer.postgresql.org/pgdocs/postgres/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX you will find that the system fails to collect stats for it and so you get stupid default estimates for your FTS queries. Objection to a fix in 9.0 withdrawn. No opinion on backpatching. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE versus expression indexes with nondefault opckeytype
Robert Haas robertmh...@gmail.com writes: On Sat, Jul 31, 2010 at 11:15 PM, Tom Lane t...@sss.pgh.pa.us wrote: I don't think this can be claimed to be a corner case. If you set up an FTS index according to the first alternative offered in http://developer.postgresql.org/pgdocs/postgres/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX you will find that the system fails to collect stats for it and so you get stupid default estimates for your FTS queries. Yeah, maybe you're right. But I'd still prefer to see us break the ABI and do this just in 9.0 rather than changing 8.4. OK, I can live with that. I'll take a look at it shortly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE versus expression indexes with nondefault opckeytype
I wrote: Robert Haas robertmh...@gmail.com writes: Yeah, maybe you're right. But I'd still prefer to see us break the ABI and do this just in 9.0 rather than changing 8.4. OK, I can live with that. I'll take a look at it shortly. Proposed patch attached (compiles, untested as yet). regards, tom lane Index: src/backend/commands/analyze.c === RCS file: /cvsroot/pgsql/src/backend/commands/analyze.c,v retrieving revision 1.152 diff -c -r1.152 analyze.c *** src/backend/commands/analyze.c 26 Feb 2010 02:00:37 - 1.152 --- src/backend/commands/analyze.c 1 Aug 2010 19:56:12 - *** *** 92,98 AnlIndexData *indexdata, int nindexes, HeapTuple *rows, int numrows, MemoryContext col_context); ! static VacAttrStats *examine_attribute(Relation onerel, int attnum); static int acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows, double *totalrows, double *totaldeadrows); static double random_fract(void); --- 92,99 AnlIndexData *indexdata, int nindexes, HeapTuple *rows, int numrows, MemoryContext col_context); ! static VacAttrStats *examine_attribute(Relation onerel, int attnum, ! Node *index_expr); static int acquire_sample_rows(Relation onerel, HeapTuple *rows, int targrows, double *totalrows, double *totaldeadrows); static double random_fract(void); *** *** 339,345 (errcode(ERRCODE_UNDEFINED_COLUMN), errmsg(column \%s\ of relation \%s\ does not exist, col, RelationGetRelationName(onerel; ! vacattrstats[tcnt] = examine_attribute(onerel, i); if (vacattrstats[tcnt] != NULL) tcnt++; } --- 340,346 (errcode(ERRCODE_UNDEFINED_COLUMN), errmsg(column \%s\ of relation \%s\ does not exist, col, RelationGetRelationName(onerel; ! vacattrstats[tcnt] = examine_attribute(onerel, i, NULL); if (vacattrstats[tcnt] != NULL) tcnt++; } *** *** 353,359 tcnt = 0; for (i = 1; i = attr_cnt; i++) { ! vacattrstats[tcnt] = examine_attribute(onerel, i); if (vacattrstats[tcnt] != NULL) tcnt++; } --- 354,360 tcnt = 0; for (i = 1; i = attr_cnt; i++) { ! vacattrstats[tcnt] = examine_attribute(onerel, i, NULL); if (vacattrstats[tcnt] != NULL) tcnt++; } *** *** 407,427 elog(ERROR, too few entries in indexprs list); indexkey = (Node *) lfirst(indexpr_item); indexpr_item = lnext(indexpr_item); - - /* - * Can't analyze if the opclass uses a storage type - * different from the expression result type. We'd get - * confused because the type shown in pg_attribute for - * the index column doesn't match what we are getting - * from the expression. Perhaps this can be fixed - * someday, but for now, punt. - */ - if (exprType(indexkey) != - Irel[ind]-rd_att-attrs[i]-atttypid) - continue; - thisdata-vacattrstats[tcnt] = ! examine_attribute(Irel[ind], i + 1); if (thisdata-vacattrstats[tcnt] != NULL) { tcnt++; --- 408,415 elog(ERROR, too few entries in indexprs list); indexkey = (Node *) lfirst(indexpr_item); indexpr_item = lnext(indexpr_item); thisdata-vacattrstats[tcnt] = ! examine_attribute(Irel[ind], i + 1, indexkey); if (thisdata-vacattrstats[tcnt] != NULL) { tcnt++; *** *** 802,810 * * Determine whether the column is analyzable; if so, create and initialize * a VacAttrStats struct for it. If not, return NULL. */ static VacAttrStats * ! examine_attribute(Relation onerel, int attnum) { Form_pg_attribute attr = onerel-rd_att-attrs[attnum - 1]; HeapTuple typtuple; --- 790,801 * * Determine whether the column is analyzable; if so, create and initialize * a VacAttrStats struct for it. If not, return NULL. + * + * If index_expr isn't NULL, then we're trying to analyze an expression index, + * and index_expr is the expression tree representing the column's data. */ static VacAttrStats * ! examine_attribute(Relation onerel, int attnum, Node *index_expr) { Form_pg_attribute attr = onerel-rd_att-attrs[attnum - 1]; HeapTuple typtuple; *** *** 827,835 stats = (VacAttrStats *) palloc0(sizeof(VacAttrStats)); stats-attr = (Form_pg_attribute) palloc(ATTRIBUTE_FIXED_PART_SIZE); memcpy(stats-attr, attr, ATTRIBUTE_FIXED_PART_SIZE); ! typtuple = SearchSysCache1(TYPEOID, ObjectIdGetDatum(attr-atttypid)); if (!HeapTupleIsValid(typtuple)) ! elog(ERROR, cache lookup failed for type %u, attr-atttypid); stats-attrtype = (Form_pg_type) palloc(sizeof(FormData_pg_type)); memcpy(stats-attrtype, GETSTRUCT(typtuple), sizeof(FormData_pg_type));
Re: [HACKERS] ANALYZE versus expression indexes with nondefault opckeytype
* Tom Lane (t...@sss.pgh.pa.us) wrote: After a bit of study of the code, it appears to me that it's not too difficult to fix: we just have to use the expression's result type rather than the index column's atttypid in the subsequent processing. ANALYZE never actually looks at the index column contents (indeed cannot, since our index AMs provide no API for extracting the contents), I don't think it'll be an issue, but just in case- is there any reason this will cause trouble for the possible index-only quals/scans work? So that seems to make it not practical to back-patch. But we could get it into 9.0.. I thought of an ugly hack that would avoid an API/ABI break: since VacAttrStats.attr is a copy of the pg_attribute data, we could scribble on it, changing atttypid, attlen, attbyval, etc to match the index expression result type. This seems pretty grotty, but it would allow the fix to be back-patched into existing branches. Yeah, that's rather nasty. :/ Comments? I'm not sure which way to jump here. For my 2c- let's get it fixed for 9.0 cleanly and encourage people who run into this to upgrade to that once it's released. Perhaps I've missed it, but I don't recall seeing many complaints about this. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] ANALYZE versus expression indexes with nondefault opckeytype
Stephen Frost sfr...@snowman.net writes: * Tom Lane (t...@sss.pgh.pa.us) wrote: After a bit of study of the code, it appears to me that it's not too difficult to fix: we just have to use the expression's result type rather than the index column's atttypid in the subsequent processing. ANALYZE never actually looks at the index column contents (indeed cannot, since our index AMs provide no API for extracting the contents), I don't think it'll be an issue, but just in case- is there any reason this will cause trouble for the possible index-only quals/scans work? Not that I can see. What ANALYZE and the planner want to do is provide stats on the behavior of the indexed expression. The fact that certain index opclasses don't actually store that value is of no interest. Even if it did become of interest later on, we're under no compulsion to not redefine the contents of pg_statistic when we need to. So that seems to make it not practical to back-patch. But we could get it into 9.0.. Hmm. I was thinking it was a bit late for 9.0 too, since it'd invalidate any testing anyone's done of their custom typanalyze functions. However, since we're still only in beta, maybe that's acceptable. Comments? I'm not sure which way to jump here. For my 2c- let's get it fixed for 9.0 cleanly and encourage people who run into this to upgrade to that once it's released. Perhaps I've missed it, but I don't recall seeing many complaints about this. It's only been very recently that we had any useful stats capability that could apply in such situations (in fact I think we still haven't actually shipped a non-bogus version of tsvector typanalyze :-(). So I'm not sure anyone would have realized they were missing something. But it would be nice to have this working in 8.4, and I'll be quite unhappy if we don't have it in 9.0. If people feel it's not too late for an ABI break in 9.0 then I'm willing to compromise on backpatching the clean fix. I'm not entirely convinced it's that much cleaner than the hack solution though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE versus expression indexes with nondefault opckeytype
On Jul 31, 2010, at 11:00 AM, Tom Lane t...@sss.pgh.pa.us wrote: It's only been very recently that we had any useful stats capability that could apply in such situations (in fact I think we still haven't actually shipped a non-bogus version of tsvector typanalyze :-(). So I'm not sure anyone would have realized they were missing something. But it would be nice to have this working in 8.4, and I'll be quite unhappy if we don't have it in 9.0. If people feel it's not too late for an ABI break in 9.0 then I'm willing to compromise on backpatching the clean fix. I'm not entirely convinced it's that much cleaner than the hack solution though. I think this whole discussion is starting with the wrong premise. This is not a bug fix; therefore, it's 9.1 material. If anyone else had suggested slipping this into 9.0, let alone 8.4, we would have punted it in a heartbeat. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE versus expression indexes with nondefault opckeytype
Robert Haas robertmh...@gmail.com writes: I think this whole discussion is starting with the wrong premise. This is not a bug fix; therefore, it's 9.1 material. Failing to store stats isn't a bug? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE versus expression indexes with nondefault opckeytype
On Sat, Jul 31, 2010 at 12:06 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I think this whole discussion is starting with the wrong premise. This is not a bug fix; therefore, it's 9.1 material. Failing to store stats isn't a bug? Well, it kind of sounds more like you're removing a known limitation than fixing a bug. It's not as if the behavior fails to match the comment. I'm pretty hesitant to see us making any changes to 9.0 that aren't necessary to fix existing bugs or new regressions. What I want to do with 9.0 is get it stable and ship it. I'm not really terribly concerned about the possibility of an ABI break even at this late date, but I *am* concerned about the possibility either of (1) unforeseen consequences necessitating further patching or (2) getting distracted from the business of getting the release out the door. We've been in feature freeze for more than five months, so I think it's certainly time try to reduce to an absolute minimum the number of changes that need to be made before release. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE versus expression indexes with nondefault opckeytype
Robert Haas 07/31/10 12:33 PM Tom Lane wrote: Robert Haas writes: I think this whole discussion is starting with the wrong premise. This is not a bug fix; therefore, it's 9.1 material. Failing to store stats isn't a bug? Well, it kind of sounds more like you're removing a known limitation than fixing a bug. It's operating as designed and documented. There is room for enhancement, but the only thing which could possibly justify this as 9.0 material is if there was a demonstrated performance regression in 9.0 for which this was the safest cure. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE versus expression indexes with nondefault opckeytype
* Kevin Grittner (kevin.gritt...@wicourts.gov) wrote: Robert Haas 07/31/10 12:33 PM Tom Lane wrote: Failing to store stats isn't a bug? Well, it kind of sounds more like you're removing a known limitation than fixing a bug. It's operating as designed and documented. There is room for enhancement, but the only thing which could possibly justify this as 9.0 material is if there was a demonstrated performance regression in 9.0 for which this was the safest cure. I have to disagree with this, to be honest. The fact that we've documented what is completely unexpected and frustrating behaviour doesn't mean we get to say it's not a bug. Not collecting stats, at all, is a pretty bad bug, in my view. Stats are an important part of the system which needs to work at least decently. Perhaps before it was pretty rare that we'd have the situation described (before we brought in tsearch2), but it's not any longer and we need to support it as we would the other types. The only reason I'm against backpatching it to the beginning is that it's either an ABI change or some rather grotty code, and even then it wouldn't be hard to push me to accepting the grotty code if we make the cleaner change for 9.0 and going forward, especially as we have people in the wild being affected by it. Certain other databases have done a very good job of documenting their bugs and in some cases even calling them features. I'd rather we not go down that path. I don't see the lack of stats collecting to be a simple 'limitation'. Thanks, Stephen signature.asc Description: Digital signature
Re: [HACKERS] ANALYZE versus expression indexes with nondefault opckeytype
On Sat, Jul 31, 2010 at 9:16 PM, Stephen Frost sfr...@snowman.net wrote: * Kevin Grittner (kevin.gritt...@wicourts.gov) wrote: Robert Haas 07/31/10 12:33 PM Tom Lane wrote: Failing to store stats isn't a bug? Well, it kind of sounds more like you're removing a known limitation than fixing a bug. It's operating as designed and documented. There is room for enhancement, but the only thing which could possibly justify this as 9.0 material is if there was a demonstrated performance regression in 9.0 for which this was the safest cure. I have to disagree with this, to be honest. The fact that we've documented what is completely unexpected and frustrating behaviour doesn't mean we get to say it's not a bug. Not collecting stats, at all, is a pretty bad bug, in my view. I guess I'd appreciate it if someone could explain in more detail in what cases we fail to collect stats. Do we have a typanalyze function here that can't possibly work for anything, ever? Or is it just some subset of the cases? (Apologies if this has been discussed on the original thread; I was unable to find it in the archives.) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE versus expression indexes with nondefault opckeytype
Robert Haas robertmh...@gmail.com writes: On Sat, Jul 31, 2010 at 9:16 PM, Stephen Frost sfr...@snowman.net wrote: * Kevin Grittner (kevin.gritt...@wicourts.gov) wrote: Robert Haas 07/31/10 12:33 PM Tom Lane wrote: Failing to store stats isn't a bug? Well, it kind of sounds more like you're removing a known limitation than fixing a bug. It's operating as designed and documented. I have to disagree with this, to be honest. The fact that we've documented what is completely unexpected and frustrating behaviour doesn't mean we get to say it's not a bug. Not collecting stats, at all, is a pretty bad bug, in my view. I'm a bit bemused by the claim that this behavior is documented. One comment buried deep in the bowels of the source is not user-visible documentation in my book. I guess I'd appreciate it if someone could explain in more detail in what cases we fail to collect stats. Do we have a typanalyze function here that can't possibly work for anything, ever? Or is it just some subset of the cases? ANALYZE normally collects stats for any expression that there is an expression index for. However, it will punt and fail to collect stats if the expression index uses an opclass whose opckeytype (ie, storage datatype) is different from the actual expression datatype. A quick look into the system catalogs shows that that applies to these opclasses: amname | opcname | opcintype | opckeytype +--+---+- btree | name_ops | name | cstring gist | point_ops| point | box gist | poly_ops | polygon | box gist | circle_ops | circle| box gin| _int4_ops| integer[] | integer gin| _text_ops| text[]| text gin| _abstime_ops | abstime[] | abstime gin| _bit_ops | bit[] | bit gin| _bool_ops| boolean[] | boolean gin| _bpchar_ops | character[] | character gin| _bytea_ops | bytea[] | bytea gin| _char_ops| char[] | char gin| _cidr_ops| cidr[]| cidr gin| _date_ops| date[]| date gin| _float4_ops | real[]| real gin| _float8_ops | double precision[]| double precision gin| _inet_ops| inet[]| inet gin| _int2_ops| smallint[]| smallint gin| _int8_ops| bigint[] | bigint gin| _interval_ops| interval[]| interval gin| _macaddr_ops | macaddr[] | macaddr gin| _name_ops| name[]| name gin| _numeric_ops | numeric[] | numeric gin| _oid_ops | oid[] | oid gin| _oidvector_ops | oidvector[] | oidvector gin| _time_ops| time without time zone[] | time without time zone gin| _timestamptz_ops | timestamp with time zone[]| timestamp with time zone gin| _timetz_ops | time with time zone[] | time with time zone gin| _varbit_ops | bit varying[] | bit varying gin| _varchar_ops | character varying[] | character varying gin| _timestamp_ops | timestamp without time zone[] | timestamp without time zone gin| _money_ops | money[] | money gin| _reltime_ops | reltime[] | reltime gin| _tinterval_ops | tinterval[] | tinterval gist | tsvector_ops | tsvector | gtsvector gin| tsvector_ops | tsvector | text gist | tsquery_ops | tsquery | bigint (37 rows) Now, of the above the only cases where we'd be likely to be able to do anything very useful with stats on the expression value are the name case, which isn't that exciting in practice, and the tsvector cases. For tsvector it was only with 8.4 that we had non-toy stats code, so while the limitation is ancient it's only recently that it started to be meaningful. I don't think this can be claimed to be a corner case. If you set up an FTS index according to the first alternative offered in http://developer.postgresql.org/pgdocs/postgres/textsearch-tables.html#TEXTSEARCH-TABLES-INDEX you will find that the system fails to collect stats for it and so you get stupid default estimates for your FTS queries. If this were a documented limitation I'd expect to see a big red warning there to *not* do it that
[HACKERS] ANALYZE versus expression indexes with nondefault opckeytype
I've been poking at the FTS problem example recently submitted by Artur Dabrowski. It contains an index declared like so: CREATE INDEX idx_keywords_ger ON search_tab USING gin (to_tsvector('german'::regconfig, keywords)); which can be queried like so select * from search_tab where to_tsvector('german', keywords) @@ to_tsquery('german', 'whatever'); which is all fine and matches suggestions in our documentation. But I was dismayed to discover that ANALYZE wasn't storing any statistics for this expression index, which rather hobbles the planner's ability to produce useful estimates for the selectivity of such WHERE clauses. The reason for that turns out to be this bit in analyze.c: /* * Can't analyze if the opclass uses a storage type * different from the expression result type. We'd get * confused because the type shown in pg_attribute for * the index column doesn't match what we are getting * from the expression. Perhaps this can be fixed * someday, but for now, punt. */ if (exprType(indexkey) != Irel[ind]-rd_att-attrs[i]-atttypid) continue; Since a tsvector index does have a storage type different from tsvector (in fact, it's shown as TEXT), this code decides to punt and not analyze the column. I think it's past time we did something about this. After a bit of study of the code, it appears to me that it's not too difficult to fix: we just have to use the expression's result type rather than the index column's atttypid in the subsequent processing. ANALYZE never actually looks at the index column contents (indeed cannot, since our index AMs provide no API for extracting the contents), so atttypid isn't really all that relevant to it. However, this'll imply an API/ABI break for custom typanalyze functions: we'll need to store the actual type OID in struct VacAttrStats, and use that rather than looking to attr-atttypid or any of the other type-dependent fields of the Form_pg_attribute. So that seems to make it not practical to back-patch. I thought of an ugly hack that would avoid an API/ABI break: since VacAttrStats.attr is a copy of the pg_attribute data, we could scribble on it, changing atttypid, attlen, attbyval, etc to match the index expression result type. This seems pretty grotty, but it would allow the fix to be back-patched into existing branches. Comments? I'm not sure which way to jump here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers