Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Fri, Sep 9, 2016 at 6:51 PM, Tom Lane wrote: > Pushed with cosmetic adjustments --- mostly, I thought we needed some > comments about the topic. Okay, Thanks. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Dilip Kumar writes: > Seems like a better option, done it this way.. > attaching the modified patch.. Pushed with cosmetic adjustments --- mostly, I thought we needed some comments about the topic. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Thu, Sep 8, 2016 at 7:21 PM, Tom Lane wrote: > We could make it work like that without breaking the ABI if we were > to add a NOERROR bit to the allowed "flags". However, after looking > around a bit I'm no longer convinced what I said above is a good idea. > In particular, if we put the responsibility of reporting the error on > callers then we'll lose the ability to distinguish "no such pg_type OID" > from "type exists but it's only a shell". So I'm now thinking it's okay > to promote lookup_type_cache's elog to a full ereport, especially since > it's already using ereport for some of the related error cases such as > the shell-type failure. Okay.. > > That still leaves us with what to do for domain_in. A really simple > fix would be to move the InitDomainConstraintRef call before the > getBaseTypeAndTypmod call, because the former fetches the typcache > entry and would then benefit from lookup_type_cache's ereport. > But that seems a bit magic. > > I'm tempted to propose that domain_state_setup start with > > typentry = lookup_type_cache(domainType, TYPECACHE_DOMAIN_INFO); > if (typentry->typtype != TYPTYPE_DOMAIN) > ereport(ERROR, > (errcode(ERRCODE_DATATYPE_MISMATCH), > errmsg("type %s is not a domain", > format_type_be(domainType; > > removing the need to verify that after getBaseTypeAndTypmod. Seems like a better option, done it this way.. attaching the modified patch.. > > The cache loading is basically free because InitDomainConstraintRef > would do it anyway; so the extra cost here is only one dynahash search. > You could imagine buying back those cycles by teaching the typcache > to be able to cache the result of getBaseTypeAndTypmod, but I'm doubtful > that we really care. This whole setup sequence only happens once per > query anyway. Agreed. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com cache_lookup_failure_v5.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Dilip Kumar writes: > On Thu, Sep 8, 2016 at 2:23 AM, Tom Lane wrote: >> In the case of record_in, it seems like it'd be easy enough to use >> lookup_rowtype_tupdesc_noerror() instead of lookup_rowtype_tupdesc() >> and then throw a user-facing error right there. > Actually when we are passing invalid type to "record_in", error is > thrown in "lookup_type_cache" function, and this function is not > taking "no_error" input (it's only limited to > lookup_rowtype_tupdesc_internal). Ah, should have dug a bit deeper. > One option is to pass "no_error" parameter to "lookup_type_cache" > function also, and throw error only in record_in function. > But problem is, "lookup_type_cache" has lot of references. And also > will it be good idea to throw one generic error instead of multiple > specific errors. ? We could make it work like that without breaking the ABI if we were to add a NOERROR bit to the allowed "flags". However, after looking around a bit I'm no longer convinced what I said above is a good idea. In particular, if we put the responsibility of reporting the error on callers then we'll lose the ability to distinguish "no such pg_type OID" from "type exists but it's only a shell". So I'm now thinking it's okay to promote lookup_type_cache's elog to a full ereport, especially since it's already using ereport for some of the related error cases such as the shell-type failure. That still leaves us with what to do for domain_in. A really simple fix would be to move the InitDomainConstraintRef call before the getBaseTypeAndTypmod call, because the former fetches the typcache entry and would then benefit from lookup_type_cache's ereport. But that seems a bit magic. I'm tempted to propose that domain_state_setup start with typentry = lookup_type_cache(domainType, TYPECACHE_DOMAIN_INFO); if (typentry->typtype != TYPTYPE_DOMAIN) ereport(ERROR, (errcode(ERRCODE_DATATYPE_MISMATCH), errmsg("type %s is not a domain", format_type_be(domainType; removing the need to verify that after getBaseTypeAndTypmod. The cache loading is basically free because InitDomainConstraintRef would do it anyway; so the extra cost here is only one dynahash search. You could imagine buying back those cycles by teaching the typcache to be able to cache the result of getBaseTypeAndTypmod, but I'm doubtful that we really care. This whole setup sequence only happens once per query anyway. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Thu, Sep 8, 2016 at 2:23 AM, Tom Lane wrote: > I really don't like this solution. Changing this one function, out of the > dozens just like it in lsyscache.c, seems utterly random and arbitrary. > > I'm actually not especially convinced that passing domain_in a random > OID needs to not produce an XX000 error. That isn't a user-facing > function and people shouldn't be calling it by hand at all. The same > goes for record_in. But if we do want those cases to avoid this, > I think it's appropriate to fix it in those functions, not decree > that essentially-random other parts of the system should bear the > responsibility. (Thought experiment: if we changed the order of > operations in domain_in so that getBaseTypeAndTypmod were no longer > the first place to fail, would we undo this change and then change > wherever the failure had moved to? That's pretty messy.) > > In the case of record_in, it seems like it'd be easy enough to use > lookup_rowtype_tupdesc_noerror() instead of lookup_rowtype_tupdesc() > and then throw a user-facing error right there. Actually when we are passing invalid type to "record_in", error is thrown in "lookup_type_cache" function, and this function is not taking "no_error" input (it's only limited to lookup_rowtype_tupdesc_internal). One option is to pass "no_error" parameter to "lookup_type_cache" function also, and throw error only in record_in function. But problem is, "lookup_type_cache" has lot of references. And also will it be good idea to throw one generic error instead of multiple specific errors. ? Another option is to do same for "record_in" also what you have suggested for domain_in (an extra syscache lookup). ? >Not sure about a > nice solution for domain_in. We might have to resort to an extra > syscache lookup there, but even if we did, it should happen only > once per query so that's not very awful. > >> 3. CheckFunctionValidatorAccess: This is being called from all >> language validator functions. > > This part seems reasonable, since the validator functions are documented > as something users might call, and CheckFunctionValidatorAccess seems > like an apropos place to handle it. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Dilip Kumar writes: > Basically this patch changes error at three places. > 1. getBaseTypeAndTypmod: This is being called from domain_in exposed > function (domain_in-> > domain_state_setup-> getBaseTypeAndTypmod). Though this function is > being called from many other places which are not exposed function, > but I don't this will have any imapct, will this ? I really don't like this solution. Changing this one function, out of the dozens just like it in lsyscache.c, seems utterly random and arbitrary. I'm actually not especially convinced that passing domain_in a random OID needs to not produce an XX000 error. That isn't a user-facing function and people shouldn't be calling it by hand at all. The same goes for record_in. But if we do want those cases to avoid this, I think it's appropriate to fix it in those functions, not decree that essentially-random other parts of the system should bear the responsibility. (Thought experiment: if we changed the order of operations in domain_in so that getBaseTypeAndTypmod were no longer the first place to fail, would we undo this change and then change wherever the failure had moved to? That's pretty messy.) In the case of record_in, it seems like it'd be easy enough to use lookup_rowtype_tupdesc_noerror() instead of lookup_rowtype_tupdesc() and then throw a user-facing error right there. Not sure about a nice solution for domain_in. We might have to resort to an extra syscache lookup there, but even if we did, it should happen only once per query so that's not very awful. > 3. CheckFunctionValidatorAccess: This is being called from all > language validator functions. This part seems reasonable, since the validator functions are documented as something users might call, and CheckFunctionValidatorAccess seems like an apropos place to handle it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Wed, Sep 7, 2016 at 8:52 AM, Haribabu Kommi wrote: > I reviewed and tested the patch. The changes are fine. > This patch provides better error message compared to earlier. > > Marked the patch as "Ready for committer" in commit-fest. Thanks for the review ! -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Fri, Aug 26, 2016 at 7:11 PM, Dilip Kumar wrote: > I have changed this in attached patch.. > I reviewed and tested the patch. The changes are fine. This patch provides better error message compared to earlier. Marked the patch as "Ready for committer" in commit-fest. Regards, Hari Babu Fujitsu Australia
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Wed, Aug 17, 2016 at 6:10 PM, Robert Haas wrote: >> If you are making changes in plpgsql_validator(), then shouldn't we >> make changes in plperl_validator() or plpython_validator()? I see >> that you have made changes to function CheckFunctionValidatorAccess() >> which seems to be getting called from *_validator() (* indicates >> plpgsql/plperl/plpython) functions. Is there a reason for changing >> the *_validator() function? Yes you are right, making changes in CheckFunctionValidatorAccess is sufficient, no need to make changes in *_validator (* indicates plpgsql/plperl/plpython) functions. I have changed this in attached patch.. > > Yeah, when I glanced briefly at this patch, I found myself wondering > whether all of these call sites were actually reachable (and why the > patch contained no test cases to prove it). I have also added test cases to cover all my changes. Basically this patch changes error at three places. 1. getBaseTypeAndTypmod: This is being called from domain_in exposed function (domain_in-> domain_state_setup-> getBaseTypeAndTypmod). Though this function is being called from many other places which are not exposed function, but I don't this will have any imapct, will this ? 2. lookup_type_cache: This is being called from record_in (record_in->lookup_rowtype_tupdesc-> lookup_rowtype_tupdesc_internal->lookup_type_cache). 3. CheckFunctionValidatorAccess: This is being called from all language validator functions. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com cache_lookup_failure_v4.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Wed, Aug 17, 2016 at 7:25 AM, Amit Kapila wrote: > On Wed, Aug 10, 2016 at 11:27 AM, Dilip Kumar wrote: >> On Wed, Aug 10, 2016 at 10:04 AM, Dilip Kumar wrote: >>> This seems better, after checking at other places I found that for >>> invalid type we are using ERRCODE_UNDEFINED_OBJECT and for invalid >>> functions we are using ERRCODE_UNDEFINED_FUNCTION. So I have done the >>> same way. >>> >>> Updated patch attached. >> >> I found some more places where we can get similar error and updated in >> my v3 patch. >> > > @@ -412,7 +412,9 @@ plpgsql_validator(PG_FUNCTION_ARGS) > /* Get the new function's pg_proc entry */ > tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcoid)); > if (!HeapTupleIsValid(tuple)) > - elog(ERROR, "cache lookup failed for function %u", funcoid); > + ereport(ERROR, > + (errcode(ERRCODE_UNDEFINED_FUNCTION), > + errmsg("function with OID %u does not exist", funcoid))); > > If you are making changes in plpgsql_validator(), then shouldn't we > make changes in plperl_validator() or plpython_validator()? I see > that you have made changes to function CheckFunctionValidatorAccess() > which seems to be getting called from *_validator() (* indicates > plpgsql/plperl/plpython) functions. Is there a reason for changing > the *_validator() function? Yeah, when I glanced briefly at this patch, I found myself wondering whether all of these call sites were actually reachable (and why the patch contained no test cases to prove it). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Wed, Aug 10, 2016 at 11:27 AM, Dilip Kumar wrote: > On Wed, Aug 10, 2016 at 10:04 AM, Dilip Kumar wrote: >> This seems better, after checking at other places I found that for >> invalid type we are using ERRCODE_UNDEFINED_OBJECT and for invalid >> functions we are using ERRCODE_UNDEFINED_FUNCTION. So I have done the >> same way. >> >> Updated patch attached. > > I found some more places where we can get similar error and updated in > my v3 patch. > @@ -412,7 +412,9 @@ plpgsql_validator(PG_FUNCTION_ARGS) /* Get the new function's pg_proc entry */ tuple = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcoid)); if (!HeapTupleIsValid(tuple)) - elog(ERROR, "cache lookup failed for function %u", funcoid); + ereport(ERROR, + (errcode(ERRCODE_UNDEFINED_FUNCTION), + errmsg("function with OID %u does not exist", funcoid))); If you are making changes in plpgsql_validator(), then shouldn't we make changes in plperl_validator() or plpython_validator()? I see that you have made changes to function CheckFunctionValidatorAccess() which seems to be getting called from *_validator() (* indicates plpgsql/plperl/plpython) functions. Is there a reason for changing the *_validator() function? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Wed, Aug 10, 2016 at 10:04 AM, Dilip Kumar wrote: > This seems better, after checking at other places I found that for > invalid type we are using ERRCODE_UNDEFINED_OBJECT and for invalid > functions we are using ERRCODE_UNDEFINED_FUNCTION. So I have done the > same way. > > Updated patch attached. I found some more places where we can get similar error and updated in my v3 patch. I don't claim that it fixes all such kind of error, but at least it fixes error reported by sqlsmith plus some additional what I found in similar area. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com cache_lookup_failure_v3.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Tue, Aug 9, 2016 at 6:49 PM, Amit Kapila wrote: > Okay, then how about ERRCODE_UNDEFINED_OBJECT? It seems to be used in > almost similar cases. This seems better, after checking at other places I found that for invalid type we are using ERRCODE_UNDEFINED_OBJECT and for invalid functions we are using ERRCODE_UNDEFINED_FUNCTION. So I have done the same way. Updated patch attached. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com cache_lookup_failure_v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Mon, Aug 8, 2016 at 6:00 PM, Dilip Kumar wrote: > On Mon, Aug 8, 2016 at 5:23 PM, Amit Kapila wrote: >> >> >> Did you consider to use ERRCODE_UNDEFINED_COLUMN with error messages >> like: "type %u does not exit" or "type id %u does not exit"? Errorcode >> ERRCODE_UNDEFINED_COLUMN seems to be used for syscache lookup failure >> cases at certain places in code. > > > But I think, OBJECT will be more appropriate than COLUMN at this place. > Okay, then how about ERRCODE_UNDEFINED_OBJECT? It seems to be used in almost similar cases. > However I can change error message to "type id %u does not exit" if this > seems better ? > I think that is better. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Mon, Aug 8, 2016 at 5:23 PM, Amit Kapila wrote: > > > > Your other options and the option you choose are same. > > > Sorry typo, I meant ERRCODE_INVALID_OBJECT_DEFINITION. > Did you consider to use ERRCODE_UNDEFINED_COLUMN with error messages > like: "type %u does not exit" or "type id %u does not exit"? Errorcode > ERRCODE_UNDEFINED_COLUMN seems to be used for syscache lookup failure > cases at certain places in code. But I think, OBJECT will be more appropriate than COLUMN at this place. However I can change error message to "type id %u does not exit" if this seems better ? -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Mon, Aug 8, 2016 at 5:11 PM, Amit Kapila wrote: > On Mon, Aug 8, 2016 at 4:58 PM, Dilip Kumar wrote: >> >> On Wed, Aug 3, 2016 at 5:35 AM, Robert Haas wrote: >>> >>> I think that's just making life difficult. If nothing else, sqlsmith >>> hunts around for functions it can call that return internal errors, >>> and if we refuse to fix all of them to return user-facing errors, then >>> it's just crap for the people running sqlsmith to sift through and >>> it's a judgment call whether to fix each particular case. Even aside >>> from that, I think it's much better to have a clear and unambiguous >>> rule that elog is only for can't-happen things, not >>> we-don't-recommend-it things. >> >> >> I have changed for all these function to report more appropriate error with >> ereport. >> >> I used ERRCODE_WRONG_OBJECT_TYPE error code for reporting such errors. >> I think this is closest error code among all existing error codes, other >> options can be (ERRCODE_WRONG_OBJECT_TYPE). >> > > Your other options and the option you choose are same. > Did you consider to use ERRCODE_UNDEFINED_COLUMN with error messages like: "type %u does not exit" or "type id %u does not exit"? Errorcode ERRCODE_UNDEFINED_COLUMN seems to be used for syscache lookup failure cases at certain places in code. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Mon, Aug 8, 2016 at 4:58 PM, Dilip Kumar wrote: > > On Wed, Aug 3, 2016 at 5:35 AM, Robert Haas wrote: >> >> I think that's just making life difficult. If nothing else, sqlsmith >> hunts around for functions it can call that return internal errors, >> and if we refuse to fix all of them to return user-facing errors, then >> it's just crap for the people running sqlsmith to sift through and >> it's a judgment call whether to fix each particular case. Even aside >> from that, I think it's much better to have a clear and unambiguous >> rule that elog is only for can't-happen things, not >> we-don't-recommend-it things. > > > I have changed for all these function to report more appropriate error with > ereport. > > I used ERRCODE_WRONG_OBJECT_TYPE error code for reporting such errors. > I think this is closest error code among all existing error codes, other > options can be (ERRCODE_WRONG_OBJECT_TYPE). > Your other options and the option you choose are same. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Wed, Aug 3, 2016 at 5:35 AM, Robert Haas wrote: > I think that's just making life difficult. If nothing else, sqlsmith > hunts around for functions it can call that return internal errors, > and if we refuse to fix all of them to return user-facing errors, then > it's just crap for the people running sqlsmith to sift through and > it's a judgment call whether to fix each particular case. Even aside > from that, I think it's much better to have a clear and unambiguous > rule that elog is only for can't-happen things, not > we-don't-recommend-it things. > I have changed for all these function to report more appropriate error with ereport. I used ERRCODE_WRONG_OBJECT_TYPE error code for reporting such errors. I think this is closest error code among all existing error codes, other options can be (ERRCODE_WRONG_OBJECT_TYPE). But I think ERRCODE_WRONG_OBJECT_TYPE is better option. Patch attached for the same. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com cache_lookup_failure_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Tue, Aug 2, 2016 at 5:05 PM, Robert Haas wrote: > I think that's just making life difficult. If nothing else, sqlsmith > hunts around for functions it can call that return internal errors, > and if we refuse to fix all of them to return user-facing errors, then > it's just crap for the people running sqlsmith to sift through and > it's a judgment call whether to fix each particular case. Even aside > from that, I think it's much better to have a clear and unambiguous > rule that elog is only for can't-happen things, not > we-don't-recommend-it things. +1. This also has value in the context of automatically surfacing situations where "can't happen" errors do in fact happen at scale. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Tue, Aug 2, 2016 at 7:16 PM, Tom Lane wrote: > Robert Haas writes: >> On Tue, Aug 2, 2016 at 6:03 AM, Dilip Kumar wrote: >>> There are many more such exposed functions, which can throw cache lookup >>> failure error if we pass wrong value. >>> >>> i.e. >>> record_in >>> domain_in >>> fmgr_c_validator >>> plpgsql_validator >>> pg_procedure_is_visible >>> >>> Are we planning to change these also.. > >> I think if they are SQL-callable functions that users can invoke from >> a SQL prompt, they shouldn't throw "cache lookup failed" errors or, >> for that matter, any other error that is reported with elog(). > > FWIW, I would restrict that to "functions that users are *intended* to > call from a SQL prompt". If you are fooling around calling internal > functions with garbage input, you should not be surprised to get an > internal error back. So of the above list, only pg_procedure_is_visible > seems particularly worth changing to me. And for it, returning NULL > (ie, "unknown") seems reasonable enough. I think that's just making life difficult. If nothing else, sqlsmith hunts around for functions it can call that return internal errors, and if we refuse to fix all of them to return user-facing errors, then it's just crap for the people running sqlsmith to sift through and it's a judgment call whether to fix each particular case. Even aside from that, I think it's much better to have a clear and unambiguous rule that elog is only for can't-happen things, not we-don't-recommend-it things. > There's no such animal as pg_procedure_is_visible. I suspect that's an > EDB-ism that hasn't caught up with commit 66bb74dbe8206a35. Yep. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Robert Haas writes: > On Tue, Aug 2, 2016 at 6:03 AM, Dilip Kumar wrote: >> There are many more such exposed functions, which can throw cache lookup >> failure error if we pass wrong value. >> >> i.e. >> record_in >> domain_in >> fmgr_c_validator >> plpgsql_validator >> pg_procedure_is_visible >> >> Are we planning to change these also.. > I think if they are SQL-callable functions that users can invoke from > a SQL prompt, they shouldn't throw "cache lookup failed" errors or, > for that matter, any other error that is reported with elog(). FWIW, I would restrict that to "functions that users are *intended* to call from a SQL prompt". If you are fooling around calling internal functions with garbage input, you should not be surprised to get an internal error back. So of the above list, only pg_procedure_is_visible seems particularly worth changing to me. And for it, returning NULL (ie, "unknown") seems reasonable enough. Actually, it looks to me like we already fixed that for the built-in is_visible functions: # select pg_function_is_visible(0) is null; ?column? -- t (1 row) There's no such animal as pg_procedure_is_visible. I suspect that's an EDB-ism that hasn't caught up with commit 66bb74dbe8206a35. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Tue, Aug 2, 2016 at 6:03 AM, Dilip Kumar wrote: > There are many more such exposed functions, which can throw cache lookup > failure error if we pass wrong value. > > i.e. > record_in > domain_in > fmgr_c_validator > plpgsql_validator > pg_procedure_is_visible > > Are we planning to change these also.. I think if they are SQL-callable functions that users can invoke from a SQL prompt, they shouldn't throw "cache lookup failed" errors or, for that matter, any other error that is reported with elog(). Now, I don't necessarily think that these functions should return NULL in that case the way we did with the others; that's not a sensible behavior for a type-input function AFAIK. But we should emit a better error. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Tue, Aug 2, 2016 at 3:33 PM, Dilip Kumar wrote: > There are many more such exposed functions, which can throw cache lookup > failure error if we pass wrong value. > > i.e. > record_in > domain_in > fmgr_c_validator > edb_get_objecttypeheaddef > plpgsql_validator > pg_procedure_is_visible > > Are we planning to change these also.. > > below query is one example (from sqlsmith). > > ERROR: cache lookup failed for procedure 0 > > postgres=# select > > postgres-# pg_catalog.record_in( > > postgres(# cast(pg_catalog.numerictypmodout( > > postgres(# cast(98 as integer)) as cstring), > > postgres(# cast(1 as oid), > > postgres(# cast(35 as integer)); > > ERROR: cache lookup failed for type 1 > By mistake I mentioned edb_get_objecttypeheaddef function also, please ignore that. So actual list is as below.. record_in domain_in fmgr_c_validator plpgsql_validator pg_procedure_is_visible -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Sat, Jul 30, 2016 at 4:27 AM, Michael Paquier wrote: > OK for me. Thanks for the commit. There are many more such exposed functions, which can throw cache lookup failure error if we pass wrong value. i.e. record_in domain_in fmgr_c_validator edb_get_objecttypeheaddef plpgsql_validator pg_procedure_is_visible Are we planning to change these also.. below query is one example (from sqlsmith). ERROR: cache lookup failed for procedure 0 postgres=# select postgres-# pg_catalog.record_in( postgres(# cast(pg_catalog.numerictypmodout( postgres(# cast(98 as integer)) as cstring), postgres(# cast(1 as oid), postgres(# cast(35 as integer)); ERROR: cache lookup failed for type 1 -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Sat, Jul 30, 2016 at 1:17 AM, Tom Lane wrote: > Robert Haas writes: >> On Tue, Jul 26, 2016 at 9:27 PM, Michael Paquier >> wrote: >>> While looking at the series of functions pg_get_*, I have noticed as >>> well that pg_get_userbyid() returns "unknown (OID=%u)" when it does >>> not know a user. Perhaps we'd want to change that to NULL for >>> consistency with the rest? > >> That's probably correct in theory, but it's a little less closely >> related, and I'm not entirely sure how far we want to go with this. >> Remember, the original purpose was to avoid having an internal error >> (cache lookup failed, XX000) exposed as a user-visible error message. >> Are we at risk from veering from actual bug-fixing off into useless >> tinkering? Not sure. > > I'd vote for leaving that one alone; yeah, it's a bit inconsistent > now, but no one has complained about its behavior. OK for me. Thanks for the commit. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Robert Haas writes: > On Tue, Jul 26, 2016 at 9:27 PM, Michael Paquier > wrote: >> While looking at the series of functions pg_get_*, I have noticed as >> well that pg_get_userbyid() returns "unknown (OID=%u)" when it does >> not know a user. Perhaps we'd want to change that to NULL for >> consistency with the rest? > That's probably correct in theory, but it's a little less closely > related, and I'm not entirely sure how far we want to go with this. > Remember, the original purpose was to avoid having an internal error > (cache lookup failed, XX000) exposed as a user-visible error message. > Are we at risk from veering from actual bug-fixing off into useless > tinkering? Not sure. I'd vote for leaving that one alone; yeah, it's a bit inconsistent now, but no one has complained about its behavior. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Tue, Jul 26, 2016 at 9:27 PM, Michael Paquier wrote: > On Wed, Jul 27, 2016 at 5:11 AM, Robert Haas wrote: >> Committed with minor kibitizing: you don't need an "else" after a >> statement that transfers control out of the function. > > Thanks. Right, I forgot that. > >> Shouldn't >> pg_get_function_arguments, pg_get_function_identity_arguments, >> pg_get_function_result, and pg_get_function_arg_default get the same >> treatment? > > Changing all of them make sense. Please see attached. Committed. > While looking at the series of functions pg_get_*, I have noticed as > well that pg_get_userbyid() returns "unknown (OID=%u)" when it does > not know a user. Perhaps we'd want to change that to NULL for > consistency with the rest? That's probably correct in theory, but it's a little less closely related, and I'm not entirely sure how far we want to go with this. Remember, the original purpose was to avoid having an internal error (cache lookup failed, XX000) exposed as a user-visible error message. Are we at risk from veering from actual bug-fixing off into useless tinkering? Not sure. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Wed, Jul 27, 2016 at 5:11 AM, Robert Haas wrote: > Committed with minor kibitizing: you don't need an "else" after a > statement that transfers control out of the function. Thanks. Right, I forgot that. > Shouldn't > pg_get_function_arguments, pg_get_function_identity_arguments, > pg_get_function_result, and pg_get_function_arg_default get the same > treatment? Changing all of them make sense. Please see attached. While looking at the series of functions pg_get_*, I have noticed as well that pg_get_userbyid() returns "unknown (OID=%u)" when it does not know a user. Perhaps we'd want to change that to NULL for consistency with the rest? -- Michael diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index 2915e21..51d0c23 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -2238,11 +2238,11 @@ pg_get_function_arguments(PG_FUNCTION_ARGS) StringInfoData buf; HeapTuple proctup; - initStringInfo(&buf); - proctup = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid)); if (!HeapTupleIsValid(proctup)) - elog(ERROR, "cache lookup failed for function %u", funcid); + PG_RETURN_NULL(); + + initStringInfo(&buf); (void) print_function_arguments(&buf, proctup, false, true); @@ -2264,11 +2264,11 @@ pg_get_function_identity_arguments(PG_FUNCTION_ARGS) StringInfoData buf; HeapTuple proctup; - initStringInfo(&buf); - proctup = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid)); if (!HeapTupleIsValid(proctup)) - elog(ERROR, "cache lookup failed for function %u", funcid); + PG_RETURN_NULL(); + + initStringInfo(&buf); (void) print_function_arguments(&buf, proctup, false, false); @@ -2289,11 +2289,11 @@ pg_get_function_result(PG_FUNCTION_ARGS) StringInfoData buf; HeapTuple proctup; - initStringInfo(&buf); - proctup = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid)); if (!HeapTupleIsValid(proctup)) - elog(ERROR, "cache lookup failed for function %u", funcid); + PG_RETURN_NULL(); + + initStringInfo(&buf); print_function_rettype(&buf, proctup); @@ -2547,7 +2547,7 @@ pg_get_function_arg_default(PG_FUNCTION_ARGS) proctup = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid)); if (!HeapTupleIsValid(proctup)) - elog(ERROR, "cache lookup failed for function %u", funcid); + PG_RETURN_NULL(); numargs = get_func_arg_info(proctup, &argtypes, &argnames, &argmodes); if (nth_arg < 1 || nth_arg > numargs || !is_input_argument(nth_arg - 1, argmodes)) diff --git a/src/test/regress/expected/rules.out b/src/test/regress/expected/rules.out index 7c633ac..c5ff318 100644 --- a/src/test/regress/expected/rules.out +++ b/src/test/regress/expected/rules.out @@ -3094,3 +3094,33 @@ SELECT pg_get_viewdef(0); (1 row) +SELECT pg_get_function_arguments(0); + pg_get_function_arguments +--- + +(1 row) + +SELECT pg_get_function_identity_arguments(0); + pg_get_function_identity_arguments + + +(1 row) + +SELECT pg_get_function_result(0); + pg_get_function_result + + +(1 row) + +SELECT pg_get_function_arg_default(0, 0); + pg_get_function_arg_default +- + +(1 row) + +SELECT pg_get_function_arg_default('pg_class'::regclass, 0); + pg_get_function_arg_default +- + +(1 row) + diff --git a/src/test/regress/sql/rules.sql b/src/test/regress/sql/rules.sql index 111c9ba..835945f 100644 --- a/src/test/regress/sql/rules.sql +++ b/src/test/regress/sql/rules.sql @@ -1152,3 +1152,8 @@ SELECT pg_get_indexdef(0); SELECT pg_get_ruledef(0); SELECT pg_get_triggerdef(0); SELECT pg_get_viewdef(0); +SELECT pg_get_function_arguments(0); +SELECT pg_get_function_identity_arguments(0); +SELECT pg_get_function_result(0); +SELECT pg_get_function_arg_default(0, 0); +SELECT pg_get_function_arg_default('pg_class'::regclass, 0); -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Mon, Jun 6, 2016 at 3:30 AM, Michael Paquier wrote: > On Sun, Jun 5, 2016 at 11:16 PM, Tom Lane wrote: >> Michael Paquier writes: >>> Still, on back branches I think that it would be a good idea to have a >>> better error message for the ones that already throw an error, like >>> "trigger with OID %u does not exist". Switching from elog to ereport >>> would be a good idea, but wouldn't it be a problem to change what is >>> user-visible? >> >> If we're going to touch this behavior in the back branches, I would >> vote for changing it the same as we do in HEAD. I do not think that >> making a user-visible behavior change in a minor release and then a >> different change in the next major is good strategy. > > So, I have finished with the patch attached that changes all the > *def() functions to return NULL when an object does not exist. Some > callers of the index and constraint definitions still expect a cache > lookup error if the object does not exist, and this patch is careful > about that. I think that it would be nice to get that in 9.6, but I > won't fight hard for it either. So I am parking it for now in the > upcoming CF. > >> But, given the relative shortage of complaints from the field, I'd >> be more inclined to just do nothing in back branches. There might >> be people out there with code depending on the current behavior, >> and they'd be annoyed if we change it in a minor release. > > Sure. That's the safest position. Thinking about the existing behavior > for some of the index and constraint callers, even just changing the > error message does not bring much as some of them really expect to > fail with a cache lookup error. Committed with minor kibitizing: you don't need an "else" after a statement that transfers control out of the function. Shouldn't pg_get_function_arguments, pg_get_function_identity_arguments, pg_get_function_result, and pg_get_function_arg_default get the same treatment? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Sun, Jun 5, 2016 at 11:16 PM, Tom Lane wrote: > Michael Paquier writes: >> Still, on back branches I think that it would be a good idea to have a >> better error message for the ones that already throw an error, like >> "trigger with OID %u does not exist". Switching from elog to ereport >> would be a good idea, but wouldn't it be a problem to change what is >> user-visible? > > If we're going to touch this behavior in the back branches, I would > vote for changing it the same as we do in HEAD. I do not think that > making a user-visible behavior change in a minor release and then a > different change in the next major is good strategy. So, I have finished with the patch attached that changes all the *def() functions to return NULL when an object does not exist. Some callers of the index and constraint definitions still expect a cache lookup error if the object does not exist, and this patch is careful about that. I think that it would be nice to get that in 9.6, but I won't fight hard for it either. So I am parking it for now in the upcoming CF. > But, given the relative shortage of complaints from the field, I'd > be more inclined to just do nothing in back branches. There might > be people out there with code depending on the current behavior, > and they'd be annoyed if we change it in a minor release. Sure. That's the safest position. Thinking about the existing behavior for some of the index and constraint callers, even just changing the error message does not bring much as some of them really expect to fail with a cache lookup error. -- Michael diff --git a/src/backend/utils/adt/ruleutils.c b/src/backend/utils/adt/ruleutils.c index 8cb3075..0651cb2 100644 --- a/src/backend/utils/adt/ruleutils.c +++ b/src/backend/utils/adt/ruleutils.c @@ -314,9 +314,9 @@ static char *pg_get_ruledef_worker(Oid ruleoid, int prettyFlags); static char *pg_get_indexdef_worker(Oid indexrelid, int colno, const Oid *excludeOps, bool attrsOnly, bool showTblSpc, - int prettyFlags); + int prettyFlags, bool missing_ok); static char *pg_get_constraintdef_worker(Oid constraintId, bool fullCommand, - int prettyFlags); + int prettyFlags, bool missing_ok); static text *pg_get_expr_worker(text *expr, Oid relid, const char *relname, int prettyFlags); static int print_function_arguments(StringInfo buf, HeapTuple proctup, @@ -466,9 +466,16 @@ pg_get_ruledef(PG_FUNCTION_ARGS) { Oid ruleoid = PG_GETARG_OID(0); int prettyFlags; + char *res; prettyFlags = PRETTYFLAG_INDENT; - PG_RETURN_TEXT_P(string_to_text(pg_get_ruledef_worker(ruleoid, prettyFlags))); + + res = pg_get_ruledef_worker(ruleoid, prettyFlags); + + if (res == NULL) + PG_RETURN_NULL(); + + PG_RETURN_TEXT_P(string_to_text(res)); } @@ -478,9 +485,16 @@ pg_get_ruledef_ext(PG_FUNCTION_ARGS) Oid ruleoid = PG_GETARG_OID(0); bool pretty = PG_GETARG_BOOL(1); int prettyFlags; + char *res; prettyFlags = pretty ? PRETTYFLAG_PAREN | PRETTYFLAG_INDENT : PRETTYFLAG_INDENT; - PG_RETURN_TEXT_P(string_to_text(pg_get_ruledef_worker(ruleoid, prettyFlags))); + + res = pg_get_ruledef_worker(ruleoid, prettyFlags); + + if (res == NULL) + PG_RETURN_NULL(); + + PG_RETURN_TEXT_P(string_to_text(res)); } @@ -532,7 +546,12 @@ pg_get_ruledef_worker(Oid ruleoid, int prettyFlags) if (spirc != SPI_OK_SELECT) elog(ERROR, "failed to get pg_rewrite tuple for rule %u", ruleoid); if (SPI_processed != 1) - appendStringInfoChar(&buf, '-'); + { + /* + * There is no tuple data available here, just keep the output buffer + * empty. + */ + } else { /* @@ -549,6 +568,9 @@ pg_get_ruledef_worker(Oid ruleoid, int prettyFlags) if (SPI_finish() != SPI_OK_FINISH) elog(ERROR, "SPI_finish failed"); + if (buf.len == 0) + return NULL; + return buf.data; } @@ -564,9 +586,16 @@ pg_get_viewdef(PG_FUNCTION_ARGS) /* By OID */ Oid viewoid = PG_GETARG_OID(0); int prettyFlags; + char *res; prettyFlags = PRETTYFLAG_INDENT; - PG_RETURN_TEXT_P(string_to_text(pg_get_viewdef_worker(viewoid, prettyFlags, WRAP_COLUMN_DEFAULT))); + + res = pg_get_viewdef_worker(viewoid, prettyFlags, WRAP_COLUMN_DEFAULT); + + if (res == NULL) + PG_RETURN_NULL(); + + PG_RETURN_TEXT_P(string_to_text(res)); } @@ -577,9 +606,16 @@ pg_get_viewdef_ext(PG_FUNCTION_ARGS) Oid viewoid = PG_GETARG_OID(0); bool pretty = PG_GETARG_BOOL(1); int prettyFlags; + char *res; prettyFlags = pretty ? PRETTYFLAG_PAREN | PRETTYFLAG_INDENT : PRETTYFLAG_INDENT; - PG_RETURN_TEXT_P(string_to_text(pg_get_viewdef_worker(viewoid, prettyFlags, WRAP_COLUMN_DEFAULT))); + + res = pg_get_viewdef_worker(viewoid, prettyFlags, WRAP_COLUMN_DEFAULT); + + if (res == NULL) + PG_RETURN_NULL(); + + PG_RETURN_TEXT_P(string_to_text(res)); } Datum @@ -589,10 +625,17 @@ pg_get_viewdef_wrap(PG_FUNCTION_ARGS) Oid viewoid = PG_GETARG_OID(0); int wrap = PG_GETARG_INT32(1); int prettyFlags; + char *res; /* calli
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Michael Paquier writes: > Still, on back branches I think that it would be a good idea to have a > better error message for the ones that already throw an error, like > "trigger with OID %u does not exist". Switching from elog to ereport > would be a good idea, but wouldn't it be a problem to change what is > user-visible? If we're going to touch this behavior in the back branches, I would vote for changing it the same as we do in HEAD. I do not think that making a user-visible behavior change in a minor release and then a different change in the next major is good strategy. But, given the relative shortage of complaints from the field, I'd be more inclined to just do nothing in back branches. There might be people out there with code depending on the current behavior, and they'd be annoyed if we change it in a minor release. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Sun, Jun 5, 2016 at 4:28 PM, Michael Paquier wrote: > On Sun, Jun 5, 2016 at 12:29 AM, Tom Lane wrote: >> Michael Paquier writes: >>> This is still failing: >>> =# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL; >>> ERROR: XX000: cache lookup failed for index 2619 >>> LOCATION: pg_get_indexdef_worker, ruleutils.c:1054 >>> Do we want to improve at least the error message? >> >> Maybe we should just make the function silently return NULL if the OID >> doesn't refer to an index relation. There's precedent for that sort >> of behavior ... > > For views it is simply returned "Not a view", and for rules that's > "-". All the others behave like similarly to indexes: > =# select pg_get_constraintdef(0); > ERROR: XX000: cache lookup failed for constraint 0 > =# select pg_get_triggerdef(0); > ERROR: XX000: could not find tuple for trigger 0 > =# select pg_get_functiondef(0); > ERROR: XX000: cache lookup failed for function 0 > =# select pg_get_viewdef(0); > pg_get_viewdef > > Not a view > (1 row) > =# select pg_get_ruledef(0); > pg_get_ruledef > > - > (1 row) > > So yes we could just return NULL for all of them, or make them throw > an error. Anyway, my vote goes for a HEAD-only change, and just throw > NULL... This way an application still has the option to fallback to > something else with a CASE at SQL level without using plpgsql to catch > the error. > > Also, I have not monitored the code much regarding some code paths > that rely on the existing rules, but I would imagine that there are > things depending on the current behavior as well. Still, on back branches I think that it would be a good idea to have a better error message for the ones that already throw an error, like "trigger with OID %u does not exist". Switching from elog to ereport would be a good idea, but wouldn't it be a problem to change what is user-visible? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Sun, Jun 5, 2016 at 12:29 AM, Tom Lane wrote: > Michael Paquier writes: >> This is still failing: >> =# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL; >> ERROR: XX000: cache lookup failed for index 2619 >> LOCATION: pg_get_indexdef_worker, ruleutils.c:1054 >> Do we want to improve at least the error message? > > Maybe we should just make the function silently return NULL if the OID > doesn't refer to an index relation. There's precedent for that sort > of behavior ... For views it is simply returned "Not a view", and for rules that's "-". All the others behave like similarly to indexes: =# select pg_get_constraintdef(0); ERROR: XX000: cache lookup failed for constraint 0 =# select pg_get_triggerdef(0); ERROR: XX000: could not find tuple for trigger 0 =# select pg_get_functiondef(0); ERROR: XX000: cache lookup failed for function 0 =# select pg_get_viewdef(0); pg_get_viewdef Not a view (1 row) =# select pg_get_ruledef(0); pg_get_ruledef - (1 row) So yes we could just return NULL for all of them, or make them throw an error. Anyway, my vote goes for a HEAD-only change, and just throw NULL... This way an application still has the option to fallback to something else with a CASE at SQL level without using plpgsql to catch the error. Also, I have not monitored the code much regarding some code paths that rely on the existing rules, but I would imagine that there are things depending on the current behavior as well. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Michael Paquier writes: > This is still failing: > =# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL; > ERROR: XX000: cache lookup failed for index 2619 > LOCATION: pg_get_indexdef_worker, ruleutils.c:1054 > Do we want to improve at least the error message? Maybe we should just make the function silently return NULL if the OID doesn't refer to an index relation. There's precedent for that sort of behavior ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Fri, Aug 7, 2015 at 9:21 AM, Tom Lane wrote: > Andreas Seltenreich writes: >> Tom Lane writes: >>> On 08/01/2015 05:59 PM, Tom Lane wrote: Well, I certainly think all of these represent bugs: 1 | ERROR: could not find pathkey item to sort > >>> Hmm ... I see no error with these queries as of today's HEAD or >>> back-branch tips. I surmise that this was triggered by one of the other >>> recently-fixed bugs, though the connection isn't obvious offhand. > >> I still see this error in master as of b8cbe43, but the queries are >> indeed a pita to reproduce. The one below is the only one so far that >> is robust against running ANALYZE on the regression db, and also >> reproduces when I run it as an EXTRA_TEST with make check. > > Got that one, thanks! But we've seen this error message before in all > sorts of weird situations, so I wouldn't assume that all the cases you've > come across had the same cause. (resurrecting an old thread) This is still failing: =# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL; ERROR: XX000: cache lookup failed for index 2619 LOCATION: pg_get_indexdef_worker, ruleutils.c:1054 Do we want to improve at least the error message? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Andreas Seltenreich writes: > Tom Lane writes: >> On 08/01/2015 05:59 PM, Tom Lane wrote: >>> Well, I certainly think all of these represent bugs: >>> 1 | ERROR: could not find pathkey item to sort >> Hmm ... I see no error with these queries as of today's HEAD or >> back-branch tips. I surmise that this was triggered by one of the other >> recently-fixed bugs, though the connection isn't obvious offhand. > I still see this error in master as of b8cbe43, but the queries are > indeed a pita to reproduce. The one below is the only one so far that > is robust against running ANALYZE on the regression db, and also > reproduces when I run it as an EXTRA_TEST with make check. Got that one, thanks! But we've seen this error message before in all sorts of weird situations, so I wouldn't assume that all the cases you've come across had the same cause. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Tom Lane writes: >> On 08/01/2015 05:59 PM, Tom Lane wrote: >>> Well, I certainly think all of these represent bugs: 1 | ERROR: could not find pathkey item to sort > > Hmm ... I see no error with these queries as of today's HEAD or > back-branch tips. I surmise that this was triggered by one of the other > recently-fixed bugs, though the connection isn't obvious offhand. I still see this error in master as of b8cbe43, but the queries are indeed a pita to reproduce. The one below is the only one so far that is robust against running ANALYZE on the regression db, and also reproduces when I run it as an EXTRA_TEST with make check. regards, Andreas select rel_217088662.a as c0, rel_217088554.a as c1, rel_217088662.b as c2, subq_34235266.c0 as c3, rel_217088660.id2 as c4, rel_217088660.id2 as c5 from public.clstr_tst as rel_217088554 inner join (select rel_217088628.a as c0 from public.rtest_vview3 as rel_217088628 where (rel_217088628.b !~ rel_217088628.b) and rel_217088628.b ~~* rel_217088628.b) or (rel_217088628.b ~* rel_217088628.b)) or (rel_217088628.b <> rel_217088628.b)) or (rel_217088628.b = rel_217088628.b))) as subq_34235266 inner join public.num_exp_mul as rel_217088660 inner join public.onek2 as rel_217088661 on (rel_217088660.id1 = rel_217088661.unique1 ) on (subq_34235266.c0 = rel_217088660.id1 ) inner join public.main_table as rel_217088662 on (rel_217088661.unique2 = rel_217088662.a ) on (rel_217088554.b = rel_217088660.id1 ) where rel_217088554.d = rel_217088554.c fetch first 94 rows only; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Piotr Stefaniak writes: > On 08/05/2015 02:24 AM, Tom Lane wrote: >> Piotr Stefaniak writes: >>> Set join_collapse_limit = 32 and you should see the error. >> Ah ... now I get that error on the smaller query, but the larger one >> (that you put in an attachment) still doesn't show any problem. >> Do you have any other nondefault settings? > Sorry, that needs from_collapse_limit = 32 as well. Yeah, I assumed as much, but it still doesn't happen for me. Possibly something platform-dependent in statistics, or something like that. Anyway, I fixed the problem exposed by the smaller query; would you check that the larger one is okay for you now? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Piotr Stefaniak writes: > On 08/03/2015 09:18 PM, Tom Lane wrote: >> ... but I can't reproduce it on HEAD with either of these queries. >> Not clear why you're getting different results. > I'm terribly sorry, but I didn't notice that postgresql.conf was modified... > Set join_collapse_limit = 32 and you should see the error. Ah ... now I get that error on the smaller query, but the larger one (that you put in an attachment) still doesn't show any problem. Do you have any other nondefault settings? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Hi there,I've been following the sqlsmith work and wanted to jump in and try it out. I took Peter's idea and tried building postgres with the flags suggested but it was hard to get anything working. I'm on commit 85e5e222b1dd02f135a8c3bf387d0d6d88e669bd (Tue Aug 4 14:55:32 2015 -0400) Configure arguments:./configure --prefix=$HOME/pkg CC=clang CFLAGS='-O1 -g -fsanitize=address -fno-omit-frame-pointer -fno-optimize-sibling-calls' --enable-cassert I had to make a simple leak suppression file: $ cat leak.supp leak:save_ps_display_args leak:__GI___strdup $ export LSAN_OPTIONS=suppressions=leak.supp And then I could run postgres. After 50,000 queries, I'm left with the following report: queries: 50514 AST stats (avg): height = 7.29877 nodes = 37.8156 296 ERROR: canceling statement due to statement timeout 166 ERROR: invalid regular expression: quantifier operand invalid 26 ERROR: could not determine which collation to use for string comparison 23 ERROR: cannot compare arrays of different element types 12 ERROR: invalid regular expression: brackets [] not balanced 5 ERROR: cache lookup failed for index 2619 2 ERROR: invalid regular expression: parentheses () not balanced error rate: 0.0104921 AddressSanitizer didn't fire except for the suppressed leaks. The suppressed leaks were only hit at the beginning: - Suppressions used: count bytes template 1 520 save_ps_display_args 1 10 __GI___strdup - sqlsmith is a cool little piece of kit and I see a lot of room for on going work (performance bumps for more queries per second; more db back ends; different fuzzers). Yours,Ewan Higgs From: Peter Geoghegan To: Andreas Seltenreich Cc: Pg Hackers Sent: Sunday, 2 August 2015, 10:39 Subject: Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c On Fri, Jul 31, 2015 at 5:56 PM, Andreas Seltenreich wrote: > sqlsmith triggered the following assertion in master (c188204). Thanks for writing sqlsmith. It seems like a great tool. I wonder, are you just running the tool with assertions enabled when PostgreSQL is built? If so, it might make sense to make various problems more readily detected. As you may know, Clang has a pretty decent option called AddressSanitizer that can detect memory errors as they occur with an overhead that is not excessive. One might use the following configure arguments when building PostgreSQL to use AddressSanitizer: ./configure CC=clang CFLAGS='-O1 -g -fsanitize=address -fno-omit-frame-pointer -fno-optimize-sibling-calls' --enable-cassert Of course, it remains to be seen if this pays for itself. Apparently the tool has about a 2x overhead [1]. I'm really not sure that you'll find any more bugs this way, but it's certainly possible that you'll find a lot more. Given your success in finding bugs without using AddressSanitizer, introducing it may be premature. [1] http://clang.llvm.org/docs/AddressSanitizer.html#introduction -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Piotr Stefaniak writes: > How about this one? > 1 ERROR: could not find RelOptInfo for given relids That would be a bug, for sure ... > It's triggered on 13bba02271dce865cd20b6f49224889c73fed4e7 by this query > and the attached one: ... but I can't reproduce it on HEAD with either of these queries. Not clear why you're getting different results. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Peter Geoghegan writes: > On Fri, Jul 31, 2015 at 5:56 PM, Andreas Seltenreich > wrote: >> sqlsmith triggered the following assertion in master (c188204). > > Thanks for writing sqlsmith. It seems like a great tool. > > I wonder, are you just running the tool with assertions enabled when > PostgreSQL is built? Right. I have to admit my testing setup is still more tailored towards testing sqlsmith than postgres. > If so, it might make sense to make various problems more readily > detected. As you may know, Clang has a pretty decent option called > AddressSanitizer that can detect memory errors as they occur with an > overhead that is not excessive. I didn't known this clang feature yet, thanks for pointing it out. I considered running some instances under valgrind to detect these, but the performance penalty seemed not worth it. > One might use the following configure arguments when building > PostgreSQL to use AddressSanitizer: > > ./configure CC=clang CFLAGS='-O1 -g -fsanitize=address > -fno-omit-frame-pointer -fno-optimize-sibling-calls' --enable-cassert A quick attempt to sneak these in made my ansible playbooks unhappy due to "make check" failures and other generated noise. I'll try to have an instance with the AddressSanitizer active soon though. > Of course, it remains to be seen if this pays for itself. Apparently > the tool has about a 2x overhead [1]. I'm really not sure that you'll > find any more bugs this way, but it's certainly possible that you'll > find a lot more. Given your success in finding bugs without using > AddressSanitizer, introducing it may be premature. Piotr also suggested on IRC to run coverage tests w/ sqlsmith. This could yield valuable hints in which direction to extend sqlsmith's grammar. Thanks, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Piotr Stefaniak writes: > On 08/01/2015 05:59 PM, Tom Lane wrote: >> Well, I certainly think all of these represent bugs: >>> 1 | ERROR: could not find pathkey item to sort > sqlsmith was able to find these two queries that trigger the error on my > machine: Hmm ... I see no error with these queries as of today's HEAD or back-branch tips. I surmise that this was triggered by one of the other recently-fixed bugs, though the connection isn't obvious offhand. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
On Fri, Jul 31, 2015 at 5:56 PM, Andreas Seltenreich wrote: > sqlsmith triggered the following assertion in master (c188204). Thanks for writing sqlsmith. It seems like a great tool. I wonder, are you just running the tool with assertions enabled when PostgreSQL is built? If so, it might make sense to make various problems more readily detected. As you may know, Clang has a pretty decent option called AddressSanitizer that can detect memory errors as they occur with an overhead that is not excessive. One might use the following configure arguments when building PostgreSQL to use AddressSanitizer: ./configure CC=clang CFLAGS='-O1 -g -fsanitize=address -fno-omit-frame-pointer -fno-optimize-sibling-calls' --enable-cassert Of course, it remains to be seen if this pays for itself. Apparently the tool has about a 2x overhead [1]. I'm really not sure that you'll find any more bugs this way, but it's certainly possible that you'll find a lot more. Given your success in finding bugs without using AddressSanitizer, introducing it may be premature. [1] http://clang.llvm.org/docs/AddressSanitizer.html#introduction -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Tom Lane writes: > Well, I certainly think all of these represent bugs: > > [...] thanks for priorizing them. I'll try to digest them somewhat before posting. > This one's pretty darn odd, because 2619 is pg_statistic and not an index > at all: > >> 4 | ERROR: cache lookup failed for index 2619 This is actually the one from the README :-). Quoting to spare media discontinuity: --8<---cut here---start->8--- Taking a closer look at it reveals that it happens when you query a certain catalog view like this: self=# select indexdef from pg_catalog.pg_indexes where indexdef is not NULL; FEHLER: cache lookup failed for index 2619 This is because the planner then puts pg_get_indexdef(oid) in a context where it sees non-index-oids, which causes it to croak: QUERY PLAN Hash Join (cost=17.60..30.65 rows=9 width=4) Hash Cond: (i.oid = x.indexrelid) -> Seq Scan on pg_class i (cost=0.00..12.52 rows=114 width=8) Filter: ((pg_get_indexdef(oid) IS NOT NULL) AND (relkind = 'i'::"char")) -> Hash (cost=17.31..17.31 rows=23 width=4) -> Hash Join (cost=12.52..17.31 rows=23 width=4) Hash Cond: (x.indrelid = c.oid) -> Seq Scan on pg_index x (cost=0.00..4.13 rows=113 width=8) -> Hash (cost=11.76..11.76 rows=61 width=8) -> Seq Scan on pg_class c (cost=0.00..11.76 rows=61 width=8) Filter: (relkind = ANY ('{r,m}'::"char"[])) --8<---cut here---end--->8--- thanks, Andreas -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Andreas Seltenreich writes: > Tom Lane writes: >> What concerns me more is that what you're finding is only cases that trip >> an assertion sanity check. It seems likely that you're also managing to >> trigger other bugs with less drastic consequences, such as "could not >> devise a query plan" failures or just plain wrong answers. > Ja, some of these are logged as well[1], but most of them are really as > undrastic as can get, and I was afraid reporting them would be more of a > nuisance. Well, I certainly think all of these represent bugs: > 3 | ERROR: plan should not reference subplan's variable > 2 | ERROR: failed to assign all NestLoopParams to plan nodes > 1 | ERROR: could not find pathkey item to sort This I'm not sure about; it could be that the query gave conflicting collation specifiers, but on the other hand we've definitely had bugs with people forgetting to run assign_query_collations on subexpressions: > 4646 | ERROR: could not determine which collation to use for string > comparison This one's pretty darn odd, because 2619 is pg_statistic and not an index at all: > 4 | ERROR: cache lookup failed for index 2619 These seem likely to be bugs as well, though maybe they are race conditions during a DROP and not worth fixing: > 1171 | ERROR: cache lookup failed for index 16862 >172 | ERROR: cache lookup failed for index 257148 > 84 | ERROR: could not find member 1(34520,34520) of opfamily 1976 > 55 | ERROR: missing support function 1(34516,34516) in opfamily 1976 > 13 | ERROR: could not find commutator for operator 34538 > 2 | ERROR: cache lookup failed for index 12322 I would say anything of the sort that is repeatable definitely deserves investigation, because even if it's an expectable error condition, we should be throwing a more user-friendly error message. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Tom Lane writes: > What concerns me more is that what you're finding is only cases that trip > an assertion sanity check. It seems likely that you're also managing to > trigger other bugs with less drastic consequences, such as "could not > devise a query plan" failures or just plain wrong answers. Ja, some of these are logged as well[1], but most of them are really as undrastic as can get, and I was afraid reporting them would be more of a nuisance. I analysed a couple of the cache lookup failures, and they all had a similar severreness than the example in the README[2]. The operator ones I analysed seem due to intentionally broken operators in the regression db. The NestLoopParams and subplan reference one sound interesting though… > I'm not sure how we could identify wrong answers automatically :-( Csmith isn't doing this either. They discuss differential testing though in their papers, i.e., comparing the results of different products. Maybe a simple metric like numbers of rows returned might already be valuable for correctness checks. I also thought about doing some sampling on the data and simulating relational operations and check for witness tuples, but it is probably not appropriate to start implementing a mini-rdbms on the client side. > but it might be worth checking for XX000 SQLSTATE responses, since > generally that should be a can't-happen case. (Or if it can happen, > we need to change the errcode.) The sqlstate is currently missing in the reports because libpqxx is not putting it in it's exceptions :-/. regards, Andreas Footnotes: [1] smith=# select * from report24h; count | error ---+-- 43831 | ERROR: unsupported XML feature 39496 | ERROR: invalid regular expression: quantifier operand invalid 27261 | ERROR: canceling statement due to statement timeout 21386 | ERROR: operator does not exist: point = point 8580 | ERROR: cannot compare arrays of different element types 5019 | ERROR: invalid regular expression: brackets [] not balanced 4646 | ERROR: could not determine which collation to use for string comparison 2583 | ERROR: invalid regular expression: nfa has too many states 2248 | ERROR: operator does not exist: xml = xml 1198 | ERROR: operator does not exist: polygon = polygon 1171 | ERROR: cache lookup failed for index 16862 677 | ERROR: invalid regular expression: parentheses () not balanced 172 | ERROR: cache lookup failed for index 257148 84 | ERROR: could not find member 1(34520,34520) of opfamily 1976 55 | ERROR: missing support function 1(34516,34516) in opfamily 1976 42 | ERROR: operator does not exist: city_budget = city_budget 13 | ERROR: could not find commutator for operator 34538 10 | ERROR: could not identify a comparison function for type xid 4 | Connection to database failed 4 | ERROR: cache lookup failed for index 2619 3 | ERROR: plan should not reference subplan's variable 2 | ERROR: cache lookup failed for index 12322 2 | ERROR: failed to assign all NestLoopParams to plan nodes 2 | ERROR: invalid regular expression: invalid character range 1 | ERROR: could not find pathkey item to sort (25 rows) Time: 1158,990 ms [2] https://github.com/anse1/sqlsmith/blob/master/README.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [sqlsmith] Failed assertion in joinrels.c
Andreas Seltenreich writes: > sqlsmith triggered the following assertion in master (c188204). > TRAP: FailedAssertion("!(!bms_overlap(joinrelids, sjinfo->min_lefthand))", > File: "joinrels.c", Line: 500) Cool, I'll take a look. > As usual, the query is against the regression database. It is rather > unwieldy⦠I wonder if I should stop working on new grammar rules and > instead work on some post-processing that prunes the AST as much as > possible while maintaining the failure mode. Probably not really worth the trouble; I find it's usually easy to produce a minimized test case after the failure cause is understood. What concerns me more is that what you're finding is only cases that trip an assertion sanity check. It seems likely that you're also managing to trigger other bugs with less drastic consequences, such as "could not devise a query plan" failures or just plain wrong answers. I'm not sure how we could identify wrong answers automatically :-( but it might be worth checking for XX000 SQLSTATE responses, since generally that should be a can't-happen case. (Or if it can happen, we need to change the errcode.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers