Re: [HACKERS] Refactoring identifier checks to consistently use strcmp
> On 17 Aug 2017, at 11:08, Daniel Gustafsson wrote: > >> On 16 Aug 2017, at 17:51, Tom Lane wrote: >> >> Heikki Linnakangas writes: >>> This no longer works: >> >>> postgres=# CREATE TEXT SEARCH DICTIONARY public.simple_dict ( >>>TEMPLATE = pg_catalog.simple, >>>"STOPWORds" = english >>> ); >>> ERROR: unrecognized simple dictionary parameter: "STOPWORds" >> >>> In hindsight, perhaps we should always have been more strict about that >>> to begin wtih, but let's not break backwards-compatibility without a >>> better reason. I didn't thoroughly check all of the cases here, to see >>> if there are more like this. >> >> You have a point, but I'm not sure that this is such a bad compatibility >> break as to be a reason not to change things to be more consistent. > > I agree with this, but I admittedly have no idea how common the above case > would be in the wild. > >>> It'd be nice to have some kind of a rule on when pg_strcasecmp should be >>> used and when strcmp() is enough. Currently, by looking at the code, I >>> can't tell. >> >> My thought is that if we are looking at words that have been through the >> parser, then it should *always* be plain strcmp; we should expect that >> the parser already did the appropriate case-folding. > > +1 > >> pg_strcasecmp would be appropriate, perhaps, if we're dealing with stuff >> that somehow came in without going through the parser. > > In that case it would be up to the consumer of the data to handle required > case-folding for the expected input, so pg_strcasecmp or strcmp depending on > situation. This patch has been marked “Waiting on Author”, but I’m not sure what the concensus of this thread came to with regards to quoted keywords and backwards compatibility. There seems to be a 2-1 vote for allowing a break, and forcing all keywords out of the parser to be casefolded. Any other opinions? cheers ./daniel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring identifier checks to consistently use strcmp
> On 16 Aug 2017, at 17:51, Tom Lane wrote: > > Heikki Linnakangas writes: >> This no longer works: > >> postgres=# CREATE TEXT SEARCH DICTIONARY public.simple_dict ( >> TEMPLATE = pg_catalog.simple, >> "STOPWORds" = english >> ); >> ERROR: unrecognized simple dictionary parameter: "STOPWORds" > >> In hindsight, perhaps we should always have been more strict about that >> to begin wtih, but let's not break backwards-compatibility without a >> better reason. I didn't thoroughly check all of the cases here, to see >> if there are more like this. > > You have a point, but I'm not sure that this is such a bad compatibility > break as to be a reason not to change things to be more consistent. I agree with this, but I admittedly have no idea how common the above case would be in the wild. >> It'd be nice to have some kind of a rule on when pg_strcasecmp should be >> used and when strcmp() is enough. Currently, by looking at the code, I >> can't tell. > > My thought is that if we are looking at words that have been through the > parser, then it should *always* be plain strcmp; we should expect that > the parser already did the appropriate case-folding. +1 > pg_strcasecmp would be appropriate, perhaps, if we're dealing with stuff > that somehow came in without going through the parser. In that case it would be up to the consumer of the data to handle required case-folding for the expected input, so pg_strcasecmp or strcmp depending on situation. cheers ./daniel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring identifier checks to consistently use strcmp
On Wed, Aug 16, 2017 at 11:51 AM, Tom Lane wrote: > You have a point, but I'm not sure that this is such a bad compatibility > break as to be a reason not to change things to be more consistent. +1. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring identifier checks to consistently use strcmp
Heikki Linnakangas writes: > This no longer works: > postgres=# CREATE TEXT SEARCH DICTIONARY public.simple_dict ( > TEMPLATE = pg_catalog.simple, > "STOPWORds" = english > ); > ERROR: unrecognized simple dictionary parameter: "STOPWORds" > In hindsight, perhaps we should always have been more strict about that > to begin wtih, but let's not break backwards-compatibility without a > better reason. I didn't thoroughly check all of the cases here, to see > if there are more like this. You have a point, but I'm not sure that this is such a bad compatibility break as to be a reason not to change things to be more consistent. > It'd be nice to have some kind of a rule on when pg_strcasecmp should be > used and when strcmp() is enough. Currently, by looking at the code, I > can't tell. My thought is that if we are looking at words that have been through the parser, then it should *always* be plain strcmp; we should expect that the parser already did the appropriate case-folding. If the user prevented case-folding by double-quoting, I don't have a lot of sympathy for any complaints about it. Generally speaking, what we're dealing with here is things that are logically keywords but we did not wish to make them real parser keywords. But in SQL, once you quote a keyword, it's not a keyword at all anymore. So I think the argument that quoting "stopwords" should be legal at all in this context is pretty weak, and the argument that quoting a weirdly-cased version of it should work is even weaker. pg_strcasecmp would be appropriate, perhaps, if we're dealing with stuff that somehow came in without going through the parser. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring identifier checks to consistently use strcmp
On 04/04/2017 10:13 AM, Daniel Gustafsson wrote: On 04 Apr 2017, at 05:52, Alvaro Herrera wrote: Daniel Gustafsson wrote: Testing DefElem options is done with both strcmp() and pg_strcasecmp() a bit mixed. Since the option defnames are all lowercased, either via IDENT, keyword rules or “by hand” with makeString(), using strcmp() is safe (and assumed to be so in quite a lot of places). While it’s not incorrect per se to use pg_strcasecmp(), it has the potential to hide a DefElem created with a mixed-case defname where it in other places is expected to be in lowercase, which may lead to subtle bugs. The attached patch refactors to use strcmp() consistently for option processing in the command code as a pre-emptive belts+suspenders move against such subtle bugs and to make the code more consistent. Also reorders a few checks to have all in the same “format” and removes a comment related to the above. Tested with randomizing case on options in make check (not included in patch). Does it work correctly in the Turkish locale? Yes, running make check with random case defnames under tr_TR works fine. This no longer works: postgres=# CREATE TEXT SEARCH DICTIONARY public.simple_dict ( TEMPLATE = pg_catalog.simple, "STOPWORds" = english ); ERROR: unrecognized simple dictionary parameter: "STOPWORds" In hindsight, perhaps we should always have been more strict about that to begin wtih, but let's not break backwards-compatibility without a better reason. I didn't thoroughly check all of the cases here, to see if there are more like this. It'd be nice to have some kind of a rule on when pg_strcasecmp should be used and when strcmp() is enough. Currently, by looking at the code, I can't tell. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring identifier checks to consistently use strcmp
> On 04 Apr 2017, at 05:52, Alvaro Herrera wrote: > > Daniel Gustafsson wrote: >> Testing DefElem options is done with both strcmp() and pg_strcasecmp() a bit >> mixed. Since the option defnames are all lowercased, either via IDENT, >> keyword >> rules or “by hand” with makeString(), using strcmp() is safe (and assumed to >> be >> so in quite a lot of places). >> >> While it’s not incorrect per se to use pg_strcasecmp(), it has the potential >> to >> hide a DefElem created with a mixed-case defname where it in other places is >> expected to be in lowercase, which may lead to subtle bugs. >> >> The attached patch refactors to use strcmp() consistently for option >> processing >> in the command code as a pre-emptive belts+suspenders move against such >> subtle >> bugs and to make the code more consistent. Also reorders a few checks to >> have >> all in the same “format” and removes a comment related to the above. >> >> Tested with randomizing case on options in make check (not included in >> patch). > > Does it work correctly in the Turkish locale? Yes, running make check with random case defnames under tr_TR works fine. cheers ./daniel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Refactoring identifier checks to consistently use strcmp
Daniel Gustafsson wrote: > Testing DefElem options is done with both strcmp() and pg_strcasecmp() a bit > mixed. Since the option defnames are all lowercased, either via IDENT, > keyword > rules or “by hand” with makeString(), using strcmp() is safe (and assumed to > be > so in quite a lot of places). > > While it’s not incorrect per se to use pg_strcasecmp(), it has the potential > to > hide a DefElem created with a mixed-case defname where it in other places is > expected to be in lowercase, which may lead to subtle bugs. > > The attached patch refactors to use strcmp() consistently for option > processing > in the command code as a pre-emptive belts+suspenders move against such subtle > bugs and to make the code more consistent. Also reorders a few checks to have > all in the same “format” and removes a comment related to the above. > > Tested with randomizing case on options in make check (not included in patch). Does it work correctly in the Turkish locale? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Refactoring identifier checks to consistently use strcmp
Testing DefElem options is done with both strcmp() and pg_strcasecmp() a bit mixed. Since the option defnames are all lowercased, either via IDENT, keyword rules or “by hand” with makeString(), using strcmp() is safe (and assumed to be so in quite a lot of places). While it’s not incorrect per se to use pg_strcasecmp(), it has the potential to hide a DefElem created with a mixed-case defname where it in other places is expected to be in lowercase, which may lead to subtle bugs. The attached patch refactors to use strcmp() consistently for option processing in the command code as a pre-emptive belts+suspenders move against such subtle bugs and to make the code more consistent. Also reorders a few checks to have all in the same “format” and removes a comment related to the above. Tested with randomizing case on options in make check (not included in patch). cheers ./daniel defname_strcmp.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers