Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2017-09-05 Thread Daniel Gustafsson
> On 17 Aug 2017, at 11:08, Daniel Gustafsson  wrote:
> 
>> On 16 Aug 2017, at 17:51, Tom Lane  wrote:
>> 
>> Heikki Linnakangas  writes:
>>> This no longer works:
>> 
>>> postgres=# CREATE TEXT SEARCH DICTIONARY public.simple_dict (
>>>TEMPLATE = pg_catalog.simple,
>>>"STOPWORds" = english
>>> );
>>> ERROR:  unrecognized simple dictionary parameter: "STOPWORds"
>> 
>>> In hindsight, perhaps we should always have been more strict about that 
>>> to begin wtih, but let's not break backwards-compatibility without a 
>>> better reason. I didn't thoroughly check all of the cases here, to see 
>>> if there are more like this.
>> 
>> You have a point, but I'm not sure that this is such a bad compatibility
>> break as to be a reason not to change things to be more consistent.
> 
> I agree with this, but I admittedly have no idea how common the above case
> would be in the wild.
> 
>>> It'd be nice to have some kind of a rule on when pg_strcasecmp should be 
>>> used and when strcmp() is enough. Currently, by looking at the code, I 
>>> can't tell.
>> 
>> My thought is that if we are looking at words that have been through the
>> parser, then it should *always* be plain strcmp; we should expect that
>> the parser already did the appropriate case-folding.
> 
> +1
> 
>> pg_strcasecmp would be appropriate, perhaps, if we're dealing with stuff
>> that somehow came in without going through the parser.
> 
> In that case it would be up to the consumer of the data to handle required
> case-folding for the expected input, so pg_strcasecmp or strcmp depending on
> situation.

This patch has been marked “Waiting on Author”, but I’m not sure what the
concensus of this thread came to with regards to quoted keywords and backwards
compatibility.  There seems to be a 2-1 vote for allowing a break, and forcing
all keywords out of the parser to be casefolded. Any other opinions?

cheers ./daniel



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2017-08-17 Thread Daniel Gustafsson
> On 16 Aug 2017, at 17:51, Tom Lane  wrote:
> 
> Heikki Linnakangas  writes:
>> This no longer works:
> 
>> postgres=# CREATE TEXT SEARCH DICTIONARY public.simple_dict (
>> TEMPLATE = pg_catalog.simple,
>> "STOPWORds" = english
>> );
>> ERROR:  unrecognized simple dictionary parameter: "STOPWORds"
> 
>> In hindsight, perhaps we should always have been more strict about that 
>> to begin wtih, but let's not break backwards-compatibility without a 
>> better reason. I didn't thoroughly check all of the cases here, to see 
>> if there are more like this.
> 
> You have a point, but I'm not sure that this is such a bad compatibility
> break as to be a reason not to change things to be more consistent.

I agree with this, but I admittedly have no idea how common the above case
would be in the wild.

>> It'd be nice to have some kind of a rule on when pg_strcasecmp should be 
>> used and when strcmp() is enough. Currently, by looking at the code, I 
>> can't tell.
> 
> My thought is that if we are looking at words that have been through the
> parser, then it should *always* be plain strcmp; we should expect that
> the parser already did the appropriate case-folding.

+1

> pg_strcasecmp would be appropriate, perhaps, if we're dealing with stuff
> that somehow came in without going through the parser.

In that case it would be up to the consumer of the data to handle required
case-folding for the expected input, so pg_strcasecmp or strcmp depending on
situation.

cheers ./daniel


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2017-08-16 Thread Robert Haas
On Wed, Aug 16, 2017 at 11:51 AM, Tom Lane  wrote:
> You have a point, but I'm not sure that this is such a bad compatibility
> break as to be a reason not to change things to be more consistent.

+1.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2017-08-16 Thread Tom Lane
Heikki Linnakangas  writes:
> This no longer works:

> postgres=# CREATE TEXT SEARCH DICTIONARY public.simple_dict (
>  TEMPLATE = pg_catalog.simple,
>  "STOPWORds" = english
> );
> ERROR:  unrecognized simple dictionary parameter: "STOPWORds"

> In hindsight, perhaps we should always have been more strict about that 
> to begin wtih, but let's not break backwards-compatibility without a 
> better reason. I didn't thoroughly check all of the cases here, to see 
> if there are more like this.

You have a point, but I'm not sure that this is such a bad compatibility
break as to be a reason not to change things to be more consistent.

> It'd be nice to have some kind of a rule on when pg_strcasecmp should be 
> used and when strcmp() is enough. Currently, by looking at the code, I 
> can't tell.

My thought is that if we are looking at words that have been through the
parser, then it should *always* be plain strcmp; we should expect that
the parser already did the appropriate case-folding.  If the user
prevented case-folding by double-quoting, I don't have a lot of sympathy
for any complaints about it.  Generally speaking, what we're dealing with
here is things that are logically keywords but we did not wish to make
them real parser keywords.  But in SQL, once you quote a keyword, it's
not a keyword at all anymore.  So I think the argument that quoting
"stopwords" should be legal at all in this context is pretty weak,
and the argument that quoting a weirdly-cased version of it should
work is even weaker.

pg_strcasecmp would be appropriate, perhaps, if we're dealing with stuff
that somehow came in without going through the parser.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2017-08-16 Thread Heikki Linnakangas

On 04/04/2017 10:13 AM, Daniel Gustafsson wrote:

On 04 Apr 2017, at 05:52, Alvaro Herrera  wrote:

Daniel Gustafsson wrote:

Testing DefElem options is done with both strcmp() and pg_strcasecmp() a bit
mixed.  Since the option defnames are all lowercased, either via IDENT, keyword
rules or “by hand” with makeString(), using strcmp() is safe (and assumed to be
so in quite a lot of places).

While it’s not incorrect per se to use pg_strcasecmp(), it has the potential to
hide a DefElem created with a mixed-case defname where it in other places is
expected to be in lowercase, which may lead to subtle bugs.

The attached patch refactors to use strcmp() consistently for option processing
in the command code as a pre-emptive belts+suspenders move against such subtle
bugs and to make the code more consistent.  Also reorders a few checks to have
all in the same “format” and removes a comment related to the above.

Tested with randomizing case on options in make check (not included in patch).


Does it work correctly in the Turkish locale?


Yes, running make check with random case defnames under tr_TR works fine.


This no longer works:

postgres=# CREATE TEXT SEARCH DICTIONARY public.simple_dict (
TEMPLATE = pg_catalog.simple,
"STOPWORds" = english
);
ERROR:  unrecognized simple dictionary parameter: "STOPWORds"

In hindsight, perhaps we should always have been more strict about that 
to begin wtih, but let's not break backwards-compatibility without a 
better reason. I didn't thoroughly check all of the cases here, to see 
if there are more like this.


It'd be nice to have some kind of a rule on when pg_strcasecmp should be 
used and when strcmp() is enough. Currently, by looking at the code, I 
can't tell.


- Heikki


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2017-04-04 Thread Daniel Gustafsson
> On 04 Apr 2017, at 05:52, Alvaro Herrera  wrote:
> 
> Daniel Gustafsson wrote:
>> Testing DefElem options is done with both strcmp() and pg_strcasecmp() a bit
>> mixed.  Since the option defnames are all lowercased, either via IDENT, 
>> keyword
>> rules or “by hand” with makeString(), using strcmp() is safe (and assumed to 
>> be
>> so in quite a lot of places).
>> 
>> While it’s not incorrect per se to use pg_strcasecmp(), it has the potential 
>> to
>> hide a DefElem created with a mixed-case defname where it in other places is
>> expected to be in lowercase, which may lead to subtle bugs.
>> 
>> The attached patch refactors to use strcmp() consistently for option 
>> processing
>> in the command code as a pre-emptive belts+suspenders move against such 
>> subtle
>> bugs and to make the code more consistent.  Also reorders a few checks to 
>> have
>> all in the same “format” and removes a comment related to the above.
>> 
>> Tested with randomizing case on options in make check (not included in 
>> patch).
> 
> Does it work correctly in the Turkish locale?

Yes, running make check with random case defnames under tr_TR works fine.

cheers ./daniel

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Refactoring identifier checks to consistently use strcmp

2017-04-03 Thread Alvaro Herrera
Daniel Gustafsson wrote:
> Testing DefElem options is done with both strcmp() and pg_strcasecmp() a bit
> mixed.  Since the option defnames are all lowercased, either via IDENT, 
> keyword
> rules or “by hand” with makeString(), using strcmp() is safe (and assumed to 
> be
> so in quite a lot of places).
> 
> While it’s not incorrect per se to use pg_strcasecmp(), it has the potential 
> to
> hide a DefElem created with a mixed-case defname where it in other places is
> expected to be in lowercase, which may lead to subtle bugs.
> 
> The attached patch refactors to use strcmp() consistently for option 
> processing
> in the command code as a pre-emptive belts+suspenders move against such subtle
> bugs and to make the code more consistent.  Also reorders a few checks to have
> all in the same “format” and removes a comment related to the above.
> 
> Tested with randomizing case on options in make check (not included in patch).

Does it work correctly in the Turkish locale?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers