2016-01-20 10:17 GMT+01:00 Tatsuo Ishii <is...@postgresql.org>: > > Hi > > > > 2016-01-20 7:20 GMT+01:00 Tatsuo Ishii <is...@postgresql.org>: > > > >> > 2016-01-20 3:47 GMT+01:00 Tatsuo Ishii <is...@postgresql.org>: > >> > > >> >> test=# select format('%I', t) from t1; > >> >> format > >> >> ---------- > >> >> aaa > >> >> "AAA" > >> >> "あいう" > >> >> (3 rows) > >> >> > >> >> Why is the text value of the third line needed to be double quoted? > >> >> (note that it is a multi byte character). Same thing can be said to > >> >> quote_ident(). > >> >> > >> >> We treat identifiers made of the multi byte characters without double > >> >> quotation (non delimited identifier) in other places. > >> >> > >> >> test=# create table t2(あいう text); > >> >> CREATE TABLE > >> >> test=# insert into t2 values('aaa'); > >> >> INSERT 0 1 > >> >> test=# select あいう from t2; > >> >> あいう > >> >> -------- > >> >> aaa > >> >> (1 row) > >> > > >> > format uses same routine as quote_ident. So quote_ident should be > fixed > >> > first. > >> > >> Yes, I had that in my mind too. > >> > >> Attached is the proposed patch to fix the bug. > >> Regression tests passed. > >> > >> Here is an example after the patch. Note that the third row is not > >> quoted any more. > >> > >> test=# select format('%I', あいう) from t2; > >> format > >> -------- > >> aaa > >> "AAA" > >> あああ > >> (3 rows) > >> > >> Best regards, > >> -- > >> Tatsuo Ishii > >> SRA OSS, Inc. Japan > >> English: http://www.sraoss.co.jp/index_en.php > >> Japanese:http://www.sraoss.co.jp > >> > >> diff --git a/src/backend/utils/adt/ruleutils.c > >> b/src/backend/utils/adt/ruleutils.c > >> index 3783e97..b93fc27 100644 > >> --- a/src/backend/utils/adt/ruleutils.c > >> +++ b/src/backend/utils/adt/ruleutils.c > >> @@ -9405,7 +9405,7 @@ quote_identifier(const char *ident) > >> * would like to use <ctype.h> macros here, but they might yield > >> unwanted > >> * locale-specific results... > >> */ > >> - safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == > '_'); > >> + safe = ((ident[0] >= 'a' && ident[0] <= 'z') || ident[0] == '_' > || > >> IS_HIGHBIT_SET(ident[0])); > >> > >> for (ptr = ident; *ptr; ptr++) > >> { > >> @@ -9413,7 +9413,8 @@ quote_identifier(const char *ident) > >> > >> if ((ch >= 'a' && ch <= 'z') || > >> (ch >= '0' && ch <= '9') || > >> - (ch == '_')) > >> + (ch == '_') || > >> + (IS_HIGHBIT_SET(ch))) > >> { > >> /* okay */ > >> } > >> > >> > > This patch ls simply - I remember I was surprised, so we allow any > > multibyte char few months ago. > > > > +1 > > If we would go this way, question is if we should back patch this or > not since the patch apparently changes the existing > behaviors. Comments? I would think we should not. >
I am sure, so we should not backport this change. This can breaks customer regress tests - and the current behave isn't 100% correct, but it is safe. Pavel > > Best regards, > -- > Tatsuo Ishii > SRA OSS, Inc. Japan > English: http://www.sraoss.co.jp/index_en.php > Japanese:http://www.sraoss.co.jp >