I posted my feedback through the report forms. The text of the two posts is
attached.
(I considerably shortened the list of non-Latin punctuation marks that I
suggest excluding from identifiers, although I added two of the Hebrew
punctuation marks suggested by Kirk.)
_ Marco
Feedback on UTR#31 (draft 1): Full/Half-Width Characters.
I suggest that all compatibility characters whose decompositions are tagged <wide>,
<narrow>, or <small>, and whose compatibility decompositions are already in class
<Pattern_Syntax>, be added to class <Pattern_Syntax> as well.
In practice, I suggest adding the following lines to section "4.1 Proposed
Pattern Properties":
FE50..FE52 ; Pattern_Syntax # SMALL COMMA..SMALL FULL STOP
FE54..FE57 ; Pattern_Syntax # SMALL SEMICOLON..SMALL EXCLAMATION MARK
FE59..FE66 ; Pattern_Syntax # SMALL LEFT PARENTHESIS..SMALL EQUALS SIGN
FE68..FE6B ; Pattern_Syntax # SMALL REVERSE SOLIDUS..SMALL COMMERCIAL AT
FF01..FF0F ; Pattern_Syntax # FULLWIDTH EXCLAMATION MARK..FULLWIDTH SOLIDUS
FF1A..FF20 ; Pattern_Syntax # FULLWIDTH COLON..FULLWIDTH COMMERCIAL AT
FF3B..FF40 ; Pattern_Syntax # FULLWIDTH LEFT SQUARE BRACKET..FULLWIDTH GRAVE ACCENT
FF5B..FF5E ; Pattern_Syntax # FULLWIDTH LEFT CURLY BRACKET..FULLWIDTH TILDE
FF5F..FF61 ; Pattern_Syntax # FULLWIDTH LEFT WHITE PARENTHESIS..HALFWIDTH IDEOGRAPHIC FULL STOP
FF64 ; Pattern_Syntax # HALFWIDTH IDEOGRAPHIC COMMA
FFE0..FFE2 ; Pattern_Syntax # FULLWIDTH CENT SIGN..FULLWIDTH NOT SIGN
FFE4..FFE5 ; Pattern_Syntax # FULLWIDTH BROKEN BAR..FULLWIDTH YEN SIGN
FFE8..FFEE ; Pattern_Syntax # HALFWIDTH FORMS LIGHT VERTICAL..HALFWIDTH WHITE CIRCLE
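The criterion above can be checked mechanically against the Unicode Character Database. The following Python sketch uses only the standard `unicodedata` module; the names `PROPOSED_RANGES`, `expand` and `decomposition_tag` are my own, for illustration. It expands the ranges listed above and reports, for each code point, its decomposition tag and the character it decomposes to. It checks only the <wide>/<narrow>/<small> tag; whether each target character is itself in <Pattern_Syntax> still has to be verified separately against the proposed property file.

```python
import unicodedata

# Ranges proposed above, as inclusive (first, last) code points.
PROPOSED_RANGES = [
    (0xFE50, 0xFE52), (0xFE54, 0xFE57), (0xFE59, 0xFE66), (0xFE68, 0xFE6B),
    (0xFF01, 0xFF0F), (0xFF1A, 0xFF20), (0xFF3B, 0xFF40), (0xFF5B, 0xFF5E),
    (0xFF5F, 0xFF61), (0xFF64, 0xFF64), (0xFFE0, 0xFFE2), (0xFFE4, 0xFFE5),
    (0xFFE8, 0xFFEE),
]

def expand(ranges):
    """Yield every code point in a list of inclusive ranges."""
    for first, last in ranges:
        yield from range(first, last + 1)

def decomposition_tag(cp):
    """Return (tag, target) for a tagged single-character decomposition,
    e.g. ('<wide>', ','), or None if the character has no such decomposition."""
    fields = unicodedata.decomposition(chr(cp)).split()
    if len(fields) != 2 or not fields[0].startswith("<"):
        return None
    return fields[0], chr(int(fields[1], 16))

if __name__ == "__main__":
    for cp in expand(PROPOSED_RANGES):
        result = decomposition_tag(cp)
        if result is None:
            print(f"U+{cp:04X} has no tagged decomposition")
            continue
        tag, target = result
        print(f"U+{cp:04X} {tag:<8} -> U+{ord(target):04X} {unicodedata.name(target)}")
```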
Rationale. These characters are almost identical, visually and semantically, to their
"normal width" counterparts. Allowing them in identifiers means allowing identifiers
that look identical to expressions of an entirely different kind. For example, an
identifier such as "foo,bar" (where "," is U+FF0C FULLWIDTH COMMA) would look
identical to the expression "foo, bar" (identifier "foo" + comma + space + identifier
"bar"), because the fullwidth comma takes up roughly the width of a comma followed
by a space.
Regards.
Marco Cimarosti ([EMAIL PROTECTED])
Feedback on UTR#31 (draft 1): Non-Latin Punctuation.
I suggest that a small set of non-Latin punctuation marks be added to class
<Pattern_Syntax>. Each of the punctuation marks that I suggest including meets
the following conditions:
1) It is very similar in shape to an ASCII-range character that is already in class
<Pattern_Syntax>;
2) It is very similar in function to an ASCII-range character that is already in
class <Pattern_Syntax>;
3) It is used in the modern orthography of present-day languages and/or it is
commonly available on national keyboards;
4) It is not commonly used to form words or phrases that may be used as identifiers.
In practice, I suggest adding the following lines to section "4.1 Proposed
Pattern Properties":
037E ; Pattern_Syntax # GREEK QUESTION MARK
0387 ; Pattern_Syntax # GREEK ANO TELEIA
055C..055E ; Pattern_Syntax # ARMENIAN EXCLAMATION MARK..ARMENIAN QUESTION MARK
0589 ; Pattern_Syntax # ARMENIAN FULL STOP
05C0 ; Pattern_Syntax # HEBREW PUNCTUATION PASEQ
05C3 ; Pattern_Syntax # HEBREW PUNCTUATION SOF PASUQ
060C..060D ; Pattern_Syntax # ARABIC COMMA..ARABIC DATE SEPARATOR
061B ; Pattern_Syntax # ARABIC SEMICOLON
061F ; Pattern_Syntax # ARABIC QUESTION MARK
066A..066C ; Pattern_Syntax # ARABIC PERCENT SIGN..ARABIC THOUSANDS SEPARATOR
066D ; Pattern_Syntax # ARABIC FIVE POINTED STAR
06D4 ; Pattern_Syntax # ARABIC FULL STOP
0964..0965 ; Pattern_Syntax # DEVANAGARI DANDA..DEVANAGARI DOUBLE DANDA
10FB ; Pattern_Syntax # GEORGIAN PARAGRAPH SEPARATOR
1362..1368 ; Pattern_Syntax # ETHIOPIC FULL STOP..ETHIOPIC PARAGRAPH SEPARATOR
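As a sanity check on the list above, a short Python sketch can expand the ranges and look each code point up with the standard `unicodedata.name()` and `unicodedata.category()` functions. The `PROPOSED` table and the `expand` and `describe` helpers are illustrative names of mine, not part of UTR #31. The check confirms that every proposed character is assigned, that its name matches the comment in the list, and that it has a punctuation General_Category (P*).

```python
import unicodedata

# Non-Latin punctuation proposed above, as inclusive (first, last) ranges.
PROPOSED = [
    (0x037E, 0x037E), (0x0387, 0x0387), (0x055C, 0x055E), (0x0589, 0x0589),
    (0x05C0, 0x05C0), (0x05C3, 0x05C3), (0x060C, 0x060D), (0x061B, 0x061B),
    (0x061F, 0x061F), (0x066A, 0x066C), (0x066D, 0x066D), (0x06D4, 0x06D4),
    (0x0964, 0x0965), (0x10FB, 0x10FB), (0x1362, 0x1368),
]

def expand(ranges):
    """Yield every code point in a list of inclusive ranges."""
    for first, last in ranges:
        yield from range(first, last + 1)

def describe(cp):
    """Return (hex code point, General_Category, character name)."""
    ch = chr(cp)
    return f"{cp:04X}", unicodedata.category(ch), unicodedata.name(ch)

if __name__ == "__main__":
    for row in map(describe, expand(PROPOSED)):
        print(*row)
```

The 27 code points it prints are the same 27 characters discussed one by one in the table further below.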
Rationale. Punctuation marks that meet conditions #1 to #3 may easily be confused
with ASCII-range characters that are normally used in the syntax of computer
languages and notations. Allowing such characters in identifiers would mean allowing
identifiers that look almost identical to expressions of an entirely different kind.
For example, an identifier such as "return;" (where ";" is U+037E GREEK QUESTION MARK)
looks identical to the expression "return;" (identifier or keyword "return" + semicolon).
However, punctuation marks that fail condition #4 (e.g. syllable separators,
morpheme separators, abbreviation marks, diacritic marks, apostrophes) are excluded
from my suggestion (i.e., I suggest allowing them in identifiers), because they are
useful for forming words or phrases that may act as identifiers.
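Both rationales can be illustrated with Python's standard `unicodedata` module. In the sketch below, `syntax_lookalikes` is a hypothetical helper of mine, and treating "its NFKC form is ASCII punctuation" as the test for a look-alike is my own simplifying assumption, not a rule from UTR #31. The sketch shows that U+037E GREEK QUESTION MARK has a canonical decomposition to U+003B SEMICOLON, so "return;" spelled with it folds to keyword plus semicolon even under NFC, and that U+FF0C FULLWIDTH COMMA folds to "," under NFKC. U+061B ARABIC SEMICOLON, by contrast, has no decomposition at all and passes through normalization unchanged.

```python
import string
import unicodedata

def syntax_lookalikes(identifier):
    """Hypothetical heuristic: return (index, char, folded) for each non-ASCII
    character whose NFKC form consists only of ASCII punctuation."""
    found = []
    for index, char in enumerate(identifier):
        folded = unicodedata.normalize("NFKC", char)
        if not char.isascii() and folded and all(c in string.punctuation for c in folded):
            found.append((index, char, folded))
    return found

if __name__ == "__main__":
    greek = "return\u037E"   # "return" + U+037E GREEK QUESTION MARK
    arabic = "return\u061B"  # "return" + U+061B ARABIC SEMICOLON

    # Different code points, (nearly) identical appearance.
    print(greek == "return;")                                # False
    # U+037E decomposes canonically to U+003B, so even NFC folds it.
    print(unicodedata.decomposition("\u037E"))               # 003B
    print(unicodedata.normalize("NFC", greek) == "return;")  # True
    print(syntax_lookalikes(greek))                          # one hit, at index 6
    print(syntax_lookalikes("foo\uFF0Cbar"))                 # one hit, at index 3
    # U+061B has no decomposition, so the heuristic does not flag it.
    print(syntax_lookalikes(arabic))                         # []
```

Because U+061B is on the list above but is not flagged, a check based on normalization alone would not cover every proposed character.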
Character-by-character rationale. The following list gives each suggested character
along with the ASCII-range character that looks similar to it (condition #1) and the
ASCII-range character that has a similar function (condition #2).
Code  Looks like (#1)  Works like (#2)  Character name
037E  ;                ?                GREEK QUESTION MARK
0387  .                ;                GREEK ANO TELEIA
055C  ~                !                ARMENIAN EXCLAMATION MARK
055D  `                ,                ARMENIAN COMMA
055E  ^                ?                ARMENIAN QUESTION MARK
0589  :                .                ARMENIAN FULL STOP
05C0  |                ;                HEBREW PUNCTUATION PASEQ
05C3  :                .                HEBREW PUNCTUATION SOF PASUQ
060C  ,                ,                ARABIC COMMA
060D  ,                ,                ARABIC DATE SEPARATOR
061B  ;                ;                ARABIC SEMICOLON
061F  ?                ?                ARABIC QUESTION MARK
066A  %                %                ARABIC PERCENT SIGN
066B  ,                .                ARABIC DECIMAL SEPARATOR
066C  ,                ,                ARABIC THOUSANDS SEPARATOR
066D  *                *                ARABIC FIVE POINTED STAR
06D4  _                .                ARABIC FULL STOP
0964  |                .                DEVANAGARI DANDA
0965  |                .                DEVANAGARI DOUBLE DANDA
10FB  :                :                GEORGIAN PARAGRAPH SEPARATOR
1362  :                .                ETHIOPIC FULL STOP
1363  :                ,                ETHIOPIC COMMA
1364  :                ;                ETHIOPIC SEMICOLON
1365  :                :                ETHIOPIC COLON
1366  :                :                ETHIOPIC PREFACE COLON
1367  |                ?                ETHIOPIC QUESTION MARK
1368  :                .                ETHIOPIC PARAGRAPH SEPARATOR
Regards.
Marco Cimarosti ([EMAIL PROTECTED])