questions about string literals

Andrew Patterson Mon, 18 Sep 2006 22:59:08 +1000

I am having trouble with the exact definition of the
string literal.

From the spec:
----------------------------------------------
3.5.1.2 String Data
All strings are enclosed in double quotes, as follows:
"this is a string"
Quoting and line extension is done using the backslash character, as follows:
"this is a much longer string, what one might call a \"phrase\" or even \
a \"sentence\" with a very annoying backslash (\\) in it."
String data can be used to contain almost any other kind of data,
which is intended to be parsed as some other formalism.
Special characters (including the inverted comma and backslash characters)
are expressed using the ISO 10646 or XML special character codes
within single quotes. ISO codes are mnemonic, and follow the pattern
&aaaa;, while
XML codes are hexadecimal and follow the pattern
&#xHHHH;, where H stands for a hexadecimal digit. An example is:
"a &isin; A" -- prints as: a ∈ A
All strings are case-sensitive, i.e. 'word' is distinct from 'Word'.
----------------------------------------------


So the question is: what should the behaviour be when a
\ is followed by something other than a valid 'quotation' character? That is, \\
and \" should be interpreted as \ and " respectively, but should
\. be interpreted as the single character . or as the two characters \.? I'd suggest
that if the quotation character is not recognised, it should
be an error.
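
To make the suggestion concrete, here is a minimal sketch in Python of the
"unrecognised escape is an error" behaviour. It is not taken from the spec:
the names unescape_strict, EscapeError and ESCAPES are hypothetical, and it
assumes that a backslash followed by a newline (the line extension) contributes
nothing to the resulting string, which the spec excerpt does not state explicitly.

```python
class EscapeError(ValueError):
    """Raised when a backslash is not followed by a recognised character."""


# Only the two escapes shown in the spec excerpt; anything else is rejected.
ESCAPES = {'\\': '\\', '"': '"'}


def unescape_strict(body: str) -> str:
    """Decode the text found between the double quotes of a string literal."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != '\\':
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise EscapeError("dangling backslash at end of string")
        nxt = body[i + 1]
        if nxt == '\n':
            # Line extension: assumed here to be dropped from the result.
            i += 2
            continue
        if nxt not in ESCAPES:
            raise EscapeError(f"unrecognised escape \\{nxt} at offset {i}")
        out.append(ESCAPES[nxt])
        i += 2
    return ''.join(out)
```

With this sketch, \" and \\ decode to " and \, while \. raises EscapeError
rather than silently becoming either . or \. in the result.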

Furthermore, what are the rules around the &#xHHHH; escape
sequence? Surely a literal & will have to be quoted as well? Otherwise
how will the parser know when it is introducing a mnemonic or hexadecimal code?
Wouldn't a Unicode escape technique such as the one used in Java and C#
(\uAABB) be a better approach, i.e. one that fits in more
comfortably with the standard \" quoting rules?
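
The ambiguity can be illustrated with a rough Python sketch. Both functions
below are hypothetical and only model the two styles: decode_xml_codes expands
every &#xHHHH; it sees, so there is no way to write that text literally, while
decode_u_escapes treats \uHHHH as one more backslash escape, so \\ already
provides the way out. Four hex digits are assumed in both cases, and other
backslash sequences are passed through untouched for brevity.

```python
import re
import string

XML_CODE = re.compile(r'&#x([0-9A-Fa-f]{4});')
HEX_DIGITS = set(string.hexdigits)


def decode_xml_codes(body: str) -> str:
    """Expand every &#xHHHH; code; a literal '&#x0041;' cannot be expressed."""
    return XML_CODE.sub(lambda m: chr(int(m.group(1), 16)), body)


def decode_u_escapes(body: str) -> str:
    """Expand \\uHHHH escapes; a doubled backslash yields a literal backslash."""
    out = []
    i = 0
    while i < len(body):
        two = body[i:i + 2]
        if two == '\\\\':
            # Escaped backslash: whatever follows is ordinary text.
            out.append('\\')
            i += 2
        elif two == '\\u':
            digits = body[i + 2:i + 6]
            if len(digits) != 4 or not set(digits) <= HEX_DIGITS:
                raise ValueError(f"malformed \\u escape at offset {i}")
            out.append(chr(int(digits, 16)))
            i += 6
        else:
            out.append(body[i])
            i += 1
    return ''.join(out)
```

For example, decode_u_escapes applied to the nine characters \\u2208 leaves
the text \u2208 in the result, whereas decode_xml_codes has no input that
produces the text &#x2208; unless & itself gets a quoting rule.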

Finally, what about the other standard string escapes used
in Java and C# (\t, \b etc.)? Is there any room for them (maybe
in ADL 2.0)?

Comments?

Andrew
