On 6/7/2014 9:19 PM, Karl Williamson wrote:
On 06/02/2014 11:00 AM, Shawn Steele wrote:
To further my understanding, can someone provide examples of how
these are used in actual practice? I can't think of any offhand and
the closest I get is like the old escape characters to get a dot
matrix printer to shift modes, or old word processor internal
formatting sequences.
Here's an example of a possible use. Twenty-some years ago I wrote a
front-end to the Unix diff utility. Showing the differences between
files (usually 2 versions of the same program's code) is an extremely
common programming activity. I do it many times a day. One reason is
to try to find out why a bug has crept in. In doing so, there are
some differences that are not relevant to the task at hand, and their
being shown is a significant distraction. For example, in programming,
one might have renamed a variable (identifier) because its purpose has
changed somewhat and the name should accurately reflect its new
function so the reader is not subconsciously misled. It would be nice
to be able to suppress the variable name changes from the difference
display. There could be thousands of them. By changing the name in
each file version to the same noncharacter during the diff, these
differences won't be displayed, and there would not be any possible
conflict with the input files having that noncharacter in them. (For
display, the noncharacter is changed back to the original value in its
respective file.) Further, one might want to ignore the name changes
of two variables. Just use a second noncharacter; there are 66 in all.
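
A rough illustration of the masking idea described above, as a minimal Python sketch. It is not the original diff front-end, whose code is not shown in this thread. The function names, the use of `difflib` instead of the Unix diff utility, and the choice of U+FDD0 as the placeholder are all assumptions made for the example.

```python
import difflib
import re

# Assumption for this sketch: take placeholders from U+FDD0..U+FDEF,
# which are 32 of the 66 Unicode noncharacters.
NONCHARACTERS = [chr(cp) for cp in range(0xFDD0, 0xFDF0)]


def mask(lines, name, placeholder):
    """Replace whole-word occurrences of `name` with `placeholder`."""
    pattern = re.compile(r"\b" + re.escape(name) + r"\b")
    return [pattern.sub(placeholder, line) for line in lines]


def diff_ignoring_rename(old_lines, new_lines, old_name, new_name):
    """Diff two versions while treating old_name -> new_name as no change.

    Both names are mapped to the same noncharacter for the comparison only.
    The output is built from the unmasked lines, so each file is displayed
    with its own original identifier.
    """
    placeholder = NONCHARACTERS[0]
    old_masked = mask(old_lines, old_name, placeholder)
    new_masked = mask(new_lines, new_name, placeholder)

    matcher = difflib.SequenceMatcher(None, old_masked, new_masked, autojunk=False)
    result = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        result.extend("-" + line for line in old_lines[i1:i2])
        result.extend("+" + line for line in new_lines[j1:j2])
    return result


old = ["count = 0", "count += 1", "print(count)"]
new = ["total = 0", "total += 2", "print(total)"]
for line in diff_ignoring_rename(old, new, "count", "total"):
    print(line)
```

Ignoring a second rename would mean applying `mask` again with a different entry from `NONCHARACTERS` before comparing.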
I wrote this long before noncharacters were available. What I do
instead is scan the files for rarely used characters until I find
enough that aren't in the files. For example, U+009F is unlikely to
appear. Scanning the files takes time. This step could be omitted
for noncharacters that are known to be illegal in the input.
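
The two ways of choosing placeholders can be contrasted in a short Python sketch. The helper names are hypothetical. The candidate list passed to the scanning function is only an example, not the set the original tool used.

```python
def all_noncharacters():
    """Return the 66 Unicode noncharacters as one-character strings.

    These are U+FDD0..U+FDEF plus the last two code points of each of
    the 17 planes (U+FFFE, U+FFFF, U+1FFFE, U+1FFFF, ..., U+10FFFF).
    """
    code_points = list(range(0xFDD0, 0xFDF0))
    for plane in range(0x11):
        base = plane * 0x10000
        code_points.extend([base + 0xFFFE, base + 0xFFFF])
    return [chr(cp) for cp in code_points]


def pick_unused(texts, count, candidates):
    """Scanning approach: read every input to find candidates that are absent.

    This is the step that costs time; it has to look at all of the input.
    """
    used = set()
    for text in texts:
        used.update(text)
    free = [c for c in candidates if c not in used]
    if len(free) < count:
        raise ValueError("not enough unused candidate characters")
    return free[:count]


# Scanning approach with rarely used control characters as candidates.
rare = ["\x9f", "\x9e", "\x9d"]
print([hex(ord(c)) for c in pick_unused(["a\x9fb", "plain text"], 2, rare)])

# Noncharacter approach: no scan, valid only if the input is known
# not to contain noncharacters.
print(len(all_noncharacters()))
```

The second approach is only as safe as the guarantee that the input contains no noncharacters, which is the assumption the surrounding discussion is about.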
This "illegal in the input, so I'm free to assume I can use them for my
purposes" reasoning was definitely the primary(!) design goal discussed
when the set of 32 was added to Unicode. Having the UTC backpedal from
that, many years after the original design, based on a single meeting
and without public review, is really a breakdown of the process.
A./