Re: Corrigendum #9

Karl Williamson Sun, 08 Jun 2014 08:53:20 -0700

On 06/07/2014 10:33 PM, Asmus Freytag wrote:

On 6/7/2014 9:19 PM, Karl Williamson wrote:

On 06/02/2014 11:00 AM, Shawn Steele wrote:

To further my understanding, can someone provide examples of how
these are used in actual practice?  I can't think of any offhand and
the closest I get is like the old escape characters to get a dot
matrix printer to shift modes, or old word processor internal
formatting sequences.


Here's an example of a possible use.  20 some years ago I wrote a
front-end to the Unix diff utility.  Showing the differences between
files (usually 2 versions of the same program's code) is an extremely
common programming activity.  I do it many times a day.  One reason is
to try to find out why a bug has crept in.  In doing so, there are
some differences that are not relevant to the task at hand, and their
being shown is a significant distraction. For example, in programming,
one might have renamed a variable (identifier) because its purpose has
changed somewhat and the name should accurately reflect its new
function so the reader is not subconsciously misled.  It would be nice
to be able to suppress the variable name changes from the difference
display. There could be thousands of them.  By changing the name in
each file version to the same noncharacter during the diff, these
differences won't be displayed, and there would not be any possible
conflict with the input files having that noncharacter in them.  (For
display, the noncharacter is changed back to the original value in its
respective file.)  Further, one might want to ignore the name changes
of two variables.  Just use a second noncharacter, up to 66.
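
A minimal sketch of the idea described above, in Python.  This is not
the original front-end: it uses the standard difflib module in place of
the Unix diff utility, and the names diff_ignoring_renames and _swap are
hypothetical, chosen only for illustration.  Each renamed identifier is
mapped to one noncharacter in both versions, the masked texts are
compared, and the placeholder is changed back for display.

```python
import difflib
import re

# The 66 Unicode noncharacters: U+FDD0..U+FDEF, plus the last two code
# points (xxFFFE and xxFFFF) of each of the 17 planes.
NONCHARACTERS = [chr(cp) for cp in range(0xFDD0, 0xFDF0)] + [
    chr((plane << 16) | low) for plane in range(17) for low in (0xFFFE, 0xFFFF)
]


def _swap(text, name, replacement):
    # Whole-word replacement of an identifier (hypothetical helper).
    return re.sub(r"\b" + re.escape(name) + r"\b", lambda m: replacement, text)


def diff_ignoring_renames(old_text, new_text, renames):
    """renames is a list of (old_name, new_name) pairs, at most 66 long."""
    if len(renames) > len(NONCHARACTERS):
        raise ValueError("only 66 noncharacters are available")
    placeholders = NONCHARACTERS[: len(renames)]
    # Assumption: the inputs must not already contain the placeholders.
    if any(p in old_text or p in new_text for p in placeholders):
        raise ValueError("input already contains a noncharacter")

    # Map each renamed identifier to the same noncharacter in both versions.
    for (old_name, new_name), p in zip(renames, placeholders):
        old_text = _swap(old_text, old_name, p)
        new_text = _swap(new_text, new_name, p)

    result = []
    for line in difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(), "old", "new", lineterm=""
    ):
        # For display, '+' lines get the new name back; all other lines
        # (removed and context lines) are shown with the old name.
        from_new = line.startswith("+")
        for (old_name, new_name), p in zip(renames, placeholders):
            line = line.replace(p, new_name if from_new else old_name)
        result.append(line)
    return result
```

Calling diff_ignoring_renames(old, new, [("count", "total")]) would then
report only the lines that differ for reasons other than that rename.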

I wrote this long before noncharacters were available.  What I do
instead is scan the files for rarely used characters until I find
enough ones that aren't in the files.  For example U+009F is unlikely to
appear.  Scanning the files takes time.  This step could be omitted
for noncharacters that are known to be illegal in the input.
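
The scanning step described above might look roughly like the following
Python sketch.  The function name find_unused_placeholders and the
choice of the C1 control range (U+009F down to U+0080) as the candidate
list are assumptions made for illustration; the original tool's
candidate list is not given here.

```python
def find_unused_placeholders(texts, needed, candidates=None):
    """Return `needed` candidate characters that occur in none of `texts`."""
    if candidates is None:
        # Assumed candidates: C1 controls, starting from U+009F.
        candidates = [chr(cp) for cp in range(0x9F, 0x7F, -1)]

    # This is the full scan of the input that costs time.
    used = set()
    for text in texts:
        used.update(text)

    free = [c for c in candidates if c not in used]
    if len(free) < needed:
        raise ValueError("not enough unused candidate characters")
    return free[:needed]
```

If the placeholders were noncharacters known to be absent from the
input, the loop that builds the set of used characters could be dropped.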

This "illegal in the input" so "I'm free to assume I can use them for my
purposes" was definitely the primary(!) design goal discussed when the
set of 32 were added to Unicode. Having UTC backpedal from that, many
years after original design, based on a single meeting and without
public review is really a breakdown of the process.

A./

I should note that this front-end to 'diff' changes the input files,
writes the modified versions out, and calls 'diff' with those modified
files as its inputs. By using noncharacters, it would be depending on
'diff' to 1) not use them, and 2) to not filter them out, and 3) for the
system to be able to store and retrieve them in files.
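
As a rough illustration of item 3), the following Python sketch writes a
string containing noncharacters to a temporary file as UTF-8 and reads
it back.  The function name roundtrip_through_file is hypothetical, and
this only shows what one runtime and file system do; it says nothing
about how 'diff' or other tools treat such input.

```python
import os
import tempfile


def roundtrip_through_file(text):
    """Write text to a temporary UTF-8 file and return what is read back."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    try:
        # newline="" disables newline translation in both directions.
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
            f.write(text)
        with open(path, "r", encoding="utf-8", newline="") as f:
            return f.read()
    finally:
        os.remove(path)


sample = "a\ufdd0b\U0010ffffc"  # contains U+FDD0 and U+10FFFF
assert roundtrip_through_file(sample) == sample
```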

I think a revision to the text was advisable to clarify that 2) and 3)
were acceptable. I haven't heard anybody on this thread disagree with
that.

But item 1) shows how tricky this issue really is. My utility looks
like a fancier 'diff' to those people who call it, so they would be
justified in wanting it not to use noncharacters because they have their
own purposes for them. If some of those callers were themselves
utilities, their callers might want to use noncharacters for their own
purposes. And so on and so on.

I don't have a good answer, except to say that Asmus' characterization
above looks reasonable.

The purpose of public reviews is to try to get a broad range of ideas,
and if none are forthcoming, then the fact that there was such a review
should be an adequate defense of the ultimate decision. Not holding a
review is an invitation to lingering suspicions on the part of the
public about the motives behind any such decision. These can fester and
the trust level is permanently diminished. There will always be people
who won't like the decision, and who will assume that the deciders are
malevolent. But the vast majority will accept a decision that seems to
have been made in good faith after public input.

This is just how things work, no matter what the venue or issue. It may
be that the UTC thought this was minor enough to not require a review,
but if so, time has shown that to have been an incorrect perception.

