Re: How is NBH (U0083) Implemented?

Ken Whistler Mon, 08 Aug 2011 16:57:31 -0700

On 8/1/2011 7:26 AM, Naena Guru wrote:

This thread wandered off into an argument about whether U+FEFF ZWNBSP or
U+2060 WJ is best supported and which should be used to inhibit line breaks.

However, there are still several other issues which bear addressing in
Naena Guru's questions:

The Unicode character NBH (No Break Here: U0083) is understood as a hidden character that is used to keep two adjoining visual characters from being separated in operations such as word wrapping.


As Jukka noted, U+0083 is a C1 control code, whose semantics is not actually
defined by the Unicode Standard. Its function in ISO 6429 is to represent the
control function "No Break Here". U+0083 is unlikely to be supported (except
for pass-through) by any significant Unicode-based software as a control
function. Its only implementation was likely for some terminal-based software
in what are now basically obsolete systems.

See the wiki on the topic of C0 and C1 control codes for a quick summary of
the status of various control codes and their implementation:

http://en.wikipedia.org/wiki/C0_and_C1_control_codes
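
The C1 status of U+0083 can be checked against the Unicode Character Database.
The following is a minimal sketch, assuming Python 3 and its standard
unicodedata module; the helper name describe() is made up for illustration and
is not part of any library. It reports the General Category of a code point
and its character name, if it has one.

```python
import unicodedata


def describe(cp):
    """Return (general category, character name or None) for a code point."""
    ch = chr(cp)
    # unicodedata.name() returns the supplied default for unnamed characters
    return unicodedata.category(ch), unicodedata.name(ch, None)


# U+0083 is a control code (Cc) and has no character name in the database
nbh = describe(0x0083)
print(nbh)

# Every code point in the C1 range U+0080..U+009F has General Category Cc
c1_all_controls = all(
    unicodedata.category(chr(cp)) == "Cc" for cp in range(0x80, 0xA0)
)
print(c1_all_controls)
```

This only shows how the code point is classified. It says nothing about
whether a given application acts on it.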

It seems to be similar to ZWJ (Zero Width nonJoiner: U200C) in that it can prevent automatic formation of a ligature as programmed in a font.


U+200C ZWNJ is the Unicode format control whose function is to break cursive
connection between adjacent characters. That is a different and distinct
function from indicating the position of an inhibited line break.
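
For reference, the characters that come up in this thread can be listed with
their names and General Categories. This is a small sketch assuming Python 3's
standard unicodedata module; the variable names are illustrative only.

```python
import unicodedata

# Code points discussed in this thread (U+0083 is omitted: it has no name)
code_points = (0x200C, 0x2060, 0xFEFF, 0x00A0)

# Map each code point to (general category, character name)
properties = {
    cp: (unicodedata.category(chr(cp)), unicodedata.name(chr(cp)))
    for cp in code_points
}

for cp, (category, name) in properties.items():
    print(f"U+{cp:04X}  {category}  {name}")
```

The three zero-width characters are format controls (Cf), while U+00A0 is a
space separator (Zs). The categories alone do not describe line breaking
behavior, which is governed by the separate Line_Break property.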

Also, it is important to recognize that the insertion of *any* random control
code between two characters may end up preventing automatic formation of a
font ligature, if it isn't accounted for in the font tables. That does not
imply that insertion of random control codes (including U+0083) is a
recommended way of inhibiting ligature formation for a pair of characters in
a particular font.

However, it seems to me that an NBH evokes a question mark (?). Is this an oversight by implementers or am I making wrong assumptions?

Because most control codes, including nearly all of the C1 control codes, are
unsupported by typical Unicode-based text processing software, it is not too
surprising that insertion of U+0083 in text would result in a "?" or other
indication of an unsupported and/or undisplayable character.

There is also the NBSP (No-break Space: U00A0), which I think has to be mapped to the space character in fonts, that glues two letters together by a space. If you do not want a space between two letters and also want to prevent glyph substitutions to happen, then NBH seems to be the correct character to use.

No. And that leads to the discussion which followed, about U+FEFF and U+2060.

NBH is more appropriate for use within ISO-8859-1 characters than ZWNJ, because the latter is double-byte.

"Double-byte" is not a concept with any applicability to the Unicode
Standard. That is a hold-over from Asian character sets which mixed ASCII
with two-byte encoding of extensions to cover Han characters (and other
additions).
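
To illustrate why byte counts are a property of the encoding form rather than
of the character, here is a short sketch assuming Python 3 and its built-in
codecs. The helper name encoded_lengths() is made up for this example.

```python
def encoded_lengths(cp):
    """Return the encoded size in bytes of one code point per encoding form."""
    ch = chr(cp)
    return {
        enc: len(ch.encode(enc))
        for enc in ("utf-8", "utf-16-le", "utf-32-le")
    }


nbh_sizes = encoded_lengths(0x0083)   # the C1 control "No Break Here"
zwnj_sizes = encoded_lengths(0x200C)  # ZERO WIDTH NON-JOINER

print("U+0083:", nbh_sizes)
print("U+200C:", zwnj_sizes)
```

U+0083 takes two bytes in UTF-8 and U+200C takes three, while both take two
bytes in UTF-16 and four in UTF-32. Neither is a "single-byte" or
"double-byte" character in any fixed sense.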

And U+0083 is no more appropriate for use with ISO 8859-1 implementations
than Unicode implementations, for the same reason: it is a control function
which simply isn't supported.

Programs that handle SBCS well ought to be afforded the use of NBH as it is a SBCS character. Or, am I completely mistaken here?

If you actually run into the byte 0x83 in data which is ostensibly labeled
"ISO-8859-1", in almost all actual cases you would be dealing instead with
0x83 (= U+0192 LATIN SMALL LETTER F WITH HOOK) in mislabeled Windows Code
Page 1252 data. It would be really inadvisable to start expecting it to be
supported as a line break inhibiting control code instead.
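
The two readings of the byte 0x83 can be compared directly. This is a minimal
sketch assuming Python 3 and its built-in "iso-8859-1" and "cp1252" codecs;
the variable names are illustrative.

```python
import unicodedata

raw = b"\x83"

# Read strictly as ISO 8859-1, the byte maps to the C1 control U+0083
as_latin1 = raw.decode("iso-8859-1")

# Read as Windows Code Page 1252, the same byte is a printable letter
as_cp1252 = raw.decode("cp1252")

print(f"ISO-8859-1: U+{ord(as_latin1):04X}")
print(f"CP1252:     U+{ord(as_cp1252):04X} {unicodedata.name(as_cp1252)}")
```

Which interpretation a given program applies depends on how it treats the
label, so the sketch shows only what each codec does with the byte.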


--Ken
