On Thu, Jun 11, 2015 at 1:17 AM, Philippe Verdy <[email protected]> wrote:
> The ASCII punctuations have been overridden for a lot of different roles.
> There's simply no way to map them to a category that matches their semantic
> role. [...] "Pd" (dash) is then appropriate for the ASCII hyphen-minus.
>

I agree, but I wasn't talking about the ASCII hyphen, U+002D (HYPHEN-MINUS). I was talking about U+2010 (HYPHEN). I also wasn't talking about changing the properties of U+0027 (APOSTROPHE).

> in dictionaries I've seen small slanted tildes, or slanted small equal
> signs, to make the distinction with true hyphens used in compound words
>

This is drifting off-topic, but I wanted to address the thing you just said above.

Firstly, in the dictionaries I've seen, the slanted double hyphen is only used when a line break happens to occur at the same place as a "true hyphen". It replaces the "true hyphen". When a line is broken at a hyphenation point between letters, an ordinary-looking hyphen is displayed.

Secondly, this character is encoded in Unicode at U+2E17 (DOUBLE OBLIQUE HYPHEN).

- Ted

On Thu, Jun 11, 2015 at 1:17 AM, Philippe Verdy <[email protected]> wrote:

> The ASCII punctuations have been overridden for a lot of different roles.
> There's simply no way to map them to a category that matches their semantic
> role. So the ASCII hyphen and apostrophe-quote can only be given a very
> weak category that just exhibits their visual role. "Pd" (dash) is then
> appropriate for the ASCII hyphen-minus. You can't really tell from the
> character alone if it is a punctuation or a minus sign.
>
> If it is a minus sign you can reencode it better using the more specific
> mathematical minus sign.
> Otherwise, even if it is not a minus sign, it can be:
> - a connector between words in compound words (hyphen)
> - a trailing mark at end of lines for indicating a word has been broken in
>   the middle (but remember that I asked previously for another character for
>   that role because this word-breaking hyphen is not necessarily a
>   horizontal hyphen (in dictionaries I've seen small slanted tildes, or
>   slanted small equal signs, to make the distinction with true hyphens used
>   in compound words, also because sometimes these breaks are not necessarily
>   between two syllables in "pocket books" with very narrow columns and
>   minimized spacing))
> - a bullet leading items in a vertical list (this should be an en dash,
>   followed by some spacing)
> - a punctuation (not necessarily at beginning of line) marking the change
>   of person speaking (very common in literature, notably in theatre).
>
> As a connector between words, there's a demonstrated need of
> differentiating regular hyphens, longer hyphens (preferably surrounded by
> thin spaces) for noting intervals (we can use the EN DASH for that), long
> hyphens between two separate names that are joined (example in proper
> names, after marriage, there's an example in France, where INSEE encodes it
> for now using TWO successive hyphens, which are also used in French
> identity cards, passports, social security green cards...).
>
>
> ----
>
> Still nobody replied to my past comment (about 1 month ago) about the
> various forms of the word-breaking hyphen / line-wrapping symbol:
>
> * I'm not speaking about the SHY control, but about the real character
> whose glyph appears when SHY is materialized at end of lines (and which
> should be neither minus nor en-dash but also not the same as the
> orthographic hyphen used between words in a compound word).
>
> * This character can also be found (and is needed) for breaking long
> mathematical formulas and must be clearly distinct from the regular minus.
>
> * This character is also needed for rendering long lines of programming
> code or textual data (it is something that must not be entered in programs
> but that must be rendered because these programs or codes have significant
> line breaks: the glyph indicates that the following rendered line break is
> to be discarded). Not all programming languages have a syntax allowing the
> use of an escape before the line break (such escaping varies, it may be a
> backslash in C/C++, or an underscore in Basic, but in data dumps such as
> CSV files, it is impossible to note such escape in the data language
> itself, and we need to render some specific glyph).
>
> * This character is absolutely needed when rendering on a static medium
> (i.e. printing or broadcasting); for dynamic media (such as personal
> displays with a personal UI) we could still use scrolling, but users don't
> like horizontal scrolls and highly prefer reading the text directly. So
> they expect to see a distinctive glyph (or icon) to see the distinction
> between line breaks where they are significant or where they just wrap too
> long lines, and still see the distinction with other regular hyphens and
> minus (that are also significant and very frequently distinct)
>
>
> 2015-06-11 0:51 GMT+02:00 Ted Clancy <[email protected]>:
>
>> On 4/Jun/2015 19:01, Leo Broukhis wrote:
>>>
>>> Along the same lines, we might need a MODIFIER LETTER HYPHEN, because, for
>>> example, the word ack-ack isn't decomposable into words, or even morphemes,
>>> "ack" and "ack".
>>>
>> I do think that U+2010 (HYPHEN) is miscategorised. I think it should have
>> General Category = Pc, not Pd. (That is, hyphens are connectors, not
>> dashes.) That would make it a "word" character.
>>
>> Or, at the very least, U+2010 should have Word Break = MidNumLet (meaning
>> it can occur in the middle of numbers or letters).
>> UAX #29 says that U+2010
>> deliberately does *not* have Word Break = MidNumLet, though an
>> implementation may treat it as if it did. (UAX #29 doesn't give any reasons
>> for this decision. I can understand why U+002D (HYPHEN-MINUS) doesn't have
>> Word Break = MidNumLet, due to its history of being used as a dash or minus
>> sign, but U+2010 should never be used as a dash or minus sign, so I don't
>> see the problem.)
>>
>> But luckily, the miscategorisation of U+2010 hasn't led to any pressing
>> practical problems, unlike the misuse of U+2019 for the apostrophe.
>>
>> - Ted
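
For anyone who wants to check the General Category values discussed in this thread, here is a minimal sketch using Python's standard `unicodedata` module. It is only an illustration and not part of the original messages. The helper name `describe` is hypothetical, and reading "mathematical minus sign" as U+2212 and "SHY" as U+00AD is my assumption. U+005F (LOW LINE) is included only as a familiar example of a Pc character. The module does not expose the Word Break property (MidNumLet), so that part of the discussion is not covered here.

```python
import unicodedata

# Code points mentioned in the thread. U+2212 (minus) and U+00AD (SHY) are
# assumed readings of the prose; U+005F is added as a known "Pc" reference.
CODE_POINTS = [
    0x002D,  # HYPHEN-MINUS
    0x2010,  # HYPHEN
    0x2013,  # EN DASH
    0x2212,  # MINUS SIGN
    0x2E17,  # DOUBLE OBLIQUE HYPHEN
    0x00AD,  # SOFT HYPHEN
    0x0027,  # APOSTROPHE
    0x2019,  # RIGHT SINGLE QUOTATION MARK
    0x005F,  # LOW LINE
]


def describe(cp):
    """Return (U+XXXX label, character name, General Category) for a code point."""
    ch = chr(cp)
    return f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch)


if __name__ == "__main__":
    # Print one row per code point: label, category, name.
    for cp in CODE_POINTS:
        label, name, category = describe(cp)
        print(f"{label}  {category}  {name}")
```

Run as a script, this reports U+002D and U+2010 both as Pd and U+005F as Pc, which is the distinction the Pc-versus-Pd proposal above turns on.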

