On Friday, 2 January 2026 at 05:49, Doug Ewell via Unicode <[email protected]> wrote:
> I read the proposal, and I can’t help worrying about the potential security
> implications of left arrows that look like right arrows and vice versa.
>
> I know there is a three-sentence “Security” section at the bottom of the
> proposal, which basically says to denylist the proposed control character(s)
> in domains where such a facility exists (like IDN), but for commonplace
> characters like arrows, I can imagine many additional opportunities for
> troublemakers to make trouble. UTS #55 in particular might need several new
> examples; there are programming languages that use Unicode arrows.
>
> I am certain that someone with better knowledge of security and (especially)
> bidi will be along shortly to show how wrong I am.
>
> --
> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

Thanks. I am also hoping to hear more informed opinions on that.

For programming languages, if bidirectional or RTL text is allowed in source code (especially outside of comments and strings), it's already a disaster.

I am not very familiar with UTS #55 but I will definitely familiarize myself with it. Thank you for pointing it out. From my initial impressions though, the proposed character(s) shouldn't create any new concerns, especially if they're only allowed in string literals and comments. That said, I think you are right that examples would need to be added.

On Tuesday, 30 December 2025 at 02:45, Erik Carvalhal Miller via Unicode <[email protected]> wrote:
> I followed the discussion in April with interest, and I congratulate
> you on your draft, for it shows a lot of thought and some creative
> problem‐solving. That said, Iʼm about to rip it apart.

Just what I was hoping for :)

> The draft explicitly anticipates pushback on the “extensions”, and
> thatʼs a good place to start.

Naturally there is a lot more to say about them than the core proposal, but I am worried about putting the cart before the horse.
> The customary images of proposed characters in real‐world use are absent [..]

You're right. I think even calling this a "draft" was optimistic on my part, as I wrote it mainly to introduce the idea for discussion here. I am not familiar enough with Unicode processes and proposals to competently write a proposal right now, even an early draft; I wanted to first make sure there is any chance at all for it to be considered before committing to writing it and learning all that is necessary. I originally intended to send the whole thing here as an email but decided in the end to send a link to it.

> [..] Instead, the draft resorts to rationales such as
> “inoffensive” and “[w]hy not[…]?”, which are hardly proactive,
> compelling arguments. The First Natural Extension (re NEVER SUBJECT
> TO MIRRORING) indeed argues eloquently against itself, telling us its
> intended effect is already available in Unicode (as LRI…PDI).

Indeed, that one is a bit silly when taken in isolation, but may be useful inside RLO…PDF, and especially with the "Arguably Unnatural Extension" of applying it to an entire span of directional override.

> The Second Natural Extension isnʼt forthcoming about a similar problem
> with ALWAYS SUBJECT TO MIRRORING (think: RLI…PDI)

Do you mean, for example, "RLI (left arrow) DSM PDI" to always make a right arrow? I haven't thought of that before.

> and itʼs a mystery what useful functionality REVERSED SUBJECT TO MIRRORING
> and INVERSE SUBJECT TO MIRRORING provide

This is mostly the overly-analytical part of my brain trying to cover every use case. I thought it would be better to put it out there and let it be shot down (or even shoot it down myself) than to leave it unsaid, because the previous parts sort of create a truth table that can be filled in. So this part just fills it in so there are no holes. It's a silly part and I'll be happy to see it go.
> Regarding the Final Potential Extensionʼs VERTICALLY
> SUBJECT TO MIRRORING, I laud the draft for remembering vertical text
> (though here itʼs about rotation, not mirroring, right?), but again
> thereʼs no attempt to convince beyond an unconvincing “might as well”.

I thank you for giving me what I assume to be the benefit of the doubt, but I must come clean that I did not have vertical text in mind when writing that. I did mean mirroring, such as "A" being vertically mirrored to Ɐ (U+2C6F) or ∀ (U+2200). It may be interesting to consider rotation rules the way you thought I meant, but that is well outside the scope of this discussion.

The only reasoning I can give besides "might as well" is that this would make it possible to use vertically-mirrored text in any plain-text application, such as IRC, instant messaging apps, online forums and so on. Basically the same reason for the proposed ASM character in the Second+Third Extensions. I just felt that it's a natural thing to bring up in this context. I was also curious about whether this topic ever came up before, because surely it must have!

> Even if the extensionsʼ rationales are bolstered, I expect a highly
> significant problem to remain irremediable: The extensions run
> terribly afoul of some of the Unicode Design Principles (§ 2.2 of the
> Core Specification), without compensatory benefit satisfying any of
> the other Design Principles or some other significant consideration.
> The primary issue is with the principle of Plain Text: Expansion of
> mirroring to all visible characters (setting aside the question of
> symmetry), as in the Third Natural Extension, or else to just the
> directionally neutral ones, as more generally proposed, is a gimmickry
> that litters plain text with markup for special effects generally
> better served by higher‐level protocols or images.
> The wholesale
> multiplication of superfluously homoglyphic encodings erodes the
> principles of Efficiency and Unification (what characters are we
> really minding when we mind our pʼs and qʼs?).

I have nothing to add here; you're probably right on all of those points. My only rebuttal is that people want it, as I've mentioned under the subheader "Back To The Control Characters" and as shown by e.g. the existence of https://convertcase.net/mirror-text-generator/ , but I will happily accept that this is not reason enough to do something like this.

However, there is one problem. Suppose the Core Proposal of DSM is accepted (undoubtedly after many changes). What does that mean for sequences where DSM is applied to an LTR letter character like "G" inside of a right-to-left override? Surely this can't remain "undefined" as in my draft's Core Proposal, and by Unicode's stability policies I have to assume that the meaning of such a sequence can never be changed in a future version of Unicode. Therefore, the question of whether such mirroring is **ever** to be encoded in Unicode must be decided at the same time as the Core Proposal's DSM -- at least if my interpretation of its application within override blocks is agreed upon.

> So, letʼs return to the draftʼs Core Proposal. Since it applies to
> rather a large repertoire of characters, the same problems occur:
> Itʼs not clear why we need a plain‐text mechanism to specify (for
> example) a reversed AMPERSAND or OCR BRANCH BANK IDENTIFICATION or
> KANGXI RADICAL DRAGON or PLAYING CARD KING OF HEARTS or to sometimes
> make members of such character pairs as MODIFIER LETTER ACUTE ACCENT &
> MODIFIER LETTER GRAVE ACCENT or IDEOGRAPHIC DESCRIPTION CHARACTER
> SURROUND FROM UPPER LEFT & IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND
> FROM UPPER RIGHT resemble one another.
> While applying to emoji the
> draft doesnʼt explicitly address them, and so it ignores the fact that
> emoji already have a burgeoning mechanism for specifying directional
> orientation (which, incidentally, involves arrow characters), and itʼs
> unclear how DIRECTIONALLY SUBJECT TO MIRRORING would interact with
> that or with emoji ZWJ sequences in general, with pairs of regional
> indicator symbols, or with emoji tag sequences.

This is exactly the type of feedback I was asking for.

The intended meaning of DSM is effectively "Combining Directional Operator Attribute". Perhaps that would even be a better name for it. It specifies that the preceding character is an operator, with an intended direction between the text preceding and following it, and that the operator's visual presentation needs to be adjusted accordingly if it happens to be in an RTL span of text.

So if this character is used with characters where it makes no sense, like nearly all of the examples you gave, then what is a text renderer to make of it? It might be a nonsense sequence of codepoints, in which case it hardly matters how it renders (garbage in -> garbage out); or it was deliberately crafted to do *exactly* what it says on the tin: flip the character if it's in an RTL span. Who are we to question the text author? Zalgo text doesn't make any sense either, but Unicode encodes it and renderers do what they can to render it with best effort. And while Zalgo text can easily cause issues that may be considered bugs by some (one line of text causing a different line of text to be unreadable), it would take a rather contrived scenario to cause mayhem with the proposed DSM.
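To make the "flip the character if it's in an RTL span" rule concrete, here is a minimal Python sketch. It rests on several assumptions that are not part of the draft: DSM has no code point, so the private-use character U+E000 stands in for it; the mirror table lists only the two plain arrows; and the run direction is passed in as a flag instead of being resolved by the bidi algorithm. A real renderer would mirror the glyph rather than swap code points, so this only illustrates the decision, not an implementation.

```python
# Hypothetical stand-in: DSM is not an encoded character. U+E000 (private use)
# is used here purely for illustration.
DSM = "\uE000"

# Illustrative mirror pairs only (U+2190 LEFTWARDS ARROW, U+2192 RIGHTWARDS ARROW).
MIRROR_PAIRS = {"\u2190": "\u2192", "\u2192": "\u2190"}


def apply_dsm(text: str, rtl: bool) -> str:
    """Resolve DSM marks for a run whose direction is already known.

    Simplification: one direction flag covers the whole run; a renderer
    would use the per-character level computed by the bidi algorithm.
    """
    out = []
    for ch in text:
        if ch == DSM:
            # Consume the mark; flip the preceding character only in RTL,
            # and only if the table knows a mirrored counterpart for it.
            if rtl and out and out[-1] in MIRROR_PAIRS:
                out[-1] = MIRROR_PAIRS[out[-1]]
            continue
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(apply_dsm("A \u2192" + DSM + " B", rtl=False))
    print(apply_dsm("\u05D0 \u2192" + DSM + " \u05D1", rtl=True))
```

In this sketch a DSM after a character with no entry in the table (such as "G") is simply dropped and the character is left as it was, which is one possible reading of "garbage in -> garbage out".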
> The original
> discussion focused on arrow characters such as U+2192 RIGHTWARDS ARROW
> and U+2190 LEFTWARDS ARROW commonly seen in ordinary text, and there
> was some proffered justification for exploring a bidi‐mirroring
> solution for a small set of such arrows, and accordingly I would have
> expected any proposal with a chance of success to cover only such
> specific characters with specific, articulated justification.

I think listing a specific set of affected characters offers nothing but a way to guarantee that some will be missed. Unicode stability policies might then prevent such mistakes from ever being corrected.

> (The draftʼs mention of FRACTION SLASH puzzles me because I am unaware of
> bidi and mirroring issues involving inline or diagonal fractions; if a
> mirrored fraction solidus is needed for at least one RTL script, the
> draft should explain the need and explain why applying a
> mirroring‐formatting character is a better solution than, say,
> adopting a RIGHT-TO-LEFT FRACTION SLASH character.)

Thanks. It was an off-the-cuff remark, and I agree that any non-arrow example deserves a deeper examination. My thinking and intuition was this: in a classically rendered simple fraction (which I can't render here, but examples abound at https://www.mathsisfun.com/proper-fractions.html ), the numerator is *above* the division line and the denominator is *below* that line. In a fraction with a slash such as "1/2", the slash plays the same role. If it were extended on both sides, the 1 would be above it and the 2 would be below it. In other words, the slash is angled such that the preceding text is above it and the following text is below it. In RTL text, therefore, the angle should be reversed so that the preceding text (now to its right instead of its left) is above it, and the following text (now to its left) is below it. This is how I see it.

> On the subject of justification: The draft cites five “Real‐World Use
> Cases” regarding arrow characters.
> Of them, the first three are said
> to have been resolved using higher‐level protocols,

Yes, but two of them are needlessly complex and potential sources of bugs just to get around this issue, especially when compared to what they would be if this was implemented or if arrows were bidi-mirroring to begin with. The other one can ignore the problem because it never uses a RTL layout.

> and the fifth is
> said to remain unresolved but to be inappropriate for the draftʼs
> proposed mechanism. The fourth, which regards automatic replacement
> of sequences such as ⟨-->⟩ with arrows, is said to be unresolvable
> unless, essentially, the software engineers reëngineer the software to
> achieve something that other software achieves — this does not sound
> like a strong argument, particularly for a niche convenience feature
> (if it is convenient — I am reminded of fora which transform the
> sequence <RIGHT PARENTHESIS, COLON> into a sad face athwart my intent,
> thereby giving me a sad face for real). Other solutions abound, for
> example:
> • not making the replacement
> • providing users with insert‐arrow buttons
> • replacing only the hyphen‐minuses with a character serving as the
> arrow stem, such as U+23AF HORIZONTAL LINE EXTENSION (e.g., ⟨A-->B &
> א-->ב⟩ → ⟨A⎯>B & א⎯>ב⟩)

I share your frustration with unwanted text replacements, but this is the decision of that particular forum software's developers. I consider this to be a markup feature, i.e. it is assumed that when the user types "-->" they intend for the Markdown-based rendering to change it into an arrow. This precludes the first two suggested alternatives. The last suggestion might technically work, but to achieve a visually pleasing result would require specially crafting a new font particularly for this endeavor. This is absurd for an organization that does not specialize in font development.

As for "reëngineer the software to achieve something that other software achieves", I really don't see it that way.
Even if this were true, it would still be a huge amount of code (= a huge amount of potential bugs) just to get around this one Unicode quirk. The other examples which had been resolved using "higher-level protocols" (read: extra program code specifically meant to handle this missing Unicode option) have the advantage of knowing the actual context in which the arrow will be placed, as it's always the same. Discourse does not have this advantage because the arrow can be placed anywhere within user text. It would need orders of magnitude more code, and the introduction of a 160 kB dependency ( https://www.npmjs.com/package/bidi-js ), to achieve the same thing, even though the text renderer already has all the information it needs to solve it; that information is simply inaccessible to application code.

> Though rooted in technical details, the draft leaves some significant
> technical issues unaddressed. It proposes DSM as a “combining
> formatting character” — so, general category of Mn/nonspacing mark
> (like CGJ), I suppose (given the proposed behavior amid combining
> marks), rather than Cf/format control? What would be the combining
> class and the impact on normalization?

I think Mn makes sense. I thought questions like this would be points for discussion here.

> Another technical concern, reaching far beyond the technical: What
> happens if DSM is used but not supported? If itʼs a default ignorable
> code point (like CGJ and most formatting characters) but supported for
> you, you could compose something that renders as ⟨א ← ב⟩ and find it
> satisfactory, only for it to render as ⟨א → ב⟩ for your
> tech‐deficient readership, possibly fomenting disaster.
> If DSM is not
> default ignorable and not supported, then your readership may instead
> get something like ⟨א →⎕ ב⟩; while the unsupported‐character symbol
> is a hint thereʼs something wrong in the rendering (to readers who
> recognize it), usually it suggests that thereʼs a glyph missing in its
> place, not that a neighboring character is represented by the wrong
> glyph (or by a merely questionable glyph, as for the same readers a
> left‐to‐right ⟨A → B⟩ might render as ⟨A →⎕ B⟩). For my money, the
> notion that lack of support will not merely obscure meanings (as is to
> be expected) but actually invert intended meanings is a fatal flaw.

This is a very valid concern which I share. Whether it is a "fatal" flaw or just a flaw is, in my opinion, up for debate. I would definitely prefer it to be not default ignorable, as a "missing character" indication is a good hint.

Some mitigating points:
1. Correct me if I'm wrong about this, but thanks to emojis, many platforms and software are eager to adopt new versions of Unicode as soon as possible, which is excellent for minimizing compatibility issues like this.
2. This is mainly intended to be used programmatically or in templates, not by authors directly, so developers (as always) should ensure that the target platform supports whichever features it uses. I believe this is a similar situation.

> Despite these misgivings, I am sympathetic to the effort. I hope this
> critique is of some use in the quest for a solution.

Thank you.

- Nitai Sasson
