RE: RFC: controlling bidirectional mirroring of characters

Nitai Sasson via Unicode Mon, 19 Jan 2026 13:46:31 -0800

On Friday, 2 January 2026 at 05:49, Doug Ewell via Unicode 
<[email protected]> wrote:

> I read the proposal, and I can’t help worrying about the potential security 
> implications of left arrows that look like right arrows and vice versa.
>
> I know there is a three-sentence “Security” section at the bottom of the 
> proposal, which basically says to denylist the proposed control character(s) 
> in domains where such a facility exists (like IDN), but for commonplace 
> characters like arrows, I can imagine many additional opportunities for 
> troublemakers to make trouble. UTS #55 in particular might need several new 
> examples; there are programming languages that use Unicode arrows.
>
> I am certain that someone with better knowledge of security and (especially) 
> bidi will be along shortly to show how wrong I am.
>
> --
> Doug Ewell, CC, ALB | Lakewood, CO, US | ewellic.org

Thanks. I am also hoping to hear more informed opinions on that.

For programming languages, if bidirectional or RTL text is allowed in source 
code (especially outside of comments and strings), it's already a disaster. I 
am not very familiar with UTS #55 but I will definitely familiarize myself with 
it. Thank you for pointing it out. From my initial impressions though, the 
proposed character(s) shouldn't create any new concerns, especially if they're 
only allowed in string literals and comments. That said, I think you are right 
that examples would need to be added.

On Tuesday, 30 December 2025 at 02:45, Erik Carvalhal Miller via Unicode 
<[email protected]> wrote:

> I followed the discussion in April with interest, and I congratulate
> you on your draft, for it shows a lot of thought and some creative
> problem‐solving. That said, Iʼm about to rip it apart.

Just what I was hoping for :)

> The draft explicitly anticipates pushback on the “extensions”, and
> thatʼs a good place to start.

Naturally there is a lot more to say about them than the core proposal, but I 
am worried about putting the cart before the horse.

> The customary images of proposed characters in real‐world use are absent [..]

You're right. I think even calling this a "draft" was optimistic on my part, as 
I wrote it mainly to introduce the idea for discussion here. I am not familiar 
enough with Unicode processes and proposals to competently write a proposal 
right now, even an early draft. I wanted to first make sure there is any chance 
at all of it being considered before committing to writing it and learning 
everything that is necessary.

I originally intended to send the whole thing here as an email but decided in 
the end to send a link to it.

> [..] Instead, the draft resorts to rationales such as
> “inoffensive” and “[w]hy not[…]?”, which are hardly proactive,
> compelling arguments. The First Natural Extension (re NEVER SUBJECT
> TO MIRRORING) indeed argues eloquently against itself, telling us its
> intended effect is already available in Unicode (as LRI…PDI).

Indeed, that one is a bit silly when taken in isolation, but may be useful 
inside RLO…PDF, and especially with the "Arguably Unnatural Extension" of 
applying it to an entire span of directional override.

> The Second Natural Extension isnʼt forthcoming about a similar problem
> with ALWAYS SUBJECT TO MIRRORING (think: RLI…PDI)

Do you mean, for example, "RLI (left arrow) DSM PDI" to always make a right 
arrow? I haven't thought of that before.

> and itʼs a mystery what useful functionality REVERSED SUBJECT TO MIRRORING 
> and INVERSE
> SUBJECT TO MIRRORING provide

This is mostly the overly analytical part of my brain trying to cover every 
use case. I thought it would be better to put it out there and let it be shot 
down (or even shoot it down myself) than to leave it unsaid, because the 
previous parts sort of create a truth table that can be filled in. So this part 
just fills it in so there are no holes. It's a silly part and I'll be happy to 
see it go.

> Regarding the Final Potential Extensionʼs VERTICALLY
> SUBJECT TO MIRRORING, I laud the draft for remembering vertical text
> (though here itʼs about rotation, not mirroring, right?), but again
> thereʼs no attempt to convince beyond an unconvincing “might as well”.

I thank you for giving me what I assume to be the benefit of the doubt, but I 
must come clean that I did not have vertical text in mind when writing that. I 
did mean mirroring, such as "A" being vertically mirrored to Ɐ (U+2C6F) or ∀ 
(U+2200).

It may be interesting to consider rotation rules the way you thought I meant, 
but that is well outside the scope of this discussion.

The only reasoning I can give besides "might as well" is that this would make 
it possible to use vertically-mirrored text in any plain-text application, such 
as IRC, instant messaging apps, online forums and so on. Basically the same 
reason for the proposed ASM character in the Second+Third Extensions.

I just felt that it's a natural thing to bring up in this context. I was also 
curious about whether this topic ever came up before, because surely it must 
have!

> Even if the extensionsʼ rationales are bolstered, I expect a highly
> significant problem to remain irremediable: The extensions run
> terribly afoul of some of the Unicode Design Principles (§ 2.2 of the
> Core Specification), without compensatory benefit satisfying any of
> the other Design Principles or some other significant consideration.
> The primary issue is with the principle of Plain Text: Expansion of
> mirroring to all visible characters (setting aside the question of
> symmetry), as in the Third Natural Extension, or else to just the
> directionally neutral ones, as more generally proposed, is a gimmickry
> that litters plain text with markup for special effects generally
> better served by higher‐level protocols or images. The wholesale
> multiplication of superfluously homoglyphic encodings erodes the
> principles of Efficiency and Unification (what characters are we
> really minding when we mind our pʼs and qʼs?).

I have nothing to add here; you're probably right on all of those points. My 
only rebuttal is that people want it, as I mentioned under the subheader 
"Back To The Control Characters" and as shown by, e.g., the existence of 
https://convertcase.net/mirror-text-generator/ , but I will happily accept that 
this is not reason enough to do something like this.

However, there is one problem. Suppose the Core Proposal of DSM is accepted 
(undoubtedly after many changes). What does that mean for sequences where DSM 
is applied to an LTR letter character like "G" inside of a right-to-left 
override? Surely this can't remain "undefined" as in my draft's Core Proposal, 
and by Unicode's stability policies I have to assume that the meaning of such a 
sequence can never be changed in a future version of Unicode. Therefore, the 
question of whether such mirroring is **ever** to be encoded in Unicode must be 
decided at the same time as the Core Proposal's DSM -- at least if my 
interpretation of its application within override blocks is agreed upon.

> So, letʼs return to the draftʼs Core Proposal. Since it applies to
> rather a large repertoire of characters, the same problems occur:
> Itʼs not clear why we need a plain‐text mechanism to specify (for
> example) a reversed AMPERSAND or OCR BRANCH BANK IDENTIFICATION or
> KANGXI RADICAL DRAGON or PLAYING CARD KING OF HEARTS or to sometimes
> make members of such character pairs as MODIFIER LETTER ACUTE ACCENT &
> MODIFIER LETTER GRAVE ACCENT or IDEOGRAPHIC DESCRIPTION CHARACTER
> SURROUND FROM UPPER LEFT & IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND
> FROM UPPER RIGHT resemble one another. While applying to emoji the
> draft doesnʼt explicitly address them, and so it ignores the fact that
> emoji already have a burgeoning mechanism for specifying directional
> orientation (which, incidentally, involves arrow characters), and itʼs
> unclear how DIRECTIONALLY SUBJECT TO MIRRORING would interact with
> that or with emoji ZWJ sequences in general, with pairs of regional
> indicator symbols, or with emoji tag sequences.

This is exactly the type of feedback I was asking for.

The intended meaning of DSM is effectively "Combining Directional Operator 
Attribute". Perhaps that would even be a better name for it. It specifies that 
the preceding character is an operator, with an intended direction between the 
text preceding and following it, and that the operator's visual presentation 
needs to be adjusted accordingly if it happens to be in an RTL span of text.
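
To make that intended behaviour concrete, here is a toy model in Python. It is 
only a sketch under loud assumptions: DSM has no code point, so a private-use 
character stands in for it; the swap table holds just two arrows for 
illustration and is not a proposed list; and the direction of the span is 
passed in as a flag, whereas a real renderer would get it from the bidi 
algorithm. The function name is made up for this example.

```python
# Hypothetical stand-in: DSM is only proposed, so a private-use
# code point is used here purely for illustration.
DSM = "\uE000"

# Illustrative swap table (LEFTWARDS ARROW <-> RIGHTWARDS ARROW).
# This is NOT a proposed normative list of affected characters.
FLIPPED = {"\u2190": "\u2192", "\u2192": "\u2190"}


def resolve_dsm(text: str, rtl: bool) -> str:
    """Return the characters to display for `text` in a span whose
    direction is already known (`rtl`). A character followed by DSM is
    swapped for its mirrored counterpart only when the span is RTL."""
    out = []
    for ch in text:
        if ch == DSM:
            # DSM modifies the character before it, if the toy table
            # knows a counterpart; otherwise the character is left alone.
            if rtl and out and out[-1] in FLIPPED:
                out[-1] = FLIPPED[out[-1]]
            continue  # the DSM itself is never displayed
        out.append(ch)
    return "".join(out)
```

In this model an arrow without DSM is never touched, which matches how arrows 
behave today, and a letter such as "G" followed by DSM passes through 
unchanged because the toy table has no entry for it.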

So if this character is used with characters where it makes no sense, like 
nearly all of the examples you gave, then what is a text renderer to make of 
it? It might be a nonsense sequence of codepoints, in which case it hardly 
matters how it renders (garbage in -> garbage out); or it was deliberately 
crafted to do *exactly* what it says on the tin: flip the character if it's in 
an RTL span. Who are we to question the text author? Zalgo text doesn't make any 
sense either, but Unicode encodes it and renderers do what they can to render 
it with best effort. And while Zalgo text can easily cause issues that may be 
considered bugs by some (one line of text causing a different line of text to 
be unreadable), it would take a rather contrived scenario to cause mayhem with 
the proposed DSM.

> The original
> discussion focused on arrow characters such as U+2192 RIGHTWARDS ARROW
> and U+2190 LEFTWARDS ARROW commonly seen in ordinary text, and there
> was some proffered justification for exploring a bidi‐mirroring
> solution for a small set of such arrows, and accordingly I would have
> expected any proposal with a chance of success to cover only such
> specific characters with specific, articulated justification.

I think listing a specific set of affected characters offers nothing but a way 
to guarantee that some will be missed. Unicode stability policies might then 
prevent such mistakes from ever being corrected.

> (The draftʼs mention of FRACTION SLASH puzzles me because I am unaware of
> bidi and mirroring issues involving inline or diagonal fractions; if a
> mirrored fraction solidus is needed for at least one RTL script, the
> draft should explain the need and explain why applying a
> mirroring‐formatting character is a better solution than, say,
> adopting a RIGHT-TO-LEFT FRACTION SLASH character.)

Thanks. It was an off-the-cuff remark, and I agree that any non-arrow example 
deserves a deeper examination.

My thinking and intuition were this: in a classically rendered simple fraction 
(which I can't render here, but examples abound at 
https://www.mathsisfun.com/proper-fractions.html ), the numerator is *above* 
the division line and the denominator is *below* that line.

In a fraction with a slash such as "1/2", the slash plays the same role. If it 
were extended on both sides, the 1 would be above it and the 2 would be below 
it. In other words, the slash is angled such that the preceding text is above 
it and the following text is below it.

In RTL text, therefore, the angle should be reversed so that the preceding text 
(now to its right instead of its left) is above it, and the following text (now 
to its left) is below it.

This is how I see it.
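
For reference, the current properties of the two slash characters can be read 
with Python's standard `unicodedata` module. This is only a quick check of 
what Unicode says today, not an argument either way; the helper name is made 
up for this example.

```python
import unicodedata


def slash_properties(ch: str) -> tuple[str, str, bool]:
    """Return (name, bidi class, Bidi_Mirrored) for one character."""
    return (
        unicodedata.name(ch),
        unicodedata.bidirectional(ch),
        bool(unicodedata.mirrored(ch)),
    )


solidus = slash_properties("/")              # U+002F
fraction_slash = slash_properties("\u2044")  # U+2044
```

Both come back with bidi class CS (common separator) and Bidi_Mirrored false, 
so neither slash changes its angle in RTL text at present.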

> On the subject of justification: The draft cites five “Real‐World Use
> Cases” regarding arrow characters. Of them, the first three are said
> to have been resolved using higher‐level protocols,

Yes, but two of them are needlessly complex and potential sources of bugs just 
to get around this issue, especially when compared to what they would be if 
DSM were implemented or if arrows were bidi-mirroring to begin with. The other 
one can ignore the problem because it never uses an RTL layout.
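
As a quick illustration of "bidi-mirroring" in the sense used here, the 
Bidi_Mirrored property can be queried with Python's standard `unicodedata` 
module. The helper name and the sample characters are chosen just for this 
example.

```python
import unicodedata


def is_bidi_mirrored(ch: str) -> bool:
    """True if the character has the Bidi_Mirrored property."""
    return unicodedata.mirrored(ch) == 1


# Two characters that are mirrored in RTL runs, and the two arrows
# from the original discussion.
samples = ["(", "<", "\u2190", "\u2192"]
report = {f"U+{ord(c):04X}": is_bidi_mirrored(c) for c in samples}
```

The parenthesis and the less-than sign report true, while U+2190 and U+2192 
report false, which is why application code has to handle arrows itself.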

> and the fifth is
> said to remain unresolved but to be inappropriate for the draftʼs
> proposed mechanism. The fourth, which regards automatic replacement
> of sequences such as ⟨-⁠-⁠>⟩ with arrows, is said to be unresolvable
> unless, essentially, the software engineers reëngineer the software to
> achieve something that other software achieves — this does not sound
> like a strong argument, particularly for a niche convenience feature
> (if it is convenient — I am reminded of fora which transform the
> sequence <RIGHT PARENTHESIS, COLON> into a sad face athwart my intent,
> thereby giving me a sad face for real). Other solutions abound, for
> example:
>  • not making the replacement
>  • providing users with insert‐arrow buttons
>  • replacing only the hyphen‐minuses with a character serving as the
> arrow stem, such as U+23AF HORIZONTAL LINE EXTENSION (e.g., ⟨A-->B &
> א-->ב⟩ → ⟨A⎯>B & א⎯>ב⟩)

I share your frustration with unwanted text replacements, but this is the 
decision of that particular forum software's developers. I consider this to be 
a markup feature, i.e. it is assumed that when the user types "-->" they intend 
for the Markdown-based rendering to change it into an arrow. This precludes the 
first two suggested alternatives.

The last suggestion might technically work, but to achieve a visually pleasing 
result would require specially crafting a new font particularly for this 
endeavor. This is absurd for an organization that does not specialize in font 
development.

As for "reëngineer the software to achieve something that other software 
achieves", I really don't see it that way. Even if this were true, it would 
still be a huge amount of code (= a huge amount of potential bugs) just to get 
around this one Unicode quirk. The other examples which had been resolved using 
"higher-level protocols" (read: extra program code specifically meant to handle 
this missing Unicode option) have the advantage of knowing the actual context 
in which the arrow will be placed, as it's always the same. Discourse does not 
have this advantage because the arrow can be placed anywhere within user text. 
It would need orders of magnitude more code, and the introduction of a 160 kB 
dependency ( https://www.npmjs.com/package/bidi-js ), to achieve the same 
thing, when the text renderer already has all the information it needs to 
solve it, but that information is inaccessible to application code.

> Though rooted in technical details, the draft leaves some significant
> technical issues unaddressed. It proposes DSM as a “combining
> formatting character” — so, general category of Mn/nonspacing mark
> (like CGJ), I suppose (given the proposed behavior amid combining
> marks), rather than Cf/format control? What would be the combining
> class and the impact on normalization?

I think Mn makes sense. I thought questions like this would be points for 
discussion here.
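
For comparison, here is what CGJ (the Mn character mentioned above) looks like 
through Python's standard `unicodedata` module, including its effect on 
normalization. This only describes CGJ as it exists today; whether DSM should 
copy these values is exactly the open question, and nothing here is a 
proposal.

```python
import unicodedata

CGJ = "\u034F"  # COMBINING GRAPHEME JOINER
ZWJ = "\u200D"  # ZERO WIDTH JOINER, a Cf character for contrast

cgj_category = unicodedata.category(CGJ)   # general category
cgj_ccc = unicodedata.combining(CGJ)       # canonical combining class
zwj_category = unicodedata.category(ZWJ)

# Canonical reordering sorts adjacent marks by combining class:
# acute (ccc 230) followed by dot below (ccc 220) gets swapped.
plain = unicodedata.normalize("NFD", "a\u0301\u0323")

# A mark with ccc 0 between them blocks that reordering.
blocked = unicodedata.normalize("NFD", "a\u0301" + CGJ + "\u0323")
```

If DSM were Mn with combining class 0, it would presumably block reordering of 
marks on either side of it in the same way, so its position among combining 
marks would survive normalization.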

> Another technical concern, reaching far beyond the technical: What
> happens if DSM is used but not supported? If itʼs a default ignorable
> code point (like CGJ and most formatting characters) but supported for
> you, you could compose something that renders as ⟨⁧א ← ב⁩⟩ and find it
> satisfactory, only for it to render as ⟨⁧א → ב⁩⟩ for your
> tech‐deficient readership, possibly fomenting disaster. If DSM is not
> default ignorable and not supported, then your readership may instead
> get something like ⟨⁧א →⁠⎕ ב⁩⟩; while the unsupported‐character symbol
> is a hint thereʼs something wrong in the rendering (to readers who
> recognize it), usually it suggests that thereʼs a glyph missing in its
> place, not that a neighboring character is represented by the wrong
> glyph (or by a merely questionable glyph, as for the same readers a
> left‐to‐right ⟨A → B⟩ might render as ⟨A →⁠⎕ B⟩). For my money, the
> notion that lack of support will not merely obscure meanings (as is to
> be expected) but actually invert intended meanings is a fatal flaw.

This is a very valid concern which I share. Whether it is a "fatal" flaw or 
just a flaw is, in my opinion, up for debate. I would definitely prefer it not 
to be default ignorable, as a "missing character" indication is a good hint.

Some mitigating points:
1. Correct me if I'm wrong about this, but thanks to emojis, many platforms and 
software vendors are eager to adopt new versions of Unicode as soon as 
possible, which is excellent for minimizing compatibility issues like this.
2. This is mainly intended to be used programmatically or in templates, not by 
authors directly, so developers (as always) should ensure that the target 
platform supports whichever features they use. I believe this is a similar 
situation.

> Despite these misgivings, I am sympathetic to the effort. I hope this
> critique is of some use in the quest for a solution.

Thank you.

- Nitai Sasson
