Hi, 
Following the April discussion about arrow characters, I'm drafting an email 
that explains the situation and my proposal more clearly. It will also serve 
as the basis for a formal proposal later on. 

While writing it, I'm making heavy use of Unicode Utilities: 
https://util.unicode.org/UnicodeJsps/

I've encountered some issues, annoyances and nitpicks with these utilities that 
I hope can be addressed. Are they open-source? If so, I might even contribute 
these improvements myself.

Character Properties

This tool shows all Unicode properties for a given character: 
https://util.unicode.org/UnicodeJsps/character.jsp?a=0028
Clicking on a particular value (e.g. "Open" for the property 
Bidi_Paired_Bracket_Type) shows the set of all characters that share that 
property value: 
https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp?a=[:Bidi_Paired_Bracket_Type=Open:]
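For reference, a link like this can also be built programmatically. A minimal 
Python sketch, assuming the "a" query parameter simply takes the 
percent-encoded UnicodeSet expression (which is what the URL above suggests); 
the helper name unicodeset_url is my own, not part of the tool:

```python
from urllib.parse import quote

BASE = "https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp"

def unicodeset_url(prop: str, value: str) -> str:
    """Build a list-unicodeset.jsp link for the set [:prop=value:]."""
    expr = f"[:{prop}={value}:]"
    # safe="" percent-encodes the brackets, colons and equals sign as well.
    return f"{BASE}?a={quote(expr, safe='')}"

print(unicodeset_url("Bidi_Paired_Bracket_Type", "Open"))
```

The site evidently also accepts the unencoded form, so the encoding step is 
only there to keep the link safe to paste into other documents.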

Issues: 
1. Clicking on a property name leads nowhere useful, presumably because of a 
website restructure. The link points to e.g. 
https://util.unicode.org/UnicodeJsps/properties.jsp?a=Bidi_Class#Bidi_Class but 
that page does not show anything related to the property.
2. Clicking on a property value that contains a single character (e.g. ")" for 
Bidi_Mirroring_Glyph) should open that character in the Character Properties 
utility, not the set of characters that share this property value (which is 
often just the character we came from). Or perhaps there could be a separate 
"inspect this character" button, so existing behavior remains the same. 
3. Missing values (null) should also be clickable, to see all characters that 
do not have a value for that property. This may not currently be supported by 
the UnicodeSet utility.

UnicodeSet

https://util.unicode.org/UnicodeJsps/list-unicodeset.jsp
Issues: 
1. As alluded to above, I could not find a way to select characters that have 
no value for a property. For example, I want to find all characters with 
Bidi_Mirrored=Yes but without any value for Bidi_Mirroring_Glyph. As far as I 
can tell, this is a missing feature. 
2. It's unclear what these options do: Abbreviate, Collate, UCD format, Escape, 
Group by, Info. I've tried to understand them by experimentation, but the only 
one I'm confident I understand is Group by. Could they be explained within the 
tool page somehow? 
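
To make item 1 concrete, this is roughly the query I'd like to be able to 
express. A Python sketch, assuming a local copy of BidiMirroring.txt from the 
UCD (data lines of the form "0028; 0029 # LEFT PARENTHESIS") and using the 
standard unicodedata.mirrored() for Bidi_Mirrored. The function names are 
hypothetical, and the result depends on the Unicode version bundled with the 
Python interpreter:

```python
import unicodedata

def parse_bidi_mirroring(text: str) -> dict[int, int]:
    """Parse BidiMirroring.txt content into {code point: mirrored code point}."""
    mapping = {}
    for line in text.splitlines():
        data = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not data:
            continue
        source, target = (field.strip() for field in data.split(";"))
        mapping[int(source, 16)] = int(target, 16)
    return mapping

def mirrored_without_glyph(mapping: dict[int, int]) -> list[int]:
    """Code points with Bidi_Mirrored=Yes but no Bidi_Mirroring_Glyph entry."""
    return [
        cp for cp in range(0x110000)
        if unicodedata.mirrored(chr(cp)) and cp not in mapping
    ]

# Tiny inline sample; in practice, read the real BidiMirroring.txt instead.
sample = """\
# comment line
0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
"""
mapping = parse_bidi_mirroring(sample)
missing = mirrored_without_glyph(mapping)
```

With the real data file, "missing" would be the set I'm after. In the 
UnicodeSet utility this would ideally be a single expression.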

BIDI (UBA)

There are two tools that do the same thing but are implemented differently and 
support different versions of Unicode: 
"bidi" is implemented in Java and only supports Unicode 6.2. 
https://util.unicode.org/UnicodeJsps/bidi.jsp
"bidi-c" is implemented in C and supports Unicode versions 6.2 through 14.0. 
https://util.unicode.org/UnicodeJsps/bidic.jsp

Issues: 
1. The older "bidi" version is outdated. It should more aggressively push users 
to switch over to the newer implementation. Right now there is some text about 
this but it's easy to miss.
2. The list of utilities in the top banner lists "bidi" before "bidi-c" - 
again, it should direct users to the more updated utility instead of the older 
one.
3. Similarly, all links to "bidi" from elsewhere in the Unicode website should 
link to "bidi-c" instead. In particular: 
https://www.unicode.org/reports/tr41/tr41-34.html#Demo9
4. Bidi mirroring is not displayed. Example: 
https://util.unicode.org/UnicodeJsps/bidic.jsp?s=%D7%90%3C%D7%91&b=2&u=140&d=2 
Here the rendering is "א<ב", as shown in the only box. The "Reordered Display" 
table should use <td dir="rtl"> for characters with an odd embedding level so 
that they mirror appropriately. To go the extra mile, a new row could be added 
to the table to indicate which characters have been mirrored. I can create a 
mockup if this description is unclear.
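
In case it helps, here is the table change I have in mind, as a Python sketch 
rather than the tool's actual code (which I haven't seen). It assumes each 
displayed character comes with its resolved embedding level, already in visual 
order; reordered_row is a hypothetical helper:

```python
from html import escape

def reordered_row(chars_with_levels):
    """Render one <tr> of a "Reordered Display"-style table.

    chars_with_levels: iterable of (character, embedding level) pairs,
    already in visual order. Cells with an odd (right-to-left) level get
    dir="rtl" so the browser can apply bidi mirroring to them.
    """
    cells = []
    for ch, level in chars_with_levels:
        direction = ' dir="rtl"' if level % 2 == 1 else ""
        cells.append(f"<td{direction}>{escape(ch)}</td>")
    return "<tr>" + "".join(cells) + "</tr>"

# The example string from the link above, listed left to right as displayed:
# BET, "<", ALEF, assuming all three resolve to embedding level 1.
row = reordered_row([("\u05d1", 1), ("<", 1), ("\u05d0", 1)])
print(row)
```

Cells with an even level are left without a dir attribute, so existing 
left-to-right output would be unchanged.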

Thanks, 
Nitai

(P.S. if anyone wants to see the draft I mentioned, I'd be happy to share it at 
this point. It's not quite ready to be emailed out here just yet.)

