My presentation from XML Prague this year should also cover this aspect: the conversion between bytes on disk and characters.

https://www.youtube.com/watch?v=JDOEMQD32Ss

Regards,
Radu

Radu Coravu
<oXygen/>  XML Editor, Schema Editor and XSLT Editor/Debugger
http://www.oxygenxml.com

On 2/19/2018 5:28 PM, Eliot Kimber wrote:
The (hex.) column is the UTF-8 encoding of the character, that is, the
sequence of bytes.

The actual Unicode character number is the value in the first column,
e.g., \u2190.
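
To see the two columns side by side, here is a minimal Python sketch. Python is only a convenient calculator here; nothing in oXygen depends on it. It takes the leftwards arrow from Bernhard's table and prints its code point and its UTF-8 bytes:

```python
# LEFTWARDS ARROW, written with its code point U+2190.
arrow = "\u2190"

# The code point is just a number; format it the way Unicode tables do.
code_point = ord(arrow)
print(f"U+{code_point:04X}")  # U+2190

# The "(hex.)" column is the byte sequence produced by UTF-8 encoding.
utf8_bytes = arrow.encode("utf-8")
print(utf8_bytes.hex(" "))  # e2 86 90
```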

So you should be able to type 2190 and get the character you want.
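
The idea behind typing "2190" is that the hex digits are read as a code point number. The sketch below uses a hypothetical helper, `hex_sequence_to_char`, to illustrate the principle; it is not oXygen's implementation. It also shows why the UTF-8 bytes from the "(hex.)" column cannot be typed instead: read as one number, they fall outside the Unicode range.

```python
import unicodedata


def hex_sequence_to_char(hex_digits: str) -> str:
    """Hypothetical helper: read hex digits as a Unicode code point."""
    code_point = int(hex_digits, 16)  # ValueError if not hexadecimal
    return chr(code_point)            # ValueError if above U+10FFFF


print(unicodedata.name(hex_sequence_to_char("2190")))  # LEFTWARDS ARROW

# The UTF-8 bytes e2 86 90, read as one number, give 0xE28690 (14845584),
# far above the largest code point, 0x10FFFF, so chr() rejects it.
try:
    hex_sequence_to_char("e28690")
except ValueError as err:
    print("rejected:", err)
```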

Unicode is the character set and the character numbers (code points) are
independent of how the characters are encoded.

The encoding is how the characters are translated to bytes when written
as a byte sequence.

The Unicode standard defines a number of encodings, including UTF-8 and
UTF-16.
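
For example, the same code point gives different bytes in UTF-8 and UTF-16. A short Python sketch; the explicit "-be"/"-le" codec names are used because Python's plain "utf-16" codec also writes a byte order mark:

```python
arrow = "\u2190"  # LEFTWARDS ARROW, code point U+2190

# One code point, three different byte sequences.
for encoding in ("utf-8", "utf-16-be", "utf-16-le"):
    print(encoding, arrow.encode(encoding).hex(" "))
# utf-8 e2 86 90
# utf-16-be 21 90
# utf-16-le 90 21

# Decoding each byte sequence with its own encoding gives back the same
# character (prints True for each encoding).
for encoding in ("utf-8", "utf-16-be", "utf-16-le"):
    print(encoding, arrow.encode(encoding).decode(encoding) == arrow)
```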

So there are not “UTF-8 characters”, only UTF-8 encodings of Unicode
characters.

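The UTF-8 encoding was designed so that it is identical to ASCII for the
first 128 characters (code points 0 to 127). Beyond that, it takes 2 bytes
to encode characters up to U+07FF (this includes the 8-bit Latin-1 range,
so UTF-8 is not byte-compatible with 8-bit "extended ASCII"), 3 bytes for
characters up to U+FFFF, and 4 bytes for everything above.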
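
The byte count follows directly from the code point range. A small Python sketch of that rule (the helper name `utf8_length` is just for illustration), checked against Python's own UTF-8 encoder for one character from each range:

```python
def utf8_length(code_point: int) -> int:
    """Number of bytes UTF-8 needs for one code point."""
    if code_point < 0x80:      # U+0000..U+007F: the ASCII range
        return 1
    if code_point < 0x800:     # U+0080..U+07FF
        return 2
    if code_point < 0x10000:   # U+0800..U+FFFF: rest of plane 0
        return 3
    return 4                   # U+10000..U+10FFFF: planes 1-16


for cp in (0x41, 0xE9, 0x2190, 0x1F926):
    encoded = chr(cp).encode("utf-8")
    assert len(encoded) == utf8_length(cp)
    print(f"U+{cp:04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")
# U+0041 -> 1 byte(s): 41
# U+00E9 -> 2 byte(s): c3 a9
# U+2190 -> 3 byte(s): e2 86 90
# U+1F926 -> 4 byte(s): f0 9f a4 a6
```

U+00E9 (é) sits in the 8-bit Latin-1 range but still needs two bytes, which is why UTF-8 matches only 7-bit ASCII byte for byte.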

Cheers,

E.

--

Eliot Kimber

http://contrext.com

From: oXygen-user <oxygen-user-boun...@oxygenxml.com> on behalf of
Bernhard Kleine <bernhard.kle...@gmx.net>
Date: Monday, February 19, 2018 at 9:17 AM
To: <oxygen-user@oxygenxml.com>
Subject: Re: [oXygen-user] How to type an UTF8 symbol in text as well
as in author mode

The UTF-8 table at
http://www.utf8-zeichentabelle.de/unicode-utf8-table.pl?start=8592 shows
these first four lines:

Unicode code point   Character   UTF-8 (hex.)   Name
U+2190               ←           e2 86 90       LEFTWARDS ARROW
U+2191               ↑           e2 86 91       UPWARDS ARROW
U+2192               →           e2 86 92       RIGHTWARDS ARROW
U+2193               ↓           e2 86 93       DOWNWARDS ARROW
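
The "(hex.)" column can be derived from the code point by hand. The following sketch is a hand-rolled UTF-8 encoder written only to show the bit layout (in practice you would simply call `str.encode`); it reproduces the four table rows. The table URL's `start=8592` is simply the decimal form of 0x2190.

```python
import unicodedata


def utf8_encode(code_point: int) -> bytes:
    """Hand-rolled UTF-8 encoder for a single code point (illustration only)."""
    if 0xD800 <= code_point <= 0xDFFF or not 0 <= code_point <= 0x10FFFF:
        raise ValueError(f"not a Unicode scalar value: {code_point:#x}")
    if code_point < 0x80:      # 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:     # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:   # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# 8592 (decimal) == 0x2190, the first row of the table.
for cp in range(8592, 8592 + 4):
    name = unicodedata.name(chr(cp))
    print(f"U+{cp:04X}  {utf8_encode(cp).hex(' ')}  {name}")
# U+2190  e2 86 90  LEFTWARDS ARROW
# U+2191  e2 86 91  UPWARDS ARROW
# U+2192  e2 86 92  RIGHTWARDS ARROW
# U+2193  e2 86 93  DOWNWARDS ARROW
```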

When I try to convert a UTF-8 hex value in a simple document using
Ctrl-Shift-X, I get:

(not a valid hexadecimal sequence to change)

I also tried the 0x1F926 from Ben's example below and got the same error.
What am I doing wrong?
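
The table gives two different hex values for each arrow, and only the code point column (2190) holds the digits to type for the conversion; the UTF-8 bytes (e2 86 90) are something else. A minimal sketch that goes from the "(hex.)" column back to the code point (`utf8_hex_to_code_point` is a made-up helper for this example):

```python
def utf8_hex_to_code_point(hex_bytes: str) -> str:
    """Hypothetical helper: turn a UTF-8 byte column like "e2 86 90" into "U+2190"."""
    # bytes.fromhex skips the whitespace between byte pairs.
    char = bytes.fromhex(hex_bytes).decode("utf-8")
    if len(char) != 1:
        raise ValueError("expected the bytes of exactly one character")
    return f"U+{ord(char):04X}"


for column in ("e2 86 90", "e2 86 91", "e2 86 92", "e2 86 93"):
    print(column, "->", utf8_hex_to_code_point(column))
# e2 86 90 -> U+2190
# e2 86 91 -> U+2191
# e2 86 92 -> U+2192
# e2 86 93 -> U+2193
```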

These arrows make a good example, since I will actually be using them.

Regards

Bernhard

On 19.02.2018 at 10:03, Oxygen XML Editor Support (Radu Coravu) wrote:

    Hi,

    Thanks for the reminder, Ben.
    I had indeed forgotten about this feature in Oxygen:

    
https://www.oxygenxml.com/doc/versions/19.1/ug-editor/topics/text-mode-actions.html#text-mode-actions__convert-hex-sequence

    which lets you type the hex digits directly in Oxygen and then invoke
    the special "Convert Hexadecimal Sequence to Character" action.

    Regards,
    Radu

    On 2/19/2018 10:56 AM, Ben McGinnes wrote:

        On Mon, Feb 19, 2018 at 09:33:28AM +0200, Oxygen XML Editor
        Support (Radu Coravu)  wrote:

            Hi Bernhard,

            It seems that for "nbsp", which has the decimal value 160,
            you need to hold ALT and type "0160"; the leading "0" seems
            to be important. The same probably applies to all other
            characters: type their decimal value, but always as four
            digits.
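
A likely explanation for the leading zero, assuming the usual Windows behaviour (it depends on the system locale, and this is not verified against every Windows version): Alt+0nnn takes byte nnn from the ANSI code page, often Windows-1252, while Alt+nnn without the zero uses the older OEM code page, often CP437. Python's codecs show the difference for byte 160:

```python
# Byte value 160 as typed in an Alt sequence.
byte = bytes([160])

# Assumption: Alt+0160 uses the ANSI code page (Windows-1252 here),
# Alt+160 uses the OEM code page (CP437 here).
ansi_char = byte.decode("cp1252")
oem_char = byte.decode("cp437")

print(f"Alt+0160 -> U+{ord(ansi_char):04X}")  # prints U+00A0 (NO-BREAK SPACE)
print(f"Alt+160  -> U+{ord(oem_char):04X}")   # prints U+00E1 (a with acute)
```

Because Windows-1252 agrees with Unicode from 0xA0 to 0xFF, the decimal Alt value happens to equal the code point for nbsp. That does not hold everywhere: byte 128 in Windows-1252 is the euro sign, U+20AC.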


        Oh, how quickly we forget certain things.  :)

        oXygen has had the ability to enter Unicode characters in the first
        plane (the Basic Multilingual Plane) by their four-digit hexadecimal
        code point value since version 17.1. I can't recall the default
        hotkey for invoking it, because I changed mine (back) to F8 as soon
        as I installed that version. I believe I've still got the plugin you
        guys provided me during my trial period for 17.0.
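
The "first plane" is the Basic Multilingual Plane, U+0000 to U+FFFF, which is exactly the range four hex digits can express. The face palm from the example below, U+1F926, lies in plane 1, so it needs five hex digits and, in UTF-16, a surrogate pair. Whether that is why the conversion failed for Bernhard is only a guess; the thread does not confirm it. A quick check in Python (`in_basic_multilingual_plane` is an illustrative helper):

```python
def in_basic_multilingual_plane(code_point: int) -> bool:
    """Plane 0 (U+0000..U+FFFF): code points that fit in four hex digits."""
    return code_point <= 0xFFFF


for cp in (0x2190, 0x1F926):
    utf16 = chr(cp).encode("utf-16-be").hex(" ")
    print(f"U+{cp:04X}: plane {cp >> 16}, "
          f"BMP: {in_basic_multilingual_plane(cp)}, UTF-16: {utf16}")
# U+2190: plane 0, BMP: True, UTF-16: 21 90
# U+1F926: plane 1, BMP: False, UTF-16: d8 3e dd 26
```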

        Anyway, if Bernhard is happy using hex instead of decimal, that's
        the solution, rather than the Windows Alt sequences (or the Mac
        Alt/Option sequences, for that matter).
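
The same hex-or-decimal choice exists inside the XML itself: a numeric character reference can be written either way, and the parser resolves both to the same character. A check with Python's standard `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# LEFTWARDS ARROW as a decimal (&#8592;) and a hexadecimal (&#x2190;) reference.
root = ET.fromstring("<p><dec>&#8592;</dec><hex>&#x2190;</hex></p>")
dec_text = root.find("dec").text
hex_text = root.find("hex").text

print(dec_text == hex_text == "\u2190")  # True
```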

        ....



        bash-4.4$ unum.pl 0x1f926
           Octal  Decimal      Hex        HTML    Character   Unicode
         0374446   129318  0x1F926   &#129318;    "🤦"         FACE PALM
        bash-4.4$
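
For readers without unum.pl, the columns of its output are simple conversions of the one code point. A rough Python equivalent follows; `describe` is a made-up helper, not unum.pl's code, and the character column is left out of the printout so the output stays ASCII:

```python
import unicodedata


def describe(code_point: int) -> dict:
    """Rough equivalent of the columns in the unum.pl output above."""
    char = chr(code_point)
    return {
        "Octal": f"0{code_point:o}",
        "Decimal": str(code_point),
        "Hex": f"0x{code_point:X}",
        "HTML": f"&#{code_point};",
        "Character": char,
        "Unicode": unicodedata.name(char, "<unnamed>"),
    }


info = describe(0x1F926)
print(info["Octal"], info["Decimal"], info["Hex"], info["HTML"], info["Unicode"])
# 0374446 129318 0x1F926 &#129318; FACE PALM
```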

        Obviously some of us can see that character properly and some
        can't, but you all know which it is.


        Regards,
        Ben



        _______________________________________________
        oXygen-user mailing list
        oXygen-user@oxygenxml.com
        https://www.oxygenxml.com/mailman/listinfo/oxygen-user

--

spitzhalde9

D-79853 lenzkirch

bernhard.kle...@gmx.net

www.b-kleine.com, www.urseetal.net

-

Thunderbird with Enigmail

GPG key: D5257409

fingerprint:

08 B7 F8 70 22 7A FC C1 15 49 CA A6 C7 6F A0 2E D5 25 74 09
