The Euro symbol (€) is encoded in UTF-8 as the three bytes e2 82 ac.
The current tr implementation operates on bytes, not characters, so each byte is treated separately:

1. When we run tr € E, SET1 is three bytes long and SET2 is one byte, so tr
   pads SET2 by repeating its last byte (E) and actually maps:
  - byte e2 → E
  - byte 82 → E
  - byte ac → E

So the 3-byte Euro symbol becomes 3 Es → EEE

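2. When we run tr E €, SET1 is one byte long and SET2 is three bytes, so the
   extra bytes of SET2 (82 ac) are ignored and tr maps only:
  - byte E (0x45) → byte e2 (the first byte of the Euro symbol)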

A lone e2 byte is not valid UTF-8, so it is displayed as the replacement character �.
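The byte-level mapping in both cases can be reproduced with a small Python simulation. This is a minimal sketch, not the tr source: the helper name tr_bytes is hypothetical, and it assumes GNU-style handling of unequal sets (a shorter SET2 is padded with its last byte, surplus SET2 bytes are ignored) while ignoring ranges, classes, and options such as -d or -s.

```python
def tr_bytes(data: bytes, set1: bytes, set2: bytes) -> bytes:
    """Hypothetical helper: byte-by-byte translation, like tr without options.

    Assumes GNU-style set handling: a shorter SET2 is padded by repeating
    its last byte; bytes of SET2 beyond len(SET1) are ignored.
    """
    if len(set2) < len(set1):
        # Pad SET2 with copies of its last byte (fails if SET2 is empty).
        set2 = set2 + set2[-1:] * (len(set1) - len(set2))
    else:
        # Drop the surplus bytes of SET2.
        set2 = set2[: len(set1)]
    return data.translate(bytes.maketrans(set1, set2))


euro = "€".encode("utf-8")
print(euro.hex(" "))  # e2 82 ac

# Case 1: tr € E -- each of the three bytes is mapped to E.
out1 = tr_bytes("€\n".encode("utf-8"), euro, b"E")
print(out1)  # b'EEE\n'

# Case 2: tr E € -- E is mapped to the lone lead byte e2.
out2 = tr_bytes(b"E\n", b"E", euro)
print(out2)  # b'\xe2\n'

# Decoding the lone e2 byte yields U+FFFD, the replacement character.
text = out2.decode("utf-8", errors="replace")
print(text == "\ufffd\n")  # True
```

Because every byte is looked up independently, nothing in this model knows that e2 82 ac belong together, which is exactly why the output matches the transcript.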

So this is the expected behavior; GNU coreutils does the same:


$ echo € | gnutr € E
EEE

$ echo E | gnutr E €
�
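For contrast, the output the bug title seems to expect would come from translating decoded characters (code points) instead of bytes. The sketch below is hypothetical: tr_chars is an illustrative name built on Python's str.translate, and it is not how GNU tr or the current tr implementation behave, as the transcript above shows.

```python
def tr_chars(text: str, set1: str, set2: str) -> str:
    """Hypothetical character-aware tr: maps code points, not bytes.

    A shorter SET2 is padded with its last character (GNU-style);
    surplus characters of SET2 are ignored.
    """
    if len(set2) < len(set1):
        set2 = set2 + set2[-1:] * (len(set1) - len(set2))
    else:
        set2 = set2[: len(set1)]
    return text.translate(str.maketrans(set1, set2))


print(tr_chars("€\n", "€", "E"), end="")  # prints E
print(tr_chars("E\n", "E", "€"), end="")  # prints €
```

Here € is a single element of SET1, so it maps to one E, and E maps to the whole Euro symbol rather than to its first byte.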

** Changed in: rust-coreutils (Ubuntu)
       Status: New => Invalid

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2137532

Title:
  tr handles UTF-8 wrong
