The Euro symbol (€) is encoded as the 3 bytes e2 82 ac in UTF-8.
The current tr implementation treats each byte separately, so:
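1. When we run tr € E, the first set is 3 bytes long and the second is 1 byte, so the second set is padded by repeating its last byte. The mapping is:
- byte e2 → E
- byte 82 → E
- byte ac → E
So the 3-byte Euro symbol becomes 3 Es → EEE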
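2. When we run tr E €, the first set is a single byte, so only the first byte of the second set is used:
- byte E (0x45) → byte e2 (first byte of Euro)
The remaining bytes 82 and ac are ignored. A lone e2 byte is not valid UTF-8, hence the replacement character �.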
So this is the expected behavior.
GNU coreutils does the same:
$ echo € | gnutr € E
EEE
$ echo E | gnutr E €
�
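
The byte-wise mapping described above can be reproduced with a minimal Python sketch. The helper name byte_tr is hypothetical and is not part of either tr implementation; it assumes the second set is padded with its last byte when it is shorter than the first, and truncated when it is longer, which matches the outputs shown above.

```python
def byte_tr(data: bytes, set1: str, set2: str) -> bytes:
    """Translate bytes one at a time, like a byte-oriented tr (hypothetical helper)."""
    src = set1.encode("utf-8")
    dst = set2.encode("utf-8")
    if not dst:
        raise ValueError("set2 must not be empty")
    # Assumption: a shorter set2 is padded by repeating its last byte.
    if len(dst) < len(src):
        dst += dst[-1:] * (len(src) - len(dst))
    # Assumption: extra bytes in a longer set2 are ignored.
    dst = dst[:len(src)]
    return data.translate(bytes.maketrans(src, dst))


euro = "€".encode("utf-8")
print(euro.hex(" "))                      # e2 82 ac
print(byte_tr(euro, "€", "E"))            # b'EEE'

mapped = byte_tr(b"E", "E", "€")
print(mapped)                             # b'\xe2'
# A lone e2 byte is an incomplete UTF-8 sequence, so a decoder
# substitutes U+FFFD (the replacement character).
print(mapped.decode("utf-8", errors="replace"))
```

A character-aware tr would have to decode the input and both sets before building the table; the sketch deliberately does not do that, since the point is to show what the byte-oriented behavior produces.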
** Changed in: rust-coreutils (Ubuntu)
Status: New => Invalid
--
https://bugs.launchpad.net/bugs/2137532
Title:
tr handles UTF-8 wrong