Public bug reported:

Affected version: 3.22.0

I'm trying to open a certain text file. Unsure of the exact encoding
used, I opened another text file in the same folder (part of the same
set of files), and gedit auto-detected its encoding as UTF-16. Viewing
the file in a hex editor confirms this. The other file contains a lot
of CJK characters, while this file contains very few (mostly ASCII
English, with a few special symbols). You can see it in the file
structure: almost every even byte is a zero byte (0x00).
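
A quick way to confirm this from a shell is to look at the first two
bytes and count the zero bytes. This is only a minimal sketch: the
function names are made up for illustration, and it assumes a POSIX
shell with head, od, tr and wc available.

```shell
# Hypothetical helper: print the first two bytes of a file as lowercase hex.
# "fffe" suggests a UTF-16 little-endian byte-order mark, "feff" a big-endian one.
first_two_bytes() {
    head -c 2 "$1" | od -An -tx1 | tr -d ' \n'
}

# Hypothetical helper: count the zero bytes (0x00) in a file.
# tr deletes every byte that is not NUL; wc counts what is left.
count_zero_bytes() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Usage (file name taken from this report):
#   first_two_bytes addon_english.txt
#   count_zero_bytes addon_english.txt
```

For mostly-ASCII UTF-16 text, the zero-byte count should come out close
to half the file size.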

gedit fails to open the file with the message:
Could not open the file “/hdd/programs/thd2/resource/addon_english.txt”.
Unexpected error: Invalid byte sequence in conversion input

At first I assumed the problem was with the text file, so I tried
'fixing' it by converting it to its own encoding with the 'iconv' tool,
using -c to drop any invalid sequences.

$ iconv -c -f 'UTF-16' -t 'UTF-16' addon_english.txt > addon_english_fixed.txt 
$ sha1sum addon_english.txt
e0e9f360482f2f234e5aeb09406c10081ebb6e1a  addon_english.txt
$ sha1sum addon_english_fixed.txt
e0e9f360482f2f234e5aeb09406c10081ebb6e1a  addon_english_fixed.txt

The checksums are identical, so iconv found no invalid sequences to
drop. I therefore suspect the problem is in gedit rather than in the
file.
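
A stricter check is to run iconv without -c, in which case it should
exit with a non-zero status on the first invalid sequence instead of
silently dropping it. A minimal sketch (the function name is
hypothetical; -f and -t are the same iconv options as in the command
earlier in this report):

```shell
# Hypothetical helper: succeed only if iconv can decode the whole file as UTF-16.
# Without -c, iconv stops and returns a non-zero exit status on invalid input.
is_valid_utf16() {
    iconv -f UTF-16 -t UTF-8 "$1" > /dev/null 2>&1
}

# Usage (file name taken from this report):
#   if is_valid_utf16 addon_english.txt; then echo "valid"; else echo "invalid"; fi
```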
As an aside, other editors also don't like this file much:

GNU nano won't open it by default.
vim will open it, but can't display all the characters in it (probably
Han unification issues).
leafpad will nuke the contents, replacing them with a literal ASCII
byte-order mark (a BOM as rendered in Latin-1).

My locale settings are EN-GB for the language and UTF-8 for the
preferred charset used by the OS itself.

The file in question has been attached to this bug report for
reproduction purposes.

** Affects: gedit (Ubuntu)
     Importance: Undecided
         Status: New

** Attachment added: "addon_english.txt"
   https://bugs.launchpad.net/bugs/1671512/+attachment/4834665/+files/addon_english.txt

** Attachment removed: "addon_english.txt"
   https://bugs.launchpad.net/ubuntu/+source/gedit/+bug/1671512/+attachment/4834665/+files/addon_english.txt

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1671512

Title:
  Gedit fails to read UTF-16 encoded file
