@b4n, checking the encoding is valid allows premature optimisers to use evil
unchecked decoders like
[this](https://gist.github.com/elextr/994dcf61b7f009297e229aba7f46f93d) :smile:
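To make the trade-off concrete: once the whole buffer has been validated up front, later code can use a decoder that skips every check. Below is a minimal Python sketch of that idea. It is not the decoder from the linked gist, and `is_valid_utf8` and `unchecked_decode_utf8` are hypothetical names. On invalid input, the unchecked loop can produce wrong code points or run off the end of the data (an IndexError in Python, an out-of-bounds read in C), which is why the validation step matters.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Validate once up front, using Python's strict UTF-8 decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def unchecked_decode_utf8(data: bytes) -> list[int]:
    """Decode UTF-8 to code points with NO validity checks at all.

    Only meaningful for input that already passed is_valid_utf8();
    anything else may yield wrong code points or raise IndexError.
    """
    code_points = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:        # 0xxxxxxx: ASCII, 1 byte
            cp, length = lead, 1
        elif lead < 0xE0:      # 110xxxxx: 2-byte sequence (assumed, not checked)
            cp, length = lead & 0x1F, 2
        elif lead < 0xF0:      # 1110xxxx: 3-byte sequence
            cp, length = lead & 0x0F, 3
        else:                  # 11110xxx: 4-byte sequence
            cp, length = lead & 0x07, 4
        # Append 6 payload bits from each continuation byte, trusting they exist.
        for k in range(1, length):
            cp = (cp << 6) | (data[i + k] & 0x3F)
        code_points.append(cp)
        i += length
    return code_points


text = "héllo €😀"
raw = text.encode("utf-8")
if is_valid_utf8(raw):
    print(unchecked_decode_utf8(raw) == [ord(c) for c in text])  # True
```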
--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on G
> I'm not sure why we accept invalid UTF-8 (well, it's structurally valid, but
> contains reserved code points),
>From pickyweedia "Not decoding surrogate halves makes it impossible to store
>invalid UTF-16, such as Windows filenames, as UTF-8. Therefore, detecting
>these as errors is often not
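On the "structurally valid, but contains reserved code points" case: a 3-byte sequence such as `ED BB 8A` has a correct lead byte and correct continuation bytes, yet it decodes to U+DECA. That is a UTF-16 surrogate, not a valid Unicode scalar value. The small Python sketch below relies on CPython's codecs, not Geany's or GLib's validator, and `decode_three_byte_utf8` is a hypothetical helper. A structure-only decoder accepts the sequence, the strict `utf-8` codec rejects it, and the `surrogatepass` error handler lets it through.

```python
def decode_three_byte_utf8(seq: bytes) -> int:
    """Decode one 3-byte UTF-8 sequence, checking only its bit structure."""
    if len(seq) != 3:
        raise ValueError("expected exactly 3 bytes")
    b0, b1, b2 = seq
    # Structure only: lead byte 1110xxxx, continuation bytes 10xxxxxx.
    if (b0 & 0xF0) != 0xE0 or (b1 & 0xC0) != 0x80 or (b2 & 0xC0) != 0x80:
        raise ValueError("not a structurally valid 3-byte sequence")
    return ((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F)


def is_surrogate(cp: int) -> bool:
    """Surrogates (U+D800..U+DFFF) are reserved for UTF-16 and never valid on their own."""
    return 0xD800 <= cp <= 0xDFFF


seq = b"\xed\xbb\x8a"
cp = decode_three_byte_utf8(seq)
print(hex(cp), is_surrogate(cp))  # 0xdeca True

# CPython's strict UTF-8 decoder refuses the encoded surrogate...
try:
    seq.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict utf-8 rejected it:", exc.reason)

# ...while the 'surrogatepass' error handler accepts it as a lone surrogate.
print(repr(seq.decode("utf-8", "surrogatepass")))
```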
Reading the Wikipedia articles made me wonder about so-called "WTF-8", so I
played with it and wrote this: https://github.com/b4n/wtf8tools
Converting the file here to WTF-8 lets Geany open it just fine (as
SciTE does), and it's convertible back to the original UTF-16.
I'm not sure wh
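Here is a rough Python illustration of the UTF-16LE to WTF-8 round trip. It is independent of wtf8tools, and the function names are hypothetical. It relies on CPython's `surrogatepass` error handler. Decoding UTF-16LE combines well-formed surrogate pairs as usual and keeps only unpaired surrogates as lone code points. Those are then written as 3-byte generalized UTF-8, which is what WTF-8 does with them. The output is not valid UTF-8 for strict readers, only for tools that accept WTF-8.

```python
def utf16le_to_wtf8(data: bytes) -> bytes:
    """Convert (possibly ill-formed) UTF-16LE bytes to WTF-8."""
    # Valid surrogate pairs become one supplementary code point here;
    # only unpaired surrogates survive as lone U+D800..U+DFFF characters.
    text = data.decode("utf-16-le", "surrogatepass")
    # Lone surrogates are emitted as 3-byte generalized UTF-8 sequences.
    return text.encode("utf-8", "surrogatepass")


def wtf8_to_utf16le(data: bytes) -> bytes:
    """Convert WTF-8 back to the original UTF-16LE bytes."""
    text = data.decode("utf-8", "surrogatepass")
    # Lone surrogates are written back as their original 16-bit code units.
    return text.encode("utf-16-le", "surrogatepass")


# "ok" followed by the unpaired code unit 0xDECA (bytes CA DE in little-endian).
original = "ok".encode("utf-16-le") + b"\xca\xde"
wtf8 = utf16le_to_wtf8(original)
print(wtf8)                              # b'ok\xed\xbb\x8a'
print(wtf8_to_utf16le(wtf8) == original)  # True
```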
On Mon, Sep 19, 2016 at 04:18:14AM -0700, elextr wrote:
> > Should be easy, and should also be how the program is implemented.
>
> and how do you keep all this updated and in sync with changes to the
> buffer as it's edited?
The layered model: the bottom layer is the text, i.e. the raw UTF-8 stream.
The n
> Should be easy, and should also be how the program is implemented.
and how do you keep all this updated and in sync with changes to the buffer as
it's edited?
> At least, that's how a superior programmer would implement it ;)
Most of the features you describe are handled by the Scintilla editi
On Mon, Sep 19, 2016 at 03:30:09AM -0700, Colomban Wendling wrote:
> It's pretty messy but fair enough. However, we probably won't do
> that, because being able to have a fixed encoding in the data we load
> means that we have to handle encoding conversion in a single place,
> instead of everywher
Closed #1238.
https://github.com/geany/geany/issues/1238#event-793860304
On Mon, Sep 19, 2016 at 03:11:22AM -0700, Colomban Wendling wrote:
> All I can imagine is that the file is broken, and the other editors
> you try either truncate it, or are more forgiving and leave the
> invalid bytes as-is.
To be fair, the only program I've used to edit this file (from memory)
Actually, I tested SciTE, which is kind enough to open the file without problems,
simply showing the raw bytes for the invalid ones, which makes line 1738 (the last
one) look like this:
![l1738](https://cloud.githubusercontent.com/assets/793526/18629100/cbc62c9a-7e63-11e6-867e-cc052e495886.png)
It's pret
```
$ iconv -f UTF-16LE -t UTF-8 < CUSTOM-utf16le-2016.dic > CUSTOM-utf16le-2016.dic_utf8
iconv: illegal input sequence at position 34076
```
Apparently that file has bytes `\xca \xde` near the end, and that doesn't seem
to be a sequence accepted by iconv (so I'd guess it really is invalid).
And
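If those two bytes form a single UTF-16LE code unit, they read as 0xDECA, which is in the low-surrogate range 0xDC00 to 0xDFFF. A low surrogate that does not follow a high surrogate is invalid UTF-16, which would fit iconv's complaint. That is an inference, not something confirmed in this thread. A hypothetical Python helper to list unpaired surrogates with their byte offsets might look like this:

```python
def find_unpaired_surrogates(data: bytes) -> list[tuple[int, int]]:
    """Return (byte_offset, code_unit) for every unpaired surrogate in UTF-16LE data.

    Hypothetical helper for illustration; a trailing odd byte is ignored.
    """
    # Split the raw bytes into little-endian 16-bit code units.
    units = [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data) - 1, 2)]
    problems = []
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # High surrogate: valid only when immediately followed by a low surrogate.
            if i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF:
                i += 2  # well-formed pair, skip both units
                continue
            problems.append((2 * i, unit))
        elif 0xDC00 <= unit <= 0xDFFF:
            # Low surrogate that did not follow a high surrogate.
            problems.append((2 * i, unit))
        i += 1
    return problems


sample = "ok".encode("utf-16-le") + b"\xca\xde"
print([(off, hex(u)) for off, u in find_unpaired_surrogates(sample)])  # [(4, '0xdeca')]
```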
[CUSTOM-utf16le-2016.dic.zip](https://github.com/geany/geany/files/479602/CUSTOM-utf16le-2016.dic.zip)
Take 4, finally - hope it works :)
> Attached is the file.
Unfortunately, GitHub apparently drops attachments from email replies. Could you
upload it through the web UI, or somewhere else online?
On Mon, Sep 19, 2016 at 01:06:47AM -0700, elextr wrote:
> Any error from the conversion is shown in menu->help->debug messages. Please
> post it.
This is the error message:
18:54:54: Geany INFO: Couldn't convert from UTF-16LE to UTF-8 (Invalid byte sequence in conversion input).
Attached
Any error from the conversion is shown in menu->help->debug messages. Please
post it.
https://github.com/geany/geany/issues/1238#issuecomment-247932610
Could you provide a sample file showing the issue? UTF-16LE (with or without
BOM) certainly seems to work fine here, so something must be wrong either
with the Windows version we ship or with something specific to this file.
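For reference, handling UTF-16LE "with or without BOM" usually just means stripping an optional leading `FF FE` before decoding. The minimal Python sketch below uses a hypothetical helper and does not reflect what Geany does internally. Python's explicit `utf-16-le` codec does not strip a BOM on its own; it would decode it as U+FEFF. Strict decoding still fails on invalid data such as an unpaired surrogate.

```python
import codecs


def decode_utf16le_optional_bom(data: bytes) -> str:
    """Decode UTF-16LE text whether or not it starts with a BOM (hypothetical helper)."""
    # 'utf-16-le' keeps a leading BOM as U+FEFF, so drop it explicitly.
    if data.startswith(codecs.BOM_UTF16_LE):  # b'\xff\xfe'
        data = data[len(codecs.BOM_UTF16_LE):]
    # Strict decoding: raises UnicodeDecodeError on ill-formed input.
    return data.decode("utf-16-le")


with_bom = codecs.BOM_UTF16_LE + "word".encode("utf-16-le")
without_bom = "word".encode("utf-16-le")
print(decode_utf16le_optional_bom(with_bom) == decode_utf16le_optional_bom(without_bom))  # True
```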
I should certainly know better; thank you for reminding me that harsh
wording can be taken in non-constructive ways.
Thanks for replying, btw.
Hopefully Geany can one day recognize this particular file format
properly. One way or another, it is not being recognized. NP++ and
Akelpad both recogni
> even a bad rendering of a file (with my "bad" encoding choices) would be
> better than Geany completely FAILING to open the file at all!
The Geany buffer is UTF-8, so Geany will not open the file if the conversion
fails or the result does not validate as UTF-8, since it would risk undefined
Microsoft Office / MS Word (all versions) saves its custom spelling dictionary
word lists in a text encoding that Microsoft Windows calls "UTF16" or "UTF-16", but
which is more precisely UTF-16LE ("little endian"), as distinct from
UTF-16BE ("big endian").
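The LE/BE difference is only the order of the two bytes inside each 16-bit code unit. The small Python sketch below uses a hypothetical `code_units` helper. It shows the same text in both byte orders, and what happens when little-endian data is read as big-endian.

```python
def code_units(data: bytes, byteorder: str) -> list[int]:
    """Split raw bytes into 16-bit code units in the given order ("little" or "big").

    A trailing odd byte, if any, is ignored.
    """
    return [int.from_bytes(data[i:i + 2], byteorder) for i in range(0, len(data) - 1, 2)]


text = "Aé"
le = text.encode("utf-16-le")  # b'A\x00\xe9\x00'
be = text.encode("utf-16-be")  # b'\x00A\x00\xe9'
print([hex(u) for u in code_units(le, "little")])  # ['0x41', '0xe9']
print([hex(u) for u in code_units(be, "big")])     # ['0x41', '0xe9']
# Reading little-endian bytes as big-endian swaps every unit: wrong characters.
print([hex(u) for u in code_units(le, "big")])     # ['0x4100', '0xe900']
```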
Geany 1.28 for Windows "(