Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-19 Thread elextr
@b4n, checking the encoding is valid allows premature optimisers to use evil unchecked decoders like [this](https://gist.github.com/elextr/994dcf61b7f009297e229aba7f46f93d) :smile: -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on G

Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-19 Thread elextr
> I'm not sure why we accept invalid UTF-8 (well, it's structurally valid, but contains reserved code points),
From pickyweedia "Not decoding surrogate halves makes it impossible to store invalid UTF-16, such as Windows filenames, as UTF-8. Therefore, detecting these as errors is often not

Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-19 Thread Colomban Wendling
Reading the Wikipedia articles made me wonder about so-called "WTF-8", so I played with it and wrote this: https://github.com/b4n/wtf8tools Converting the file here to WTF-8 makes Geany able to open it just fine (like SciTE does), and it's convertible back to the original UTF-16. I'm not sure wh
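The round trip described above can be approximated in a few lines. This is only a sketch of the general idea, not the code in wtf8tools: it assumes Python's built-in `surrogatepass` error handler is an acceptable stand-in for a WTF-8 encoder, and the function names `utf16le_to_wtf8` and `wtf8_to_utf16le` are made up for illustration.

```python
def utf16le_to_wtf8(raw: bytes) -> bytes:
    """Convert UTF-16LE bytes to WTF-8-style bytes, keeping lone surrogates."""
    # surrogatepass lets an unpaired surrogate through instead of raising;
    # properly paired surrogates are still combined into one code point.
    text = raw.decode("utf-16-le", errors="surrogatepass")
    return text.encode("utf-8", errors="surrogatepass")


def wtf8_to_utf16le(data: bytes) -> bytes:
    """Reverse conversion, reproducing the original UTF-16LE bytes."""
    text = data.decode("utf-8", errors="surrogatepass")
    return text.encode("utf-16-le", errors="surrogatepass")


# Hypothetical sample: three valid characters followed by a lone low
# surrogate (code unit 0xDECA, stored little-endian as CA DE).
sample = "ok\n".encode("utf-16-le") + b"\xca\xde"
converted = utf16le_to_wtf8(sample)
restored = wtf8_to_utf16le(converted)
```

With this sample, `restored` equals `sample` byte for byte, while a strict `sample.decode("utf-16-le")` raises `UnicodeDecodeError`. The converted bytes are not valid UTF-8 in the strict sense, so a strict UTF-8 validator would still reject them.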

Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-19 Thread Zenaan Harkness
On Mon, Sep 19, 2016 at 04:18:14AM -0700, elextr wrote:
> > Should be easy, and should also be how the program is implemented.
> And how do you keep all this updated and in sync with changes to the buffer as it's edited?
The layered model - the bottom layer is the text/raw utf8 stream. The n

Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-19 Thread elextr
> Should be easy, and should also be how the program is implemented.
And how do you keep all this updated and in sync with changes to the buffer as it's edited?
> At least, that's how a superior programmer would implement it ;)
Most of the features you describe are handled by the Scintilla editi

Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-19 Thread Zenaan Harkness
On Mon, Sep 19, 2016 at 03:30:09AM -0700, Colomban Wendling wrote:
> It's pretty messy but fair enough. However, we probably won't do that, because being able to have a fixed encoding in the data we load means that we have to handle encoding conversion in a single place, instead of everywher

Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-19 Thread elextr
Closed #1238. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/geany/geany/issues/1238#event-793860304

Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-19 Thread Zenaan Harkness
On Mon, Sep 19, 2016 at 03:11:22AM -0700, Colomban Wendling wrote:
> All I can imagine is that the file is broken, and the other editors you try either truncate it, or are more forgiving and leave the invalid bytes as-is.
To be fair, the only program I've used to edit this file (from memory)

Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-19 Thread Colomban Wendling
Actually I tested SciTE, which is kind enough to open the file without problems and simply shows plain bytes for the invalid ones, making line 1738 (the last one) look like this: ![l1738](https://cloud.githubusercontent.com/assets/793526/18629100/cbc62c9a-7e63-11e6-867e-cc052e495886.png) It's pret

Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-19 Thread Colomban Wendling
```
$ iconv -f UTF-16LE -t UTF-8 < CUSTOM-utf16le-2016.dic > CUSTOM-utf16le-2016.dic_utf8
iconv: illegal input sequence at position 34076
```
Apparently that file has bytes `\xca \xde` near the end, and that doesn't seem to be a sequence accepted by iconv (so I'd guess it really is invalid). And
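For context, the bytes `\xca \xde` read as a little-endian 16-bit unit give 0xDECA, which falls in the low-surrogate range 0xDC00-0xDFFF. A low surrogate is only valid directly after a high surrogate (0xD800-0xDBFF), which would explain why a strict converter rejects it. The sketch below scans UTF-16LE data for the first unpaired surrogate; the function name `first_lone_surrogate` and the sample bytes are hypothetical, and it is not how iconv itself works.

```python
import struct


def first_lone_surrogate(data: bytes):
    """Return the byte offset of the first unpaired surrogate, or None."""
    usable = len(data) - (len(data) % 2)  # ignore a trailing odd byte
    i = 0
    while i < usable:
        (unit,) = struct.unpack_from("<H", data, i)
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 2 < usable:
                (nxt,) = struct.unpack_from("<H", data, i + 2)
                if 0xDC00 <= nxt <= 0xDFFF:
                    i += 4  # valid pair, skip both units
                    continue
            return i
        if 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return i
        i += 2
    return None


# Hypothetical sample: "ab" followed by the lone unit 0xDECA.
broken = "ab".encode("utf-16-le") + b"\xca\xde"
offset = first_lone_surrogate(broken)

# A properly paired surrogate (an astral character) is not reported.
paired = "\U0001F600".encode("utf-16-le")
no_offset = first_lone_surrogate(paired)
```

Running something like this against the real file would be one way to check whether the offset iconv reports really lines up with the unpaired unit.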

Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-19 Thread Zenaan Harkness
[CUSTOM-utf16le-2016.dic.zip](https://github.com/geany/geany/files/479602/CUSTOM-utf16le-2016.dic.zip) Take 4, finally - hope it works :)

Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-19 Thread Colomban Wendling
> Attached is the file.
Unfortunately, GitHub apparently drops attachments to email replies. Could you upload it through the web UI, or somewhere else online?

Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-19 Thread Zenaan Harkness
On Mon, Sep 19, 2016 at 01:06:47AM -0700, elextr wrote:
> Any error from the conversion is shown in menu->help->debug messages. Please post it.
This is the error message:
18:54:54: Geany INFO: Couldn't convert from UTF-16LE to UTF-8 (Invalid byte sequence in conversion input).
Attached

Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-19 Thread elextr
Any error from the conversion is shown in menu->help->debug messages. Please post it.

Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-19 Thread Colomban Wendling
Could you provide a sample file showing the issue? UTF-16LE (with or without BOM) certainly seems to work fine here, so there must be something wrong either with the Windows version we ship or with something specific to this file.

Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-19 Thread Zenaan Harkness
I should certainly know better - thank you for reminding me that harsh wording can be taken in non-constructive ways. Thanks for replying btw. Hopefully geany can one day recognize this particular file format properly. One way or another, it is not being recognized. NP++ and Akelpad both recogni

Re: [Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-18 Thread elextr
> even a bad rendering of a file (with my "bad" encoding choices) would be better than Geany completely FAILING to open the file at all!
The Geany buffer is UTF-8, so Geany will not open the file if the conversion fails or the result does not validate as UTF-8 since it would risk undefined

[Github-comments] [geany/geany] fails to open Microsoft UTF-16LE file (MSO Word CUSTOM.DIC dictionary file) (#1238)

2016-09-18 Thread Zenaan Harkness
Microsoft Office/MS Word (all versions) saves its custom dictionary spelling word lists in a text encoding that Microsoft Windows calls UTF16 or UTF-16, but which is more specifically described as UTF-16LE or "little endian", as distinct from UTF-16BE or "big endian". Geany 1.28 for Windows "(