[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
@b4n the only one that I would question is the one that @eht16 had questioned, so I was waiting until that was decided. :-P "Data contains NULs" is referring to the result of the conversion, but the user doesn't know that, they only have the input file and it may not contain NULs, eg it might have an overlong encoding of a nul. So "Result of conversion to UTF-8 buffer contains NULs" might be better. -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-2068425388 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
@elextr Tell me if there are still strings to be updated or better error to report, I can make another follow-up :wink: -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-2068175821 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
Merged #3716 into master. -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#event-12551817780 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
@b4n commented on this pull request. > geany_debug("Couldn't convert from %s to UTF-8.", > charset); + g_set_error(error, G_CONVERT_ERROR, G_CONVERT_ERROR_ILLEGAL_SEQUENCE, + _("Data contains NULs")); Yes, if the conversion didn't fail (`conv_error == NULL`), it means the conversion was OK, but `g_utf8_validate()` failed, which can only happen on NULs if the data is otherwise UTF-8 (which it ought to be, unless there's a glaring bug in `g_convert()` yielding invalid data without reporting an error). -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#discussion_r1573864000 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
@eht16 commented on this pull request. > geany_debug("Couldn't convert from %s to UTF-8.", > charset); + g_set_error(error, G_CONVERT_ERROR, G_CONVERT_ERROR_ILLEGAL_SEQUENCE, + _("Data contains NULs")); Do we always know for sure the error here is exclusively NUL bytes? -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#pullrequestreview-2013391366 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
There are actually some encodings which seem not support on Windows: ``` 15:31:32.356445: Geany INFO : Encoding ISO-8859-10 is not supported by the system 15:31:32.356445: Geany INFO : Encoding ISO-8859-14 is not supported by the system 15:31:32.356445: Geany INFO : Encoding ISO-8859-16 is not supported by the system 15:31:32.356445: Geany INFO : Encoding UTF-7 is not supported by the system 15:31:32.356445: Geany INFO : Encoding ARMSCII-8 is not supported by the system 15:31:32.356445: Geany INFO : Encoding EUC-TW is not supported by the system 15:31:32.356445: Geany INFO : Encoding GEORGIAN-ACADEMY is not supported by the system 15:31:32.356445: Geany INFO : Encoding HZ is not supported by the system 15:31:32.356445: Geany INFO : Encoding ISO-IR-111 is not supported by the system 15:31:32.356445: Geany INFO : Encoding TCVN is not supported by the system 15:31:32.356445: Geany INFO : Encoding TIS-620 is not supported by the system 15:31:32.356445: Geany INFO : Encoding VISCII is not supported by the syste ``` For the rest, I did some basic testing on Linux and works as expected and by inspection it looks (almost) fine. Even if there might be subtile bugs on edge cases, I would merge this in master already to get broader testing of the changes. -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-2068028879 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
@elextr I'll pour it as soon as this gets merged ;) (unless I find out there's no dependency between the two, in case I can get those to cure in parallel) -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1853012073 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
> All this commenting is for the future PR with it, no point in discussing thin > air, it's confusing for no reason ;) Ok, waiting for separate concrete :smile: -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1852998898 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
> But I disagree that its the same as the encoding selection in the open dialog I'm not saying it's the same, just that we have the code for generating the combo box with the additional item as used in that dialog, so I don't even have to add this. All this commenting is for the future PR with it, no point in discussing thin air, it's confusing for no reason ;) -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1852989850 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
> @elextr it's already in the open dialog, and I'm not gonna move anything. I'm > just gonna merge Use fixed encoding when opening non-Unicode files checkbox > and Default encoding (existing non-Unicode files) combo, only keeping the > combo and having an additional Detect from file item in it which takes over > the current Use fixed encoding when opening non-Unicode files checkbox role. Ok, that one is cosmetic, removing a checkbox and adding a do nothing combo setting instead. But I disagree that its the same as the encoding selection in the open dialog, the open dialog sets encoding to use for _all_ files, the one in prefs sets it for _non-unicode_ files IIUC. I didn't check the code, so if they in fact work the same then there is a bug because one of the functions is missing, or it was just described wrong and then the labels need fixing to say the right thing. Don't forget the manual if changing UI. -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1852982553 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
> I count the more proper error message as cosmetic compared to the other > fixes. And as you may have guessed I see them as fixes, so of course I say all in together :wink: Will check the strings as soon as I find my purple editing pencil. -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1852959563 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
@elextr it's already in the open dialog, and I'm not gonna move anything. I'm just gonna merge *Use fixed encoding when opening non-Unicode files* checkbox and *Default encoding (existing non-Unicode files)* combo, only keeping the combo and having an additional *Detect from file* item in it which takes over the current *Use fixed encoding when opening non-Unicode files* checkbox role. -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1852957174 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
> Use fixed encoding when opening non-Unicode files. It's useful, but we have > code for adding a Detect from file entry directly in the combo box, as used > in the file Open dialog. Using this there as well instead of having a > separate checkbox strikes me as an improvement: simpler UI with less widgets > and less code. Well, IIUC the "use fixed encoding" is intended to save the user from having to switch the forced encoding on the open dialog from Unicode to fixed all the time, like on a Windows machine with a non-unicode default code page, but the user is also editing Unicode HTML. So if my understanding is correct its a useful convenience, not essential. But for sure moving it to the open dialog will make it more discoverable. -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1852951628 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
> In the end thats all I am suggesting adding now. Done (plus a couple of weird loop I just couldn't let be). Let me know how I should update all strings :wink: > Which one? I only found "encoding of new files" and "use fixed encoding" both > of which seem useful? *Use fixed encoding when opening non-Unicode files*. It's useful, but we have code for adding a *Detect from file* entry directly in the combo box, as used in the file Open dialog. Using this there as well instead of having a separate checkbox strikes me as an improvement: simpler UI with less widgets and less code. > Well, ATM its just tests and fixes AFAICT, not sure what you see as > "cosmetic"? Cosmetic is the other this I have for the UI, and I count the more proper error message as cosmetic compared to the other fixes. -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1852942612 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
@b4n pushed 2 commits. effe633ef710c690eeff4bb37ce5ce1d13d1c8fc Small code cleanup 053667e3a95650691f1a48673c94794d98b1da37 Report actual error to the caller in encodings_convert_to_utf8_auto() -- View it on GitHub: https://github.com/geany/geany/pull/3716/files/9b054d16b4ed54de2486c6762ea00478f041b08c..053667e3a95650691f1a48673c94794d98b1da37 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
> or should we keep the "fix encodings conversion" separate from the more > cosmetic stuff? Well, ATM its just tests and fixes AFAICT, not sure what you see as "cosmetic"? PS sure its several fixes, but they are all just part of making encoding work correctly, or as correctly as it can be made to work without rewriting. -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1852935282 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
> including propagating the error better as you wished, as well as not showing > knowingly unsupported encodings in the UI In the end thats all I am suggesting adding now. > removing a unnecessary control in the preferences regarding file encodings. Which one? I only found encoding of new files and use fixed encoding both of which seem useful? -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1852925749 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
> A perfectly good thing to put in a PR called "Various encodings conversion > fixes" he says hopefully? Maybe the lowest level function could display the > message if it was `force` otherwise don't worry about it because its just > Geany trying all encodings it knows again ;-) Would you really like me to add this here, in addition to the already non-trivial changes, or should this be a follow-up? I've got a few more cosmetic changes, including propagating the error better as you wished, as well as not showing knowingly unsupported encodings in the UI and removing a unnecessary control in the preferences regarding file encodings. Should I shove everything here because it's encoding-related, or should we keep the "fix encodings conversion" separate from the more cosmetic stuff? > It won't add any not supported by whoever made the list in `encodings.c`, but > hopefully in this time of Unicode that won't be many. :crossed_fingers: -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1852882462 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
> It's (almost, but is for all practical purposes) documented as using > g_iconv(), which is documented as using iconv() or libiconv as fallback. Ok, the [code-ocumentation](https://github.com/GNOME/glib/blob/main/glib/gconvert.c) shows its iconv all the way, not a #ifdef windows in sight :-) > just use g_iconv_open(A, B) and it'll error out with EINVAL if the > conversion from B to A is not supported. So we can use that in `init_encodings()` to remove unsupported ones from the list. It won't add any not supported by whoever made the list in `encodings.c`, but hopefully in this time of Unicode that won't be many. -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1850996725 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
> > Anyhow, yes, this could be improved, and ideally we'd probably not display > > the encoding in the list if it's not supported by the library we use > > (iconv), as it makes little sense then. > > Would be nice but IIRC last time we discussed this there was no easy way of > finding what was supported by `g_convert()`, and does that only use iconv, or > does it use something else on Windows and Macos? It's (almost, but is for all practical purposes) documented as using `g_iconv()`, which is documented as using `iconv()` or libiconv as fallback. > And even if its iconv all the way, there is no programmatic way of > determining what it supports. Oops, I didn't know it was impossible so I did it in the meantime :smile: (not pushed here, but I have something locally) More seriously though, no there don't seem to be any way of querying the list of encodings it supports, but there is a way to query whether a particular combination is supported or not: just use `g_iconv_open(A, B)` and it'll error out with `EINVAL` if the conversion from B to A is not supported. > So probably it would involve shell testing `iconv -l` at install time and > adapting the menu. Too hard. Yeah, no, that's not gonna happen I don't think. > > So yes, we could display it, it's a Mere Matter of Programming™, to report > > that error all the way up to the caller that will show the error. > > A perfectly good thing to put in a PR called "Various encodings conversion > fixes" he says hopefully? Fair enough :smile: I should stop those fairly generic names even when I ended up fixing a various things on the way, I end up getting sidetracked forever :upside_down_face: > Maybe the lowest level function could display the message if it was `force` > otherwise don't worry about it because its just Geany trying all encodings it > knows again ;-) Meh, that's a bit to ~~Geanyic~~as-hoc, but the error could definitely be propagated. -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1850834644 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
> Anyhow, yes, this could be improved, and ideally we'd probably not display > the encoding in the list if it's not supported by the library we use (iconv), > as it makes little sense then. Would be nice but IIRC last time we discussed this there was no easy way of finding what was supported by `g_convert()`, and does that only use iconv, or does it use something else on Windows and Macos? And even if its iconv all the way, there is no programmatic way of determining what it supports. So probably it would involve shell testing `iconv -l` at install time and adapting the menu. Too hard. > So yes, we could display it, it's a Mere Matter of Programming™, to report > that error all the way up to the caller that will show the error. A perfectly good thing to put in a PR called "Various encodings conversion fixes" he says hopefully? Maybe the lowest level function could display the message if it was `force` otherwise don't worry about it because its just Geany trying all encodings it knows again ;-) -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1849117350 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
> Several of these messages appear: > > ``` > # Geany-INFO: Couldn't convert from ISO-8859-8-I to UTF-8 (Conversion from > character set “ISO-8859-8-I” to “UTF-8” is not supported). > ``` > > What that points out is that Geany has a fixed list of conversions, which are > exposed to user selection, but not all systems support all conversions, as > you might guess my `iconv -l` does not contain `ISO-8859-8-I`. If I try to > open an ASCII file (which is valid for all 8859s) with `ISO-8859-8-I` I get > `The file is not valid ISO-8859-8-I` which is > wrong[2](#user-content-fn-2-75b2eae86374246fedd5de48e30f3431), does > `g_convert()` return a better error we could show (like the one from the > test)? `g_convert()` reports the message you see in the info message inside the parentheses. So yes, we could display it, it's a *Mere Matter of Programming™*, to report that error all the way up to the caller that will show the error. As of [ISO-8859-8-I](https://en.wikipedia.org/wiki/ISO/IEC_8859-8) it seems understandable not to be supported, as IIUC it's the same thing that should just be *rendered* differently (or not, which makes matter worse). Anyhow, yes, this could be improved, and ideally we'd probably not display the encoding in the list if it's not supported by the library we use (iconv), as it makes little sense then. But I'd argue this should probably be a separate improvement :) -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1848957533 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
> that joy of having multiple build systems to maintain is back Feel free to remove autotools :wink: Anyway, tests seem to pass, although with 10 billion[^1] `Geany-INFO` messages. Several of these messages appear: ``` # Geany-INFO: Couldn't convert from ISO-8859-8-I to UTF-8 (Conversion from character set “ISO-8859-8-I” to “UTF-8” is not supported). ``` What that points out is that Geany has a fixed list of conversions, which are exposed to user selection, but not all systems support all conversions, as you might guess my `iconv -l` does not contain `ISO-8859-8-I`. If I try to open an ASCII file (which is valid for all 8859s) with `ISO-8859-8-I` I get `The file is not valid ISO-8859-8-I` which is wrong, does `g_convert()` return a better error we could show (like the one from the test)? [^1]: slight exaggeration for effect -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1848910107 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
> Compiles ok, but encoding tests not made because not in `tests/meson.build` Oopsie, fixed. Aaah, that joy of having multiple build systems to maintain is back, I almost forgot it after Waf bit the dust… -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1848667721 You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
@b4n pushed 1 commit. 9b054d16b4ed54de2486c6762ea00478f041b08c meson: Add encodings test -- View it on GitHub: https://github.com/geany/geany/pull/3716/files/cb68cdb2f1b5c544b3235edde3bb12f149360fe7..9b054d16b4ed54de2486c6762ea00478f041b08c You are receiving this because you are subscribed to this thread. Message ID:
[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)
LGBI, I didn't inspect the test code more than a (very) quick skim. Compiles ok, but encoding tests not made because not in `tests/meson.build` and being an ASCII locale I have no other encodings, so untested. The whole encodings thing is still a mess, but its now looks a more correct mess :-) PS isn't the milestone a bit specific? :wink: -- Reply to this email directly or view it on GitHub: https://github.com/geany/geany/pull/3716#issuecomment-1848078447 You are receiving this because you are subscribed to this thread. Message ID: