[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2024-04-21 Thread elextr via Github-comments
@b4n the only one that I would question is the one that @eht16 had questioned, 
so I was waiting until that was decided. :-P

"Data contains NULs" is referring to the result of the conversion, but the user 
doesn't know that, they only have the input file and it may not contain NULs, 
eg it might have an overlong encoding of a nul.  

So "Result of conversion to UTF-8 buffer contains NULs" might be better.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-2068425388
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2024-04-21 Thread Colomban Wendling via Github-comments
@elextr Tell me if there are still strings to be updated or better error to 
report, I can make another follow-up :wink: 

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-2068175821
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2024-04-21 Thread Colomban Wendling via Github-comments
Merged #3716 into master.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#event-12551817780
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2024-04-21 Thread Colomban Wendling via Github-comments
@b4n commented on this pull request.



>   geany_debug("Couldn't convert from %s to UTF-8.", 
> charset);
+   g_set_error(error, G_CONVERT_ERROR, 
G_CONVERT_ERROR_ILLEGAL_SEQUENCE,
+   _("Data contains NULs"));

Yes, if the conversion didn't fail (`conv_error == NULL`), it means the 
conversion was OK, but `g_utf8_validate()` failed, which can only happen on 
NULs if the data is otherwise UTF-8 (which it ought to be, unless there's a 
glaring bug in `g_convert()` yielding invalid data without reporting an error).

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#discussion_r1573864000
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2024-04-21 Thread Enrico Tröger via Github-comments
@eht16 commented on this pull request.



>   geany_debug("Couldn't convert from %s to UTF-8.", 
> charset);
+   g_set_error(error, G_CONVERT_ERROR, 
G_CONVERT_ERROR_ILLEGAL_SEQUENCE,
+   _("Data contains NULs"));

Do we always know for sure the error here is exclusively NUL bytes?

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#pullrequestreview-2013391366
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2024-04-21 Thread Enrico Tröger via Github-comments
There are actually some encodings which seem not support on Windows:
```
15:31:32.356445: Geany INFO : Encoding ISO-8859-10 is not supported 
by the system
15:31:32.356445: Geany INFO : Encoding ISO-8859-14 is not supported 
by the system
15:31:32.356445: Geany INFO : Encoding ISO-8859-16 is not supported 
by the system
15:31:32.356445: Geany INFO : Encoding UTF-7 is not supported by 
the system
15:31:32.356445: Geany INFO : Encoding ARMSCII-8 is not supported 
by the system
15:31:32.356445: Geany INFO : Encoding EUC-TW is not supported by 
the system
15:31:32.356445: Geany INFO : Encoding GEORGIAN-ACADEMY is not 
supported by the system
15:31:32.356445: Geany INFO : Encoding HZ is not supported by the 
system
15:31:32.356445: Geany INFO : Encoding ISO-IR-111 is not supported 
by the system
15:31:32.356445: Geany INFO : Encoding TCVN is not supported by the 
system
15:31:32.356445: Geany INFO : Encoding TIS-620 is not supported by 
the system
15:31:32.356445: Geany INFO : Encoding VISCII is not supported by 
the syste
```

For the rest, I did some basic testing on Linux and works as expected and by 
inspection it looks (almost) fine.
Even if there might be subtile bugs on edge cases, I would merge this in master 
already to get broader testing of the changes.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-2068028879
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-12 Thread Colomban Wendling via Github-comments
@elextr I'll pour it as soon as this gets merged ;) (unless I find out there's 
no dependency between the two, in case I can get those to cure in parallel)

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1853012073
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-12 Thread elextr via Github-comments
> All this commenting is for the future PR with it, no point in discussing thin 
> air, it's confusing for no reason ;)

Ok, waiting for separate concrete :smile:

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1852998898
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-12 Thread Colomban Wendling via Github-comments
> But I disagree that its the same as the encoding selection in the open dialog

I'm not saying it's the same, just that we have the code for generating the 
combo box with the additional item as used in that dialog, so I don't even have 
to add this.
All this commenting is for the future PR with it, no point in discussing thin 
air, it's confusing for no reason ;)

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1852989850
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-12 Thread elextr via Github-comments
> @elextr it's already in the open dialog, and I'm not gonna move anything. I'm 
> just gonna merge Use fixed encoding when opening non-Unicode files checkbox 
> and Default encoding (existing non-Unicode files) combo, only keeping the 
> combo and having an additional Detect from file item in it which takes over 
> the current Use fixed encoding when opening non-Unicode files checkbox role.

Ok, that one is cosmetic, removing a checkbox and adding a do nothing combo 
setting instead.

But I disagree that its the same as the encoding selection in the open dialog, 
the open dialog sets encoding to use for _all_ files, the one in prefs sets it 
for _non-unicode_ files IIUC.  I didn't check the code, so if they in fact work 
the same then there is a bug because one of the functions is missing, or it was 
just described wrong and then the labels need fixing to say the right thing.  
Don't forget the manual if changing UI.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1852982553
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-12 Thread elextr via Github-comments
>  I count the more proper error message as cosmetic compared to the other 
> fixes.

And as you may have guessed I see them as fixes, so of course I say all in 
together :wink:

Will check the strings as soon as I find my purple editing pencil.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1852959563
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-12 Thread Colomban Wendling via Github-comments
@elextr it's already in the open dialog, and I'm not gonna move anything.  I'm 
just gonna merge *Use fixed encoding when opening non-Unicode files* checkbox 
and *Default encoding (existing non-Unicode files)* combo, only keeping the 
combo and having an additional *Detect from file* item in it which takes over 
the current *Use fixed encoding when opening non-Unicode files* checkbox role.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1852957174
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-12 Thread elextr via Github-comments
> Use fixed encoding when opening non-Unicode files. It's useful, but we have 
> code for adding a Detect from file entry directly in the combo box, as used 
> in the file Open dialog. Using this there as well instead of having a 
> separate checkbox strikes me as an improvement: simpler UI with less widgets 
> and less code.

Well, IIUC the "use fixed encoding" is intended to save the user from having to 
switch the forced encoding on the open dialog from Unicode to fixed all the 
time, like on a Windows machine with a non-unicode default code page, but the 
user is also editing Unicode HTML.  So if my understanding is correct its a 
useful convenience, not essential.  But for sure moving it to the open dialog 
will make it more discoverable.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1852951628
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-12 Thread Colomban Wendling via Github-comments
> In the end thats all I am suggesting adding now.

Done (plus a couple of weird loop I just couldn't let be).  Let me know how I 
should update all strings :wink: 

> Which one? I only found "encoding of new files" and "use fixed encoding" both 
> of which seem useful?

*Use fixed encoding when opening non-Unicode files*.  It's useful, but we have 
code for adding a *Detect from file* entry directly in the combo box, as used 
in the file Open dialog.  Using this there as well instead of having a separate 
checkbox strikes me as an improvement: simpler UI with less widgets and less 
code.

> Well, ATM its just tests and fixes AFAICT, not sure what you see as 
> "cosmetic"?

Cosmetic is the other this I have for the UI, and I count the more proper error 
message as cosmetic compared to the other fixes.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1852942612
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-12 Thread Colomban Wendling via Github-comments
@b4n pushed 2 commits.

effe633ef710c690eeff4bb37ce5ce1d13d1c8fc  Small code cleanup
053667e3a95650691f1a48673c94794d98b1da37  Report actual error to the caller in 
encodings_convert_to_utf8_auto()

-- 
View it on GitHub:
https://github.com/geany/geany/pull/3716/files/9b054d16b4ed54de2486c6762ea00478f041b08c..053667e3a95650691f1a48673c94794d98b1da37
You are receiving this because you are subscribed to this thread.

Message ID: 


[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-12 Thread elextr via Github-comments
> or should we keep the "fix encodings conversion" separate from the more 
> cosmetic stuff?

Well, ATM its just tests and fixes AFAICT, not sure what you see as "cosmetic"?

PS sure its several fixes, but they are all just part of making encoding work 
correctly, or as correctly as it can be made to work without rewriting.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1852935282
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-12 Thread elextr via Github-comments
> including propagating the error better as you wished, as well as not showing 
> knowingly unsupported encodings in the UI

In the end thats all I am suggesting adding now.

> removing a unnecessary control in the preferences regarding file encodings. 

Which one?  I only found encoding of new files and use fixed encoding both of 
which seem useful?

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1852925749
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-12 Thread Colomban Wendling via Github-comments
> A perfectly good thing to put in a PR called "Various encodings conversion 
> fixes" he says hopefully? Maybe the lowest level function could display the 
> message if it was `force` otherwise don't worry about it because its just 
> Geany trying all encodings it knows again ;-)

Would you really like me to add this here, in addition to the already 
non-trivial changes, or should this be a follow-up?

I've got a few more cosmetic changes, including propagating the error better as 
you wished, as well as not showing knowingly unsupported encodings in the UI 
and removing a unnecessary control in the preferences regarding file encodings. 
 Should I shove everything here because it's encoding-related, or should we 
keep the "fix encodings conversion" separate from the more cosmetic stuff?

> It won't add any not supported by whoever made the list in `encodings.c`, but 
> hopefully in this time of Unicode that won't be many.

:crossed_fingers: 

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1852882462
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-11 Thread elextr via Github-comments
> It's (almost, but is for all practical purposes) documented as using 
> g_iconv(), which is documented as using iconv() or libiconv as fallback.

Ok, the 
[code-ocumentation](https://github.com/GNOME/glib/blob/main/glib/gconvert.c) 
shows its iconv all the way, not a #ifdef windows in sight :-)

>  just use g_iconv_open(A, B) and it'll error out with EINVAL if the 
> conversion from B to A is not supported.

So we can use that in `init_encodings()` to remove unsupported ones from the 
list.

It won't add any not supported by whoever made the list in `encodings.c`, but 
hopefully in this time of Unicode that won't be many.



-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1850996725
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-11 Thread Colomban Wendling via Github-comments
> > Anyhow, yes, this could be improved, and ideally we'd probably not display 
> > the encoding in the list if it's not supported by the library we use 
> > (iconv), as it makes little sense then.
> 
> Would be nice but IIRC last time we discussed this there was no easy way of 
> finding what was supported by `g_convert()`, and does that only use iconv, or 
> does it use something else on Windows and Macos?

It's (almost, but is for all practical purposes) documented as using 
`g_iconv()`, which is documented as using `iconv()` or libiconv as fallback.

> And even if its iconv all the way, there is no programmatic way of 
> determining what it supports.

Oops, I didn't know it was impossible so I did it in the meantime :smile:  (not 
pushed here, but I have something locally)
More seriously though, no there don't seem to be any way of querying the list 
of encodings it supports, but there is a way to query whether a particular 
combination is supported or not: just use `g_iconv_open(A, B)` and it'll error 
out with `EINVAL` if the conversion from B to A is not supported.

> So probably it would involve shell testing `iconv -l` at install time and 
> adapting the menu. Too hard.

Yeah, no, that's not gonna happen I don't think.

> > So yes, we could display it, it's a Mere Matter of Programming™, to report 
> > that error all the way up to the caller that will show the error.
> 
> A perfectly good thing to put in a PR called "Various encodings conversion 
> fixes" he says hopefully?

Fair enough :smile:   I should stop those fairly generic names even when I 
ended up fixing a various things on the way, I end up getting sidetracked 
forever :upside_down_face: 

> Maybe the lowest level function could display the message if it was `force` 
> otherwise don't worry about it because its just Geany trying all encodings it 
> knows again ;-)

Meh, that's a bit to ~~Geanyic~~as-hoc, but the error could definitely be 
propagated.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1850834644
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-10 Thread elextr via Github-comments
> Anyhow, yes, this could be improved, and ideally we'd probably not display 
> the encoding in the list if it's not supported by the library we use (iconv), 
> as it makes little sense then.

Would be nice but IIRC last time we discussed this there was no easy way of 
finding what was supported by `g_convert()`, and does that only use iconv, or 
does it use something else on Windows and Macos?  And even if its iconv all the 
way, there is no programmatic way of determining what it supports.  So probably 
it would involve shell testing `iconv -l` at install time and adapting the 
menu. Too hard.

> So yes, we could display it, it's a Mere Matter of Programming™, to report 
> that error all the way up to the caller that will show the error.

A perfectly good thing to put in a PR called "Various encodings conversion 
fixes" he says hopefully?  Maybe the lowest level function could display the 
message if it was `force` otherwise don't worry about it because its just Geany 
trying all encodings it knows again ;-)

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1849117350
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-10 Thread Colomban Wendling via Github-comments
> Several of these messages appear:
> 
> ```
> # Geany-INFO: Couldn't convert from ISO-8859-8-I to UTF-8 (Conversion from 
> character set “ISO-8859-8-I” to “UTF-8” is not supported).
> ```
>
> What that points out is that Geany has a fixed list of conversions, which are 
> exposed to user selection, but not all systems support all conversions, as 
> you might guess my `iconv -l` does not contain `ISO-8859-8-I`. If I try to 
> open an ASCII file (which is valid for all 8859s) with `ISO-8859-8-I` I get 
> `The file  is not valid ISO-8859-8-I` which is 
> wrong[2](#user-content-fn-2-75b2eae86374246fedd5de48e30f3431), does 
> `g_convert()` return a better error we could show (like the one from the 
> test)?

`g_convert()` reports the message you see in the info message inside the 
parentheses.  So yes, we could display it, it's a *Mere Matter of 
Programming™*, to report that error all the way up to the caller that will show 
the error.

As of [ISO-8859-8-I](https://en.wikipedia.org/wiki/ISO/IEC_8859-8) it seems 
understandable not to be supported, as IIUC it's the same thing that should 
just be *rendered* differently (or not, which makes matter worse).

Anyhow, yes, this could be improved, and ideally we'd probably not display the 
encoding in the list if it's not supported by the library we use (iconv), as it 
makes little sense then.  But I'd argue this should probably be a separate 
improvement :)

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1848957533
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-10 Thread elextr via Github-comments
> that joy of having multiple build systems to maintain is back

Feel free to remove autotools :wink:

Anyway, tests seem to pass, although with 10 billion[^1] `Geany-INFO` messages.

Several of these messages appear:

```
# Geany-INFO: Couldn't convert from ISO-8859-8-I to UTF-8 (Conversion from 
character set “ISO-8859-8-I” to “UTF-8” is not supported).
```

What that points out is that Geany has a fixed list of conversions, which are 
exposed to user selection, but not all systems support all conversions, as you 
might guess my `iconv -l` does not contain `ISO-8859-8-I`.  If I try to open an 
ASCII file (which is valid for all 8859s) with `ISO-8859-8-I` I get `The file 
 is not valid ISO-8859-8-I` which is wrong, does `g_convert()` return a 
better error we could show (like the one from the test)?

[^1]: slight exaggeration for effect

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1848910107
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-09 Thread Colomban Wendling via Github-comments
> Compiles ok, but encoding tests not made because not in `tests/meson.build`

Oopsie, fixed.  Aaah, that joy of having multiple build systems to maintain is 
back, I almost forgot it after Waf bit the dust…

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1848667721
You are receiving this because you are subscribed to this thread.

Message ID: 

[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-09 Thread Colomban Wendling via Github-comments
@b4n pushed 1 commit.

9b054d16b4ed54de2486c6762ea00478f041b08c  meson: Add encodings test

-- 
View it on GitHub:
https://github.com/geany/geany/pull/3716/files/cb68cdb2f1b5c544b3235edde3bb12f149360fe7..9b054d16b4ed54de2486c6762ea00478f041b08c
You are receiving this because you are subscribed to this thread.

Message ID: 


[Github-comments] Re: [geany/geany] Various encodings conversion fixes (PR #3716)

2023-12-08 Thread elextr via Github-comments
LGBI, I didn't inspect the test code more than a (very) quick skim.  Compiles 
ok, but encoding tests not made because not in `tests/meson.build` and being an 
ASCII locale I have no other encodings, so untested.  

The whole encodings thing is still a mess, but its now looks a more correct 
mess :-)

PS isn't the milestone a bit specific? :wink: 

-- 
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/3716#issuecomment-1848078447
You are receiving this because you are subscribed to this thread.

Message ID: