Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2017-04-20 Thread Thomas Martitz
I wonder, what makes this Windows-specific? On-disk file names can be arbitrary 
encoding on all platforms, right?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258#issuecomment-295607028

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-12-06 Thread Enrico Tröger
@cshu still interested in this PR?
I think it would be cool and the sooner we get it merged (regarding the next 
release) the better it is.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258#issuecomment-264694525

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-12-06 Thread cshu
@eht16 No, I leave it to you. This PR is a small change.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258#issuecomment-264706407

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-11-09 Thread Enrico Tröger
eht16 commented on this pull request.



> +{
+   int num_arg;
+   LPWSTR *szarglist = CommandLineToArgvW(GetCommandLineW(), _arg);
+   char **utf8argv = g_new0(char *, num_arg + 1);
+   int i = num_arg;
+   while(i)
+   {
+   i--;
+   utf8argv[i] = g_utf16_to_utf8((gunichar2 *)szarglist[i], -1, 
NULL, NULL, NULL);
+   }
+   *pargc = num_arg;
+   *pargv = utf8argv;
+   LocalFree(szarglist);
+}
+
+void win32_free_argv_made_in_utf8(gint argc, gchar **argv)

Absolutely, I missed the point that the array is NULL-terminated. Sorry.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-11-09 Thread Enrico Tröger
eht16 commented on this pull request.



> @@ -1231,6 +1230,9 @@ gint main_lib(gint argc, gchar **argv)
 #endif
 
gtk_main();
+#ifdef G_OS_WIN32
+   win32_free_argv_made_in_utf8(argc, argv);

Yeah, also nice. But I'd still vote to put the Windows specific code in the 
first G_OS_WIN32 block into src/win32.c.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-11-09 Thread Colomban Wendling
b4n commented on this pull request.



> @@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)
return g_build_filename(g_get_user_config_dir(), "geany", NULL);
 }
 
+
+void win32_make_argc_and_argv_in_utf8(gint *pargc, gchar ***pargv)
+{
+   int num_arg;
+   LPWSTR *szarglist = CommandLineToArgvW(GetCommandLineW(), _arg);
+   char **utf8argv = g_new0(char *, num_arg + 1);
+   int i = num_arg;
+   while(i)
+   {
+   i--;
+   utf8argv[i] = g_utf16_to_utf8((gunichar2 *)szarglist[i], -1, 
NULL, NULL, NULL);

Well it's totally doable, but it involves calling `WideCharToMultiByte()` twice 
(or over-alloc `len*4+1` bytes), allocating manually and the like.  Indeed 
nothing complicated, but at least 4 lines or so worth of WINAPI code, where we 
basically have access to a 1-line thing.  So if it gives us something, yeah 
sure, otherwise I'm not really sure it's worth it.

Hell, let's see how it'd look:
```C
static gchar *wcstr_to_utf8(wchar_t *wcstr)
{
int len = WideCharToMultiByte(CP_UTF8, 0, wcstr, -1, NULL, 0, NULL, 
NULL);
gchar *utf8str = g_malloc(len + 1);
WideCharToMultiByte(CP_UTF8, 0, wcstr, -1, utf8str, len + 1, NULL, 
NULL);
utf8str[len] = 0; // FIXME: is that useful?
return utf8str;
}
```
And according to the 
[docs](https://msdn.microsoft.com/en-us/library/windows/desktop/dd374130%28v=vs.85%29.aspx)
 it probably doesn't really help:
> **WC_ERR_INVALID_CHARS**: **Windows Vista and later:** Fail if an invalid 
> input character is encountered. If this flag is not set, the function 
> silently drops illegal code points. A call to GetLastError returns 
> ERROR_NO_UNICODE_TRANSLATION. Note that this flag only applies when CodePage 
> is specified as CP_UTF8 or 54936 (for Windows Vista and later). It cannot be 
> used with other code page values.

So my understanding would be that in case of invalid UTF-16 (as it is possible 
in filenames), it would result in uninteresting UTF-8, while we'd actually want 
"WTF-8".

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-11-09 Thread Matthew Brush
codebrainz commented on this pull request.



> @@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)
return g_build_filename(g_get_user_config_dir(), "geany", NULL);
 }
 
+
+void win32_make_argc_and_argv_in_utf8(gint *pargc, gchar ***pargv)
+{
+   int num_arg;
+   LPWSTR *szarglist = CommandLineToArgvW(GetCommandLineW(), _arg);
+   char **utf8argv = g_new0(char *, num_arg + 1);
+   int i = num_arg;
+   while(i)
+   {
+   i--;
+   utf8argv[i] = g_utf16_to_utf8((gunichar2 *)szarglist[i], -1, 
NULL, NULL, NULL);

> that Windows API is a lot more annoying to use for us

It's not _a lot_ more annoying, just `WideCharToMultiByte` with `CPUTF8`. Could 
even write a little function like `win32_wide_to_utf8` that hides the 
pre-flight pass and allocates the memory with GLib's allocator. It's maybe 5-10 
lines of code to write that function.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-11-09 Thread Colomban Wendling
b4n commented on this pull request.



> @@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)
return g_build_filename(g_get_user_config_dir(), "geany", NULL);
 }
 
+
+void win32_make_argc_and_argv_in_utf8(gint *pargc, gchar ***pargv)
+{
+   int num_arg;
+   LPWSTR *szarglist = CommandLineToArgvW(GetCommandLineW(), _arg);
+   char **utf8argv = g_new0(char *, num_arg + 1);
+   int i = num_arg;
+   while(i)
+   {
+   i--;
+   utf8argv[i] = g_utf16_to_utf8((gunichar2 *)szarglist[i], -1, 
NULL, NULL, NULL);

@codebrainz myeah, theoretically it might, but in practice it won't change 
(nobody can afford that honestly), and that Windows API is a lot more annoying 
to use for us.  But if it does handle the invalid Unicode filenames and 
`g_utf16_to_utf8()` doesn't, it might be worth it.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-11-09 Thread Matthew Brush
codebrainz commented on this pull request.



> @@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)
return g_build_filename(g_get_user_config_dir(), "geany", NULL);
 }
 
+
+void win32_make_argc_and_argv_in_utf8(gint *pargc, gchar ***pargv)
+{
+   int num_arg;
+   LPWSTR *szarglist = CommandLineToArgvW(GetCommandLineW(), _arg);
+   char **utf8argv = g_new0(char *, num_arg + 1);
+   int i = num_arg;
+   while(i)
+   {
+   i--;
+   utf8argv[i] = g_utf16_to_utf8((gunichar2 *)szarglist[i], -1, 
NULL, NULL, NULL);

IMO, it would be cleaner if it didn't hardcode UTF-16 assumptions. While it's 
highly unlikely to change any time soon, Win32 does provide functions to 
convert to/from various encodings to its native "wide" encoding without 
hard-coding what that is.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-11-09 Thread elextr
> (Or we could please @codebrainz and rewrite using GApplication which does 
> this for us… but that would be a lot of changes :))

or maybe qapplication ;-P

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258#issuecomment-259389716

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-11-09 Thread Colomban Wendling
Hum, the [docs on 
`g_win32_get_command_line()`](https://developer.gnome.org/glib/unstable/glib-Windows-Compatibility-Functions.html#g-win32-get-command-line)
 mention that 
[`g_option_context_parse()`](https://developer.gnome.org/glib/unstable/glib-Commandline-option-parser.html#g-option-context-parse)
 (that we use) expects locale-encoded strings, not UTF-8 ones.  GLib 2.40+ has 
[`g_option_context_parse_strv()`](https://developer.gnome.org/glib/unstable/glib-Commandline-option-parser.html#g-option-context-parse-strv)
 which expects *GLib filename encoding* (that is, UTF-8 on Windows), but it's 
fairly recent.

(Or we could please @codebrainz and rewrite using GApplication which does this 
for us… but that would be a lot of changes :))

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258#issuecomment-259368928

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-11-09 Thread Colomban Wendling
b4n commented on this pull request.



> @@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)
return g_build_filename(g_get_user_config_dir(), "geany", NULL);
 }
 
+
+void win32_make_argc_and_argv_in_utf8(gint *pargc, gchar ***pargv)

Actually recent GLib (2.40+) have `g_win32_get_command_line()` which basically 
does just that.

(and doesn't bother with checking for error in the conversion, so it might be 
"alright" -- or maybe they find it OK because they don't return arg count and 
expect callers to use `g_strv_length()` which that would simply truncate the 
list, not sure)

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-11-09 Thread Colomban Wendling
b4n requested changes on this pull request.

The idea seems good, and without having actually tested the PR I can confirm 
from other tests I did yetserday that the combination of `GetCommandLineW()` 
and `CommandLineToArgvW()` seems to work just fine.

> +{
+   int num_arg;
+   LPWSTR *szarglist = CommandLineToArgvW(GetCommandLineW(), _arg);
+   char **utf8argv = g_new0(char *, num_arg + 1);
+   int i = num_arg;
+   while(i)
+   {
+   i--;
+   utf8argv[i] = g_utf16_to_utf8((gunichar2 *)szarglist[i], -1, 
NULL, NULL, NULL);
+   }
+   *pargc = num_arg;
+   *pargv = utf8argv;
+   LocalFree(szarglist);
+}
+
+void win32_free_argv_made_in_utf8(gint argc, gchar **argv)

should actually use `g_strfreev()` as the `argv[]` array ends with a `NULL` 
element.

> @@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)
return g_build_filename(g_get_user_config_dir(), "geany", NULL);
 }
 
+
+void win32_make_argc_and_argv_in_utf8(gint *pargc, gchar ***pargv)
+{
+   int num_arg;
+   LPWSTR *szarglist = CommandLineToArgvW(GetCommandLineW(), _arg);
+   char **utf8argv = g_new0(char *, num_arg + 1);
+   int i = num_arg;
+   while(i)

I'd prefer a `for()` loop like
```C
for (int i = 0; i < num_arg; i++)
utf8argv[i] = g_utf16_to_utf8((gunichar2 *)szarglist[i], -1, NULL, 
NULL, NULL);
```

> @@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)
return g_build_filename(g_get_user_config_dir(), "geany", NULL);
 }
 
+
+void win32_make_argc_and_argv_in_utf8(gint *pargc, gchar ***pargv)
+{
+   int num_arg;
+   LPWSTR *szarglist = CommandLineToArgvW(GetCommandLineW(), _arg);
+   char **utf8argv = g_new0(char *, num_arg + 1);
+   int i = num_arg;
+   while(i)
+   {
+   i--;
+   utf8argv[i] = g_utf16_to_utf8((gunichar2 *)szarglist[i], -1, 
NULL, NULL, NULL);

Checking for error might be needed, as [Windows seems to allows invalid UTF-16 
in filenames](https://en.wikipedia.org/wiki/UTF-8#WTF-8), and I'm afraid 
`g_utf16_to_utf8()` doesn't allow this.
Another solution might be a custom "broken UTF-16 to WTF-8" converter, like I 
played with in https://github.com/b4n/wtf8tools.  It seems most places in GLib 
don't have problem with WTF8, as it's structurally fine and "just" encodes 
weird stuff.

Anyway, something should be done because only the last element of `argv[]` 
should ever be `NULL`, and I wouldn't be surprised it'd crash somewhere if it 
wasn't respected.

OK, this is a low-risk issue I guess (invalid UTF-16 filenames), but still a 
theoretically possible one.

> @@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)
return g_build_filename(g_get_user_config_dir(), "geany", NULL);
 }
 
+
+void win32_make_argc_and_argv_in_utf8(gint *pargc, gchar ***pargv)

Agreed

> @@ -1231,6 +1230,9 @@ gint main_lib(gint argc, gchar **argv)
 #endif
 
gtk_main();
+#ifdef G_OS_WIN32
+   win32_free_argv_made_in_utf8(argc, argv);

I'm not sure it's worth having a function which is a no-op on non-Windows just 
for not duplicating one guarding of a free; if we really don't want duplication 
here we find a way to only have one exit path :)

> @@ -1231,6 +1230,9 @@ gint main_lib(gint argc, gchar **argv)
 #endif
 
gtk_main();
+#ifdef G_OS_WIN32
+   win32_free_argv_made_in_utf8(argc, argv);

Alternatively the conversion and freeing could be done in `main()` itself (from 
*main.c*), so `main_lib()` simply receives a nice argument list directly:

```C
int main(int argc, char **argv)
{
int ret;
#ifdef G_OS_WIN32
/* on Windows, get the Unicode argulment list and convert it to UTF-8 
into argv[] */
LPWSTR *szarglist = CommandLineToArgvW(GetCommandLineW(), );

argv = g_new(char *, argc + 1);
for (int i = 0; i < argc; i++)
argv[i] = g_utf16_to_utf8((gunichar2 *)szarglist[i], -1, NULL, 
NULL, NULL);
argv[argc] = NULL;
LocalFree(szarglist);
#endif

ret = main_lib(argc, argv);

#ifdef G_OS_WIN32
g_strfreev(argv);
#endif

return ret;
}
```

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258#pullrequestreview-7773217

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-11-08 Thread Enrico Tröger
eht16 requested changes on this pull request.

Thanks for the hints for reproducing. After being able to reproduce the issue 
and testing your changes, it worked fine.

Thanks.



-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258#pullrequestreview-7733367

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-11-08 Thread Enrico Tröger
eht16 commented on this pull request.



> @@ -1032,4 +1032,31 @@ gchar *win32_get_user_config_dir(void)
return g_build_filename(g_get_user_config_dir(), "geany", NULL);
 }
 
+
+void win32_make_argc_and_argv_in_utf8(gint *pargc, gchar ***pargv)

Maybe `win32_convert_argv_to_utf8()` would be a better name.

> +{
+   int num_arg;
+   LPWSTR *szarglist = CommandLineToArgvW(GetCommandLineW(), _arg);
+   char **utf8argv = g_new0(char *, num_arg + 1);
+   int i = num_arg;
+   while(i)
+   {
+   i--;
+   utf8argv[i] = g_utf16_to_utf8((gunichar2 *)szarglist[i], -1, 
NULL, NULL, NULL);
+   }
+   *pargc = num_arg;
+   *pargv = utf8argv;
+   LocalFree(szarglist);
+}
+
+void win32_free_argv_made_in_utf8(gint argc, gchar **argv)

The code is quite generic and not Windows-specific, so it could go into 
src/utils.c as `utils_free_argv()`, similar to `utils_free_pointers()`.

> @@ -1231,6 +1230,9 @@ gint main_lib(gint argc, gchar **argv)
 #endif
 
gtk_main();
+#ifdef G_OS_WIN32
+   win32_free_argv_made_in_utf8(argc, argv);

Maybe the call including the G_OS_WIN32 guard could go into a seperate function 
in src/libmain.c as it should be also in 
https://github.com/geany/geany/blob/master/src/libmain.c#L1090.
Strictly there are more places where it should be called, e.g. every place 
where we call `exit()` but this would require some refactoring before which is 
out of scope of this PR (and technically it is not very necessary as the memory 
in cleaned up anyway on `exit()`).

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258#pullrequestreview-7732004

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-10-28 Thread cshu
I moved most code to win32.c, and included argv[0] in conversion.

Different file systems can use various encoding for filenames. (NTFS, FAT32, 
etc.) But `CommandLineToArgvW` always provide filenames in UTF-16. So in the 
case of windows api functions, underlying file system usually should not be a 
problem.

It's easy to get some random CJK characters from typing code unit. 
(Ctrl+Shift+I on firefox/chrome, in the console, type '\u')

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258#issuecomment-256847252

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-10-27 Thread cshu
@cshu pushed 1 commit.

e9e7c11  most-logic-in-win32


-- 
You are receiving this because you are subscribed to this thread.
View it on GitHub:
https://github.com/geany/geany/pull/1258/files/1dd0769d7ea77eb54af0a61238370c95e56cec9f..e9e7c117be223e23e05ea9e0f97b00ecb17a4afc


Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-10-24 Thread Enrico Tröger
In the initial description you talk about "unicode". Unicode characters can be 
saved using different encodings, this can be UTF-8, UTF-16 and so on. We cannot 
know which encoding the filename of a file on disk was used.

Yes, we can assume UTF-16 or we can assume UTF-8, maybe even both by trying.
But then the next user wants UTF-32 BE, then UTF-32 LE and whatever else.

The behaviour is probably very similar on non-Windows platforms. I think 
usually filenames should follow the system's locale. Mixed charsets are never a 
good idea.

Anyway, I don't know how I could create a file on Windows with a filename not 
in the system's locale and so cannot really test it. I just tested your changes 
with non-ASCII filenames in the system's locale and it still works.

Before this could get merged, two remarks:
- could you try to handle argv[0] as well as mentioned above
- it'd be nice if you could move the Windows specific code into a new function 
in src/win32.c, then src/main.c gets less cluttered. Usually we try to have the 
weird Windows-specific ~~hidden~~abstracted in src/win32.c :)

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258#issuecomment-255880669

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-10-16 Thread cshu
This can be reproduced when filename character is not supported by system 
locale's encoding.
E.g. When system locale is "English (United States)", double clicking 
`read我.txt` opens an empty tab in Geany with name `read?.txt`. The file is not 
really opened.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258#issuecomment-254087072

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-10-14 Thread cshu
I created the pull request mainly for the convenience of "open with" Geany on 
Windows

On the other hand it should be very rare for Geany's installation path to 
contain unicode character.
And I'd like to let geany launch as fast as possible when I press Win+R and 
type `geany`. So I avoided the conversion logic when `argc == 0`.

I guess maintainers can make the final decision.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258#issuecomment-253808396

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-10-13 Thread Stefan Oberhumer
I'm not sure if it's necessary but have you checked you don't have to handle 
argv[0] the same way ?
(Eg geany is in an directory containing unicode characters)

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258#issuecomment-253580245

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-10-06 Thread cshu
Should work, regardless of locale settings.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258#issuecomment-251909482

Re: [Github-comments] [geany/geany] Provide utf-8 command line arguments on Windows (#1258)

2016-10-06 Thread elextr
Does this work if the locale is a non-unicode code page?

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/geany/geany/pull/1258#issuecomment-251900323