Re: Quotes around command-line argument that has unicode characters are not removed

2018-03-27 Thread Brian Inglis
On 2018-03-27 03:56, Andrey Repin wrote:
>>> Locale settings affect the Cygwin binary.
>>> If you
>>> set LANG=ru_RU.CP866
>>> (for example)
>>> before invoking the Cygwin testcase in native CMD, you will likely see it
>>> working better.
>> Thanks for this advice, Andrey. I see that it reacts, but works worse :)
>> I think it makes the program output characters in CP866, but the console is UTF-8:
>> D:\cli> set LANG=ru_RU.CP866
>> D:\cli> test "текст плюс.txt"
>> param 0 = test
>> param 1 = ⥪▒▒ .txt
>> Failed to open '⥪▒▒ .txt': No such file or directory
>> But... ta-da! I made it work like this:
>> D:\cli> set LANG=ru_RU.UTF-8
>> D:\cli> test "текст плюс.txt"
>> param 0 = test
>> param 1 = текст плюс.txt
>> File 'текст плюс.txt' was opened
>> Hooray, it worked!
> This is no magic. Console settings must match the locale set in the environment.
> Please test again, using "chcp" to get the current console codepage and setting
> LANG to match it.
> I could not see which version of Windows you're using, sorry. It is possible
> that the console is set to a different codepage than usual.
>>> Alternatively, you could try
>>> chcp 65001
>> That does not help:
>> D:\cli> chcp 65001
>> Active code page: 65001
>> D:\cli> test "текст плюс.txt"
>> param 0 = test
>> param 1 = "текст плюс.txt"
>> Failed to open '"текст плюс.txt"': No such file or directory
>> [1] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L297
>> [2] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L165


If you're using cmd, you can also set an AutoRun command, like this:

$ cat HKCU-SW-MS-Command_Processor-AutoRun-chcp_65001.reg
Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Software\Microsoft\Command Processor]
"AutoRun"="@chcp 65001 >nul"


- append " && command..." to add more commands to AutoRun; these must use only
the common base characters.

-- 
Take care. Thanks, Brian Inglis, Calgary, Alberta, Canada

--
Problem reports:   http://cygwin.com/problems.html
FAQ:   http://cygwin.com/faq/
Documentation: http://cygwin.com/docs.html
Unsubscribe info:  http://cygwin.com/ml/#unsubscribe-simple



Re: Quotes around command-line argument that has unicode characters are not removed

2018-03-27 Thread Andrey Repin
Greetings, Dmitry Katsubo!

>> Locale settings affect the Cygwin binary.
>> 
>> If you
>> set LANG=ru_RU.CP866
>> (for example)
>> before invoking the Cygwin testcase in native CMD, you will likely see it
>> working better.

> Thanks for this advice, Andrey. I see that it reacts, but works worse :)
> I think it makes the program output characters in CP866, but the console is UTF-8:

> D:\cli> set LANG=ru_RU.CP866

> D:\cli> test "текст плюс.txt"
> param 0 = test
> param 1 = ⥪▒▒ .txt
> Failed to open '⥪▒▒ .txt': No such file or directory

> But... ta-da! I made it work like this:

> D:\cli> set LANG=ru_RU.UTF-8

> D:\cli> test "текст плюс.txt"
> param 0 = test
> param 1 = текст плюс.txt
> File 'текст плюс.txt' was opened

> Hooray, it worked!

This is no magic. Console settings must match the locale set in the environment.
Please test again, using "chcp" to get the current console codepage and setting
LANG to match it.
I could not see which version of Windows you're using, sorry. It is possible
that the console is set to a different codepage than usual.

>> Alternatively, you could try
>> chcp 65001

> That does not help:

> D:\cli> chcp 65001
> Active code page: 65001

> D:\cli> test "текст плюс.txt"
> param 0 = test
> param 1 = "текст плюс.txt"
> Failed to open '"текст плюс.txt"': No such file or directory

> [1] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L297
> [2] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L165



-- 
With best regards,
Andrey Repin
Tuesday, March 27, 2018 12:51:10

Sorry for my terrible english...

Re: Quotes around command-line argument that has unicode characters are not removed

2018-03-24 Thread Kaz Kylheku

On 2018-03-22 15:21, Dmitry Katsubo via cygwin wrote:
> On 2018-03-22 18:10, Kaz Kylheku wrote:
>> That may be so, yet there may be an issue here for someone packaging
>> Cygwin programs for use as native Windows applications.
>>
>> That is to say, there could potentially be something here that the Cygnal
>> project could address:
>>
>> http://www.kylheku.com/cygnal/
>>
>> Cygnal is an ultra-light fork of the Cygwin DLL that is intended for users
>> who run Cygwin programs out of the Windows environment directly, after
>> building them in Cygwin.
>
> Thanks for the hint. I confirm that just substituting cygwin1.dll makes
> the test work:
>
> D:\cli> test "текст плюс.txt"
> param 0 = test
> param 1 = текст плюс.txt
> File 'текст плюс.txt' was opened

Well, that seems like a miracle, because in Cygnal, I don't remember doing
anything to the processing of the command line or the initial locale.

> I was not able to find any relevant difference in dcrt0.cc, but perhaps the
> difference is in the initial setting of the locale (Cygnal initialization).

It could be some Cygwin issue caused by a newer commit that isn't picked up
in Cygnal; i.e. a "red herring".




Re: Quotes around command-line argument that has unicode characters are not removed

2018-03-23 Thread Steven Penny

On Fri, 23 Mar 2018 08:39:21, Thomas Wolff wrote:
> Due to the weird cmd.exe behaviour, you cannot. However, cygwin could
> apply a workaround by magic unquoting.

This is correct. Note that "run" already has this "workaround" via the "--quote"
option. That code could perhaps be applied in other places:

http://sourceware.org/git/gitweb.cgi?p=cygwin-apps/run.git=blob=src/run.1.in





Re: Quotes around command-line argument that has unicode characters are not removed

2018-03-23 Thread Thomas Wolff

On 22.03.2018 at 12:24, Andrey Repin wrote:
> ...
>> when I put quotes around a file name that has
>> non-ASCII characters, these quotes are passed literally to the argv of the process;
>> otherwise they are removed. I would expect consistent behavior.
> Parameter unquoting is done by the shell.
> CMD does that differently from POSIX shells.

cmd.exe applies some inconsistent "smart" (in an MS sense...) magic
quoting; it adds extra quotes if the parameter contains non-ASCII
characters.

>> I have written a small C program that displays its arguments, and ran it three
>> times:
>> ...

You can also test this with cygwin /bin/echo:
C:\cygwin\bin>.\echo "bla"
bla

C:\cygwin\bin>.\echo "blö"
"blö"

This is also the reason why 'chere' fails on non-ASCII directories.

>> As one can see, the last run fails. I am a bit puzzled: how can I pass the name
>> of a file with a space and Unicode characters? I need to do it in a uniform way, as I
>> am calling a Cygwin program from a native Windows program, as in [1].

Due to the weird cmd.exe behaviour, you cannot. However, cygwin could
apply a workaround by magic unquoting.


Thomas


>> Any feedback is appreciated.
>> [1] https://sourceware.org/ml/cygwin/2016-05/msg00082.html
>> [2] http://daviddeley.com/autohotkey/parameters/parameters.htm
>> [3] https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-at
>> [4] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L177





Re: Quotes around command-line argument that has unicode characters are not removed

2018-03-22 Thread Dmitry Katsubo via cygwin
On 2018-03-22 18:10, Kaz Kylheku wrote:
> That may be so, yet there may be an issue here for someone packaging
> Cygwin programs for use as native Windows applications.
> 
> That is to say, there could potentially be something here that the Cygnal
> project could address:
> 
> http://www.kylheku.com/cygnal/
> 
> Cygnal is an ultra-light fork of the Cygwin DLL that is intended for users
> who run Cygwin programs out of the Windows environment directly, after
> building them in Cygwin.

Thanks for the hint. I confirm that just substituting cygwin1.dll makes
the test work:

D:\cli> test "текст плюс.txt"
param 0 = test
param 1 = текст плюс.txt
File 'текст плюс.txt' was opened

I was not able to find any relevant difference in dcrt0.cc, but perhaps the
difference is in the initial setting of the locale (Cygnal initialization).

-- 
With best regards,
Dmitry




Re: Quotes around command-line argument that has unicode characters are not removed

2018-03-22 Thread Dmitry Katsubo via cygwin
On 2018-03-22 14:25, Andrey Repin wrote:
> Greetings, Mikhail Usenko!
> 
>> In bare cmd.exe, the native (msvcrt) binary works OK with quoted non-ASCII
>> arguments, while the Cygwin-flavor binary does not. But I don't know exactly
>> which level here, cmd.exe or msvcrt.dll/cygwin1.dll, is responsible for
>> such behavior.

Thanks, Mikhail! I generally agree with you. If you follow the links I
provided in my original mail, you can see that cmd.exe does not do any argument
splitting. I can also see that from this function signature [1]:

build_argv (char *cmd, char **&argv, int &argc, int winshell)

which basically takes a string as input and returns an array of strings plus
the number of arguments as output. So this is done either by msvcrt.dll or by
cygwin1.dll, and they have different ways of doing it, which is OK provided
it is documented and done consistently. I refer back to dcrt0.cc, where the
voodoo is done. In particular, at line 165 [2] it checks whether execution was
performed from bare Windows, and behaves differently if so.

On 2018-03-22 12:24, Andrey Repin wrote:
> Run it in bash. I'm pretty sure you will see more consistent results.

When "test.exe" is run from bash, it behaves correctly because, as you said,
bash does most of the dirty work. I also tried a workaround like the one below,
but it does not work:

D:\cli> bash -c "./test 'текст плюс.txt'"
bash: ./test 'текст плюс.txt': No such file or directory

> Locale settings affect the Cygwin binary.
> 
> If you
> set LANG=ru_RU.CP866
> (for example)
> before invoking the Cygwin testcase in native CMD, you will likely see it
> working better.

Thanks for this advice, Andrey. I see that it reacts, but works worse :)
I think it makes the program output characters in CP866, but the console is UTF-8:

D:\cli> set LANG=ru_RU.CP866

D:\cli> test "текст плюс.txt"
param 0 = test
param 1 = ⥪▒▒ .txt
Failed to open '⥪▒▒ .txt': No such file or directory

But... ta-da! I made it work like this:

D:\cli> set LANG=ru_RU.UTF-8

D:\cli> test "текст плюс.txt"
param 0 = test
param 1 = текст плюс.txt
File 'текст плюс.txt' was opened

Hooray, it worked!

> Alternatively, you could try
> chcp 65001

That does not help:

D:\cli> chcp 65001
Active code page: 65001

D:\cli> test "текст плюс.txt"
param 0 = test
param 1 = "текст плюс.txt"
Failed to open '"текст плюс.txt"': No such file or directory

[1] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L297
[2] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L165

-- 
With best regards,
Dmitry




Re: Quotes around command-line argument that has unicode characters are not removed

2018-03-22 Thread Dmitry Katsubo via cygwin
On 2018-03-22 12:24, Andrey Repin wrote:
> 
> This is not cygwin, this is bare Windows.

This is an executable linked against cygwin1.dll. I personally call such
binaries "Cygwin programs". However, it is run from Windows.

> Parameter unquoting is done by the shell.
> CMD does that differently from POSIX shells.

CMD does nothing when you execute a program from it. The command line
is passed literally. I downloaded procmon.exe [1] and filtered by
process name "cmd.exe". When I run

D:\cli> test abc "текст\" плюс.txt"

(expecting that CMD would at least remove the backslash), I see the following
in the log:

test abc "текст\" плюс.txt"

[1] https://docs.microsoft.com/en-us/sysinternals/downloads/procmon

-- 
With best regards,
Dmitry




Re: Quotes around command-line argument that has unicode characters are not removed

2018-03-22 Thread Kaz Kylheku

On 2018-03-22 04:24, Andrey Repin wrote:
> Greetings, Dmitry Katsubo!
>
>> Dear Cygwin community,
>>
>> I observe the following on my Cygwin:
>
> This is not cygwin, this is bare Windows.

That may be so, yet there may be an issue here for someone packaging
Cygwin programs for use as native Windows applications.

That is to say, there could potentially be something here that the Cygnal
project could address:

http://www.kylheku.com/cygnal/

Cygnal is an ultra-light fork of the Cygwin DLL that is intended for
users like Dmitry Katsubo, who run Cygwin programs out of the Windows
environment directly, after building them in Cygwin.

>> when I put quotes around a file name that has
>> non-ASCII characters, these quotes are passed literally to the argv of the process;
>> otherwise they are removed. I would expect consistent behavior.
>
> Parameter unquoting is done by the shell.
> CMD does that differently from POSIX shells.

As I seem to recall, CMD doesn't do anything, period! It passes the command
line as one big string. It has to, since that's the OS mechanism.

The quoting conventions come from how various run-time libraries deal with
that string. An influential convention is that of the MS Visual C run-time
library; it behooves other run-times to be compatible with that for
consistency with programs whose main() was compiled with MSVC.





Re: Quotes around command-line argument that has unicode characters are not removed

2018-03-22 Thread Andrey Repin
Greetings, Mikhail Usenko!

> In bare cmd.exe, the native (msvcrt) binary works OK with quoted non-ASCII
> arguments, while the Cygwin-flavor binary does not. But I don't know exactly
> which level here, cmd.exe or msvcrt.dll/cygwin1.dll, is responsible for
> such behavior.

Locale settings affect the Cygwin binary.

If you
set LANG=ru_RU.CP866
(for example)
before invoking the Cygwin testcase in native CMD, you will likely see it
working better.
Alternatively, you could try
chcp 65001


-- 
With best regards,
Andrey Repin
Thursday, March 22, 2018 16:22:13

Sorry for my terrible english...





Re: Quotes around command-line argument that has unicode characters are not removed

2018-03-22 Thread Mikhail Usenko via cygwin
On Thu, 22 Mar 2018 01:15:00 +0100
Dmitry Katsubo via cygwin <...> wrote:

> Dear Cygwin community,
> 
> I observe the following on my Cygwin: when I put quotes around a file name that has
> non-ASCII characters, these quotes are passed literally to the argv of the process;
> otherwise they are removed. I would expect consistent behavior.
> 
> I have written a small C program that displays its arguments, and ran it three
> times:
> 
> #1 For the file with a space, enclosed in quotes ("the file.txt") -- OK
> #2 For the file with non-ASCII characters (Château.txt) -- OK
> #3 For the file with non-ASCII characters, enclosed in quotes ("Château.txt") -- WRONG
> 
> d:\cli> uname -a
> CYGWIN_NT-6.1-WOW PC 2.9.0(0.318/5/3) 2017-09-12 10:41 i686 Cygwin
> 
> D:\cli> chcp
> Active code page: 866
> 
> D:\cli> dir
> ...cut...
> 2018-03-22  00:43                 0 Château.txt
> 2018-03-22  00:01               393 test.c
> 2018-03-22  00:01           150,230 test.exe
> 2018-03-21  00:15               186 test.pl
> 2018-03-22  00:43                 0 the file.txt
> 2018-03-22  00:40                16 текст плюс.txt
>                6 File(s)        150,825 bytes
>                2 Dir(s)  41,972,293,632 bytes free
> 
> D:\cli> test "the file.txt"
> param 0 = test
> param 1 = the file.txt
> File 'the file.txt' was opened
> 
> D:\cli> test Château.txt
> param 0 = test
> param 1 = Château.txt
> File 'Château.txt' was opened
> 
> D:\cli> test "Château.txt"
> param 0 = test
> param 1 = "Château.txt"
> Failed to open '"Château.txt"': No such file or directory
> 
> As one can see, the last run fails. I am a bit puzzled: how can I pass the name
> of a file with a space and Unicode characters? I need to do it in a uniform way, as I
> am calling a Cygwin program from a native Windows program, as in [1].
> 
> D:\cli> test "текст плюс.txt"
> param 0 = test
> param 1 = "текст плюс.txt"
> Failed to open '"текст плюс.txt"': No such file or directory
> 
> I have searched a bit, but I couldn't find a direct answer. From posts [1] and [2]
> I see that the compiler inserts code to do some argument pre-processing like
> @pathnames [3], but what exactly are the rules? Is quote pre-processing done in
> dcrt0.cc:177 [4]?
> 
> Any feedback is appreciated.
> 
> [1] https://sourceware.org/ml/cygwin/2016-05/msg00082.html
> [2] http://daviddeley.com/autohotkey/parameters/parameters.htm
> [3] https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-at
> [4] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L177
> 
> === test.c ===
> #include <errno.h>
> #include <stdio.h>
> #include <string.h>
> 
> int main(int argc, char* argv[])
> {
>   for (int i = 0; i < argc; i++)
>   {
>   printf("param %d = %s\n", i, argv[i]);
>   }
>   FILE* f = fopen(argv[1], "r");
>   if (f != NULL)
>   {
>   printf("File '%s' was opened\n", argv[1]);
>   fclose(f);
>   } else {
>   printf("Failed to open '%s': %s\n", argv[1], strerror(errno));
>   }
>   return 0;
> }
> 
> -- 

Hello, Dmitry,
consider these test cases:

Native (msvcrt) binary:
---
$ x86_64-w64-mingw32-gcc test.c -o test-win.exe
$ ldd test-win.exe
    ntdll.dll => /cygdrive/c/Windows/SYSTEM32/ntdll.dll (0x7fa0590)
    KERNEL32.DLL => /cygdrive/c/Windows/system32/KERNEL32.DLL (0x7fa030e)
    KERNELBASE.dll => /cygdrive/c/Windows/system32/KERNELBASE.dll (0x7fa028f)
    msvcrt.dll => /cygdrive/c/Windows/system32/msvcrt.dll (0x7fa0322)
---

Cygwin-flavor binary:
---
$ gcc test.c -o test-cygwin.exe
$ ldd test-cygwin.exe
    ntdll.dll => /cygdrive/c/Windows/SYSTEM32/ntdll.dll (0x7fa0590)
    KERNEL32.DLL => /cygdrive/c/Windows/system32/KERNEL32.DLL (0x7fa030e)
    KERNELBASE.dll => /cygdrive/c/Windows/system32/KERNELBASE.dll (0x7fa028f)
    cygwin1.dll => /usr/bin/cygwin1.dll (0x18004)
---

Create a file with non-ASCII chars in the name:
---
$ touch "текст плюс.txt"
---

Run both binaries in mintty with bash:
---
$ ./test-win "текст плюс.txt"
param 0 = D:\wroot\test.cygwin\Quotes around command-line argument that has unicode characters are not removed\test-win.exe
param 1 = ▒ .txt
File '▒ .txt' was opened
$ ./test-cygwin "текст плюс.txt"
param 0 = ./test-cygwin
param 1

Re: Quotes around command-line argument that has unicode characters are not removed

2018-03-22 Thread Andrey Repin
Greetings, Dmitry Katsubo!

> Dear Cygwin community,

> I observe the following on my Cygwin:

This is not cygwin, this is bare Windows.

> when I put quotes around a file name that has
> non-ASCII characters, these quotes are passed literally to the argv of the process;
> otherwise they are removed. I would expect consistent behavior.

Parameter unquoting is done by the shell.
CMD does that differently from POSIX shells.

> I have written a small C program that displays its arguments, and ran it three
> times:

Run it in bash. I'm pretty sure you will see more consistent results.

> #1 For the file with a space, enclosed in quotes ("the file.txt") -- OK
> #2 For the file with non-ASCII characters (Château.txt) -- OK
> #3 For the file with non-ASCII characters, enclosed in quotes ("Château.txt") -- WRONG

> d:\cli> uname -a
> CYGWIN_NT-6.1-WOW PC 2.9.0(0.318/5/3) 2017-09-12 10:41 i686 Cygwin

> D:\cli> chcp
> Active code page: 866

> D:\cli> dir
> ...cut...
> 2018-03-22  00:43                 0 Château.txt
> 2018-03-22  00:01               393 test.c
> 2018-03-22  00:01           150,230 test.exe
> 2018-03-21  00:15               186 test.pl
> 2018-03-22  00:43                 0 the file.txt
> 2018-03-22  00:40                16 текст плюс.txt
>                6 File(s)        150,825 bytes
>                2 Dir(s)  41,972,293,632 bytes free

> D:\cli> test "the file.txt"
> param 0 = test
> param 1 = the file.txt
> File 'the file.txt' was opened

> D:\cli> test Château.txt
> param 0 = test
> param 1 = Château.txt
> File 'Château.txt' was opened

> D:\cli> test "Château.txt"
> param 0 = test
> param 1 = "Château.txt"
> Failed to open '"Château.txt"': No such file or directory

> As one can see, the last run fails. I am a bit puzzled: how can I pass the name
> of a file with a space and Unicode characters? I need to do it in a uniform way, as I
> am calling a Cygwin program from a native Windows program, as in [1].

> D:\cli> test "текст плюс.txt"
> param 0 = test
> param 1 = "текст плюс.txt"
> Failed to open '"текст плюс.txt"': No such file or directory

> I have searched a bit, but I couldn't find a direct answer. From posts [1] and [2]
> I see that the compiler inserts code to do some argument pre-processing like
> @pathnames [3], but what exactly are the rules? Is quote pre-processing done in
> dcrt0.cc:177 [4]?

> Any feedback is appreciated.

> [1] https://sourceware.org/ml/cygwin/2016-05/msg00082.html
> [2] http://daviddeley.com/autohotkey/parameters/parameters.htm
> [3] https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-at
> [4] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L177

> === test.c ===
> #include <errno.h>
> #include <stdio.h>
> #include <string.h>

> int main(int argc, char* argv[])
> {
> for (int i = 0; i < argc; i++)
> {
> printf("param %d = %s\n", i, argv[i]);
> }
> FILE* f = fopen(argv[1], "r");
> if (f != NULL)
> {
> printf("File '%s' was opened\n", argv[1]);
> fclose(f);
> } else {
> printf("Failed to open '%s': %s\n", argv[1], strerror(errno));
> }
> return 0;
> }



-- 
With best regards,
Andrey Repin
Thursday, March 22, 2018 14:21:25

Sorry for my terrible english...



Quotes around command-line argument that has unicode characters are not removed

2018-03-21 Thread Dmitry Katsubo via cygwin
Dear Cygwin community,

I observe the following on my Cygwin: when I put quotes around a file name that has
non-ASCII characters, these quotes are passed literally to the argv of the process;
otherwise they are removed. I would expect consistent behavior.

I have written a small C program that displays its arguments, and ran it three
times:

#1 For the file with a space, enclosed in quotes ("the file.txt") -- OK
#2 For the file with non-ASCII characters (Château.txt) -- OK
#3 For the file with non-ASCII characters, enclosed in quotes ("Château.txt") -- WRONG

d:\cli> uname -a
CYGWIN_NT-6.1-WOW PC 2.9.0(0.318/5/3) 2017-09-12 10:41 i686 Cygwin

D:\cli> chcp
Active code page: 866

D:\cli> dir
...cut...
2018-03-22  00:43                 0 Château.txt
2018-03-22  00:01               393 test.c
2018-03-22  00:01           150,230 test.exe
2018-03-21  00:15               186 test.pl
2018-03-22  00:43                 0 the file.txt
2018-03-22  00:40                16 текст плюс.txt
               6 File(s)        150,825 bytes
               2 Dir(s)  41,972,293,632 bytes free

D:\cli> test "the file.txt"
param 0 = test
param 1 = the file.txt
File 'the file.txt' was opened

D:\cli> test Château.txt
param 0 = test
param 1 = Château.txt
File 'Château.txt' was opened

D:\cli> test "Château.txt"
param 0 = test
param 1 = "Château.txt"
Failed to open '"Château.txt"': No such file or directory

As one can see, the last run fails. I am a bit puzzled: how can I pass the name
of a file with a space and Unicode characters? I need to do it in a uniform way, as I
am calling a Cygwin program from a native Windows program, as in [1].

D:\cli> test "текст плюс.txt"
param 0 = test
param 1 = "текст плюс.txt"
Failed to open '"текст плюс.txt"': No such file or directory

I have searched a bit, but I couldn't find a direct answer. From posts [1] and [2]
I see that the compiler inserts code to do some argument pre-processing like
@pathnames [3], but what exactly are the rules? Is quote pre-processing done in
dcrt0.cc:177 [4]?

Any feedback is appreciated.

[1] https://sourceware.org/ml/cygwin/2016-05/msg00082.html
[2] http://daviddeley.com/autohotkey/parameters/parameters.htm
[3] https://cygwin.com/cygwin-ug-net/using-specialnames.html#pathnames-at
[4] https://github.com/openunix/cygwin/blob/master/winsup/cygwin/dcrt0.cc#L177

=== test.c ===
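#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Print every argument, then try to open the file named by argv[1]. */
int main(int argc, char* argv[])
{
    for (int i = 0; i < argc; i++)
    {
        printf("param %d = %s\n", i, argv[i]);
    }
    if (argc < 2)
    {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    FILE* f = fopen(argv[1], "r");
    if (f != NULL)
    {
        printf("File '%s' was opened\n", argv[1]);
        fclose(f);
    } else {
        printf("Failed to open '%s': %s\n", argv[1], strerror(errno));
    }
    return 0;
}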

-- 
With best regards,
Dmitry
