Re: if \xe4==\xe4 failes,why?

2006-12-01 Thread Charles E Campbell Jr

mbbill wrote:


I ran into a very strange problem recently.
When I set the following options:
set encoding=utf-8
set ignorecase
the expression "\xe4"=="\xe4" evaluates to false.
I tested it with:
if "\xe4"=="\xe4"
  echo "test"
endif
but I got no output. Why?

 

 


Try

set encoding=utf-8
if "\xe4" == "\xe4"
  redraw!
  echo "equal!"
else
  redraw!
  echo "not equal"
endif


Looks like your message is doing an unwanted disappearing act.

Regards,
Chip Campbell



Re: if \xe4==\xe4 failes,why?

2006-12-01 Thread A.J.Mechelynck

Charles E Campbell Jr wrote:

[...]

It's not as simple as that, Dr. Chip: I get 0 (zero) as reply to

:echo ("\xe4" == "\xe4")

when 'encoding' is UTF-8. However, the byte 0xE4 by itself is not a valid
character in UTF-8. I also get 1 (one) in reply to

:echo ("ä" == "\xc3\xa4")

where ä (a-umlaut) is Unicode codepoint U+00E4, represented in UTF-8 by the
two bytes 0xC3 0xA4.
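
These two byte-level claims can be checked outside Vim. The Python sketch below is only an illustration and says nothing about how Vim compares strings internally; `is_valid_utf8` is a hypothetical helper name chosen for this example. It shows that a lone 0xE4 byte does not decode as UTF-8, while the pair 0xC3 0xA4 decodes to U+00E4.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


lone_byte = b"\xe4"      # a single raw byte 0xE4
two_bytes = b"\xc3\xa4"  # the UTF-8 encoding of U+00E4

print(is_valid_utf8(lone_byte))                # False
print(is_valid_utf8(two_bytes))                # True
print(two_bytes.decode("utf-8") == "\u00e4")   # True
```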



Best regards,
Tony.


Re: if \xe4==\xe4 failes,why?

2006-12-01 Thread Charles E Campbell Jr

A.J.Mechelynck wrote:


[...]


That's peculiar; I get (when I source the script):
equal!
1

with the following emendation to the script:

set encoding=utf-8
if "\xe4" == "\xe4"
  redraw!
  echo "equal!"
else
  redraw!
  echo "not equal"
endif
echo ("\xe4" == "\xe4")

But, without those redraws, I get no message.

Regards,
Chip Campbell



Re: if \xe4==\xe4 failes,why?

2006-12-01 Thread A.J.Mechelynck

Charles E Campbell Jr wrote:

A.J.Mechelynck wrote:

[...]

That's peculiar; I get (when I source the script):
equal!
1

with the following emendation to the script:

set encoding=utf-8
if "\xe4" == "\xe4"
  redraw!
  echo "equal!"
else
  redraw!
  echo "not equal"
endif
echo ("\xe4" == "\xe4")

But, without those redraws, I get no message.

Regards,
Chip Campbell




I still get 0, and sourcing the above scriptlet gives me

not equal
0

In an earlier post, the OP mentioned that he needed 'ignorecase' to see the
unequal behaviour (I have encoding=utf-8 and ignorecase set by default).


Using gvim 7.0.174, huge version with GTK2-GNOME GUI, on SuSE Linux 9.3


Best regards,
Tony.


Re: if \xe4==\xe4 failes,why?

2006-11-30 Thread A.J.Mechelynck

mbbill wrote:
[...]

A few minutes ago, when I tested the bug somewhere else, nothing went wrong
after I set ignorecase and set encoding to utf-8. The result of the
expression echo ("\xe4"=="\xe4") was 1.

Before that, I had deleted all temp files and config files and reinstalled
Vim, and I did the same thing on my computer, but the result was quite different.

[...]

Ah?
I'm using gvim 7.0.174 with GTK2/Gnome GUI on SuSE Linux Professional 9.3. My 
vimrc sets 'encoding' to UTF-8 (but that's usually not necessary, I'm 
apparently on a UTF-8 locale) and 'ignorecase' to TRUE.



Best regards,
Tony.


Re: if \xe4==\xe4 failes,why?

2006-11-29 Thread A.J.Mechelynck

mbbill wrote:

[...]

  



I confirm this:

:echo ("\xe4" == "\xe4")

outputs 0.

I guess the strings, or at least one of them, are not evaluated as the U+00E4
codepoint, i.e., 0xC3 0xA4, but as the one-byte string 0xE4, which is not a
valid UTF-8 sequence when followed by a null. The latter would be NaS (Not
a String) in evaluations, and give the same kind of strange results as NaN
(Not a Number) in floating-point comparisons.


This conjecture seems to be confirmed by

:echo ("\xe4")

which outputs <e4> in blue, not ä (a-umlaut) in black, which is output by

:echo "ä"

and by

:echo ("\<Char-0xe4>")

Bug or feature?


Best regards,
Tony.


Re: if \xe4==\xe4 failes,why?

2006-11-29 Thread A.J.Mechelynck

A.J.Mechelynck wrote:

[...]



P.S.

:echo ("ä" == "\xc3\xa4")

outputs 1 (one, i.e., TRUE). I think this proves my conjecture above.


Best regards,
Tony.


Re: if \xe4==\xe4 failes,why?

2006-11-29 Thread mbbill
Hello A.J.Mechelynck,

Thursday, November 30, 2006, 1:15:14 PM, you wrote:

[...]

Yes, I agree with you.
When I tested it somewhere else, I could not always reproduce the bug;
maybe some other options affect the result of the expression.



-- 
Best regards,
 mbbill



Re: if \xe4==\xe4 failes,why?

2006-11-29 Thread A.J.Mechelynck

mbbill wrote:

[...]





In all 8-bit encodings, "\xe4" is (IIUC) whatever is represented in that
encoding by the byte 0xE4, which is usually a valid character. In Unicode
(always internally UTF-8 in Vim), 0xE4 is not a valid character unless it is
followed by exactly two bytes (no more, no less) in the range 0x80-0xBF,
because each codepoint is represented in UTF-8 by one to six bytes, and these
bytes are as follows:

0x00-0x7F: standalone byte
0x80-0xBF: trailing byte (any byte but the first, in a multibyte sequence)
0xC0-0xDF: leading byte of a two-byte sequence
0xE0-0xEF: leading byte of a three-byte sequence
0xF0-0xF7: leading byte of a four-byte sequence
0xF8-0xFB: leading byte of a five-byte sequence
0xFC-0xFD: leading byte of a six-byte sequence
0xFE-0xFF: invalid
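
The byte ranges listed above can be turned into a small lookup, sketched here in Python. The names `UTF8_BYTE_ROLES` and `classify_utf8_byte` are hypothetical, chosen for this example. The sketch follows the list as given, with sequences of up to six bytes; note that RFC 3629 later restricted UTF-8 to four bytes, so under that definition 0xC0, 0xC1 and 0xF5-0xFF never appear in valid text.

```python
# Inclusive upper bound of each range and the role of bytes in that range,
# following the list in the message (original one-to-six-byte UTF-8 scheme).
UTF8_BYTE_ROLES = [
    (0x7F, "standalone byte"),
    (0xBF, "trailing byte"),
    (0xDF, "leading byte of a two-byte sequence"),
    (0xEF, "leading byte of a three-byte sequence"),
    (0xF7, "leading byte of a four-byte sequence"),
    (0xFB, "leading byte of a five-byte sequence"),
    (0xFD, "leading byte of a six-byte sequence"),
    (0xFF, "invalid"),
]


def classify_utf8_byte(value: int) -> str:
    """Return the role of a single byte value according to the list above."""
    if not 0 <= value <= 0xFF:
        raise ValueError("expected a byte value in the range 0x00-0xFF")
    for upper_bound, role in UTF8_BYTE_ROLES:
        if value <= upper_bound:
            return role
    raise AssertionError("unreachable: 0xFF is the last upper bound")


for byte in (0xE4, 0xC3, 0xA4):
    print(f"0x{byte:02X}: {classify_utf8_byte(byte)}")
```

For the bytes discussed in this thread, 0xE4 is classified as the leading byte of a three-byte sequence, which is why it is not valid on its own, while 0xC3 and 0xA4 form a complete two-byte sequence.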

I don't know how "\xe4" compares in non-Unicode multibyte encodings such as
those used for Chinese, Japanese, Korean, etc.



Best regards,
Tony.