Re: [rfbproto] [PATCH] Specify UTF-8 for strings

Peter Rosin Fri, 26 Jun 2009 12:14:40 -0700

On 2009-06-25 18:39, Peter Åstrand wrote:
> On Thu, 25 Jun 2009, Peter Rosin wrote:
> 
>>> I'm not sure how their version will affect anything. Given the history,
>>> you can never be sure that this field will display correctly when using
>>> anything other than ASCII.
>>
>> This is exactly my point. You can't use anything but ASCII if you want
>> it to display as intended. And this fact will remain for a looong time.
> 
> There's a distinction between "use by developers" and "use by end 
> users". The fact is that in practice, non-ASCII doesn't work today. That 
> is, non-ASCII cannot be used by end users. This is a big problem which 
> needs to be fixed as soon as possible. By introducing some new magic 
> extension we are requiring that *every* single VNC implementation out 
> there is modified/updated, before users can use non-ASCII. *That* would 
> take a "looong time".


Huh? Even if we do add an extension for this you are still free to
do it your way in case the extension isn't advertised by the client
(or by the server). But if we do not add an extension, there are just
no options left for someone who wants to do something different
should the other end not explicitly support UTF-8.

Adding support for the extension should be trivial if it is a NOP
when it's active (i.e. your way). The harder part is to change the
code to make all strings UTF-8.
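
To make the "trivial" part concrete, here is a rough sketch in Python.
The function name, the flag and the CP1252 fallback are all made up for
illustration and are not taken from any VNC code base; the only point
is that the extension merely selects which decoder is used.

```python
def decode_rfb_string(raw: bytes, utf8_negotiated: bool,
                      legacy_encoding: str = "cp1252") -> str:
    """Decode a protocol string (hypothetical helper, not from any VNC code)."""
    if utf8_negotiated:
        # Extension active: the string is defined to be UTF-8.
        return raw.decode("utf-8", errors="replace")
    # Extension not advertised: the encoding is unknown, so fall back to
    # whatever this implementation assumed before (an assumption here).
    return raw.decode(legacy_encoding, errors="replace")


print(decode_rfb_string(b"sk\xc3\xa5ra", True))   # UTF-8 bytes, UTF-8 peer
print(decode_rfb_string(b"sk\xe5ra", False))      # CP1252 bytes, legacy peer
print(decode_rfb_string(b"sk\xc3\xa5ra", False))  # UTF-8 bytes, legacy peer
```

The last call is the mismatch case: UTF-8 bytes read by a peer that
still assumes CP1252.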

>> Say I have an RFB server somewhere that serves a variety of clients:
>> 50% of the clients speak CP1252, 30% speak UTF-8, 10% speak ISO 8859-1
>> and 9% speak ISO 8859-whatever.
>> The last percent (probably less) speak something non-ASCII-compatible,
>> but I don't care about those because that's just not compatible...
>>
>> If I really want to serve some "international" text in that scenario,
>> what options do I have? Whatever I do, it will look like crap for many.
> 
> In this (fictional) example, ASCII is the only common subset. Thus, you 
> must configure the server to only use ASCII desktop names. And as we 
> have discussed before, this happens automatically with today's 
> implementations: You can't get anything else with the Windows servers, 
> and both the Xvnc and vncserver scripts set up ASCII-only desktop names 
> by default.
> 
> Without a poll or something like that we'll never know how many use 
> "strange" desktop names today, but my guess is that it's something like 
> 1 out of 10000 or so. Speaking of "many" just doesn't feel right.

You are now talking specifically about the desktop name in the
ServerInit message. The patch talked about *all* strings (that do not
explicitly mention any encoding, i.e. all but *CutText if memory
serves me).

>> skåra -> skara is just way better than skåra -> skÃ¥ra IMHO.
> 
> I disagree. Replacing non-ASCII characters with ASCII ones may cause 
> confusion or even security problems. If characters are to be replaced, 
> it is critical that this is visible to the user. (If I remember 
> correctly, this is pointed out in the Unicode book). A replacement 
> character such as a box or something like that can be used (this is what 
> Microsoft Word does), but no such character is available in ASCII. But 
> the UTF-8 way should work well enough, since sequences such as "Ã¥" are 
> very rare in practice. This is a good property of UTF-8, and this is by 
> design.

You brought up security. What if I, using CP-1252, write:
HELGEÅ… (hex 48 45 4C 47 45 C5 85)
That will come out like this in a UTF-8 client:
HELGEŅ

Quite easy to miss the cedilla and misinterpret that one (note to
those not fluent in Swedish: HELGEÅ is a small river in the southern
part of Sweden, HELGEN w/o cedilla is Swedish for "the weekend").
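
For anyone who wants to check the bytes, here is a small Python sketch
(the variable names are only illustrative) that encodes the string as
CP-1252 and then decodes the very same bytes as UTF-8:

```python
# The sender's view: the text is encoded with CP-1252.
sent = "HELGEÅ…".encode("cp1252")
print(sent.hex(" ").upper())   # 48 45 4C 47 45 C5 85

# A UTF-8 client's view of exactly the same bytes.
# C5 85 happens to be a valid two-byte UTF-8 sequence (U+0145),
# so the decoder raises no error and inserts no replacement character.
seen = sent.decode("utf-8")
print(seen)                    # HELGEŅ
```

Nothing in the decoded result tells the user that a substitution took
place.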

Yes, it would be an improvement if *everybody* agreed on UTF-8. But
until then, I think it should be an extension.




Hmm, new idea:

The UTF-8 extension could be signalled by the server by sending the
desktop name twice with a zero in between.

I.e. skara\0skÃ¥ra

An old-style client would in all likelihood display "skara", and a
new-style client could easily take the second string. If you think
"skara" is bad/insecure you could always send "skÃ¥ra" twice. But a
new-style client would know. The client can use a pseudo-encoding as
usual to signal UTF-8.
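
A rough Python sketch of how that could look on the wire. It covers
only the name-length (U32) and name-string part of ServerInit. All
function names are made up, and the behaviour of the "old" client
(treating the name as a NUL-terminated Latin-1 string) is an assumption
about typical implementations, not something I have checked.

```python
import struct


def build_name_field(ascii_name: str, utf8_name: str) -> bytes:
    # name-length (big-endian U32) followed by: ASCII name, NUL, UTF-8 name
    payload = ascii_name.encode("ascii") + b"\x00" + utf8_name.encode("utf-8")
    return struct.pack(">I", len(payload)) + payload


def read_name_bytes(field: bytes) -> bytes:
    (length,) = struct.unpack(">I", field[:4])
    return field[4:4 + length]


def old_client_name(field: bytes) -> str:
    # Assumed legacy behaviour: C-string handling stops at the first NUL.
    raw = read_name_bytes(field)
    return raw.split(b"\x00", 1)[0].decode("latin-1")


def new_client_name(field: bytes) -> str:
    raw = read_name_bytes(field)
    first, sep, second = raw.partition(b"\x00")
    if sep:
        # A NUL is present: the part after it is the UTF-8 name.
        return second.decode("utf-8", errors="replace")
    # No NUL: the server does not use the scheme, encoding unknown
    # (Latin-1 is just an assumed fallback).
    return first.decode("latin-1")


field = build_name_field("skara", "skåra")
print(old_client_name(field))  # skara
print(new_client_name(field))  # skåra
```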

But if I had a wish, it might just be for "RFB 003.009\0" where the
only change would be UTF-8 for all strings. It's either that or good
health...

Cheers,
Peter


------------------------------------------------------------------------------
_______________________________________________
tigervnc-rfbproto mailing list
tigervnc-rfbproto@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tigervnc-rfbproto
