Re: [rfbproto] [PATCH] Specify UTF-8 for strings

Adam Tkac Mon, 17 Aug 2009 02:00:29 -0700

On Mon, Aug 17, 2009 at 10:22:56AM +0200, Peter Rosin wrote:
> >> If it is so natural with UTF-8 and if it really is the only sane choice
> >> (I think it is), it's enough if our spec says (e.g.)
> >>
> >>     It is strongly recommended that all implementations use
> >>     UTF-8 for all strings (except explicitly stated otherwise)
> >>     to ensure interoperability. But be prepared that not all
> >>     implementations do, so fail gracefully if you receive
> >>     something else.
> >>
> >> instead of (e.g.)
> >>
> >>     All implementations MUST use UTF-8 for all strings (except
> >>     explicitly stated otherwise). But not all implementations
> >>     do, so you SHOULD fail gracefully if you receive something
> >>     else.
> >>
> >> I just don't see why the wording with MUST/SHOULD is so superior
> >> that it is worth rendering existing implementations incompatible
> >> with our spec.
> > 
> > This is ok with me. I don't think there's any difference in practice.
> 
> Oh, cool. Pierre previously asked if I had any alternative wording,
> so here is my suggestion:
> 
> diff --git a/rfbproto.rst b/rfbproto.rst
> index 7852746..0252e4f 100644
> --- a/rfbproto.rst
> +++ b/rfbproto.rst
> @@ -201,6 +201,26 @@ that you contact RealVNC Ltd to make sure that your encodin
>  security types do not clash. Please see the RealVNC website at
>   http://www.realvnc.com for details of how to contact them.
> 
> +String Encodings
> +================
> +
> +It is strongly recommended that strings in RFB are encoded using the
> +UTF-8 encoding. This allows full unicode support, yet retains good
> +compatibility with older RFB implementations.
> +
> +The encoding used for strings in the protocol has historically often
> +been unspecified, or has changed between versions of the protocol. As a
> +result, there are a lot of implementations which use different,
> +incompatible encodings. Commonly those encodings have been ISO 8859-1
> +(also known as Latin-1) or Windows code pages.
> +
> +Clients and servers are encouraged to send UTF-8 strings unless that
> +particular part of the protocol mandates another encoding. They should
> +however be prepared to receive invalid UTF-8 sequences at all times.
> +Such sequences should be handled gracefully by e.g. stripping the
> +invalid portions or trying to interpret the string using common
> +encodings such as ISO 8859-1 or Windows code page 1252.
> +


Hm, it is easy to say "invalid portions" of a UTF-8 string, but it is
_very_ hard to create an algorithm that determines which part of a
string is valid and which is not. With UTF-8, users may legitimately
create strings containing "obscure" characters. I think this kind of
heuristic should not be included in the protocol.
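
For reference, the fallback described in the proposed text could be
sketched roughly as follows. This is only an illustration in Python, not
code from any RFB implementation; the helper names decode_rfb_string and
strip_invalid_utf8 are hypothetical, and the order in which the
encodings are tried is an assumption, which is exactly the kind of
guesswork in question.

```python
def decode_rfb_string(raw: bytes) -> str:
    """Hypothetical helper, not part of any RFB library."""
    try:
        # Strict decoding rejects any byte sequence that is not well-formed UTF-8.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass
    try:
        # Guess: Windows code page 1252 (a few byte values are undefined there).
        return raw.decode("cp1252")
    except UnicodeDecodeError:
        # ISO 8859-1 maps every byte to a code point, so this never fails.
        return raw.decode("latin-1")


def strip_invalid_utf8(raw: bytes) -> str:
    """Alternative from the patch text: drop whatever does not decode."""
    return raw.decode("utf-8", errors="ignore")
```

Neither function knows what the sender meant: the first one guesses an
encoding, the second one silently loses data.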

If an implementation sends strings in, for example, an ISO 8859-*
encoding, the receiver will end up with garbled characters, but we
have to live with that; there is probably no algorithm that solves
this problem.
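
A small Python illustration of the ambiguity (the byte string is just an
example chosen for this purpose): the same two bytes decode without
error both as UTF-8 and as ISO 8859-1, but to different text, so the
receiver cannot tell from the bytes alone which one was intended.

```python
raw = b"\xc3\xa9"

as_utf8 = raw.decode("utf-8")      # one character: U+00E9
as_latin1 = raw.decode("latin-1")  # two characters: U+00C3, U+00A9

# Both decodes succeed, so validity alone does not reveal the sender's intent.
assert as_utf8 != as_latin1
assert len(as_utf8) == 1 and len(as_latin1) == 2
```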

Regards, Adam

-- 
Adam Tkac, Red Hat, Inc.
