On Wed, Sep 9, 2009 at 1:07 AM, Matt Sanford <m...@twitter.com> wrote:
>    I more than agree with the above statement that a character is a
> character and Twitter shouldn't care. Data should be data. The main
> issue with that is that some clients compose characters and some
> don't. My common example of this is é. Depending on your client
> Twitter could get:
>
> é - 1 code point
>   - URL Encoded UTF-8: %C3%A9
>   - http://www.fileformat.info/info/unicode/char/00e9/index.htm
>
> -- or --
>
> é (e + combining acute) - 2 code points
>   - URL Encoded UTF-8: %65%CC%81
>   - http://www.fileformat.info/info/unicode/char/0065/index.htm
>     + plus: http://www.fileformat.info/info/unicode/char/0301/index.htm
>
>    So, my fix will make it so that, no matter the client, if the user
> sees é it counts as a single character. I'll announce something in the
> change log once my fix is deployed.
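
For anyone following along, here's a minimal sketch (in Python, purely my own
illustration and not Twitter's actual code) of the kind of counting I take
Matt to be describing: normalize to NFC before measuring, so the composed and
decomposed forms of é both come out as one character. The name tweet_length
is hypothetical.

    import unicodedata

    composed = "\u00e9"      # é as a single code point (U+00E9)
    decomposed = "e\u0301"   # e (U+0065) + combining acute accent (U+0301)

    def tweet_length(text):
        # Count code points after NFC normalization, so composed and
        # decomposed input are measured the same way.
        return len(unicodedata.normalize("NFC", text))

    print(len(composed), len(decomposed))                     # 1 2  (raw lengths differ)
    print(tweet_length(composed), tweet_length(decomposed))   # 1 1  (both count as one)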

Sorry for being picky about this; I'm just trying to make sure that
I'm understanding the terms correctly as you are using them.

I tend to think of the Twitter limit as 140 "characters" (rather than 140 bytes). I
realize that "character" may not have a precise definition, but to me,
each of these is "one character":

e é < & >

Am I understanding you correctly that Twitter is moving to a standard
where you can send a message of 140 "characters" regardless of
whether that's 140 e or 140 é or 140 < or 140 & or 140 >?

I think that's what is being said; I just want to make sure I'm
understanding properly.

Thanks!

TjL