On Wed, Sep 9, 2009 at 1:07 AM, Matt Sanford <m...@twitter.com> wrote:

> I more than agree with the above statement that a character is a
> character and Twitter shouldn't care. Data should be data. The main
> issue with that is that some clients compose characters and some
> don't. My common example of this is é. Depending on your client
> Twitter could get:
>
> é - 1 code point (2 bytes in UTF-8)
>  - URL Encoded UTF-8: %C3%A9
>  - http://www.fileformat.info/info/unicode/char/00e9/index.htm
>
> -- or --
>
> é - 2 code points (3 bytes in UTF-8)
>  - URL Encoded UTF-8: %65%CC%81
>  - http://www.fileformat.info/info/unicode/char/0065/index.htm
>  - plus: http://www.fileformat.info/info/unicode/char/0301/index.htm
>
> So, my fix will make it so that no matter the client, if the user
> sees é it counts as a single character. I'll announce something in
> the change log once my fix is deployed.
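[Editor's note: the fix Matt describes amounts to normalizing both forms to one canonical representation before counting. The thread doesn't say which normalization form Twitter used; Unicode NFC is the usual choice, and this Python sketch (my own, not Twitter's code) shows how it collapses the two encodings of é:]

```python
import unicodedata

precomposed = "\u00e9"        # é as a single code point (U+00E9)
decomposed = "\u0065\u0301"   # e (U+0065) + combining acute accent (U+0301)

# They render identically but differ in code points and bytes:
print(len(precomposed))              # 1 code point
print(len(decomposed))               # 2 code points
print(precomposed.encode("utf-8"))   # b'\xc3\xa9'  -> URL-encoded %C3%A9
print(decomposed.encode("utf-8"))    # b'e\xcc\x81' -> URL-encoded %65%CC%81

# Normalizing to NFC composes the decomposed form, so both forms
# compare equal and count as one character:
assert unicodedata.normalize("NFC", decomposed) == precomposed
print(len(unicodedata.normalize("NFC", decomposed)))  # 1
```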
Sorry for being picky about this, I'm just trying to make sure that I'm understanding the terms correctly as you are using them.

I tend to think of Twitter as 140 "characters" (rather than bytes). I realize that "character" may not have a precise definition, but to me, each of these is "one character": e é < & >

Am I understanding you correctly that Twitter is moving to standardize where you can send a message with 140 "characters" regardless of whether that's 140 e or 140 é or 140 < or 140 & or 140 > ?

I think that's what is being said, I just want to make sure I'm understanding properly.

Thanks!

TjL