One of my users mentioned that my client application was much more
conservative in counting non-English Unicode characters (specifically
Persian) than Twitter itself.

I've looked over the following thread, and all the other threads
referenced within it, without discovering a good answer:

I've noticed a couple of different behaviors.  With simple Unicode
characters (smiley face, arrow, etc.), the Twitter web interface
apparently counts each character as one when displaying the count.
Posting a long string of these, however, can lead to truncation.  It
seems the Twitter JavaScript may not be attempting to count Unicode
accurately.

With Persian Unicode, the Twitter web interface seems to allow a post
containing much more text than I would have expected.  My user
provided a sample here that I used to experiment with:

I'm using JavaScript code based on the method described here:

Does anyone have a more accurate counting method in JavaScript that
handles all types of Unicode better?
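For what it's worth, part of the discrepancy may be that JavaScript's
String.length counts UTF-16 code units rather than characters, so
anything outside the Basic Multilingual Plane counts as 2.  Here is a
minimal sketch of a counter that treats each surrogate pair as a
single character; note this only illustrates code-point counting and
makes no claim to match Twitter's actual server-side rules:

```javascript
// Count Unicode code points rather than UTF-16 code units.
// String.length treats a surrogate pair (one supplementary character)
// as two units; this counts it once.
function countCodePoints(str) {
  var count = 0;
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i);
    // If this is a high surrogate followed by a low surrogate,
    // skip the low surrogate so the pair counts as one character.
    if (code >= 0xD800 && code <= 0xDBFF && i + 1 < str.length) {
      var next = str.charCodeAt(i + 1);
      if (next >= 0xDC00 && next <= 0xDFFF) i++;
    }
    count++;
  }
  return count;
}
```

Persian text is entirely within the BMP, so it counts the same either
way; the difference only shows up with supplementary characters such
as many emoji, where String.length reports 2 but this function
reports 1.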


- Scott
