PHP treats strings as plain byte arrays, much like C char arrays. It doesn't do anything special automatically and leaves it up to you to handle your strings safely. Make sure your source files are saved as UTF-8 and that your responses declare UTF-8 in the Content-Type header. Use UTF-8 wherever you can in your databases; where you can't, convert at the boundary with utf8_encode/utf8_decode (which only convert between ISO-8859-1 and UTF-8) and use the mb_* replacements for the byte-oriented string functions. If you are making HTTP requests, mark the encoding correctly in the request (with cURL, set the charset to UTF-8 in your request headers).
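To make the byte-versus-character point concrete, here is a small sketch using the "não" example from the quoted messages in this thread. It is written in Python purely for illustration (it is not PHP, and not the code of the library in question), and the function names are made up for this example. It assumes the bug is the UTF-8 / ISO-8859-1 mix-up guessed at in the quoted reply.

```python
# Illustration only; the helper names are hypothetical.

def misdecode_as_latin1(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def repair_latin1_mojibake(garbled: str) -> str:
    """Undo the mistake above: recover the original bytes, then decode as UTF-8."""
    return garbled.encode("iso-8859-1").decode("utf-8")


# "não", written with an escape so the encoding of this source file cannot matter.
word = "n\u00e3o"
raw = word.encode("utf-8")

print(len(word))     # number of characters: 3
print(len(raw))      # number of bytes: 4
print(raw.hex(" "))  # 6e c3 a3 6f

garbled = misdecode_as_latin1(word)
print(len(garbled))  # 4: the two bytes of the a-tilde became two characters
print(repair_latin1_mojibake(garbled) == word)  # True
```

Here misdecode_as_latin1("não") comes back as "nÃ£o". A byte-oriented length function sees 4 where a character-aware one sees 3, and that gap is what the mb_* functions exist to handle. The repair only works if the garbled text went through exactly one wrong decode, so treat it as a diagnostic aid, not a general fix.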
In Java, all strings are high-level sequences of chars (internally UTF-16 code units, but you rarely need to worry about that). You just need to make sure you decode and encode properly at the I/O boundaries and mark your charsets in your requests and responses everywhere.

Zac

Sent from my iPad

On May 13, 2010, at 10:51 AM, Matt Sanford <m...@twitter.com> wrote:

> Hi giustin,
>
> I don't think it's the same issue since yours is more PHP specific.
> My guess is that the PHP library in question or the code you're using
> to process the results is incorrectly converting between UTF-8 and
> ISO-8859-1 [1]. Maybe someone on the list with some more PHP knowledge
> can suggest a fix.
>
> Thanks;
> -- Matt Sanford / @mzsanford
>
> [1]
>
> The UTF-8 encoding of ã is two bytes. When those same two bytes are
> interpreted as ISO-8859-1 (a.k.a. ISO-Latin-1) they are interpreted as
> two characters, like so (fixed width font required):
>
> UTF-8    Bytes    vs.    Same bytes in ISO-8859-1
> ------------------------------------------------
> n        0x6E            n
>
> ã        0xC3            Ã
>          0xA3            £
>
> o        0x6F            o
>
>
> On May 12, 7:19 pm, giustin <tgiu...@gmail.com> wrote:
>> I have similar problems.
>>
>> When I try to search using the tag "não" the result is ""não". The
>> API that I used was the Twitter Search API from Ryan Faerman
>> (http://ryanfaerman.com/twittersearch/)
>>
>> Regards.
>>
>> On May 12, 21:47, Matt Sanford <m...@twitter.com> wrote:
>>
>>> Hi there,
>>>
>>> All characters in Tweets are UTF-8. I'm assuming you're looking
>>> for something specific like accents or ASCII-art punctuation. Can you
>>> describe your problem in a little more detail? I might be able to help
>>> once I know what you're trying to prevent.
>>>
>>> Thanks;
>>> -- Matt Sanford / @mzsanford
>>>
>>> On May 12, 4:21 pm, adamjamesdrew <theikl...@gmail.com> wrote:
>>>
>>>> any ideas?