PHP basically treats strings as C strings (plain byte arrays). It does nothing
encoding-aware automatically and leaves it up to you to handle your strings
safely. Make sure your source files are saved as UTF-8 and that your responses
declare UTF-8 in their Content-Type. Use UTF-8 wherever you can in your
databases; where you can't, convert with utf8_encode/utf8_decode (which only
translate between ISO-8859-1 and UTF-8) and use the mb_* replacements for the
byte-oriented string functions. If you are making HTTP requests, mark the
encoding correctly in the request (with cURL, set charset=UTF-8 in your
request headers).

In Java, all strings are high-level sequences of chars (UTF-16 internally, but
you rarely need to worry about that). You just need to decode and encode with
the right charset wherever bytes come in or go out, and mark your charsets in
your requests and responses everywhere.
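
To make the decode/encode point concrete, here is a rough Java sketch of the
mix-up Matt describes below: the UTF-8 bytes of "não" read back as ISO-8859-1,
and then repaired. The class and method names (CharsetDemo, misdecode, repair)
are made up for illustration, and this is not how any particular Twitter
library does it. It assumes Java 7 or later for StandardCharsets.

```java
import java.nio.charset.StandardCharsets;

public class CharsetDemo {
    // Hypothetical helper: encode text as UTF-8, then (wrongly) decode
    // those same bytes as ISO-8859-1. This reproduces the garbling.
    public static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Hypothetical helper: undo the mistake by recovering the original
    // bytes and decoding them as UTF-8. Only works if the garbled text
    // really came from UTF-8 bytes misread as ISO-8859-1.
    public static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the encoding
        // this source file is saved in. \u00e3 is the a-with-tilde.
        String original = "n\u00e3o";
        String garbled = misdecode(original);

        // The two-byte UTF-8 sequence became two separate characters.
        System.out.println(original.length() + " chars became " + garbled.length());
        System.out.println("repaired equals original: " + repair(garbled).equals(original));
    }
}
```

The charset is always passed explicitly here; relying on the platform default
in getBytes() or new String(bytes) is a common way this kind of bug sneaks in.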

Zac

On May 13, 2010, at 10:51 AM, Matt Sanford <m...@twitter.com> wrote:

> Hi giustin,
> 
> I don't think it's the same issue since yours is more PHP specific.
> My guess is that the PHP library in question or the code you're using
> to process the results is incorrectly converting between UTF-8 and
> ISO-8859-1 [1]. Maybe someone on the list with some more PHP knowledge
> can suggest a fix.
> 
> Thanks;
> — Matt Sanford / @mzsanford
> 
> [1]
> 
> The UTF-8 encoding of ã is two bytes. When those same two bytes are
> interpreted as ISO-8859-1 (a.k.a ISO-Latin-1) they are interpreted as
> two characters, like so (fixed width font required):
> 
> UTF-8    Bytes    Same bytes in ISO-8859-1
> ------------------------------------------
> n        0x6E     n
> ã        0xC3     Ã
>          0xA3     £
> o        0x6F     o
> 
> 
> On May 12, 7:19 pm, giustin <tgiu...@gmail.com> wrote:
>> I have similar problems.
>> 
>> When I try to search using the tag "não" the result is "nÃ£o". The
>> API I used was the Twitter Search API from Ryan Faerman
>> (http://ryanfaerman.com/twittersearch/).
>> 
>> Regards.
>> 
>> On 12 maio, 21:47, Matt Sanford <m...@twitter.com> wrote:
>> 
>> 
>> 
>>> Hi there,
>> 
>>>     All characters in Tweets are utf-8. I'm assuming you're looking
>>> for something specific like accents or ASCII-art punctuation. Can you
>>> describe your problem in a little more detail? I might be able to help
>>> once I know what you're trying to prevent.
>> 
>>> Thanks;
>>>   — Matt Sanford / @mzsanford
>> 
>>> On May 12, 4:21 pm, adamjamesdrew <theikl...@gmail.com> wrote:
>> 
>>>> any ideas?
