Thanks for the answer. It doesn't make sense indeed ;-) But unfortunately it hasn't been processed by an XML parser in the PHP script. When I check the output directly after curl_exec, it is already converted (tested by echoing $result, and by setting CURLOPT_RETURNTRANSFER to 0). The Perl script that creates the XML does it right, and that is easy to check: I just warn the XML at the moment that I print it out, so I can see the XML that is sent back to the PHP script in the webserver logs.
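One way to take the browser or terminal out of the picture is to look at the raw bytes rather than at the rendered characters. The following is only a minimal sketch: describe_bytes() is a hypothetical helper, not part of the script quoted below, and it assumes a PHP version with type declarations (PHP 7 or later) and the bundled PCRE extension.

```php
<?php
// Hypothetical helper (an assumption for illustration, not an existing
// PHP function): report the raw bytes of a string as hex, plus whether
// they form a valid UTF-8 sequence.
function describe_bytes(string $data): string
{
    // preg_match() with the /u modifier returns 1 for an empty pattern on
    // valid UTF-8 input and false when the subject is not valid UTF-8.
    $valid = preg_match('//u', $data) === 1;

    return bin2hex($data) . ' (' . ($valid ? 'valid UTF-8' : 'not valid UTF-8') . ')';
}

// The two characters from the tests, written with the same octal escapes.
$eDiaeresisUtf8   = "\303\253";     // LATIN SMALL LETTER E WITH DIAERESIS in UTF-8
$euroSignUtf8     = "\342\202\254"; // EURO SIGN in UTF-8
// The same e-diaeresis as a single ISO-8859-1 byte.
$eDiaeresisLatin1 = "\353";

echo describe_bytes($eDiaeresisUtf8), "\n";   // c3ab (valid UTF-8)
echo describe_bytes($euroSignUtf8), "\n";     // e282ac (valid UTF-8)
echo describe_bytes($eDiaeresisLatin1), "\n"; // eb (not valid UTF-8)
```

Calling describe_bytes($result) directly after curl_exec() and comparing the hex with what the Perl script logs should show whether the bytes themselves differ or only the way they are displayed.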
So I am 100% sure that the conversion takes place after it is sent to the webserver, and before it is processed by the rest of the PHP script. The only other thing that I can think of is that the webserver itself might do something to the data. I am running Apache on Linux; is that a possibility?

Merijn

----- Original Message -----
From: "Wez Furlong" <[EMAIL PROTECTED]>
To: "Merijn van den Kroonenberg" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, August 27, 2002 11:49 AM
Subject: Re: [PHP-DEV] curl and UTF-8, random encoding?

> Hey,
>
> That does not make sense, since neither curl nor PHP do any
> kind of conversion like that.
> Are you sure that you're not looking at the output from an XML
> processor that has mangled utf-8 -> iso-8859-1 ??
> (expat has source and target encodings that can be set separately),
> And are you using something like mbstring with transparent encoding
> translation turned on?
>
> --Wez.
>
>
> On 08/27/02, "Merijn van den Kroonenberg" <[EMAIL PROTECTED]> wrote:
> > Hello List,
> >
> > I have a problem with the PHP CURL module and UTF-8 data.
> > My PHP script uses curl to do a POST to a Perl/CGI script. This Perl script
> > returns UTF-8 encoded XML. The Perl script returns UTF-8, I have verified
> > that using the webserver logfiles, but the data that I receive in $result
> > (see below) is decoded to ISO-8859-1.
> >
> > $ch = curl_init();
> > curl_setopt($ch, CURLOPT_URL, $post_url);
> > curl_setopt($ch, CURLOPT_HEADER, 0);
> > curl_setopt($ch, CURLOPT_VERBOSE, 0);
> > curl_setopt($ch, CURLOPT_POST, 1);
> > curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
> > curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
> > $result = curl_exec ($ch);// #### UTF compatible?
> > curl_close ($ch);
> >
> > I did some further testing, and I found that this behaviour is not
> > consistent. Actually I am pretty puzzled about this.
> >
> > I was testing with an XML document that
> > contained only the following multi-byte UTF-8 character:
> > \303\253 (octal utf8) (LATIN SMALL LETTER E WITH DIAERESIS)
> > The output from CURL got automatically decoded to latin1.
> >
> > Then after that I tested with another XML document that
> > contained the following multi-byte UTF-8 character:
> > \342\202\254 (octal utf8) (EURO SIGN)
> > I was surprised to see that the output was now correct UTF-8.
> >
> > Now I modified the first document and inserted the EURO SIGN in this
> > document. When I process this document again, the CURL output is UTF-8. So
> > it seems the output of CURL depends on what it detects on its input, and it
> > will try to convert the data to latin1 if possible??
> >
> > Does anyone know how I can disable this behaviour? For me, CURL should not
> > do any en/de-coding of my data.
> >
> > I also looked around at the cURL library site (http://curl.haxx.se/) of the
> > developer of CURL. In message
> > http://curl.haxx.se/mail/curlphp-2001-02/0005.html the cURL developer
> > indicates that the libraries do not care about character sets, and that it
> > might have something to do with the implementation into PHP.
> >
> > If this is true, then there's probably not much I can do about it. If that's the
> > case, please let me know, so I can find an alternative.
> >

--
PHP Development Mailing List <http://www.php.net/>
To unsubscribe, visit: http://www.php.net/unsub.php