Thanks for the answer,

It doesn't make sense indeed ;-) But unfortunately it hasn't been processed
by an XML parser in the PHP script. When I check the output directly after
curl_exec(), it is already converted (tested by echoing $result, and by
setting CURLOPT_RETURNTRANSFER to 0). The Perl script that creates the XML
does it right, and that is easy to check: I just warn() the XML at the
moment I print it out, so I can see the XML that is sent back to the PHP
script in the webserver logs.
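A minimal way to make the "already converted" check unambiguous is to look at the raw bytes instead of the rendered text: in UTF-8, "ë" is the two bytes C3 AB, while in ISO-8859-1 it is the single byte EB. The sketch below assumes a PHP 7+ CLI; the helper looks_like_utf8() is hypothetical and relies on preg_match() returning false for malformed UTF-8 when the /u modifier is used.

```php
<?php
// Hypothetical helper (not part of the original script): report whether a
// byte string is well-formed UTF-8. With the /u modifier, preg_match()
// returns false instead of 1 when the subject is not valid UTF-8.
function looks_like_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// "ë" encoded as UTF-8 is C3 AB; encoded as ISO-8859-1 it is the single byte EB.
$utf8   = "\xC3\xAB";
$latin1 = "\xEB";

echo bin2hex($utf8), ' ', var_export(looks_like_utf8($utf8), true), "\n";     // c3ab true
echo bin2hex($latin1), ' ', var_export(looks_like_utf8($latin1), true), "\n"; // eb false
```

Applied to the real response, echo bin2hex($result); right after curl_exec() would show whether c3ab or eb actually arrived, independent of how a terminal or browser renders the text.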

So I am 100% sure that the conversion takes place after the XML is sent to
the webserver, and before it is processed by the rest of the PHP script.

The only other thing I can think of is that the webserver itself might do
something to the data. I am running Apache on Linux; is that a
possibility?

Merijn

----- Original Message -----
From: "Wez Furlong" <[EMAIL PROTECTED]>
To: "Merijn van den Kroonenberg" <[EMAIL PROTECTED]>
Cc: <[EMAIL PROTECTED]>
Sent: Tuesday, August 27, 2002 11:49 AM
Subject: Re: [PHP-DEV] curl and UTF-8, random encoding?


> Hey,
>
> That does not make sense, since neither curl nor PHP do any
> kind of conversion like that.
> Are you sure that you're not looking at the output from an XML
> processor that has mangled utf-8 -> iso-8859-1 ??
> (expat has source and target encodings that can be set separately.)
> Are you using something like mbstring with transparent encoding
> translation turned on?
>
> --Wez.
>
>
> On 08/27/02, "Merijn van den Kroonenberg" <[EMAIL PROTECTED]> wrote:
> > Hello List,
> >
> > I have a problem with the PHP cURL module and UTF-8 data.
> > My PHP script uses cURL to do a POST to a Perl/CGI script. This Perl
> > script returns UTF-8 encoded XML. The Perl script returns UTF-8 (I have
> > verified that using the webserver log files), but the data that I
> > receive in $result (see below) is decoded to ISO-8859-1.
> >
> >     $ch = curl_init();
> >     curl_setopt($ch, CURLOPT_URL, $post_url);
> >     curl_setopt($ch, CURLOPT_HEADER, 0);
> >     curl_setopt($ch, CURLOPT_VERBOSE, 0);
> >     curl_setopt($ch, CURLOPT_POST, 1);
> >     curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
> >     curl_setopt($ch, CURLOPT_POSTFIELDS, $postfields);
> >     $result = curl_exec ($ch);// #### UTF compatible?
> >     curl_close ($ch);
> >
> > I did some further testing, and I found that this behaviour is not
> > consistent. Actually I am pretty puzzled about this.
> >
> > I was testing with an XML document that contained only the following
> > multi-byte UTF-8 character:
> > \303\253    (octal UTF-8) (LATIN SMALL LETTER E WITH DIAERESIS)
> > The output from cURL got automatically decoded to Latin-1.
> >
> > Then I tested with another XML document that contained the following
> > multi-byte UTF-8 character:
> > \342\202\254 (octal UTF-8) (EURO SIGN)
> > I was surprised to see that the output was now correct UTF-8.
> >
> > Now I modified the first document and inserted the EURO SIGN into it.
> > When I process this document again, the cURL output is UTF-8. So it
> > seems the output of cURL depends on what it detects in its input, and
> > it will try to convert the data to Latin-1 if possible??
> >
> > Does anyone know how I can disable this behaviour? For me, cURL should
> > not do any encoding or decoding of my data.
> >
> > I also looked around the site of the cURL library
> > (http://curl.haxx.se/). In message
> > http://curl.haxx.se/mail/curlphp-2001-02/0005.html the cURL developer
> > indicates that the library does not care about character sets, and
> > that it might have something to do with the implementation in PHP.
> >
> > If this is true, then there's probably not much I can do about it. If
> > that is the case, please let me know, so I can find an alternative.
>
>


-- 
PHP Development Mailing List <http://www.php.net/>
To unsubscribe, visit: http://www.php.net/unsub.php
