Hi,

I'm trying to fetch pages using PoCo::Client::HTTP, and some pages are
turning out to be in a different encoding than what the content-type
header says. I've traced this to this change:

  2006-10-25 06:55:14 (r294) by rcaputo
  lib/POE/Component/Client/HTTP.pm M; t/14_gzipped_content.t A; MANIFEST
  M; Makefile.PL M; lib/POE/Component/Client/HTTP/Request.pm M

    Apply Rob Bloodgood's patch to transparently decode non-streaming
    content before it's returned. This gives us support for gzip
    compressed content. Resolves long-standing rt.cpan.org ticket 8454.

Can't this feature be optional?


This is how the problem is reproduced:

  1) http://d.hatena.ne.jp/lestrrat/ is a page in euc-jp
  2) the server supports gzip encoding

  3) PoCo::Client::HTTP sends headers claiming it can
     handle gzip encoding (which is fine)

  4) In the response, content-encoding header is specified
  5) HTTP::Response->decoded_content is called

  6) in decoded_content, it handles the gzip encoding part
  7) then in the next clause it goes on to do the following:

    if ($ct && $ct =~ m,^text/,,) {
        my $charset = $opt{charset} || $ct_param{charset} ||
                      $opt{default_charset} || "ISO-8859-1";
        $charset = lc($charset);
        if ($charset ne "none") {
            require Encode;
            if (do{my $v = $Encode::VERSION; $v =~ s/_//g; $v} < 2.0901 &&
                !$content_ref_iscopy)
            {
                # LEAVE_SRC did not work before Encode-2.0901
                my $copy = $$content_ref;
                $content_ref = \$copy;
                $content_ref_iscopy++;
            }
            $content_ref = \Encode::decode($charset, $$content_ref,
                               Encode::FB_CROAK() | Encode::LEAVE_SRC());
        }
    }

At the end, I have a response with content-type = 'text/html;
charset=euc-jp', and yet the content has already been decoded into
Perl's internal UTF-8 representation.
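
The effect can be seen without POE at all. The following is a minimal
sketch, assuming a reasonably recent HTTP::Message and IO::Compress::Gzip
are installed. The three-character body is made up for illustration; it
stands in for the real page. It builds a gzip-compressed euc-jp response by
hand and compares the default decoded_content call with one that passes
charset => 'none', which undoes the Content-Encoding only.

```perl
use strict;
use warnings;
use Encode ();
use HTTP::Response;
use IO::Compress::Gzip qw(gzip $GzipError);

# Hypothetical page body: three Japanese characters, encoded as EUC-JP octets.
my $text  = "\x{65E5}\x{672C}\x{8A9E}";
my $eucjp = Encode::encode('euc-jp', $text);

# Compress the body the way a gzip-capable server would.
gzip(\$eucjp => \my $gzipped) or die "gzip failed: $GzipError";

my $res = HTTP::Response->new(200, 'OK', [
    'Content-Type'     => 'text/html; charset=euc-jp',
    'Content-Encoding' => 'gzip',
], $gzipped);

# Default call: undoes gzip AND decodes EUC-JP into Perl characters.
my $chars = $res->decoded_content;

# charset => 'none': undoes gzip only and leaves the EUC-JP octets alone.
my $bytes = $res->decoded_content(charset => 'none');

printf "default:         length %d, utf8 flag %s\n",
    length($chars), utf8::is_utf8($chars) ? 'on' : 'off';
printf "charset => none: length %d, matches original octets: %s\n",
    length($bytes), $bytes eq $eucjp ? 'yes' : 'no';
```

The second call is the behaviour I would expect from a "transparent gzip"
feature: the compression is removed, but the body still matches what the
content-type header claims.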

I realize it may be a problem in HTTP::Message more so than POE, but I'd
rather be able to turn off this feature by, for example, being able to
NOT send the accept-encoding header.

Can something like that be done?
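
In the meantime, one workaround on the receiving side is to encode the
content back into the charset named in the header. This is only a sketch:
reencode_to_declared_charset is a hypothetical helper of my own, not part
of POE or LWP, and it assumes that a string with the UTF-8 flag set has
already been decoded, which is a heuristic rather than a guarantee.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: turn already-decoded text back into octets in the
# charset that the Content-Type header declares.
sub reencode_to_declared_charset {
    my ($content, $content_type) = @_;

    # Pull the charset parameter out of the header value, if there is one.
    my ($charset) = $content_type =~ /;\s*charset\s*=\s*"?([^\s";]+)"?/i;
    return $content unless defined $charset;

    # Assumption: no UTF-8 flag means the content is still raw octets.
    return $content unless utf8::is_utf8($content);

    return Encode::encode($charset, $content);
}

# Example with made-up content standing in for a decoded response body.
my $decoded = "\x{65E5}\x{672C}\x{8A9E}";
my $octets  = reencode_to_declared_charset(
    $decoded, 'text/html; charset=euc-jp');

printf "re-encoded length: %d octets\n", length($octets);
```

This costs an extra decode/encode round trip per page, which is why I would
still prefer a switch in the component itself.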

--d
