Hi,
I'm trying to fetch pages using PoCo::Client::HTTP, and some pages are
turning out to be in a different encoding than what the Content-Type
header says. I've traced this to this change:
2006-10-25 06:55:14 (r294) by rcaputo
lib/POE/Component/Client/HTTP.pm M; t/14_gzipped_content.t A; MANIFEST
M; Makefile.PL M; lib/POE/Component/Client/HTTP/Request.pm M
Apply Rob Bloodgood's patch to transparently decode non-streaming
content before it's returned. This gives us support for gzip
compressed content. Resolves long-standing rt.cpan.org ticket 8454.
Can't this feature be optional?
This is how the problem is reproduced:
1) http://d.hatena.ne.jp/lestrrat/ is a page in euc-jp
2) the server supports gzip encoding
3) PoCo::Client::HTTP sends headers claiming it can
handle gzip encoding (which is fine)
4) the response specifies a Content-Encoding header
5) HTTP::Response->decoded_content is called
6) in decoded_content, it handles the gzip encoding part
7) then in the next clause it goes on to do the following:
    if ($ct && $ct =~ m,^text/,) {
        my $charset = $opt{charset} || $ct_param{charset} ||
                      $opt{default_charset} || "ISO-8859-1";
        $charset = lc($charset);
        if ($charset ne "none") {
            require Encode;
            if (do { my $v = $Encode::VERSION; $v =~ s/_//g; $v } < 2.0901 &&
                !$content_ref_iscopy)
            {
                # LEAVE_SRC did not work before Encode-2.0901
                my $copy = $$content_ref;
                $content_ref = \$copy;
                $content_ref_iscopy++;
            }
            $content_ref = \Encode::decode($charset, $$content_ref,
                Encode::FB_CROAK() | Encode::LEAVE_SRC());
        }
    }
At the end, I have a response with Content-Type 'text/html;
charset=euc-jp', and yet the content is UTF-8.
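To make the mismatch concrete: once the body has been converted to UTF-8, decoding it again with the charset the header still claims (euc-jp) garbles it. A minimal standalone demonstration with Encode (no POE involved; the sample string is just an illustration):

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# A Japanese string, as characters.
my $text = "\x{65e5}\x{672c}\x{8a9e}";

# What the server actually sends for this page:
my $euc_octets  = encode('euc-jp', $text);
# What the body looks like after it has been converted to UTF-8:
my $utf8_octets = encode('utf-8', $text);

# Decoding per the header works on the real euc-jp octets...
my $good = decode('euc-jp', $euc_octets);

# ...but applied to the UTF-8 octets (what the header now lies about),
# it produces garbage, not the original characters.
my $bad = decode('euc-jp', $utf8_octets);

print $good eq $text ? "euc-jp octets: ok\n" : "euc-jp octets: mismatch\n";
print $bad  eq $text ? "utf-8 octets: ok\n"  : "utf-8 octets: mismatch\n";
```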
I realize this may be a problem in HTTP::Message more than in POE, but I'd
rather be able to turn this feature off, for example by being able to
NOT send the Accept-Encoding header.
Can something like that be done?
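In the meantime, here is the kind of workaround I have in mind, as a sketch: explicitly set 'Accept-Encoding: identity' on the outgoing HTTP::Request before posting it to the component. This assumes the component would respect an already-present header rather than overwrite it with its own gzip-capable one, which I haven't verified:

```perl
use strict;
use warnings;
use HTTP::Request;
use POE qw(Component::Client::HTTP);

POE::Component::Client::HTTP->spawn( Alias => 'ua' );

POE::Session->create(
    inline_states => {
        _start => sub {
            my $req = HTTP::Request->new(
                GET => 'http://d.hatena.ne.jp/lestrrat/' );
            # Hypothetical workaround: ask for no transfer coding,
            # hoping the component leaves this header alone.
            $req->header( 'Accept-Encoding' => 'identity' );
            $_[KERNEL]->post( ua => request => got_response => $req );
        },
        got_response => sub {
            my $response = $_[ARG1]->[0];
            # If decoding were skipped, this would still be the raw
            # euc-jp octets, matching the Content-Type header.
            my $content = $response->content;
        },
    },
);

POE::Kernel->run;
```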
--d