Hi, I'm trying to fetch pages using PoCo::Client::HTTP, and some pages are coming back in a different encoding than what the content-type header says. I've traced this to this change:
2006-10-25 06:55:14 (r294) by rcaputo
lib/POE/Component/Client/HTTP.pm M; t/14_gzipped_content.t A; MANIFEST M;
Makefile.PL M; lib/POE/Component/Client/HTTP/Request.pm M

  Apply Rob Bloodgood's patch to transparently decode non-streaming
  content before it's returned. This gives us support for gzip
  compressed content. Resolves long-standing rt.cpan.org ticket 8454.

Can't this feature be optional?

Here's how the problem is reproduced:

1) http://d.hatena.ne.jp/lestrrat/ is a page in euc-jp
2) the server supports gzip encoding
3) PoCo::Client::HTTP sends headers claiming it can handle gzip encoding (which is fine)
4) in the response, the content-encoding header is set
5) HTTP::Response->decoded_content is called
6) decoded_content handles the gzip encoding part
7) then, in the next clause, it goes on to do the following:

    if ($ct && $ct =~ m,^text/,,) {
        my $charset = $opt{charset} || $ct_param{charset} || $opt{default_charset} || "ISO-8859-1";
        $charset = lc($charset);
        if ($charset ne "none") {
            require Encode;
            if (do { my $v = $Encode::VERSION; $v =~ s/_//g; $v } < 2.0901 &&
                !$content_ref_iscopy)
            {
                # LEAVE_SRC did not work before Encode-2.0901
                my $copy = $$content_ref;
                $content_ref = \$copy;
                $content_ref_iscopy++;
            }
            $content_ref = \Encode::decode($charset, $$content_ref,
                Encode::FB_CROAK() | Encode::LEAVE_SRC());
        }
    }

At the end, I have a response whose content-type says 'text/html; charset=euc-jp', and yet the content is UTF-8. I realize this may be a problem in HTTP::Message more so than in POE, but I'd rather be able to turn off this feature by, for example, being able to NOT send the accept-encoding header. Can something like that be done?

--d
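For reference, here is a sketch of the kind of workaround I have in mind. It sets an explicit Accept-Encoding of "identity" on the request so the server never gzips the body, leaving nothing for the component to transparently decode; whether PoCo::Client::HTTP honors a caller-supplied Accept-Encoding header instead of overwriting it with its own is exactly what I'm unsure about, so treat this as a hypothetical, not a confirmed fix:

```perl
# Sketch only: assumes PoCo::Client::HTTP keeps a caller-supplied
# Accept-Encoding header rather than replacing it.
use POE qw(Component::Client::HTTP);
use HTTP::Request::Common qw(GET);

POE::Component::Client::HTTP->spawn(Alias => 'ua');

my $req = GET 'http://d.hatena.ne.jp/lestrrat/';
# Ask the server for an unencoded body, so decoded_content's
# gzip path (and its charset recoding) is never triggered.
$req->header('Accept-Encoding' => 'identity');

POE::Session->create(
    inline_states => {
        _start => sub {
            $_[KERNEL]->post('ua' => 'request', 'got_response', $req);
        },
        got_response => sub {
            my $response = $_[ARG1]->[0];
            # If the workaround holds, this is still raw euc-jp bytes.
            print $response->content;
        },
    },
);

POE::Kernel->run;
```

If the component unconditionally adds its own Accept-Encoding header, this does nothing, which is why I'd like an option to disable the feature outright.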