Bugs item #988772, was opened at 2004-07-10 21:33
Message generated for change (Comment added) made by cutting
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=988772&group_id=59548

Category: None
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Takashi Okamoto (toraneko)
Assigned to: Nobody/Anonymous (nobody)
Summary: [PATCH] detect charset from HTTP header

Initial Comment:
Nutch doesn't detect the charset from the HTTP header. This causes problems in environments that use encodings other than iso-8859-1. I have attached a patch that detects the charset from the HTTP header so that non-iso-8859-1 pages are handled correctly.

regards,
Takashi Okamoto

----------------------------------------------------------------------

>Comment By: Doug Cutting (cutting)
Date: 2004-07-14 16:05

Message:
Logged In: YES
user_id=21778

Overall this looks good to me. Two problems, however. First, the patch to TextParser.java didn't compile (the contentType variable was unbound, and UnsupportedEncodingException was not caught). Second, there are no unit tests with this.

I fixed the compilation problems and committed this, because I think this is very useful to have. But if you have a chance, could you please contribute some JUnit test cases?

Thanks.

----------------------------------------------------------------------

Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
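
For readers following this item, the kind of charset detection described in the summary might look roughly like the sketch below. It is not the code from the attached patch or from Nutch's TextParser.java. The class name CharsetFromHeader and the methods detect and decode are hypothetical, and falling back to ISO-8859-1 is an assumption based on the default behaviour described in the initial comment.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only; not part of Nutch.
class CharsetFromHeader {
    // Matches the charset parameter in values such as
    // text/html; charset="Shift_JIS" (quotes optional, case-insensitive).
    private static final Pattern CHARSET = Pattern.compile(
        "charset\\s*=\\s*\"?([^\\s;\"]+)\"?", Pattern.CASE_INSENSITIVE);

    // Returns the charset named in the Content-Type header value, or the
    // fallback when the header is missing, has no charset parameter, or
    // names a charset this JVM does not know.
    static Charset detect(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        Matcher m = CHARSET.matcher(contentType);
        if (!m.find()) {
            return fallback;
        }
        try {
            return Charset.forName(m.group(1));
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return fallback;
        }
    }

    // Decodes a response body using the header's charset.
    // Assumption: ISO-8859-1 is used when nothing better is known.
    static String decode(byte[] body, String contentType) {
        return new String(body, detect(contentType, StandardCharsets.ISO_8859_1));
    }
}
```

Using the String(byte[], Charset) constructor rather than the String(byte[], String) one avoids the checked UnsupportedEncodingException mentioned in the comment above, since an unknown charset name is handled once in detect.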
