Bugs item #988772, was opened at 2004-07-10 21:33
Message generated for change (Comment added) made by cutting
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=491356&aid=988772&group_id=59548

Category: None
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Takashi Okamoto (toraneko)
Assigned to: Nobody/Anonymous (nobody)
Summary: [PATCH] detect charset from HTTP header

Initial Comment:
Nutch doesn't detect the charset from the HTTP header. This
causes problems in environments other than iso-8859-1.
I have attached a patch that detects the charset from the
HTTP header so that non-iso-8859-1 pages are handled correctly.

regards,

Takashi Okamoto

----------------------------------------------------------------------

>Comment By: Doug Cutting (cutting)
Date: 2004-07-14 16:05

Message:
Logged In: YES 
user_id=21778

Overall this looks good to me.  Two problems, however. 
First, the patch to TextParser.java didn't compile (the
contentType variable was unbound, and
UnsupportedEncodingException was not caught).  Second, there
are no unit tests with this.

I fixed the compilation problems and committed this, because
I think this is very useful to have.  But if you have a
chance, could you please contribute some JUnit test cases? 
Thanks.
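
For reference, here is a minimal sketch of the kind of header-based
detection this item discusses. It is not the committed patch: the class
name HeaderCharset and the methods charsetOf and decode are hypothetical,
and falling back to ISO-8859-1 is an assumption based on the report. It
uses Charset.forName, which reports an unknown name with an unchecked
exception rather than the checked UnsupportedEncodingException mentioned
above.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: illustrates reading the charset
// parameter from a Content-Type header value and decoding page bytes.
final class HeaderCharset {
    // Assumed default when the header names no usable charset.
    static final Charset DEFAULT = StandardCharsets.ISO_8859_1;

    // Returns the charset named in a value such as "text/html; charset=UTF-8",
    // or DEFAULT if the value is null, has no charset parameter, or names a
    // charset this JVM does not know.
    static Charset charsetOf(String contentType) {
        if (contentType == null) {
            return DEFAULT;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // The parameter value may be a quoted string.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Covers both illegal and unsupported charset names.
                    return DEFAULT;
                }
            }
        }
        return DEFAULT;
    }

    // Decodes raw page bytes using the charset taken from the header value.
    static String decode(byte[] content, String contentType) {
        return new String(content, charsetOf(contentType));
    }
}
```

A parser could then call HeaderCharset.decode(bytes, headerValue), where
headerValue is whatever the fetcher stored for Content-Type. Falling back
instead of throwing keeps one mislabeled page from aborting the parse.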

----------------------------------------------------------------------

