crossafire wrote:
> 
> I crawled some Chinese websites that declare GB2312 as the charset in their
> HTML meta tags. Crawling and searching work fine, but when I view a cached
> page, its encoding is wrong.
> So I looked at cached.jsp in my Tomcat and tried editing it like this:
> 
>     if (encoding != null) {
>       try {
>         content = new String(bean.getContent(details), encoding);
>       }
>       catch (UnsupportedEncodingException e) {
>         // fallback to windows-1252
>         content = new String(bean.getContent(details), "windows-1252");
>       }
>     }
>     else {
>       // no encoding known for this page: assume GB2312
>       content = new String(bean.getContent(details), "gb2312");
>     }
>   }  // closes an enclosing block earlier in cached.jsp
> 
> With that change the cached page displays correctly, but it only works for
> sites that use GB2312, so it's not a good solution for me.
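Rather than hard-coding "gb2312" in the else branch, one option is to use the per-page charset whenever one is known and fall back to a configurable default only when it is not. A minimal sketch, assuming the caller already has the raw bytes (as returned by bean.getContent(details)) and a charset name that may be null or bogus; the class and method names are hypothetical, not part of Nutch:

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Hypothetical helper: decode page bytes with the declared charset when it is
// usable, otherwise with a caller-supplied default.
final class PageDecoder {
    private PageDecoder() {}

    /** Returns the Charset named by {@code name}, or null if missing, malformed, or unsupported. */
    static Charset lookup(String name) {
        if (name == null || name.trim().isEmpty()) {
            return null;
        }
        String trimmed = name.trim();
        try {
            return Charset.isSupported(trimmed) ? Charset.forName(trimmed) : null;
        } catch (IllegalCharsetNameException e) {
            return null; // e.g. a name containing spaces or quotes
        }
    }

    /** Decodes with the declared charset if usable, else with the fallback. */
    static String decode(byte[] content, String declared, Charset fallback) {
        Charset cs = lookup(declared);
        return new String(content, cs != null ? cs : fallback);
    }
}
```

The default stays a single, explicit choice (for example GB2312 for a mostly Chinese crawl) instead of overriding every page.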
> I want to get the encoding of the cached page, so I tried debugging
> cached.jsp like this:
> String encoding = (String) metaData.get("CharEncodingForConversion");
> System.out.print(encoding);
> The debug output shows that the encoding is null.
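A null value here means the "CharEncodingForConversion" key is absent from this page's content metadata, so cached.jsp has nothing to decode with. One workaround, sketched under the assumption that the page declares its charset in a meta tag near the top, is to sniff the declaration from the raw bytes before decoding. The class name, the 2000-byte scan window and the regular expression are illustrative choices, not Nutch's own code:

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sniffer for a charset declared in an HTML <meta> tag.
final class MetaCharsetSniffer {
    private MetaCharsetSniffer() {}

    // Assumed scan window: declarations normally sit in <head>, near the top.
    private static final int SCAN_BYTES = 2000;

    // Matches charset=... inside one <meta> tag, covering both
    // <meta charset="gb2312"> and
    // <meta http-equiv="Content-Type" content="text/html; charset=gb2312">.
    private static final Pattern META_CHARSET = Pattern.compile(
        "<meta\\s[^>]*charset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
        Pattern.CASE_INSENSITIVE);

    /** Returns the declared charset in lower case, or null if none is found. */
    static String sniff(byte[] content) {
        int len = Math.min(content.length, SCAN_BYTES);
        // ISO-8859-1 maps each byte to exactly one char, so the ASCII markup
        // survives whatever the page's real encoding is.
        String head = new String(content, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = META_CHARSET.matcher(head);
        return m.find() ? m.group(1).toLowerCase(Locale.ROOT) : null;
    }
}
```

The sniffed name could then be used as the encoding for the page; when it returns null, a default or a statistical detector is still needed.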
> 
> Metadata metaData = bean.getParseData(details).getContentMeta();
> String contentType = (String) metaData.get(Metadata.CONTENT_TYPE);
> System.out.print(contentType);
> 
> The debug output shows that contentType is just text/html, with no charset.
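A bare "text/html" carries no charset parameter, so nothing can be recovered from it for this page. When a Content-Type value does include one (for example "text/html; charset=gb2312"), a small parser like the sketch below can pull it out. The class name is hypothetical, and the parsing is deliberately simple: it strips plain double quotes but does not implement every quoting rule of the HTTP specification.

```java
import java.util.Locale;

// Hypothetical helper: extract the charset parameter from a Content-Type value.
final class ContentTypes {
    private ContentTypes() {}

    /** Returns the charset parameter in lower case, or null if absent. */
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Parameters follow the media type, separated by ';'.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim();
            if (name.equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding quotes: charset="gb2312"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value.toLowerCase(Locale.ROOT);
            }
        }
        return null;
    }
}
```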
> 
> I hope somebody knows how to get the encoding of a cached page.
> 
> Thanks

Thank you.
But I do need to know the HTML charset, because many Chinese websites use
GB2312 for their HTML pages.
I think I will just try jchardet. Thank you very much.
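jchardet guesses an encoding statistically from the bytes. As a much cruder stand-in that uses only the JDK (this is not jchardet and not its API), one can try a short list of candidate charsets in order and keep the first that decodes the bytes without any malformed or unmappable sequences. The candidate list is an assumption supplied by the caller. UTF-8 should go first because non-UTF-8 bytes rarely form valid UTF-8; since pure ASCII passes every candidate, ASCII-only pages simply get the first entry.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.List;

// Crude encoding guess by strict trial decoding (not a statistical detector).
final class TrialDecoder {
    private TrialDecoder() {}

    /** Returns the first candidate that decodes {@code content} cleanly, or null. */
    static Charset firstClean(byte[] content, List<String> candidates) {
        for (String name : candidates) {
            if (!Charset.isSupported(name)) {
                continue; // e.g. GB2312 missing from a trimmed-down runtime
            }
            CharsetDecoder decoder = Charset.forName(name).newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                decoder.decode(ByteBuffer.wrap(content));
                return decoder.charset();
            } catch (CharacterCodingException e) {
                // Not valid in this charset; try the next candidate.
            }
        }
        return null;
    }
}
```

For a mostly Chinese crawl, a candidate list such as UTF-8 followed by GB2312 would pick GB2312 for pages whose bytes are not valid UTF-8.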

-- 
View this message in context: 
http://www.nabble.com/How-can-I-know-the-Cached-Web-Charset-tf4769632.html#a13660093
Sent from the Nutch - User mailing list archive at Nabble.com.
