I just crawled some Chinese websites that use GB2312 as the charset in their
meta tag. Crawling and searching work fine, but when I view the cached page,
the encoding is wrong.
So I looked at cached.jsp in my Tomcat and tried editing it like this:

    if (encoding != null) {
      try {
        content = new String(bean.getContent(details), encoding);
      }
      catch (UnsupportedEncodingException e) {
        // fallback to windows-1252
        content = new String(bean.getContent(details), "windows-1252");
      }
    }
    else
      // my change: assume gb2312 when no encoding was recorded
      content = new String(bean.getContent(details), "gb2312");
  }

With that change the cached page displays correctly, but it only works for
pages that use GB2312, so it is not a good idea for me.
I want to get the cached page's actual encoding, so I tried to debug
cached.jsp like this:
String encoding = (String) metaData.get("CharEncodingForConversion");
System.out.print(encoding);
This prints null for the encoding.

Metadata metaData = bean.getParseData(details).getContentMeta();
String contentType = (String) metaData.get(Metadata.CONTENT_TYPE);
System.out.print(contentType);

This only prints text/html for the contentType, with no charset.
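
Since neither metadata value carries the charset here, one possible workaround is to read the charset from the cached bytes themselves, using the page's own meta tag. The following is only a rough sketch under that assumption: CharsetSniffer and sniff() are hypothetical names, not part of Nutch, and the 4096-byte scan window is an arbitrary choice.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Nutch): guesses a page's charset from its meta tag. */
public class CharsetSniffer {

    // Matches "charset=..." as found in
    // <meta http-equiv="Content-Type" content="text/html; charset=gb2312">
    private static final Pattern CHARSET = Pattern.compile(
        "charset\\s*=\\s*[\"']?([A-Za-z0-9_.:-]+)", Pattern.CASE_INSENSITIVE);

    /** Returns the declared charset name, or fallback if none is declared or supported. */
    public static String sniff(byte[] raw, String fallback) {
        // Meta tags are plain ASCII, so decoding the first bytes as ISO-8859-1
        // is enough to find the declaration (assumed scan window: 4096 bytes).
        int len = Math.min(raw.length, 4096);
        String head = new String(raw, 0, len, StandardCharsets.ISO_8859_1);
        Matcher m = CHARSET.matcher(head);
        if (m.find()) {
            String name = m.group(1);
            try {
                if (Charset.isSupported(name)) {
                    return name;
                }
            } catch (IllegalArgumentException e) {
                // Syntactically illegal charset name: fall through to the fallback.
            }
        }
        return fallback;
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><head><meta http-equiv=\"Content-Type\" "
            + "content=\"text/html; charset=gb2312\"></head>"
            + "<body>\u4e2d\u6587</body></html>";
        byte[] raw = html.getBytes("GB2312");
        String encoding = sniff(raw, "windows-1252");
        System.out.println("sniffed encoding: " + encoding);
        System.out.println("decoded body ok: "
            + new String(raw, encoding).contains("\u4e2d\u6587"));
    }
}
```

In cached.jsp, the else branch could then pass bean.getContent(details) to sniff() with "windows-1252" as the fallback instead of hard-coding "gb2312". This does not help for pages that declare no charset, or declare the wrong one.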

I hope somebody knows how to get the cached page's encoding.

Thanks


