Build failed in Jenkins: Nutch-trunk #1418
See https://hudson.apache.org/hudson/job/Nutch-trunk/1418/

--
[...truncated 1009 lines...]
A src/plugin/subcollection/src/java/org/apache/nutch/collection
A src/plugin/subcollection/src/java/org/apache/nutch/collection/Subcollection.java
A src/plugin/subcollection/src/java/org/apache/nutch/collection/CollectionManager.java
A src/plugin/subcollection/src/java/org/apache/nutch/collection/package.html
A src/plugin/subcollection/src/java/org/apache/nutch/indexer
A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection
A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection/SubcollectionIndexingFilter.java
A src/plugin/subcollection/README.txt
A src/plugin/subcollection/plugin.xml
A src/plugin/subcollection/build.xml
A src/plugin/index-more
A src/plugin/index-more/ivy.xml
A src/plugin/index-more/src
A src/plugin/index-more/src/test
A src/plugin/index-more/src/test/org
A src/plugin/index-more/src/test/org/apache
A src/plugin/index-more/src/test/org/apache/nutch
A src/plugin/index-more/src/test/org/apache/nutch/indexer
A src/plugin/index-more/src/test/org/apache/nutch/indexer/more
A src/plugin/index-more/src/test/org/apache/nutch/indexer/more/TestMoreIndexingFilter.java
A src/plugin/index-more/src/java
A src/plugin/index-more/src/java/org
A src/plugin/index-more/src/java/org/apache
A src/plugin/index-more/src/java/org/apache/nutch
A src/plugin/index-more/src/java/org/apache/nutch/indexer
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/package.html
A src/plugin/index-more/plugin.xml
A src/plugin/index-more/build.xml
AU src/plugin/plugin.dtd
A src/plugin/parse-ext
A src/plugin/parse-ext/ivy.xml
A src/plugin/parse-ext/src
A src/plugin/parse-ext/src/test
A src/plugin/parse-ext/src/test/org
A src/plugin/parse-ext/src/test/org/apache
A src/plugin/parse-ext/src/test/org/apache/nutch
A src/plugin/parse-ext/src/test/org/apache/nutch/parse
A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext
A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java
A src/plugin/parse-ext/src/java
A src/plugin/parse-ext/src/java/org
A src/plugin/parse-ext/src/java/org/apache
A src/plugin/parse-ext/src/java/org/apache/nutch
A src/plugin/parse-ext/src/java/org/apache/nutch/parse
A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext
A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java
A src/plugin/parse-ext/plugin.xml
A src/plugin/parse-ext/build.xml
A src/plugin/parse-ext/command
A src/plugin/urlnormalizer-pass
A src/plugin/urlnormalizer-pass/ivy.xml
A src/plugin/urlnormalizer-pass/src
A src/plugin/urlnormalizer-pass/src/test
A src/plugin/urlnormalizer-pass/src/test/org
A src/plugin/urlnormalizer-pass/src/test/org/apache
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass
AU src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass/TestPassURLNormalizer.java
A src/plugin/urlnormalizer-pass/src/java
A src/plugin/urlnormalizer-pass/src/java/org
A src/plugin/urlnormalizer-pass/src/java/org/apache
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass
AU src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass/PassURLNormalizer.java
AU src/plugin/urlnormalizer-pass/plugin.xml
AU src/plugin/urlnormalizer-pass/build.xml
A src/plugin/parse-html
A src/plugin/parse-html/ivy.xml
A src/plugin/parse-html/lib
A src/plugin/parse-html/lib/tagsoup.LICENSE.txt
A src/plugin/parse-html/src
A src/plugin/parse-html/src/test
A src/plugin/parse-html/src/test/org
A src/plugin/parse-html/src/test/org/apache
A src/plugin/parse-html/src/test/org/apache/nutch
A src/plugin/parse-html/src/test/org/apache/nutch/parse
A
[jira] Commented: (NUTCH-946) cache.jsp does not recognize encoding conversion from content different to UTF-8
[ https://issues.apache.org/jira/browse/NUTCH-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003281#comment-13003281 ]

Nikos Mastropavlos commented on NUTCH-946:
--

Having tried this on some Greek websites encoded in Windows-1253, the correct meta name seems to be Content-Encoding rather than CharEncodingForConversion. So, using the patch described above and adding

    if (encoding==null) encoding = (String) parseMetaData.get("Content-Encoding");

right after the CharEncodingForConversion lookup seemed to do the trick for me.

cache.jsp does not recognize encoding conversion from content different to UTF-8

Key: NUTCH-946
URL: https://issues.apache.org/jira/browse/NUTCH-946
Project: Nutch
Issue Type: Bug
Components: web gui
Affects Versions: 1.2
Environment:
  Server version: Apache Tomcat/6.0.29
  Server built: July 19 2010 1458
  Server number: 6.0.0.29
  OS Name: Linux
  OS Version: 2.6.18-128.7.1.el5
  Architecture: i386
  JVM Version: 1.6.0_22-b04
  JVM Vendor: Sun Microsystems Inc.
Reporter: Enrique Berlanga
Priority: Minor
Attachments: cache-946.patch

The cache view does not recognize the encoding conversion needed to properly display page content stored in a segment. The problem is that it searches for the CharEncodingForConversion meta in the content metadata, but it is stored in the parse metadata.
Here is the patch I've generated for the fixed version:

### Eclipse Workspace Patch 1.0
#P branch-1.2
Index: src/web/jsp/cached.jsp
===
--- src/web/jsp/cached.jsp	(revision 1027060)
+++ src/web/jsp/cached.jsp	(working copy)
@@ -39,17 +39,18 @@
     ResourceBundle.getBundle("org.nutch.jsp.cached", request.getLocale())
     .getLocale().getLanguage();
 
-  Metadata metaData = bean.getParseData(details).getContentMeta();
+  Metadata contentMetaData = bean.getParseData(details).getContentMeta();
+  Metadata parseMetaData = bean.getParseData(details).getParseMeta();
 
   String content = null;
-  String contentType = (String) metaData.get(Metadata.CONTENT_TYPE);
+  String contentType = (String) contentMetaData.get(Metadata.CONTENT_TYPE);
   if (contentType.startsWith("text/html")) {
     // FIXME : it's better to emit the original 'byte' sequence
     // with 'charset' set to the value of 'CharEncoding',
     // but I don't know how to emit 'byte sequence' in JSP.
     // out.getOutputStream().write(bean.getContent(details)) may work,
     // but I'm not sure.
-    String encoding = (String) metaData.get("CharEncodingForConversion");
+    String encoding = (String) parseMetaData.get("CharEncodingForConversion");
     if (encoding != null) {
       try {
         content = new String(bean.getContent(details), encoding);

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
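
For readers who want to try the lookup order outside of Tomcat, the fallback suggested in the comment (CharEncodingForConversion first, then Content-Encoding, both read from the parse metadata) might be sketched in isolation roughly as below. This is only an illustration under stated assumptions: the class CachedEncodingSketch and the methods resolveEncoding and decode are hypothetical names, a plain Map stands in for the Nutch Metadata object, and the UTF-8 fallback is an assumption of this sketch rather than what cached.jsp does.

```java
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical, self-contained sketch; not part of Nutch or of cache-946.patch.
public class CachedEncodingSketch {

    // A plain Map stands in for the parse metadata (assumption for illustration).
    // Looks up CharEncodingForConversion first, then falls back to Content-Encoding.
    static String resolveEncoding(Map<String, String> parseMeta) {
        String encoding = parseMeta.get("CharEncodingForConversion");
        if (encoding == null) {
            encoding = parseMeta.get("Content-Encoding");
        }
        return encoding;
    }

    // Decodes the raw page bytes with the resolved encoding.
    // Falling back to UTF-8 is a choice made for this sketch only.
    static String decode(byte[] raw, Map<String, String> parseMeta) {
        String encoding = resolveEncoding(parseMeta);
        if (encoding != null) {
            try {
                return new String(raw, encoding);
            } catch (UnsupportedEncodingException e) {
                // Unknown charset name: fall through to the default below.
            }
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Only Content-Encoding is present, as reported for the Greek sites.
        Map<String, String> parseMeta = new HashMap<>();
        parseMeta.put("Content-Encoding", "windows-1253");

        // Greek sample text, written with escapes to avoid source-encoding issues.
        String sample = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1";
        byte[] raw = sample.getBytes(Charset.forName("windows-1253"));

        System.out.println(sample.equals(decode(raw, parseMeta)));
    }
}
```

Keeping the lookup in one small method makes the precedence explicit: CharEncodingForConversion still wins when it is present, so pages that already rendered correctly with the patch should be unaffected by the extra Content-Encoding check.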