Build failed in Jenkins: Nutch-trunk #1418

2011-03-06 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Nutch-trunk/1418/

--
[...truncated 1009 lines...]
A src/plugin/subcollection/src/java/org/apache/nutch/collection
A src/plugin/subcollection/src/java/org/apache/nutch/collection/Subcollection.java
A src/plugin/subcollection/src/java/org/apache/nutch/collection/CollectionManager.java
A src/plugin/subcollection/src/java/org/apache/nutch/collection/package.html
A src/plugin/subcollection/src/java/org/apache/nutch/indexer
A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection
A src/plugin/subcollection/src/java/org/apache/nutch/indexer/subcollection/SubcollectionIndexingFilter.java
A src/plugin/subcollection/README.txt
A src/plugin/subcollection/plugin.xml
A src/plugin/subcollection/build.xml
A src/plugin/index-more
A src/plugin/index-more/ivy.xml
A src/plugin/index-more/src
A src/plugin/index-more/src/test
A src/plugin/index-more/src/test/org
A src/plugin/index-more/src/test/org/apache
A src/plugin/index-more/src/test/org/apache/nutch
A src/plugin/index-more/src/test/org/apache/nutch/indexer
A src/plugin/index-more/src/test/org/apache/nutch/indexer/more
A src/plugin/index-more/src/test/org/apache/nutch/indexer/more/TestMoreIndexingFilter.java
A src/plugin/index-more/src/java
A src/plugin/index-more/src/java/org
A src/plugin/index-more/src/java/org/apache
A src/plugin/index-more/src/java/org/apache/nutch
A src/plugin/index-more/src/java/org/apache/nutch/indexer
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
A src/plugin/index-more/src/java/org/apache/nutch/indexer/more/package.html
A src/plugin/index-more/plugin.xml
A src/plugin/index-more/build.xml
AU src/plugin/plugin.dtd
A src/plugin/parse-ext
A src/plugin/parse-ext/ivy.xml
A src/plugin/parse-ext/src
A src/plugin/parse-ext/src/test
A src/plugin/parse-ext/src/test/org
A src/plugin/parse-ext/src/test/org/apache
A src/plugin/parse-ext/src/test/org/apache/nutch
A src/plugin/parse-ext/src/test/org/apache/nutch/parse
A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext
A src/plugin/parse-ext/src/test/org/apache/nutch/parse/ext/TestExtParser.java
A src/plugin/parse-ext/src/java
A src/plugin/parse-ext/src/java/org
A src/plugin/parse-ext/src/java/org/apache
A src/plugin/parse-ext/src/java/org/apache/nutch
A src/plugin/parse-ext/src/java/org/apache/nutch/parse
A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext
A src/plugin/parse-ext/src/java/org/apache/nutch/parse/ext/ExtParser.java
A src/plugin/parse-ext/plugin.xml
A src/plugin/parse-ext/build.xml
A src/plugin/parse-ext/command
A src/plugin/urlnormalizer-pass
A src/plugin/urlnormalizer-pass/ivy.xml
A src/plugin/urlnormalizer-pass/src
A src/plugin/urlnormalizer-pass/src/test
A src/plugin/urlnormalizer-pass/src/test/org
A src/plugin/urlnormalizer-pass/src/test/org/apache
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer
A src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass
AU src/plugin/urlnormalizer-pass/src/test/org/apache/nutch/net/urlnormalizer/pass/TestPassURLNormalizer.java
A src/plugin/urlnormalizer-pass/src/java
A src/plugin/urlnormalizer-pass/src/java/org
A src/plugin/urlnormalizer-pass/src/java/org/apache
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer
A src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass
AU src/plugin/urlnormalizer-pass/src/java/org/apache/nutch/net/urlnormalizer/pass/PassURLNormalizer.java
AU src/plugin/urlnormalizer-pass/plugin.xml
AU src/plugin/urlnormalizer-pass/build.xml
A src/plugin/parse-html
A src/plugin/parse-html/ivy.xml
A src/plugin/parse-html/lib
A src/plugin/parse-html/lib/tagsoup.LICENSE.txt
A src/plugin/parse-html/src
A src/plugin/parse-html/src/test
A src/plugin/parse-html/src/test/org
A src/plugin/parse-html/src/test/org/apache
A src/plugin/parse-html/src/test/org/apache/nutch
A src/plugin/parse-html/src/test/org/apache/nutch/parse
A 

[jira] Commented: (NUTCH-946) cache.jsp does not recognize encoding conversion from content different to UTF-8

2011-03-06 Thread Nikos Mastropavlos (JIRA)

[ https://issues.apache.org/jira/browse/NUTCH-946?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13003281#comment-13003281 ]

Nikos Mastropavlos commented on NUTCH-946:
--

Having tried this on some Greek websites encoded in Windows-1253, the
correct meta name seems to be Content-Encoding rather than
CharEncodingForConversion. So, using the patch described above and adding
if (encoding == null) encoding = (String) parseMetaData.get("Content-Encoding");
right after the CharEncodingForConversion lookup seemed to do the trick for me.
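
For illustration, the lookup order described above could be sketched in plain Java roughly as follows. This is only a sketch under assumptions: a plain Map stands in for Nutch's parse Metadata, the class and method names (CachedEncodingFallback, pickEncoding, decode) are hypothetical, and the UTF-8 default used when no usable charset is found is my own choice, not something taken from cached.jsp.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-alone sketch; not part of the Nutch code base.
public class CachedEncodingFallback {

    // Returns the charset name to use, or null if neither key is present.
    // CharEncodingForConversion is tried first, then Content-Encoding.
    public static String pickEncoding(Map<String, String> parseMeta) {
        String encoding = parseMeta.get("CharEncodingForConversion");
        if (encoding == null) {
            encoding = parseMeta.get("Content-Encoding");
        }
        return encoding;
    }

    // Decodes the raw page bytes with the chosen charset.
    // Falls back to UTF-8 (an assumption) when no usable charset is found.
    public static String decode(byte[] raw, Map<String, String> parseMeta) {
        String encoding = pickEncoding(parseMeta);
        if (encoding != null) {
            try {
                if (Charset.isSupported(encoding)) {
                    return new String(raw, Charset.forName(encoding));
                }
            } catch (IllegalCharsetNameException e) {
                // Malformed charset name: fall through to the default below.
            }
        }
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Only Content-Encoding is set, as on the pages mentioned above.
        Map<String, String> parseMeta = new HashMap<>();
        parseMeta.put("Content-Encoding", "ISO-8859-1");
        byte[] raw = { (byte) 0xE9 }; // U+00E9 in ISO-8859-1
        System.out.println(pickEncoding(parseMeta));
        System.out.println(decode(raw, parseMeta).equals("\u00e9"));
    }
}
```

The demo uses ISO-8859-1 only because every Java runtime is required to support it; on a typical JDK the same code path should apply to windows-1253.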


 cache.jsp does not recognize encoding conversion from content different to UTF-8
 

 Key: NUTCH-946
 URL: https://issues.apache.org/jira/browse/NUTCH-946
 Project: Nutch
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.2
 Environment: Server version: Apache Tomcat/6.0.29
 Server built:   July 19 2010 1458
 Server number:  6.0.0.29
 OS Name:Linux
 OS Version: 2.6.18-128.7.1.el5
 Architecture:   i386
 JVM Version:1.6.0_22-b04
 JVM Vendor: Sun Microsystems Inc.
Reporter: Enrique Berlanga
Priority: Minor
 Attachments: cache-946.patch


 Cache view does not recognize the encoding conversion needed to properly 
 show page content stored in a segment.
 The problem is that it searches for the CharEncodingForConversion meta in 
 content metadata, but it's stored in parse metadata.
 Here is the patch I've generated for the fixed version:
 ### Eclipse Workspace Patch 1.0
 #P branch-1.2
 Index: src/web/jsp/cached.jsp
 ===
 --- src/web/jsp/cached.jsp(revision 1027060)
 +++ src/web/jsp/cached.jsp(working copy)
 @@ -39,17 +39,18 @@
    ResourceBundle.getBundle("org.nutch.jsp.cached", request.getLocale())
    .getLocale().getLanguage();
  
 -  Metadata metaData = bean.getParseData(details).getContentMeta();
 +  Metadata contentMetaData = bean.getParseData(details).getContentMeta();
 +  Metadata parseMetaData = bean.getParseData(details).getParseMeta();
  
    String content = null;
 -  String contentType = (String) metaData.get(Metadata.CONTENT_TYPE);
 +  String contentType = (String) contentMetaData.get(Metadata.CONTENT_TYPE);
    if (contentType.startsWith("text/html")) {
      // FIXME : it's better to emit the original 'byte' sequence 
      // with 'charset' set to the value of 'CharEncoding',
      // but I don't know how to emit 'byte sequence' in JSP.
      // out.getOutputStream().write(bean.getContent(details)) may work, 
      // but I'm not sure.
 -    String encoding = (String) metaData.get("CharEncodingForConversion"); 
 +    String encoding = (String) parseMetaData.get("CharEncodingForConversion"); 
      if (encoding != null) {
        try {
          content = new String(bean.getContent(details), encoding);

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira