Does Nutch 1.4 include tika-app? Also, maybe I am not using the DocumentFragment object properly? Below is a summarized version of my code:
public ParseResult filter(Content content, ParseResult parseResult, HTMLMetaTags metaTags, DocumentFragment doc) {
  // Note: getChildNodes() returns only the direct children of the fragment, not all descendants.
  for (int x = 0; x < doc.getChildNodes().getLength(); x++) {
    System.out.println("xml node name: " + doc.getChildNodes().item(x).getNodeName());
    System.out.println("xml node value: " + doc.getChildNodes().item(x).getNodeValue());
    System.out.println("xml text content: " + doc.getChildNodes().item(x).getTextContent());
  }
  return parseResult;
}

-- 
View this message in context: http://lucene.472066.n3.nabble.com/Cached-page-like-google-with-hits-highlighted-tp4001374p4001440.html
Sent from the Nutch - User mailing list archive at Nabble.com.
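For comparison, here is a minimal standalone sketch (plain JDK DOM, no Nutch) of walking a DocumentFragment recursively instead of looping only over its direct children. The fragment built by the hypothetical helper sampleFragment() is an assumption shaped like <html><body><p>hello</p></body></html>; what Nutch actually hands to the filter depends on the HTML parser in use. Per the DOM spec, an Element's getNodeValue() is null and getTextContent() concatenates all descendant text, so a loop over the top-level children alone may print just one "html" node.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class FragmentWalker {

  // Depth-first walk: records "depth:nodeName" for every descendant of 'node'.
  static void walk(Node node, int depth, List<String> out) {
    NodeList children = node.getChildNodes();
    for (int i = 0; i < children.getLength(); i++) {
      Node child = children.item(i);
      out.add(depth + ":" + child.getNodeName());
      walk(child, depth + 1, out); // recurse into grandchildren, etc.
    }
  }

  // Hypothetical helper (assumption, not Nutch code): builds a small test fragment.
  static DocumentFragment sampleFragment() {
    try {
      Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      DocumentFragment frag = doc.createDocumentFragment();
      Element html = doc.createElement("html");
      Element body = doc.createElement("body");
      Element p = doc.createElement("p");
      p.appendChild(doc.createTextNode("hello"));
      body.appendChild(p);
      html.appendChild(body);
      frag.appendChild(html);
      return frag;
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    DocumentFragment frag = sampleFragment();
    List<String> names = new ArrayList<>();
    walk(frag, 0, names);
    names.forEach(System.out::println);
    // The top-level child alone: name "html", value null, text content "hello".
    Node top = frag.getChildNodes().item(0);
    System.out.println(top.getNodeName() + " / " + top.getNodeValue() + " / " + top.getTextContent());
  }
}
```

On that assumed fragment the walk visits html, body, p and the #text node, whereas the original loop sees only html.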