I am in trunk, trying to do the following:

    bin/nutch crawl urls -dir crawl.test

where urls contains http://spack.net/ and conf/crawl-urlfilter.txt contains +^http://spack.net/

When I run the command I get:

Exception in thread "main" java.lang.NullPointerException
        at org.apache.nutch.indexer.IndexSegment.indexPages(IndexSegment.java:148)
        at org.apache.nutch.indexer.IndexSegment.main(IndexSegment.java:262)
        at org.apache.nutch.tools.CrawlTool.main(CrawlTool.java:153)

I did a little digging and it appears that lang ends up being null (I couldn't quite track down where lang should have been set). I'm not sure it is a proper fix, but changing doc.getField("lang").stringValue() to doc.get("lang") makes my little crawl complete, as did commenting out that LOG.info call. Like I say, I'm not sure why lang is null, but if it can be null, we probably shouldn't be calling stringValue() on it. I guess it didn't like http://spack.net/

Patch follows.

Earl

Index: trunk/src/java/org/apache/nutch/indexer/IndexSegment.java
===================================================================
--- trunk/src/java/org/apache/nutch/indexer/IndexSegment.java (revision 264952)
+++ trunk/src/java/org/apache/nutch/indexer/IndexSegment.java (working copy)
@@ -146,7 +146,7 @@
         // add the document to the index
         NutchAnalyzer analyzer = AnalyzerFactory.get(doc.get("lang"));
         LOG.info(" Indexing [" + doc.getField("url").stringValue() +
-                 "] with analyzer " + analyzer + " (" + doc.getField("lang").stringValue() + ")");
+                 "] with analyzer " + analyzer + " (" + doc.get("lang") + ")");
         //LOG.info(" Doc is " + doc);
         writer.addDocument(doc, analyzer);
         if (count > 0 && count % LOG_STEP == 0) {
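To illustrate why the one-line change avoids the exception, here is a minimal, self-contained sketch. It does not use Lucene itself: the Doc and Field classes are hypothetical stand-ins written for this example, and they assume that looking up a missing field returns null from both getField(name) and get(name). Under that assumption, chaining stringValue() onto a missing "lang" field throws, while get("lang") just yields null, which string concatenation prints as "null".

```java
import java.util.HashMap;
import java.util.Map;

class LangFieldDemo {

    // Hypothetical stand-in for a stored field; not the Lucene class.
    static final class Field {
        private final String value;
        Field(String value) { this.value = value; }
        String stringValue() { return value; }
    }

    // Hypothetical stand-in for a document; assumes a missing field yields null.
    static final class Doc {
        private final Map<String, Field> fields = new HashMap<>();

        void add(String name, String value) { fields.put(name, new Field(value)); }

        // Returns the field object, or null if no such field exists.
        Field getField(String name) { return fields.get(name); }

        // Returns the field's string value, or null if no such field exists.
        String get(String name) {
            Field f = fields.get(name);
            return f == null ? null : f.stringValue();
        }
    }

    // Mirrors the original log line: dereferences the field, so a missing
    // "lang" field causes a NullPointerException.
    static String unsafeMessage(Doc doc) {
        return "Indexing [" + doc.getField("url").stringValue() + "] ("
                + doc.getField("lang").stringValue() + ")";
    }

    // Mirrors the patched log line: a missing "lang" field is concatenated
    // as the text "null" instead of throwing.
    static String safeMessage(Doc doc) {
        return "Indexing [" + doc.get("url") + "] (" + doc.get("lang") + ")";
    }

    public static void main(String[] args) {
        Doc doc = new Doc();
        doc.add("url", "http://spack.net/"); // note: no "lang" field is added

        System.out.println(safeMessage(doc));
        try {
            System.out.println(unsafeMessage(doc));
        } catch (NullPointerException e) {
            System.out.println("unsafeMessage threw NullPointerException");
        }
    }
}
```

This only covers the logging line. It says nothing about why lang was never set for this page, which is still the open question.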
_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
