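There is a problem with the indexer too: its status switch doesn't handle the
new redirect CrawlDatum statuses (STATUS_DB_REDIR_TEMP, STATUS_DB_REDIR_PERM,
STATUS_FETCH_REDIR_TEMP, STATUS_FETCH_REDIR_PERM), so datums carrying them are
never assigned to dbDatum or fetchDatum. Patch attached.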
Index: src/java/org/apache/nutch/indexer/Indexer.java
===================================================================
--- src/java/org/apache/nutch/indexer/Indexer.java (revision 490954)
+++ src/java/org/apache/nutch/indexer/Indexer.java (working copy)
@@ -194,11 +194,15 @@
case CrawlDatum.STATUS_DB_UNFETCHED:
case CrawlDatum.STATUS_DB_FETCHED:
case CrawlDatum.STATUS_DB_GONE:
+ case CrawlDatum.STATUS_DB_REDIR_TEMP:
+ case CrawlDatum.STATUS_DB_REDIR_PERM:
dbDatum = datum;
break;
case CrawlDatum.STATUS_FETCH_SUCCESS:
case CrawlDatum.STATUS_FETCH_RETRY:
case CrawlDatum.STATUS_FETCH_GONE:
+ case CrawlDatum.STATUS_FETCH_REDIR_TEMP:
+ case CrawlDatum.STATUS_FETCH_REDIR_PERM:
fetchDatum = datum;
break;
default:
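
To illustrate what the patch changes, here is a small standalone sketch of the same classification logic. It is not Nutch code: the Status enum below is a hypothetical stand-in for the CrawlDatum byte constants (only the names mirror the patch), UNLISTED is a made-up placeholder for any status neither branch lists, and the Slot enum simply labels which variable (dbDatum or fetchDatum) a datum would be stored in. Without the four added cases, the redirect statuses would drop into the default branch like UNLISTED does.

```java
import java.util.List;

public class DatumClassifier {

    // Hypothetical stand-in for the CrawlDatum status constants; the names
    // mirror the patch, but these are enum values, not Nutch's byte codes.
    enum Status {
        DB_UNFETCHED, DB_FETCHED, DB_GONE, DB_REDIR_TEMP, DB_REDIR_PERM,
        FETCH_SUCCESS, FETCH_RETRY, FETCH_GONE, FETCH_REDIR_TEMP, FETCH_REDIR_PERM,
        UNLISTED // hypothetical placeholder for a status neither branch lists
    }

    // Which variable the indexer would fill: dbDatum, fetchDatum, or neither.
    enum Slot { DB, FETCH, OTHER }

    static Slot classify(Status s) {
        switch (s) {
            case DB_UNFETCHED:
            case DB_FETCHED:
            case DB_GONE:
            case DB_REDIR_TEMP:    // added by the patch
            case DB_REDIR_PERM:    // added by the patch
                return Slot.DB;
            case FETCH_SUCCESS:
            case FETCH_RETRY:
            case FETCH_GONE:
            case FETCH_REDIR_TEMP: // added by the patch
            case FETCH_REDIR_PERM: // added by the patch
                return Slot.FETCH;
            default:
                // Anything not listed above ends up here.
                return Slot.OTHER;
        }
    }

    public static void main(String[] args) {
        for (Status s : List.of(Status.DB_REDIR_PERM, Status.FETCH_REDIR_TEMP, Status.UNLISTED)) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Running main prints DB_REDIR_PERM -> DB, FETCH_REDIR_TEMP -> FETCH and UNLISTED -> OTHER, one per line.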
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers