Hi Andrzej,

Does this mean that when you inject a URL that already exists in the crawldb, its status changes to STATUS_DB_UNFETCHED?
Gal

-----Original Message-----
From: Andrzej Bialecki [mailto:[EMAIL PROTECTED]
Sent: Thursday, February 15, 2007 8:47 AM
To: nutch-dev@lucene.apache.org
Subject: Re: Injector checking for other than STATUS_INJECTED

[EMAIL PROTECTED] wrote:
> Hi All,
>
> I think I am missing something. In the Injector reduce code we have the
> following.
>
> ------------------------------------------------------------------------
> while (values.hasNext()) {
>   CrawlDatum val = (CrawlDatum)values.next();
>   if (val.getStatus() == CrawlDatum.STATUS_INJECTED) {
>     injected = val;
>     injected.setStatus(CrawlDatum.STATUS_DB_UNFETCHED);
>   } else {
>     old = val;
>   }
> }
>
> CrawlDatum res = null;
> if (old != null) res = old; // don't overwrite existing value
> else res = injected;
> ------------------------------------------------------------------------
>
> Basically if it is not just injected then don't overwrite. But I am not
> seeing where the input could be such that the CrawlDatum wasn't just
> injected and could have previous values. Is this just in case someone
> uses the Injector as a Reducer and not a Mapper or am I missing how this
> condition can occur.
>

This handles an important case, when you inject URLs that already exist in the DB - then you have both the old value and the newly created value under the same key. In previous versions of Injector CrawlDatum-s for such URLs could be overwritten with new values, and you could lose valuable metadata accumulated in old values.

-- 
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
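
For illustration, the merge rule in the quoted reduce snippet can be tried outside Nutch with a small stand-in. This is only a sketch: the Datum class, its fetchTime field, the numeric status values, and the merge() helper are hypothetical and are not Nutch's CrawlDatum API. Going by the quoted code alone, a URL that already has an entry keeps that entry, status included, and only a URL with no prior entry ends up as STATUS_DB_UNFETCHED.

```java
import java.util.Arrays;
import java.util.Iterator;

class InjectMergeSketch {
    // Hypothetical status codes; the real CrawlDatum constants may use other values.
    static final int STATUS_DB_UNFETCHED = 1;
    static final int STATUS_DB_FETCHED = 2;
    static final int STATUS_INJECTED = 3;

    /** Minimal stand-in for CrawlDatum: a status plus one piece of accumulated metadata. */
    static final class Datum {
        int status;
        final long fetchTime;

        Datum(int status, long fetchTime) {
            this.status = status;
            this.fetchTime = fetchTime;
        }
    }

    /** Mirrors the quoted loop: all values passed in share the same URL key. */
    static Datum merge(Iterator<Datum> values) {
        Datum injected = null;
        Datum old = null;
        while (values.hasNext()) {
            Datum val = values.next();
            if (val.status == STATUS_INJECTED) {
                injected = val;
                injected.status = STATUS_DB_UNFETCHED; // a new URL starts out unfetched
            } else {
                old = val; // entry that was already in the DB
            }
        }
        // Don't overwrite an existing value; fall back to the injected one.
        return (old != null) ? old : injected;
    }

    public static void main(String[] args) {
        // URL already in the DB and injected again: the old entry is kept.
        Datum existing = new Datum(STATUS_DB_FETCHED, 1171500000000L);
        Datum kept = merge(Arrays.asList(existing, new Datum(STATUS_INJECTED, 0L)).iterator());
        System.out.println(kept == existing);

        // URL seen for the first time: the injected entry becomes unfetched.
        Datum fresh = merge(Arrays.asList(new Datum(STATUS_INJECTED, 0L)).iterator());
        System.out.println(fresh.status == STATUS_DB_UNFETCHED);
    }
}
```

Both println calls print true, which matches the "don't overwrite existing value" comment in the quoted code.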