Because those pages do not actually exist. However, we have all the metadata required to build an index.
This is solved by generating the *required outlinks* for the xml (hub) page.

On Mon, Apr 8, 2013 at 8:45 PM, feng lu <[email protected]> wrote:
> Hi Sourajit
>
> Why do you want to index unfetched webpages? The index processing will
> fail if these pages do not have some fields that are needed by the
> indexer, such as digest.
>
>
> On Mon, Apr 8, 2013 at 7:15 PM, Sourajit Basak <[email protected]> wrote:
>
> > We have a use case where we are generating multiple parse outputs per
> > url. In short, the url hosts a custom xml file which is being parsed to
> > generate several records.
> >
> > However, in reality the discovered or generated urls are not actually
> > fetched. According to NUTCH-514, anything which isn't fetched will be
> > skipped during indexing.
> >
> > We need to override this behavior. Any ideas how it can be accomplished?
>
>
> --
> Don't Grow Old, Grow Up... :-)
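For anyone hitting the same problem later: below is a rough sketch of what "generating the required outlinks" could look like as a custom Nutch 1.x Parser plugin. It is only an illustration, not the original poster's actual code. XmlHubParser, the package name, and extractRecordUrls() are hypothetical placeholders, and the usual plugin wiring (plugin.xml, adding the plugin to plugin.includes, mapping it to the XML content type in parse-plugins.xml) is omitted.

package org.example.nutch.parse; // hypothetical package

import java.net.MalformedURLException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.nutch.metadata.Metadata;
import org.apache.nutch.parse.Outlink;
import org.apache.nutch.parse.ParseData;
import org.apache.nutch.parse.ParseImpl;
import org.apache.nutch.parse.ParseResult;
import org.apache.nutch.parse.ParseStatus;
import org.apache.nutch.parse.Parser;
import org.apache.nutch.protocol.Content;

/**
 * Hypothetical parser for the custom XML "hub" page. Every record found in
 * the XML is turned into an Outlink, so the generated URLs enter the CrawlDb
 * on the next updatedb, get fetched, and are no longer skipped at index time.
 */
public class XmlHubParser implements Parser {

  private Configuration conf;

  @Override
  public ParseResult getParse(Content content) {
    // extractRecordUrls() stands in for whatever XML parsing already
    // produces one record per entry in the hub file.
    List<String> recordUrls = extractRecordUrls(content.getContent());

    List<Outlink> outlinks = new ArrayList<Outlink>();
    for (String url : recordUrls) {
      try {
        outlinks.add(new Outlink(url, "")); // empty anchor text
      } catch (MalformedURLException e) {
        // skip records whose URL cannot be parsed
      }
    }

    ParseStatus status = new ParseStatus(ParseStatus.SUCCESS);
    ParseData data = new ParseData(status, "xml hub",
        outlinks.toArray(new Outlink[outlinks.size()]),
        content.getMetadata(), new Metadata());

    return ParseResult.createParseResult(content.getUrl(),
        new ParseImpl("", data));
  }

  // Placeholder: parse the custom XML and return the per-record URLs.
  private List<String> extractRecordUrls(byte[] xml) {
    return new ArrayList<String>();
  }

  @Override
  public void setConf(Configuration conf) { this.conf = conf; }

  @Override
  public Configuration getConf() { return conf; }
}

Once these outlinks flow through the normal updatedb / generate / fetch cycle, the per-record URLs are fetched pages in their own right and are indexed like any other page, which sidesteps the NUTCH-514 behavior of dropping unfetched entries.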

