Finally found it in JIRA. https://issues.apache.org/jira/browse/NUTCH-1483
I'll give the patch a try and see if that fixes my issue.

On Wed, Mar 27, 2013 at 4:29 PM, Lewis John Mcgibbney <[email protected]> wrote:

> Nutch version please?
> Sebastian and others worked on this a while ago.
> I don't know about the progress on it. There are most certainly
> open/resolved tickets for it on Jira; please look there.
> Thank you
> Lewis
>
> On Wed, Mar 27, 2013 at 12:26 PM, Bai Shen <[email protected]>
> wrote:
>
> > I'm trying to crawl a local file system. I've made the changes to not
> > ignore file URLs and added protocol-file to the plugins list. I've
> > included file:///data/mydir in my URL file.
> >
> > However, when I run the fetch, Nutch tries to connect to file://data/mydir
> > and therefore returns a 404 error. I think the root slash is being
> > stripped during the injection, but I can't seem to find out why.
> >
> > Anybody have any suggestions or ideas?
> >
> > Thanks.
> >
>
> --
> *Lewis*
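
P.S. For anyone who finds this thread later, here is a rough sketch of why the missing slash matters. It uses plain java.net.URI only and is not Nutch code; the class name FileUrlCheck and its helper methods are made up for illustration. With two slashes, "data" is parsed as the host rather than as the first directory of the path, which would explain why the fetch cannot find the directory.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not part of Nutch: shows how a generic URI parser
// splits the two forms of the file URL mentioned in this thread.
public class FileUrlCheck {
    // Returns the host component, or null when the URL has no authority part.
    static String hostOf(String url) throws URISyntaxException {
        return new URI(url).getHost();
    }

    // Returns the path component of the URL.
    static String pathOf(String url) throws URISyntaxException {
        return new URI(url).getPath();
    }

    public static void main(String[] args) throws URISyntaxException {
        String[] urls = {"file:///data/mydir", "file://data/mydir"};
        for (String url : urls) {
            System.out.println(url + " -> host=" + hostOf(url) + ", path=" + pathOf(url));
        }
    }
}
```

With three slashes the host is empty and the path is /data/mydir; with two slashes the host becomes "data" and the path shrinks to /mydir.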

