Sorry, I'm using Nutch 2.1. I did a general web search and didn't find any instances of the problem. I found a couple of tutorials using the file:///data/mydir format with no mention of any issues.
The problem is that the normalizers (not sure which one) strip out that leading /, which changes the URL from absolute to relative. I turned off the normalizers, but now I'm getting an index out of bounds exception from unreverseUrl. I haven't dug through the code yet, but I'm betting that it doesn't like the slash, since that's not something that would show up in an http URL.

On Wed, Mar 27, 2013 at 4:29 PM, Lewis John Mcgibbney <[email protected]> wrote:

> Nutch version please?
> Sebastian and others worked on this a while ago.
> I don't know about the progress on it. There are most certainly
> open/resolved tickets for it on Jira; please look there.
> Thank you
> Lewis
>
> On Wed, Mar 27, 2013 at 12:26 PM, Bai Shen <[email protected]>
> wrote:
>
> > I'm trying to crawl a local file system. I've made the changes to not
> > ignore file urls and added protocol-file to the plugins list. I've
> > included file:///data/mydir in my url file.
> >
> > However, when I run the fetch, Nutch tries to connect to file://data/mydir
> > and therefore returns a 404 error. I think the root slash is being
> > stripped during the injection, but I can't seem to find out why.
> >
> > Anybody have any suggestions or ideas?
> >
> > Thanks.
>
> --
> *Lewis*
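
For anyone following along, here is a minimal sketch of why the missing slash matters. It uses plain java.net.URI and is not Nutch code; the class name FileUrlSlashDemo and the helpers hostOf and pathOf are made up for illustration. It only shows how the two forms parse differently and says nothing about which normalizer does the stripping.

```java
import java.net.URI;

public class FileUrlSlashDemo {
    // Hypothetical helper: the host that java.net.URI parses out of the URL,
    // or null when the authority part is empty.
    static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    // Hypothetical helper: the path component of the URL.
    static String pathOf(String url) {
        return URI.create(url).getPath();
    }

    public static void main(String[] args) {
        String[] urls = {"file:///data/mydir", "file://data/mydir"};
        for (String url : urls) {
            // With three slashes the authority is empty and the path is /data/mydir.
            // With two slashes "data" is parsed as the host and the path is /mydir.
            System.out.println(url + " -> host=" + hostOf(url) + ", path=" + pathOf(url));
        }
    }
}
```

With two slashes, "data" is read as a host name rather than the first directory, so the fetcher ends up asking for /mydir on a machine called "data", which would fit the 404 I'm seeing.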

