Please also see https://issues.apache.org/jira/browse/NUTCH-1484
Sebastian resolved this one and, AFAIK, fixed the problem.

On Thu, Mar 28, 2013 at 6:09 AM, Bai Shen <[email protected]> wrote:

> Finally found it in JIRA.
>
> https://issues.apache.org/jira/browse/NUTCH-1483
>
> I'll give the patch a try and see if that fixes my issue.
>
> On Wed, Mar 27, 2013 at 4:29 PM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
> > Nutch version please?
> > Sebastian and others worked on this a while ago.
> > I don't know about the progress on it. There are most certainly
> > open/resolved tickets for it on Jira; please look there.
> > Thank you
> > Lewis
> >
> > On Wed, Mar 27, 2013 at 12:26 PM, Bai Shen <[email protected]>
> > wrote:
> >
> > > I'm trying to crawl a local file system. I've made the changes to not
> > > ignore file urls and added protocol-file to the plugins list. I've
> > > included file:///data/mydir in my url file.
> > >
> > > However, when I run the fetch, Nutch tries to connect to
> > > file://data/mydir
> > > and therefore returns a 404 error. I think the root slash is being
> > > stripped during the injection, but I can't seem to find out why.
> > >
> > > Anybody have any suggestions or ideas?
> > >
> > > Thanks.
> >
> > --
> > *Lewis*
>
--
*Lewis*
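
For anyone reading this thread later: the difference between the two URLs quoted above can be seen with a generic URL parser. The following is a minimal Python sketch, not Nutch code, and the helper name split_file_url is made up for illustration. It only shows why file://data/mydir is not equivalent to file:///data/mydir: with two slashes, "data" is parsed as the host (authority) instead of the first path segment.

```python
from urllib.parse import urlsplit


def split_file_url(url):
    """Return (authority, path) for a URL.

    Hypothetical helper for illustration only; it is not part of Nutch.
    """
    parts = urlsplit(url)
    return parts.netloc, parts.path


if __name__ == "__main__":
    # Three slashes: empty authority, absolute path on the local machine.
    print(split_file_url("file:///data/mydir"))
    # Two slashes: "data" becomes the authority and the path loses a segment.
    print(split_file_url("file://data/mydir"))
```

If the injected URL ends up in the second form, the fetcher is effectively asked for /mydir on a host called "data", which would be consistent with the error described in the original message.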

