Re: Root slash being stripped from file path

Bai Shen Thu, 28 Mar 2013 06:04:55 -0700

Sorry.  I'm using 2.1.  I did a general web search and didn't find any
instances of the problem.  I found a couple tutorials using the
file:///data/mydir format with no mention of any issues.

The problem is that one of the normalizers (not sure which one) strips out
that leading /, which changes the URL from absolute to relative.  I turned
off the normalizers, but now I'm getting an index out of bounds exception
from unreverseUrl.  I haven't dug through the code yet, but I'm betting that
it doesn't like the slash, since that's not something that would show up in
an http URL.
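
To illustrate why the missing slash matters, here is a minimal sketch using
the standard java.net.URI class.  It is not taken from the Nutch code base,
and the class and method names (FileUrlCheck, hostOf, pathOf) are
hypothetical.  It only shows how the two spellings are parsed by a generic
URI parser, which may differ from what the Nutch normalizers do internally.

```java
import java.net.URI;

// Hypothetical helper for illustration only; not part of Nutch.
final class FileUrlCheck {

    // Returns the host component of the URL, or null if there is none.
    public static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    // Returns the path component of the URL.
    public static String pathOf(String url) {
        return URI.create(url).getPath();
    }

    public static void main(String[] args) {
        String[] urls = {"file:///data/mydir", "file://data/mydir"};
        for (String url : urls) {
            // Print how each spelling is split into host and path.
            System.out.println(url + " -> host=" + hostOf(url)
                    + ", path=" + pathOf(url));
        }
    }
}
```

With three slashes the authority is empty and the path is /data/mydir.  With
two slashes, "data" is parsed as the host and the path becomes /mydir, which
would explain a fetch that looks for the wrong location.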

On Wed, Mar 27, 2013 at 4:29 PM, Lewis John Mcgibbney <
[email protected]> wrote:

> Nutch version please?
> Sebastian and others worked on this a while ago.
> I don't know about the progress on it. There are most certainly
> open/resolved tickets for it on Jira; please look there.
> Thank you
> Lewis
>
> On Wed, Mar 27, 2013 at 12:26 PM, Bai Shen <[email protected]>
> wrote:
>
> > I'm trying to crawl a local file system.  I've made the changes to not
> > ignore file URLs and added protocol-file to the plugins list.  I've
> > included file:///data/mydir in my URL file.
> >
> > However, when I run the fetch, Nutch tries to connect to
> > file://data/mydir and therefore returns a 404 error.  I think the root
> > slash is being stripped during the injection, but I can't seem to find
> > out why.
> >
> > Anybody have any suggestions or ideas?
> >
> > Thanks.
> >
>
>
>
> --
> *Lewis*
>
