Re: why does nutch interpret directory as URL

xiao yang Wed, 28 Apr 2010 21:30:17 -0700

Because it is indeed a URL.
You can either filter out this kind of URL by configuring
crawl-urlfilter.txt (a rule such as -^.*/$ may help, but I'm not sure
about the regular expression) or filter the search results (for which
you would need to develop a Nutch plugin).
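
To check the regular expression before putting it into
crawl-urlfilter.txt, here is a minimal Java sketch. It assumes that the
filter uses Java regular expressions, that the leading '-' only marks
the rule as an exclusion, and that a rule applies when the pattern is
found in the URL. The class and method names are made up for
illustration and are not part of Nutch.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch: shows which URLs the
// pattern from the "-^.*/$" rule would match.
class TrailingSlashCheck {
    // Regex part of the rule only; the leading '-' is the exclusion
    // marker in the filter file and is not part of the expression.
    static final Pattern DIRECTORY_URL = Pattern.compile("^.*/$");

    // Assumption: a rule applies when the pattern is found in the URL.
    static boolean wouldBeExcluded(String url) {
        return DIRECTORY_URL.matcher(url).find();
    }

    public static void main(String[] args) {
        // Directory URL from the original question: ends with '/'.
        System.out.println(wouldBeExcluded("file:/C:/temp/html/"));
        // A regular file under that directory (example path).
        System.out.println(wouldBeExcluded("file:/C:/temp/html/page.html"));
    }
}
```

Under these assumptions the pattern matches any URL that ends in a
slash and nothing else. If the crawl finds the files by following the
directory listing, excluding these URLs may also keep the files from
being discovered, so it is worth testing on a small crawl first.
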
Thanks!


Xiao

On Thu, Apr 29, 2010 at 4:33 AM, BK <bk4...@gmail.com> wrote:
> While indexing files on local file system, why does NUTCH interpret the
> directory as a URL - fetching file:/C:/temp/html/
> This causes the index page of this directory to show up on search results.
> Any solutions for this issue??
>
>
> Bharteesh Kulkarni
>
