Make sure you add the rule -. at the end of your regex URL filter file, so
that anything not matched by an earlier rule is disallowed.
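For illustration, here is a minimal Python sketch of how a list of +/- regex rules like the filter file quoted below can be evaluated. It assumes that the first rule whose pattern occurs anywhere in the URL decides the outcome, which is how Nutch's regex URL filter rules are usually described. What happens to a URL that matches no rule is left as an explicit `default` argument rather than assumed. `parse_rules` and `accepts` are hypothetical names, not Nutch APIs.

```python
import re

# Hypothetical sketch of a first-match-wins regex URL filter (not Nutch code).
# Each rule line is '+' (include) or '-' (exclude) followed by a regex.


def parse_rules(lines):
    """Turn filter-file lines into (include, compiled_pattern) pairs."""
    rules = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def accepts(rules, url, *, default):
    """Return True if the URL is included, False if it is excluded.

    The first rule whose pattern is found anywhere in the URL wins.
    `default` is an assumed fallback for URLs that match no rule.
    """
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return default


if __name__ == "__main__":
    rules = parse_rules([
        "-^(http|ftp|mailto):",
        "+^file:/E:/Index Samples/",
        "-^file:/E:/Index Samples/Index/",
        "-.",  # catch-all: reject anything not matched above
    ])
    for url in ["file:/E:/Index Samples/Index/a.txt",
                "file:/E:/Other/a.txt",
                "ftp://host/doc.txt"]:
        print(url, accepts(rules, url, default=True))
```

With a trailing -. rule the result no longer depends on `default`: any URL that reaches the end of the list is rejected. Under the same first-match reading, the +^file:/E:/Index Samples/ line accepts URLs in the Index subfolder before the -^file:/E:/Index Samples/Index/ line is consulted, so the exclusion would have to come first; the first line also rejects every ftp: URL.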

On Mon, 2006-02-06 at 09:03 +0530, Saravanaraj Duraisamy wrote:
> Hi, I am using Nutch to index files on the local FS and over FTP.
> 
> My filter file is:
> 
> -^(http|ftp|mailto):
> -\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|mpg|gz|rpm|tgz|mov|MOV|exe|png|PNG|jar)$
> [EMAIL PROTECTED]
> -.*(/.+?)/.*?\1/.*?\1/
> +^file:/E:/Index Samples/
> -^file:/E:/Index Samples/Index/
> 
> But Nutch crawls the forbidden folders as well. Is there a webdb-like
> structure for files too? Is it possible to make Nutch index files based
> on their last-modified date?
> 
> Can anybody suggest a data structure for a webdb (filedb?) for files? It
> would be good to group files and create separate segments for each group,
> so that if some files change, only those segments need to be replaced.
> 
> Rgds,
> D.Saravanaraj
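On the question of a webdb-like structure for files: one way to sketch the grouping idea from the message above is a table from file path to last-modified time, with files grouped (here, by assumption, one group per parent directory) so that only groups containing new, changed, or deleted files need their segment rebuilt. `FileDB`, `changed_groups`, `commit` and `scan` are hypothetical names, not part of Nutch, and the sketch does not touch Nutch's own segment format.

```python
import os
from dataclasses import dataclass, field

# Hypothetical "filedb" sketch (not Nutch code): remembers each file's
# last-modified time and reports which groups of files changed since the
# last commit, so only the segments for those groups need rebuilding.


@dataclass
class FileDB:
    # path -> last-modified time (seconds since the epoch) at last commit
    mtimes: dict = field(default_factory=dict)

    @staticmethod
    def group_of(path):
        # Assumption: one group (and one segment) per parent directory.
        return os.path.dirname(path)

    def changed_groups(self, current):
        """Return the groups whose files were added, modified, or deleted.

        `current` maps path -> mtime as observed now (e.g. from scan()).
        """
        changed = set()
        for path, mtime in current.items():
            if self.mtimes.get(path) != mtime:
                changed.add(self.group_of(path))  # new or modified file
        for path in self.mtimes.keys() - current.keys():
            changed.add(self.group_of(path))  # deleted file
        return changed

    def commit(self, current):
        # Record the state that the rebuilt segments now reflect.
        self.mtimes = dict(current)


def scan(root):
    """Walk a local directory tree and collect path -> mtime."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            result[path] = os.path.getmtime(path)
    return result
```

Grouping by directory is only one choice; any function that maps a path to a stable group key would work the same way, and the trade-off is between many small segments (cheap to replace) and few large ones.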

