Make sure you add "-." as the last line of your regex filter file, so that any
URL not accepted by an earlier rule is rejected.
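A minimal Python sketch of how an ordered +/- filter file is evaluated. It assumes the filter reads the rules top to bottom, lets the first rule whose regex is found in the URL decide, and rejects a URL that no rule matches. The function names and test URLs are illustrative, not Nutch's API, and the obfuscated "[EMAIL PROTECTED]" line is left out because its original text is unknown. Under these assumptions the trailing "-." guarantees that unmatched URLs are rejected whatever the filter's default is. The sketch also shows why the Index folder still gets through: "+^file:/E:/Index Samples/" matches those URLs first, so the narrower "-^file:/E:/Index Samples/Index/" rule below it never fires. Moving the exclusion above the inclusion fixes that.

```python
import re

def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (accept, pattern) pairs, in file order."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules

def accepts(rules, url):
    """Assumed first-match semantics: the first rule whose regex is found
    anywhere in the URL decides; if no rule matches, the URL is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False

# The poster's rules plus a final "-." catch-all ("[EMAIL PROTECTED]" omitted).
POSTER_RULES = parse_rules(r"""
-^(http|ftp|mailto):
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|mpg|gz|rpm|tgz|mov|MOV|exe|png|PNG|jar)$
-.*(/.+?)/.*?\1/.*?\1/
+^file:/E:/Index Samples/
-^file:/E:/Index Samples/Index/
-.
""")

# Same rules with the narrower exclusion moved above the broader inclusion.
REORDERED_RULES = parse_rules(r"""
-^(http|ftp|mailto):
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|mpg|gz|rpm|tgz|mov|MOV|exe|png|PNG|jar)$
-.*(/.+?)/.*?\1/.*?\1/
-^file:/E:/Index Samples/Index/
+^file:/E:/Index Samples/
-.
""")

if __name__ == "__main__":
    for url in ["file:/E:/Index Samples/docs/a.txt",
                "file:/E:/Index Samples/Index/b.txt",
                "file:/E:/Other/c.txt"]:
        # URL, verdict with the original order, verdict with the reordered rules
        print(url, accepts(POSTER_RULES, url), accepts(REORDERED_RULES, url))
```

Running it prints each URL followed by its verdict under the original order and under the reordered rules; only the file under Index/ changes from True to False.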

On Mon, 2006-02-06 at 09:03 +0530, Saravanaraj Duraisamy wrote:
> Hi, I am using Nutch to index files on the local file system and over FTP.
> 
> My filter file is:
> 
> -^(http|ftp|mailto):
> -\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|mpg|gz|rpm|tgz|mov|MOV|exe|png|PNG|jar)$
> [EMAIL PROTECTED]
> -.*(/.+?)/.*?\1/.*?\1/
> +^file:/E:/Index Samples/
> -^file:/E:/Index Samples/Index/
> 
> But Nutch crawls the forbidden folders too. Is there a WebDB-like structure
> for files as well? Is it possible to make Nutch index files based on their
> last-modified date?
> 
> Can anybody suggest a data structure for the WebDB (a "FileDB"?) for files? It
> would be good to group files and create separate segments for each group, so
> that if some files change, only those segments need to be replaced.
> 
> Rgds,
> D.Saravanaraj
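The grouping idea in the question, rebuilding only the segments of groups whose files changed, could start from a per-file record of last-modified times. The sketch below is hypothetical and not part of Nutch: it assumes one group per first-level directory under the crawl root, and the names scan, group_of and changed_groups are illustrative. It compares two snapshots and reports which groups have an added, deleted, or modified file; mapping each group to its own segment is left to the caller.

```python
import os

def scan(root):
    """Return {path: last_modified_seconds} for every file under root."""
    snapshot = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            snapshot[path] = os.path.getmtime(path)
    return snapshot

def group_of(path, root):
    """Hypothetical grouping rule: one group per first-level directory
    under root; files directly in root fall into the group "."."""
    parts = os.path.relpath(path, root).split(os.sep)
    return parts[0] if len(parts) > 1 else "."

def changed_groups(old, new, root):
    """Groups containing a file that was added, deleted, or whose
    last-modified time differs between the two snapshots."""
    groups = set()
    for path in old.keys() | new.keys():
        if old.get(path) != new.get(path):  # missing path compares as None
            groups.add(group_of(path, root))
    return groups
```

For example, if a file under b/ was modified and c/z.txt was added since the last scan, changed_groups returns the set {"b", "c"}, and only those two groups' segments would need replacing.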

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general