Make sure you add a line containing just "-." at the end of your regex filter file, so that any URL not matched by an earlier rule is rejected.
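As far as I understand the regex URL filter, rules are tried top to bottom and the first pattern that matches a URL decides whether it is kept. If that is right, the order of the two file: rules in the quoted filter matters as much as the catch-all: the "+" rule for the parent folder matches before the "-" rule for the Index subfolder is ever reached. Below is a minimal Python sketch of that first-match-wins behaviour. It is not Nutch code; the function names and the sample URLs are made up for illustration, and only a subset of the quoted rules is used.

```python
import re

def parse_rules(lines):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in lines:
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: %r" % line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """First matching rule decides; a URL that matches nothing is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):  # partial match, like a regex 'find'
            return accept
    return False

# Rule order as in the quoted message, plus the catch-all at the end.
as_posted = parse_rules([
    "-^(http|ftp|mailto):",
    "+^file:/E:/Index Samples/",
    "-^file:/E:/Index Samples/Index/",
    "-.",
])

# Same rules with the more specific exclusion moved above the inclusion.
reordered = parse_rules([
    "-^(http|ftp|mailto):",
    "-^file:/E:/Index Samples/Index/",
    "+^file:/E:/Index Samples/",
    "-.",
])

# Hypothetical URLs, for illustration only.
forbidden = "file:/E:/Index Samples/Index/a.txt"
wanted = "file:/E:/Index Samples/a.txt"
elsewhere = "file:/E:/Other/a.txt"

for name, rules in (("as posted", as_posted), ("reordered", reordered)):
    print(name, [accepts(rules, u) for u in (forbidden, wanted, elsewhere)])
```

With the order as posted, the forbidden URL is accepted by the broader "+" rule; with the exclusion listed first it is rejected, while files directly under the parent folder are still accepted. Note also that the first rule as posted rejects ftp: URLs, which may not be what you want if FTP is supposed to be indexed.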
On Mon, 2006-02-06 at 09:03 +0530, Saravanaraj Duraisamy wrote:
> Hi i am using nutch to index files in local FS and FTP.
>
> my filter file is
>
> -^(http|ftp|mailto):
> -\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|mpg|gz|rpm|tgz|mov|MOV|exe|png|PNG|jar)$
> [EMAIL PROTECTED]
> -.*(/.+?)/.*?\1/.*?\1/
> +^file:/E:/Index Samples/
> -^file:/E:/Index Samples/Index/
>
> but nutch crawls the forbidden folders also. is there a web db kind of thing
> for files also. is it possible to make nutch to index files based on the
> last modified date.
>
> can anybody suggest the datastructure for webdb (filedb??) for files. it
> will be good to group files and create seperate segements for each group. so
> if some files are changed, only those segments can be replaced.
>
> Rgds,
> D.Saravanaraj
