Hi,

I am using Nutch to index files on the local file system and over FTP. My filter file is:

-^(http|ftp|mailto):
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|mpg|gz|rpm|tgz|mov|MOV|exe|png|PNG|jar)$
[EMAIL PROTECTED]
-.*(/.+?)/.*?\1/.*?\1/
+^file:/E:/Index Samples/
-^file:/E:/Index Samples/Index/

But Nutch crawls the forbidden folders as well.

Is there something like the web DB for files too? Is it possible to make Nutch index files based on their last-modified date? Can anybody suggest a data structure for a file equivalent of the WebDB (a "filedb")?

It would be good to group files and create separate segments for each group, so that if some files change, only those segments need to be replaced.

Rgds,
D.Saravanaraj
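On the forbidden folders: as far as I understand Nutch's regex URL filter, rules are tried from top to bottom and the first pattern that matches decides, so the "+" rule for "Index Samples/" may be accepting URLs before the "-" rule for "Index Samples/Index/" is ever reached. The following is only a minimal Python sketch of that first-match-wins behaviour, not Nutch code. The function name, the sample file paths, and the omission of the "[EMAIL PROTECTED]" line (its original pattern is not recoverable) are all assumptions made for illustration.

```python
import re

# Rules copied from the filter file, in the same order.
# The "[EMAIL PROTECTED]" line is left out: its original pattern is unknown.
RULES = [
    r"-^(http|ftp|mailto):",
    r"-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|mpg|gz|rpm|tgz|mov|MOV|exe|png|PNG|jar)$",
    r"-.*(/.+?)/.*?\1/.*?\1/",
    r"+^file:/E:/Index Samples/",
    r"-^file:/E:/Index Samples/Index/",
]

# Same rules, but with the more specific exclusion placed before the inclusion.
REORDERED = RULES[:3] + [RULES[4], RULES[3]]


def accepts(url, rules):
    """Hypothetical helper: the first rule whose pattern is found in the URL decides."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched, so the URL is ignored


# Hypothetical file names, used only to exercise the rules.
forbidden = "file:/E:/Index Samples/Index/segment.txt"
allowed = "file:/E:/Index Samples/docs/report.txt"

print(accepts(forbidden, RULES))      # original order
print(accepts(forbidden, REORDERED))  # exclusion listed first
print(accepts(allowed, REORDERED))
```

If this model matches what Nutch does, moving the "-^file:/E:/Index Samples/Index/" line above the "+^file:/E:/Index Samples/" line should keep that folder out of the crawl.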
