Hi, I am using Nutch to index files on the local file system and over FTP.

My filter file is:

-^(http|ftp|mailto):
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|mpg|gz|rpm|tgz|mov|MOV|exe|png|PNG|jar)$
[EMAIL PROTECTED]
-.*(/.+?)/.*?\1/.*?\1/
+^file:/E:/Index Samples/
-^file:/E:/Index Samples/Index/

But Nutch also crawls the forbidden folders. Is there something like the
web DB for files as well? And is it possible to make Nutch index files
based on their last-modified date?
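
On the folder problem, one thing that may be worth checking is rule order. As far as I understand it, the regex URL filter uses the first rule whose pattern matches a URL and ignores the rest. The snippet below is only a minimal Python sketch of that first-match behaviour, not Nutch's own code: the function name first_match_decision and the sample URLs are made up for illustration, and the obfuscated rule is left out.

```python
import re

# Rules copied from the filter file above, in the same order
# (the rule shown as [EMAIL PROTECTED] is omitted).
RULES = [
    r"-^(http|ftp|mailto):",
    r"-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|zip|mpg|gz|rpm|tgz|mov|MOV|exe|png|PNG|jar)$",
    r"-.*(/.+?)/.*?\1/.*?\1/",
    r"+^file:/E:/Index Samples/",
    r"-^file:/E:/Index Samples/Index/",
]

# Same rules with the last two swapped, so the more specific
# exclusion is tested before the broader inclusion.
REORDERED = RULES[:3] + [RULES[4], RULES[3]]


def first_match_decision(url, rules):
    """Return True if the url is accepted, False if it is rejected.

    Assumption: the first rule whose pattern is found in the url decides,
    and a url that matches no rule is rejected.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False


if __name__ == "__main__":
    for url in ("file:/E:/Index Samples/Index/a.txt",
                "file:/E:/Index Samples/doc.txt"):
        print(url,
              first_match_decision(url, RULES),
              first_match_decision(url, REORDERED))
```

If that assumption holds, the "+" rule for Index Samples accepts everything below it before the "-" rule for the Index subfolder is ever tested, and moving the "-" rule above the "+" rule would change the outcome.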

Can anybody suggest a data structure for a file equivalent of the webdb
(a filedb?)? It would be good to group files and create separate segments
for each group, so that if some files change, only those segments need to
be replaced.
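
As a rough illustration of the grouping idea, here is a hypothetical Python sketch; none of these names exist in Nutch. It assumes a "filedb" can be as simple as a map from file path to last-modified time, and that a group is the top-level folder under the crawl root. Comparing two snapshots then gives the groups whose segments would need to be rebuilt.

```python
import os


def snapshot(root):
    """Map each file path (relative to root) to its last-modified time."""
    db = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            db[os.path.relpath(full, root)] = os.path.getmtime(full)
    return db


def group_of(rel_path):
    """Group key: the top-level folder, or '.' for files directly in root."""
    parts = rel_path.split(os.sep)
    return parts[0] if len(parts) > 1 else "."


def changed_groups(old, new):
    """Groups containing a file that was added, removed, or modified."""
    changed = set()
    for path in old.keys() | new.keys():
        if old.get(path) != new.get(path):
            changed.add(group_of(path))
    return changed
```

The previous snapshot would have to be stored between crawls; only the groups returned by changed_groups would then be re-fetched and their segments replaced.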

Rgds,
D.Saravanaraj
