I need your help.
I want to index a directory on the local filesystem.
What do I need to modify?
Maybe the file crawl-urlfilter.txt?
Only this file?
I wrote the file "crawl-urlfilter.txt" like this:
# Creative Commons crawl filter
# Each non-comment, non-blank line contains a regular expression
# prefixed by '+' or '-'. The first matching pattern in the file
# determines whether a URL is included or ignored. If no pattern
# matches, the URL is ignored.
# skip http:, ftp:, & mailto: urls
-^(http|ftp|mailto):
# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|rtf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe)$
# skip URLs containing certain characters as probable queries, etc.
[EMAIL PROTECTED]
#+[?&=%]
[EMAIL PROTECTED]
# valid URLs
+^file:///usr/Proventi2/([a-z0-9]*\.)/
# accept anything else
+.*
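
To see what these rules would do with a given URL, the first-match behaviour described in the comments at the top of the file can be sketched in Python. This is only an illustration, not Nutch's own code: the names `parse_rules` and `accepts` are made up for the example, it assumes each pattern may match anywhere in the URL, and it leaves out the two rules that appear above as "[EMAIL PROTECTED]" because their content is not known.

```python
import re

# Rules copied from the file above, in the same order. The two lines shown
# as "[EMAIL PROTECTED]" are omitted because their content is unknown.
RULES_TEXT = r"""
# skip http:, ftp:, & mailto: urls
-^(http|ftp|mailto):
# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit|eps|wmf|rtf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe)$
+^file:///usr/Proventi2/([a-z0-9]*\.)/
# accept anything else
+.*
"""


def parse_rules(text):
    """Turn filter text into a list of (include, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first matching rule; ignore if none match."""
    for include, regex in rules:
        if regex.search(url):  # assumption: a match anywhere in the URL counts
            return include
    return False


RULES = parse_rules(RULES_TEXT)

for url in [
    "http://example.com/",
    "file:///usr/Proventi2/logo.gif",
    "file:///usr/Proventi2/doc.html",
    "file:///etc/passwd",
]:
    print(url, "->", "included" if accepts(RULES, url) else "ignored")
```

Under those assumptions the first two URLs are ignored and the last two are included. Both included URLs are let in by the final `+.*` rule, not by the `file:///usr/Proventi2/` rule.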
Is this file OK? What should I do?
Please answer, this is very important to me!
Adriano Palombo