Hi,
I have some questions:
1) Does anyone know the limitations of Nutch?
2) I have a site with frames that are served by servlets. Is it possible to crawl such pages?
We have also seen that if the frame is an HTML page, nutch-crawler works, but if
the frame is a servlet, nutch-crawler doesn't work.
Plea
Hi,
Thank you for your hints, but I didn't give you the following information:
I modified the file crawl-urlfilter.txt as follows:
#start crawl-urlfilter
# skip file:, ftp:, & mailto: urls
-^(file|ftp|mailto):
# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|ico|ICO|css|sit