Hi all,
I am new to Nutch and have two questions that I couldn't find answers to in
the web pages or the configuration files. Could you kindly give me some
suggestions or hints on them? Many thanks in advance.
1) Can I control the actual crawl speed? E.g. if I know a web site would
block me if I sent more than one request every two seconds, can I make sure
Nutch doesn't crawl faster than that? If so, how?
2) Can Nutch send an HTTP POST for each crawl request (not for authentication
purposes)? Some web sites require requests to be sent via HTTP POST instead of
HTTP GET.