What are you doing with the crawled data? If you index it with Solr, you can open the port Solr is listening on, run Nutch on a server that is not behind the firewall, and have it send the documents to the Solr instance behind your firewall through the port you opened.
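Before pointing Nutch at the remote Solr, it may help to confirm from the crawl server that the opened port is actually reachable. The following is only a rough sketch of such a check, not part of Nutch or Solr: the function name solr_port_reachable is made up for illustration, and the host and port you pass in are whatever your own setup uses (8983 is Solr's default port).

```python
import socket


def solr_port_reachable(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for illustration; it only tests TCP reachability and
    does not verify that the service answering on the port is actually Solr.
    """
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

Run from the external crawl server, a call such as solr_port_reachable("your-solr-host", 8983) returning False would suggest the firewall rule is not yet in effect, or that Solr is listening on a different port.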
Is it a one time crawl? How often do you plan to perform the crawl?

On Wed, Feb 12, 2014 at 3:53 AM, A Laxmi <[email protected]> wrote:
> Hi,
>
> I installed Nutch 2.2.1 on a server that is restricted by firewall to
> access internet. I tried to run my first crawl on that server, I started
> getting timedout errors and the crawl was getting hung. So, firewall was
> actually blocking my nutch crawler to crawl any site. I verified that with
> hosting admin and they mentioned firewall does block the crawler from
> crawling websites.
>
> I am not sure how I go about getting nutch to crawl websites in such a
> firewall restricted environment? Please suggest
>
> Thanks!
>

