Re: Nutch 2.2.1 crawler cannot progress due to restriction from firewall

d_k Tue, 11 Feb 2014 22:38:53 -0800

What are you doing with the crawled data?
If you index it with Solr, you can open the port Solr listens on, run Nutch
on a server outside the firewall, and have it send the documents to the
Solr instance behind your firewall through the opened port.
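Before moving the crawler, it is worth checking from the host outside the firewall that the opened Solr port is actually reachable. Below is a minimal Python sketch. It assumes Solr listens on its default port 8983. The function name `solr_port_reachable` and the host `solr.internal.example.com` are hypothetical placeholders, not part of Nutch or Solr.

```python
import socket
import sys


def solr_port_reachable(host: str, port: int = 8983, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: run it on the server outside the firewall (where
    Nutch would crawl) to confirm the firewall rule for the Solr port works.
    """
    try:
        # create_connection resolves the host and opens a TCP socket; DNS
        # failures, refusals and timeouts all surface as OSError subclasses.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical defaults; replace with the Solr server behind the firewall.
    target_host = sys.argv[1] if len(sys.argv) > 1 else "solr.internal.example.com"
    target_port = int(sys.argv[2]) if len(sys.argv) > 2 else 8983
    ok = solr_port_reachable(target_host, target_port)
    print(f"{target_host}:{target_port} reachable: {ok}")
```

A successful connection only shows that the firewall lets traffic through to that port. It does not confirm that Solr itself is healthy or accepting updates.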


Is it a one-time crawl? How often do you plan to crawl?


On Wed, Feb 12, 2014 at 3:53 AM, A Laxmi <[email protected]> wrote:

> Hi,
>
> I installed Nutch 2.2.1 on a server whose internet access is restricted by a
> firewall. When I ran my first crawl on that server, I started getting
> timed-out errors and the crawl hung. The firewall was blocking the Nutch
> crawler from reaching any site. I verified this with the hosting admin, who
> confirmed that the firewall blocks the crawler from crawling websites.
>
> How can I get Nutch to crawl websites in such a firewall-restricted
> environment? Please advise.
>
> Thanks!
>
