Hi d_k! Yes, I am indexing the crawled data using Solr.
Solr is also running on the same server, on port 8983. I plan to perform the
crawl every 30 days to refresh the previously crawled data and to crawl any
new sites.
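
If I go with opening the Solr port as you suggest, I would first want to
confirm that the port is actually reachable from the outside crawl server
before pointing Nutch at it. Below is a minimal sketch of such a check in
Python. It only tests that a TCP connection can be opened; it does not talk
to Solr itself. The function name port_is_reachable and the host name in the
usage comment are placeholders of mine, not anything from Nutch or Solr.

```python
import socket


def port_is_reachable(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port opens within timeout.

    This is only a network-level check: it says nothing about whether
    the service listening on the port is healthy.
    """
    try:
        # create_connection resolves the host and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


# Hypothetical usage from the crawl server outside the firewall
# ("solr.example.com" is a placeholder host, 8983 is the Solr port above):
#   port_is_reachable("solr.example.com", 8983)
```

If this returns False from the outside server, the firewall rule (or Solr
itself) would need a look before any indexing run is attempted.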


On Wed, Feb 12, 2014 at 1:37 AM, d_k <[email protected]> wrote:

> What are you doing with the crawled data?
> If you index it using Solr, you can open the port Solr is listening on,
> run Nutch on a server without a firewall, and have it send the documents
> to the Solr instance behind your firewall through the port you opened.
>
> Is it a one-time crawl? How often do you plan to perform the crawl?
>
>
> On Wed, Feb 12, 2014 at 3:53 AM, A Laxmi <[email protected]> wrote:
>
> > Hi,
> >
> > I installed Nutch 2.2.1 on a server whose internet access is restricted
> > by a firewall. When I tried to run my first crawl on that server, I
> > started getting timeout errors and the crawl hung. It turns out the
> > firewall was blocking my Nutch crawler from crawling any site. I
> > verified this with the hosting admin, who confirmed that the firewall
> > blocks the crawler from crawling websites.
> >
> > I am not sure how to get Nutch to crawl websites in such a
> > firewall-restricted environment. Please suggest an approach.
> >
> > Thanks!
> >
>
