Thanks for pointing that out, Kiran. My bad, I overlooked it.

I'm trying to authenticate with our proxy but always end up with HTTP 407
(Proxy Authentication Required).

My conf/nutch-site.xml has the http.proxy.host, http.proxy.port, 
http.proxy.username, http.proxy.password values set correctly.
The plugin.includes has the following:
<property>
  <name>plugin.includes</name>
  <value>protocol-httpclient|urlfilter-regex|parse-(html|tika)|index-(basic|anchor)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regular expression naming plugin directory names to
  include.  Any plugin not matching this expression is excluded.
  In any case you need at least include the nutch-extensionpoints plugin. By
  default Nutch includes crawling just HTML and plain text via HTTP,
  and basic indexing and search plugins. In order to use HTTPS please enable
  protocol-httpclient, but be aware of possible intermittent problems with the
  underlying commons-httpclient library.
  </description>
</property>
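
For reference, the proxy properties mentioned above take roughly this shape in
conf/nutch-site.xml. This is only a sketch: the host, port, username, and
password values below are placeholders, not our real settings.

<!-- Placeholder values for illustration only; substitute your own proxy details. -->
<property>
  <name>http.proxy.host</name>
  <value>proxy.example.com</value>
</property>
<property>
  <name>http.proxy.port</name>
  <value>8080</value>
</property>
<property>
  <name>http.proxy.username</name>
  <value>proxyuser</value>
</property>
<property>
  <name>http.proxy.password</name>
  <value>proxypassword</value>
</property>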

Still, even google.com returns 407. Any ideas?

Thank you
Suresh.


-----Original Message-----
From: kiran chitturi [mailto:[email protected]] 
Sent: Monday, June 03, 2013 10:44 AM
To: [email protected]
Subject: Re: Nutch not crawling fully

> fetch of http://www.igate.com/ failed with: Http code=407, url= 
> http://www.igate.com <http://www.igate.com/ -finishing>


Hi Suresh,

The URL was never fetched successfully; every fetch attempt fails with HTTP
error code 407. That is why it remains in db_unfetched status.

>
> dwbilab01@dwbilab01-OptiPlex-990:~/apache-nutch-1.6$ bin/nutch readdb mondaycrawl/crawldb/ -stats
> CrawlDb statistics start: mondaycrawl/crawldb/
> Statistics for CrawlDb: mondaycrawl/crawldb/
> TOTAL urls:     1
> retry 1:        1
> min score:      1.0
> avg score:      1.0
> max score:      1.0
> status 1 (db_unfetched):        1
> CrawlDb statistics: done
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Disclaimer~~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> Information contained and transmitted by this e-mail is confidential 
> and proprietary to iGATE and its affiliates and is intended for use 
> only by the recipient. If you are not the intended recipient, you are 
> hereby notified that any dissemination, distribution, copying or use 
> of this e-mail is strictly prohibited and you are requested to delete 
> this e-mail immediately and notify the originator or [email protected] 
> <mailto:
> [email protected]>. iGATE does not enter into any agreement with any 
> party by e-mail. Any views expressed by an individual do not 
> necessarily reflect the view of iGATE. iGATE is not responsible for 
> the consequences of any actions taken on the basis of information provided, 
> through this email.
> The contents of an attachment to this e-mail may contain software 
> viruses, which could damage your own computer system. While iGATE has 
> taken every reasonable precaution to minimise this risk, we cannot 
> accept liability for any damage which you sustain as a result of 
> software viruses. You should carry out your own virus checks before 
> opening an attachment. To know more about iGATE please visit www.igate.com 
> <http://www.igate.com>.
>
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>



--
Kiran Chitturi

<http://www.linkedin.com/in/kiranchitturi>

