Hi

Nutch cannot do this out of the box, and it is tricky to implement because a
page may not have one unique referrer. What you can try is to add the referrer
to the outlinks when parsing records. The referrer can then end up in the
CrawlDatum's MetaData, which you can later read to set the referrer. To actually
send it, you must hack your protocol plugin to add the header.
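Roughly, the idea looks like the sketch below. It is plain Java and does NOT use the real Nutch classes: CrawlEntry is a hypothetical stand-in for a CrawlDatum with its metadata map, REFERRER_KEY is a made-up key name, and java.net.http.HttpRequest stands in for whatever request code your protocol plugin actually uses. It only shows the two steps: remember the parent page as each outlink's referrer at parse time, then read it back and send it as the Referer header at fetch time.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ReferrerSketch {
    // Hypothetical metadata key; Nutch does not define this name.
    static final String REFERRER_KEY = "_referrer_";

    // Hypothetical stand-in for a CrawlDatum: a URL plus a free-form
    // metadata map. This is NOT the Nutch class, just a model of the idea.
    static final class CrawlEntry {
        final String url;
        final Map<String, String> metaData = new HashMap<>();

        CrawlEntry(String url) {
            this.url = url;
        }
    }

    // Parse step: for every outlink found on the page at pageUrl,
    // record pageUrl as that outlink's referrer.
    static List<CrawlEntry> outlinksWithReferrer(String pageUrl, List<String> outlinks) {
        List<CrawlEntry> entries = new ArrayList<>();
        for (String link : outlinks) {
            CrawlEntry entry = new CrawlEntry(link);
            entry.metaData.put(REFERRER_KEY, pageUrl);
            entries.add(entry);
        }
        return entries;
    }

    // Fetch step: read the stored referrer back and, if one is present,
    // send it as the Referer header (HTTP spells the header "Referer").
    static HttpRequest buildRequest(CrawlEntry entry) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(URI.create(entry.url)).GET();
        String referrer = entry.metaData.get(REFERRER_KEY);
        if (referrer != null) {
            builder.header("Referer", referrer);
        }
        return builder.build();
    }

    public static void main(String[] args) {
        // The scenario from the original question: the PDF is found on the home page.
        List<CrawlEntry> entries = outlinksWithReferrer(
                "http://www.domain.pl/", List.of("http://www.domain.pl/abcd.pdf"));
        HttpRequest request = buildRequest(entries.get(0));
        System.out.println(request.uri() + " <- Referer: "
                + request.headers().firstValue("Referer").orElse("(none)"));
    }
}
```

Note that this only covers the simple case. When the same URL is found on several pages, only one referrer survives in the stored metadata, and which one depends on how the entries for that URL get merged. That is the "no unique referrer" problem mentioned above.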

Cheers

-----Original message-----
> From:SebaZ <[email protected]>
> Sent: Wed 06-Jun-2012 13:37
> To: [email protected]
> Subject: HTTP REFERER is missing
> 
> I have successfully implemented Nutch as the crawler for a Solr index on the
> http://szukaj.ug.edu.pl site. But there is a problem with the HTTP Referer
> header: Nutch does not send it when crawling sites.
> 
> Is it possible to make Nutch send the Referer header with its requests?
> 
> Scenario:
> 1. Nutch opens www.domain.pl.
> 2. Nutch finds the www.domain.pl/abcd.pdf link.
> 3. Nutch requests www.domain.pl/abcd.pdf, but without
> HTTP_REFERER=www.domain.pl
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/HTTP-REFERER-is-missing-tp3987967.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
> 
