Markus Jelsma-2 wrote
> 
> Nutch cannot do this by default and is tricky to make because there may
> not be one unique referrer per page.
> 
I don't really need a unique referrer. All I want is to tell the requested server
on which URL the crawler found the link.

The admin of one site told me that his logs show a lot of 404 errors
caused by my search server. The crawler is requesting weird URLs like
http://www.domain.com/~tdz/sbd/zabezpieczanie_baz.pdf;O=A when it should request
http://www.domain.com/~tdz/sbd/zabezpieczanie_baz.pdf, without *;O=A*. I searched
the linkdb, and it has no information about either the good or the bad
URL. Without the referrer I can't find which page contains the wrong link, or
which code directs the crawler to the wrong URLs.



Markus Jelsma-2 wrote
> 
> What you can try is to add the referrer to outlinks when parsing records.
> This outlink can be added to CrawlDatum's MetaData which you can then
> later use to set the referrer. To set the referrer you must hack
Can you help me with this a little? Can I do it through Nutch's configuration?
I'm also not good at Java programming; I use Nutch only as a crawler
application. I tried to find the exact file/code (in the HTTP plugin) where I
could change this, but I didn't find a solution.
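The two steps Markus describes (record the referrer on each outlink at parse time, then send it when that outlink is fetched) can be sketched in plain Java, outside Nutch. This is not Nutch's API: the class `ReferrerSketch` and its methods are hypothetical, a plain `Map` stands in for CrawlDatum's MetaData, and the linking page URL in `main` is made up. In a real crawl the first step would presumably sit in parsing and the second in the HTTP protocol plugin, but the thread does not spell out where.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, NOT the Nutch API: models the idea of carrying a
// referrer from parse time to fetch time.
public class ReferrerSketch {

    // Stand-in for per-URL crawl metadata (CrawlDatum's MetaData in Nutch).
    private final Map<String, String> referrerByUrl = new LinkedHashMap<>();

    // Step 1 (parse time): remember which page each outlink was found on.
    // A URL can be linked from many pages, so this sketch keeps only the
    // first referrer it sees and ignores later ones.
    public void recordOutlinks(String pageUrl, List<String> outlinks) {
        for (String outlink : outlinks) {
            referrerByUrl.putIfAbsent(outlink, pageUrl);
        }
    }

    // Returns the recorded referrer, or null if the URL was never seen as an outlink.
    public String referrerOf(String url) {
        return referrerByUrl.get(url);
    }

    // Step 2 (fetch time): build request headers, adding Referer when known.
    public Map<String, String> buildHeaders(String url, String userAgent) {
        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("User-Agent", userAgent);
        String referrer = referrerByUrl.get(url);
        if (referrer != null) {
            // HTTP spells this header "Referer" (single r).
            headers.put("Referer", referrer);
        }
        return headers;
    }

    public static void main(String[] args) {
        ReferrerSketch crawl = new ReferrerSketch();
        // Made-up linking page, for illustration only.
        String linkingPage = "http://www.domain.com/some-page.html";
        String badUrl = "http://www.domain.com/~tdz/sbd/zabezpieczanie_baz.pdf;O=A";
        crawl.recordOutlinks(linkingPage, List.of(badUrl));
        System.out.println(crawl.buildHeaders(badUrl, "MyCrawler"));
    }
}
```

Keeping only the first referrer is one possible answer to Markus's point that a page may not have one unique referrer; it is enough for the goal here, which is to point the site admin at some page that links to the bad URL.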


Regards
SZ


--
View this message in context: 
http://lucene.472066.n3.nabble.com/HTTP-REFERER-is-missing-tp3987967p3990533.html
Sent from the Nutch - User mailing list archive at Nabble.com.
