Markus Jelsma-2 wrote
>
> Nutch cannot do this by default and is tricky to make because there may
> not be one unique referrer per page.
>

I don't really need a unique referrer. All I want is to tell the requested server on which URL the crawler found the link.
The admin of one site informed me that he sees a lot of 404 errors in his logs caused by requests from my search server. The crawler is requesting weird URLs like http://www.domain.com/~tdz/sbd/zabezpieczanie_baz.pdf;O=A when it should request http://www.domain.com/~tdz/sbd/zabezpieczanie_baz.pdf, without *;O=A*. I searched the linkdb and it has no information about either the good or the bad URL. Without the referrer I can't find which page contains the wrong link, or which code is leading to the wrong URLs.

Markus Jelsma-2 wrote
>
> What you can try is to add the referrer to outlinks when parsing records.
> This outlink can be added to CrawlDatum's MetaData which you can then
> later use to set the referrer. To set the referrer you must hack
>

Can you help me with this a little? Can I do it in Nutch's configuration? I am not good at Java programming, and I'm using Nutch as a crawler application only. I tried to find the exact file/code where I can change it (the http plugin), but I didn't find any solution.

Regards,
SZ
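
To make the URL difference described above concrete, here is a minimal sketch in Python (not Nutch code) that spots URLs ending in the stray `;O=A` suffix and shows what they would look like without it. The function names `strip_sort_suffix` and `find_suspect_urls` are hypothetical, and also matching `;O=D` is an assumption; only `;O=A` appears in the logs mentioned here.

```python
import re

# Assumption: the unwanted suffix always sits at the very end of the URL.
# ";O=A" is the case reported above; ";O=D" is included only as a guess.
_SUFFIX = re.compile(r";O=[AD]$")


def strip_sort_suffix(url: str) -> str:
    """Return the URL without a trailing ';O=A' / ';O=D', if present."""
    return _SUFFIX.sub("", url)


def find_suspect_urls(urls):
    """Yield (bad_url, cleaned_url) pairs for URLs that carry the suffix."""
    for url in urls:
        cleaned = strip_sort_suffix(url)
        if cleaned != url:
            yield url, cleaned


if __name__ == "__main__":
    sample = [
        "http://www.domain.com/~tdz/sbd/zabezpieczanie_baz.pdf;O=A",
        "http://www.domain.com/~tdz/sbd/zabezpieczanie_baz.pdf",
    ]
    for bad, good in find_suspect_urls(sample):
        print(bad, "->", good)
```

Run over a list of crawled URLs, this would at least show how many requests are affected. It only cleans up the symptom, though, and does not reveal which page the bad link came from, which is what the referrer would be needed for.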

