Hi, I had a similar problem with relative URLs not being picked up. Have you tried adjusting the db.max.outlinks.per.page property (it caps how many outlinks are processed per page)? See this discussion: http://www.mail-archive.com/[email protected]/msg08665.html
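For what it's worth, here is a minimal sketch of how a relative href gets resolved against the URL of the page it was found on. It uses Python's standard urllib.parse.urljoin purely as an illustration and is not Nutch's own code; example.com, the paths and the resolve() helper are made up for the example. It suggests the fetch line in the quoted message below is what you would get if the literal string +escape(document.referrer)+ had been picked up as a relative link on a page at the site root, though that is only one possible reading of the log.

```python
from urllib.parse import urljoin


def resolve(base: str, href: str) -> str:
    """Resolve an href found on the page at `base` into an absolute URL.

    Hypothetical helper for illustration; it simply wraps urljoin.
    """
    return urljoin(base, href)


if __name__ == "__main__":
    # Hypothetical page URLs; the thread never names the real site.
    cases = [
        ("http://example.com/docs/section/page.html", "../XYZ"),
        ("http://example.com/index.html", "+escape(document.referrer)+"),
    ]
    for base, href in cases:
        # Print each href next to the absolute URL it resolves to.
        print(f"{href} on {base} -> {resolve(base, href)}")
```

If the second case matches what you see, the odd URL is a link that was extracted and resolved like any other relative link, so it may be worth checking which parser is pulling that string out of the page.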
----- Original Message ----
From: Raphael A. Bauer <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, August 9, 2007 7:12:32 AM
Subject: Re: Relative Links Problem IS ALSO +escape(document.referrer)+

Raphael A. Bauer wrote:
> i am currently doing a
>
> "nutch crawl urls -dir crawl -depth 10"
>
> - pretty much what is described in the tutorial. and in fact everything
> works.
>
> the only problem is that relative links - say <a href="../XYZ">
> are not crawled and cannot be searched, which is quite a problem for me.
>
> is there an option i am missing out - or any suggestions how i can fix
> this issue?

hi,

just to bring the question up again. i am still searching for a solution to my problem that the nutch crawl tool does not crawl relative links.

it states:
fetching http://url/+escape(document.referrer)+

and does not investigate those html pages any further.

so - maybe my question is way too stupid (RTFM - arg.. i read it ;) ), or the solution is too simple to tell - in either case i really would appreciate any statement regarding my problem.

is there a switch to enable this? something i've missed? there is no problem reimplementing the fetch code - but i don't want to write the code twice.

thanks again!

ra
