Hi, I had a similar problem with relative URLs not being picked up. Have you
experimented with the db.max.outlinks.per.page property? See this discussion:
http://www.mail-archive.com/[email protected]/msg08665.html
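
To illustrate why that property might matter, here is a rough sketch of what a per-page outlink cap does. This is not Nutch's actual code: cap_outlinks, the page names, and the cap of 100 are all made up for the example, and I am assuming a negative cap means "no limit". If a page has more links than the cap allows, whatever comes after the cutoff is never queued, which could look like relative links being ignored.

```python
def cap_outlinks(outlinks, max_per_page):
    """Keep at most max_per_page outlinks; a negative cap keeps them all.

    Hypothetical helper that only mimics the idea of a per-page outlink limit.
    """
    if max_per_page < 0:
        return list(outlinks)
    return list(outlinks[:max_per_page])


# A made-up page with 150 ordinary links followed by one relative link.
links = [f"page{i}.html" for i in range(150)] + ["../XYZ"]

kept = cap_outlinks(links, 100)
print(len(kept))                              # 100
print("../XYZ" in kept)                       # False: dropped by the cap
print("../XYZ" in cap_outlinks(links, -1))    # True: no cap applied
```

If this is what is happening, raising the limit should bring the missing links back.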


----- Original Message ----
From: Raphael A. Bauer <[EMAIL PROTECTED]>
To: [email protected]
Sent: Thursday, August 9, 2007 7:12:32 AM
Subject: Re: Relative Links Problem IS ALSO +escape(document.referrer)+

Raphael A. Bauer wrote:
> i am currently doing a
> 
> "nutch crawl urls -dir crawl -depth 10"
> 
> - pretty much what is described in the tutorial. and in fact everything 
> works.
> 
> the only problem is that relative links - say <a href="../XYZ"> -
> are not crawled and cannot be searched, which is quite a problem for me.
> 
> is there an option i am missing out - or any suggestions how i can fix 
> this issue?
hi,

just to bring the question up again. i am still searching for a solution 
to my problem that the nutch crawl tool does not crawl relative links.

it states:
fetching http://url/+escape(document.referrer)+ and does not look 
into those html pages any further.
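
for reference, a minimal sketch of how such a url could come about under standard relative-url resolution. the assumption here (not confirmed) is that a fragment of inline javascript was picked up as if it were a link target. the host example.com, the page path and the resolve helper are hypothetical; the sketch uses python's urllib, not nutch.

```python
from urllib.parse import urljoin


def resolve(base, href):
    """Resolve href against base using standard relative-URL rules."""
    return urljoin(base, href)


# A genuine relative link, as in <a href="../XYZ">, on a hypothetical page.
print(resolve("http://example.com/docs/sub/page.html", "../XYZ"))
# http://example.com/docs/XYZ

# A JavaScript fragment mistaken for a link target resolves like any other
# relative path, which produces a URL of the shape seen in the fetch log.
print(resolve("http://example.com/", "+escape(document.referrer)+"))
# http://example.com/+escape(document.referrer)+
```

so the odd url in the log would be consistent with a script snippet being treated as a relative link, while real relative links should resolve to ordinary absolute urls.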

so - maybe my question is way too stupid (RTFM - arg.. i read it ;) ), 
or the solution is too simple to tell - in either case i really would 
appreciate any statement regarding my problem. is there a switch to 
enable this?  something i've missed?

there is no problem reimplementing the fetch code - but i don't want to 
write the code twice.

thanks again!

ra