Hi,
 
I used Nutch to crawl four hosts whose host names all correspond to the same IP address, then used WebDBReader to dump the links for each URL. Why does the dump show links that are relative (not absolute) in the pages as cross-host links, i.e. pages on one host linking to pages on the other hosts?
 
For example:
 
from http://www.l3s.uni-hannover.de/morob/Galleries/ER1/pages/09_DSCF0492.html
 to http://www.l3s.uni-hannover.de/morob/Galleries/ER1/index.html
 to http://www.l3s.uni-hannover.de/morob/Galleries/ER1/pages/08_DSCF0493.html
 to http://www.l3s.uni-hannover.de/morob/Galleries/ER1/pages/10_DSCF0499.html
 to http://www.learninglab.de/morob/Galleries/ER1/index.html
 to http://www.learninglab.de/morob/Galleries/ER1/pages/08_DSCF0493.html
 to http://www.learninglab.de/morob/Galleries/ER1/pages/10_DSCF0499.html
 to http://www.learninglab.uni-hannover.de/morob/Galleries/ER1/index.html
 to http://www.learninglab.uni-hannover.de/morob/Galleries/ER1/pages/08_DSCF0493.html
 to http://www.learninglab.uni-hannover.de/morob/Galleries/ER1/pages/10_DSCF0499.html
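
For comparison, here is a minimal sketch of what I would expect for a relative link, using plain java.net.URI rather than any Nutch code. The class name ResolveLink and the href "../index.html" are assumptions for illustration only; I have not checked that the page markup uses exactly that href.

```java
import java.net.URI;

class ResolveLink {
    // Resolve an href against the URL of the page it appears on.
    static String resolve(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String from = "http://www.l3s.uni-hannover.de/morob/Galleries/ER1/pages/09_DSCF0492.html";
        // Hypothetical relative hrefs; a relative link carries no host,
        // so the result should stay on the host of the "from" page.
        System.out.println(resolve(from, "../index.html"));
        System.out.println(resolve(from, "08_DSCF0493.html"));
    }
}
```

Resolved this way, a relative link keeps the host of the page it was found on, so it would only account for the www.l3s.uni-hannover.de targets above, not the ones on the other two host names.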
Regards,
Niti
