Hey, I'm currently working on Nutch with Solr for our company pages.
Assume the following situation: we have a website, www.mysite.lol, and on this site there is a link: www.mysite.lol/tespage/3512-1564/

As you can see, there is a typo; it should be /testpage/: www.mysite.lol/testpage/3512-1564/

Our framework doesn't care about the text before the ID, so we could type anything we want there and the page would still be displayed because of the ID. That is why both links work and neither returns a 404.

If I change the link on the main page to the correct one, let Nutch crawl the site again, and send the result to Solr, the old one is still found. So the link www.mysite.lol/tespage/3512-1564/ is still in the Nutch db, because the link is valid (no 404). But the main page no longer links to this page.

How do I tell Nutch to ignore pages that have no link pointing to them? Basically: revalidate links and remove pages without any links to them.

Best regards,
David Kumar
Senior Software Engineer Java, B. Sc.
Project Manager PIM
Infotech Department
TechniSat Digital GmbH
Julius-Saxler-Straße 3
TechniPark
D-54550 Daun / Germany
Tel.: +49 (0) 6592 / 712-2826
Fax: +49 (0) 6592 / 712-2829
www.technisat.com/de_DE/
www.facebook.com/technisat

