Hi Tamanjit, I thought I had sent this message earlier but obviously not; apologies for that. I don't seem to be able to post to user@ when replying to your mail, which is why you may not have received my replies.
A couple of things spring to mind. Before I cover them: it is usually helpful to include the thread of previous posts, so we can see what progress (if any) has been made and what has already been suggested.

1) Did you manually delete documents from your Solr index? Newer versions of Nutch have commands for improving the quality of the Solr index more effectively, e.g. solrdedup and solrclean. Have you been using either of these?

2) In a situation like this (where there is a particular URL we want information about), I have found it beneficial to use the command line options. The documentation for Nutch <1.2 can be found here [1] and for Nutch 1.3 here [2]. Using the various reader classes we can dump information about the whole crawldb/linkdb, or alternatively pass parameters for individual links; in this case I think the latter is what you are after. This also helps us understand the actions Nutch takes during your breadth-first crawl of the web graph.

3) When you say that it fetched a lot of sites in the index, do you mean in the site-map? If so, you may need to increase http.content.limit (or something similar) in your nutch-site.xml, as anything above this value will be truncated, so outlinks beyond that point will not be included, and so on. This is another reason to use the read commands: they show what different configuration options give us for this type of crawl.

4) You may also wish to take a look at the time limits between successive fetches of URLs set in nutch-site.xml. These may alter your results for obtaining the various links in the site-map page you mentioned.

[1] http://wiki.apache.org/nutch/08CommandLineOptions
[2] http://wiki.apache.org/nutch/CommandLineOptions (please note this is under construction)

-- *Lewis*
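
To illustrate point 3, here is a minimal sketch of an http.content.limit override in nutch-site.xml. The value shown is an assumption chosen purely for illustration (1 MB); pick whatever suits the size of the site-map page. As far as I know the limit is given in bytes and a negative value disables truncation altogether, but please check the description in your own nutch-default.xml.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: properties here override those in nutch-default.xml -->
<configuration>
  <property>
    <name>http.content.limit</name>
    <!-- Illustrative value only: 1048576 bytes (1 MB).
         A negative value such as -1 should mean "do not truncate". -->
    <value>1048576</value>
  </property>
</configuration>
```

After changing this, the page has to be re-fetched and re-parsed before any extra outlinks can show up in the crawldb/linkdb.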

