It sounds like what you really want is a simple crawler for something that small. Nutch does a *pile* of other stuff (the webdb, the segments, the index) that you don't seem to care about. Google for "open source web crawlers". I think there is one called WebSPHINX that is simple.
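For a single page, something along these lines may already be enough. This is only a rough sketch using the Python standard library, not anything taken from Nutch: the names `LinkCollector`, `extract_links`, and `fetch_links` are made up for illustration, and only `<a href>` links are counted. The URL is the one from your example.

```python
import time
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Skip anchors without an href (e.g. <a name="top">).
            if name == "href" and value:
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return all <a href> targets found in html, as absolute URLs."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch_links(url):
    """Download one page and return the links it contains."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        html = response.read().decode(charset, errors="replace")
    return extract_links(html, url)


if __name__ == "__main__":
    url = "http://localhost:8080/HTML/1.html"  # input from the original question
    start = time.perf_counter()
    links = fetch_links(url)
    elapsed = time.perf_counter() - start
    print(f"{len(links)} links in {url} ({elapsed:.3f}s)")
    for link in links:
        print(link)
```

The timing here covers the fetch and the parse of one page only, so it is not directly comparable to a full Nutch crawl, which also writes the webdb, segments, and index.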
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Monday, February 18, 2008 1:53:35 AM
> Subject: Help needed to crawl webpages
>
> Hi All,
>
> I am using nutch 0.9.
> I want to crawl the webpage in a manner that it should give me the no.
> of links and the corresponding links in that webpage.
> But nutch is doing all the things like creating webdb, a set of
> segments, and the index.
> I have to calculate the time that how much time nutch is taking to crawl
> a webpage in comparison to other crawlers.
> For example,
>
> Input - http://localhost:8080/HTML/1.html
> output - no. of links in 1.html
>
> I want to achieve this functionality, can it be possible with nutch.
>
> Thanks & Regards,
> Naveen Goswami
> 91 9899547886
