Re: Help needed to crawl webpages

Otis Gospodnetic Mon, 18 Feb 2008 09:50:01 -0800

It sounds like you really want to create a simplistic crawler for something
that small. Nutch does a *pile* of other stuff that you don't seem to care
about. Google for "open source web crawlers". I think there is one called
Sphynx that is simple.
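For illustration only, here is a minimal sketch of the kind of single-page link counter discussed in this thread. It is not part of Nutch. It assumes Java 11+ for `java.net.http.HttpClient`. The class name `LinkCounter`, the method `extractLinks`, and the default URL (taken from the example in the question) are hypothetical. It pulls `href` values out with a regular expression. That is a deliberate simplification that works on simple hand-written test pages, but it is not a real HTML parser.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical single-page link counter (not Nutch code).
public class LinkCounter {

    // Matches href="..." or href='...' inside <a ...> tags, case-insensitively.
    // Assumption: a regex is good enough for simple test pages; it is NOT a real HTML parser.
    private static final Pattern HREF = Pattern.compile(
            "<a\\s[^>]*?href\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // Returns the absolute URLs of all anchors found in the given HTML.
    public static List<String> extractLinks(String html, URI base) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            try {
                // Resolve relative links such as "2.html" against the page URL.
                links.add(base.resolve(m.group(1).trim()).toString());
            } catch (IllegalArgumentException e) {
                // Skip href values that are not valid URIs (e.g. containing spaces).
            }
        }
        return links;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Hypothetical default: the example URL from the original question.
        URI page = URI.create(args.length > 0 ? args[0] : "http://localhost:8080/HTML/1.html");
        HttpClient client = HttpClient.newHttpClient();

        // Wall-clock time for one fetch plus link extraction.
        long start = System.nanoTime();
        HttpResponse<String> response = client.send(
                HttpRequest.newBuilder(page).GET().build(),
                HttpResponse.BodyHandlers.ofString());
        List<String> links = extractLinks(response.body(), page);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("no. of links: " + links.size());
        links.forEach(System.out::println);
        System.out.println("fetch + parse time: " + elapsedMs + " ms");
    }
}
```

Run it with `javac LinkCounter.java && java LinkCounter http://localhost:8080/HTML/1.html`. The reported time covers only a single fetch plus link extraction. It is not directly comparable to a Nutch crawl, which also builds the webdb, segments, and index.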


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
> From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Monday, February 18, 2008 1:53:35 AM
> Subject: Help needed to crawl webpages
> 
> Hi All,
> 
> I am using Nutch 0.9.
> I want to crawl a webpage so that it gives me the number of links on
> that page and the links themselves.
> But Nutch does everything else as well: it creates the webdb, a set of
> segments, and the index.
> I need to measure how much time Nutch takes to crawl a webpage compared
> with other crawlers.
> For example,
> 
>     Input -     http://localhost:8080/HTML/1.html
>     output -   no. of links in 1.html
> 
> I want to achieve this functionality. Is it possible with Nutch?
> 
> Thanks & Regards,
> Naveen Goswami
> 91 9899547886
> 
> www.wipro.com
> 
>