Nutch has a file called crawl-urlfilter.txt where you can set your site domain or a list of sites, so Nutch will crawl only those. The best way to understand it is to download Nutch and see it working :). Take a look: http://lucene.apache.org/nutch/tutorial8.html
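As a rough sketch, entries in crawl-urlfilter.txt are regular expressions prefixed with + (accept) or - (reject), checked top to bottom; a minimal filter restricting the crawl to a single domain might look like the following (example.com is a placeholder for your own site):

```
# Hypothetical crawl-urlfilter.txt fragment -- replace example.com with your domain.
# Accept URLs on the target domain (including subdomains):
+^http://([a-z0-9]*\.)*example.com/
# Reject everything else:
-.
```

To crawl a list of sites instead of one, you would add one +^http://... line per domain before the final catch-all reject.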
Regards,

On 4/5/07, Meryl Silverburgh <[EMAIL PROTECTED]> wrote:
Thanks. Can you please tell me how I can plug in my own handling when Nutch sees a site, instead of building the search database for that site?

On 4/3/07, Lourival Júnior <[EMAIL PROTECTED]> wrote:
> I am certain that Nutch is what you are looking for. Take a look at
> Nutch's documentation for more details and you will see :).
>
> On 4/3/07, Meryl Silverburgh <[EMAIL PROTECTED]> wrote:
> >
> > Hi,
> >
> > I would like to know whether it is a good idea to use the Nutch web
> > crawler. Basically, this is what I need:
> > 1. I have a list of web sites.
> > 2. I want the crawler to go through each site and parse the anchors.
> >    If an anchor is in the same domain, repeat the same steps, down to
> >    3 levels.
> > 3. For each link, write to a new file.
> >
> > Is Nutch a good solution, or is there a better open source
> > alternative for my purpose?
> >
> > Thank you.
>
> --
> Lourival Junior
> Universidade Federal do Pará
> Curso de Bacharelado em Sistemas de Informação
> http://www.ufpa.br/cbsi
> Msn: [EMAIL PROTECTED]
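For the three-level crawl described in the quoted message, the Nutch 0.8 tutorial's one-step crawl command takes a depth option; a hedged sketch (the directory names, seed file, and -topN value here are placeholders, not anything from this thread):

```
# Sketch only: assumes a Nutch 0.8 installation, with a seed file urls/
# containing the list of start sites, and crawl-urlfilter.txt configured.
# -depth 3 limits the crawl to 3 link levels from the seeds.
bin/nutch crawl urls -dir crawl -depth 3 -topN 1000
```

The per-URL "write each link to a new file" step is not something the stock crawl command does; that is where a custom plugin or a post-processing pass over the crawl output would come in.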
--
Lourival Junior
Universidade Federal do Pará
Curso de Bacharelado em Sistemas de Informação
http://www.ufpa.br/cbsi
Msn: [EMAIL PROTECTED]