Re: [Nutch-general] crawling a certain site

Lukas Vlcek Tue, 01 Aug 2006 22:11:27 -0700

Hi,

IMHO Nutch doesn't directly support this now. However, I can imagine
you could generate a bunch of URLs and inject them prior to crawling.
Another strategy would be to have some kind of "site map" page which
contains all these links. But in that case you need to be sure that
Nutch will extract all of these links from a single page (or you can
break the site map into several HTML pages).


Anyway, I think you would need to write some code (be it directly for
Nutch or for the web site in question).
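
As a rough sketch of the first option (generating the URLs up front),
something like the following could write a seed list with one URL per
line. The URL pattern comes from your example; the function names, the
output path urls/seed.txt and the id range are only assumptions for
illustration, and this is not part of Nutch itself.

```python
from pathlib import Path


def seed_urls(first_id, last_id, template="http://localhost/do.asp?id={id}"):
    """Yield one URL per id, from first_id to last_id inclusive."""
    for i in range(first_id, last_id + 1):
        yield template.format(id=i)


def write_seed_file(path, first_id, last_id):
    """Write the generated URLs one per line and return how many were written."""
    path = Path(path)
    # Create the seed directory (hypothetical name) if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="ascii") as out:
        for url in seed_urls(first_id, last_id):
            out.write(url + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Small range for a trial run; raise last_id once the crawl works.
    written = write_seed_file("urls/seed.txt", 1, 1000)
    print(f"wrote {written} URLs")
```

The resulting directory could then be handed to the inject step (for
example bin/nutch inject with your crawldb and the urls directory; the
exact invocation depends on your Nutch version). Also check your URL
filter configuration: as far as I know the default rules skip URLs
containing characters such as "?" and "=", which would drop every one
of these URLs unless that rule is relaxed.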

Regards,
Lukas

On 8/1/06, Cam Bazz <[EMAIL PROTECTED]> wrote:
> Hello,
>
> lets say I have a site with some data. the urls are like
> http://localhost/do.asp?id=1 to http://localhost/do.asp?id=1000000
>
> I do not want to crawl using the url links, but rather crawl in an
> iterative process from id=1 to id=n ?
>
> how can I do this?
>
> Best Regards,
> -C.B.
>

