Re: indexing folders with nutch

Lourival Júnior Fri, 01 Sep 2006 04:45:23 -0700

Yes Cam, with depth 1 you will crawl only the first document. With
depth 2 you will crawl the first document and all the links found in that
document. With depth 3 you will crawl the first one, its links, and all
links found in cycle 2. And so on. Increasing your depth will increase the
size of your WebDB too. Try it ;)
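
To make the depth rule concrete, here is a rough sketch of a depth-limited crawl over a toy link graph. It is not Nutch's code: the `crawl` function, the `TOY_LINKS` graph, and every page name other than file01.html are made up for illustration, and it only assumes that each depth level is one fetch round whose newly found links feed the next round.

```python
from typing import Dict, List, Set


def crawl(links: Dict[str, List[str]], seeds: List[str], depth: int) -> Set[str]:
    """Return the set of pages fetched after `depth` fetch rounds."""
    fetched: Set[str] = set()
    frontier = list(seeds)  # round 1 fetches only the seed URLs
    for _ in range(depth):
        next_frontier: List[str] = []
        for url in frontier:
            if url in fetched:
                continue  # never fetch the same page twice
            fetched.add(url)
            # Links found in this round are fetched in the next round.
            for out in links.get(url, []):
                if out not in fetched:
                    next_frontier.append(out)
        frontier = next_frontier
    return fetched


# Hypothetical link graph: page -> pages it links to.
TOY_LINKS = {
    "file01.html": ["a.html", "b.html"],
    "a.html": ["c.html"],
    "b.html": [],
    "c.html": ["d.html"],
}

for depth in (1, 2, 3):
    print(depth, sorted(crawl(TOY_LINKS, ["file01.html"], depth)))
```

With this toy graph, depth 1 fetches only file01.html, depth 2 adds a.html and b.html, and depth 3 adds c.html. If your seed pages do not link to each other, every file you want indexed has to be in the urls file itself or be reachable within the depth you pick.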


Regards

On 8/31/06, Sandy Polanski <[EMAIL PROTECTED]> wrote:

Cam, try increasing the depth and see what happens.
It seems that logic would say that they're on the same
directory depth/level; however, just give it a try
because I ran into a similar problem, and if I'm not
mistaken, that fixed it.

--- Cam Bazz <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I have a problem. I tried to index some local files
> with nutch.
>
> What I have done is put them (html files) on a local
> apache server
> and created a urls file that contains
> http://localhost/file01.html etc.
>
> then I do a nutch crawl urls . -dir crawl -depth 1
>
> but the crawl stalls after a while, and nothing
> happens.
>
> I also tried -topN 10000
>
> Is there not a more convenient way of indexing from
> the file system?
>
> Best regards,
> -C.B.
>


--
Lourival Junior
Universidade Federal do Pará
Curso de Bacharelado em Sistemas de Informação
http://www.ufpa.br/cbsi
Msn: [EMAIL PROTECTED]
