Hello Lewis,

Pardon me for the terse description. I have a set of URLs, namely
product URLs, numbering in the millions.

I want to write these URLs to a flat file and have Nutch crawl them
to depth = 1.
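
For context, here is roughly what I am running now (Nutch 1.x; the
crawl directory name below is just a placeholder):

  bin/nutch crawl /home/urls -dir /home/crawl -depth 1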

However, I might remove URLs from this list or add new ones. I also
would like Nutch to revisit each site once a day.
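
From reading the docs, I gather the refetch interval can be lowered
via db.fetch.interval.default in conf/nutch-site.xml; this is only my
understanding, but I am thinking of something like (86400 seconds =
1 day):

  <property>
    <name>db.fetch.interval.default</name>
    <value>86400</value>
  </property>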

I would like removed URLs to be deleted, and new ones to be injected,
each time Nutch starts.
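
The closest I can see for the additions is re-running inject on each
start, e.g.:

  bin/nutch inject /home/crawl/crawldb /home/urls

(the crawldb path is just my guess at a layout). For the removals I am
not sure; perhaps the only options are rebuilding the crawldb from
scratch each cycle, or filtering it through the URL filters during a
mergedb pass?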

Best Regards,
-C.B.

On Thu, Jul 7, 2011 at 6:21 PM, lewis john mcgibbney
<[email protected]> wrote:
> Hi C.B.,
>
> This is way too vague. We really need more information about roughly
> what kind of results you wish to get. It would be a near-impossible task
> for anyone to specify a solution to such an open-ended question.
>
> Please elaborate
>
> Thank you
>
> On Thu, Jul 7, 2011 at 12:56 PM, Cam Bazz <[email protected]> wrote:
>
>> Hello,
>>
>> I have a case where I need to crawl a list of exact URLs, somewhere
>> in the range of 1 to 1.5M.
>>
>> I have written those URLs into numerous files under /home/urls, i.e.
>> /home/urls/1, /home/urls/2, etc.
>>
>> Then, using the crawl command, I am crawling to depth=1.
>>
>> Are there any recommendations or general guidelines I should follow
>> when making Nutch just fetch and index a list of URLs?
>>
>>
>> Best Regards,
>> C.B.
>>
>
>
>
> --
> *Lewis*
>
