Tejas Patil wrote:
> On Thu, Mar 14, 2013 at 7:36 PM, Dat Tran <tranquydat@> wrote:
> 
>> Update: When I remove or add a new URL in urls.txt (the seed list), it is
>> strange that the crawling result does not change. It seems Nutch always
>> crawls the first seed list.
>
> You mean the older copy of the seeds file is effectively being used and
> the updates you made are not reflected?
> What command are you using to run the crawler?

Yes, the older copy of the seeds file is used and my updates are not
reflected. Even when I use a different seed file, the change is not
reflected. To run the crawler I execute the command: bin/nutch parse urls
(urls is the directory where the seed file is located).
>> Is this problem caused by some temporary file? How can I resolve it?
>> Where can I find Nutch's temporary files?
>>
> 
> In shell, you can use "ls -a" to see the hidden files. That way you can
> find if there are hidden backup files in your seeds directory.
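To illustrate the point about hidden files: editors often leave swap or backup copies next to the file you edited, and plain "ls" will not show the dotfiles among them. A small demo (the directory and file names below are made up for illustration):

```shell
# Demo: editor leftovers hiding in a seeds directory.
# Directory and file names are illustrative only.
mkdir -p /tmp/nutch_seed_demo
cd /tmp/nutch_seed_demo
touch urls.txt          # the real seed list
touch .urls.txt.swp     # hidden swap file, e.g. left behind by vim
touch urls.txt~         # backup copy, e.g. left behind by emacs

ls                      # shows urls.txt and urls.txt~, but not the dotfile
ls -a                   # additionally shows the hidden .urls.txt.swp
```

If anything besides urls.txt shows up, move it out of the seeds directory before running the crawl.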
> 
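One more note on the command itself: in Nutch 1.x, "bin/nutch parse" operates on an already-fetched segment, not on the seed directory, so it would never re-read urls.txt — the seed list is only read by the inject step. That would explain why editing the file has no effect. A sketch of the usual step-by-step crawl cycle (the crawl/crawldb and crawl/segments paths are illustrative, adjust to your setup):

```
bin/nutch inject crawl/crawldb urls               # read urls.txt into the crawldb
bin/nutch generate crawl/crawldb crawl/segments   # create a fetch list
s1=$(ls -d crawl/segments/2* | tail -1)           # pick the newest segment
bin/nutch fetch "$s1"                             # fetch the pages
bin/nutch parse "$s1"                             # parse the fetched content
bin/nutch updatedb crawl/crawldb "$s1"            # fold results back into the crawldb
```

Repeating generate/fetch/parse/updatedb gives you the iterative crawl; changes to urls.txt only take effect after another inject.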





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Iterative-Crawling-tp4046501p4047643.html
Sent from the Nutch - User mailing list archive at Nabble.com.
