Hello all,
I need to remove some URLs from Nutch that are marked with status
"db_gone".

I have already removed these URLs from the crawldb as follows:
- Added a filter to regex-urlfilter.txt that excludes these URLs.
- bin/nutch mergedb crawl/crawldb2 crawl/crawldb -filter
- mv crawl/crawldb2 crawl/crawldb
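
For reference, the steps above can be sketched roughly as follows. The URL pattern is a hypothetical placeholder, not one of the real URLs, and the backup directory name is an assumption added for safety. It also assumes the exclude rule sits above the final catch-all "+." line in regex-urlfilter.txt, since the first matching rule wins.

```shell
# Rule added to conf/regex-urlfilter.txt, above the catch-all "+." line
# (hypothetical pattern, shown here only as a comment):
#   -^http://www\.example\.com/old/

# Rewrite the crawldb through the configured URL filters.
bin/nutch mergedb crawl/crawldb2 crawl/crawldb -filter

# Keep the old crawldb as a backup (assumed name), then swap in the filtered one.
mv crawl/crawldb crawl/crawldb.bak
mv crawl/crawldb2 crawl/crawldb

# Print the per-status counts to check what is left in the crawldb.
bin/nutch readdb crawl/crawldb -stats
```

Moving the old crawldb aside first matters because a plain mv onto an existing directory would place crawldb2 inside it instead of replacing it.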

What I want to know is whether I should also remove these URLs from anywhere
else (e.g., should I do anything with the linkdb or the segments?).


Thanks in advance,
Marseld Dedgjonaj




