Hi Stefan,

As I understand it, 'nutch generate' does not call the URL filter when it builds the fetch list. Only 'nutch updatedb' and 'nutch fetch' call it. So after 30 days the page will be put into a fetch list again, even if your URL filter rejects it.
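Stefan's reply, quoted below, mentions a database-based URL filter in JIRA for filtering larger lists of URLs. Here is a minimal sketch of that idea. It assumes the usual Nutch URLFilter convention: a filter returns the URL string to accept it and null to reject it. The class name BlacklistUrlFilter is hypothetical. It does not implement the real Nutch interface, and an in-memory HashSet stands in for the database.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical exact-match blacklist filter (not the JIRA plugin).
 * Mirrors the Nutch URLFilter convention: return the URL to accept it,
 * return null to reject it. A HashSet stands in for a database table.
 */
class BlacklistUrlFilter {
    private final Set<String> blocked;

    BlacklistUrlFilter(Collection<String> blockedUrls) {
        // Copy the list so later changes to the caller's collection have no effect.
        this.blocked = new HashSet<>(blockedUrls);
    }

    /** Returns the URL unchanged if it is allowed, or null if it is blacklisted. */
    String filter(String url) {
        if (url == null) {
            return null;
        }
        return blocked.contains(url) ? null : url;
    }

    public static void main(String[] args) {
        BlacklistUrlFilter f = new BlacklistUrlFilter(
                Arrays.asList("http://example.com/removed.html"));
        System.out.println(f.filter("http://example.com/removed.html")); // prints "null"
        System.out.println(f.filter("http://example.com/kept.html"));    // prints the URL
    }
}
```

A set lookup costs the same no matter how many URLs are listed, while a list of regular expressions is checked one pattern at a time. That is why a lookup-based filter scales better to long removal lists. It still only takes effect where Nutch actually calls the URL filters.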
Best regards,
Keren

--- Stefan Groschupf <[EMAIL PROTECTED]> wrote:
> Not if you filter it in the URL filter.
> I think there is a database-based URL filter somewhere in JIRA;
> it can help to filter larger lists of URLs.
>
> On 03.02.2006 at 21:35, Keren Yu wrote:
>
>> Hi Stefan,
>>
>> Thank you. You are right. I have to use a URL filter and remove
>> the page from the index. But 30 days later, the page will be put
>> into the fetch list again when the fetch list is generated.
>>
>> Thanks,
>> Keren
>>
>> --- Stefan Groschupf <[EMAIL PROTECTED]> wrote:
>>
>>> Besides, it makes no sense, since the page will come back as soon
>>> as the link is found on another page.
>>> Use a URL filter instead and remove the page from the index.
>>> Removing it from the WebDB makes no sense.
>>>
>>> On 03.02.2006 at 21:27, Keren Yu wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> It took about 10 minutes to remove a page from the WebDB using
>>>> WebDBWriter. Does anyone know a faster way to remove a page?
>>>>
>>>> Thanks,
>>>> Keren
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
